At the cellular level, life is a network of molecular reactions. In Reactome, these processes are systematically described in molecular detail to generate an ordered network of molecular transformations (Fabregat et al. 2015). This amounts to millions of interconnected terms that naturally form a graph of biological knowledge. The Reactome Graph offers an intuitive way to retrieve data and to interpret and analyse pathway knowledge.
Retrieving, and especially analysing, such complex data becomes tedious when using relational databases. Queries across the pathway knowledgebase are composed of a number of expensive join operations, resulting in poor performance and a hard-to-maintain project. Because of their schema-based approach, relational databases are limited in how information is stored and are therefore difficult to scale to new requirements. To overcome these problems, the Reactome database is imported into Neo4j, creating one large interconnected graph. Graph database technology is an effective tool for modelling highly connected data.
Storing Reactome data in this form has many benefits. No denormalisation is required, so data can be stored in its natural form. Nodes in the vicinity of a starting point can be traversed quickly, giving the user the possibility not only to retrieve data but also to perform fast analysis of these neighbour networks. Thus, knowledge that was previously unavailable due to the limitations of relational data storage can now be retrieved.
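To make the idea of traversing the neighbourhood of a starting node concrete, here is a minimal sketch in plain Java. It does not use Neo4j or Reactome data: the graph is a small in-memory adjacency list, and the node names (`Pathway A`, `Reaction 1`, `Protein X` and so on), the class `NeighbourhoodSketch` and its methods are hypothetical, chosen only for illustration. It shows a breadth-first walk that collects every node within a given number of hops by following stored connections directly rather than joining tables.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy illustration only: node names and edges are hypothetical, not Reactome data.
final class NeighbourhoodSketch {

    // Directed adjacency list: each node maps to the nodes it points at.
    static Map<String, List<String>> toyGraph() {
        Map<String, List<String>> graph = new LinkedHashMap<>();
        graph.put("Pathway A", List.of("Reaction 1", "Reaction 2"));
        graph.put("Reaction 1", List.of("Protein X", "Protein Y"));
        graph.put("Reaction 2", List.of("Protein Y", "Complex Z"));
        graph.put("Complex Z", List.of("Protein X"));
        return graph;
    }

    // Breadth-first walk collecting every node within maxHops of start (start itself excluded).
    static Set<String> neighbourhood(Map<String, List<String>> graph, String start, int maxHops) {
        Set<String> seen = new LinkedHashSet<>();
        seen.add(start);
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(start);
        for (int hop = 0; hop < maxHops && !frontier.isEmpty(); hop++) {
            Deque<String> next = new ArrayDeque<>();
            for (String node : frontier) {
                // Nodes without outgoing edges simply have no entry in the map.
                for (String neighbour : graph.getOrDefault(node, List.of())) {
                    if (seen.add(neighbour)) {
                        next.add(neighbour);
                    }
                }
            }
            frontier = next;
        }
        seen.remove(start);
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = toyGraph();
        // prints [Reaction 1, Reaction 2]
        System.out.println(neighbourhood(graph, "Pathway A", 1));
        // prints [Reaction 1, Reaction 2, Protein X, Protein Y, Complex Z]
        System.out.println(neighbourhood(graph, "Pathway A", 2));
    }
}
```

In a graph database the equivalent walk runs over relationships stored alongside the nodes, which is why widening the neighbourhood by one hop does not require an additional join.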
To make it easy to access and benefit from the graph database, we have developed GraphCore, an open source library implemented in Java. This project uses Spring Data Neo4j, which provides automatic object-graph mapping on top of Neo4j and integrates tightly with other parts of the Spring Framework used across the project.
A backup of the Reactome Graph Database is available in our download data section. It is possible to use it in your local environment by following these steps:
- Download and install Neo4j.
- Download the Reactome Graph Database backup for the latest data release.
- Install the Reactome Graph Database. Please follow the instructions in the Neo4j operations tutorial, specifically the sections “file locations” and “restoring a backup”.
- If the standard procedure has been followed, the graph database should be accessible through the Neo4j browser on your localhost.
Great! Now you have your own copy of the current version of the Reactome data content in your instance of Neo4j, so let’s see how you can take advantage of it, either with direct queries to the graph database or by using our GraphCore Java library.
Querying the Reactome Graph Database directly
The Neo4j browser offers a convenient interface for submitting your own queries to the graph database. We recommend using this platform for your first interaction with the Reactome Graph Database to see how easy it is to use the Cypher query language.
Please refer to our extracting pathway participating molecules tutorial for an introduction to using Cypher to query the Reactome Graph Database.
Using the Reactome GraphCore Java library
The Reactome GraphCore Java library will soon be available! Follow us on Twitter to get the latest news about it!
The API for the Reactome GraphCore Java library will soon be available! Follow us on Twitter to get the latest news about it!