Pathway databases

From "A B C"

Pathway and Network Databases


This page is a placeholder or is still under development; it is here principally to establish the logical framework of the site. The material on this page is correct but incomplete.


This page discusses BioPAX, the standard for the interchange of pathway and network information, and the databases that curate and store biological pathways. Two of the most important of these databases, KEGG and BioCyc, have their own pages here.



 

Introductory reading

Ooi et al. (2010) Biomolecular pathway databases. Methods Mol Biol 609:129-44. (pmid: 20221917)

From the database point of view, biomolecular pathways are sets of proteins and other biomacromolecules that represent spatio-temporally organized cascades of interactions with the involvement of low-molecular compounds and are responsible for achieving specific phenotypic biological outcomes. A pathway is usually associated with certain subcellular compartments. In this chapter, we analyze the major public biomolecular pathway databases. Special attention is paid to database scope, completeness, issues of annotation reliability, and pathway classification. In addition, systems for information retrieval, tools for mapping user-defined gene sets onto the information in pathway databases, and their typical research applications are reviewed. Whereas today, pathway databases contain almost exclusively qualitative information, the desired trend is toward quantitative description of interactions and reactions in pathways, which will gradually enable predictive modeling and transform the pathway databases into analytical workbenches.
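The abstract mentions tools for mapping user-defined gene sets onto pathway databases. One widely used technique for this is over-representation analysis with a one-sided hypergeometric test; this is a generic illustration, not necessarily the method of any particular tool reviewed in the chapter. The sketch assumes that pathways are supplied as a plain dictionary of gene sets and that a background gene list is available. The function names and gene identifiers are hypothetical, and no multiple-testing correction is applied, although a real analysis across many pathways would need one. For a pathway with K background genes, a query of n genes drawn from a background of N, and k genes in the overlap, the p-value is P(X >= k) = sum over i from k to min(K, n) of C(K, i) C(N - K, n - i) / C(N, n).

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K pathway genes, n drawn)."""
    upper = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible terms contribute nothing to the sum.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def pathway_enrichment(user_genes, pathways, background):
    """Return (pathway, overlap size, p-value) tuples, most significant first.

    Hypothetical helper: `pathways` maps a pathway name to an iterable of gene IDs.
    Genes outside `background` are ignored. No multiple-testing correction.
    """
    bg = set(background)
    query = set(user_genes) & bg
    N, n = len(bg), len(query)
    results = []
    for name, members in pathways.items():
        members = set(members) & bg
        K = len(members)
        k = len(query & members)
        if k == 0:
            continue  # no overlap, nothing to test
        results.append((name, k, hypergeom_sf(k, N, K, n)))
    return sorted(results, key=lambda r: r[2])


# Usage on toy data: 20 background genes, two made-up pathways of 5 genes each.
background = [f"G{i}" for i in range(1, 21)]
pathways = {
    "glycolysis": [f"G{i}" for i in range(1, 6)],
    "tca": [f"G{i}" for i in range(6, 11)],
}
print(pathway_enrichment(["G1", "G2", "G3", "G4", "G11"], pathways, background))
```

In this toy case four of the five query genes fall in "glycolysis", giving p = 76/15504 (about 0.0049), while "tca" has no overlap and is skipped.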


 


 

Standards

Demir et al. (2010) The BioPAX community standard for pathway data sharing. Nat Biotechnol 28:935-42. (pmid: 20829833)

Biological Pathway Exchange (BioPAX) is a standard language to represent biological pathways at the molecular and cellular level and to facilitate the exchange of pathway data. The rapid growth of the volume of pathway data has spurred the development of databases and computational tools to aid interpretation; however, use of these data is hampered by the current fragmentation of pathway information across many databases with incompatible formats. BioPAX, which was created through a community process, solves this problem by making pathway data substantially easier to collect, index, interpret and share. BioPAX can represent metabolic and signaling pathways, molecular and genetic interactions and gene regulation networks. Using BioPAX, millions of interactions, organized into thousands of pathways, from many organisms are available from a growing number of databases. This large amount of pathway data in a computable form will support visualization, analysis and biological discovery.
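BioPAX Level 3 data is usually exchanged as OWL serialized in RDF/XML, with classes such as Pathway, BiochemicalReaction and SmallMolecule linked through properties such as pathwayComponent, left and right. The sketch below uses only the Python standard library to read a tiny hand-written snippet in that style and list each pathway's reactions with their left and right participants. The snippet is a toy, not a real database export (the reaction omits ATP and ADP, for example), and the helper name parse_pathways is hypothetical. Real exports are larger and often use rdf:about with full URIs, so a dedicated BioPAX library such as Paxtools would be the more robust choice in practice.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BP = "http://www.biopax.org/release/biopax-level3.owl#"

# Toy RDF/XML in the style of BioPAX Level 3 (hand-written, simplified).
TOY_BIOPAX = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:bp="{BP}">
  <bp:Pathway rdf:ID="pw1">
    <bp:displayName>Toy glycolysis fragment</bp:displayName>
    <bp:pathwayComponent rdf:resource="#rxn1"/>
  </bp:Pathway>
  <bp:BiochemicalReaction rdf:ID="rxn1">
    <bp:displayName>hexokinase reaction</bp:displayName>
    <bp:left rdf:resource="#glc"/>
    <bp:right rdf:resource="#g6p"/>
  </bp:BiochemicalReaction>
  <bp:SmallMolecule rdf:ID="glc"><bp:displayName>glucose</bp:displayName></bp:SmallMolecule>
  <bp:SmallMolecule rdf:ID="g6p"><bp:displayName>glucose 6-phosphate</bp:displayName></bp:SmallMolecule>
</rdf:RDF>"""


def parse_pathways(xml_text):
    """Map each pathway name to a list of (reaction, left names, right names)."""
    root = ET.fromstring(xml_text)

    # Index every top-level element by rdf:ID (or rdf:about), dropping a leading '#'.
    by_id = {}
    for el in root:
        ident = el.get(f"{{{RDF}}}ID") or el.get(f"{{{RDF}}}about")
        if ident:
            by_id[ident.lstrip("#")] = el

    def name(el):
        dn = el.find(f"{{{BP}}}displayName")
        return dn.text if dn is not None else None

    def refs(el, prop):
        # Follow rdf:resource links of a BioPAX property; skip unresolved links.
        keys = [(r.get(f"{{{RDF}}}resource") or "").lstrip("#")
                for r in el.findall(f"{{{BP}}}{prop}")]
        return [by_id[k] for k in keys if k in by_id]

    result = {}
    for pw in root.findall(f"{{{BP}}}Pathway"):
        reactions = []
        for rxn in refs(pw, "pathwayComponent"):
            left = [name(x) for x in refs(rxn, "left")]
            right = [name(x) for x in refs(rxn, "right")]
            reactions.append((name(rxn), left, right))
        result[name(pw)] = reactions
    return result


print(parse_pathways(TOY_BIOPAX))
```

Because every entity carries an identifier and every relation is an explicit link, the same traversal works regardless of which database produced the file, which is the interoperability point the abstract makes.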


 

Databases

Bader et al. (2006) Pathguide: a pathway resource list. Nucleic Acids Res 34:D504-6. (pmid: 16381921)

Pathguide: the Pathway Resource List (http://pathguide.org) is a meta-database that provides an overview of more than 190 web-accessible biological pathway and network databases. These include databases on metabolic pathways, signaling pathways, transcription factor targets, gene regulatory networks, genetic interactions, protein-compound interactions, and protein-protein interactions. The listed databases are maintained by diverse groups in different locations and the information in them is derived either from the scientific literature or from systematic experiments. Pathguide is useful as a starting point for biological pathway analysis and for content aggregation in integrated biological information systems.


   

Further reading and resources

Mao et al. (2012) CINPER: an interactive web system for pathway prediction for prokaryotes. PLoS ONE 7:e51252. (pmid: 23236458)

We present a web-based network-construction system, CINPER (CSBL INteractive Pathway BuildER), to assist a user to build a user-specified gene network for a prokaryotic organism in an intuitive manner. CINPER builds a network model based on different types of information provided by the user and stored in the system. CINPER's prediction process has four steps: (i) collection of template networks based on (partially) known pathways of related organism(s) from the SEED or BioCyc database and the published literature; (ii) construction of an initial network model based on the template networks using the P-Map program; (iii) expansion of the initial model, based on the association information derived from operons, protein-protein interactions, co-expression modules and phylogenetic profiles; and (iv) computational validation of the predicted models based on gene expression data. To facilitate easy applications, CINPER provides an interactive visualization environment for a user to enter, search and edit relevant data and for the system to display (partial) results and prompt for additional data. Evaluation of CINPER on 17 well-studied pathways in the MetaCyc database shows that the program achieves an average recall rate of 76% and an average precision rate of 90% on the initial models; and a higher average recall rate at 87% and an average precision rate at 28% on the final models. The reduced precision rate in the final models versus the initial models reflects the reality that the final models have large numbers of novel genes that have no experimental evidences and hence are not yet collected in the MetaCyc database. To demonstrate the usefulness of this server, we have predicted an iron homeostasis gene network of Synechocystis sp. PCC6803 using the server. The predicted models along with the server can be accessed at http://csbl.bmb.uga.edu/cinper/.
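The CINPER evaluation reports recall and precision of predicted pathway models against MetaCyc. Treating a model as a set of predicted genes, precision is TP / (number predicted) and recall is TP / (number in the reference pathway), where TP counts predicted genes that are also in the reference. The sketch below uses made-up gene sets (not CINPER data) to show why expanding a model can raise recall while sharply lowering precision: genes that may be real but are absent from the reference database all count as false positives. The helper name precision_recall is hypothetical.

```python
def precision_recall(predicted, reference):
    """Set-based precision and recall of a predicted gene list vs. a reference pathway."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)  # predicted genes confirmed by the reference
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical reference pathway of 10 genes.
reference = {f"g{i}" for i in range(10)}

# Small "initial" model: 7 reference genes plus 1 gene not in the reference.
initial = {f"g{i}" for i in range(7)} | {"x0"}

# Larger "expanded" model: 9 reference genes plus 21 genes absent from the reference.
expanded = {f"g{i}" for i in range(9)} | {f"x{i}" for i in range(21)}

print(precision_recall(initial, reference))   # high precision, moderate recall
print(precision_recall(expanded, reference))  # higher recall, low precision
```

With these toy sets the initial model scores precision 0.875 and recall 0.7, while the expanded model scores precision 0.3 and recall 0.9, mirroring the direction (not the values) of the trade-off described in the abstract.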