BIN-FUNC-Databases
Molecular Function Databases
  
Keywords: EC numbers, BioCyc, Reactome, WikiPathways, KEGG
Abstract
This unit provides a brief introduction to key data resources for functional data.
Prerequisites
You need to complete the following units before beginning this one:
Objectives
This unit will ...
- ... introduce key data resources for functional data.
Outcomes
After working through this unit you ...
- ... can access a variety of databases with functional information to retrieve single molecule, pathway and network information.
Deliverables
- Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
- Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
- Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.
Contents
Task:
- Read the introductory notes on databases that provide gene function information.
E.C. Numbers
The most direct functional annotation for an individual gene is the enzymatic activity of its product. The Enzyme Commission publishes E.C. numbers for this purpose: four-level hierarchical codes that classify enzymes by the reaction they catalyze. Many databases annotate genes with E.C. numbers if such information is available.
Task:
- Read the Wikipedia article on E.C. numbers for a first overview.
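Because E.C. numbers are hierarchical, they are easy to handle programmatically. The following is a minimal sketch, not part of the original unit, of how a code such as EC 1.1.1.1 (alcohol dehydrogenase) can be split into its four levels. The names `ECNumber`, `parse_ec` and `EC_CLASSES` are hypothetical and chosen for illustration only; preliminary codes with an "n" serial number are deliberately not handled.

```python
import re
from typing import NamedTuple, Optional

# The seven top-level enzyme classes (first field of an E.C. number).
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}

# Optional "EC" / "E.C." prefix, then four dot-separated fields.
# Fields two to four may be "-" for partially specified annotations.
EC_PATTERN = re.compile(
    r"(?:E\.?C\.?\s*:?\s*)?(\d+)\.(\d+|-)\.(\d+|-)\.(\d+|-)",
    re.IGNORECASE,
)


class ECNumber(NamedTuple):
    """Hypothetical container for the four levels of an E.C. number."""
    enzyme_class: int
    subclass: Optional[int]
    sub_subclass: Optional[int]
    serial: Optional[int]


def parse_ec(code: str) -> ECNumber:
    """Split an E.C. number into its four levels; "-" becomes None."""
    match = EC_PATTERN.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not a recognizable E.C. number: {code!r}")
    levels = [None if field == "-" else int(field) for field in match.groups()]
    if levels[0] not in EC_CLASSES:
        raise ValueError(f"unknown enzyme class in {code!r}")
    return ECNumber(*levels)


ec = parse_ec("EC 1.1.1.1")
print(ec, EC_CLASSES[ec.enzyme_class])
```

Partially specified annotations such as `3.4.-.-` are common in databases, which is why the sketch maps a dash to `None` rather than rejecting it.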
GO (Gene Ontology)
Not here. GO is so important, it has its own learning unit.
Pathway Databases
Task:
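MetaCyc is a curated database of experimentally determined metabolic pathways and enzymes. It is part of BioCyc, a collection of organism-specific pathway/genome databases. Read: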
| Caspi et al. (2016) The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res 44:D471-80. (pmid: 26527732) | 
| [ PubMed ] [ DOI ] The MetaCyc database (MetaCyc.org) is a freely accessible comprehensive database describing metabolic pathways and enzymes from all domains of life. The majority of MetaCyc pathways are small-molecule metabolic pathways that have been experimentally determined. MetaCyc contains more than 2400 pathways derived from >46,000 publications, and is the largest curated collection of metabolic pathways. BioCyc (BioCyc.org) is a collection of 5700 organism-specific Pathway/Genome Databases (PGDBs), each containing the full genome and predicted metabolic network of one organism, including metabolites, enzymes, reactions, metabolic pathways, predicted operons, transport systems, and pathway-hole fillers. The BioCyc website offers a variety of tools for querying and analyzing PGDBs, including Omics Viewers and tools for comparative analysis. This article provides an update of new developments in MetaCyc and BioCyc during the last two years, including addition of Gibbs free energy values for compounds and reactions; redesign of the primary gene/protein page; addition of a tool for creating diagrams containing multiple linked pathways; several new search capabilities, including searching for genes based on sequence patterns, searching for databases based on an organism's phenotypes, and a cross-organism search; and a metabolite identifier translation service. | 
- Visit MetaCyc
- Click on "Change Organism Database" and select Saccharomyces cerevisiae.
- Select Metabolism → Cellular overview and explore the pathway map.
Task:
Reactome is a very large, well-curated knowledgebase of human pathways. Read the current overview of Reactome database offerings:
| Fabregat et al. (2016) The Reactome pathway Knowledgebase. Nucleic Acids Res 44:D481-7. (pmid: 26656494) | 
| [ PubMed ] [ DOI ] The Reactome Knowledgebase (www.reactome.org) provides molecular details of signal transduction, transport, DNA replication, metabolism and other cellular processes as an ordered network of molecular transformations-an extended version of a classic metabolic map, in a single consistent data model. Reactome functions both as an archive of biological processes and as a tool for discovering unexpected functional relationships in data such as gene expression pattern surveys or somatic mutation catalogues from tumour cells. Over the last two years we redeveloped major components of the Reactome web interface to improve usability, responsiveness and data visualization. A new pathway diagram viewer provides a faster, clearer interface and smooth zooming from the entire reaction network to the details of individual reactions. Tool performance for analysis of user datasets has been substantially improved, now generating detailed results for genome-wide expression datasets within seconds. The analysis module can now be accessed through a RESTFul interface, facilitating its inclusion in third party applications. A new overview module allows the visualization of analysis results on a genome-wide Reactome pathway hierarchy using a single screen page. The search interface now provides auto-completion as well as a faceted search to narrow result lists efficiently. | 
- Visit Reactome
- Click on "Browse Pathways".
- In the left-hand menu, expand the "Cell Cycle" topic.
- Click on "Cell Cycle, Mitotic", then on the round button with the data-model icon that has "open pathway diagram" as its hover-text.
- Click on "MITOTIC G1/G1-S PHASES".
- Read the definition, then click on the "Molecules" tab - you can get lists of participating molecules for download.
- Next click on the "Expression" tab. This gets you tissue-specific expression levels.
Task:
Read the current overview of the WikiPathways database offerings:
| Kutmon et al. (2016) WikiPathways: capturing the full diversity of pathway knowledge. Nucleic Acids Res 44:D488-94. (pmid: 26481357) | 
| [ PubMed ] [ DOI ] WikiPathways (http://www.wikipathways.org) is an open, collaborative platform for capturing and disseminating models of biological pathways for data visualization and analysis. Since our last NAR update, 4 years ago, WikiPathways has experienced massive growth in content, which continues to be contributed by hundreds of individuals each year. New aspects of the diversity and depth of the collected pathways are described from the perspective of researchers interested in using pathway information in their studies. We provide updates on extensions and services to support pathway analysis and visualization via popular standalone tools, i.e. PathVisio and Cytoscape, web applications and common programming environments. We introduce the Quick Edit feature for pathway authors and curators, in addition to new means of publishing pathways and maintaining custom pathway collections to serve specific research topics and communities. In addition to the latest milestones in our pathway collection and curation effort, we also highlight the latest means to access the content as publishable figures, as standard data files, and as linked data, including bulk and programmatic access. | 
- Visit WikiPathways
- Enter Mbp1 into the search box. Explore the pathway you retrieve. Note that the individual genes are clickable and display links to other pathways for the respective proteins. For example, Ubiquitin (Uba) is linked to twelve other pathways.
Task:
Read the current overview of KEGG database offerings:
| Kanehisa et al. (2017) KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res 45:D353-D361. (pmid: 27899662) | 
| [ PubMed ] [ DOI ] KEGG (http://www.kegg.jp/ or http://www.genome.jp/kegg/) is an encyclopedia of genes and genomes. Assigning functional meanings to genes and genomes both at the molecular and higher levels is the primary objective of the KEGG database project. Molecular-level functions are stored in the KO (KEGG Orthology) database, where each KO is defined as a functional ortholog of genes and proteins. Higher-level functions are represented by networks of molecular interactions, reactions and relations in the forms of KEGG pathway maps, BRITE hierarchies and KEGG modules. In the past the KO database was developed for the purpose of defining nodes of molecular networks, but now the content has been expanded and the quality improved irrespective of whether or not the KOs appear in the three molecular network databases. The newly introduced addendum category of the GENES database is a collection of individual proteins whose functions are experimentally characterized and from which an increasing number of KOs are defined. Furthermore, the DISEASE and DRUG databases have been improved by systematic analysis of drug labels for better integration of diseases and drugs with the KEGG molecular networks. KEGG is moving towards becoming a comprehensive knowledge base for both functional interpretation and practical application of genomic information. | 
- Visit KEGG and navigate to the KEGG Pathway Database.
- Click on "Organism", enter sce into the organism code box, and click Select.
- Enter Mbp1 into the keyword box and click Go.
- Consider the pathway map. The KEGG maps are generally considered to be the gold standard in this field.
- Click on Mbp1 to open the annotation data for this protein. You can fetch data for all other proteins in the map in the same way.
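The KEGG steps above can also be approximated programmatically. The following is a minimal sketch, not part of the original unit, that assumes the KEGG REST service at rest.kegg.jp with its `find` and `link` operations and tab-delimited text responses. The helper names `kegg_url`, `parse_kegg_table` and `kegg_fetch` are hypothetical. The sketch also assumes that YDL056W is the systematic name under which KEGG lists yeast Mbp1. The sample line at the end is hand-written to illustrate the response format and is not a retrieved result.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the KEGG REST service.
KEGG_REST = "https://rest.kegg.jp"


def kegg_url(operation, *args):
    """Build a request URL of the form <base>/<operation>/<arg>/<arg>..."""
    parts = [quote(str(arg), safe=":+") for arg in args]
    return "/".join([KEGG_REST, operation, *parts])


def parse_kegg_table(text):
    """Split a tab-delimited KEGG response into a list of rows."""
    return [line.split("\t") for line in text.splitlines() if line.strip()]


def kegg_fetch(operation, *args, timeout=30):
    """Send one request and return the parsed rows (needs network access)."""
    with urlopen(kegg_url(operation, *args), timeout=timeout) as response:
        return parse_kegg_table(response.read().decode("utf-8"))


# URLs that correspond roughly to the manual steps:
# keyword search in the yeast ("sce") genes, then pathways linked to one gene.
print(kegg_url("find", "sce", "Mbp1"))
print(kegg_url("link", "pathway", "sce:YDL056W"))

# Hand-written sample in the tab-delimited "link" format (illustrative only).
sample = "sce:YDL056W\tpath:sce04111\n"
print(parse_kegg_table(sample))
```

With network access, `kegg_fetch("link", "pathway", "sce:YDL056W")` would return one row per linked pathway. Building URLs and parsing responses in separate functions keeps the parsing testable without contacting the server.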
Further reading, links and resources
Notes
Self-evaluation
If in doubt, ask! If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.
About ... 
 
Author:
- Boris Steipe <boris.steipe@utoronto.ca>
Created:
- 2017-08-05
Modified:
- 2017-10-07
Version:
- 1.0
Version history:
- 1.0 First live version
- 0.1 First stub
 This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.
