Molecular Function Databases

Contents
Questions, comments
References

Expected Preparations:

	[BIN] Databases		[BIN-FUNC] Concepts
	The units listed above are part of this course and contain important preparatory material.

Keywords: EC numbers; BioCyc; Reactome; Wikigenes; KEGG

Objectives:

This unit will …

… introduce key data resources for functional data.

Outcomes:

After working through this unit you …

… can access a variety of databases with functional information to retrieve single molecule, pathway and network information.

Deliverables:

Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.

Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don’t overlook these.

Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.

Evaluation:

NA: This unit is not evaluated for course marks.

This unit provides a brief introduction to key data resources for functional data.

Task…

Read the introductory notes on databases that provide gene function information (PDF).

E.C. Numbers

The most obvious function annotations for individual genes can be derived from enzymatic activity. The Enzyme Commission publishes E.C. codes for this purpose, and many databases annotate E.C. codes where such information is available.

Task…

Read the Wikipedia article on E.C. numbers (W) for a first overview.
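An E.C. number is a four-level hierarchical code: the first number names the enzyme class, and each following number narrows the classification. The following is a minimal sketch of how such a code could be split into its levels. The function name `parse_ec` is hypothetical, and the sketch assumes only the plain numeric form, with `-` standing for an unspecified level; preliminary codes such as those containing an `n` are not handled.

```python
import re

# Top-level classes of the Enzyme Commission nomenclature.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}

# Optional "EC" / "E.C." prefix, then four dot-separated levels.
# Only the first level must be a number; the others may be "-".
EC_PATTERN = re.compile(
    r"(?:E\.?C\.?\s*:?\s*)?(\d+)\.(\d+|-)\.(\d+|-)\.(\d+|-)",
    re.IGNORECASE,
)


def parse_ec(ec):
    """Return (class_name, levels) for an E.C. number (hypothetical helper).

    levels is a list of four entries; a level given as "-" becomes None.
    Raises ValueError if the string is not a recognizable E.C. number.
    """
    match = EC_PATTERN.fullmatch(ec.strip())
    if match is None:
        raise ValueError(f"not an E.C. number: {ec!r}")
    levels = [None if part == "-" else int(part) for part in match.groups()]
    if levels[0] not in EC_CLASSES:
        raise ValueError(f"unknown E.C. class: {levels[0]}")
    return EC_CLASSES[levels[0]], levels
```

For example, `parse_ec("1.1.1.1")` (alcohol dehydrogenase) reports the class "Oxidoreductases", and a partial code such as `"EC 3.4.-.-"` is accepted with the two unspecified levels returned as `None`.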

GO (Gene Ontology)

Not here. GO is so important, it has its own learning unit.

Pathway Databases

Task…

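MetaCyc is a curated reference database of metabolic pathways and enzymes, and the knowledge base from which the organism-specific BioCyc pathway/genome databases are generated. Read: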

Caspi, Ron et al. (2020). “The MetaCyc database of metabolic pathways and enzymes - a 2019 update”. Nucleic Acids Research 48(D1):D445–D453.
[PMID: 31586394] [DOI: 10.1093/nar/gkz862]

Abstract …

MetaCyc (MetaCyc.org) is a comprehensive reference database of metabolic pathways and enzymes from all domains of life. It contains 2749 pathways derived from more than 60 000 publications, making it the largest curated collection of metabolic pathways. The data in MetaCyc are evidence-based and richly curated, resulting in an encyclopedic reference tool for metabolism. MetaCyc is also used as a knowledge base for generating thousands of organism-specific Pathway/Genome Databases (PGDBs), which are available in BioCyc.org and other genomic portals. This article provides an update on the developments in MetaCyc during September 2017 to August 2019, up to version 23.1. Some of the topics that received intensive curation during this period include cobamides biosynthesis, sterol metabolism, fatty acid biosynthesis, lipid metabolism, carotenoid metabolism, protein glycosylation, antibiotics and cytotoxins biosynthesis, siderophore biosynthesis, bioluminescence, vitamin K metabolism, brominated compound metabolism, plant secondary metabolism and human metabolism. Other additions include modifications to the GlycanBuilder software that enable displaying glycans using symbolic representation, improved graphics and fonts for web displays, improvements in the PathoLogic component of Pathway Tools, and the optional addition of regulatory information to pathway diagrams.

Visit MetaCyc
Click on “Change Organism Database” (top right, under the login section) and select Saccharomyces cerevisiae.
Select Metabolism ▶ Cellular overview and explore the pathway map.

Task…

Reactome is a very large, well curated knowledgebase of human pathways. Read the current overview of Reactome database offerings:

Jassal, Bijay et al. (2020). “The reactome pathway knowledgebase”. Nucleic Acids Research 48(D1):D498–D503.
[PMID: 31691815] [DOI: 10.1093/nar/gkz1031]

Abstract …

The Reactome Knowledgebase (https://reactome.org) provides molecular details of signal transduction, transport, DNA replication, metabolism and other cellular processes as an ordered network of molecular transformations in a single consistent data model, an extended version of a classic metabolic map. Reactome functions both as an archive of biological processes and as a tool for discovering functional relationships in data such as gene expression profiles or somatic mutation catalogs from tumor cells. To extend our ability to annotate human disease processes, we have implemented a new drug class and have used it initially to annotate drugs relevant to cardiovascular disease. Our annotation model depends on external domain experts to identify new areas for annotation and to review new content. New web pages facilitate recruitment of community experts and allow those who have contributed to Reactome to identify their contributions and link them to their ORCID records. To improve visualization of our content, we have implemented a new tool to automatically lay out the components of individual reactions with multiple options for downloading the reaction diagrams and associated data, and a new display of our event hierarchy that will facilitate visual interpretation of pathway analysis results.

Visit Reactome
Click on “Pathway Browser”.
In the left-hand menu, expand the “Cell Cycle” topic (click on the ⊞ ).
Click on “Cell Cycle, Mitotic”, then double-click on the blueish icon that has a hovertext: “Pathway and an enhanced diagram”.
Click on “MITOTIC G1/G1-S PHASES”.
Read the definition, then click on the “Molecules” tab, where you can download lists of the participating molecules.
Next click on the “Expression” tab. This shows tissue-specific expression levels.
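The list of participating molecules can also be requested programmatically. The following is a rough sketch only, built on assumptions that are not part of this unit: that the Reactome Content Service is reachable under `https://reactome.org/ContentService`, that it offers a `data/participants/{id}/referenceEntities` resource, and that pathways are addressed by stable identifiers of the form `R-HSA-<number>`. The helper names are hypothetical. Check the Content Service documentation on the Reactome site before relying on any of this.

```python
import json
import re
from urllib.request import Request, urlopen

# Assumed base URL of the Reactome Content Service (verify before use).
CONTENT_SERVICE = "https://reactome.org/ContentService"

# Assumed shape of a human stable identifier, e.g. "R-HSA-" plus digits.
STABLE_ID = re.compile(r"R-HSA-\d+")


def participants_url(stable_id):
    """Build the (assumed) URL that lists reference entities of a pathway."""
    if STABLE_ID.fullmatch(stable_id) is None:
        raise ValueError(f"not a human Reactome stable ID: {stable_id!r}")
    return f"{CONTENT_SERVICE}/data/participants/{stable_id}/referenceEntities"


def fetch_json(url):
    """Download a URL and decode the response body as JSON."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=30) as response:
        return json.load(response)
```

To try it, copy the stable identifier shown in the Pathway Browser for the pathway you selected and pass it to `fetch_json(participants_url(...))`. The structure of the returned records is not assumed here, so inspect the first few entries before processing them further.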

Task…

Read the current overview of the WikiPathways database offerings:

Slenter, Denise N et al. (2018). “WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research”. Nucleic Acids Research 46(D1):D661–D667.
[PMID: 29136241] [DOI: 10.1093/nar/gkx1064]

Abstract …

WikiPathways (wikipathways.org) captures the collective knowledge represented in biological pathways. By providing a database in a curated, machine readable way, omics data analysis and visualization is enabled. WikiPathways and other pathway databases are used to analyze experimental data by research groups in many fields. Due to the open and collaborative nature of the WikiPathways platform, our content keeps growing and is getting more accurate, making WikiPathways a reliable and rich pathway database. Previously, however, the focus was primarily on genes and proteins, leaving many metabolites with only limited annotation. Recent curation efforts focused on improving the annotation of metabolism and metabolic pathways by associating unmapped metabolites with database identifiers and providing more detailed interaction knowledge. Here, we report the outcomes of the continued growth and curation efforts, such as a doubling of the number of annotated metabolite nodes in WikiPathways. Furthermore, we introduce an OpenAPI documentation of our web services and the FAIR (Findable, Accessible, Interoperable and Reusable) annotation of resources to increase the interoperability of the knowledge encoded in these pathways and experimental omics data. New search options, monthly downloads, more links to metabolite databases, and new portals make pathway knowledge more effortlessly accessible to individual researchers and research communities.

Visit WikiPathways
Enter Mbp1 into the search box. Explore the pathway you retrieve. Note that the individual genes are clickable and display links to other pathways for the respective proteins. For example, Ubiquitin (Uba1) is linked to thirteen other pathways.

Task…

Read the two current overviews of the KEGG database:

Kanehisa, Minoru et al. (2017). “KEGG: new perspectives on genomes, pathways, diseases and drugs”. Nucleic Acids Research 45(D1):D353–D361.
[PMID: 27899662] [DOI: 10.1093/nar/gkw1092]

Abstract …

KEGG (http://www.kegg.jp/ or http://www.genome.jp/kegg/) is an encyclopedia of genes and genomes. Assigning functional meanings to genes and genomes both at the molecular and higher levels is the primary objective of the KEGG database project. Molecular-level functions are stored in the KO (KEGG Orthology) database, where each KO is defined as a functional ortholog of genes and proteins. Higher-level functions are represented by networks of molecular interactions, reactions and relations in the forms of KEGG pathway maps, BRITE hierarchies and KEGG modules. In the past the KO database was developed for the purpose of defining nodes of molecular networks, but now the content has been expanded and the quality improved irrespective of whether or not the KOs appear in the three molecular network databases. The newly introduced addendum category of the GENES database is a collection of individual proteins whose functions are experimentally characterized and from which an increasing number of KOs are defined. Furthermore, the DISEASE and DRUG databases have been improved by systematic analysis of drug labels for better integration of diseases and drugs with the KEGG molecular networks. KEGG is moving towards becoming a comprehensive knowledge base for both functional interpretation and practical application of genomic information.

Kanehisa, Minoru et al. (2019). “New approach for understanding genome variations in KEGG”. Nucleic Acids Research 47(D1):D590–D595.
[PMID: 30321428] [DOI: 10.1093/nar/gky962]

Abstract …

KEGG (Kyoto Encyclopedia of Genes and Genomes; https://www.kegg.jp/ or https://www.genome.jp/kegg/) is a reference knowledge base for biological interpretation of genome sequences and other high-throughput data. It is an integrated database consisting of three generic categories of systems information, genomic information and chemical information, and an additional human-specific category of health information. KEGG pathway maps, BRITE hierarchies and KEGG modules have been developed as generic molecular networks with KEGG Orthology nodes of functional orthologs so that KEGG pathway mapping and other procedures can be applied to any cellular organism. Unfortunately, however, this generic approach was inadequate for knowledge representation in the health information category, where variations of human genomes, especially disease-related variations, had to be considered. Thus, we have introduced a new approach where human gene variants are explicitly incorporated into what we call ‘network variants’ in the recently released KEGG NETWORK database. This allows accumulation of knowledge about disease-related perturbed molecular networks caused not only by gene variants, but also by viruses and other pathogens, environmental factors and drugs. We expect that KEGG NETWORK will become another reference knowledge base for the basic understanding of disease mechanisms and practical use in clinical sequencing and drug development.

Visit KEGG and navigate to the KEGG Pathway Database.
Click on “Organism”, enter sce into the organism code box, and click Select.
Enter Mbp1 into the keyword box and click go.
Consider the pathway map. The KEGG maps are generally considered to be the gold standard in this field.
Click on Mbp1 to open the annotation data for this protein … you can easily fetch data for all other proteins in the map in the same way.
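The same lookup can be scripted through the KEGG REST interface, which answers requests of the form `https://rest.kegg.jp/<operation>/<arguments>` with plain text. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, and the KEGG gene identifier used for yeast Mbp1 in the usage note (`sce:YDL056W`, organism code plus systematic name) is an assumption you should confirm on the annotation page you just opened.

```python
from urllib.request import urlopen

KEGG_REST = "https://rest.kegg.jp"


def kegg_url(operation, *arguments):
    """Build a KEGG REST URL such as .../link/pathway/<gene identifier>."""
    return "/".join([KEGG_REST, operation, *arguments])


def parse_link_table(text):
    """Parse the two-column, tab-separated text returned by a 'link' request.

    Returns a list of (source, target) pairs; blank lines are skipped.
    """
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        source, target = line.split("\t")[:2]
        pairs.append((source, target))
    return pairs


def fetch_text(url):
    """Download a URL and return the response body as text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")
```

With a network connection, `parse_link_table(fetch_text(kegg_url("link", "pathway", "sce:YDL056W")))` should list the pathway maps that contain the gene, and `fetch_text(kegg_url("get", "sce:YDL056W"))` should return its flat-file annotation. The network call is kept in a separate function so that the URL construction and parsing can be checked offline.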

Questions, comments

If in doubt, ask! If anything about this content is not clear to you, do not proceed but ask for clarification. If you have ideas about how to make this material better, let’s hear them. We are aiming to compile a list of FAQs for all learning units, and your contributions will count towards your participation marks.

Improve this page! If you have questions or comments, please post them on the Quercus Discussion board with a subject line that includes the name of the unit.

References


[END]

Molecular Function Databases

Boris Steipe
