BIN-FUNC-Databases


Molecular Function Databases

(EC numbers, BioCyc, Reactome, WikiPathways, KEGG)


 


Abstract:

This unit provides a brief introduction to key data resources for functional data.


Objectives:
This unit will ...

  • ... introduce key data resources for functional data.

Outcomes:
After working through this unit you ...

  • ... can access a variety of databases with functional information to retrieve single molecule, pathway and network information.

Deliverables:

  • Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
  • Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
  • Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.

  • Prerequisites:
    This unit builds on material covered in the following prerequisite units:


     



     



     


    Evaluation

    Evaluation: NA

    This unit is not evaluated for course marks.

    Contents

    Task:


     

    E.C. Numbers

    Obvious function annotations for individual genes can be derived from enzymatic activity. The Enzyme Commission publishes E.C. codes for this purpose, and many databases annotate E.C. codes if such information is available.
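An E.C. code has four dot-separated levels, read from the general enzyme class down to the specific enzyme, and a "-" stands for a level that has not been assigned. The following is a minimal sketch of how such codes might be split and compared when you find them in database records. The function names parse_ec and shares_ec_prefix are hypothetical helpers written for this illustration; they are not part of any database's API.

```python
import re

# The seven top-level enzyme classes of the E.C. system.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}

# Four levels separated by dots; "-" marks an unassigned level.
# An optional "EC" prefix is tolerated because many databases print one.
EC_PATTERN = re.compile(r"^(?:EC\s*)?(\d+)\.(\d+|-)\.(\d+|-)\.(\d+|-)$")


def parse_ec(code):
    """Split an E.C. code into its four levels and name its top-level class."""
    match = EC_PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"not a valid E.C. code: {code!r}")
    levels = match.groups()
    top = int(levels[0])
    if top not in EC_CLASSES:
        raise ValueError(f"unknown top-level E.C. class: {top}")
    return {"levels": levels, "class_name": EC_CLASSES[top]}


def shares_ec_prefix(code_a, code_b, depth=3):
    """True if both codes agree on their first `depth` fully assigned levels."""
    a = parse_ec(code_a)["levels"][:depth]
    b = parse_ec(code_b)["levels"][:depth]
    return "-" not in a and a == b


# Alcohol dehydrogenase (1.1.1.1) is an oxidoreductase.
print(parse_ec("EC 1.1.1.1")["class_name"])
# Hexokinase (2.7.1.1) and glucokinase (2.7.1.2) differ only at the last level.
print(shares_ec_prefix("2.7.1.1", "2.7.1.2", depth=3))
```

Comparing codes by prefix is one simple way to ask whether two annotated genes catalyze the same kind of reaction, even when the specific substrates differ.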

    Task:


     

    GO (Gene Ontology)

    Not here. GO is so important, it has its own learning unit.


     

    Pathway Databases

     

    Task:
    MetaCyc collects BioCyc metabolic pathway databases. Read:

    Caspi et al. (2020) The MetaCyc database of metabolic pathways and enzymes - a 2019 update. Nucleic Acids Res 48:D445-D453. (pmid: 31586394)

    MetaCyc (MetaCyc.org) is a comprehensive reference database of metabolic pathways and enzymes from all domains of life. It contains 2749 pathways derived from more than 60 000 publications, making it the largest curated collection of metabolic pathways. The data in MetaCyc are evidence-based and richly curated, resulting in an encyclopedic reference tool for metabolism. MetaCyc is also used as a knowledge base for generating thousands of organism-specific Pathway/Genome Databases (PGDBs), which are available in BioCyc.org and other genomic portals. This article provides an update on the developments in MetaCyc during September 2017 to August 2019, up to version 23.1. Some of the topics that received intensive curation during this period include cobamides biosynthesis, sterol metabolism, fatty acid biosynthesis, lipid metabolism, carotenoid metabolism, protein glycosylation, antibiotics and cytotoxins biosynthesis, siderophore biosynthesis, bioluminescence, vitamin K metabolism, brominated compound metabolism, plant secondary metabolism and human metabolism. Other additions include modifications to the GlycanBuilder software that enable displaying glycans using symbolic representation, improved graphics and fonts for web displays, improvements in the PathoLogic component of Pathway Tools, and the optional addition of regulatory information to pathway diagrams.

    • Visit MetaCyc
    • Click on "Change Organism Database" (top right, under the login section) and select Saccharomyces cerevisiae.
    • Select Metabolism → Cellular Overview and explore the pathway map.


     

    Task:
    Reactome is a very large, well-curated knowledgebase of human pathways. Read the current overview of Reactome database offerings:

    Jassal et al. (2020) The reactome pathway knowledgebase. Nucleic Acids Res 48:D498-D503. (pmid: 31691815)

    The Reactome Knowledgebase (https://reactome.org) provides molecular details of signal transduction, transport, DNA replication, metabolism and other cellular processes as an ordered network of molecular transformations in a single consistent data model, an extended version of a classic metabolic map. Reactome functions both as an archive of biological processes and as a tool for discovering functional relationships in data such as gene expression profiles or somatic mutation catalogs from tumor cells. To extend our ability to annotate human disease processes, we have implemented a new drug class and have used it initially to annotate drugs relevant to cardiovascular disease. Our annotation model depends on external domain experts to identify new areas for annotation and to review new content. New web pages facilitate recruitment of community experts and allow those who have contributed to Reactome to identify their contributions and link them to their ORCID records. To improve visualization of our content, we have implemented a new tool to automatically lay out the components of individual reactions with multiple options for downloading the reaction diagrams and associated data, and a new display of our event hierarchy that will facilitate visual interpretation of pathway analysis results.

    • Visit Reactome
    • Click on "Pathway Browser".
    • In the left-hand menu, expand the "Cell Cycle" topic (click on the ⊞ ).
    • Click on "Cell Cycle, Mitotic", then double-click on the blueish icon that has a hovertext: "Pathway and an enhanced diagram".
    • Click on "MITOTIC G1/G1-S PHASES".
    • Read the definition, then click on the "Molecules" tab - you can get lists of participating molecules for download.
    • Next click on the "Expression" tab. This gets you tissue-specific expression levels.
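The list of participating molecules from the "Molecules" tab can probably also be retrieved programmatically. The following is a rough sketch, assuming that the Reactome Content Service offers a /data/participants/{id}/referenceEntities endpoint that returns a JSON list, and that each record carries databaseName, identifier and displayName fields. Treat the endpoint path and the field names as assumptions to check against the Content Service documentation. The stable identifier of the pathway is deliberately left as a parameter: copy it from the details panel of the Pathway Browser. The function names are hypothetical.

```python
import json
from urllib.request import Request, urlopen

# Assumed base URL of the Reactome Content Service.
CONTENT_SERVICE = "https://reactome.org/ContentService"


def participants_url(stable_id):
    """Build the (assumed) URL that lists reference entities of a pathway."""
    return f"{CONTENT_SERVICE}/data/participants/{stable_id}/referenceEntities"


def summarize_reference_entities(records):
    """Reduce decoded JSON records to sorted, unique (database, id, name) tuples.

    Missing fields are tolerated, because the field names are an assumption.
    """
    summary = set()
    for record in records:
        summary.add((
            record.get("databaseName", "?"),
            record.get("identifier", "?"),
            record.get("displayName", ""),
        ))
    return sorted(summary)


def fetch_reference_entities(stable_id, timeout=30):
    """Download and decode the participant list (requires network access)."""
    request = Request(participants_url(stable_id),
                      headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Hand-written sample record, shaped like the response is assumed to look.
sample = [
    {"databaseName": "UniProt", "identifier": "P11802",
     "displayName": "UniProt:P11802 CDK4"},
]
for database, identifier, name in summarize_reference_entities(sample):
    print(database, identifier, name)
```

To run this against the live service, pass the stable identifier of the pathway you selected to fetch_reference_entities() and feed the result to summarize_reference_entities().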


     

    Task:
    Read the current overview of the WikiPathways database offerings:

    Slenter et al. (2018) WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research. Nucleic Acids Res 46:D661-D667. (pmid: 29136241)

    WikiPathways (wikipathways.org) captures the collective knowledge represented in biological pathways. By providing a database in a curated, machine readable way, omics data analysis and visualization is enabled. WikiPathways and other pathway databases are used to analyze experimental data by research groups in many fields. Due to the open and collaborative nature of the WikiPathways platform, our content keeps growing and is getting more accurate, making WikiPathways a reliable and rich pathway database. Previously, however, the focus was primarily on genes and proteins, leaving many metabolites with only limited annotation. Recent curation efforts focused on improving the annotation of metabolism and metabolic pathways by associating unmapped metabolites with database identifiers and providing more detailed interaction knowledge. Here, we report the outcomes of the continued growth and curation efforts, such as a doubling of the number of annotated metabolite nodes in WikiPathways. Furthermore, we introduce an OpenAPI documentation of our web services and the FAIR (Findable, Accessible, Interoperable and Reusable) annotation of resources to increase the interoperability of the knowledge encoded in these pathways and experimental omics data. New search options, monthly downloads, more links to metabolite databases, and new portals make pathway knowledge more effortlessly accessible to individual researchers and research communities.

    • Visit WikiPathways
    • Enter Mbp1 into the search box. Explore the pathway you retrieve. Note that the individual genes are clickable and display links to other pathways for the respective proteins. For example, Ubiquitin (Uba1) is linked to thirteen other pathways.


     

    Task:
    Read the two current overviews of the KEGG database:

    Kanehisa et al. (2017) KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res 45:D353-D361. (pmid: 27899662)

    KEGG (http://www.kegg.jp/ or http://www.genome.jp/kegg/) is an encyclopedia of genes and genomes. Assigning functional meanings to genes and genomes both at the molecular and higher levels is the primary objective of the KEGG database project. Molecular-level functions are stored in the KO (KEGG Orthology) database, where each KO is defined as a functional ortholog of genes and proteins. Higher-level functions are represented by networks of molecular interactions, reactions and relations in the forms of KEGG pathway maps, BRITE hierarchies and KEGG modules. In the past the KO database was developed for the purpose of defining nodes of molecular networks, but now the content has been expanded and the quality improved irrespective of whether or not the KOs appear in the three molecular network databases. The newly introduced addendum category of the GENES database is a collection of individual proteins whose functions are experimentally characterized and from which an increasing number of KOs are defined. Furthermore, the DISEASE and DRUG databases have been improved by systematic analysis of drug labels for better integration of diseases and drugs with the KEGG molecular networks. KEGG is moving towards becoming a comprehensive knowledge base for both functional interpretation and practical application of genomic information.

    Kanehisa et al. (2019) New approach for understanding genome variations in KEGG. Nucleic Acids Res 47:D590-D595. (pmid: 30321428)

    KEGG (Kyoto Encyclopedia of Genes and Genomes; https://www.kegg.jp/ or https://www.genome.jp/kegg/) is a reference knowledge base for biological interpretation of genome sequences and other high-throughput data. It is an integrated database consisting of three generic categories of systems information, genomic information and chemical information, and an additional human-specific category of health information. KEGG pathway maps, BRITE hierarchies and KEGG modules have been developed as generic molecular networks with KEGG Orthology nodes of functional orthologs so that KEGG pathway mapping and other procedures can be applied to any cellular organism. Unfortunately, however, this generic approach was inadequate for knowledge representation in the health information category, where variations of human genomes, especially disease-related variations, had to be considered. Thus, we have introduced a new approach where human gene variants are explicitly incorporated into what we call 'network variants' in the recently released KEGG NETWORK database. This allows accumulation of knowledge about disease-related perturbed molecular networks caused not only by gene variants, but also by viruses and other pathogens, environmental factors and drugs. We expect that KEGG NETWORK will become another reference knowledge base for the basic understanding of disease mechanisms and practical use in clinical sequencing and drug development.

    • Visit KEGG and navigate to the KEGG Pathway Database.
    • Click on "Organism", enter sce into the organism code box, and click Select.
    • Enter Mbp1 into the keyword box and click go.
    • Consider the pathway map. The KEGG maps are generally considered to be the gold standard in this field.
    • Click on Mbp1 to open the annotation data for this protein ... you can easily fetch data for all other proteins in the map in the same way.
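The same lookups can be scripted through the KEGG REST interface at https://rest.kegg.jp, which answers operations such as link and get with plain, tab-delimited text. The following is a minimal sketch under a few assumptions: the gene identifier sce:YDL056W (the systematic name commonly given for yeast MBP1) is not taken from this unit and should be confirmed on the KEGG gene page, the sample response line is hand-written for illustration, and the helper names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen

KEGG_REST = "https://rest.kegg.jp"


def kegg_url(operation, *args):
    """Build a KEGG REST URL, e.g. .../link/pathway/sce:YDL056W."""
    parts = [KEGG_REST, operation]
    parts.extend(quote(arg, safe=":+") for arg in args)
    return "/".join(parts)


def parse_kegg_table(text):
    """Parse tab-delimited KEGG text into a list of (key, value) pairs."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition("\t")
        rows.append((key, value))
    return rows


def kegg_fetch(operation, *args, timeout=30):
    """Download the text of one KEGG REST call (requires network access)."""
    with urlopen(kegg_url(operation, *args), timeout=timeout) as response:
        return response.read().decode("utf-8")


def pathways_for_gene(gene_id):
    """Return the pathway identifiers that KEGG links to one gene."""
    text = kegg_fetch("link", "pathway", gene_id)
    return [pathway for _, pathway in parse_kegg_table(text)]


# Offline illustration with a hand-written line in the expected layout.
print(kegg_url("link", "pathway", "sce:YDL056W"))
print(parse_kegg_table("sce:YDL056W\tpath:sce04111\n"))
```

With network access, pathways_for_gene("sce:YDL056W") would list the pathway maps for the gene, and kegg_fetch("get", "sce:YDL056W") would return the flat-file annotation that the web page displays.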

    Notes


     


    About ...
     
    Author:

    Boris Steipe <boris.steipe@utoronto.ca>

    Created:

    2017-08-05

    Modified:

    2020-09-23

    Version:

    1.1

    Version history:

    • 1.1 2020 updates
    • 1.0 First live version
    • 0.1 First stub

    This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License.