… introduce key data resources for functional data.
Outcomes:
After working through this unit you …
… can access a variety of databases with functional information to
retrieve single molecule, pathway and network information.
Deliverables:
Time management: Before you begin,
estimate how long it will take you to complete this unit. Then, record
in your course journal: the number of hours you estimated, the number of
hours you worked on the unit, and the amount of time that passed between
start and completion of this unit.
Journal: Document your progress in
your Course
Journal. Some tasks may ask you to include specific items in your
journal. Don’t overlook these.
Insights: If you find something
particularly noteworthy about this unit, make a note in your insights!
page.
Evaluation:
NA: This unit is not evaluated for course marks.
Contents
This unit provides a brief introduction
to key data resources for functional data.
Obvious function annotations for individual genes can be derived from
enzymatic activity. The Enzyme Commission publishes E.C. codes for this
purpose, and many databases annotate their entries with E.C. codes if
such information is available.
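Since an E.C. code is a four-level hierarchy (class, subclass,
sub-subclass, serial number), it is easy to take apart programmatically.
The following is a minimal Python sketch, not part of the original unit:
the function names parse_ec and ec_class_name are hypothetical, and the
pattern only covers plain numeric codes and partial codes written with
"-" placeholders.

```python
import re

# Top-level Enzyme Commission classes (first number of an E.C. code).
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}

# Optional "EC" / "E.C." prefix, then four dot-separated levels.
# Levels two to four may be "-" when a code is only partially assigned.
_EC_PATTERN = re.compile(
    r"^(?:E\.?C\.?\s*)?(\d+)\.(\d+|-)\.(\d+|-)\.(\d+|-)$",
    re.IGNORECASE,
)


def parse_ec(ec):
    """Split an E.C. code into a 4-tuple; '-' levels become None."""
    match = _EC_PATTERN.match(ec.strip())
    if match is None:
        raise ValueError(f"not a recognizable E.C. code: {ec!r}")
    return tuple(None if part == "-" else int(part) for part in match.groups())


def ec_class_name(ec):
    """Return the name of the top-level class of an E.C. code."""
    return EC_CLASSES.get(parse_ec(ec)[0], "unknown class")


if __name__ == "__main__":
    for code in ("EC 1.1.1.1", "E.C. 2.7.11.22", "3.4.-.-"):
        print(code, parse_ec(code), ec_class_name(code))
```

Comparing only the leading levels of two parsed codes is one simple way
to ask whether two enzymes catalyze the same broad type of reaction.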
Caspi, Ron et al. (2020). “The MetaCyc database of metabolic pathways and
enzymes - a 2019 update”. Nucleic Acids Research 48(D1):D445–D453. [PMID:
31586394][DOI: 10.1093/nar/gkz862]
MetaCyc (MetaCyc.org) is a comprehensive reference database of metabolic
pathways and enzymes from all domains of life. It contains 2749 pathways
derived from more than 60 000 publications, making it the largest
curated collection of metabolic pathways. The data in MetaCyc are
evidence-based and richly curated, resulting in an encyclopedic
reference tool for metabolism. MetaCyc is also used as a knowledge base
for generating thousands of organism-specific Pathway/Genome Databases
(PGDBs), which are available in BioCyc.org and other genomic portals.
This article provides an update on the developments in MetaCyc during
September 2017 to August 2019, up to version 23.1. Some of the topics
that received intensive curation during this period include cobamides
biosynthesis, sterol metabolism, fatty acid biosynthesis, lipid
metabolism, carotenoid metabolism, protein glycosylation, antibiotics
and cytotoxins biosynthesis, siderophore biosynthesis, bioluminescence,
vitamin K metabolism, brominated compound metabolism, plant secondary
metabolism and human metabolism. Other additions include modifications
to the GlycanBuilder software that enable displaying glycans using
symbolic representation, improved graphics and fonts for web displays,
improvements in the PathoLogic component of Pathway Tools, and the
optional addition of regulatory information to pathway diagrams.
The Reactome Knowledgebase (https://reactome.org) provides molecular details of
signal transduction, transport, DNA replication, metabolism and other
cellular processes as an ordered network of molecular transformations in
a single consistent data model, an extended version of a classic
metabolic map. Reactome functions both as an archive of biological
processes and as a tool for discovering functional relationships in data
such as gene expression profiles or somatic mutation catalogs from tumor
cells. To extend our ability to annotate human disease processes, we
have implemented a new drug class and have used it initially to annotate
drugs relevant to cardiovascular disease. Our annotation model depends
on external domain experts to identify new areas for annotation and to
review new content. New web pages facilitate recruitment of community
experts and allow those who have contributed to Reactome to identify
their contributions and link them to their ORCID records. To improve
visualization of our content, we have implemented a new tool to
automatically lay out the components of individual reactions with
multiple options for downloading the reaction diagrams and associated
data, and a new display of our event hierarchy that will facilitate
visual interpretation of pathway analysis results.
In the left-hand menu, expand the “Cell Cycle” topic (click on the ⊞).
Click on “Cell Cycle, Mitotic”, then double-click on the bluish
icon with the hover text “Pathway and an enhanced diagram”.
Click on “MITOTIC G1/G1-S PHASES”.
Read the definition, then click on the “Molecules” tab, where you can
download lists of the participating molecules.
Next click on the “Expression” tab. This shows tissue-specific
expression levels.
Task…
Read the current overview of the WikiPathways database offerings:
Slenter, Denise N et al. (2018). “WikiPathways: a multifaceted pathway
database bridging metabolomics to other omics research”. Nucleic
Acids Research 46(D1):D661–D667. [PMID: 29136241][DOI: 10.1093/nar/gkx1064]
WikiPathways (wikipathways.org) captures the collective knowledge
represented in biological pathways. By providing a database in a
curated, machine readable way, omics data analysis and visualization is
enabled. WikiPathways and other pathway databases are used to analyze
experimental data by research groups in many fields. Due to the open and
collaborative nature of the WikiPathways platform, our content keeps
growing and is getting more accurate, making WikiPathways a reliable and
rich pathway database. Previously, however, the focus was primarily on
genes and proteins, leaving many metabolites with only limited
annotation. Recent curation efforts focused on improving the annotation
of metabolism and metabolic pathways by associating unmapped metabolites
with database identifiers and providing more detailed interaction
knowledge. Here, we report the outcomes of the continued growth and
curation efforts, such as a doubling of the number of annotated
metabolite nodes in WikiPathways. Furthermore, we introduce an OpenAPI
documentation of our web services and the FAIR (Findable, Accessible,
Interoperable and Reusable) annotation of resources to increase the
interoperability of the knowledge encoded in these pathways and
experimental omics data. New search options, monthly downloads, more
links to metabolite databases, and new portals make pathway knowledge
more effortlessly accessible to individual researchers and research
communities.
Enter Mbp1 into the search box. Explore the pathway you
retrieve. Note that the individual genes are clickable and display links
to other pathways for the respective proteins. For example, Ubiquitin
(Uba1) is linked to thirteen other pathways.
Task…
Read the two current overviews of the KEGG database:
Kanehisa, Minoru et al. (2017). “KEGG: new perspectives on genomes,
pathways, diseases and drugs”. Nucleic Acids Research 45(D1):D353–D361. [PMID:
27899662][DOI: 10.1093/nar/gkw1092]
KEGG (http://www.kegg.jp/
or http://www.genome.jp/kegg/) is an encyclopedia of genes
and genomes. Assigning functional meanings to genes and genomes both at
the molecular and higher levels is the primary objective of the KEGG
database project. Molecular-level functions are stored in the KO (KEGG
Orthology) database, where each KO is defined as a functional ortholog
of genes and proteins. Higher-level functions are represented by
networks of molecular interactions, reactions and relations in the forms
of KEGG pathway maps, BRITE hierarchies and KEGG modules. In the past
the KO database was developed for the purpose of defining nodes of
molecular networks, but now the content has been expanded and the
quality improved irrespective of whether or not the KOs appear in the
three molecular network databases. The newly introduced addendum
category of the GENES database is a collection of individual proteins
whose functions are experimentally characterized and from which an
increasing number of KOs are defined. Furthermore, the DISEASE and DRUG
databases have been improved by systematic analysis of drug labels for
better integration of diseases and drugs with the KEGG molecular
networks. KEGG is moving towards becoming a comprehensive knowledge base
for both functional interpretation and practical application of genomic
information.
Kanehisa, Minoru et al. (2019). “New approach for understanding
genome variations in KEGG”. Nucleic Acids Research 47(D1):D590–D595. [PMID:
30321428][DOI: 10.1093/nar/gky962]
KEGG (Kyoto Encyclopedia of Genes and Genomes; https://www.kegg.jp/ or https://www.genome.jp/kegg/) is a reference knowledge
base for biological interpretation of genome sequences and other
high-throughput data. It is an integrated database consisting of three
generic categories of systems information, genomic information and
chemical information, and an additional human-specific category of
health information. KEGG pathway maps, BRITE hierarchies and KEGG
modules have been developed as generic molecular networks with KEGG
Orthology nodes of functional orthologs so that KEGG pathway mapping and
other procedures can be applied to any cellular organism. Unfortunately,
however, this generic approach was inadequate for knowledge
representation in the health information category, where variations of
human genomes, especially disease-related variations, had to be
considered. Thus, we have introduced a new approach where human gene
variants are explicitly incorporated into what we call ‘network
variants’ in the recently released KEGG NETWORK database. This allows
accumulation of knowledge about disease-related perturbed molecular
networks caused not only by gene variants, but also by viruses and other
pathogens, environmental factors and drugs. We expect that KEGG NETWORK
will become another reference knowledge base for the basic understanding
of disease mechanisms and practical use in clinical sequencing and drug
development.
Click on “Organism”, enter sce into the organism code
box, and click Select.
Enter Mbp1 into the keyword box and click
go.
Consider the pathway
map. The KEGG maps are generally considered to be the gold standard
in this field.
Click on Mbp1 to open the
annotation data for this protein … you can easily fetch data for all
other proteins in the map in the same way.
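The same lookups can also be scripted. The following Python sketch is
not part of the original unit. It assumes the public KEGG REST service
at https://rest.kegg.jp with its find and link operations, and it
assumes that Mbp1 has the KEGG gene identifier sce:YDL056W (its
systematic yeast name). The helper names and the sample response are
hypothetical; the sample is hand-written to illustrate the tab-separated
format, not retrieved from KEGG.

```python
from urllib.parse import quote
from urllib.request import urlopen

KEGG_REST = "https://rest.kegg.jp"  # assumed base URL of the KEGG REST service


def kegg_find_url(organism, keyword):
    """URL for a keyword search among the genes of one organism code."""
    return f"{KEGG_REST}/find/{organism}/{quote(keyword)}"


def kegg_link_pathway_url(gene_id):
    """URL listing the pathway maps linked to one KEGG gene identifier."""
    return f"{KEGG_REST}/link/pathway/{gene_id}"


def parse_kegg_table(text):
    """Turn a tab-separated KEGG response into a list of tuples."""
    return [tuple(line.split("\t")) for line in text.splitlines() if line.strip()]


def fetch(url):
    """Download a plain-text response (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


# Hand-written sample in the expected format: gene identifier, tab, pathway.
SAMPLE_LINK_RESPONSE = "sce:YDL056W\tpath:sce04111\n"

if __name__ == "__main__":
    print(kegg_find_url("sce", "Mbp1"))
    print(kegg_link_pathway_url("sce:YDL056W"))
    print(parse_kegg_table(SAMPLE_LINK_RESPONSE))
    # With network access, the live result could be inspected like this:
    # print(parse_kegg_table(fetch(kegg_link_pathway_url("sce:YDL056W"))))
```

Keeping URL construction and parsing separate from the download step
means the logic can be checked offline before any request is sent to
the server.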
Questions, comments
If in doubt, ask! If anything about this content is
not clear to you, do not proceed but ask for clarification. If you have
ideas about how to make this material better, let’s hear them. We are
aiming to compile a list of FAQs for all learning units, and your
contributions will count towards your participation marks.
Improve this page! If you have questions or
comments, please post them on the Quercus Discussion board with a
subject line that includes the name of the unit.