BIN-FUNC-GO

Gene Ontology

(Ontologies in knowledge engineering, GO and GOA)

Abstract:

Introduction to the Gene Ontology (GO) and Gene Ontology Annotations (GOA).

Objectives:
This unit will ...

... introduce GO and associated data and services;

Outcomes:
After working through this unit you ...

... are familiar with the concept of an ontology and the terms and ontologies of the GO project;
... can search for a gene of interest, identify associations, evaluate the term graph, find relevant ancestor nodes in the inferred tree, and discover proteins with related function.

Deliverables:

Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.

Prerequisites:
This unit builds on material covered in the following prerequisite units:

BIN-FUNC-Databases (Molecular Function Databases)

Introduction

The Gene Ontology project is the most influential contributor to the definition of function in computational biology and the use of GO terms and GO annotations is ubiquitous.

Task:

Read the introductory notes on how the Gene Ontology project defines and annotates gene function.

Browse through the paper describing the 2017 update on the GO database and tools; in particular, take note of the LEGO initiative, which aims to build system and pathway models by combining suitable GO terms.

The Gene Ontology Consortium (2017) Expansion of the Gene Ontology knowledgebase and resources. Nucleic Acids Res 45:D331-D338. (pmid: 27899567)


Remember the three separate component ontologies of GO:

- Molecular Function
- Biological Process
- Cellular Component

GO evidence codes

Annotations can be based on literature data or on computational inference. To judge how much trust an annotation deserves, it is important to know how the curator justified it, and GO uses evidence codes to make this transparent. When computing with annotations, we may want to exclude those with particular evidence codes to avoid circular reasoning: for example, if we infer functional relationships between homologous genes, we should exclude annotations that were themselves made by the same or a similar inference, and compute only with the actual experimental data.

The following evidence codes are in current use. To exclude inferred annotations, restrict your data to the experimental codes EXP, IDA, IPI, IMP, IEP, and perhaps IGI, although interpreting genetic interactions can require assumptions. The codes are ubiquitous and important: you need to know what they mean and imply when working with GOA data.

Automatically-assigned Evidence Codes

IEA: Inferred from Electronic Annotation

Curator-assigned Evidence Codes

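Experimental Evidence Codes
- EXP: Inferred from Experiment
- IDA: Inferred from Direct Assay
- IPI: Inferred from Physical Interaction
- IMP: Inferred from Mutant Phenotype
- IGI: Inferred from Genetic Interaction
- IEP: Inferred from Expression Pattern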

Computational Analysis Evidence Codes
- ISS: Inferred from Sequence or Structural Similarity
- ISO: Inferred from Sequence Orthology
- ISA: Inferred from Sequence Alignment
- ISM: Inferred from Sequence Model
- IGC: Inferred from Genomic Context
- IBA: Inferred from Biological aspect of Ancestor
- IBD: Inferred from Biological aspect of Descendant
- IKR: Inferred from Key Residues
- IRD: Inferred from Rapid Divergence
- RCA: Inferred from Reviewed Computational Analysis

Author Statement Evidence Codes
- TAS: Traceable Author Statement
- NAS: Non-traceable Author Statement

Curator Statement Evidence Codes
- IC: Inferred by Curator
- ND: No biological Data available

For further details, see the Guide to GO Evidence Codes and the GO Evidence Code Decision Tree.
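The evidence-code filtering described above can be sketched in Python. This is a minimal illustration that assumes annotations have already been loaded into simple records; the `Annotation` type, the `experimental_only()` function and the example records are hypothetical and invented for illustration, not part of any GO tool.

```python
from typing import NamedTuple

# Experimental evidence codes named in the text. IGI is optional because
# interpreting genetic interactions can require assumptions.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IEP"}


class Annotation(NamedTuple):
    gene: str            # gene or gene product symbol
    go_id: str           # GO term identifier
    evidence: str        # GO evidence code, e.g. "IDA" or "IEA"
    qualifier: str = ""  # may contain "NOT" for a negative result


def experimental_only(annotations, include_igi=False):
    """Keep positive annotations backed by an experimental evidence code."""
    allowed = (EXPERIMENTAL_CODES | {"IGI"}) if include_igi else EXPERIMENTAL_CODES
    kept = []
    for ann in annotations:
        # Qualifiers can be pipe-separated (e.g. "NOT|enables"); skip negatives.
        if "NOT" in ann.qualifier.split("|"):
            continue
        if ann.evidence in allowed:
            kept.append(ann)
    return kept


# Invented toy records for illustration only (not real GOA data).
records = [
    Annotation("geneA", "GO:0000082", "IDA"),
    Annotation("geneA", "GO:0000082", "IEA"),
    Annotation("geneB", "GO:0000082", "ISS"),
    Annotation("geneC", "GO:0000082", "IGI"),
    Annotation("geneD", "GO:0000082", "IMP", "NOT"),
]

print([a.gene for a in experimental_only(records)])                    # ['geneA']
print([a.gene for a in experimental_only(records, include_igi=True)])  # ['geneA', 'geneC']
```

Keeping the allowed codes in a set makes it easy to widen or narrow the filter (for example, to admit IGI) without touching the loop.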

GO tools

For many projects, the simplest approach will be to download the ontology itself. It is a well-constructed, easily parsed file that is well suited for computation.
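As a sketch of how easily the file can be parsed, the function below reads `[Term]` stanzas from an OBO-format file (GO distributes the ontology in this format, for example as `go-basic.obo`). It is a minimal, hypothetical parser that keeps only a few tags and follows only `is_a` links; the snippet it parses is invented.

```python
import io


def parse_obo(lines):
    """Parse [Term] stanzas of an OBO file into a dict keyed by term ID.

    Only id, name, namespace, is_a and is_obsolete are kept; other tags
    (def, synonym, relationship, ...) are ignored in this sketch.
    """
    terms = {}
    current = None  # the [Term] stanza being read, or None elsewhere
    for raw in lines:
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new stanza begins; [Typedef] and other stanzas are skipped.
            current = {"is_a": [], "obsolete": False} if line == "[Term]" else None
            continue
        if current is None or ": " not in line:
            continue
        tag, _, value = line.partition(": ")
        if tag == "id":
            current["id"] = value
            terms[value] = current
        elif tag in ("name", "namespace"):
            current[tag] = value
        elif tag == "is_a":
            # Drop the trailing "! parent name" comment.
            current["is_a"].append(value.split("!")[0].strip())
        elif tag == "is_obsolete":
            current["obsolete"] = value == "true"
    return terms


# A tiny invented OBO snippet; IDs and names are placeholders, not real GO terms.
SNIPPET = """format-version: 1.2

[Term]
id: GO:9000001
name: toy parent process
namespace: biological_process

[Term]
id: GO:9000002
name: toy child process
namespace: biological_process
is_a: GO:9000001 ! toy parent process

[Typedef]
id: part_of
name: part of
"""

terms = parse_obo(io.StringIO(SNIPPET))
print(terms["GO:9000002"]["is_a"])  # ['GO:9000001']

# With a downloaded release file (file name assumed):
# with open("go-basic.obo") as fh:
#     terms = parse_obo(fh)
```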

Bioconductor has a large number of packages that supply and analyze GO and GOA data.

AmiGO

AmiGO is a convenient online GO browser developed by the Gene Ontology Consortium and hosted on its website.

AmiGO - Gene products

Task:

Navigate to the GO homepage.
Enter Mbp1 into the search box to initiate a search for the yeast Mbp1 transcription factor (as gene or protein name).
There are three categories of hits: Ontology terms directly associated with the search string, Genes and gene products annotated to terms in GOA, and Annotations of terms to any of the genes. As usual, we need to be wary of keyword searches, since they rarely identify a unique gene, so we check the Genes... category first. Follow the link.
In the resulting table you can easily identify the correct gene. Follow its link to the associated Gene Information page. Study the information on that page.
Note that this page lists Associations, i.e. GO terms that have been associated with Mbp1 in GOA.

AmiGO - Associations

GO annotations for a protein are called associations.
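In the downloadable annotation files, each association is one line of a tab-separated GAF (GO Annotation File). The sketch below, assuming the 17-column GAF 2.x layout, turns one such line into a dictionary so that fields like the qualifier, evidence code and aspect can be inspected; `parse_gaf_line()` is a hypothetical helper and the example line is invented.

```python
# Column names for GAF 2.x files (17 tab-separated columns).
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type", "taxon",
    "date", "assigned_by", "annotation_extension", "gene_product_form_id",
]


def parse_gaf_line(line):
    """Turn one GAF data line into a dict; return None for header/comment lines."""
    if line.startswith("!") or not line.strip():
        return None
    fields = line.rstrip("\n").split("\t")
    # Pad so every key is present even if trailing empty columns were dropped.
    fields += [""] * (len(GAF_COLUMNS) - len(fields))
    record = dict(zip(GAF_COLUMNS, fields))
    # Flag negative annotations (qualifier may be pipe-separated).
    record["is_not"] = "NOT" in record["qualifier"].split("|")
    return record


# An invented example line (placeholder identifiers, not a real annotation).
example = "\t".join([
    "DB", "ID0001", "GENE1", "NOT", "GO:0000082", "PMID:0000000", "IMP",
    "", "P", "toy gene one", "", "protein", "taxon:4932", "20170101", "DB",
])
rec = parse_gaf_line(example)
print(rec["go_id"], rec["evidence_code"], rec["aspect"], rec["is_not"])
# GO:0000082 IMP P True
```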

Task:

Expand the result count to show all Associations on the page. Note the evidence codes. Also note that there are two associations referring to a negative experiment, marked with the red annotation qualifier NOT. This means that an experiment has shown that the gene does not have that activity.
Note that you can expand the left-hand menu for detailed filtering. Click on Ontology (aspect) to display or hide the terms for the three different component ontologies, the GO "aspects": F, C, and P (what were these again? This is one thing you must remember.).
The most specific annotation on the page seems to be "positive regulation of transcription involved in G1/S transition of mitotic cell cycle". Follow the link.
Note that you can now filter for organisms. Restrict the organism to Saccharomyces cerevisiae S288C by clicking on the green (+) sign. Note that you now see all yeast genes that are annotated to this term! This is an effective way to build system membership information from the bottom up.
There are a number of tabs available for different views on the data: Annotations, Graph Views, Inferred Tree View, Neighborhood, and Mappings. Visit them.
1. The link to QuickGO from the Graph Views tab gives you the entire ancestor chart of the term, with clickable term nodes. You need to consider the ancestor terms to expand searches for related, collaborating genes. For example, if a gene is annotated to "positive regulation ...", you will need to consider genes associated with the cognate "negative regulation ..." or just "regulation ..." terms as well, to get a complete picture of the gene's activities.
2. Neighborhood refers to the ancestors and children of a term.
Study the information available on that page and through the tabs on the page, especially the graph view.
In the Inferred Tree View tab, find the ancestor node "GO:0000082 G1/S transition of mitotic cell cycle" and follow it. On the associations page, go again to Annotations and filter the list to S. cerevisiae genes. At the time of writing there were 137 annotated genes. Are these genes specifically annotated to that term, or does the list include genes that are annotated to descendants of the term?
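The reasoning in these steps (moving up to ancestor terms, and deciding whether a term's gene list should include genes annotated to its descendants) can be sketched on a toy graph. This assumes a simple child-to-parents dictionary and follows only those links, whereas the real GO graph also has relations such as `part_of` and `regulates`; all term names, genes and annotations here are invented, and the helper functions are hypothetical.

```python
from collections import defaultdict, deque

# Invented toy graph (child -> list of parents); placeholder names, not GO IDs.
PARENTS = {
    "process": [],
    "regulation": ["process"],
    "positive regulation": ["regulation"],
    "negative regulation": ["regulation"],
}

# Invented direct annotations (term -> genes annotated exactly to that term).
DIRECT = {
    "regulation": {"geneC"},
    "positive regulation": {"geneA"},
    "negative regulation": {"geneB"},
}


def ancestors(term, parents):
    """All terms reachable by following parent links, excluding `term`."""
    seen, queue = set(), deque(parents.get(term, []))
    while queue:
        t = queue.popleft()
        if t not in seen:
            seen.add(t)
            queue.extend(parents.get(t, []))
    return seen


def descendants(term, parents):
    """All terms below `term`, found by inverting the parent links."""
    children = defaultdict(list)
    for child, ps in parents.items():
        for p in ps:
            children[p].append(child)
    seen, queue = set(), deque(children[term])
    while queue:
        t = queue.popleft()
        if t not in seen:
            seen.add(t)
            queue.extend(children[t])
    return seen


def genes_for_term(term, parents, direct):
    """Genes annotated to `term` itself or to any of its descendants."""
    terms = {term} | descendants(term, parents)
    return {g for t in terms for g in direct.get(t, set())}


print(sorted(ancestors("positive regulation", PARENTS)))               # ['process', 'regulation']
print(sorted(genes_for_term("regulation", PARENTS, DIRECT)))           # ['geneA', 'geneB', 'geneC']
print(sorted(genes_for_term("positive regulation", PARENTS, DIRECT)))  # ['geneA']
```

Walking up from "positive regulation" to "regulation" and then collecting genes below it picks up the gene annotated to the cognate "negative regulation" term as well.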

GO Slims

GO is large and very detailed. The need for somewhat higher-level descriptions in model organisms is met by GO slims: subsets of GO terms curated by some of the main model-organism databases and consortia. Follow the link and read more about GO slims (short).
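Mapping detailed annotations onto a slim amounts to walking up from each annotated term and keeping the ancestors that belong to the slim. The sketch below assumes a toy child-to-parents dictionary and a hypothetical `map_to_slim()` helper; the terms, the slim and the genes are invented placeholders.

```python
from collections import Counter

# Invented toy graph (child -> parents) and slim; placeholder names only.
PARENTS = {
    "root": [],
    "A": ["root"],
    "A1": ["A"],
    "B": ["root"],
    "A1B": ["A1", "B"],
}
SLIM = {"A", "B"}


def map_to_slim(term, parents, slim):
    """Return the slim terms that are `term` itself or one of its ancestors."""
    hits = {term} & slim
    seen, stack = set(), list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t in seen:
            continue
        seen.add(t)
        if t in slim:
            hits.add(t)
        stack.extend(parents.get(t, []))
    return hits


# Invented gene -> detailed term annotations, summarized at slim level.
annotations = {"geneA": ["A1B"], "geneB": ["A1"], "geneC": ["root"]}

counts = Counter()
for gene, terms in annotations.items():
    # Count each gene at most once per slim term.
    slim_terms = set().union(*(map_to_slim(t, PARENTS, SLIM) for t in terms))
    counts.update(slim_terms)

print(counts["A"], counts["B"])  # 2 1
```

A gene annotated only to a term above the slim (here "root") does not map to any slim term, which is one reason slim summaries can look incomplete.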

Self-evaluation

Notes

Further reading, links and resources

The Gene Ontology Consortium (2017) Expansion of the Gene Ontology knowledgebase and resources. Nucleic Acids Res 45:D331-D338. (pmid: 27899567)


Gene Ontology Consortium (2012) The Gene Ontology: enhancements for 2011. Nucleic Acids Res 40:D559-64. (pmid: 22102568)


Gene Ontology Consortium (2010) The Gene Ontology in 2010: extensions and refinements. Nucleic Acids Res 38:D331-5. (pmid: 19920128)


Bastos et al. (2011) Application of gene ontology to gene identification. Methods Mol Biol 760:141-57. (pmid: 21779995)


du Plessis et al. (2011) The what, where, how and why of gene ontology--a primer for bioinformaticians. Brief Bioinformatics 12:723-35. (pmid: 21330331)


Hackenberg & Matthiesen (2010) Algorithms and methods for correlating experimental results with annotation databases. Methods Mol Biol 593:315-40. (pmid: 19957156)


Carol Goble on the tension between purists and pragmatists in life-science ontology construction. Plenary talk at SOFG2...

Goble & Wroe (2004) The Montagues and the Capulets. Comp Funct Genomics 5:623-32. (pmid: 18629186)


Harris (2008) Developing an ontology. Methods Mol Biol 452:111-24. (pmid: 18563371)


Dimmer et al. (2007) Methods for gene ontology annotation. Methods Mol Biol 406:495-520. (pmid: 18287709)


If in doubt, ask! If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.

About ...

Author:

Boris Steipe <boris.steipe@utoronto.ca>

Created:

2017-08-05

Modified:

2017-11-12

Version:

1.0

Version history:

1.0 First live
0.1 First stub

This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.
