Expected Preparations:
Keywords: Ontologies in knowledge engineering; GO and GOA
Objectives:
To introduce the Gene Ontology project and associated data and services.
Outcomes:
You are familiar with the concept of an ontology and with the terms and ontologies of the GO project.
You can search for a gene of interest, identify associations, evaluate the term graph, find relevant ancestor nodes in the inferred tree, and discover proteins with related function.
Deliverables:
Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don’t overlook these.
Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.
Evaluation: NA: This unit is not evaluated for course marks.
Introduction to the Gene Ontology (GO) and Gene Ontology Annotations (GOA).
The Gene Ontology project is the most influential contributor to the definition of function in computational biology, and the use of GO terms and GO annotations is ubiquitous.
Task…
Read the introductory notes on the Gene Ontology project to define and annotate gene function (PDF).
Browse through the paper describing the 2019 update on the GO database and tools:
The Gene Ontology Consortium. (2019). “The Gene Ontology Resource: 20 years and still GOing strong”. Nucleic Acids Research 47(D1):D330–D338. [PMID: 30395331] [DOI: 10.1093/nar/gky1055]
Memorize the three separate component ontologies of GO (molecular function, biological process, and cellular component) and how they are defined.
Annotations can be based on literature data or on computational inference. To evaluate how much trust an annotation deserves, it is important to note how the curator justified it; GO uses evidence codes to make this process transparent. When computing with the ontology, we may want to filter out (exclude) particular annotations in order to avoid tautologies: for example, if we were to infer functional relationships between homologous genes, we should exclude annotations that were themselves based on the same or a similar inference, and compute only with the actual experimental data.
The following evidence codes are in current use. If you want to exclude inferred annotations, restrict the codes you use to the experimental ones: EXP, IDA, IPI, IMP, IEP, and perhaps IGI, although the interpretation of genetic interactions can require assumptions. The codes are ubiquitous and important: you need to know what they mean and imply when working with GOA data.
Automatically-assigned Evidence Codes
Curator-assigned Evidence Codes
– Experimental Evidence Codes
– Computational Analysis Evidence Codes
– Author Statement Evidence Codes
– Curator Statement Evidence Codes
For further details, see the Guide to GO Evidence Codes.
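The filtering step described above can be sketched in a few lines of Python. This is a minimal illustration, not an official GO tool: it assumes annotations arrive as tab-separated rows in GAF 2.x column order (symbol in column 3, GO ID in column 5, evidence code in column 7). The helper `make_row`, the function `experimental_annotations`, and all gene identifiers, symbols, references, and annotations in `TOY_GAF` are invented for the example.

```python
# Experimental codes named in the text. IGI is optional because the
# interpretation of genetic interactions can require assumptions.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IEP"}


def make_row(obj_id, symbol, go_id, evidence, aspect):
    """Build one toy annotation row with 17 tab-separated GAF 2.x columns."""
    return "\t".join([
        "DB", obj_id, symbol, "", go_id, "REF:0000000", evidence, "",
        aspect, "", "", "protein", "taxon:0000", "20200101", "DB", "", "",
    ])


# Invented annotations: none of these rows is taken from a real GOA file.
TOY_GAF = "\n".join([
    "!gaf-version: 2.2",
    make_row("X0001", "geneA", "GO:0003677", "IDA", "F"),
    make_row("X0001", "geneA", "GO:0005634", "IEA", "C"),
    make_row("X0002", "geneB", "GO:0003677", "ISS", "F"),
    make_row("X0002", "geneB", "GO:0006260", "IGI", "P"),
    make_row("X0003", "geneC", "GO:0006260", "IMP", "P"),
])


def experimental_annotations(gaf_text, include_igi=False):
    """Return (symbol, GO ID, evidence code) for experimentally supported rows."""
    allowed = EXPERIMENTAL | ({"IGI"} if include_igi else set())
    kept = []
    for line in gaf_text.splitlines():
        if not line or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # skip malformed rows
        symbol, go_id, evidence = fields[2], fields[4], fields[6]
        if evidence in allowed:
            kept.append((symbol, go_id, evidence))
    return kept


strict = experimental_annotations(TOY_GAF)
relaxed = experimental_annotations(TOY_GAF, include_igi=True)
print(strict)
print(len(relaxed))
```

With the strict setting, the IEA and ISS rows are dropped along with the IGI row; passing `include_igi=True` keeps the IGI row as well.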
For many projects, the simplest approach will be to download the ontology file itself. It is a well-constructed, easily parseable file that is well suited for computation.
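To give a sense of how easily such a file can be parsed, here is a minimal Python sketch that reads `[Term]` stanzas from text in OBO flat-file layout and collects the ancestors of a term. It assumes the downloaded file is in OBO format; the excerpt in `TOY_OBO` is hand-written and its `is_a` chain is simplified for illustration, not copied from a GO release. The function names `parse_obo_terms` and `ancestors` are invented for this example, and only `is_a` links are followed (other relationship types are ignored).

```python
TOY_OBO = """\
format-version: 1.2

[Term]
id: GO:0003674
name: molecular_function
namespace: molecular_function

[Term]
id: GO:0005488
name: binding
namespace: molecular_function
is_a: GO:0003674 ! molecular_function

[Term]
id: GO:0003676
name: nucleic acid binding
namespace: molecular_function
is_a: GO:0005488 ! binding

[Term]
id: GO:0003677
name: DNA binding
namespace: molecular_function
is_a: GO:0003676 ! nucleic acid binding

[Typedef]
id: part_of
name: part of
"""


def parse_obo_terms(text):
    """Return {term id: {"name", "namespace", "is_a": [parent ids]}}."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {"is_a": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # file header, blank line, or non-Term stanza
        key, _, value = line.partition(":")
        value = value.strip()
        if key == "id":
            terms[value] = current
        elif key == "is_a":
            # Drop the trailing "! parent name" comment.
            current["is_a"].append(value.split("!")[0].strip())
        elif key in ("name", "namespace"):
            current[key] = value
    return terms


def ancestors(terms, go_id):
    """Collect all terms reachable from go_id via is_a links."""
    seen = set()
    stack = list(terms[go_id]["is_a"])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(terms.get(parent, {"is_a": []})["is_a"])
    return seen


terms = parse_obo_terms(TOY_OBO)
print(sorted(ancestors(terms, "GO:0003677")))
```

For real work, an established parser (for example from one of the Bioconductor packages) is likely the safer choice; the sketch only shows why the format lends itself to computation.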
Bioconductor has a large number of packages that supply and analyze GO and GOA data.
AmiGO 2 is a convenient online GO browser developed by the Gene Ontology consortium and hosted on their website.
Task…
Enter Mbp1 into the search box to initiate a search for the yeast Mbp1 transcription factor (as gene or protein name).
GO annotations for a protein are called associations.
Task…
GO is large and very detailed. The need for somewhat higher-level descriptions in model organisms is met by the GO slim datasets that are curated by some of the main model-organism databases and consortia. Follow the link and read more about GO slims (short).
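The idea behind a slim can be sketched as a mapping from a detailed term to its most specific ancestor(s) within a small, curated subset. The Python sketch below is a simplified illustration of that idea, not the procedure used by the official GO tools; the term identifiers in `PARENTS`, the subset `SLIM`, and the function names are all invented for the example.

```python
# Toy hierarchy: child -> list of parents. All identifiers are invented.
PARENTS = {
    "TOY:root": [],
    "TOY:binding": ["TOY:root"],
    "TOY:catalysis": ["TOY:root"],
    "TOY:nucleic_acid_binding": ["TOY:binding"],
    "TOY:dna_binding": ["TOY:nucleic_acid_binding"],
    "TOY:kinase": ["TOY:catalysis"],
}

# Hypothetical slim: a small set of high-level terms.
SLIM = {"TOY:root", "TOY:binding", "TOY:catalysis"}


def with_ancestors(term):
    """Return the term itself plus every term reachable via parent links."""
    found = {term}
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def map_to_slim(term):
    """Return the most specific slim terms at or above the given term."""
    candidates = with_ancestors(term) & SLIM
    # A candidate is redundant if it is a proper ancestor of another candidate.
    redundant = set()
    for candidate in candidates:
        redundant |= with_ancestors(candidate) - {candidate}
    return candidates - redundant


print(map_to_slim("TOY:dna_binding"))
print(map_to_slim("TOY:kinase"))
```

Dropping redundant ancestors keeps the summary as specific as the slim allows: a DNA-binding term is reported under the slim's binding term rather than under the root.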
The Gene Ontology Consortium. (2017). “Expansion of the Gene Ontology knowledgebase and resources”. Nucleic Acids Research 45(D1):D331–D338. [PMID: 27899567] [DOI: 10.1093/nar/gkw1108]
Gene Ontology Consortium. (2012). “The Gene Ontology: enhancements for 2011”. Nucleic Acids Research 40(Database issue):D559–64. [PMID: 22102568] [DOI: 10.1093/nar/gkr1028]
Gene Ontology Consortium. (2010). “The Gene Ontology in 2010: extensions and refinements”. Nucleic Acids Research 38(Database issue):D331–5. [PMID: 19920128] [DOI: 10.1093/nar/gkp1018]
Bastos, Hugo P. et al. (2011). “Application of gene ontology to gene identification”. Methods in Molecular Biology (Clifton, N.J.) 760:141–57. [PMID: 21779995] [DOI: 10.1007/978-1-61779-176-5_9]
du Plessis, Louis, Nives Skunca, and Christophe Dessimoz. (2011). “The what, where, how and why of gene ontology–a primer for bioinformaticians”. Briefings in Bioinformatics 12(6):723–35. [PMID: 21330331] [DOI: 10.1093/bib/bbr002]
Hackenberg, Michael and Rune Matthiesen. (2010). “Algorithms and methods for correlating experimental results with annotation databases”. Methods in Molecular Biology (Clifton, N.J.) 593:315–40. [PMID: 19957156] [DOI: 10.1007/978-1-60327-194-3_15]
Carole Goble on the tension between purists and pragmatists in life-science ontology construction. Plenary talk at SOFG2…
Goble, Carole and Chris Wroe. (2004). “The Montagues and the Capulets”. Comparative and Functional Genomics 5(8):623–32. [PMID: 18629186] [DOI: 10.1002/cfg.442]
Harris, Midori A. (2008). “Developing an ontology”. Methods in Molecular Biology (Clifton, N.J.) 452:111–24. [PMID: 18563371] [DOI: 10.1007/978-1-60327-159-2_5]
Dimmer, Emily et al. (2007). “Methods for gene ontology annotation”. Methods in Molecular Biology (Clifton, N.J.) 406:495–520. [PMID: 18287709] [DOI: 10.1007/978-1-59745-535-0_24]
If in doubt, ask! If anything about this content is not clear to you, do not proceed, but ask for clarification. If you have ideas about how to make this material better, let’s hear them. We are aiming to compile a list of FAQs for all learning units, and your contributions will count towards your participation marks.
Improve this page! If you have questions or comments, please post them on the Quercus Discussion board with a subject line that includes the name of the unit.
[END]