Difference between revisions of "BIN-FUNC-GO"

From "A B C"

Latest revision as of 03:00, 25 September 2020

Gene Ontology

(Ontologies in knowledge engineering, GO and GOA)


 


Abstract:

Introduction to the Gene Ontology (GO) and Gene Ontology Annotations (GOA).


Objectives:
This unit will ...

  • ... introduce GO and its associated data and services.

Outcomes:
After working through this unit you ...

  • ... are familiar with the concept of an ontology and the terms and ontologies of the GO project;
  • ... can search for a gene of interest, identify associations, evaluate the term graph, find relevant ancestor nodes in the inferred tree, and discover proteins with related function.

Deliverables:

  • Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
  • Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
  • Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.

Prerequisites:
This unit builds on material covered in the following prerequisite units:

  • BIN-FUNC-Databases (Molecular Function Databases)

    Evaluation

    Evaluation: NA

    This unit is not evaluated for course marks.

    Contents

     

    Introduction

     

    The Gene Ontology project is the most influential contributor to the definition of function in computational biology, and the use of GO terms and GO annotations is ubiquitous.

     

    Task:

    • Browse through the paper describing the 2019 update on the GO database and tools.
    The Gene Ontology Consortium (2019) The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Res 47:D330-D338. (pmid: 30395331)



    Remember the three separate component ontologies of GO:

    • Molecular function
    • Biological process
    • Cellular component
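In annotation files these three ontologies are usually identified by short codes rather than by name. As a minimal sketch, assuming annotations taken from a file in GAF format (where column 9, "Aspect", holds F, P or C) and using the namespace names of the GO ontology file, the mapping and a simple grouping by ontology could look like this. The GO IDs in the example are illustrative and do not describe any particular gene:

```python
# GAF files store the ontology ("aspect") of each annotation as one letter;
# the GO ontology file uses these namespace names for the same three ontologies.
ASPECTS = {
    "F": "molecular_function",
    "P": "biological_process",
    "C": "cellular_component",
}


def group_by_aspect(annotations):
    """Group (go_id, aspect) pairs into lists keyed by ontology namespace."""
    groups = {namespace: [] for namespace in ASPECTS.values()}
    for go_id, aspect in annotations:
        groups[ASPECTS[aspect]].append(go_id)
    return groups


# Illustrative pairs only, not the annotations of any particular gene.
example = [("GO:0071931", "P"), ("GO:0003677", "F"), ("GO:0005634", "C")]
print(group_by_aspect(example))
```

The same three letters reappear in AmiGO as the "Ontology (aspect)" filter.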


     

    GO evidence codes

     

    Annotations can be based on published experimental data or on computational inference. To judge how much we should trust an annotation, it is important to know how the curator justified it, and GO uses evidence codes to make this process transparent. When computing with the ontology, we may want to filter out (exclude) annotations with particular evidence codes to avoid circular reasoning. For example, if we want to infer functional relationships between homologous genes, we should exclude annotations that were themselves derived by the same or a similar inference, and compute only with the actual experimental data.

    The following evidence codes are in current use. If you want to exclude inferred annotations, restrict yourself to the experimental evidence codes: EXP, IDA, IPI, IMP, IEP, and perhaps IGI, although the interpretation of genetic interactions can require assumptions. The codes are ubiquitous and important: you need to know what they mean and imply when working with GOA data.

    Automatically-assigned Evidence Codes
    • IEA: Inferred from Electronic Annotation
    Curator-assigned Evidence Codes
    • Experimental Evidence Codes
      • EXP: Inferred from Experiment
      • IDA: Inferred from Direct Assay
      • IPI: Inferred from Physical Interaction
      • IMP: Inferred from Mutant Phenotype
      • IEP: Inferred from Expression Pattern
      • IGI: Inferred from Genetic Interaction
    • Computational Analysis Evidence Codes
      • ISS: Inferred from Sequence or Structural Similarity
      • ISO: Inferred from Sequence Orthology
      • ISA: Inferred from Sequence Alignment
      • ISM: Inferred from Sequence Model
      • IGC: Inferred from Genomic Context
      • IBA: Inferred from Biological aspect of Ancestor
      • IBD: Inferred from Biological aspect of Descendant
      • IKR: Inferred from Key Residues
      • IRD: Inferred from Rapid Divergence
      • RCA: Inferred from Reviewed Computational Analysis
    • Author Statement Evidence Codes
      • TAS: Traceable Author Statement
      • NAS: Non-traceable Author Statement
    • Curator Statement Evidence Codes
      • IC: Inferred by Curator
      • ND: No biological Data available
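Restricting annotations to experimental evidence amounts to keeping only the codes on an allow-list. A minimal sketch in Python, assuming each annotation is a dict with an "evidence" key (a hypothetical record layout chosen for illustration, not a GO file format), with IGI as an opt-in because of the caveat about genetic interactions:

```python
# Experimental evidence codes from the list above. IGI is optional because
# interpreting genetic interactions can require additional assumptions.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IEP"}


def experimental_only(annotations, include_igi=False):
    """Keep annotations whose "evidence" value is an experimental code.

    Everything else (IEA, ISS and the other computational codes, TAS, NAS,
    IC, ND) is dropped, e.g. to avoid circular reasoning when transferring
    function between homologs.
    """
    allowed = (EXPERIMENTAL | {"IGI"}) if include_igi else EXPERIMENTAL
    return [a for a in annotations if a["evidence"] in allowed]


# Hypothetical annotation records for one gene (made up for illustration).
records = [
    {"term": "A", "evidence": "IDA"},
    {"term": "B", "evidence": "IEA"},
    {"term": "C", "evidence": "ISS"},
    {"term": "D", "evidence": "IGI"},
]
print([a["term"] for a in experimental_only(records)])                    # ['A']
print([a["term"] for a in experimental_only(records, include_igi=True)])  # ['A', 'D']
```

An allow-list is used rather than a block-list so that any code not explicitly recognized as experimental is excluded by default.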

    For further details, see the Guide to GO Evidence Codes.


     

    GO tools

    For many projects, the simplest approach will be to download the GO ontology itself. It is a well-constructed, easily parsed file that is well suited for computation.

    Bioconductor has a large number of packages that supply and analyze GO and GOA data.
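To illustrate why the downloaded ontology file is easy to compute with, here is a minimal sketch of reading its OBO text format in Python. It assumes the usual OBO layout of [Term] stanzas with id, name, namespace, is_a and relationship tags, and keeps only is_a and part_of links. The sample stanzas use invented placeholder IDs rather than real GO terms, and for real work an established OBO parser is the safer choice:

```python
def parse_obo_terms(text):
    """Minimal reader for [Term] stanzas of an OBO file.

    Returns {term_id: {"name": ..., "namespace": ..., "parents": [(rel, id), ...]}},
    keeping only is_a and part_of links. Other stanzas and tags are ignored.
    """
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are recorded.
            current = {"parents": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue
        tag, _, value = line.partition(":")
        value = value.strip()
        if tag == "id":
            terms[value] = current
        elif tag in ("name", "namespace"):
            current[tag] = value
        elif tag == "is_a":
            # Drop the trailing "! term name" comment.
            current["parents"].append(("is_a", value.split("!")[0].strip()))
        elif tag == "relationship":
            rel, _, target = value.split("!")[0].strip().partition(" ")
            if rel == "part_of":
                current["parents"].append(("part_of", target.strip()))
    return terms


# A tiny, made-up excerpt in OBO syntax (placeholder IDs, not real GO terms).
SAMPLE = """format-version: 1.2

[Term]
id: EX:0000001
name: example parent process
namespace: biological_process

[Term]
id: EX:0000002
name: example child process
namespace: biological_process
is_a: EX:0000001 ! example parent process

[Term]
id: EX:0000003
name: example part of the child process
namespace: biological_process
relationship: part_of EX:0000002 ! example child process
"""

for term_id, term in parse_obo_terms(SAMPLE).items():
    print(term_id, term["name"], term["parents"])
```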


     

    AmiGO

     

    AmiGO 2 is a convenient online GO browser developed by the Gene Ontology Consortium and hosted on its website.


     

    AmiGO - Gene products

     

    Task:

    1. Navigate to the GO homepage.
    2. Enter Mbp1 into the search box to initiate a search for the yeast Mbp1 transcription factor (as gene or protein name).
    3. There are three categories of hits: Ontology terms directly associated with the search string, Genes and gene products annotated to terms in GOA, and Annotations of terms to any of the genes. As usual, we need to be wary of keyword searches since they rarely identify a unique gene, so we check the Genes... category first. Follow the link.
    4. In the resulting table, you can easily identify the correct gene. Follow its link to the associated Gene Information page. Study the information on that page.
    5. Note that this page lists Associations, i.e. GO terms that have been associated with Mbp1 in GOA.
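The associations AmiGO shows come from GOA annotation files, which can also be downloaded and searched directly. As a minimal sketch, assuming a tab-separated file in GAF 2.x layout (symbol in column 3, GO ID in column 5, evidence code in column 7, aspect in column 9), a lookup by gene symbol could look like this. The rows and gene names below are made up for illustration:

```python
# 0-based positions of the GAF 2.x columns used here.
SYMBOL, GO_ID, EVIDENCE, ASPECT = 2, 4, 6, 8


def associations_for(gaf_text, symbol):
    """Return (GO ID, evidence code, aspect) for each row whose symbol matches."""
    hits = []
    for line in gaf_text.splitlines():
        if not line or line.startswith("!"):
            continue  # skip blank lines and "!" header lines
        cols = line.split("\t")
        if cols[SYMBOL].upper() == symbol.upper():
            hits.append((cols[GO_ID], cols[EVIDENCE], cols[ASPECT]))
    return hits


def toy_row(symbol, go_id, evidence, aspect):
    """Build a made-up 17-column GAF-style row; only four columns matter here."""
    cols = ["DB", "ID0001", symbol, "", go_id, "PMID:0", evidence, "",
            aspect, "", "", "protein", "taxon:0", "20200101", "DB", "", ""]
    return "\t".join(cols)


GAF = "\n".join([
    "!gaf-version: 2.2",
    toy_row("GENE_A", "GO:0071931", "IDA", "P"),
    toy_row("GENE_B", "GO:0071931", "IDA", "P"),
    toy_row("GENE_A", "GO:0003677", "IEA", "F"),
])
print(associations_for(GAF, "gene_a"))
```

A symbol match is still a keyword match. In real data, filtering on the database identifier in column 2 is the more reliable way to pin down a unique gene.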


     

    AmiGO - Associations

     

    GO annotations for a protein are called associations.

    Task:

    1. Use the Results count selector and increase the number of annotations to show all Gene Product Associations on the page. Note the evidence codes.
    2. Note that you can expand the left-hand menu for detailed filtering. Click on Ontology (aspect) to show or hide the terms for the three component ontologies, the GO "aspects" F, C, and P. (What were these again? This is one thing you must remember.)
    3. The most specific annotation on the page seems to be "positive regulation of transcription involved in G1/S transition of mitotic cell cycle". Follow the link.
    4. Note that you can now filter for organisms. Restrict the organism to Saccharomyces cerevisiae S288C by clicking on the green (+) sign. Note that you now see all yeast genes that are annotated to this term! This is an effective way to build system membership information from the bottom up.
    5. There are a number of tabs available for different views on the data: Annotations, Graph Views, Inferred Tree View, Neighborhood, and Mappings. Visit them.
      1. The link to QuickGO from the Graph Views tab gives you the entire ancestor chart of the term, with clickable term nodes. You need to consider the ancestor terms to expand searches for related, collaborating genes. For example, if a gene is annotated to a "positive regulation ..." term, you will also need to consider genes associated with the cognate "negative regulation ..." or plain "regulation ..." terms to get a complete picture of the gene's activities.
      2. Neighborhood refers to the ancestors and children of a term.
    6. Study the information available on that page and through its tabs, especially the graph view.
    7. Navigate to the Inferred Tree View tab. Note that terms are labelled with icons that signify the category of the relationship: P: "part-of", I: "is-a", and R: "regulates". Find the ancestor node two steps up, "GO:0000082 G1/S transition of mitotic cell cycle", of which GO:0071931 is a part. Follow the link.
    8. On the Annotations tab of GO:0000082, filter the list to S. cerevisiae genes. At the time of writing (September 2020), there were 143 annotated genes. Are these genes specifically annotated to that term, or does the list include genes that are annotated to descendants of the term?
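The last question concerns annotation propagation. Under GO's "true path rule", a gene annotated to a term is implicitly annotated to all of that term's ancestors. As a minimal sketch, assuming a hypothetical child-to-parents graph that follows is_a and part_of links only, the following compares the genes annotated directly to a term with those reached through its descendants:

```python
def ancestors(term, parents):
    """Every term reachable from `term` via parent links (the term itself excluded)."""
    seen, stack = set(), list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def genes_for_term(term, annotations, parents):
    """Return (direct, propagated) gene sets for `term`.

    `annotations` maps gene -> list of directly annotated terms. A gene counts
    as propagated if it is annotated to `term` or to any descendant of it.
    """
    direct, propagated = set(), set()
    for gene, terms in annotations.items():
        for t in terms:
            if t == term:
                direct.add(gene)
                propagated.add(gene)
            elif term in ancestors(t, parents):
                propagated.add(gene)
    return direct, propagated


# Hypothetical child -> parents graph (is_a / part_of edges only) and annotations.
PARENTS = {"T3": ["T2"], "T2": ["T1"], "T4": ["T1"]}
ANNOTATIONS = {"geneX": ["T1"], "geneY": ["T2"], "geneZ": ["T3"], "geneW": ["T4"]}

direct, propagated = genes_for_term("T2", ANNOTATIONS, PARENTS)
print(sorted(direct), sorted(propagated))  # ['geneY'] ['geneY', 'geneZ']
```

In this toy graph, geneZ is annotated only to T3 but still appears under T2 once annotations are propagated. geneW, which sits on a different branch, does not.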


     

    GO Slims

     

    GO is large and very detailed. The need for somewhat more high-level descriptions in model organisms is met by the GO slim datasets that are curated by some of the main model-organism databases and consortia. Follow the link and read more about GO slims (short).
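Mapping detailed annotations onto a slim means walking up the graph from each annotated term and collecting the slim terms encountered along the way. The following is a minimal sketch, assuming a hypothetical child-to-parents graph, slim and annotation set. It counts a gene under every slim term it reaches; dedicated mapping tools may apply different rules:

```python
def ancestors(term, parents):
    """Every term reachable from `term` by following parent links."""
    seen, stack = set(), list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def map_to_slim(term, slim, parents):
    """Slim terms covering `term`: the term itself if it is in the slim,
    plus all of its ancestors that are in the slim."""
    return ({term} | ancestors(term, parents)) & slim


def slim_counts(annotations, slim, parents):
    """Count genes per slim term; `annotations` maps gene -> list of terms."""
    genes = {s: set() for s in slim}
    for gene, terms in annotations.items():
        for t in terms:
            for s in map_to_slim(t, slim, parents):
                genes[s].add(gene)
    return {s: len(g) for s, g in sorted(genes.items())}


# Hypothetical graph (child -> parents), slim and annotations.
PARENTS = {"specific_1": ["mid_1"], "mid_1": ["broad_1"], "specific_2": ["broad_2"]}
SLIM = {"broad_1", "broad_2", "mid_1"}
ANNOTATIONS = {"g1": ["specific_1"], "g2": ["specific_2"], "g3": ["mid_1"]}
print(slim_counts(ANNOTATIONS, SLIM, PARENTS))  # {'broad_1': 2, 'broad_2': 1, 'mid_1': 2}
```

Because each gene is collected in a set per slim term, a gene annotated to several descendants of the same slim term is still counted once.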


     

    Further reading, links and resources

    The Gene Ontology Consortium (2017) Expansion of the Gene Ontology knowledgebase and resources. Nucleic Acids Res 45:D331-D338. (pmid: 27899567)

    Gene Ontology Consortium (2012) The Gene Ontology: enhancements for 2011. Nucleic Acids Res 40:D559-64. (pmid: 22102568)

    Gene Ontology Consortium (2010) The Gene Ontology in 2010: extensions and refinements. Nucleic Acids Res 38:D331-5. (pmid: 19920128)

    Bastos et al. (2011) Application of gene ontology to gene identification. Methods Mol Biol 760:141-57. (pmid: 21779995)

    du Plessis et al. (2011) The what, where, how and why of gene ontology--a primer for bioinformaticians. Brief Bioinformatics 12:723-35. (pmid: 21330331)

    Hackenberg & Matthiesen (2010) Algorithms and methods for correlating experimental results with annotation databases. Methods Mol Biol 593:315-40. (pmid: 19957156)

    Carol Goble on the tension between purists and pragmatists in life-science ontology construction. Plenary talk at SOFG2...

    Goble & Wroe (2004) The Montagues and the Capulets. Comp Funct Genomics 5:623-32. (pmid: 18629186)

    Harris (2008) Developing an ontology. Methods Mol Biol 452:111-24. (pmid: 18563371)

    Dimmer et al. (2007) Methods for gene ontology annotation. Methods Mol Biol 406:495-520. (pmid: 18287709)


    Notes


     


    About ...
     
    Author:

    Boris Steipe <boris.steipe@utoronto.ca>

    Created:

    2017-08-05

    Modified:

    2020-09-24

    Version:

    1.1

    Version history:

    • 1.1 2020 Updates
    • 1.0 First live
    • 0.1 First stub

    This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.