Difference between revisions of "BIN-FUNC-GO"
m |
m |
||
Line 29: | Line 29: | ||
<section begin=abstract /> | <section begin=abstract /> | ||
<!-- included from "../components/BIN-FUNC-GO.components.wtxt", section: "abstract" --> | <!-- included from "../components/BIN-FUNC-GO.components.wtxt", section: "abstract" --> | ||
− | + | Introduction to the Gene Ontology (GO) and Gene Ontology Annotations (GOA). | |
<section end=abstract /> | <section end=abstract /> | ||
Line 47: | Line 47: | ||
=== Objectives === | === Objectives === | ||
<!-- included from "../components/BIN-FUNC-GO.components.wtxt", section: "objectives" --> | <!-- included from "../components/BIN-FUNC-GO.components.wtxt", section: "objectives" --> | ||
− | ... | + | This unit will ... |
+ | * ... introduce GO and associated data and services; | ||
{{Vspace}} | {{Vspace}} | ||
Line 54: | Line 55: | ||
=== Outcomes === | === Outcomes === | ||
<!-- included from "../components/BIN-FUNC-GO.components.wtxt", section: "outcomes" --> | <!-- included from "../components/BIN-FUNC-GO.components.wtxt", section: "outcomes" --> | ||
− | ... | + | After working through this unit you ... |
+ | * ... are familiar with the concept of an ontology and the terms and ontologies of the GO project; | ||
+ | * ... can search for a gene of interest, identify associations, evaluate the term graph, find relevant ancestor nodes in the inferred tree, and discover proteins with related function. | ||
{{Vspace}} | {{Vspace}} | ||
Line 85: | Line 88: | ||
<!-- included from "../components/BIN-FUNC-GO.components.wtxt", section: "contents" --> | <!-- included from "../components/BIN-FUNC-GO.components.wtxt", section: "contents" --> | ||
+ | {{Vspace}} | ||
+ | ==Introduction== | ||
+ | {{Smallvspace}} | ||
+ | The Gene Ontology project is the most influential contributor to the definition of function in computational biology and the use of GO terms and GO annotations is ubiquitous. | ||
+ | {{Smallvspace}} | ||
{{Task|1= | {{Task|1= | ||
*Read the introductory notes on {{ABC-PDF|BIN-FUNC-GO|the Gene Ontology project to define and annotate gene function}}. | *Read the introductory notes on {{ABC-PDF|BIN-FUNC-GO|the Gene Ontology project to define and annotate gene function}}. | ||
− | |||
− | |||
− | + | * Browse through the paper describing the 2017 update on the GO database and tools; in particular take note of the LEGO initiative that aims to build systems and pathway models by combining suitable GO terms. | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
{{#pmid: 27899567}} | {{#pmid: 27899567}} | ||
+ | }} | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | Remember the three separate component ontologies of GO: | |
− | + | * '''Molecular function''' | |
− | + | * '''Biological Process''' | |
+ | * '''Cellular component''' | ||
+ | {{Vspace}} | ||
− | + | ==GO evidence codes== | |
− | + | {{Smallvspace}} | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | {{ | ||
− | |||
− | |||
− | |||
Annotations can be made according to literature data or computational inference and it is important to note how an annotation has been justified by the curator to evaluate the level of trust we should have in the annotation. GO uses evidence codes to make this process transparent. When computing with the ontology, we may want to filter (exclude) particular terms in order to avoid tautologies: for example if we were to infer functional relationships between homologous genes, we should exclude annotations that have been based on the same inference or similar, and compute only with the actual experimental data. | Annotations can be made according to literature data or computational inference and it is important to note how an annotation has been justified by the curator to evaluate the level of trust we should have in the annotation. GO uses evidence codes to make this process transparent. When computing with the ontology, we may want to filter (exclude) particular terms in order to avoid tautologies: for example if we were to infer functional relationships between homologous genes, we should exclude annotations that have been based on the same inference or similar, and compute only with the actual experimental data. | ||
− | The following evidence codes are in current use; if you want to exclude inferred annotations you would restrict the codes you use to the ones shown in bold: EXP, IDA, IPI, IMP, IEP, and perhaps IGI, although the interpretation of genetic interactions can require assumptions. | + | The following evidence codes are in current use; if you want to exclude inferred annotations you would restrict the codes you use to the ones shown in bold below: EXP, IDA, IPI, IMP, IEP, and perhaps IGI, although the interpretation of genetic interactions can require assumptions. The codes are '''ubiquitous and important'''; you need to know what they mean and imply when working with GOA data. |
;Automatically-assigned Evidence Codes | ;Automatically-assigned Evidence Codes | ||
*IEA: Inferred from Electronic Annotation | *IEA: Inferred from Electronic Annotation | ||
+ | |||
;Curator-assigned Evidence Codes | ;Curator-assigned Evidence Codes | ||
*'''Experimental Evidence Codes''' | *'''Experimental Evidence Codes''' | ||
− | **EXP: Inferred from Experiment | + | **<b>EXP: Inferred from Experiment |
**IDA: Inferred from Direct Assay | **IDA: Inferred from Direct Assay | ||
**IPI: Inferred from Physical Interaction | **IPI: Inferred from Physical Interaction | ||
**IMP: Inferred from Mutant Phenotype | **IMP: Inferred from Mutant Phenotype | ||
− | ** | + | **IEP: Inferred from Expression Pattern |
− | ** | + | **IGI: Inferred from Genetic Interaction</b> |
+ | |||
*'''Computational Analysis Evidence Codes''' | *'''Computational Analysis Evidence Codes''' | ||
**ISS: Inferred from Sequence or Structural Similarity | **ISS: Inferred from Sequence or Structural Similarity | ||
Line 166: | Line 140: | ||
**IRD: Inferred from Rapid Divergence | **IRD: Inferred from Rapid Divergence | ||
**RCA: Inferred from Reviewed Computational Analysis | **RCA: Inferred from Reviewed Computational Analysis | ||
+ | |||
*'''Author Statement Evidence Codes''' | *'''Author Statement Evidence Codes''' | ||
**TAS: Traceable Author Statement | **TAS: Traceable Author Statement | ||
**NAS: Non-traceable Author Statement | **NAS: Non-traceable Author Statement | ||
+ | |||
*'''Curator Statement Evidence Codes''' | *'''Curator Statement Evidence Codes''' | ||
**IC: Inferred by Curator | **IC: Inferred by Curator | ||
Line 178: | Line 154: | ||
{{Vspace}} | {{Vspace}} | ||
− | + | ==GO tools== | |
− | |||
− | |||
+ | For many projects, the simplest approach will be to download the GO ontology itself. It is a well-constructed, easily parseable file that is well suited for computation. Bioconductor has a [https://www.bioconductor.org/packages/release/BiocViews.html#___GO large number of packages] that supply and analyze GO and GOA data. | ||
+ | {{Vspace}} | ||
+ | ===AmiGO=== | ||
+ | {{Smallvspace}} | ||
+ | [http://amigo.geneontology.org/cgi-bin/amigo/go.cgi '''AmiGO'''] is a convenient online [http://www.geneontology.org/ '''GO'''] browser developed by the Gene Ontology consortium and hosted on their website. | ||
− | + | {{Vspace}} | |
− | |||
− | |||
− | |||
====AmiGO - Gene products==== | ====AmiGO - Gene products==== | ||
+ | {{Smallvspace}} | ||
{{task|1= | {{task|1= | ||
# Navigate to the [http://www.geneontology.org/ '''GO'''] homepage. | # Navigate to the [http://www.geneontology.org/ '''GO'''] homepage. | ||
− | # Enter <code> | + | # Enter <code>Mbp1</code> into the search box to initiate a search for the yeast Mbp1 transcription factor (as ''gene or protein name''). |
− | # There are a | + | # There are three categories of hits - '''Ontology''' terms directly associated with the search string, '''Genes and gene products''' annotated to terms in GOA, and '''Annotations''' of terms to any of the genes. As usual, we need to be wary of keyword searches since they rarely identify a unique gene, so we check the [http://amigo.geneontology.org/amigo/search/bioentity?q=Mbp1 '''Genes...'''] category first. Follow the link. |
− | + | # From the resulting table you can easily identify the correct gene. Follow [http://amigo.geneontology.org/amigo/gene_product/SGD:S000002214 its link to the associated '''Gene Information''' page]. Study the information on that page. | |
− | + | # Note that this page lists '''Associations''' - i.e. GO terms that have been associated with Mbp1 in GOA. | |
− | |||
}} | }} | ||
− | + | {{Vspace}} | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
====AmiGO - Associations==== | ====AmiGO - Associations==== | ||
+ | {{Smallvspace}} | ||
GO annotations for a protein are called ''associations''. | GO annotations for a protein are called ''associations''. | ||
{{task|1= | {{task|1= | ||
− | # | + | # Expand the result count to show all '''Associations''' on the page. Note the evidence codes. Also note that there are two associations referring to a negative experiment, with the red annotation qualifier <span style="border:solid 1px #FF0000; color:#FF0000;"><b>NOT</b></span>. This means an experiment has demonstrated the gene '''not''' to have that activity. |
− | # Note that you can | + | # Note that you can expand the left-hand menu for detailed filtering. Click on '''Ontology (aspect)''' to show or hide the terms for the three different component ontologies - the GO "aspects": F, C, and P (what were these again?). |
− | # There are | + | # The most specific annotation on the page seems to be [http://amigo.geneontology.org/amigo/term/GO:0071931 ''"positive regulation of transcription involved in G1/S transition of mitotic cell cycle"'']. Follow the link. |
+ | # Note that you can now filter for organisms. Restrict the organism to ''Saccharomyces cerevisiae S288C'' by clicking on the green (+) sign. Note that you now see all yeast genes that are annotated to this term! This is an effective way to build system membership information from the bottom up. | ||
+ | # There are a number of tabs available for different views on the data: '''Annotations''', '''Graph Views''', '''Inferred Tree View''', '''Neighborhood''', and '''Mappings'''. Visit them. | ||
+ | ## The link to [https://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0071931 QuickGO] from the Graph Views tab gives you the entire ancestor chart of the term, with clickable term nodes. You need to consider the ancestor terms to expand searches for related, collaborating genes. For example, if a gene is annotated to a "positive regulation ..." term, you will need to consider genes associated with the cognate "negative regulation ..." or just "regulation ..." terms as well to get a complete picture of the gene's activities. | ||
+ | ## Neighborhood refers to the ancestors and children of a term. | ||
# Study the information available on that page and through the tabs on the page, especially the graph view. | # Study the information available on that page and through the tabs on the page, especially the graph view. | ||
− | # In the '''Inferred Tree View''' tab, find the | + | # In the '''Inferred Tree View''' tab, find the ancestor node "GO:0000082 G1/S transition of mitotic cell cycle" and follow it. On the associations page, go again to '''Annotations''' and filter the list to ''S. cerevisiae'' genes. As of today there are 137 annotated genes. '''Are these genes specifically annotated to that term, or does the list include genes that are annotated to descendants of the term?''' |
}} | }} | ||
− | |||
{{Vspace}} | {{Vspace}} | ||
==GO Slims== | ==GO Slims== | ||
− | + | {{Smallvspace}} | |
− | http://www.geneontology.org/page/go-slim-and-subset-guide | + | GO is large and very detailed; the need for somewhat more high-level descriptions in model organisms is met by the [http://www.geneontology.org/page/go-slim-and-subset-guide '''GoSlim'''] datasets, which are curated by some of the main model-organism databases and consortia. Follow the link to read more about them. |
{{Vspace}} | {{Vspace}} | ||
Line 238: | Line 208: | ||
== Further reading, links and resources == | == Further reading, links and resources == | ||
+ | |||
{{#pmid: 22102568}} | {{#pmid: 22102568}} | ||
{{#pmid: 21779995}} | {{#pmid: 21779995}} | ||
+ | {{#pmid: 21330331}} | ||
+ | {{#pmid: 19957156}} | ||
{{#pmid: 19920128}} | {{#pmid: 19920128}} | ||
Carol Goble on the tension between purists and pragmatists in life-science ontology construction. Plenary talk at SOFG2... | Carol Goble on the tension between purists and pragmatists in life-science ontology construction. Plenary talk at SOFG2... | ||
{{#pmid: 18629186}} | {{#pmid: 18629186}} | ||
+ | {{#pmid: 18563371}} | ||
+ | {{#pmid: 18287709}} | ||
{{Smallvspace}} | {{Smallvspace}} | ||
+ | {{#pmid: 15808743}} | ||
{{#pmid: 10679470}} | {{#pmid: 10679470}} | ||
− | |||
Line 309: | Line 284: | ||
:2017-08-05 | :2017-08-05 | ||
<b>Modified:</b><br /> | <b>Modified:</b><br /> | ||
− | :2017- | + | :2017-11-12 |
<b>Version:</b><br /> | <b>Version:</b><br /> | ||
− | :0 | + | :1.0 |
<b>Version history:</b><br /> | <b>Version history:</b><br /> | ||
+ | *1.0 First live | ||
*0.1 First stub | *0.1 First stub | ||
</div> | </div> |
Revision as of 21:54, 12 November 2017
Gene Ontology
Keywords: Ontologies in knowledge engineering, GO and GOA
This unit is under development. There is some content here, but it is incomplete and/or may change significantly: links may lead nowhere, the content is likely to be rearranged, and objectives, deliverables etc. may be incomplete or missing. Do not work with this material until it is updated to "live" status.
Abstract
Introduction to the Gene Ontology (GO) and Gene Ontology Annotations (GOA).
This unit ...
Prerequisites
You need to complete the following units before beginning this one:
Objectives
This unit will ...
- ... introduce GO and associated data and services;
Outcomes
After working through this unit you ...
- ... are familiar with the concept of an ontology and the terms and ontologies of the GO project;
- ... can search for a gene of interest, identify associations, evaluate the term graph, find relevant ancestor nodes in the inferred tree, and discover proteins with related function.
Deliverables
- Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
- Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
- Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.
Evaluation
Evaluation: NA
- This unit is not evaluated for course marks.
Contents
Introduction
The Gene Ontology project is the most influential contributor to the definition of function in computational biology, and the use of GO terms and GO annotations is ubiquitous.
Task:
- Read the introductory notes on the Gene Ontology project to define and annotate gene function.
- Browse through the paper describing the 2017 update on the GO database and tools; in particular take note of the LEGO initiative that aims to build systems and pathway models by combining suitable GO terms.
The Gene Ontology Consortium (2017) Expansion of the Gene Ontology knowledgebase and resources. Nucleic Acids Res 45:D331-D338. (pmid: 27899567)
Remember the three separate component ontologies of GO:
- Molecular function
- Biological process
- Cellular component
GO evidence codes
Annotations can be made according to literature data or computational inference and it is important to note how an annotation has been justified by the curator to evaluate the level of trust we should have in the annotation. GO uses evidence codes to make this process transparent. When computing with the ontology, we may want to filter (exclude) particular terms in order to avoid tautologies: for example if we were to infer functional relationships between homologous genes, we should exclude annotations that have been based on the same inference or similar, and compute only with the actual experimental data.
The following evidence codes are in current use; if you want to exclude inferred annotations you would restrict the codes you use to the experimental evidence codes listed below: EXP, IDA, IPI, IMP, IEP, and perhaps IGI, although the interpretation of genetic interactions can require assumptions. The codes are ubiquitous and important; you need to know what they mean and imply when working with GOA data.
- Automatically-assigned Evidence Codes
  - IEA: Inferred from Electronic Annotation
- Curator-assigned Evidence Codes
  - Experimental Evidence Codes
    - EXP: Inferred from Experiment
    - IDA: Inferred from Direct Assay
    - IPI: Inferred from Physical Interaction
    - IMP: Inferred from Mutant Phenotype
    - IEP: Inferred from Expression Pattern
    - IGI: Inferred from Genetic Interaction
  - Computational Analysis Evidence Codes
    - ISS: Inferred from Sequence or Structural Similarity
    - ISO: Inferred from Sequence Orthology
    - ISA: Inferred from Sequence Alignment
    - ISM: Inferred from Sequence Model
    - IGC: Inferred from Genomic Context
    - IBA: Inferred from Biological aspect of Ancestor
    - IBD: Inferred from Biological aspect of Descendant
    - IKR: Inferred from Key Residues
    - IRD: Inferred from Rapid Divergence
    - RCA: Inferred from Reviewed Computational Analysis
  - Author Statement Evidence Codes
    - TAS: Traceable Author Statement
    - NAS: Non-traceable Author Statement
  - Curator Statement Evidence Codes
    - IC: Inferred by Curator
    - ND: No biological Data available
For further details, see the Guide to GO Evidence Codes and the GO Evidence Code Decision Tree.
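The filtering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes annotations arrive as GAF-style tab-separated rows (gene symbol in column 3, qualifier in column 4, GO ID in column 5, evidence code in column 7, aspect in column 9), and the function name `filter_experimental` and all rows in `toy_rows` are invented for this example; they are not real GOA records.

```python
# Experimental evidence codes named in the text above. IGI is optional because
# interpreting genetic interactions can require assumptions.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IEP"}


def filter_experimental(gaf_lines, include_igi=False):
    """Keep only annotations whose evidence code is experimental.

    Assumes GAF-style rows: tab-separated, '!' starts a comment line,
    symbol in column 3, qualifier in column 4, GO ID in column 5,
    evidence code in column 7 and aspect in column 9.
    """
    allowed = set(EXPERIMENTAL_CODES)
    if include_igi:
        allowed.add("IGI")
    kept = []
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete annotation row
        symbol, qualifier, go_id = fields[2], fields[3], fields[4]
        evidence, aspect = fields[6], fields[8]
        if evidence in allowed:
            kept.append((symbol, go_id, evidence, aspect, qualifier))
    return kept


# Toy rows, invented for illustration only (not real GOA records).
toy_rows = [
    "!gaf-version: 2.1",
    "DB\tID1\tgeneA\t\tGO:0071931\tREF:1\tIDA\t\tP",
    "DB\tID2\tgeneB\t\tGO:0071931\tREF:2\tIEA\t\tP",
    "DB\tID3\tgeneC\t\tGO:0000082\tREF:3\tIGI\t\tP",
    "DB\tID4\tgeneD\tNOT\tGO:0000082\tREF:4\tIMP\t\tP",
]

strict = filter_experimental(toy_rows)
relaxed = filter_experimental(toy_rows, include_igi=True)
print([row[0] for row in strict])
print([row[0] for row in relaxed])
```

Note that the sketch keeps the qualifier alongside each row: an experimentally supported annotation may still carry a NOT qualifier, and whether to keep or drop such rows is a separate decision from the evidence-code filter.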
GO tools
For many projects, the simplest approach will be to download the GO ontology itself. It is a well-constructed, easily parseable file that is well suited for computation. Bioconductor has a large number of packages that supply and analyze GO and GOA data.
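To give an impression of how little code such parsing can take, here is a minimal Python sketch. It assumes the downloaded ontology is in the OBO flat-file format, in which each term is a `[Term]` stanza of `key: value` lines; the text above does not name the format, so treat this as an assumption. The function `parse_obo_terms` and the `TOY:` identifiers are hypothetical and only serve the example.

```python
def parse_obo_terms(text):
    """Return {term_id: {"name": str, "namespace": str, "is_a": [ids]}}."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new stanza begins; only [Term] stanzas are collected.
            if line == "[Term]":
                current = {"name": "", "namespace": "", "is_a": []}
            else:
                current = None
            continue
        if current is None or ": " not in line:
            continue  # header lines, blank lines, or non-term stanzas
        key, _, value = line.partition(": ")
        if key == "id":
            terms[value] = current
        elif key in ("name", "namespace"):
            current[key] = value
        elif key == "is_a":
            # Drop the trailing "! parent name" comment.
            current["is_a"].append(value.split("!")[0].strip())
    return terms


# Toy ontology text with invented identifiers (not real GO terms).
toy_obo = """format-version: 1.2

[Term]
id: TOY:0000001
name: toy root process
namespace: biological_process

[Term]
id: TOY:0000002
name: toy cell cycle transition
namespace: biological_process
is_a: TOY:0000001 ! toy root process

[Typedef]
id: part_of
name: part of
"""

toy_terms = parse_obo_terms(toy_obo)
for term_id, info in toy_terms.items():
    print(term_id, info["name"], info["is_a"])
```

The sketch reads only `id`, `name`, `namespace` and `is_a`; other tags and other relationship types are ignored, so a dedicated parser or one of the Bioconductor packages mentioned above is the better choice for real analyses.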
AmiGO
AmiGO is a convenient online GO browser developed by the Gene Ontology consortium and hosted on their website.
AmiGO - Gene products
Task:
- Navigate to the GO homepage.
- Enter Mbp1 into the search box to initiate a search for the yeast Mbp1 transcription factor (as gene or protein name).
- There are three categories of hits: Ontology terms directly associated with the search string, Genes and gene products annotated to terms in GOA, and Annotations of terms to any of the genes. As usual, we need to be wary of keyword searches since they rarely identify a unique gene, so we check the Genes... category first. Follow the link.
- From the resulting table you can easily identify the correct gene. Follow its link to the associated Gene Information page. Study the information on that page.
- Note that this page lists Associations - i.e. GO terms that have been associated with Mbp1 in GOA.
AmiGO - Associations
GO annotations for a protein are called associations.
Task:
- Expand the result count to show all Associations on the page. Note the evidence codes. Also note that there are two associations referring to a negative experiment, with the red annotation qualifier NOT. This means an experiment has demonstrated the gene not to have that activity.
- Note that you can expand the left-hand menu for detailed filtering. Click on Ontology (aspect) to show or hide the terms for the three different component ontologies - the GO "aspects": F, C, and P (what were these again?).
- The most specific annotation on the page seems to be "positive regulation of transcription involved in G1/S transition of mitotic cell cycle". Follow the link.
- Note that you can now filter for organisms. Restrict the organism to Saccharomyces cerevisiae S288C by clicking on the green (+) sign. Note that you now see all yeast genes that are annotated to this term! This is an effective way to build system membership information from the bottom up.
- There are a number of tabs available for different views on the data: Annotations, Graph Views, Inferred Tree View, Neighborhood, and Mappings. Visit them.
  - The link to QuickGO from the Graph Views tab gives you the entire ancestor chart of the term, with clickable term nodes. You need to consider the ancestor terms to expand searches for related, collaborating genes. For example, if a gene is annotated to a "positive regulation ..." term, you will need to consider genes associated with the cognate "negative regulation ..." or just "regulation ..." terms as well to get a complete picture of the gene's activities.
  - Neighborhood refers to the ancestors and children of a term.
- Study the information available on that page and through the tabs on the page, especially the graph view.
- In the Inferred Tree View tab, find the ancestor node "GO:0000082 G1/S transition of mitotic cell cycle" and follow it. On the associations page, go again to Annotations and filter the list to S. cerevisiae genes. As of today there are 137 annotated genes. Are these genes specifically annotated to that term, or does the list include genes that are annotated to descendants of the term?
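The question in the last step, whether a gene list holds only direct annotations or also those inherited from descendant terms, can be made concrete with a small Python sketch. Everything here is a toy under stated assumptions: the term identifiers, gene names and the functions `ancestors` and `genes_by_term` are invented, a single generic parent relation stands in for the several relation types a real ontology graph may use, and rows qualified with NOT are simply skipped.

```python
def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def genes_by_term(annotations, parents):
    """Build two maps: term -> directly annotated genes, and
    term -> genes annotated to the term or to any of its descendants."""
    direct, inferred = {}, {}
    for gene, term, qualifier in annotations:
        if "NOT" in qualifier.split("|"):
            continue  # a negative result must not be propagated upwards
        direct.setdefault(term, set()).add(gene)
        for t in {term} | ancestors(term, parents):
            inferred.setdefault(t, set()).add(gene)
    return direct, inferred


# Toy graph and annotations, invented for illustration only.
toy_parents = {
    "T:child_a": ["T:process"],
    "T:child_b": ["T:process"],
    "T:process": ["T:root"],
}
toy_annotations = [
    ("geneX", "T:process", ""),
    ("geneA", "T:child_a", ""),
    ("geneB", "T:child_b", ""),
    ("geneN", "T:child_a", "NOT"),
]

direct, inferred = genes_by_term(toy_annotations, toy_parents)
print(sorted(direct["T:process"]))
print(sorted(inferred["T:process"]))
```

Comparing the two printed lists for the same term shows the difference between the direct and the propagated view; the same comparison can be used to reason about the gene count displayed on a term's annotation page.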
GO Slims
GO is large and very detailed; the need for somewhat more high-level descriptions in model organisms is met by the GoSlim datasets, which are curated by some of the main model-organism databases and consortia. Read more about them in the GO slim and subset guide.
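One way to picture what a slim does is to map a detailed term onto the slim terms found among its ancestors. The Python sketch below illustrates that idea only; it is not the procedure used by any particular GO slim tool. The function `map_to_slim`, the toy graph and the toy slim set are all invented for the example.

```python
def map_to_slim(term, parents, slim_terms):
    """Return the slim terms found among a term and all of its ancestors."""
    reached, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in reached:
            reached.add(t)
            stack.extend(parents.get(t, ()))
    return reached & set(slim_terms)


# Toy graph and slim, invented for illustration only.
toy_graph = {
    "T:g1s_regulation": ["T:cell_cycle_transition"],
    "T:cell_cycle_transition": ["T:cell_cycle"],
    "T:cell_cycle": ["T:process"],
    "T:dna_binding": ["T:binding"],
}
toy_slim = {"T:cell_cycle", "T:binding"}

print(map_to_slim("T:g1s_regulation", toy_graph, toy_slim))
print(map_to_slim("T:dna_binding", toy_graph, toy_slim))
```

This version returns every slim term on the path to the root; a more careful mapping would probably keep only the most specific slim ancestors, which is a design choice left open here.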
Further reading, links and resources
Gene Ontology Consortium (2012) The Gene Ontology: enhancements for 2011. Nucleic Acids Res 40:D559-64. (pmid: 22102568)
Bastos et al. (2011) Application of gene ontology to gene identification. Methods Mol Biol 760:141-57. (pmid: 21779995)
du Plessis et al. (2011) The what, where, how and why of gene ontology--a primer for bioinformaticians. Brief Bioinformatics 12:723-35. (pmid: 21330331)
Hackenberg & Matthiesen (2010) Algorithms and methods for correlating experimental results with annotation databases. Methods Mol Biol 593:315-40. (pmid: 19957156)
Gene Ontology Consortium (2010) The Gene Ontology in 2010: extensions and refinements. Nucleic Acids Res 38:D331-5. (pmid: 19920128)
Carol Goble on the tension between purists and pragmatists in life-science ontology construction. Plenary talk at SOFG2...
Goble & Wroe (2004) The Montagues and the Capulets. Comp Funct Genomics 5:623-32. (pmid: 18629186)
Harris (2008) Developing an ontology. Methods Mol Biol 452:111-24. (pmid: 18563371)
Dimmer et al. (2007) Methods for gene ontology annotation. Methods Mol Biol 406:495-520. (pmid: 18287709)
Aravind et al. (2005) The many faces of the helix-turn-helix domain: transcription regulation and beyond. FEMS Microbiol Rev 29:231-62. (pmid: 15808743)
Gajiwala & Burley (2000) Winged helix proteins. Curr Opin Struct Biol 10:110-6. (pmid: 10679470)
Notes
Self-evaluation
If in doubt, ask! If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.
About ...
Author:
- Boris Steipe <boris.steipe@utoronto.ca>
Created:
- 2017-08-05
Modified:
- 2017-11-12
Version:
- 1.0
Version history:
- 1.0 First live
- 0.1 First stub
This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.