Difference between revisions of "CSB Ontologies"

From "A B C"
Revision as of 17:00, 23 January 2012

Ontologies for Computational Systems Biology


This page is a placeholder, or under current development; it is here principally to establish the logical framework of the site. The material on this page is correct, but incomplete.


Poorly structured data can be integrated via ontologies. This is especially important for phenotype and "function" data. The primary example is the Gene Ontology (GO). Other examples include the Disease Ontology, OMIM and WikiGene.



Introduction

...


GO

The Gene Ontology project is the most influential contributor to the definition of function in computational biology and the use of GO terms and GO annotations is ubiquitous.



The GO actually comprises three separate ontologies:

Molecular function
...


Biological process
...


Cellular component
...


GO terms

GO terms are the core of the information in the ontology: each term is a carefully crafted definition of a concept in one of GO's three separate ontologies.


GO relationships

The nature of the relationships is as much a part of the ontology as the terms themselves. GO uses three categories of relationships:

  • is a
  • part of
  • regulates
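
Because the relationships are typed, a computation over the ontology has to decide which relationship types to follow. The following is a minimal Python sketch of that idea, not GO's own tooling: the term names (`term_A` to `term_E`), the edge list and the helper functions are invented for illustration, and only the three relationship names come from the list above.

```python
from collections import defaultdict

# Toy ontology as (child, relationship, parent) triples.
# All term names are invented placeholders, not real GO terms.
EDGES = [
    ("term_C", "is_a", "term_B"),
    ("term_B", "is_a", "term_A"),
    ("term_C", "part_of", "term_D"),
    ("term_E", "regulates", "term_C"),
]


def build_graph(edges):
    """Map each child term to its list of (relationship, parent) pairs."""
    graph = defaultdict(list)
    for child, relation, parent in edges:
        graph[child].append((relation, parent))
    return graph


def ancestors(graph, term, relations=("is_a", "part_of")):
    """Collect all terms reachable from `term` via the allowed relationship types."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for relation, parent in graph.get(current, []):
            if relation in relations and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


GRAPH = build_graph(EDGES)
print(sorted(ancestors(GRAPH, "term_C")))
print(sorted(ancestors(GRAPH, "term_E", relations=("is_a", "part_of", "regulates"))))
```

Whether "regulates" edges should be followed depends on the question being asked: a regulator of a process is not itself an instance or a part of that process, which is why the sketch leaves that relationship out by default.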


GO annotations

The GO terms are conceptual in nature, and while they represent our interpretation of biological phenomena, they do not intrinsically represent biological objects, such as specific genes or proteins. In order to link molecules with these concepts, the ontology is used to annotate genes. The annotation project is referred to as GOA.


GO evidence codes

Annotations can be based on literature data or on computational inference. To evaluate how much trust we should place in an annotation, it is important to know how the curator justified it, and GO uses evidence codes to make this process transparent. When computing with the ontology, we may want to filter out (exclude) annotations with particular evidence codes in order to avoid tautologies: for example, if we were to infer functional relationships between homologous genes, we should exclude annotations that were themselves based on the same or a similar inference, and compute only with annotations backed by actual experimental data.
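
Filtering by evidence code can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the annotation records, the gene names and the `TERM:n` identifiers are invented, and the choice of which codes to keep is ours. The codes themselves are GO evidence codes: EXP, IDA, IPI, IMP, IGI and IEP denote experimental evidence, while ISS (inferred from sequence similarity) and IEA (inferred from electronic annotation) do not.

```python
# GO evidence codes for experimental evidence. Which codes to trust
# is a project decision; this set is an assumption of the sketch.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}

# Hypothetical annotation records: (gene, term ID, evidence code).
ANNOTATIONS = [
    ("geneA", "TERM:1", "IDA"),
    ("geneA", "TERM:2", "IEA"),
    ("geneB", "TERM:1", "ISS"),
    ("geneB", "TERM:3", "IMP"),
]


def experimental_only(annotations, allowed=EXPERIMENTAL_CODES):
    """Keep only annotations whose evidence code is in the allowed set."""
    return [record for record in annotations if record[2] in allowed]


FILTERED = experimental_only(ANNOTATIONS)
for gene, term, code in FILTERED:
    print(gene, term, code)
```

In this toy data the ISS and IEA annotations are dropped, so a homology-based analysis would no longer be comparing genes by annotations that were themselves transferred by sequence similarity.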

The following evidence codes are the most important:



GO tools

For many projects, the simplest approach will be to download the GO ontology itself. It is a well-constructed, easily parseable file that is well suited for computation. For details, see Computing with GO on this wiki.
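
To give an impression of what "easily parseable" means, here is a minimal Python sketch that assumes the ontology is read in the OBO flat-file format, in which each term is a `[Term]` stanza of `tag: value` lines. The sample text, the `EX:` identifiers and the function name `parse_obo_terms` are invented for illustration; this is not an excerpt of the real file and not a complete OBO parser.

```python
# Illustrative OBO-style text with made-up identifiers (not real GO terms).
SAMPLE = """\
format-version: 1.2

[Term]
id: EX:0000001
name: root process
namespace: biological_process

[Term]
id: EX:0000002
name: child process
namespace: biological_process
is_a: EX:0000001 ! root process

[Term]
id: EX:0000003
name: sub-step of child process
namespace: biological_process
relationship: part_of EX:0000002 ! child process
"""


def parse_obo_terms(text):
    """Return {term id: {tag: value}} for every [Term] stanza in the text."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {"is_a": [], "relationship": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines, or non-term stanzas
        key, _, value = line.partition(":")
        value = value.strip()
        if key == "id":
            current["id"] = value
            terms[value] = current
        elif key in ("is_a", "relationship"):
            # Drop the trailing "! human-readable name" comment.
            current[key].append(value.split("!")[0].strip())
        else:
            current[key] = value
    return terms


TERMS = parse_obo_terms(SAMPLE)
for term_id, term in TERMS.items():
    print(term_id, term["name"], term["is_a"], term["relationship"])
```

The sketch keeps only the last value of any tag other than `is_a` and `relationship`, which is a simplification; tags that may occur several times per term would need the same list treatment.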

Introductory reading



Exercises

Computing semantic similarity for gene-pairs
A: Gene identifiers


  1. Navigate to the Saccharomyces Genome Database and search for the gene name mbp1 using the search box. Review the information available on the result page. Find, and note down the UniProt ID.
  2. For comparison, review the gene information of the functionally related human E2F1 transcription factor at the NCBI. Here too, find, and note down the UniProt ID.
  3. To compare functional similarity, find the IDs of one protein of related function and one of unrelated function in UniProt.
    1. Find the UniProt ID of E2F1's human interaction partner TFDP1, which we would expect to be annotated as functionally similar to both E2F1 and MBP1;
    2. also find the UniProt ID of human MBP (myelin basic protein), which is functionally unrelated.
B: Semantic similarity scores


Next, we compute the semantic similarity between these genes. The GO database lists a number of tools for this task (http://www.geneontology.org/GO.tools_by_type.semantic_similarity.shtml).

  1. Navigate to the ProteInOn site at Lisbon University in Portugal, the online tool to compute GO-based semantic similarity that was discussed in last week's reading assignment. Select "compute protein semantic similarity", use "Measure: simGIC" and "GO type: Biological process". Enter your four UniProt IDs in the correct format and run the computation.
  2. Interpret the similarity score table. Does it correspond to your expectations?
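
To help interpret the scores, it can be useful to see how a simGIC-style measure is computed. The sketch below assumes the commonly cited definition: each protein is represented by its set of annotated terms extended with all ancestors, and the similarity is the summed information content (IC) of the shared terms divided by the summed IC of the union. We have not checked that ProteInOn computes it in exactly this way, and the term names, probabilities and protein annotation sets are invented for illustration.

```python
import math

# Hypothetical fraction of annotated proteins that carry each term
# (directly or through a descendant). The root covers everything.
TERM_PROBABILITY = {"root": 1.0, "t1": 0.5, "t2": 0.25, "t3": 0.125}


def information_content(term):
    """IC(t) = -log2 p(t): rare, specific terms carry more information."""
    return -math.log2(TERM_PROBABILITY[term])


def sim_gic(terms_a, terms_b):
    """IC-weighted Jaccard index of two ancestor-extended term sets."""
    shared = terms_a & terms_b
    union = terms_a | terms_b
    total = sum(information_content(t) for t in union)
    if total == 0:
        return 0.0  # only uninformative terms (e.g. the root) are present
    return sum(information_content(t) for t in shared) / total


# Toy annotation sets, already extended with their ancestors.
PROTEIN_X = {"root", "t1", "t2"}
PROTEIN_Y = {"root", "t1", "t3"}

print(sim_gic(PROTEIN_X, PROTEIN_Y))
print(sim_gic(PROTEIN_X, PROTEIN_X))
```

Under this definition, two proteins that share only general terms score low even if they share many of them, because general terms contribute little information content to the numerator.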


C: Graphical view of the ontology


Finally, we'll use the GO's AmiGO browser to compare the genes graphically.


  1. Navigate to the AmiGO search interface, select "genes or proteins" and enter MBP1. Filter the results by the correct species and restrict the results to the biological process ontology.
  2. This should return the GO annotation page for the yeast Mbp1 protein. Follow the "5 term associations" in the header bar.
  3. Click on "view in tree" for the GO term GO:0000083.
  4. This shows you the ontology of the term in text form, including the number of genes annotated to each term. In the right-hand box you should find a link that you can follow for a graphical view.
  5. In a separate window, repeat the process for human E2F1 (choose the most specific term, i.e. the one that refers to the gene's role in the G1/S transition - GO:0000082).
  6. Roughly compare the two ontologies.
  7. Contrast this with the ontology for human MBP, specifically the axon ensheathment process.


References



Further reading and resources