Difference between revisions of "CSB Ontologies"

From "A B C"
Revision as of 22:06, 23 January 2012

Ontologies for Computational Systems Biology


This page is a placeholder, or under current development; it is here principally to establish the logical framework of the site. The material on this page is correct, but incomplete.


Poorly structured data can be integrated via ontologies. This is especially important for phenotype and "function" data. The primary example is the Gene Ontology (GO). Other examples include the Disease Ontology, OMIM and WikiGene.



Introduction

...


GO

The Gene Ontology project is the most influential contributor to the definition of function in computational biology and the use of GO terms and GO annotations is ubiquitous.

GO: the Gene Ontology project


Ontologies are important tools to organize and compute with non-standardized information, such as gene annotations. The Gene Ontology project (GO) constructs ontologies for gene and gene product attributes across numerous species. Three major ontologies are being developed: molecular function, biological process and cellular component. Each includes terms, their definitions, and their relationships. In addition, genes and gene products are being annotated with their GO terms and the type of evidence that underlies the annotation. A number of tools such as the AmiGO browser are available to analyse relationships, construct ontologies and curate annotations. Data can be freely downloaded in formats that are convenient for computation.


The GO actually comprises three separate ontologies:

Molecular function
...


Biological process
...


Cellular component
...


GO terms

GO terms are the core of the information in the ontology: each is a carefully crafted definition of a concept in one of GO's three separate ontologies.


GO relationships

The nature of the relationships is as much a part of the ontology as the terms themselves. GO uses three categories of relationships:

  • is a
  • part of
  • regulates
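As a rough illustration of why the relationship type matters for computation, the sketch below stores relationships as typed edges and collects the ancestors of a term. The term identifiers, the edge list and the function names are invented for this example and are not real GO accessions; which relationship types to follow is a choice made by the analysis, not something the sketch prescribes.

```python
from collections import defaultdict

# Toy edges (child, relationship, parent). The identifiers are placeholders,
# not real GO accessions.
EDGES = [
    ("T:4", "is_a", "T:2"),
    ("T:2", "is_a", "T:1"),
    ("T:5", "part_of", "T:2"),
    ("T:6", "regulates", "T:5"),
]


def build_graph(edges):
    """Map each child term to its list of (relationship, parent) pairs."""
    graph = defaultdict(list)
    for child, rel, parent in edges:
        graph[child].append((rel, parent))
    return graph


def ancestors(term, graph, follow=("is_a", "part_of")):
    """Collect all terms reachable from `term` via the relationship types in `follow`."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for rel, parent in graph.get(current, []):
            if rel in follow and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


graph = build_graph(EDGES)
print(sorted(ancestors("T:5", graph)))  # ['T:1', 'T:2']
print(sorted(ancestors("T:6", graph)))  # [] because regulates is not followed by default
```

In this sketch "regulates" edges are left out by default, on the assumption that a gene product which regulates a process is not necessarily part of that process; passing follow=("is_a", "part_of", "regulates") includes them.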


GO annotations

The GO terms are conceptual in nature, and while they represent our interpretation of biological phenomena, they do not intrinsically represent biological objects, such as specific genes or proteins. In order to link molecules with these concepts, the ontology is used to annotate genes. The annotation project is referred to as GOA.


GO evidence codes

Annotations can be based on literature data or on computational inference, and it is important to record how the curator has justified an annotation so that we can evaluate how much trust to place in it. GO uses evidence codes to make this process transparent. When computing with the ontology, we may want to filter (exclude) particular annotations in order to avoid tautologies: for example, if we were to infer functional relationships between homologous genes, we should exclude annotations that are themselves based on the same or a similar inference, and compute only with the actual experimental data.

The following evidence codes are in current use. An analysis that wanted to exclude inferred annotations would restrict the codes it uses to the experimental evidence codes: EXP, IDA, IPI, IMP, IEP, and perhaps IGI, although the interpretation of genetic interactions can require assumptions.

Automatically-assigned Evidence Codes
  • IEA: Inferred from Electronic Annotation
Curator-assigned Evidence Codes
  • Experimental Evidence Codes
    • EXP: Inferred from Experiment
    • IDA: Inferred from Direct Assay
    • IPI: Inferred from Physical Interaction
    • IMP: Inferred from Mutant Phenotype
    • IGI: Inferred from Genetic Interaction
    • IEP: Inferred from Expression Pattern
  • Computational Analysis Evidence Codes
    • ISS: Inferred from Sequence or Structural Similarity
    • ISO: Inferred from Sequence Orthology
    • ISA: Inferred from Sequence Alignment
    • ISM: Inferred from Sequence Model
    • IGC: Inferred from Genomic Context
    • IBA: Inferred from Biological aspect of Ancestor
    • IBD: Inferred from Biological aspect of Descendant
    • IKR: Inferred from Key Residues
    • IRD: Inferred from Rapid Divergence
    • RCA: Inferred from Reviewed Computational Analysis
  • Author Statement Evidence Codes
    • TAS: Traceable Author Statement
    • NAS: Non-traceable Author Statement
  • Curator Statement Evidence Codes
    • IC: Inferred by Curator
    • ND: No biological Data available

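For further details, see the Guide to GO Evidence Codes (http://www.geneontology.org/GO.evidence.shtml) and the GO Evidence Code Decision Tree (http://www.geneontology.org/GO.evidence.tree.shtml).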
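Restricting an analysis to experimentally supported annotations, as described above, could be sketched as a simple filter. The annotation records, gene names and term identifiers below are hypothetical placeholders, and the (gene, term, evidence code) tuple layout is an assumption made for this example rather than the layout of any particular annotation file.

```python
# Experimental evidence codes named in the text; IGI is kept separate because
# interpreting genetic interactions can require assumptions.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IEP"}

# Hypothetical annotation records: (gene product, term, evidence code).
ANNOTATIONS = [
    ("geneA", "T:1", "IDA"),
    ("geneA", "T:2", "IEA"),
    ("geneB", "T:1", "ISS"),
    ("geneB", "T:3", "IGI"),
    ("geneC", "T:2", "IMP"),
]


def filter_by_evidence(annotations, include_igi=False):
    """Keep only annotations whose evidence code is experimental."""
    allowed = EXPERIMENTAL | ({"IGI"} if include_igi else set())
    return [record for record in annotations if record[2] in allowed]


for record in filter_by_evidence(ANNOTATIONS):
    print(record)
```

Calling filter_by_evidence(ANNOTATIONS, include_igi=True) additionally keeps the IGI record, which makes the "and perhaps IGI" decision an explicit parameter of the analysis.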


 

GO tools

For many projects, the simplest approach will be to download the GO ontology itself. It is a well constructed, easily parseable file that is well suited for computation. For details, see Computing with GO on this wiki.
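To give an idea of what parsing such a file can look like, here is a minimal sketch that assumes the ontology was downloaded in the OBO flat-file format, in which each term is a "[Term]" stanza of "tag: value" lines. The sample text, the "X:" identifiers and the parse_obo function are made up for illustration; a real file contains many more tags, which this sketch simply ignores.

```python
# Invented sample in OBO-like layout; the X: identifiers are placeholders.
SAMPLE = """\
format-version: 1.2

[Term]
id: X:0000001
name: example root process
namespace: biological_process

[Term]
id: X:0000002
name: example child process
namespace: biological_process
is_a: X:0000001 ! example root process
relationship: part_of X:0000001 ! example root process

[Typedef]
id: part_of
name: part of
"""


def parse_obo(lines):
    """Return {term id: record} for every [Term] stanza in `lines`."""
    terms = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {"is_a": [], "relationship": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if key == "id":
            current["id"] = value
            terms[value] = current
        elif key in ("name", "namespace"):
            current[key] = value
        elif key == "is_a":
            # Drop the trailing "! comment" that repeats the parent's name.
            current["is_a"].append(value.split("!")[0].strip())
        elif key == "relationship":
            # Stored as a (relationship type, target id) pair.
            current["relationship"].append(tuple(value.split("!")[0].split()))
    return terms


terms = parse_obo(SAMPLE.splitlines())
for term_id, record in terms.items():
    print(term_id, record["name"], record["is_a"], record["relationship"])
```

For a downloaded file, the same function could be given an open file handle instead of SAMPLE.splitlines(), since it only iterates over lines.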

Introductory reading



Exercises

Computing semantic similarity for gene-pairs
A: Gene identifiers


  1. Navigate to the Saccharomyces Genome Database and search for the gene name mbp1 using the search box. Review the information available on the result page. Find, and note down the UniProt ID.
  2. For comparison, review the gene information of the functionally related human E2F1 transcription factor at the NCBI. Here too, find, and note down the UniProt ID.
  3. To compare functional similarity, find the UniProt IDs of one protein of related function and one protein of unrelated function:
    1. Find the UniProt ID of E2F1's human interaction partner TFDP1, which we would expect to be annotated as functionally similar to both E2F1 and MBP1;
    2. also find the UniProt ID of human MBP (myelin basic protein), which is functionally unrelated.
B: Semantic similarity scores


Next, we compute the semantic similarity of these genes. The GO database lists a number of tools for this task (http://www.geneontology.org/GO.tools_by_type.semantic_similarity.shtml).

  1. Navigate to the ProteInOn site at Lisbon University in Portugal, the online tool to compute GO-based semantic similarity that was discussed in last week's reading assignment. Select "compute protein semantic similarity", use "Measure: simGIC" and "GO type: Biological process". Enter your four UniProt IDs in the correct format and run the computation.
  2. Interpret the similarity score table. Does it correspond to your expectations?
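To help with interpreting the scores, the following sketch shows one common formulation of simGIC: each protein's annotation set is extended with all ancestor terms, and the similarity is the summed information content (IC) of the shared terms divided by the summed IC of the union. The tiny ontology, the protein labels P1 to P4 and the annotation sets are invented, and estimating IC from annotation frequencies in this toy set is an assumption of the example; ProteInOn's implementation may differ in detail.

```python
import math

# Toy ontology: each term maps to its parent terms (placeholder identifiers).
PARENTS = {
    "T:root": [],
    "T:a": ["T:root"],
    "T:b": ["T:root"],
    "T:a1": ["T:a"],
    "T:a2": ["T:a"],
}

# Hypothetical direct annotations for four proteins.
ANNOTATIONS = {
    "P1": {"T:a1"},
    "P2": {"T:a2"},
    "P3": {"T:b"},
    "P4": {"T:a1", "T:b"},
}


def with_ancestors(terms):
    """Extend a set of terms with all of their ancestors."""
    closed = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(PARENTS[term])
    return closed


def information_content(annotations):
    """IC(t) = -ln(fraction of proteins annotated with t or a descendant of t)."""
    closed = {protein: with_ancestors(ts) for protein, ts in annotations.items()}
    n = len(closed)
    ic = {}
    for term in PARENTS:
        count = sum(1 for ts in closed.values() if term in ts)
        if count:
            ic[term] = -math.log(count / n)
    return closed, ic


def sim_gic(a, b, closed, ic):
    """Summed IC of shared terms divided by summed IC of all terms of a and b."""
    shared = closed[a] & closed[b]
    union = closed[a] | closed[b]
    denominator = sum(ic[t] for t in union)
    return sum(ic[t] for t in shared) / denominator if denominator else 0.0


closed, ic = information_content(ANNOTATIONS)
for pair in [("P1", "P2"), ("P1", "P3"), ("P1", "P4")]:
    print(pair, round(sim_gic(*pair, closed, ic), 3))
```

In this toy set P1 and P2 share a specific parent term and therefore score higher than P1 and P3, which share only the root. The root contributes nothing because every protein is annotated to it, so its IC is zero.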


C: Graphical view of the ontology


Finally, we'll use the GO's AmiGO br