Difference between revisions of "CSB Ontologies"
Revision as of 22:02, 23 January 2012
Ontologies for Computational Systems Biology
Poorly structured data can be integrated via ontologies. This is especially important for phenotype and "function" data. The primary example is the Gene Ontology (GO). Other examples include the Disease Ontology, OMIM and WikiGene.
Introduction
...
GO
The Gene Ontology project is the most influential contributor to the definition of function in computational biology, and GO terms and GO annotations are used ubiquitously.
GO: the Gene Ontology project. Ontologies are important tools to organize and compute with non-standardized information, such as gene annotations. The Gene Ontology project (GO) constructs ontologies for gene and gene product attributes across numerous species. Three major ontologies are being developed: molecular function, biological process and cellular component. Each includes terms, their definitions, and their relationships. In addition, genes and gene products are being annotated with GO terms and with the type of evidence that underlies each annotation. A number of tools, such as the AmiGO browser, are available to analyse relationships, construct ontologies and curate annotations. Data can be freely downloaded in formats that are convenient for computation.
The GO actually comprises three separate ontologies:
- Molecular function
- ...
- Biological Process
- ...
- Cellular component
- ...
GO terms
GO terms comprise the core of the information in the ontology: a carefully crafted definition of a term in any of GO's separate ontologies.
GO relationships
The nature of the relationships is as much a part of the ontology as the terms themselves. GO uses three categories of relationships:
- is a
- part of
- regulates
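As a minimal sketch of how such typed relationships can be used in computation, the following Python example stores a toy ontology as a mapping from each term to its (relationship, parent) pairs and collects all ancestors of a term. The term identifiers, the `TOY_EDGES` table and the `ancestors` function are hypothetical names chosen for illustration; they are not real GO accessions or part of any GO tool.

```python
from collections import deque

# Hypothetical toy graph: child term -> list of (relationship, parent term).
# The identifiers are illustrative placeholders, not real GO accessions.
TOY_EDGES = {
    "T:leaf": [("is_a", "T:mid"), ("part_of", "T:other")],
    "T:mid": [("is_a", "T:root")],
    "T:other": [("is_a", "T:root")],
    "T:root": [],
}


def ancestors(term, edges, follow=("is_a", "part_of")):
    """Return every term reachable from `term` via the chosen relationship types."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for rel, parent in edges.get(current, []):
            # Only walk edges whose relationship type was requested.
            if rel in follow and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(ancestors("T:leaf", TOY_EDGES)))
print(sorted(ancestors("T:leaf", TOY_EDGES, follow=("is_a",))))
```

Which relationship types to follow is a design choice that depends on the question being asked; the default used here (is a and part of) is only an assumption of this sketch.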
GO annotations
The GO terms are conceptual in nature, and while they represent our interpretation of biological phenomena, they do not intrinsically represent biological objects, such as specific genes or proteins. In order to link molecules with these concepts, the ontology is used to annotate genes. The annotation project is referred to as GOA.
GO evidence codes
Annotations can be based on literature data or on computational inference, and to judge how far an annotation can be trusted it is important to know how the curator justified it. GO uses evidence codes to make this process transparent. When computing with the ontology, we may want to filter (exclude) particular annotations in order to avoid tautologies: for example, if we were to infer functional relationships between homologous genes, we should exclude annotations that were themselves based on homology or a similar inference, and compute only with the actual experimental data.
The following evidence codes are the most important:
- Automatically-assigned Evidence Codes
  - IEA: Inferred from Electronic Annotation
- Curator-assigned Evidence Codes
  - Experimental Evidence Codes
    - EXP: Inferred from Experiment
    - IDA: Inferred from Direct Assay
    - IPI: Inferred from Physical Interaction
    - IMP: Inferred from Mutant Phenotype
    - IGI: Inferred from Genetic Interaction
    - IEP: Inferred from Expression Pattern
  - Computational Analysis Evidence Codes
    - ISS: Inferred from Sequence or Structural Similarity
    - ISO: Inferred from Sequence Orthology
    - ISA: Inferred from Sequence Alignment
    - ISM: Inferred from Sequence Model
    - IGC: Inferred from Genomic Context
    - IBA: Inferred from Biological aspect of Ancestor
    - IBD: Inferred from Biological aspect of Descendant
    - IKR: Inferred from Key Residues
    - IRD: Inferred from Rapid Divergence
    - RCA: Inferred from Reviewed Computational Analysis
  - Author Statement Evidence Codes
    - TAS: Traceable Author Statement
    - NAS: Non-traceable Author Statement
  - Curator Statement Evidence Codes
    - IC: Inferred by Curator
    - ND: No biological Data available
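For further details, see the Guide to GO Evidence Codes (http://www.geneontology.org/GO.evidence.shtml) and the GO Evidence Code Decision Tree (http://www.geneontology.org/GO.evidence.tree.shtml).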
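Filtering annotations by evidence code could be sketched as follows. The example assumes annotations are available as tab-delimited lines in the GO annotation file (GAF) layout, where the second column holds the object ID, the fifth the GO ID and the seventh the evidence code. The function name `filter_by_evidence` and the sample records are hypothetical; the gene names and gene/term pairings are made up purely for illustration.

```python
# The experimental evidence codes listed above.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def filter_by_evidence(gaf_lines, allowed=EXPERIMENTAL_CODES):
    """Yield (object_id, go_id, evidence) for annotations with an allowed code."""
    for line in gaf_lines:
        # Skip blank lines and header/comment lines, which start with "!".
        if not line.strip() or line.startswith("!"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # not a complete annotation record
        object_id, go_id, evidence = fields[1], fields[4], fields[6]
        if evidence in allowed:
            yield object_id, go_id, evidence


# Made-up records for illustration only; not real annotations.
SAMPLE = [
    "!gaf-version: 2.0",
    "DB\tGENE1\tsym1\t\tGO:0000001\tREF:1\tIDA\t\tP",
    "DB\tGENE1\tsym1\t\tGO:0000002\tREF:2\tIEA\t\tP",
    "DB\tGENE2\tsym2\t\tGO:0000001\tREF:3\tISS\t\tP",
]

for record in filter_by_evidence(SAMPLE):
    print(record)
```

Passing a different `allowed` set, for instance one that also admits TAS, changes the trade-off between coverage and the risk of circular inference.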
GO tools
For many projects, the simplest approach will be to download the ontology itself. It is a well-constructed, easily parsed file that is well suited for computation. For details, see Computing with GO on this wiki.
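To illustrate why the file is easy to parse, here is a minimal sketch that reads [Term] stanzas from text in the OBO flat-file format. It assumes the usual `key: value` lines with optional trailing `! comment` text, and it ignores everything except `id`, `name`, `namespace`, `is_a` and `relationship`. The function `parse_obo` and the sample stanzas (including the term names) are hypothetical and only serve as a demonstration; a real project may prefer an established parser.

```python
def parse_obo(lines):
    """Parse [Term] stanzas into {id: {"name", "namespace", "is_a", "relationship"}}."""
    terms = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected.
            current = {"is_a": [], "relationship": []} if line == "[Term]" else None
            continue
        if current is None or ": " not in line:
            continue  # header line, blank line, or a stanza we ignore
        key, value = line.split(": ", 1)
        value = value.split(" ! ", 1)[0].strip()  # drop trailing "! comment"
        if key == "id":
            terms[value] = current
        elif key in ("name", "namespace"):
            current[key] = value
        elif key == "is_a":
            current["is_a"].append(value.split()[0])
        elif key == "relationship":
            parts = value.split()
            current["relationship"].append((parts[0], parts[1]))
    return terms


# Made-up stanzas for illustration; the names are not real GO term names.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: GO:0000001
name: example child term
namespace: biological_process
is_a: GO:0000002 ! example parent term
relationship: part_of GO:0000003 ! example whole

[Term]
id: GO:0000002
name: example parent term
namespace: biological_process

[Typedef]
id: part_of
name: part of
""".splitlines()

terms = parse_obo(SAMPLE_OBO)
print(terms["GO:0000001"])
```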
Introductory reading
Exercises
- Computing semantic similarity for gene-pairs
- A: Gene identifiers
- Navigate to the Saccharomyces Genome Database and search for the gene name mbp1 using the search box. Review the information available on the result page. Find, and note down the UniProt ID.
- For comparison, review the gene information of the functionally related human E2F1 transcription factor at the NCBI. Here too, find, and note down the UniProt ID.
- To compare functional similarity, find the UniProt IDs of one protein of related function and one of unrelated function:
- Find the UniProt ID of E2F1's human interaction partner TFDP1, which we would expect to be annotated as functionally similar to both E2F1 and MBP1;
- also find the UniProt ID of human MBP (myelin basic protein), which is functionally unrelated.
- B: Semantic similarity scores
Next, we compute the semantic similarity of these genes. The GO database lists a number of tools for this task (http://www.geneontology.org/GO.tools_by_type.semantic_similarity.shtml).
- Navigate to the ProteInOn site at Lisbon University in Portugal, the online tool to compute GO-based semantic similarity that was discussed in last week's reading assignment. Select "compute protein semantic similarity", use "Measure: simGIC" and "GO type: Biological process". Enter your four UniProt IDs in the correct format and run the computation.
- Interpret the similarity score table. Does it correspond to your expectations?
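To help interpret the scores, the simGIC measure can be sketched in a few lines. It is commonly described as a weighted Jaccard index: the summed information content (IC) of the GO terms shared by two proteins, divided by the summed IC of all terms annotated to either, where each term set includes the ancestors of the annotated terms. The term names and counts below are hypothetical toy values, and details of the ProteInOn implementation may differ from this sketch.

```python
import math


def information_content(term_counts, total):
    """IC of each term: negative log of its annotation frequency."""
    return {term: -math.log(count / total) for term, count in term_counts.items()}


def sim_gic(terms_a, terms_b, ic):
    """Weighted Jaccard index over two ancestor-closed sets of terms."""
    union_ic = sum(ic[t] for t in terms_a | terms_b)
    if union_ic == 0:
        return 0.0  # only uninformative terms (for example the root)
    return sum(ic[t] for t in terms_a & terms_b) / union_ic


# Hypothetical annotation counts in a toy corpus of 8 annotated proteins.
COUNTS = {"root": 8, "mid": 4, "leaf1": 2, "leaf2": 1}
IC = information_content(COUNTS, total=8)

# Term sets already include all ancestors of the annotated terms.
protein_a = {"root", "mid", "leaf1"}
protein_b = {"root", "mid", "leaf2"}

print(sim_gic(protein_a, protein_b, IC))
print(sim_gic(protein_a, protein_a, IC))
```

Because rare, specific terms carry a high IC, two proteins that share only general terms score low even when most of their terms overlap.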
- C: Graphical view of the ontology
Finally, we'll use the GO's AmiGO br