Difference between revisions of "CSB Ontologies"

Revision as of 22:06, 23 January 2012

Ontologies for Computational Systems Biology

This page is a placeholder, or under current development; it is here principally to establish the logical framework of the site. The material on this page is correct, but incomplete.

Poorly structured data can be integrated via ontologies. This is especially important for phenotype and "function" data. The primary example is the Gene Ontology (GO). Other examples include the Disease Ontology, OMIM and WikiGene.

Introduction

...

GO

The Gene Ontology project is the most influential contributor to the definition of function in computational biology, and the use of GO terms and GO annotations is ubiquitous.

GO: the Gene Ontology project

Ontologies are important tools to organize and compute with non-standardized information, such as gene annotations. The Gene Ontology project (GO) constructs ontologies for gene and gene product attributes across numerous species. Three major ontologies are being developed: molecular function, biological process and cellular component. Each includes terms, their definitions, and their relationships. In addition, genes and gene products are being annotated with their GO terms and the type of evidence that underlies the annotation. A number of tools such as the AmiGO browser are available to analyse relationships, construct ontologies and curate annotations. Data can be freely downloaded in formats that are convenient for computation.

The GO actually comprises three separate ontologies:

Molecular function: ...

Biological Process: ...

Cellular component: ...

GO terms

GO terms comprise the core of the information in the ontology: a carefully crafted definition of a term in any of GO's separate ontologies.

GO relationships

The nature of the relationships is as much a part of the ontology as the terms themselves. GO uses three categories of relationships:

is a
part of
regulates
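To make the role of relationship types concrete, here is a minimal sketch of how one might collect the ancestors of a term while choosing which relationships to follow. The graph, the term names and the function `ancestors` are hypothetical illustrations, not real GO data or part of any GO tool.

```python
from collections import deque

# Hypothetical toy graph: child term -> list of (relationship, parent term).
# Term names are illustrative placeholders, not taken from the real ontology.
EDGES = {
    "mitotic cell cycle": [("is_a", "cell cycle")],
    "M phase": [("part_of", "mitotic cell cycle")],
    "regulation of M phase": [("regulates", "M phase")],
    "cell cycle": [],
}


def ancestors(term, edges, follow=("is_a", "part_of")):
    """Return all terms reachable from `term` via the relationship types in `follow`."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for relationship, parent in edges.get(current, []):
            if relationship in follow and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(ancestors("M phase", EDGES))
print(ancestors("regulation of M phase", EDGES, follow=("is_a", "part_of", "regulates")))
```

Whether "regulates" edges should be followed depends on the question being asked: a term that regulates a process is not itself an instance or a part of that process, so the sketch leaves them out by default.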

GO annotations

The GO terms are conceptual in nature, and while they represent our interpretation of biological phenomena, they do not intrinsically represent biological objects, such as specific genes or proteins. In order to link molecules with these concepts, the ontology is used to annotate genes. The annotation project is referred to as GOA.
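As a sketch of this separation between concepts and objects, an annotation can be modelled as a small record that links a gene product to a term and carries its evidence. The class `Annotation`, the function `index_by_gene` and all identifiers below are assumptions made for illustration; they do not reflect the actual GOA file layout.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One link between a gene product and a GO term (hypothetical record layout)."""
    gene_product: str   # e.g. a UniProt ID
    go_term: str        # e.g. a GO term ID
    evidence_code: str  # e.g. "IDA"


def index_by_gene(annotations):
    """Group the annotated GO terms by gene product."""
    index = defaultdict(set)
    for annotation in annotations:
        index[annotation.gene_product].add(annotation.go_term)
    return dict(index)


# Placeholder identifiers, not real database entries.
records = [
    Annotation("protein_1", "TERM:a", "IDA"),
    Annotation("protein_1", "TERM:b", "IEA"),
    Annotation("protein_2", "TERM:a", "ISS"),
]
print(index_by_gene(records))
```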

GO evidence codes

Annotations can be based on literature data or on computational inference. To evaluate how much trust an annotation deserves, it is important to know how the curator justified it, and GO uses evidence codes to make this process transparent. When computing with the ontology, we may want to filter (exclude) particular annotations in order to avoid tautologies: for example, if we were to infer functional relationships between homologous genes, we should exclude annotations that were themselves based on the same or a similar inference, and compute only with the actual experimental data.

The following evidence codes are in current use; an analysis that wanted to exclude inferred annotations would restrict the codes it uses to the experimental evidence codes: EXP, IDA, IPI, IMP, IEP, and perhaps IGI, although the interpretation of genetic interactions can require assumptions.

Automatically-assigned Evidence Codes

IEA: Inferred from Electronic Annotation

Curator-assigned Evidence Codes

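Experimental Evidence Codes
- EXP: Inferred from Experiment
- IDA: Inferred from Direct Assay
- IPI: Inferred from Physical Interaction
- IMP: Inferred from Mutant Phenotype
- IGI: Inferred from Genetic Interaction
- IEP: Inferred from Expression Pattern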
Computational Analysis Evidence Codes
- ISS: Inferred from Sequence or Structural Similarity
- ISO: Inferred from Sequence Orthology
- ISA: Inferred from Sequence Alignment
- ISM: Inferred from Sequence Model
- IGC: Inferred from Genomic Context
- IBA: Inferred from Biological aspect of Ancestor
- IBD: Inferred from Biological aspect of Descendant
- IKR: Inferred from Key Residues
- IRD: Inferred from Rapid Divergence
- RCA: Inferred from Reviewed Computational Analysis
Author Statement Evidence Codes
- TAS: Traceable Author Statement
- NAS: Non-traceable Author Statement
Curator Statement Evidence Codes
- IC: Inferred by Curator
- ND: No biological Data available

For further details, see the Guide to GO Evidence Codes and the GO Evidence Code Decision Tree.
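The filtering step described in this section could be sketched as follows. The annotation tuples and the names `EXPERIMENTAL_CODES` and `filter_by_evidence` are hypothetical; only the evidence codes themselves come from the list above.

```python
# Experimental evidence codes named in the text; IGI is kept optional because
# interpreting genetic interactions can require assumptions.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IEP"}

# Hypothetical annotation records: (gene product, term placeholder, evidence code).
ANNOTATIONS = [
    ("geneA", "TERM:1", "IDA"),
    ("geneA", "TERM:2", "IEA"),
    ("geneB", "TERM:1", "ISS"),
    ("geneB", "TERM:3", "IGI"),
    ("geneC", "TERM:2", "IMP"),
]


def filter_by_evidence(annotations, include_igi=False):
    """Keep only annotations supported by experimental evidence codes."""
    allowed = set(EXPERIMENTAL_CODES)
    if include_igi:
        allowed.add("IGI")
    return [record for record in annotations if record[2] in allowed]


print(filter_by_evidence(ANNOTATIONS))
print(filter_by_evidence(ANNOTATIONS, include_igi=True))
```

Making IGI an explicit switch keeps the decision visible in the analysis rather than hiding it in the code list.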

GO tools

For many projects, the simplest approach will be to download the GO ontology itself. It is a well constructed, easily parseable file that is well suited for computation. For details, see Computing with GO on this wiki.
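As an illustration of how little code such parsing may need, the sketch below reads term stanzas from a short text in the OBO flat-file style (`[Term]` stanzas with `tag: value` lines). It assumes that format, handles only a few tags, and uses made-up `EX:` identifiers rather than real GO terms; the function `parse_obo_terms` is hypothetical and not part of any GO tool.

```python
# A small made-up sample in OBO style; identifiers and names are placeholders.
SAMPLE = """\
format-version: 1.2

[Term]
id: EX:0000001
name: example root process
namespace: biological_process

[Term]
id: EX:0000002
name: example child process
namespace: biological_process
is_a: EX:0000001 ! example root process
relationship: part_of EX:0000001 ! example root process

[Typedef]
id: part_of
name: part of
"""


def parse_obo_terms(text):
    """Parse [Term] stanzas into a dict keyed by term id (subset of tags only)."""
    terms = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {"is_a": [], "relationship": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines, and non-Term stanzas
        tag, _, value = line.partition(":")
        value = value.strip()
        if tag == "id":
            current["id"] = value
            terms[value] = current
        elif tag in ("name", "namespace"):
            current[tag] = value
        elif tag == "is_a":
            # Drop the trailing "! comment" that repeats the parent's name.
            current["is_a"].append(value.split("!")[0].strip())
        elif tag == "relationship":
            rel_type, target = value.split("!")[0].split()[:2]
            current["relationship"].append((rel_type, target))
    return terms


terms = parse_obo_terms(SAMPLE)
for term_id, term in terms.items():
    print(term_id, term["name"], term["is_a"], term["relationship"])
```

For real work, the downloaded ontology file would replace `SAMPLE`, and further tags (definitions, synonyms, obsolete flags) would need handling.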

Introductory reading

Exercises

Computing semantic similarity for gene-pairs: A: Gene identifiers

Navigate to the Saccharomyces Genome Database and search for the gene name mbp1 using the search box. Review the information available on the result page. Find, and note down the UniProt ID.
For comparison, review the gene information of the functionally related human E2F1 transcription factor at the NCBI. Here too, find, and note down the UniProt ID.
To compare functional similarity, find the IDs of a protein of related function and of a protein of unrelated function in UniProt.
1. Find the UniProt ID of E2F1's human interaction partner TFDP1, which we would expect to be annotated as functionally similar to both E2F1 and MBP1;
2. also find the UniProt ID of human MBP (myelin basic protein), which is functionally unrelated.

B: Semantic similarity scores

Next, we compute the semantic similarity of these genes. The GO database lists a number of tools for this task (http://www.geneontology.org/GO.tools_by_type.semantic_similarity.shtml).

Navigate to the ProteInOn site at Lisbon University in Portugal, the online tool to compute GO-based semantic similarity that was discussed in last week's reading assignment. Select "compute protein semantic similarity", use "Measure: simGIC" and "GO type: Biological process". Enter your four UniProt IDs in the correct format and run the computation.
Interpret the similarity score table. Does it correspond to your expectations?
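To help interpret the scores, here is a minimal sketch of the idea behind simGIC as it is commonly described: a Jaccard index over the two proteins' GO term sets (including ancestors), with each term weighted by its information content, IC(t) = -ln p(t). This is not ProteInOn's implementation; the term names and annotation counts are invented, and the function names are hypothetical.

```python
import math


def information_content(term_counts, total):
    """IC(t) = -ln(count(t) / total); frequent terms carry little information."""
    return {term: -math.log(count / total) for term, count in term_counts.items()}


def sim_gic(terms_a, terms_b, ic):
    """IC-weighted Jaccard index of two term sets (ancestors already included)."""
    denominator = sum(ic[t] for t in terms_a | terms_b)
    if denominator == 0:
        return 0.0
    return sum(ic[t] for t in terms_a & terms_b) / denominator


# Invented counts: how many of 8 annotated proteins carry each term.
COUNTS = {"root": 8, "t1": 4, "t2": 2, "t3": 2}
IC = information_content(COUNTS, total=8)

protein_a = {"root", "t1", "t2"}
protein_b = {"root", "t1", "t3"}
print(round(sim_gic(protein_a, protein_b, IC), 3))
```

In this toy case the shared terms contribute ln 2 of information while the union contributes 5 ln 2, so the score is 0.2: sharing only general terms yields a low similarity even when most term labels overlap.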

C: Graphical view of the ontology

Finally, we'll use the GO's AmiGO br

