CSB Ontologies
</div>

Latest revision as of 07:34, 17 January 2014

Ontologies for Computational Systems Biology


This page is a placeholder, or under current development; it is here principally to establish the logical framework of the site. The material on this page is correct, but incomplete.


Poorly structured data can be integrated via ontologies. This is especially important for phenotype and "function" data. The primary example is the Gene Ontology (GO). Other examples include the Disease Ontology, OMIM and WikiGene.



Introduction

Harris (2008) Developing an ontology. Methods Mol Biol 452:111-24. (pmid: 18563371)


Hackenberg & Matthiesen (2010) Algorithms and methods for correlating experimental results with annotation databases. Methods Mol Biol 593:315-40. (pmid: 19957156)


GO

The Gene Ontology project is the most influential contributor to the definition of function in computational biology and the use of GO terms and GO annotations is ubiquitous.

GO: the Gene Ontology project


du Plessis et al. (2011) The what, where, how and why of gene ontology--a primer for bioinformaticians. Brief Bioinformatics 12:723-35. (pmid: 21330331)


The GO actually comprises three separate ontologies:

Molecular function
...


Biological Process
...


Cellular component
...


GO terms

GO terms are the core of the information in the ontology: each term is a carefully crafted definition of a concept in one of GO's separate ontologies.


GO relationships

The nature of the relationships is as much a part of the ontology as the terms themselves. GO uses three categories of relationships:

  • is a
  • part of
  • regulates
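Because is a and part of are transitive, a gene annotated to a term is implicitly annotated to every ancestor that can be reached over these edges. As a minimal sketch in R (the language of the exercise code on this page), the following walks a tiny hypothetical ontology; the term IDs T1 to T5 and their edges are invented for illustration and are not GO terms. Whether regulates edges should also be followed is a modelling decision, so the sketch makes it a parameter.

```r
# Hypothetical mini-ontology: one row per edge child -> parent.
# Term IDs and relationships are invented for illustration only.
edges <- data.frame(
  child  = c("T2",   "T3",      "T4",   "T5",        "T5"),
  parent = c("T1",   "T1",      "T2",   "T3",        "T4"),
  type   = c("is_a", "part_of", "is_a", "regulates", "is_a"),
  stringsAsFactors = FALSE
)

# Collect all ancestors of `term` reachable via the allowed relationship types,
# using a breadth-first walk up the graph.
ancestors <- function(term, edges, follow = c("is_a", "part_of")) {
  usable <- edges[edges$type %in% follow, ]
  found <- character(0)
  frontier <- term
  while (length(frontier) > 0) {
    parents <- unique(usable$parent[usable$child %in% frontier])
    frontier <- setdiff(parents, found)  # only expand terms not seen before
    found <- c(found, frontier)
  }
  sort(found)
}

ancestors("T5", edges)  # "T1" "T2" "T4"
ancestors("T5", edges, follow = c("is_a", "part_of", "regulates"))
```

Following regulates edges as well pulls T3 into the ancestor set, which is why it matters to decide which relationship types to propagate over before computing with annotations.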


GO annotations

The GO terms are conceptual in nature: while they represent our interpretation of biological phenomena, they do not intrinsically represent biological objects such as specific genes or proteins. To link molecules with these concepts, the ontology is used to annotate genes. The annotation project is referred to as GOA.

Dimmer et al. (2007) Methods for gene ontology annotation. Methods Mol Biol 406:495-520. (pmid: 18287709)
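Annotations are typically distributed as GO Annotation Files (GAF): tab-separated text in which lines beginning with ! are header comments and every other line links one gene product to one GO term, together with its evidence code. The minimal sketch below assumes the GAF 2.x column layout (column 3 holds the gene symbol, 5 the GO ID, 7 the evidence code and 9 the aspect, P, F or C); the two data lines, their IDs and references are invented placeholders, not real annotations.

```r
# Two invented GAF 2.x data lines (placeholders, not real annotations).
# Columns: 1 DB, 2 DB object ID, 3 symbol, 4 qualifier, 5 GO ID, 6 reference,
# 7 evidence code, 8 with/from, 9 aspect (P, F or C), 10-17 further fields.
gaf_lines <- c(
  "!gaf-version: 2.2",
  paste("UniProtKB", "P00001", "GENE_A", "", "GO:9999901", "PMID:0000000",
        "IDA", "", "P", "protein A", "", "protein", "taxon:9606",
        "20140101", "Example", "", "", sep = "\t"),
  paste("UniProtKB", "P00002", "GENE_B", "", "GO:9999902", "GO_REF:0000000",
        "IEA", "", "F", "protein B", "", "protein", "taxon:9606",
        "20140101", "Example", "", "", sep = "\t")
)

# Keep the columns needed for most analyses: who, which term, why, which ontology.
read_gaf_lines <- function(lines) {
  rows <- lines[!grepl("^!", lines)]  # drop header/comment lines
  fields <- strsplit(rows, "\t", fixed = TRUE)
  pick <- function(i) vapply(fields, function(f) f[i], character(1))
  data.frame(symbol   = pick(3),
             go_id    = pick(5),
             evidence = pick(7),
             aspect   = pick(9),
             stringsAsFactors = FALSE)
}

annotations <- read_gaf_lines(gaf_lines)
annotations
```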



GO evidence codes

Annotations can be made from literature data or by computational inference. To judge how much we should trust an annotation, it is important to know how the curator justified it, and GO uses evidence codes to make this transparent. When computing with the ontology, we may therefore want to filter out (exclude) annotations with particular evidence codes in order to avoid circular reasoning: for example, if we want to infer functional relationships between homologous genes, we should exclude annotations that were themselves based on the same or a similar inference, and compute only with the actual experimental data.

The following evidence codes are in current use. If you want to exclude inferred annotations, restrict the codes you use to the experimental ones: EXP, IDA, IPI, IMP, IEP, and perhaps IGI, although the interpretation of genetic interactions can require assumptions.

Automatically-assigned Evidence Codes
  • IEA: Inferred from Electronic Annotation
Curator-assigned Evidence Codes
  • Experimental Evidence Codes
    • EXP: Inferred from Experiment
    • IDA: Inferred from Direct Assay
    • IPI: Inferred from Physical Interaction
    • IMP: Inferred from Mutant Phenotype
    • IGI: Inferred from Genetic Interaction
    • IEP: Inferred from Expression Pattern
  • Computational Analysis Evidence Codes
    • ISS: Inferred from Sequence or Structural Similarity
    • ISO: Inferred from Sequence Orthology
    • ISA: Inferred from Sequence Alignment
    • ISM: Inferred from Sequence Model
    • IGC: Inferred from Genomic Context
    • IBA: Inferred from Biological aspect of Ancestor
    • IBD: Inferred from Biological aspect of Descendant
    • IKR: Inferred from Key Residues
    • IRD: Inferred from Rapid Divergence
    • RCA: Inferred from Reviewed Computational Analysis
  • Author Statement Evidence Codes
    • TAS: Traceable Author Statement
    • NAS: Non-traceable Author Statement
  • Curator Statement Evidence Codes
    • IC: Inferred by Curator
    • ND: No biological Data available

For further details, see the Guide to GO Evidence Codes and the GO Evidence Code Decision Tree.
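Once annotations are in a table, filtering by evidence code is a simple subset. The sketch below groups the codes exactly as listed above and keeps only experimentally supported annotations, with IGI as an option; the genes and GO IDs are hypothetical placeholders.

```r
# Evidence-code groups, as listed above.
experimental  <- c("EXP", "IDA", "IPI", "IMP", "IGI", "IEP")
computational <- c("ISS", "ISO", "ISA", "ISM", "IGC", "IBA", "IBD", "IKR", "IRD", "RCA")
author        <- c("TAS", "NAS")
curator       <- c("IC", "ND")
electronic    <- "IEA"

# Hypothetical annotations: gene symbols and GO IDs are placeholders.
annotations <- data.frame(
  gene     = c("GENE_A", "GENE_A", "GENE_A", "GENE_B", "GENE_B"),
  go_id    = c("GO:9999901", "GO:9999902", "GO:9999903", "GO:9999901", "GO:9999904"),
  evidence = c("IDA", "IEA", "ISS", "IGI", "TAS"),
  stringsAsFactors = FALSE
)

# Keep only experimentally supported annotations. IGI is optional because
# interpreting genetic interactions can require extra assumptions.
keep_experimental <- function(ann, include_igi = TRUE) {
  codes <- if (include_igi) experimental else setdiff(experimental, "IGI")
  ann[ann$evidence %in% codes, , drop = FALSE]
}

keep_experimental(annotations)                       # the IDA and IGI rows
keep_experimental(annotations, include_igi = FALSE)  # the IDA row only
```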


 

GO tools

For many projects, the simplest approach is to download the ontology itself: it is a well-constructed, easily parsed file that is well suited for computation. For details, see Computing with GO on this wiki.
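Among other formats, the ontology is distributed as a plain-text OBO file (go.obo), in which each [Term] stanza holds tag: value lines such as id:, name:, is_a: and relationship:. The sketch below assumes that layout and turns the stanzas into a table of child, parent and relationship type; the EX: identifiers are invented placeholders. Real files contain many more tags, which this minimal parser skips, and only the first token after is_a: (or the first two after relationship:) is read, so trailing comments are ignored.

```r
# Invented OBO snippet; EX: identifiers are placeholders, not GO terms.
obo_lines <- c(
  "format-version: 1.2",
  "",
  "[Term]",
  "id: EX:0000001",
  "name: root process",
  "",
  "[Term]",
  "id: EX:0000002",
  "name: child process",
  "is_a: EX:0000001 ! root process",
  "",
  "[Term]",
  "id: EX:0000003",
  "name: component of child",
  "relationship: part_of EX:0000002 ! child process",
  "",
  "[Typedef]",
  "id: part_of",
  "name: part of"
)

# Return one row per child -> parent edge found in [Term] stanzas.
parse_obo_edges <- function(lines) {
  edges <- list()
  in_term <- FALSE
  current <- NA_character_
  for (line in lines) {
    if (grepl("^\\[", line)) {                 # a new stanza starts
      in_term <- line == "[Term]"
      current <- NA_character_
    } else if (in_term && startsWith(line, "id: ")) {
      current <- sub("^id: ", "", line)
    } else if (in_term && startsWith(line, "is_a: ")) {
      parent <- strsplit(sub("^is_a: ", "", line), " ", fixed = TRUE)[[1]][1]
      edges[[length(edges) + 1]] <- c(current, parent, "is_a")
    } else if (in_term && startsWith(line, "relationship: ")) {
      parts <- strsplit(sub("^relationship: ", "", line), " ", fixed = TRUE)[[1]]
      edges[[length(edges) + 1]] <- c(current, parts[2], parts[1])
    }
  }
  if (length(edges) == 0) {
    return(data.frame(child = character(0), parent = character(0),
                      type = character(0), stringsAsFactors = FALSE))
  }
  m <- do.call(rbind, edges)
  data.frame(child = m[, 1], parent = m[, 2], type = m[, 3],
             stringsAsFactors = FALSE)
}

obo_edges <- parse_obo_edges(obo_lines)
obo_edges
```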

Introductory reading



Exercises


In this set of exercises we dive into practical work with GO: first via the AmiGO browser, and then via Bioconductor.


AmiGO

AmiGO is a GO browser developed by the Gene Ontology consortium and hosted on their website.

AmiGO - Gene products

Task:

  1. Navigate to the GO homepage.
  2. Enter SOX2 into the search box to initiate a search for the human SOX2 transcription factor (WP, HUGO) (as gene or protein name).
  3. There are a number of hits in various organisms: sulfhydryl oxidases and (sex determining region Y)-box genes. Check to see the various ways by which you could filter and restrict the results.
  4. Select Homo sapiens as the species filter and set the filter. Note that this still does not give you a unique hit, but ...
  5. ... you can identify the Transcription factor SOX-2 and follow its gene product information link. Study the information on that page.
  6. Later, we will need Entrez Gene IDs. The GOA information page provides these as GeneID in the External references section. Note it down. With the same approach, find and record the Gene IDs of (a) the functionally related Oct4 (POU5F1) protein, (b) the human cell-cycle transcription factor E2F1, (c) the human bone morphogenetic protein-4 transforming growth factor BMP4, (d) the human UDP glucuronosyltransferase 1 family protein 1 (UGT1A1), an enzyme that is differentially expressed in some cancers, and (e) as a positive control, SOX2's interaction partner NANOG, which we would expect to be annotated as functionally similar to both Oct4 and SOX2.

AmiGO - Associations

GO annotations for a protein are called associations.

Task:

  1. Open the associations information page for the human SOX2 protein via the link in the right column in a separate tab. Study the information on that page.
  2. Note that you can filter the associations by ontology and evidence code. You have read about the three GO ontologies in your previous assignment, but you should also be familiar with the evidence codes. Click on any of the evidence links to access the Evidence code definition page and study the definitions of the codes. Make sure you understand which codes point to experimental observation, and which codes denote computational inference or say that the evidence is someone's opinion (TAS, IC etc.). Note: it is good practice (but regrettably not a universally implemented standard) to clearly document database semantics and keep definitions associated with database entries easily accessible, as GO is doing here. You won't find this everywhere, but as a user please feel encouraged to complain to the database providers if you come across a database whose semantics are not clear. Seriously: opaque semantics make database annotations useless.
  3. There are many associations (around 60), and a good way to select which ones to pursue is to follow the most specific ones. Set IDA as a filter and among the returned terms select GO:0035019 (somatic stem cell maintenance) in the Biological Process ontology. Follow that link.
  4. Study the information available on that page and through the tabs on the page, especially the graph view.
  5. In the Inferred Tree View tab, find the genes annotated to this GO term for Homo sapiens. There should be about 55. Click on the number behind the term. The resulting page will give you all human proteins that have been annotated with this particular term. Note that the great majority of these annotations are via the IEA evidence code.


Semantic similarity

A good, recent overview of ontology-based functional annotation is found in the following article. This is not a formal reading assignment, but do familiarize yourself with section 3, Derivation of Semantic Similarity between Terms in an Ontology, as an introduction to the code below.

Gan et al. (2013) From ontology to semantic similarity: calculation of ontology-based semantic similarity. ScientificWorldJournal 2013:793091. (pmid: 23533360)
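The exercise code on this page uses measure = "Wang" together with combine = "BMA". For orientation, here are those two definitions as they are usually stated (the term measure is due to Wang et al., 2007). T_A is the set containing term A and all its ancestors, children(t) are the direct children of t, and w_e is the weight of the edge between t and t'; Wang et al. proposed 0.8 for is a and 0.6 for part of edges. A gene G_1 is annotated with terms g_1 to g_m, and G_2 with terms h_1 to h_n. Treat this as a summary; the GOSemSim documentation is the authority on what the package actually computes.

```latex
\begin{aligned}
S_A(A) &= 1, \qquad
S_A(t) = \max_{t' \in \mathrm{children}(t) \cap T_A} w_{e(t,t')}\, S_A(t')
\quad (t \in T_A,\ t \neq A),\\
SV(A) &= \sum_{t \in T_A} S_A(t),\\
\mathrm{sim}_{\mathrm{Wang}}(A,B) &=
\frac{\sum_{t \in T_A \cap T_B} \bigl(S_A(t) + S_B(t)\bigr)}{SV(A) + SV(B)},\\
\mathrm{sim}_{\mathrm{BMA}}(G_1,G_2) &=
\frac{\sum_{i=1}^{m} \max_{j} s_{ij} + \sum_{j=1}^{n} \max_{i} s_{ij}}{m + n},
\qquad s_{ij} = \mathrm{sim}_{\mathrm{Wang}}(g_i, h_j).
\end{aligned}
```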



The Bioconductor project hosts the GOSemSim package for semantic similarity.

Task:

  1. Work through the following R-code. If you have problems, discuss them on the mailing list. Don't go through the code mechanically but make sure you are clear about what it does.
# GOsemanticSimilarity.R
# GO semantic similarity example
# B. Steipe for BCB420, January 2014

setwd("~/your-R-project-directory")

# GOSemSim is an R package in the Bioconductor project. It is not installed via
# the usual install.packages() command (from CRAN) but via Bioconductor's own
# installer. The biocLite.R script that was used when this page was written has
# since been retired; current Bioconductor releases use the BiocManager package.
# org.Hs.eg.db is the human gene annotation package from which GOSemSim takes
# its GO annotations. It is a large download; you might not want to do this on
# a low-bandwidth connection.

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install(c("GOSemSim", "org.Hs.eg.db"))

library(GOSemSim)

# This loads the package. You can get an overview of its documentation
# by executing ...
browseVignettes("GOSemSim")
# ... which will open a listing in your Web browser. Open the GOSemSim
# vignette, then browse the package's help index with ...
help(package = "GOSemSim")

# Current versions of GOSemSim expect the GO annotation data to be prepared
# once with godata() and passed to each similarity function as semData.
# This builds the human data for the biological process (BP) ontology. The
# Wang measure uses only the graph structure, so information content does
# not need to be computed.
hsGO <- godata("org.Hs.eg.db", ont = "BP", computeIC = FALSE)

# The simplest use is to measure the semantic similarity of two GO
# terms. For example, SOX2 was annotated with GO:0035019 (somatic stem cell
# maintenance), QSOX2 was annotated with GO:0045454 (cell redox homeostasis),
# and Oct4 (POU5F1) with GO:0009786 (regulation of asymmetric cell division),
# among other associations. Let's calculate these similarities.
goSim("GO:0035019", "GO:0009786", semData = hsGO, measure = "Wang")
goSim("GO:0035019", "GO:0045454", semData = hsGO, measure = "Wang")

# Fair enough. Two numbers. Clearly we would appreciate an idea of the values
# that high similarity and low similarity can take. But in any case,
# we are really less interested in the similarity of GO terms; these
# are a function of how the ontology was constructed. We are more
# interested in the functional similarity of our genes, and these
# have a number of GO terms associated with them.

# GOSemSim provides the functions ...
?geneSim
?mgeneSim
# ... to compute these values. Refer to the vignette for details; in
# particular, consider how multiple GO terms are combined, and how to
# keep or drop evidence codes (geneSim() drops IEA annotations by default).
# Here is a pairwise similarity example: the gene IDs are the ones you
# have recorded previously.
geneSim("6657", "5460", semData = hsGO, measure = "Wang", combine = "BMA")
# Another number. And the list of GO terms that were considered.

# Your task: use the mgeneSim() function (with semData = hsGO) to calculate
# the similarities between all six proteins for which you have recorded the
# GeneIDs previously (SOX2, POU5F1, E2F1, BMP4, UGT1A1 and NANOG) in the
# biological process ontology.

# This will run for some time. On my machine, half an hour or so. 

# Do the results correspond to your expectations?
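The numbers returned by goSim() and geneSim() are easier to interpret once you have computed one by hand. This minimal base-R sketch re-implements the Wang term measure and the BMA (best-match average) combination on a tiny hypothetical DAG (terms A to E are invented), assuming the 0.8 and 0.6 edge weights proposed by Wang et al. It illustrates the idea only and is not meant to reproduce GOSemSim's internals.

```r
# Hypothetical DAG (terms A to E are invented): one row per child -> parent edge.
edges <- data.frame(
  child  = c("B",    "C",       "D",    "E",    "E"),
  parent = c("A",    "A",       "B",    "B",    "C"),
  type   = c("is_a", "part_of", "is_a", "is_a", "part_of"),
  stringsAsFactors = FALSE
)
weights <- c(is_a = 0.8, part_of = 0.6)  # assumed weights (Wang et al.)

# S-values of `term` and all its ancestors: S(term) = 1, and each ancestor gets
# the maximum over its children in the term's DAG of (edge weight * child's S).
s_values <- function(term, edges, weights) {
  s <- c(1)
  names(s) <- term
  frontier <- term
  while (length(frontier) > 0) {
    up <- edges[edges$child %in% frontier, ]
    updated <- character(0)
    for (k in seq_len(nrow(up))) {
      candidate <- weights[[up$type[k]]] * s[[up$child[k]]]
      p <- up$parent[k]
      if (is.na(s[p]) || candidate > s[[p]]) {  # keep the best path only
        s[p] <- candidate
        updated <- c(updated, p)
      }
    }
    frontier <- unique(updated)  # re-expand only ancestors whose value changed
  }
  s
}

# Wang similarity: shared ancestors' S-values over the two semantic values.
wang_sim <- function(a, b, edges, weights) {
  sa <- s_values(a, edges, weights)
  sb <- s_values(b, edges, weights)
  common <- intersect(names(sa), names(sb))
  sum(sa[common] + sb[common]) / (sum(sa) + sum(sb))
}

# Best-match average over all term pairs of two genes' annotation sets.
bma <- function(terms1, terms2, edges, weights) {
  s <- matrix(0, nrow = length(terms1), ncol = length(terms2))
  for (i in seq_along(terms1)) {
    for (j in seq_along(terms2)) {
      s[i, j] <- wang_sim(terms1[i], terms2[j], edges, weights)
    }
  }
  best <- c(apply(s, 1, max), apply(s, 2, max))  # best match for every term
  sum(best) / (length(terms1) + length(terms2))
}

wang_sim("D", "E", edges, weights)     # 72/137, about 0.526
wang_sim("B", "C", edges, weights)     # 7/17, about 0.412
bma("D", c("E", "C"), edges, weights)  # about 0.453
```

All of the similarity between D and E comes from their shared ancestors B and A; BMA then averages the best match of every term on either side, so a gene's single best-matching term can weigh heavily in the gene-level score.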


References


Further reading and resources

General
Sauro & Bergmann (2008) Standards and ontologies in computational systems biology. Essays Biochem 45:211-22. (pmid: 18793134)



Phenotype etc. Ontologies
Human Phenotype Ontology
See also:
Köhler et al. (2014) The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res 42:D966-74. (pmid: 24217912)


Schriml et al. (2012) Disease Ontology: a backbone for disease semantic integration. Nucleic Acids Res 40:D940-6. (pmid: 22080554)


Evelo et al. (2011) Answering biological questions: querying a systems biology database for nutrigenomics. Genes Nutr 6:81-7. (pmid: 21437033)


Oti et al. (2009) The biological coherence of human phenome databases. Am J Hum Genet 85:801-8. (pmid: 20004759)


Groth et al. (2007) PhenomicDB: a new cross-species genotype/phenotype resource. Nucleic Acids Res 35:D696-9. (pmid: 16982638)



Semantic similarity
Wu et al. (2013) Improving the measurement of semantic similarity between gene ontology terms and gene products: insights from an edge- and IC-based hybrid method. PLoS ONE 8:e66745. (pmid: 23741529)


Gan et al. (2013) From ontology to semantic similarity: calculation of ontology-based semantic similarity. ScientificWorldJournal 2013:793091. (pmid: 23533360)


Alvarez & Yan (2011) A graph-based semantic similarity measure for the gene ontology. J Bioinform Comput Biol 9:681-95. (pmid: 22084008)


Jain & Bader (2010) An improved method for scoring protein-protein interactions using semantic similarity within the gene ontology. BMC Bioinformatics 11:562. (pmid: 21078182)


Yu et al. (2010) GOSemSim: an R package for measuring semantic similarity among GO terms and gene products. Bioinformatics 26:976-8. (pmid: 20179076)


GO
Gene Ontology Consortium (2012) The Gene Ontology: enhancements for 2011. Nucleic Acids Res 40:D559-64. (pmid: 22102568)


Bastos et al. (2011) Application of gene ontology to gene identification. Methods Mol Biol 760:141-57. (pmid: 21779995)


Gene Ontology Consortium (2010) The Gene Ontology in 2010: extensions and refinements. Nucleic Acids Res 38:D331-5. (pmid: 19920128)


Carol Goble on the tension between purists and pragmatists in life-science ontology construction. Plenary talk at SOFG2...

Goble & Wroe (2004) The Montagues and the Capulets. Comp Funct Genomics 5:623-32. (pmid: 18629186)
