Difference between revisions of "BIN-FUNC-Semantic similarity"

From "A B C"
 
(7 intermediate revisions by the same user not shown)
Line 1: Line 1:
<div id="BIO">
+
<div id="ABC">
  <div class="b1">
+
<div style="padding:5px; border:1px solid #000000; background-color:#b3dbce; font-size:300%; font-weight:400; color: #000000; width:100%;">
 
Measuring "Semantic Similarity" in Ontologies
 
  </div>
+
<div style="padding:5px; margin-top:20px; margin-bottom:10px; background-color:#b3dbce; font-size:30%; font-weight:200; color: #000000; ">
 
+
(Semantic similarity of terms in ontologies, using GO and GOA with R)
  {{Vspace}}
+
</div>
 
 
<div class="keywords">
 
<b>Keywords:</b>&nbsp;
 
Semantic similarity of terms in ontologies, using GO and GOA with R
 
 
</div>
 
</div>
  
{{Vspace}}
+
{{Smallvspace}}
 
 
 
 
__TOC__
 
 
 
{{Vspace}}
 
 
 
 
 
{{DEV}}
 
 
 
{{Vspace}}
 
  
  
 +
<div style="padding:5px; border:1px solid #000000; background-color:#b3dbce33; font-size:85%;">
 +
<div style="font-size:118%;">
 +
<b>Abstract:</b><br />
 +
<section begin=abstract />
 +
This unit introduces the concept of "semantic similarity" between GO terms, a fundamental measure that allows genes to be compared and categorized by their function. It also introduces Bioconductor functions to put this into practice.
 +
<section end=abstract />
 +
</div>
 +
<!-- ============================  -->
 +
<hr>
 +
<table>
 +
<tr>
 +
<td style="padding:10px;">
 +
<b>Objectives:</b><br />
 +
This unit will ...
 +
* ... introduce the concept of semantic similarity;
 +
* ... demonstrate how to compute semantic similarity and GO term enrichment in R.
 +
</td>
 +
<td style="padding:10px;">
 +
<b>Outcomes:</b><br />
 +
After working through this unit you ...
 +
* ... are familiar with the idea of "semantic similarity";
 +
* ... can load a Bioconductor model-organism annotation database, calculate GO term semantic similarities between genes, and discover potentially collaborating genes from significantly enriched GO terms in a gene set.
 +
</td>
 +
</tr>
 +
</table>
 +
<!-- ============================  -->
 +
<hr>
 +
<b>Deliverables:</b><br />
 +
<section begin=deliverables />
 +
<li><b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.</li>
 +
<li><b>Journal</b>: Document your progress in your [[FND-Journal|Course Journal]]. Some tasks may ask you to include specific items in your journal. Don't overlook these.</li>
 +
<li><b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|'''insights!''' page]].</li>
 +
<section end=deliverables />
 +
<!-- ============================  -->
 +
<hr>
 +
<section begin=prerequisites />
 +
<b>Prerequisites:</b><br />
 +
This unit builds on material covered in the following prerequisite units:<br />
 +
*[[BIN-FUNC-GO|BIN-FUNC-GO (Gene Ontology)]]
 +
*[[FND-STA-Information_theory|FND-STA-Information_theory (Concepts of Information Theory)]]
 +
<section end=prerequisites />
 +
<!-- ============================  -->
 
</div>
 
</div>
<div id="ABC-unit-framework">
 
== Abstract ==
 
<!-- included from "../components/BIN-FUNC-Semantic_similarity.components.wtxt", section: "abstract" -->
 
...
 
  
{{Vspace}}
+
{{Smallvspace}}
  
  
== This unit ... ==
 
=== Prerequisites ===
 
<!-- included from "../components/BIN-FUNC-Semantic_similarity.components.wtxt", section: "prerequisites" -->
 
<!-- included from "ABC-unit_components.wtxt", section: "notes-prerequisites" -->
 
You need to complete the following units before beginning this one:
 
*[[BIN-FUNC-GO]]
 
*[[FND-STA-Information_theory]]
 
  
{{Vspace}}
+
{{Smallvspace}}
  
  
=== Objectives ===
+
__TOC__
<!-- included from "../components/BIN-FUNC-Semantic_similarity.components.wtxt", section: "objectives" -->
 
...
 
 
 
{{Vspace}}
 
 
 
 
 
=== Outcomes ===
 
<!-- included from "../components/BIN-FUNC-Semantic_similarity.components.wtxt", section: "outcomes" -->
 
...
 
 
 
{{Vspace}}
 
 
 
 
 
=== Deliverables ===
 
<!-- included from "../components/BIN-FUNC-Semantic_similarity.components.wtxt", section: "deliverables" -->
 
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-time_management" -->
 
*<b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
 
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-journal" -->
 
*<b>Journal</b>: Document your progress in your [[FND-Journal|course journal]].
 
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-insights" -->
 
*<b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|insights! page]].
 
  
 
{{Vspace}}
 
{{Vspace}}
Line 71: Line 67:
  
 
=== Evaluation ===
 
<!-- included from "../components/BIN-FUNC-Semantic_similarity.components.wtxt", section: "evaluation" -->
 
<!-- included from "ABC-unit_components.wtxt", section: "eval-none" -->
 
 
<b>Evaluation: NA</b><br />
 
<b>Evaluation: NA</b><br />
:This unit is not evaluated for course marks.
+
<div style="margin-left: 2rem;">This unit is not evaluated for course marks.</div>
 
 
{{Vspace}}
 
 
 
 
 
</div>
 
<div id="BIO">
 
 
== Contents ==
 
<!-- included from "../components/BIN-FUNC-Semantic_similarity.components.wtxt", section: "contents" -->
 
  
  
Line 89: Line 76:
 
}}
 
}}
  
 
+
{{Vspace}}
 
 
====Semantic similarity====
 
  
 
A good, recent overview of ontology-based functional annotation is found in the following article. This is not a formal reading assignment, but do familiarize yourself with section 3: ''Derivation of Semantic Similarity between Terms in an Ontology'' as an introduction to the code-based annotations below.
 
Line 97: Line 82:
 
{{#pmid: 23533360}}
 
  
 +
{{Vspace}}
  
Practical work with GO: Bioconductor.
+
{{ABC-unit|BIN-FUNC-Semantic_similarity.R}}
 
 
The Bioconductor project hosts the GOSemSim package for semantic similarity.
 
 
 
{{task|1=
 
# Work through the following R-code. If you have problems, discuss them on the mailing list. Don't go through the code mechanically but make sure you are clear about what it does.
 
<source lang="R">
 
# GOsemanticSimilarity.R
 
# GO semantic similarity example
 
# B. Steipe for BCB420, January 2014
 
 
 
setwd("~/your-R-project-directory")
 
 
 
# GOSemSim is an R package in the Bioconductor project. It is not installed

# directly with the usual install.packages() command (via CRAN) but with the

# BiocManager package; the biocLite.R script used previously has been retired.



if (! requireNamespace("BiocManager", quietly = TRUE)) {

  install.packages("BiocManager")

}

BiocManager::install("GOSemSim")
 
 
 
library(GOSemSim)
 
 
 
# This loads the library and starts the Bioconductor environment.
 
# You can get an overview of functions by executing ...
 
browseVignettes()
 
# ... which will open a listing in your Web browser. Open the
 
# introduction to GOSemSim PDF. As the introduction suggests,
 
# now is a good time to execute ...
 
help(GOSemSim)
 
 
 
# The simplest function is to measure the semantic similarity of two GO
 
# terms. For example, SOX2 was annotated with GO:0035019 (somatic stem cell
 
# maintenance), QSOX2 was annotated with GO:0045454 (cell redox homeostasis),
 
# and Oct4 (POU5F1) with GO:0009786 (regulation of asymmetric cell division),
 
# among other associations. Let's calculate these similarities.
 
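# Current GOSemSim versions take precomputed annotation data (semData) rather

# than an ont= argument. This assumes that org.Hs.eg.db has been installed,

# e.g. with BiocManager::install("org.Hs.eg.db").

hsGO <- godata("org.Hs.eg.db", ont = "BP", computeIC = FALSE)



goSim("GO:0035019", "GO:0009786", semData = hsGO, measure = "Wang")

goSim("GO:0035019", "GO:0045454", semData = hsGO, measure = "Wang")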
 
 
 
# Fair enough. Two numbers. Clearly we would appreciate an idea of the values
 
# that high similarity and low similarity can take. But in any case -
 
# we are really less interested in the similarity of GO terms - these
 
# are a function of how the Ontology was constructed. We are more
 
# interested in the functional similarity of our genes, and these
 
# have a number of GO terms associated with them.
 
 
 
# GOSemSim provides the functions ...
 
?geneSim()
 
?mgeneSim()
 
# ... to compute these values. Refer to the vignette for details, in
 
# particular, consider how multiple GO terms are combined, and how to
 
# keep/drop evidence codes.
 
# Here is a pairwise similarity example: the gene IDs are the ones you
 
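# have recorded previously. Note that the human annotation package

# org.Hs.eg.db is a large download - you might not want to install it on a

# low-bandwidth connection.

if (! exists("hsGO")) { hsGO <- godata("org.Hs.eg.db", ont = "BP", computeIC = FALSE) }

geneSim("6657", "5460", semData = hsGO, measure = "Wang", combine = "BMA")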
 
# Another number. And the list of GO terms that were considered.
 
 
 
# Your task: use the mgeneSim() function to calculate the similarities
 
# between all six proteins for which you have recorded the GeneIDs
 
# previously (SOX2, POU5F1, E2F1, BMP4, UGT1A1 and NANOG) in the
 
# biological process ontology.
 
 
 
# This will run for some time. On my machine, half an hour or so.
 
 
 
# Do the results correspond to your expectations?
 
 
 
</source>
 
 
 
}}
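
The script comments above ask how the similarities of multiple GO terms are combined into a single gene-level score. The following is a minimal sketch in base R, assuming a "best-match average" (BMA) defined as the mean of the best matches in both directions; the GOSemSim vignette remains the authoritative description of what the package computes. The function name <code>combineTermSims()</code>, the term labels and all similarity values are hypothetical and chosen only for illustration.

```r
# Hypothetical pairwise term similarities: rows are the GO terms annotated to
# gene 1, columns are those annotated to gene 2. All values are invented.
termSim <- matrix(c(0.9, 0.2, 0.1,
                    0.3, 0.8, 0.4),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("g1.t1", "g1.t2"),
                                  c("g2.t1", "g2.t2", "g2.t3")))

# Combine a term-by-term similarity matrix into one gene-level score.
# (Illustrative helper, not part of any package.)
combineTermSims <- function(sim, method = c("BMA", "max", "avg")) {
  method <- match.arg(method)
  rowBest <- apply(sim, 1, max)  # best partner for each term of gene 1
  colBest <- apply(sim, 2, max)  # best partner for each term of gene 2
  switch(method,
         BMA = (sum(rowBest) + sum(colBest)) / (nrow(sim) + ncol(sim)),
         max = max(sim),
         avg = mean(sim))
}

combineTermSims(termSim, "BMA")
combineTermSims(termSim, "max")
combineTermSims(termSim, "avg")
```

In this toy matrix the plain average is pulled down by the many poorly matching term pairs, while the maximum depends on a single pair. The best-match average lies between the two, which is one reason to prefer it for genes that carry many annotations.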
 
 
 
 
 
 
 
{{Vspace}}
 
  
  
Line 180: Line 93:
 
{{#pmid: 22084008}}
 
 
{{#pmid: 21078182}}
 
{{#pmid: 20179076}}
 
 
 
{{Vspace}}
 
 
  
 
== Notes ==
 
<!-- included from "../components/BIN-FUNC-Semantic_similarity.components.wtxt", section: "notes" -->
 
<!-- included from "ABC-unit_components.wtxt", section: "notes" -->
 
 
<references />
 
<references />
  
 
{{Vspace}}
 
{{Vspace}}
  
 
</div>
 
<div id="ABC-unit-framework">
 
== Self-evaluation ==
 
<!-- included from "../components/BIN-FUNC-Semantic_similarity.components.wtxt", section: "self-evaluation" -->
 
<!--
 
=== Question 1===
 
 
Question ...
 
 
<div class="toccolours mw-collapsible mw-collapsed" style="width:800px">
 
Answer ...
 
<div class="mw-collapsible-content">
 
Answer ...
 
 
</div>
 
  </div>
 
 
  {{Vspace}}
 
 
-->
 
 
{{Vspace}}
 
 
 
 
{{Vspace}}
 
 
 
<!-- included from "ABC-unit_components.wtxt", section: "ABC-unit_ask" -->
 
 
----
 
 
{{Vspace}}
 
 
<b>If in doubt, ask!</b> If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.
 
 
----
 
 
{{Vspace}}
 
  
 
<div class="about">
 
<div class="about">
Line 242: Line 108:
 
:2017-08-05
 
:2017-08-05
 
<b>Modified:</b><br />
 
<b>Modified:</b><br />
:2017-08-05
+
:2020-09-24
 
<b>Version:</b><br />
 
<b>Version:</b><br />
:0.1
+
:1.1
 
<b>Version history:</b><br />
 
<b>Version history:</b><br />
 +
*1.1 2020 Maintenance
 +
*1.0 First live version
 
*0.1 First stub
 
 
</div>
 
</div>
[[Category:ABC-units]]
 
<!-- included from "ABC-unit_components.wtxt", section: "ABC-unit_footer" -->
 
  
 
{{CC-BY}}
 
  
 +
[[Category:ABC-units]]
 +
{{UNIT}}
 +
{{LIVE}}
 
</div>
 
</div>
 
<!-- [END] -->
 
<!-- [END] -->

Latest revision as of 03:20, 25 September 2020

Measuring "Semantic Similarity" in Ontologies

(Semantic similarity of terms in ontologies, using GO and GOA with R)


 


Abstract:

This unit introduces the concept of "semantic similarity" between GO terms, a fundamental measure that allows genes to be compared and categorized by their function. It also introduces Bioconductor functions to put this into practice.


Objectives:
This unit will ...

  • ... introduce the concept of semantic similarity;
  • ... demonstrate how to compute semantic similarity and GO term enrichment in R.

Outcomes:
After working through this unit you ...

  • ... are familiar with the idea of "semantic similarity";
  • ... can load a Bioconductor model-organism annotation database, calculate GO term semantic similarities between genes, and discover potentially collaborating genes from significantly enriched GO terms in a gene set.

Deliverables:

  • Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
  • Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
  • Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.

Prerequisites:
This unit builds on material covered in the following prerequisite units:

  • BIN-FUNC-GO (Gene Ontology)
  • FND-STA-Information_theory (Concepts of Information Theory)


     



     



     


    Evaluation

    Evaluation: NA

    This unit is not evaluated for course marks.

    Contents


     

    A good, recent overview of ontology-based functional annotation is found in the following article. This is not a formal reading assignment, but do familiarize yourself with section 3: Derivation of Semantic Similarity between Terms in an Ontology as an introduction to the code-based annotations below.

    Gan et al. (2013) From ontology to semantic similarity: calculation of ontology-based semantic similarity. ScientificWorldJournal 2013:793091. (pmid: 23533360)

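
As a concrete illustration of how a similarity between two terms can be derived from an ontology, here is a minimal base R sketch of two information-content (IC) based measures: Resnik (the IC of the most informative common ancestor) and Lin (that value normalized by the IC of both terms). It is not taken from the article or from the course script. The toy ontology, the term names and the annotation counts are all invented, and the helper names <code>ancestors()</code>, <code>simResnik()</code> and <code>simLin()</code> are hypothetical.

```r
# Toy ontology: each term lists its direct parents (a DAG; "root" has none).
# Term names and annotation counts are invented for illustration.
parents <- list(
  root = character(0),
  A    = "root",
  B    = "root",
  A1   = "A",
  A2   = "A",
  AB   = c("A", "B")
)

# Number of annotations made directly to each term (hypothetical).
direct <- c(root = 0, A = 2, B = 4, A1 = 6, A2 = 2, AB = 2)

# A term together with all of its ancestors.
ancestors <- function(term) {
  unique(c(term, unlist(lapply(parents[[term]], ancestors))))
}

# An annotation to a term implies annotation to all of its ancestors, so
# every direct count is also added to each ancestor's total.
total <- setNames(numeric(length(direct)), names(direct))
for (term in names(direct)) {
  anc <- ancestors(term)
  total[anc] <- total[anc] + direct[[term]]
}

# Information content: rare (specific) terms carry more information.
ic <- -log2(total / total[["root"]])

# Resnik: IC of the most informative common ancestor.
simResnik <- function(t1, t2) {
  common <- intersect(ancestors(t1), ancestors(t2))
  max(ic[common])
}

# Lin: Resnik similarity normalized by the IC of both terms.
# (Undefined, NaN, if both terms are the root, whose IC is 0.)
simLin <- function(t1, t2) {
  2 * simResnik(t1, t2) / (ic[[t1]] + ic[[t2]])
}

simResnik("A1", "A2")  # siblings: share the fairly general parent A
simResnik("A1", "B")   # only the root in common
simLin("AB", "B")
```

Terms that share only the root have a Resnik similarity of zero, because the root annotates everything and therefore carries no information. Values like these depend on the annotation counts as well as on the ontology structure, which is why the same pair of terms can score differently for different organisms.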


     

    Task:

     
    • Open RStudio and load the ABC-units R project. If you have loaded it before, choose File → Recent projects → ABC-Units. If you have not loaded it before, follow the instructions in the RPR-Introduction unit.
    • Choose Tools → Version Control → Pull Branches to fetch the most recent version of the project from its GitHub repository with all changes and bug fixes included.
    • Type init() if requested.
    • Open the file BIN-FUNC-Semantic_similarity.R and follow the instructions.


     

    Note: take care that you understand all of the code in the script. Evaluation in this course is cumulative and you may be asked to explain any part of code.


     


    Further reading, links and resources

    Wu et al. (2013) Improving the measurement of semantic similarity between gene ontology terms and gene products: insights from an edge- and IC-based hybrid method. PLoS ONE 8:e66745. (pmid: 23741529)


    Gan et al. (2013) From ontology to semantic similarity: calculation of ontology-based semantic similarity. ScientificWorldJournal 2013:793091. (pmid: 23533360)


    Alvarez & Yan (2011) A graph-based semantic similarity measure for the gene ontology. J Bioinform Comput Biol 9:681-95. (pmid: 22084008)


    Jain & Bader (2010) An improved method for scoring protein-protein interactions using semantic similarity within the gene ontology. BMC Bioinformatics 11:562. (pmid: 21078182)


    Notes


     


    About ...
     
    Author:

    Boris Steipe <boris.steipe@utoronto.ca>

    Created:

    2017-08-05

    Modified:

    2020-09-24

    Version:

    1.1

    Version history:

    • 1.1 2020 Maintenance
    • 1.0 First live version
    • 0.1 First stub

    This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License.