Difference between revisions of "CSB Assignment Week 2"

From "A B C"
m (Created page with "<div id="CSB"> <div class="b1"> Assignments for Week 2 </div> Exercises for this week relate to this week's lecture.<br /> Pre-reading for this week will prepare next week's lec...")
 
m
Line 5: Line 5:
  
 
Exercises for this week relate to this week's lecture.<br />
 
Exercises for this week relate to this week's lecture.<br />
Pre-reading for this week will prepare next week's lecture.<br />
+
Pre-reading for this week will prepare next week's lectures.<br />
 
Exercises and pre-reading will be topics on next week's quiz.  
 
Exercises and pre-reading will be topics on next week's quiz.  
  
Line 14: Line 14:
 
==Exercises==
 
==Exercises==
  
 +
# Navigate to the [http://www.yeastgenome.org/ ''Saccharomyces'' Genome Database] and search for the gene name '''mbp1''' using the search box. Review the information available on the result page. Find and note down the UniProt ID.
 +
# For comparison, review the gene information of the functionally related human [http://www.ncbi.nlm.nih.gov/gene/1869 E2F1 transcription factor] at the NCBI. Here too, find and note down the UniProt ID.
 +
<!--
 +
UniProt IDs:
 +
  Mbp1: P39678
 +
  E2F1: Q01094
 +
  TFDP1: Q14186
 +
  MBP: P02686
 +
  P39678, Q01094, Q14186, P02686
 +
-->
 +
 +
Next, we compute the semantic similarity of these two genes. The GO database lists a number of tools for this task (http://www.geneontology.org/GO.tools_by_type.semantic_similarity.shtml).
 +
 +
# For comparison, find the UniProt IDs of one protein of related function and one protein of unrelated function.
 +
## Find the UniProt ID of E2F1's human interaction partner TFDP1, which we would expect to be annotated as functionally similar to both E2F1 and MBP1;
 +
## also find the UniProt ID  of human MBP (myelin basic protein), which is functionally unrelated.
 +
# Navigate to the [http://xldb.di.fc.ul.pt/tools/proteinon/ ProteInOn] site at Lisbon University in Portugal, continue to the "Gene Analysis" page and further to "Functional Similarity of Two Genes" - an online tool to compute GO-based semantic similarity. Select "compute protein semantic similarity", use "Measure: simGIC" and "GO type: Biological process". Enter your four UniProt IDs in the correct format and '''run''' the computation.
 +
# Interpret the similarity score table. Does it correspond to your expectations?
  
TBD
 
  
  
 
==Pre-reading==
 
==Pre-reading==
 +
Next week we will discuss various aspects of working with genome-scale data sets. One of the topics is ''enrichment analysis''. For many experimental approaches, the ultimate outcome is a list of genes. Enrichment analysis addresses the question: do genes in a set have a remarkable property in common? The methodologies discussed here have applications in many fields of computational biology.
  
 
+
{{#pmid:19597782}}
TBD
 
  
  

Revision as of 21:28, 20 January 2012

Assignments for Week 2

Exercises for this week relate to this week's lecture.
Pre-reading for this week will prepare next week's lectures.
Exercises and pre-reading will be topics on next week's quiz.



Exercises

  1. Navigate to the Saccharomyces Genome Database and search for the gene name mbp1 using the search box. Review the information available on the result page. Find and note down the UniProt ID.
  2. For comparison, review the gene information of the functionally related human E2F1 transcription factor at the NCBI. Here too, find and note down the UniProt ID.

Next, we compute the semantic similarity of these two genes. The GO database lists a number of tools for this task (http://www.geneontology.org/GO.tools_by_type.semantic_similarity.shtml).

  1. For comparison, find the UniProt IDs of one protein of related function and one protein of unrelated function.
    1. Find the UniProt ID of E2F1's human interaction partner TFDP1, which we would expect to be annotated as functionally similar to both E2F1 and MBP1;
    2. also find the UniProt ID of human MBP (myelin basic protein), which is functionally unrelated.
  2. Navigate to the ProteInOn site at Lisbon University in Portugal, continue to the "Gene Analysis" page and further to "Functional Similarity of Two Genes" - an online tool to compute GO-based semantic similarity. Select "compute protein semantic similarity", use "Measure: simGIC" and "GO type: Biological process". Enter your four UniProt IDs in the correct format and run the computation.
  3. Interpret the similarity score table. Does it correspond to your expectations?
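
To help interpret the scores, it may be useful to see what a simGIC-style measure computes. The following is a minimal sketch, not the ProteInOn implementation: it treats simGIC as a weighted Jaccard index, the summed information content (IC) of the GO terms two proteins share divided by the summed IC of all terms annotated to either. The term labels (`T1` to `T4`), the IC values, and the function name `sim_gic` are hypothetical, and each term set is assumed to already include all ancestor terms.

```python
def sim_gic(terms_a, terms_b, ic):
    """Weighted Jaccard similarity of two annotation sets.

    terms_a, terms_b: sets of term labels (assumed to include ancestors).
    ic: dict mapping each term label to its information content.
    """
    shared = terms_a & terms_b
    union = terms_a | terms_b
    denominator = sum(ic[t] for t in union)
    if denominator == 0:
        # No informative annotation on either protein.
        return 0.0
    return sum(ic[t] for t in shared) / denominator


# Hypothetical IC values: rarer (more specific) terms carry more weight.
ic = {"T1": 1.0, "T2": 2.0, "T3": 3.0, "T4": 4.0}

protein_x = {"T1", "T2", "T3"}
protein_y = {"T2", "T3", "T4"}

print(sim_gic(protein_x, protein_y, ic))
```

Under this definition, identical annotation sets score 1 and sets without shared terms score 0, so a functionally unrelated pair is expected to score near the low end of the range.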


Pre-reading

Next week we will discuss various aspects of working with genome-scale data sets. One of the topics is enrichment analysis. For many experimental approaches, the ultimate outcome is a list of genes. Enrichment analysis addresses the question: do genes in a set have a remarkable property in common? The methodologies discussed here have applications in many fields of computational biology.
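
As a concrete illustration of the question enrichment analysis asks, here is a minimal sketch of one common approach, a one-sided hypergeometric test. It is offered as an assumption about what such a test can look like, not as the method used in the lecture or the reading. The function name `enrichment_p_value` and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """Probability of seeing at least k annotated genes in a list of n genes,
    drawn without replacement from N genes of which K carry the annotation.
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 6000 genes in the reference set, 120 of them annotated
# with some category; our experiment returned 50 genes, 8 of them annotated.
p = enrichment_p_value(k=8, n=50, K=120, N=6000)
print(p)
```

A small p-value suggests that the category is over-represented in the gene list relative to the reference set. The choice of reference set (here `N` and `K`) strongly affects the result.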

Tilford & Siemers (2009) Gene set enrichment analysis. Methods Mol Biol 563:99-121. (pmid: 19597782)

Set enrichment analytical methods have become commonplace tools applied to the analysis and interpretation of biological data. The statistical techniques are used to identify categorical biases within lists of genes, proteins, or metabolites. The goal is to discover the shared functions or properties of the biological items represented within the lists. Application of these methods can provide great biological insight, including the discovery of participation in the same biological activity or pathway, shared interacting genes or regulators, common cellular compartmentalization, or association with disease. The methods require ordered or unordered lists of biological items as input, understanding of the reference set from which the lists were selected, categorical classifiers describing the items, and a statistical algorithm to assess bias of each classifier. Due to the complexity of most algorithms and the number of calculations performed, computer software is almost always used for execution of the algorithm, as well as for presentation of the results. This chapter will provide an overview of the statistical methods used to perform an enrichment analysis. Guidelines for assembly of the requisite information will be presented, with a focus on careful definition of the sets used by the statistical algorithms. The need for multiple test correction when working with large libraries of classifiers is emphasized, and we outline several options for performing the corrections. Finally, interpreting the results of such analysis will be discussed along with examples of recent research utilizing the techniques.
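
The abstract stresses multiple test correction but does not name specific procedures. As a hedged illustration, the sketch below shows two widely used options, Bonferroni and Benjamini-Hochberg; whether the chapter covers these particular methods is an assumption. The function names and the example p-values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (controls false discovery rate)."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, one per tested category.
raw = [0.01, 0.04, 0.03, 0.5]
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```

Bonferroni is the more conservative of the two. With large libraries of classifiers it can discard many true signals, which is one reason false-discovery-rate methods are often preferred.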