CSB Assignment Week 2

Revision as of 22:05, 20 January 2012 by Boris (talk | contribs) (→‎Exercises)

Assignments for Week 2

Exercises for this week relate to this week's lecture.
Pre-reading for this week will prepare you for next week's lectures.
Exercises and pre-reading will be topics on next week's quiz.



Exercises

  1. Navigate to the Saccharomyces Genome Database and search for the gene name mbp1 using the search box. Review the information available on the result page. Find and note down the UniProt ID.
  2. For comparison, review the gene information of the functionally related human E2F1 transcription factor at the NCBI. Here too, find and note down the UniProt ID.


Next, we compute the semantic similarity of these two genes. The GO database lists a number of tools for this task (http://www.geneontology.org/GO.tools_by_type.semantic_similarity.shtml).

  1. For comparison, find the UniProt IDs of one protein with a related function and one protein with an unrelated function:
    1. Find the UniProt ID of E2F1's human interaction partner TFDP1, which we would expect to be annotated as functionally similar to both E2F1 and MBP1;
    2. also find the UniProt ID of human MBP (myelin basic protein), which is functionally unrelated.
  2. Navigate to the ProteInOn site at Lisbon University in Portugal, the online tool for computing GO-based semantic similarity that was discussed in last week's reading assignment. Select "compute protein semantic similarity", use "Measure: simGIC" and "GO type: Biological process". Enter your four UniProt IDs in the correct format and run the computation.
  3. Interpret the similarity score table. Does it correspond to your expectations?
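
To help interpret the scores, here is a minimal sketch of how a simGIC-style score can be computed: the summed information content (IC) of the GO terms shared by two proteins, divided by the summed IC of all terms annotated to either protein. The term names and annotation frequencies below are hypothetical, chosen only for illustration. They are not taken from GO, and this is not ProteInOn's actual implementation.

```python
import math

# Hypothetical annotation frequencies: the fraction of proteins annotated to
# each term or one of its descendants. Illustrative values only, not from GO.
TERM_FREQ = {
    "root": 1.0,
    "cell_cycle": 0.10,
    "g1_s_transition": 0.01,
    "transcription": 0.20,
    "axon_ensheathment": 0.005,
}


def information_content(term):
    """IC of a term: rare (specific) terms carry more information."""
    return -math.log(TERM_FREQ[term])


def sim_gic(terms_a, terms_b):
    """IC-weighted Jaccard index of two term sets (each including ancestors)."""
    shared = terms_a & terms_b
    combined = terms_a | terms_b
    total = sum(information_content(t) for t in combined)
    if total == 0:
        return 0.0
    return sum(information_content(t) for t in shared) / total


# Hypothetical annotation sets for three made-up proteins.
protein_a = {"root", "cell_cycle", "g1_s_transition", "transcription"}
protein_b = {"root", "cell_cycle", "g1_s_transition"}
protein_c = {"root", "axon_ensheathment"}

print(sim_gic(protein_a, protein_b))  # high: the specific terms are shared
print(sim_gic(protein_a, protein_c))  # zero: only the uninformative root is shared
```

Because the root term has an IC of zero, two proteins that share only very general terms score close to zero, while sharing a rare, specific term raises the score substantially.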


Finally, we'll use the GO's AmiGO browser to compare the genes graphically.

  1. Navigate to the AmiGO search interface, select "genes or proteins" and enter MBP1. Filter the results by the correct species and restrict the results to the biological process ontology.
  2. This should return the GO annotation page for the yeast Mbp1 protein. Follow the "5 term associations" in the header bar.
  3. Click on "view in tree" for the GO term GO:0000083.
  4. This shows you the ontology of the term in text form, including the number of genes annotated to each term. In the right-hand box you should find a link that you can follow for a graphical view.
  5. In a separate window, repeat the process for human E2F1 (choose the most specific term, i.e. the one that refers to the gene's role in the G1/S transition - GO:0000082).
  6. Roughly compare the two ontologies.
  7. Contrast this with the ontology for human MBP, specifically the axon ensheathment process.

Pre-reading

Next week we will discuss various aspects of working with genome-scale data sets. One of the topics is enrichment analysis. For many experimental approaches, the ultimate outcome is a list of genes. Enrichment analysis addresses the question: do genes in a set have a remarkable property in common? The methodologies discussed here have applications in many fields of computational biology.

Tilford & Siemers (2009) Gene set enrichment analysis. Methods Mol Biol 563:99-121. (pmid: 19597782)

Set enrichment analytical methods have become commonplace tools applied to the analysis and interpretation of biological data. The statistical techniques are used to identify categorical biases within lists of genes, proteins, or metabolites. The goal is to discover the shared functions or properties of the biological items represented within the lists. Application of these methods can provide great biological insight, including the discovery of participation in the same biological activity or pathway, shared interacting genes or regulators, common cellular compartmentalization, or association with disease. The methods require ordered or unordered lists of biological items as input, understanding of the reference set from which the lists were selected, categorical classifiers describing the items, and a statistical algorithm to assess bias of each classifier. Due to the complexity of most algorithms and the number of calculations performed, computer software is almost always used for execution of the algorithm, as well as for presentation of the results. This chapter will provide an overview of the statistical methods used to perform an enrichment analysis. Guidelines for assembly of the requisite information will be presented, with a focus on careful definition of the sets used by the statistical algorithms. The need for multiple test correction when working with large libraries of classifiers is emphasized, and we outline several options for performing the corrections. Finally, interpreting the results of such analysis will be discussed along with examples of recent research utilizing the techniques.
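
As a concrete illustration of the enrichment question posed above, the sketch below uses a one-sided hypergeometric test, one common choice for unordered gene lists, followed by a Bonferroni correction as one simple option for multiple-test correction. The function names and the example counts are assumptions made up for this illustration. They are not taken from the chapter, which surveys several alternative methods.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from a
    reference set of `population` genes, `annotated` of which carry the category."""
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical counts: a reference set of 6000 genes, 120 of them annotated to
# some category; our gene list has 50 genes, 8 of which carry the annotation.
p = hypergeom_enrichment_p(population=6000, annotated=120, sample=50, hits=8)

# Hypothetical number of categories tested against the same gene list.
p_adjusted = bonferroni(p, n_tests=500)
print(p, p_adjusted)
```

Note that the result depends on the reference set: `population` should count only the genes that could have appeared in the list, which is why the chapter stresses careful definition of the sets.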