Difference between revisions of "ABC-INT-Genome annotation"

From "A B C"
Line 19: Line 19:
  
  
{{STUB}}
+
{{LIVE}}
  
 
{{Vspace}}
 
{{Vspace}}
Line 30: Line 30:
 
<!-- included from "../components/ABC-INT-Genome_annotation.components.wtxt", section: "abstract" -->
 
<!-- included from "../components/ABC-INT-Genome_annotation.components.wtxt", section: "abstract" -->
 
This page assesses the learning units for data management and sequence analysis of  genomic sequence data.
 
This page assesses the learning units for data management and sequence analysis of  genomic sequence data.
 
* choose a fungal genome
 
* create a datamodel for domain annotations
 
* implement as a dataframe
 
* annotate signal sequence (or disordered stretches)
 
* plot and analyze results
 
 
<section end=abstract />
 
<section end=abstract />
  
Line 48: Line 42:
 
*[[BIN-FUNC-Annotation|BIN-FUNC-Annotation (Function Annotation)]]
 
*[[BIN-FUNC-Annotation|BIN-FUNC-Annotation (Function Annotation)]]
 
*[[BIN-Genome-Browsers|BIN-Genome-Browsers (Genome Browsers)]]
 
*[[BIN-Genome-Browsers|BIN-Genome-Browsers (Genome Browsers)]]
 
{{Vspace}}
 
 
 
=== Objectives ===
 
<!-- included from "../components/ABC-INT-Genome_annotation.components.wtxt", section: "objectives" -->
 
...
 
 
{{Vspace}}
 
 
 
=== Outcomes ===
 
<!-- included from "../components/ABC-INT-Genome_annotation.components.wtxt", section: "outcomes" -->
 
...
 
  
 
{{Vspace}}
 
{{Vspace}}
Line 68: Line 48:
 
=== Deliverables ===
 
=== Deliverables ===
 
<!-- included from "../components/ABC-INT-Genome_annotation.components.wtxt", section: "deliverables" -->
 
<!-- included from "../components/ABC-INT-Genome_annotation.components.wtxt", section: "deliverables" -->
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-milestone" -->
+
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-integrator" -->
*<b>No separate deliverables</b>: This unit collects other units and has no deliverables on its own.
+
*<b>Integrator unit</b>: Deliverables will be marked as detailed on this page.
  
 
{{Vspace}}
 
{{Vspace}}
Line 84: Line 64:
 
* Document your results in a short report on a subpage of your User page on the Student Wiki. Describe your methods (R-code!) in an appendix;
 
* Document your results in a short report on a subpage of your User page on the Student Wiki. Describe your methods (R-code!) in an appendix;
 
* When you are done with everything, add the following category tag to the page:
 
* When you are done with everything, add the following category tag to the page:
::<code><nowiki>[[Category:EVAL-INT-Phylogeny]]</nowiki></code>
+
::<code><nowiki>[[Category:EVAL-INT-Genome_annotation]]</nowiki></code>
 
:'''Do not''' change your submission page after this tag has been added. The page will be marked and the category tag will be removed by the instructor.
 
:'''Do not''' change your submission page after this tag has been added. The page will be marked and the category tag will be removed by the instructor.
 
{{Smallvspace}}
 
{{Smallvspace}}
Line 102: Line 82:
 
:'''Do not''' change your submission page after this tag has been added. The page will be marked and the category tag will be removed by the instructor.
 
:'''Do not''' change your submission page after this tag has been added. The page will be marked and the category tag will be removed by the instructor.
 
{{Smallvspace}}
 
{{Smallvspace}}
<!--
 
 
;Literature research option
 
;Literature research option
:Navigate to the [http://steipe.biochemistry.utoronto.ca/abc/students/index.php/ABC-INT-Phylogeny_topics '''Phylogeny Literature Research Topics] page on the Student Wiki.
+
:This option requires that a primary publication is available for the MYSPE genome sequence; if there is none, this option is not available.
:* Pick a topic and enter your name in the table to claim it.
+
:* Write a report on the annotation methodology that was used for the MYSPE genome. Note: this is not a review, but a report. Think of a "whitepaper", not a publication. Write to a specialist technical audience - imagine collaborators who want to use the same methods - and be specific to provide actionable information.
:* Write a report on your research. Note: this is not a review, but a report. Think of a "whitepaper", not a publication. Write to a specialist technical audience and be specific to provide actionable information.
 
 
:* write your report on a subpage of your User page of the Student Wiki;
 
:* write your report on a subpage of your User page of the Student Wiki;
 
:* make sure that you have included all references and citations.
 
:* make sure that you have included all references and citations.
 
:* When you are done with everything, add the following category tag to the page:
 
:* When you are done with everything, add the following category tag to the page:
::<code><nowiki>[[Category:EVAL-INT-Phylogeny]]</nowiki></code>
+
::<code><nowiki>[[Category:EVAL-INT-Genome_annotation]]</nowiki></code>
 
:'''Do not''' change your submission page after this tag has been added. The page will be marked and the category tag will be removed by the instructor.
 
:'''Do not''' change your submission page after this tag has been added. The page will be marked and the category tag will be removed by the instructor.
 
{{Smallvspace}}
 
{{Smallvspace}}
-->
 
<!--
 
 
;Oral exam option
 
;Oral exam option
 
* Work through the tasks described in the scenario. Remember to document your work in your journal.
 
* Work through the tasks described in the scenario. Remember to document your work in your journal.
Line 121: Line 97:
 
* Schedule an oral exam by editing the [http://steipe.biochemistry.utoronto.ca/abc/students/index.php/Signup-Oral_exams_2017 '''signup page on the Student Wiki''']. Enter the unit that you are signing up for, and your name. You must have signed-up for an exam slot before 21:00 on the day before your exam.
 
* Schedule an oral exam by editing the [http://steipe.biochemistry.utoronto.ca/abc/students/index.php/Signup-Oral_exams_2017 '''signup page on the Student Wiki''']. Enter the unit that you are signing up for, and your name. You must have signed-up for an exam slot before 21:00 on the day before your exam.
 
{{Smallvspace}}
 
{{Smallvspace}}
-->
+
;Genome sequence analysis option
<!--
+
* Start a subpage of your User page on the Student Wiki to document your analysis;
;R code option
+
* Work through the tasks described in the scenario, download sequence data and develop an analysis script as required. Keep your script generic, so that you could easily adapt it to analyze a different gene. Keep careful Journal notes of your activities with your analysis.
* Work through the tasks described in the scenario and develop code as required.
 
* Put your code on a subpage of your User page on the Student Wiki;
 
 
* When you are done with everything, add the following category tag to the page:
 
* When you are done with everything, add the following category tag to the page:
::<code><nowiki>[[Category:EVAL-INT-Phylogeny]]</nowiki></code>
+
::<code><nowiki>[[Category:EVAL-INT-Genome_annotation]]</nowiki></code>
 
:'''Do not''' change your submission page after this tag has been added. The page will be marked and the category tag will be removed by the instructor.
 
:'''Do not''' change your submission page after this tag has been added. The page will be marked and the category tag will be removed by the instructor.
-->
 
  
 
{{Vspace}}
 
{{Vspace}}
Line 138: Line 111:
 
== Contents ==
 
== Contents ==
 
<!-- included from "../components/ABC-INT-Genome_annotation.components.wtxt", section: "contents" -->
 
<!-- included from "../components/ABC-INT-Genome_annotation.components.wtxt", section: "contents" -->
<!-- included from "ABC-unit_components.wtxt", section: "milestone" -->
+
{{Smallvspace}}
This is a "milestone unit". Its purpose is merely to collect a number of preparatory units into a single, common prerequisite. It has no contents of its own; you are expected to be familiar and competent with all preparatory material at this point.
+
===Scenario===
 +
{{Smallvspace}}
 +
You know that MYSPE has an Mbp1 orthologue. The key questions of functional genome annotation would be: does it work in the same way in MYSPE as in yeast? Does it have the same target genes? Is it regulated by orthologues of other yeast genes, implying the same feedback mechanisms and genetic regulatory circuits? Here we will try to answer just one part of these questions: is the binding motif for Mbp1 conserved? If it is, we could automate the task of finding genes that are potentially regulated by MBP1_MYSPE; if not, we would need to pursue a different strategy of binding site discovery.
 +
 
 +
Here is how we assess the conservation of the Mbp1 DNA binding motif in MYSPE, working from the orthologue of  Cdc6, a pre-replicative complex component:
 +
* Find the MYSPE orthologue for yeast Cdc6.
 +
* Fetch 500 nucleotides of upstream genome sequence. (Demonstrate that this is the correct sequence by showing the first 10 translated Cdc6 codons with your sequence.)
 +
* The yeast Mbp1 canonical binding site is defined by the regular expression <tt>[AT]CGCG[AT]</tt>.
 +
* Are there <tt>CGCG</tt> motifs present in your nucleotide sequence?
 +
* Identify them using a regular expression search. You may find the following code useful:
 +
<source lang="R">
 +
patt <- "..CGCG.."
 +
m <- gregexpr(patt, mySeq)
 +
regmatches(mySeq, m)[[1]]
 +
</source>
 +
* Are there [AT]CGCG and CGCG[AT]? What about [AT]CGCG[AT]?
 +
* Where are they located? Do they cluster? Are they arranged in a similar way as the yeast binding sites that you visited at UCSC?
 +
* Interpret your finding. Does this support or refute the idea that MBP1_MYSPE has the same DNA sequence binding specificity as MBP1_SACCE?
 +
 
 +
{{Vspace}}
 +
 
  
 
{{Vspace}}
 
{{Vspace}}
Line 208: Line 201:
 
:2017-08-05
 
:2017-08-05
 
<b>Modified:</b><br />
 
<b>Modified:</b><br />
:2017-08-09
+
:2017-11-19
 
<b>Version:</b><br />
 
<b>Version:</b><br />
:0.1
+
:1.0
 
<b>Version history:</b><br />
 
<b>Version history:</b><br />
 +
*1.0 First live version
 
*0.1 First stub
 
*0.1 First stub
 
</div>
 
</div>

Revision as of 07:28, 20 November 2017

Integration Unit: Genome annotation


 

Keywords:  Integrator unit: annotate sequences in a genome


 



 


 


Abstract

This page assesses the learning units for data management and sequence analysis of genomic sequence data.


 


This unit ...

Prerequisites

You need to complete the following units before beginning this one:


 


Deliverables

  • Integrator unit: Deliverables will be marked as detailed on this page.


 


Evaluation

This "Integrator Unit" should be submitted for evaluation for a maximum of 8 marks if one of the written deliverables is chosen, or 16 marks for the oral exam[1].

Please note the evaluation types that are available as options for this unit. Choose one evaluation type that you have not chosen for another Integrator Unit. (Each submitted Integrator Unit must be evaluated in a different way and one of your evaluations - but not your first one - must be an oral exam).
 
Interview option
Identify a laboratory whose work includes genome annotation or re-annotation. Get in touch with the PI, a postdoc or a senior graduate student in the laboratory and interview them in person or by email. Find out
  • why this work is important;
  • how they approach it methodologically;
  • in particular, what features they are looking for, and what discoveries can be made by looking for these features (get very specific on that point, we are most interested in strategies for interpretation of data);
  • what they have recently learned;
  • what the major challenges, current discussions, or controversies are.
  • write up your interview on a subpage of your User page of the Student Wiki;
  • add information that may be required to understand the methodology;
  • make sure that you have included important literature references.
  • When you are done with everything, add the following category tag to the page:
[[Category:EVAL-INT-Genome_annotation]]
Do not change your submission page after this tag has been added. The page will be marked and the category tag will be removed by the instructor.
 
Literature research option
This option requires that a primary publication is available for the MYSPE genome sequence; if there is none, this option is not available.
  • Write a report on the annotation methodology that was used for the MYSPE genome. Note: this is not a review, but a report. Think of a "whitepaper", not a publication. Write to a specialist technical audience - imagine collaborators who want to use the same methods - and be specific to provide actionable information.
  • write your report on a subpage of your User page of the Student Wiki;
  • make sure that you have included all references and citations.
  • When you are done with everything, add the following category tag to the page:
[[Category:EVAL-INT-Genome_annotation]]
Do not change your submission page after this tag has been added. The page will be marked and the category tag will be removed by the instructor.
 
Oral exam option
  • Work through the tasks described in the scenario. Remember to document your work in your journal.
  • Part of your task will involve writing an R script. Place that code in a subpage of your User page on the Student Wiki and link to it from your Journal. (Do not add an evaluation category tag to that code.)
  • Your work must be complete before 21:00 on the day of your exam.
  • Schedule an oral exam by editing the signup page on the Student Wiki. Enter the unit that you are signing up for, and your name. You must have signed up for an exam slot before 21:00 on the day before your exam.
 
Genome sequence analysis option
  • Start a subpage of your User page on the Student Wiki to document your analysis;
  • Work through the tasks described in the scenario, download sequence data and develop an analysis script as required. Keep your script generic, so that you could easily adapt it to analyze a different gene. Keep careful Journal notes of your activities with your analysis.
  • When you are done with everything, add the following category tag to the page:
[[Category:EVAL-INT-Genome_annotation]]
Do not change your submission page after this tag has been added. The page will be marked and the category tag will be removed by the instructor.


 


Contents

 

Scenario

 

You know that MYSPE has an Mbp1 orthologue. The key questions of functional genome annotation would be: does it work in the same way in MYSPE as in yeast? Does it have the same target genes? Is it regulated by orthologues of other yeast genes, implying the same feedback mechanisms and genetic regulatory circuits? Here we will try to answer just one part of these questions: is the binding motif for Mbp1 conserved? If it is, we could automate the task of finding genes that are potentially regulated by MBP1_MYSPE; if not, we would need to pursue a different strategy of binding site discovery.

Here is how we assess the conservation of the Mbp1 DNA binding motif in MYSPE, working from the orthologue of Cdc6, a pre-replicative complex component:

  • Find the MYSPE orthologue for yeast Cdc6.
  • Fetch 500 nucleotides of upstream genome sequence. (Demonstrate that this is the correct sequence by showing the first 10 translated Cdc6 codons with your sequence.)
  • The yeast Mbp1 canonical binding site is defined by the regular expression [AT]CGCG[AT].
  • Are there CGCG motifs present in your nucleotide sequence?
  • Identify them using a regular expression search. You may find the following code useful:
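    patt <- "..CGCG.."             # CGCG core with two flanking bases on each side
    m <- gregexpr(patt, mySeq)     # positions of all non-overlapping matches in mySeq
    regmatches(mySeq, m)[[1]]      # the matched substrings themselves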
  • Are there [AT]CGCG and CGCG[AT]? What about [AT]CGCG[AT]?
  • Where are they located? Do they cluster? Are they arranged in a similar way to the yeast binding sites that you visited at UCSC?
  • Interpret your finding. Does this support or refute the idea that MBP1_MYSPE has the same DNA sequence binding specificity as MBP1_SACCE?
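
The location and clustering questions above need match positions, not just the matched strings. The following is a minimal sketch of one way to get them, not a prescribed solution. The sequence in mySeq is a short toy string invented for illustration, and motifStarts() is a hypothetical helper name. Because gregexpr() reports only non-overlapping matches, the sketch wraps each pattern in a Perl-style lookahead so that sites sharing a flanking base are all reported.

```r
# Toy sequence, invented for illustration; substitute your own upstream region
mySeq <- "GGACGCGTCGCGAAGGCGCGCC"

# Hypothetical helper: start position of every match, including overlapping ones.
# The lookahead (?=...) matches at a position without consuming characters.
motifStarts <- function(patt, s) {
  m <- gregexpr(paste0("(?=", patt, ")"), s, perl = TRUE)[[1]]
  if (m[1] == -1) integer(0) else as.integer(m)
}

# The four patterns asked about in the scenario
patterns <- c("CGCG", "[AT]CGCG", "CGCG[AT]", "[AT]CGCG[AT]")
hits <- lapply(patterns, motifStarts, s = mySeq)
names(hits) <- patterns
print(hits)

# Spacing between consecutive full sites, as a first look at clustering
print(diff(hits[["[AT]CGCG[AT]"]]))
```

Positions are 1-based offsets into mySeq. If your sequence ends immediately before the start codon (an assumption you should verify for your own data), subtracting nchar(mySeq) + 1 from a position gives its distance upstream of the first coding base.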


 


 


Further reading, links and resources

 


Notes

  1. Note: the oral exam will focus on the unit content but will also cover other material that leads up to it.


 


Self-evaluation

 



 




 

If in doubt, ask! If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.



 

About ...
 
Author:

Boris Steipe <boris.steipe@utoronto.ca>

Created:

2017-08-05

Modified:

2017-11-19

Version:

1.0

Version history:

  • 1.0 First live version
  • 0.1 First stub

This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.