ABC-INT-Genome annotation
Integrator Unit: Genome annotation
(Integrator unit: annotate sequences in a genome)
Abstract:
This page assesses the learning units for data management and sequence analysis of genomic sequence data.
Deliverables:
Prerequisites:
This unit builds on material covered in the following prerequisite units:
This page still needs to undergo revisions. Do not work on these tasks yet, and do not prepare contents for submission.
Contents
Evaluation
This "Integrator Unit" should be submitted for evaluation for a maximum of 8 marks if one of the written deliverables is chosen, or 16 marks for the oral test[1].
- Please note the evaluation types that are available as options for this unit. Choose one evaluation type that you have not chosen for another Integrator Unit. (Each submitted Integrator Unit must be evaluated in a different way and one of your evaluations - but not your first one - must be an oral test).
- Interview option
- Identify a laboratory whose work includes genome annotation or re-annotation. Get in touch with the PI, a postdoc, or a senior graduate student in the laboratory and interview them in person or by email. Find out
- why this work is important;
- how they approach it methodologically;
- in particular, what features they are looking for, and what discoveries can be made by looking for these features (get very specific on that point, we are most interested in strategies for interpretation of data);
- what they have recently learned;
- what the major challenges, current discussions, or controversies are.
- write up your interview on a subpage of your User page of the Student Wiki;
- add information that may be required to understand the methodology;
- make sure that you have included important literature references.
- When you are done with everything, add the following category tag to the end of the page:
 - [[Category:EVAL-INT-Genome_annotation]].
 
Once the page has been saved with this tag, it is considered "submitted". Do not change your submission after this tag has been added. The page will be marked and the category tag will be removed by the instructor.
- Literature research option
- This option requires that a primary publication is available for the MYSPE genome sequence; if there is none, this option is not available.
- Write a report on the annotation methodology that was used for the MYSPE genome. Note: this is not a review, but a report. Think of a "whitepaper", not a publication. Write for a specialist technical audience (imagine collaborators who want to use the same methods) and be specific enough to provide actionable information.
- write your report on a subpage of your User page of the Student Wiki;
- make sure that you have included all references and citations.
- When you are done with everything, add the following category tag to the end of the page:
 - [[Category:EVAL-INT-Genome_annotation]].
 
Once the page has been saved with this tag, it is considered "submitted". Do not change your submission after this tag has been added. The page will be marked and the category tag will be removed by the instructor.
- Oral test option
- Work through the tasks described in the scenario. Remember to document your work in your journal.
- Part of your task will involve writing an R script. Place that code in a subpage of your User page on the Student Wiki and link to it from your Journal. (Do not add an evaluation category tag to that code.)
- Your work must be complete before 21:00 on the day before your exam.
- Schedule an oral test by editing the signup page on the Student Wiki. Enter the unit that you are signing up for, and your name. You must have signed up for an exam slot before 21:00 on the day before your exam.
- Genome sequence analysis option
- Start a subpage of your User page on the Student Wiki to document your analysis;
- Work through the tasks described in the scenario, download sequence data and develop an analysis script as required. Keep your script generic, so that you could easily adapt it to analyze a different gene. Keep careful Journal notes of your activities with your analysis.
- When you are done with everything, add the following category tag to the end of the page:
 - [[Category:EVAL-INT-Genome_annotation]].
 
Once the page has been saved with this tag, it is considered "submitted". Do not change your submission after this tag has been added. The page will be marked and the category tag will be removed by the instructor.
Scenario
You know that MYSPE has an Mbp1 orthologue. The key questions of functional genome annotation would be: does it work in the same way in MYSPE as in yeast? Does it have the same target genes? Is it regulated by orthologues of other yeast genes that imply the same feedback mechanisms and genetic regulatory circuits? Here we will try to answer just one part of these questions: is the binding motif for Mbp1 conserved? If it is, we could automate the task of finding genes that are potentially regulated by MBP1_MYSPE; if not, we would need to pursue a different strategy of binding site discovery.
Here is how we assess the conservation of the Mbp1 DNA binding motif in MYSPE, working from the orthologue of CDC6, a pre-replicative complex component:
- Find the MYSPE orthologue for yeast CDC6.
- Fetch 500 nucleotides of upstream genome sequence. (Demonstrate that this is the correct sequence by showing the first 10 translated CDC6 codons with your sequence.)
- The yeast Mbp1 canonical binding site is defined by the regular expression [AT]CGCG[AT].
- Are there CGCG motifs present in your nucleotide sequence?
- Identify them using a regular expression search. You may find the following code useful:
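patt <- "..CGCG.."            # CGCG core plus two flanking bases on each side
m <- gregexpr(patt, mySeq)    # positions of all matches in mySeq
regmatches(mySeq, m)[[1]]     # the matched substrings
- Are there [AT]CGCG or CGCG[AT] motifs? What about [AT]CGCG[AT]?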
- Where are they located? Do they cluster? Are they arranged in a similar way as the yeast binding sites that you visited at UCSC?
- Interpret your findings. Do they support or refute the idea that MBP1_MYSPE has the same DNA sequence binding specificity as MBP1_SACCE?
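The steps above can be sketched in base R as follows. This is only an illustration under stated assumptions, not a solution. The sequences are made-up toy data (not MYSPE or yeast sequence), and the helper names translateStart() and findMotifs() are hypothetical. The sketch translates the first 10 codons to check that the upstream region ends where the coding sequence begins, then locates motif matches and reports their positions and spacing.

```r
# Toy data only: made-up sequences, NOT real MYSPE or yeast sequence.
mySeq <- "TTACGCGTAAACGCGATTTTGCGCGCTT"      # pretend upstream region (28 nt)
cds   <- "ATGGCTAAAGGTTCTGAACTGCATCGTTGG"    # pretend start of the coding sequence

# Standard genetic code, codons enumerated in TCAG order
# (expand.grid varies its first column fastest, so p3 is listed first).
bases  <- c("T", "C", "A", "G")
grid   <- expand.grid(p3 = bases, p2 = bases, p1 = bases,
                      stringsAsFactors = FALSE)
codons <- paste0(grid$p1, grid$p2, grid$p3)
aa     <- strsplit(paste0("FFLLSSSSYY**CC*W", "LLLLPPPPHHQQRRRR",
                          "IIIMTTTTNNKKSSRR", "VVVVAAAADDEEGGGG"), "")[[1]]
names(aa) <- codons

# Hypothetical helper: translate the first n codons of a DNA string.
translateStart <- function(dna, n = 10) {
  starts   <- seq(1, by = 3, length.out = n)
  triplets <- substring(dna, starts, starts + 2)
  paste(aa[triplets], collapse = "")
}

# Hypothetical helper: start positions and text of all (non-overlapping)
# matches of a regular expression in a single string.
findMotifs <- function(patt, s) {
  m   <- gregexpr(patt, s)
  pos <- as.integer(m[[1]])
  if (pos[1] == -1) {                       # gregexpr() returns -1 for "no match"
    return(data.frame(start = integer(0), motif = character(0)))
  }
  data.frame(start = pos,
             motif = regmatches(s, m)[[1]],
             stringsAsFactors = FALSE)
}

# Check the boundary: these should be the first residues of the protein.
translateStart(cds)

# Core motif, and the full canonical site with both flanking bases.
core <- findMotifs("CGCG", mySeq)
hits <- findMotifs("[AT]CGCG[AT]", mySeq)

# Position relative to the start codon: -1 is the base just before the ATG.
hits$relPos <- hits$start - nchar(mySeq) - 1
print(hits)

# Distances between consecutive sites: small values suggest clustering.
diff(hits$start)
```

Note that gregexpr() reports non-overlapping matches only, so two adjacent sites that share a flanking base would be counted once. Real sequence may also be lower case or contain ambiguity codes, which this sketch does not handle.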
Self-evaluation
Further reading, links and resources
Notes
- [1] Note: the oral test will focus on the unit content but will also cover other material that leads up to it.
About ... 
 
Author:
- Boris Steipe <boris.steipe@utoronto.ca>
Created:
- 2017-08-05
Modified:
- 2018-12-18
Version:
- 1.0.1
Version history:
- 1.0.1 Capitalize CDC6
- 1.0 First live version
- 0.1 First stub
 This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.
