Expected Preparations:
Keywords: Integrator unit: annotate sequences in a genome
Objectives:
Outcomes:
Deliverables: Integrator unit: Deliverables can be submitted for course marks. See below for details.
Evaluation: Material based on this Integrator Unit can be submitted for summative feedback (course marks). It will be marked out of a maximum of 18 marks for a regular submission, or 36 marks if you choose this unit for your Oral Test¹. For your report:
If you choose this unit for your Oral Test option:
This page integrates concepts and methods for data management and sequence analysis of genomic sequence data.
You know that MYSPE has an orthologue of yeast Mbp1. That’s very
useful knowledge: yeast is a well studied model organism, and
its target genes for most transcription factors have been experimentally
determined. If MYSPE has regulatory genetic circuits that are conserved
among fungi, you could perform functional genome
annotation based on orthology to yeast genes. Thus you might
ask questions like: does regulation work in the same way in MYSPE as in
yeast? Does it have the same target genes? Are MYSPE target genes of
MBP1_MYSPE co-regulated by orthologues of other yeast
genes, which would imply conserved feedback mechanisms and genetic
regulatory circuits?
Here we will try to answer just one part of such an inquiry: is the binding motif for yeast Mbp1 conserved in a MYSPE orthologue of a S. cerevisiae target gene? If that is the case, we could automate the task of finding genes that are potentially regulated by MBP1_MYSPE; if not, we would need to pursue a different strategy of binding site discovery.
Here is how we could develop an analysis of the conservation of the Mbp1 DNA binding motif in MYSPE manually.
biomart:: code. Manual selection and copy/paste from a sequence database record is not acceptable for this task.
The yeast Mbp1 canonical binding site is defined by the regular expression "[AT]CGCG[AT]". (Please review RPR-RegEx if you are not sure about the meaning of "[" and "." in a regular expression.) In your report note:
CGCG motifs present in your nucleotide sequence?
gregexpr() and regmatches(). The following code sample may get you started:
patt <- "..CGCG.."
m <- gregexpr(patt, mySeq)
regmatches(mySeq, m)[[1]]
[AT]CGCG or CGCG[AT] motifs? What about [AT]CGCG[AT]?
MBP1_MYSPE targets.
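The motif questions above can be explored with a short sketch like the following. It is only an illustration under stated assumptions: `mySeq` here is a made-up toy string, not a real upstream sequence, and `findMotif()` is a hypothetical helper name. Because `gregexpr()` reports non-overlapping matches by default, the sketch wraps each pattern in a Perl lookahead so that overlapping sites are reported too; whether you want overlapping sites counted is your decision to justify.

```r
# Toy sequence for illustration only: NOT real upstream data.
mySeq <- "TTACGCGTAAGGCGCGCCTCGCGAT"

# The core motif, the two one-sided extensions, and the full canonical site.
patterns <- c(core  = "CGCG",
              left  = "[AT]CGCG",
              right = "CGCG[AT]",
              full  = "[AT]CGCG[AT]")

# findMotif() is a hypothetical helper: it returns the start positions of
# all matches of patt in s, including overlapping ones.
findMotif <- function(patt, s) {
  # A zero-width lookahead matches at every position where patt begins.
  m <- gregexpr(paste0("(?=", patt, ")"), s, perl = TRUE)[[1]]
  if (m[1] == -1) integer(0) else as.integer(m)
}

hits <- lapply(patterns, findMotif, s = mySeq)

hits            # start positions for each pattern
lengths(hits)   # number of sites for each pattern
```

Replace the toy string with the sequence you retrieved, unchanged, and report positions as well as counts: a count alone says nothing about where a site lies relative to the start codon.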
The annotation below is based on the Sporothrix schenckii orthologue of CDC6. CDC6 is a pre-replicative complex component that is one of Mbp1’s target genes, and it is highly conserved. This sample demonstrates the required formatting and level of detail for a valid submission.
ATG: range 1255377 .. 1255379
>ref|NW_015971139.1|:1254877-1255406 Sporothrix schenckii 1099-18 chromosome Unknown Cont38, whole genome shotgun sequence
5'-TCCACCAAACTAGTCGGGCGAGCTGAACTATGTCGTCCGCCATTTAAAGC
CCACTGTACGAATAGCGCAATACTGTAGACGACCGCACAGTGTATCTGTG
GCTAGTGTGCAAGCACGCGCCACGGCAGCTGGGCGGGTCTGGGGTCAATC
=====x
CTCCCACGTACGCGTAAAACCGCCAACGCGTCCAGCAATGGCAGGGGTAA
======
GTCAGTCGCGCTTTCTTCGCGTAAAGTGGTTCCTCTATTTGGCGCGCGCT
=====x
TCCTCATTAAATCTTGTACCTCCCTTGGCCACCATCTTGAACTTTCCTTC
GTGCTTTCCACGTTTGACTTCATTCCCTGTTACTTCCATTTTGTCCATTC
TTGCGACTGTCTATTCTTTCTTTGCGAGCATCTACGCATCTATCCATCGT
TCTTTCCGTTGTATGCATCTACGTCGCTGTTCTTGCCATTGCTTTACCCC
TTTCTTTAAACCCTTCCTCCTTTGCTCTTTCCTCACCACACACTACAAAC
ATG GTT GCT TCC TCG CTC GGA AAG CGG ATC..... -3'
M V A S S L G K R I ...
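The sample above can be cross-checked with a little arithmetic, and a similar check may be useful for your own sequence. The sketch below uses only numbers and sequence copied from the sample; the helper `toGenomic()` and the convention of reporting positions relative to the A of the ATG (negative values are upstream) are assumptions made for illustration, not a required format.

```r
# Coordinates copied from the sample annotation.
winStart <- 1254877   # first nucleotide of the retrieved range
winEnd   <- 1255406   # last nucleotide of the retrieved range
atgStart <- 1255377   # the "A" of the start codon

# Ranges are inclusive, so the length is end - start + 1.
winLen <- winEnd - winStart + 1   # nucleotides retrieved
upLen  <- atgStart - winStart     # nucleotides that precede the ATG

# toGenomic() is a hypothetical helper: index in the window -> genomic coordinate.
toGenomic <- function(i) winStart + i - 1

# Third 50-nucleotide row of the sample, i.e. window positions 101..150.
row3      <- "GCTAGTGTGCAAGCACGCGCCACGGCAGCTGGGCGGGTCTGGGGTCAATC"
rowOffset <- 100

hit        <- regexpr("[AT]CGCG", row3)          # first one-sided site in this row
motifStart <- toGenomic(rowOffset + as.integer(hit))
motifRel   <- motifStart - atgStart              # negative: upstream of the ATG

# Standard genetic code, with codons enumerated in T, C, A, G order.
bases  <- c("T", "C", "A", "G")
codons <- paste0(rep(bases, each = 16),
                 rep(rep(bases, each = 4), times = 4),
                 rep(bases, times = 16))
aa <- strsplit(paste0("FFLLSSSSYY**CC*W", "LLLLPPPPHHQQRRRR",
                      "IIIMTTTTNNKKSSRR", "VVVVAAAADDEEGGGG"), "")[[1]]
names(aa) <- codons

# First ten codons of the sample, with the spaces removed.
cds    <- "ATGGTTGCTTCCTCGCTCGGAAAGCGGATC"
starts <- seq(1, nchar(cds) - 2, by = 3)
pep    <- paste(aa[substring(cds, starts, starts + 2)], collapse = "")

c(winLen = winLen, upLen = upLen, motifStart = motifStart, motifRel = motifRel)
pep
```

If the translated peptide does not agree with the first residues of your protein, or the window length does not agree with the range in your header, the retrieval step needs another look before any motif is interpreted.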
If in doubt, ask! If anything about this content is not clear to you, do not proceed but ask for clarification. If you have ideas about how to make this material better, let’s hear them. We are aiming to compile a list of FAQs for all learning units, and your contributions will count towards your participation marks.
Improve this page! If you have questions or comments, please post them on the Quercus Discussion board with a subject line that includes the name of the unit.
[END]
Note: the oral test is cumulative. It will focus on the content of this unit but will also cover other material that leads up to it.↩︎
Please note: if you can’t demonstrate that you are working with the correct sequence, there is no point in continuing to search for putative binding motifs. Even if you found one, it would be meaningless, because it would be in the wrong context. Please resist any temptation to edit or otherwise manipulate the sequence: that would be an academic offence. The sequence you show must be exactly the sequence you have downloaded from the database, and your links must work and produce exactly the correct sequence. If you can’t get this to work, contact me to resolve the problem.↩︎
Be wary of off-by-one errors: the range 10..20 spans eleven nucleotides, not ten.↩︎
Just claiming “yes” or “no” is not sufficient to discuss a similar arrangement: you need to give specifics, such as number of sites and their quality, distance to start, distance to each other, overlap … etc.↩︎