Difference between revisions of "BIO Assignment 2 2011"

From "A B C"
Line 1: Line 1:
<!-- {{Template:Active}} -->
+
{{Template:Active}}
{{Template:Inactive}}
+
<!-- {{Template:Inactive}} -->
  
 
&nbsp;
 
&nbsp;
Line 23: Line 23:
 
ord=second|
 
ord=second|
 
due = Thursday, October 9. at 10:00 in the morning}}
 
due = Thursday, October 9. at 10:00 in the morning}}
 +
 +
 +
;Your documentation for the procedures you follow in this assignment will be worth 1 mark.
 +
  
 
&nbsp;
 
&nbsp;
Line 32: Line 36:
 
Introduction
 
Introduction
 
</div>
 
</div>
Baker's yeast, ''Saccharomyces cerevisiae'', is perhaps the most important model organism since it is a eukaryote that has been studied genetically and biochemically in great detail for many decades and it is easily manipulated with high-throughput experimental methods. We will use information from this model organism to study the conservation of function and sequence in other fungi whose genomes have been completely sequenced; the assignments are an exercise in model-organism reasoning: the transfer of knowledge from one, well-studied organism to others.  
+
Baker's yeast, ''Saccharomyces cerevisiae'', is perhaps the most important [http://en.wikipedia.org/wiki/Model_organism model organism]. It is a eukaryote that has been studied genetically and biochemically in great detail for many decades, and it is easily manipulated with high-throughput experimental methods. We will use information from this model organism to study the conservation of function and sequence in other fungi whose genomes have been completely sequenced; the assignments are an exercise in model-organism reasoning: the transfer of knowledge from one well-studied organism to others.
  
This and the following assignments will revolve around a transcription factor that plays an important role in the regulation of the cell cycle: '''Mbp1''' is a key component of the MBF complex (Mbp1/Swi6). It regulates gene expression at the crucial G1/S-phase transition of the mitotic cell cycle and has been shown to bind to the regulatory regions of more than a hundred target genes.
+
This and the following assignments will revolve around a transcription factor that plays an important role in the regulation of the cell cycle: '''Mbp1''' is a key component of the MBF complex (Mbp1/Swi6). This complex regulates gene expression at the crucial G1/S-phase transition of the mitotic cell cycle and has been shown to bind to the regulatory regions of more than a hundred target genes.
  
 
One would speculate that such central control machinery would be conserved in other fungi and it will be your task in these assignments to collect evidence whether related molecular machinery is present in some of the newly sequenced fungal genomes. Throughout the assignments we will use freely available tools to conduct bioinformatics investigations of questions such as:
 
One would speculate that such central control machinery would be conserved in other fungi and it will be your task in these assignments to collect evidence whether related molecular machinery is present in some of the newly sequenced fungal genomes. Throughout the assignments we will use freely available tools to conduct bioinformatics investigations of questions such as:
 +
*What functional features can we detect in Mbp1?
 
*Do homologous proteins exist in other organisms?
 
*Do homologous proteins exist in other organisms?
*Do we believe these may bind to similar sequence motifs?
+
*Do we believe these homologues may bind to similar sequence motifs?
 
*Do we believe they may function in a similar way?
 
*Do we believe they may function in a similar way?
*Do other organisms appear to have related systems?
+
*Do other organisms appear to have related cell-cycle control systems?
 +
 
  
Access the [http://db.yeastgenome.org/cgi-bin/locus.pl?locus=mbp1 information page on Mbp1] at the ''Saccharomyces'' Genome Database and read the summary paragraph on the protein's function!
+
&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
 +
*Access the [http://db.yeastgenome.org/cgi-bin/locus.pl?locus=mbp1 information page on Mbp1] at the ''Saccharomyces'' Genome Database and read the summary paragraph on the protein's function!
 +
</div>
  
 
(If you would like to brush up on the concepts mentioned above, you could study the corresponding chapter in [http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=mcb.chapter.3432 Lodish's Molecular Cell Biology]. It is not strictly necessary to understand the details of the yeast cell-cycle to complete the assignments, but recommended, since it's obviously more fun to work with concepts that actually make some sense.)
 
(If you would like to brush up on the concepts mentioned above, you could study the corresponding chapter in [http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=mcb.chapter.3432 Lodish's Molecular Cell Biology]. It is not strictly necessary to understand the details of the yeast cell-cycle to complete the assignments, but recommended, since it's obviously more fun to work with concepts that actually make some sense.)
  
In this particular assignment you will go on a search and retrieve mission for information and annotation of Mbp1 homologues in a fungal genome, using common public databases and Web resources.
+
In this particular assignment you will go on a search and retrieve mission for information on yeast Mbp1, using common public databases and Web resources.
  
  
Line 52: Line 60:
 
==Retrieve==
 
==Retrieve==
 
</div>
 
</div>
 +
 
&nbsp;
 
&nbsp;
 +
 
&nbsp;
 
&nbsp;
  
  
 +
Much useful information on yeast Mbp1 is compiled at the [http://db.yeastgenome.org/cgi-bin/locus.pl?locus=mbp1 SGD information page on Mbp1]. However, we don't always have the luxury of such precompiled information. Let's look at the protein and its features "the traditional way".
  
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
+
<div style="padding: 5px; background: #EEEEEE;  border:solid 1px #AAAAAA;">
=== The Genome (1 mark)===
+
*Navigate to the NCBI homepage (you probably have bookmarked it anyway) and enter <code>Mbp1 AND "saccharomyces cerevisiae"[organism]</code> as an Entrez query.
 +
*Click on '''Protein''' and find the RefSeq record for the protein sequence.
 +
*From the NCBI RefSeq record, obtain a FASTA sequence of the protein and paste it into your assignment.
 
</div>
 
</div>
  
The systematic name and strain of a fungus is listed with the [[Group project|project group]] that you have been assigned to. Navigate to the NCBI homepage &rarr; "Genomic Biology" &rarr; "Fungal Genomes Central" &rarr; "Genome Sequencing Projects". This should take you to a tabular view of ongoing and completed fungal genome sequencing projects. Find your organism name in this table. There may be one or more sequencing projects associated with the organism, but there should be only one project for the specific strain.
+
&nbsp;
 +
 
 +
&nbsp;
 +
 
 +
There are several sources for functional domain annotations of proteins. The NCBI has the [http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml Conserved Domain Database]; in Europe, the [http://smart.embl-heidelberg.de/ SMART database] provides such annotations. In terms of domains, the two resources are very comparable, but SMART also analyses more general features such as low-complexity sequences and coiled coils. In order to use SMART, however, we need the '''UniProt accession number''' that corresponds to the RefSeq identifier. In a rational world, one would wish that such important cross-references would simply be provided by the NCBI ... well, we have been wishing this for many years now. Fortunately, ID-mapping services exist.
  
&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
 
*Decide  which project is the most suitable one for analysis and record your decision. Report the strain and Taxonomy-ID for this organism.
 
  
 +
<div style="padding: 5px; background: #EEEEEE;">
 +
*Navigate to the [http://www.uniprot.org/?tab=mapping UniProt ID-Mapping service]. Enter the RefSeq identifier for the yeast Mbp1 protein and retrieve the corresponding UniProtKB Accession number.
 
</div>
 
</div>
  
If you can't identify the criteria that make one project more or less useful for your task, or you don't know which ones are more or less important, you are of course welcome to discuss your questions on the list.
+
&nbsp;
 
 
Click on the organism name to navigate to the Genome Project information page.
 
  
&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
 
*Comment briefly on the status of the data you are working with: include information such as whether the entire genome is available or only a partial sequence? How many chromosomes does this genome have? What is the status of its genome assembly and annotation? Has the mitochondrial genome been sequenced as well?
 
</div>
 
 
&nbsp;
 
&nbsp;
  
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
+
<div style="padding: 5px; background: #BDC3DC;  border:solid 1px #AAAAAA;">
  
=== APSES domain transcription factors (1 mark)===
+
==Analyse==
 
</div>
 
</div>
 +
&nbsp;
 +
&nbsp;
  
Mbp1 is a large multidomain protein; it binds DNA through a small domain called the APSES domain and many organisms have more than one transcription factor that has a domain homologous to other APSES domains. In the assignments, we will analyse how these APSES domains have evolved, to obtain a perspective on the evolution of regulatory systems in general. Accordingly we should first define an APSES domain sequence and then use it to find all its relatives in each target organism.
 
  
&nbsp;<br>
+
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
Use the NCBI Entrez system to search for the string "apses" in the "Conserved Domains" database and access the entry for the APSES domain. You should find a number of aligned sequences on that page, each with their own GI identifier.
 
  
<div style="padding: 5px; background: #EEEEEE;">
+
=== ''Saccharomyces cerevisiae'' Mbp1 - domain annotations===
*Identify the two sequences that come from ''Saccharomyces cerevisiae'' (the Mbp1 and Swi4 APSES domains).
 
 
</div>
 
</div>
  
&nbsp;<br>
+
Now we can analyse Mbp1's domains in SMART, and use this information to annotate the sequence in detail.
Access the GenPept (NCBI protein) record for the ''Saccharomyces cerevisiae'' Mbp1 protein.  
 
  
 
<div style="padding: 5px; background: #EEEEEE;">
 
<div style="padding: 5px; background: #EEEEEE;">
*Obtain the FASTA sequence for the whole, full-length protein, save it and paste it into your assignment.
+
*Navigate to the [http://smart.embl-heidelberg.de/ SMART database], enter the yeast Mbp1 accession number and review the domain features of the protein.
 +
*In your assignment, highlight the annotated features in the actual sequence by using the SMART annotations.
 
</div>
 
</div>
  
&nbsp;<br>
+
&nbsp;
Working from the APSES domain alignment in CDD, define the sequence of the entire APSES domain in the Mbp1 protein.
 
  
<div style="padding: 5px; background: #EEEEEE;">
+
&nbsp;
*Save the sequence of the Mbp1 APSES domain in FASTA format (i.e. give it an appropriate header) and paste it into your assignment. Comment if and how it is different from the sequence you find on the CDD page.
 
</div>
 
  
&nbsp;<br>
 
Navigate back to the Genome Project Database table for fungi and click on the '''"B"''' link next to your organism. This takes you to a page with a BLAST search form. Run a BLAST search with the full-length ''Saccharomyces cerevisiae'' Mbp1 protein sequence against the proteins of your organism only!
 
<!-- (In the case of Schizosaccharomyces pombe, it appears the link is not included on that page but you have to go through the main BLAST page and use the drop down selection in the "Options" field to limit your query to S. pombe). -->
 
  
<div style="padding: 5px; background: #EEEEEE;">
+
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
*Record the parameters you have used for the search and the relevant search results.
 
  
*List the accession numbers and names of all putative homologues. How many are there? How many (if any) do you expect? What do you conclude ?
+
=== APSES domains ===
 
</div>
 
</div>
 +
 +
As you see from the annotations, Mbp1 is a large multidomain protein; it binds DNA through a small domain called the APSES domain and many organisms have more than one transcription factor that has a domain homologous to other APSES domains. Since we are interested in related proteins, and all functional relatives would be expected to share such a DNA binding domain, we should define this domain in more detail in order to be able to use it later to search for homologous proteins in each target organism.
  
 
&nbsp;<br>
 
&nbsp;<br>
Run a second BLAST search using only the ''Saccharomyces cerevisiae'' Mbp1 APSES domain sequence.
+
Use the NCBI Entrez system to search for the string "apses" in the "Conserved Domains" database and access the entry for the APSES domain. You should find a number of aligned sequences on that page, each with their own GI identifier.  
  
 
<div style="padding: 5px; background: #EEEEEE;">
 
<div style="padding: 5px; background: #EEEEEE;">
*Record the parameters you have used for the search and the relevant search results.
+
*Identify the two sequences that come from ''Saccharomyces cerevisiae'' (the Mbp1 and Swi4 APSES domains).
 
+
*Check whether the NCBI and the SMART definition of the APSES domain in Mbp1 coincide.
*List the accession numbers and names of all putative homologues. How many are there? Are the results different from your previous search? How? What do you conclude ?
+
*Make sure you understand how the sequences displayed on the CDD page and the actual domain sequences differ. <small>Hint: not all sequences are displayed at full length.</small>
 
</div>
 
</div>
  
 
&nbsp;
 
&nbsp;
 
;(Please contact me immediately in case you cannot find any significant alignments - you cannot continue with the assignment if you get stuck at this point.)
 
  
 
&nbsp;
 
&nbsp;
  
 +
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
  
<div style="padding: 5px; background: #BDC3DC;  border:solid 1px #AAAAAA;">
+
=== Homologous structure ===
 
 
==Align==
 
 
</div>
 
</div>
&nbsp;
 
&nbsp;
 
  
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
+
The presence of a conserved APSES domain demonstrates that the sequences of ''your'' protein and all other APSES domains are homologous. We can expect that the structures of all homologous APSES domains should be similar, i.e. if the structure of even only one is known, we should be able to conclude the approximate three-dimensional structure of any one of them. Indeed, structural information ''is'' available for APSES domains!
=== Sequences and accession numbers (1 mark)===
 
</div>
 
  
Retrieve the entire protein sequences for those significant hits that you have found with the APSES domain search in your organism. The easiest way to do this is to click on the links on the BLAST results page. The NCBI does most of their internal cross-referencing with GI numbers, however these are less useful for crossreferencing to other databases.
+
Identify and download the most appropriate coordinate file to study the structure, function and conservation of APSES domains from the PDB. Your choice could be based on:
 +
* experimental method (X-ray or NMR)
 +
* quality of the structure (resolution, refinement)
 +
* size of the structure (number of amino acids for which structure has been determined)
  
 
&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
 
&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
Find  and record
+
*Record how you have identified the file, what criteria you have used to define whether it is better suited for analysis than others, and paste the <tt>HEADER</tt>, <tt>TITLE</tt>, <tt>COMPND</tt> and <tt>SOURCE</tt> records from the file into your assignment.
* the GI number
 
*the GenPept accession number(s) ('''NCBI'''),
 
*the RefSeq identifier(s) if available,
 
*and the Uniprot accession number(s) ('''EBI''')
 
*and as always, report what you have done to find this information.
 
 
</div>
 
</div>
  
You can find these database identifiers in some cases when the appropriate cross-references have been entered into the annotations. Alternatively, you can run a BLAST search with a sequence to find the exact same sequence in the other database. Depending on what you are looking for, either search at the [http://www.ncbi.nlm.nih.gov/blast NCBI] or at the [http://www.ebi.ac.uk/blast/ EBI]. But (!) restrict the search to your assigned organism! You will have to figure out how to do that and of course report the parameters you have used. <small>''I know that this BLAST method appears to be a maximally inefficient way to retrieve a cross-reference for a sequence. However, frequently the databases are simply not providing the appropriate cross-references and the detour through a BLAST search is the only practical way to get them. Most unfortunate.''</small>
 
 
<small>'''Discovering''' a significantly better (or at least significantly more interesting) way to obtain the cross-references '''and''' being the first one to post it on the mailing list probably will merit a bonus point.</small>
 
  
 
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
 
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
  
=== Sequence alignment (1 mark)===
+
=== DNA binding site ===
 
</div>
 
</div>
  
Retrieve the FASTA sequence of the protein in your organism that you have found to be most similar to yeast Mbp1.
+
The Mbp1 APSES domain has been shown to bind to DNA and the residues involved in DNA binding have been characterized ([http://www.ncbi.nlm.nih.gov/pubmed/10747782 Taylor ''et al.'' (2000) ''Biochemistry'' '''39''': 3943-3954] and [http://www.ncbi.nlm.nih.gov/pubmed/18491920 Deleeuw ''et al.'' (2008) ''Biochemistry'' '''47''': 6378-6385]). In particular, residues 50-74 have been proposed to comprise the DNA recognition domain.
  
Use an online tool to generate an optimal full-length (global) alignment between the most-similar protein and ''S. cerevisiae'' Mbp1. (BLAST does not generate ''optimal'' alignments! Use the correct one of the EMBOSS tools instead.) You have to figure out where to find a Web service that does such alignments, which algorithm to use and how to define reasonable parameters for the alignment.
+
&nbsp;<br><div style="padding: 5px; background: #FFCC99;">
 +
;Analysis (1 mark)
  
&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
+
* Using VMD, generate a parallel stereo view of the protein structure that clearly shows the proposed Mbp1 DNA recognition domain, distinctly coloured differently from the rest of the protein. Use a representation that includes the sidechains.
*Report your procedure, parameters, alignment and results, and comment on the quality of the alignment. Is the protein a full-length homologue of Mbp1?
 
</div>
 
  
 +
* Generate a second VMD stereo image as above, but use a representation that emphasizes the secondary structure of the structure (tube or cartoon representation, colouring by structure).
  
<div style="padding: 5px; background: #BDC3DC; border:solid 1px #AAAAAA;">
+
* Generate a third VMD stereo image that shows three representations combined: (1) the backbone, (2) the sidechains of residues that presumably contact DNA, distinctly colored, and (3) a transparent surface of the entire protein. This image should show whether residues annotated as DNA binding form a contiguous binding interface.
  
==Analyse==
+
Paste the images into your assignment in a compressed format. Briefly(!) summarize the VMD forms and parameters you have used.
 
</div>
 
</div>
&nbsp;
 
&nbsp;
 
  
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
 
=== Sequence annotation (2 marks)===
 
</div>
 
  
Annotate the amino acid sequence of your organism's Mbp1 homologue with the following online tools (most of these can be found via links on http://www.expasy.org/ ):
+
DNA binding interfaces are expected to comprise a number of positively charged amino acids, that might form salt-bridges with the phosphate backbone.  
  
&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
+
&nbsp;<br><div style="padding: 5px; background: #FFCC99;">
*predicted molecular weight
+
;Analysis (2 marks)
*presence of transmembrane helices (TMpred or TMHMM)
 
*presence of internal repeats
 
*presence of signal sequences (SignalP 3.0)
 
*prediction for localization of the protein (PSORT II - make sure you use the right psort program!)
 
*prediction of functional motifs and patterns (ScanProsite or InterproScan)
 
*coiled coils and leucine zippers ([http://2zip.molgen.mpg.de/index.html 2Zip server])
 
*RPS BLAST
 
  
Briefly state for each analysis your procedure, the important results (if not obvious from the output), what the results mean, and whether your results are consistent with your expectations about this protein.
+
*Report whether this is the case here and which residues might be included.
</div>
 
 
 
&nbsp;
 
  
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
+
*Do the DNA binding residues form a contiguous surface that is compatible with a binding interface? Justify your conclusions.
  
=== Homologous structure (1 mark)===
 
 
</div>
 
</div>
  
The presence of a conserved APSES domain demonstrates that the sequences of ''your'' protein and all other APSES domains are homologous. We can expect that the structures of all homologous APSES domains should be similar, i.e. if the structure of even only one is known, we should be able to conclude the approximate three-dimensional structure of any one of them. Indeed, structural information ''is'' available for APSES domains!
+
&nbsp;
  
Identify and download the most appropriate coordinate file to study the structure, function and conservation of APSES domains from the PDB.
+
&nbsp;
 
 
&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
 
*Record how you have identified the file, what criteria you have used to define whether it is better suited for analysis than others, and paste the <tt>HEADER</tt>,  <tt>TITLE</tt>,  <tt>COMPND</tt> and  <tt>SOURCE</tt> records from the file into your assignment.
 
</div>
 
  
  
 
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
 
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
 
+
=== The Genome of Interest ===
=== DNA binding site (3 marks)===
 
 
</div>
 
</div>
  
The Mbp1 APSES domain has been shown to bind to DNA and the residues involved in DNA binding have been characterized. ([http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=10747782 Taylor et al. (2000) Biochemistry 39: 3943-3954]) . In particular the residues between 50-74 have been proposed to comprise the DNA recognition domain.
+
Up to now, we have looked at the model-organism gene to obtain a baseline of information we are interested in. To move on, we need to access the genome of an organism we are interested in. In this course, the organism of interest is assigned to you.
  
&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
+
The systematic name and strain of a fungus is listed with the [[Group project|project group]] that you have been assigned to. Navigate to the NCBI homepage &rarr; "Genomic Biology" &rarr; "Fungal Genomes Central" &rarr; "Genome Sequencing Projects". This should take you to a tabular view of ongoing and completed fungal genome sequencing projects. Find your organism name in this table. There may be one or more sequencing projects associated with the organism, but there should be only one project for the specific strain.
* Using VMD, generate a parallel stereo view of the protein structure that clearly shows the proposed Mbp1 DNA recognition domain distinctly coloured differently from the rest of the protein. Use a representation that includes the sidechains.
 
  
* Generate a second VMD stereo image as above, but use a representation that emphasizes the secondary structure of the structure.
+
Click on the organism name to navigate to the Genome Project information page.
 
 
* Generate a third VMD stereo image  that shows three representations combined: (1) the backbone, (2) the sidechains of residues that presumably contact DNA, distinctly colored, and (3) a transparent surface of the entire protein. This image should show whether residues annotated as DNA binding form a contiguous binding interface.
 
 
 
Paste the images into your assignment in a compressed format (not Windows BMP!): use medium-resolution JPEG, PNG or LZW-compressed TIFF. Briefly(!) summarize the VMD forms and parameters you have used.
 
</div>
 
 
 
 
 
DNA binding interfaces are expected to comprise a number of positively charged amino acids, that might form salt-bridges with the phosphate backbone.  
 
  
 
&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
 
&nbsp;<br><div style="padding: 5px; background: #EEEEEE;">
*Report whether this is the case here and which residues might be included.
+
*Review the status of the data you are working with - such as
 
+
**whether the entire genome is available or only a partial sequence;
*Do the DNA binding residues form a contiguous surface that is compatible with a binding interface?
+
**How many chromosomes does this genome have?
 
+
**What is the status of its genome assembly and annotation?
*Consider the surface exposed residues that could form part of the DNA binding interface of Mbp1 (i.e. the cationic residues you have described above and the '''exposed''' sidechains inbetween): are they conserved between Mbp1 and your protein?
+
**Has the mitochondrial genome been sequenced as well?
 +
**Why is this organism deemed important enough to be sequenced?
 
</div>
 
</div>
  
 
&nbsp;
 
&nbsp;
 +
  
 
<div style="padding: 5px; background: #D3D8E8;  border:solid 1px #AAAAAA;">
 
<div style="padding: 5px; background: #D3D8E8;  border:solid 1px #AAAAAA;">
Line 247: Line 213:
 
</div>
 
</div>
  
If you have any questions at all, don't hesitate to mail me at [mailto:boris.steipe@utoronto.ca boris.steipe@utoronto.ca] or post your question to the [mailto:bch441_2006@googlegroups.com Course Mailing List]
+
If you have any questions at all, don't hesitate to mail me at [mailto:boris.steipe@utoronto.ca boris.steipe@utoronto.ca] or post your question to the [mailto:bch441_2008@googlegroups.com Course Mailing List]

Revision as of 04:57, 29 September 2008

Note! This assignment is currently active. All significant changes will be announced on the mailing list.

 
 

   


   

Assignment 2 - Search, retrieve and annotate

   


Preparation, submission and due date

Read carefully.
Be sure you have understood all parts of the assignment and cover all questions in your answers! Sadly, we always get assignments back in which important aspects have simply been overlooked and marks are unnecessarily lost. Sadly, we always get assignments back in which important aspects have simply been overlooked and marks are unnecessarily lost. If you did not notice that the above sentence was repeated, you are not reading carefully enough.

Review the guidelines for preparation and submission of BCH441 assignments.

The due date for the assignment is Thursday, October 9, at 10:00 in the morning.

   


Your documentation for the procedures you follow in this assignment will be worth 1 mark.


   


Introduction

Baker's yeast, Saccharomyces cerevisiae, is perhaps the most important model organism. It is a eukaryote that has been studied genetically and biochemically in great detail for many decades, and it is easily manipulated with high-throughput experimental methods. We will use information from this model organism to study the conservation of function and sequence in other fungi whose genomes have been completely sequenced; the assignments are an exercise in model-organism reasoning: the transfer of knowledge from one well-studied organism to others.

This and the following assignments will revolve around a transcription factor that plays an important role in the regulation of the cell cycle: Mbp1 is a key component of the MBF complex (Mbp1/Swi6). This complex regulates gene expression at the crucial G1/S-phase transition of the mitotic cell cycle and has been shown to bind to the regulatory regions of more than a hundred target genes.

One would speculate that such central control machinery would be conserved in other fungi and it will be your task in these assignments to collect evidence whether related molecular machinery is present in some of the newly sequenced fungal genomes. Throughout the assignments we will use freely available tools to conduct bioinformatics investigations of questions such as:

  • What functional features can we detect in Mbp1?
  • Do homologous proteins exist in other organisms?
  • Do we believe these homologues may bind to similar sequence motifs?
  • Do we believe they may function in a similar way?
  • Do other organisms appear to have related cell-cycle control systems?


 

  • Access the information page on Mbp1 at the Saccharomyces Genome Database and read the summary paragraph on the protein's function!

(If you would like to brush up on the concepts mentioned above, you could study the corresponding chapter in Lodish's Molecular Cell Biology. It is not strictly necessary to understand the details of the yeast cell-cycle to complete the assignments, but recommended, since it's obviously more fun to work with concepts that actually make some sense.)

In this particular assignment you will go on a search and retrieve mission for information on yeast Mbp1, using common public databases and Web resources.


Retrieve

 

 


Much useful information on yeast Mbp1 is compiled at the SGD information page on Mbp1. However, we don't always have the luxury of such precompiled information. Let's look at the protein and its features "the traditional way".

  • Navigate to the NCBI homepage (you probably have bookmarked it anyway) and enter Mbp1 AND "saccharomyces cerevisiae"[organism] as an Entrez query.
  • Click on Protein and find the RefSeq record for the protein sequence.
  • From the NCBI RefSeq record, obtain a FASTA sequence of the protein and paste it into your assignment.
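
For those who prefer to script the steps above, here is a minimal Python sketch. It assumes the NCBI E-utilities endpoints (ESearch and EFetch) rather than the Web forms described in this assignment; the helper names are made up for illustration, and the code only builds the request URLs and parses FASTA text, so nothing is sent to the NCBI until you open a URL yourself.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities (assumption: you use these
# programmatic endpoints instead of the Entrez Web page).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# The Entrez query given in the assignment text.
QUERY = 'Mbp1 AND "saccharomyces cerevisiae"[organism]'


def esearch_url(term, db="protein", retmax=20):
    """Build an ESearch URL that lists record IDs matching an Entrez query."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_fasta_url(uid, db="protein"):
    """Build an EFetch URL that returns one record as FASTA text."""
    params = {"db": db, "id": uid, "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def parse_fasta(text):
    """Return (header, sequence) for the first record in FASTA text."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("input does not look like FASTA")
    header = lines[0][1:]
    parts = []
    for ln in lines[1:]:
        if ln.startswith(">"):  # stop at the next record
            break
        parts.append(ln)
    return header, "".join(parts)
```

You still have to pick the RefSeq record from the search results yourself; the sketch does not decide which hit is the right one.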

 

 

There are several sources for functional domain annotations of proteins. The NCBI has the Conserved Domain Database; in Europe, the SMART database provides such annotations. In terms of domains, the two resources are very comparable, but SMART also analyses more general features such as low-complexity sequences and coiled coils. In order to use SMART, however, we need the UniProt accession number that corresponds to the RefSeq identifier. In a rational world, one would wish that such important cross-references would simply be provided by the NCBI ... well, we have been wishing this for many years now. Fortunately, ID-mapping services exist.


  • Navigate to the UniProt ID-Mapping service. Enter the RefSeq identifier for the yeast Mbp1 protein and retrieve the corresponding UniProtKB Accession number.
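
The mapping can also be requested programmatically. The sketch below is an assumption-laden illustration, not part of the assignment: it assumes UniProt's REST ID-mapping service at `https://rest.uniprot.org/idmapping/run` and the database names `RefSeq_Protein` and `UniProtKB`, which you should check against the current UniProt documentation. It only prepares the form data for the request; the identifier used in any example call is a placeholder, not the real Mbp1 accession.

```python
from urllib.parse import urlencode

# Assumed endpoint of the UniProt ID-mapping REST service.
IDMAPPING_RUN = "https://rest.uniprot.org/idmapping/run"


def idmapping_form(refseq_ids):
    """Prepare the form fields for a RefSeq -> UniProtKB mapping job."""
    ids = [i.strip() for i in refseq_ids if i.strip()]
    if not ids:
        raise ValueError("at least one RefSeq identifier is required")
    return {
        "from": "RefSeq_Protein",  # assumed source database name
        "to": "UniProtKB",         # assumed target database name
        "ids": ",".join(ids),
    }


def encode_form(form):
    """Encode the form as bytes, ready to be sent as a POST body."""
    return urlencode(form).encode("ascii")
```

Under these assumptions, posting the encoded form to `IDMAPPING_RUN` starts a mapping job whose results are fetched in a separate step; the Web form does the same thing interactively.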

 

 

Analyse

   


Saccharomyces cerevisiae Mbp1 - domain annotations

Now we can analyse Mbp1's domains in SMART, and use this information to annotate the sequence in detail.

  • Navigate to the SMART database, enter the yeast Mbp1 accession number and review the domain features of the protein.
  • In your assignment, highlight the annotated features in the actual sequence by using the SMART annotations.
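
One simple way to highlight features in plain text is to write annotated residues in upper case and everything else in lower case. The following Python sketch shows the idea; the function names, the toy sequence and the feature coordinates are invented for illustration and are not SMART output.

```python
def highlight_features(seq, features):
    """Return seq in lower case, with residues inside features upper-cased.

    features is a list of (name, start, end) tuples using 1-based,
    inclusive residue numbers, as domain databases usually report them.
    """
    marked = list(seq.lower())
    for name, start, end in features:
        if not (1 <= start <= end <= len(seq)):
            raise ValueError(f"feature {name!r} lies outside the sequence")
        for i in range(start - 1, end):  # convert to 0-based indices
            marked[i] = marked[i].upper()
    return "".join(marked)


def wrap(seq, width=60):
    """Break a sequence into lines of fixed width for pasting into a report."""
    return "\n".join(seq[i:i + width] for i in range(0, len(seq), width))


if __name__ == "__main__":
    # Toy example: ten residues with one made-up feature at positions 3-5.
    print(wrap(highlight_features("MSNQIYSARY", [("demo", 3, 5)])))
```

Overlapping features all end up in upper case, so if you need to tell features apart you will want a second marker line or colour in your word processor instead.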

 

 


APSES domains

As you see from the annotations, Mbp1 is a large multidomain protein; it binds DNA through a small domain called the APSES domain and many organisms have more than one transcription factor that has a domain homologous to other APSES domains. Since we are interested in related proteins, and all functional relatives would be expected to share such a DNA binding domain, we should define this domain in more detail in order to be able to use it later to search for homologous proteins in each target organism.

 
Use the NCBI Entrez system to search for the string "apses" in the "Conserved Domains" database and access the entry for the APSES domain. You should find a number of aligned sequences on that page, each with its own GI identifier.

  • Identify the two sequences that come from Saccharomyces cerevisiae (the Mbp1 and Swi4 APSES domains).
  • Check whether the NCBI and the SMART definition of the APSES domain in Mbp1 coincide.
  • Make sure you understand how the sequences displayed on the CDD page and the actual domain sequences differ. Hint: not all sequences are displayed at full length.
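Whether two domain definitions coincide comes down to comparing two residue ranges. A small sketch of that comparison follows; the function name `compare_ranges` and the numbers in the demo are placeholders, and you would substitute the boundaries that you read from the CDD and SMART pages.

```python
def compare_ranges(a, b):
    """Compare two domain definitions given as (start, end) residue numbers.

    Coordinates are 1-based and inclusive. Returns whether the ranges are
    identical, how many residues they share, and how far the boundaries
    of b are shifted relative to a.
    """
    (a_start, a_end), (b_start, b_end) = a, b
    overlap = max(0, min(a_end, b_end) - max(a_start, b_start) + 1)
    return {
        "identical": (a_start, a_end) == (b_start, b_end),
        "overlap": overlap,
        "start_shift": b_start - a_start,
        "end_shift": b_end - a_end,
    }


if __name__ == "__main__":
    # Placeholder coordinates, not the real annotations.
    print(compare_ranges((5, 100), (10, 110)))
```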

 

 

Homologous structure

The presence of a conserved APSES domain demonstrates that the sequences of your protein and all other APSES domains are homologous. We can expect that the structures of all homologous APSES domains are similar, i.e. if the structure of even only one is known, we should be able to infer the approximate three-dimensional structure of any one of them. Indeed, structural information is available for APSES domains!

Identify and download the most appropriate coordinate file to study the structure, function and conservation of APSES domains from the PDB. Your choice could be based on:

  • experimental method (X-ray or NMR)
  • quality of the structure (resolution, refinement)
  • size of the structure (number of amino acids for which structure has been determined)

 

  • Record how you have identified the file, what criteria you have used to define whether it is better suited for analysis than others, and paste the HEADER, TITLE, COMPND and SOURCE records from the file into your assignment.
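The requested records can be copied by hand from the top of the file, or pulled out with a few lines of code. The sketch below relies only on the PDB format convention that the record name occupies the first six columns of each line; the function name `extract_records` and the sample text are placeholders invented for this illustration, not a real PDB entry.

```python
WANTED = ("HEADER", "TITLE", "COMPND", "SOURCE")


def extract_records(pdb_text, wanted=WANTED):
    """Return the lines of a PDB file whose record name is in wanted.

    In the PDB format the record name occupies columns 1 to 6, so the
    first six characters of each line are compared after stripping padding.
    """
    return [
        line.rstrip()
        for line in pdb_text.splitlines()
        if line[:6].strip() in wanted
    ]


# Placeholder text that only mimics the layout of a PDB file.
SAMPLE = """\
HEADER    PLACEHOLDER CLASSIFICATION
TITLE     PLACEHOLDER TITLE
COMPND    MOL_ID: 1;
SOURCE    MOL_ID: 1;
REMARK   2 NOT A REAL ENTRY
"""

if __name__ == "__main__":
    for record in extract_records(SAMPLE):
        print(record)
```

To use it on your downloaded file, read the file contents into a string and pass that string in place of `SAMPLE`.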


DNA binding site

The Mbp1 APSES domain has been shown to bind to DNA, and the residues involved in DNA binding have been characterized (Taylor et al. (2000) Biochemistry 39: 3943-3954 and Deleeuw et al. (2008) Biochemistry 47: 6378-6385). In particular, residues 50-74 have been proposed to comprise the DNA recognition domain.

 

Analysis (1 mark)
  • Using VMD, generate a parallel stereo view of the protein structure that clearly shows the proposed Mbp1 DNA recognition domain, coloured distinctly from the rest of the protein. Use a representation that includes the sidechains.
  • Generate a second VMD stereo image as above, but use a representation that emphasizes the secondary structure of the protein (tube or cartoon representation, colouring by structure).
  • Generate a third VMD stereo image that shows three representations combined: (1) the backbone, (2) the sidechains of residues that presumably contact DNA, distinctly coloured, and (3) a transparent surface of the entire protein. This image should show whether residues annotated as DNA binding form a contiguous binding interface.

Paste the images into your assignment in a compressed format. Briefly(!) summarize the VMD forms and parameters you have used.


DNA binding interfaces are expected to comprise a number of positively charged amino acids that might form salt-bridges with the phosphate backbone.

 

Analysis (2 marks)
  • Report whether this is the case here and which residues might be included.
  • Do the DNA binding residues form a contiguous surface that is compatible with a binding interface? Justify your conclusions.
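A first pass at the question of positive charges can be made from the sequence alone, before looking at the structure. The sketch below lists lysine and arginine residues inside a given residue range; the function name `positive_residues` is made up for this illustration, and it assumes that your sequence string is numbered the same way as the structure file, which you should verify. Whether histidine carries a charge depends on its environment, so it is only included on request.

```python
# One-letter codes of residues that are positively charged at neutral pH.
POSITIVE = {"K": "Lys", "R": "Arg"}


def positive_residues(sequence, start, end, include_his=False):
    """List (position, residue) pairs for positively charged residues.

    start and end are 1-based, inclusive residue numbers. Position 1 is
    assumed to be the first character of sequence.
    """
    charged = dict(POSITIVE)
    if include_his:
        charged["H"] = "His"
    return [
        (pos, charged[aa])
        for pos, aa in enumerate(sequence.upper(), start=1)
        if start <= pos <= end and aa in charged
    ]


if __name__ == "__main__":
    # Placeholder sequence, for demonstration only.
    print(positive_residues("AKRHA", 2, 4))
```

Such a list only nominates candidates. Whether the sidechains actually point towards the DNA and form a contiguous surface has to be judged from the structure itself.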

 

 


The Genome of Interest

Up to now, we have looked at the model-organism gene to obtain a baseline of information we are interested in. To move on, we need to access the genome of an organism we are interested in. In this course, the organism of interest is assigned to you.

The systematic name and strain of a fungus is listed with the project group that you have been assigned to. Navigate to the NCBI homepage → "Genomic Biology" → "Fungal Genomes Central" → "Genome Sequencing Projects". This should take you to a tabular view of ongoing and completed fungal genome sequencing projects. Find your organism name in this table. There may be one or more sequencing projects associated with the organism, but there should be only one project for the specific strain.

Click on the organism name to navigate to the Genome Project information page.

 

  • Review the status of the data you are working with:
    • Is the entire genome available, or only a partial sequence?
    • How many chromosomes does this genome have?
    • What is the status of its genome assembly and annotation?
    • Has the mitochondrial genome been sequenced as well?
    • Why is this organism deemed important enough to be sequenced?

 


[End of assignment]

If you have any questions at all, don't hesitate to mail me at boris.steipe@utoronto.ca or post your question to the Course Mailing List.