BIO Assignment 2 2011

From "A B C"
Revision as of 04:22, 28 September 2008 by WikiSysop

Note! This assignment is currently inactive. Major and minor unannounced changes may be made at any time.

Assignment 2 - Search, retrieve and annotate

Preparation, submission and due date

Read carefully.
Be sure you have understood all parts of the assignment and cover all questions in your answers! Sadly, we always get assignments back in which important aspects have simply been overlooked and marks are unnecessarily lost. Sadly, we always get assignments back in which important aspects have simply been overlooked and marks are unnecessarily lost. If you did not notice that the above sentence was repeated, you are not reading carefully enough.

Review the guidelines for preparation and submission of BCH441 assignments.

The due date for the assignment is Thursday, October 9, at 10:00 in the morning.

Introduction

Baker's yeast, Saccharomyces cerevisiae, is perhaps the most important model organism: it is a eukaryote that has been studied genetically and biochemically in great detail for many decades, and it is easily manipulated with high-throughput experimental methods. We will use information from this model organism to study the conservation of function and sequence in other fungi whose genomes have been completely sequenced. The assignments are thus an exercise in model-organism reasoning: the transfer of knowledge from one well-studied organism to others.

This and the following assignments will revolve around a transcription factor that plays an important role in the regulation of the cell cycle: Mbp1 is a key component of the MBF complex (Mbp1/Swi6). It regulates gene expression at the crucial G1/S-phase transition of the mitotic cell cycle and has been shown to bind to the regulatory regions of more than a hundred target genes.

One would speculate that such central control machinery is conserved in other fungi, and your task in these assignments is to collect evidence on whether related molecular machinery is present in some of the newly sequenced fungal genomes. Throughout the assignments we will use freely available tools to conduct bioinformatics investigations of questions such as:

  • Do homologous proteins exist in other organisms?
  • Do we believe these may bind to similar sequence motifs?
  • Do we believe they may function in a similar way?
  • Do other organisms appear to have related systems?

Access the information page on Mbp1 at the Saccharomyces Genome Database and read the summary paragraph on the protein's function!

(If you would like to brush up on the concepts mentioned above, you could study the corresponding chapter in Lodish's Molecular Cell Biology. It is not strictly necessary to understand the details of the yeast cell cycle to complete the assignments, but it is recommended, since it's obviously more fun to work with concepts that actually make some sense.)

In this particular assignment you will go on a search and retrieve mission for information and annotation of Mbp1 homologues in a fungal genome, using common public databases and Web resources.


Retrieve

The Genome (1 mark)

Access the Organism list to retrieve an organism name for this assignment. Navigate to the NCBI homepage at http://www.ncbi.nlm.nih.gov . Enter the systematic name into the search field, select the taxonomy database and identify the organism that you have been assigned.

  • Record the taxonomy ID for the species or strain(s) that are associated with this organism name
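
If you prefer to script this lookup, the NCBI E-utilities can query the Taxonomy database directly. The sketch below only builds an esearch query URL; it assumes the public endpoint at eutils.ncbi.nlm.nih.gov with its db and term parameters. Opening the URL in a browser should return an XML reply that lists the matching Taxonomy IDs. The organism name used here is just an example, not your assigned organism.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities "esearch" endpoint (assumed; see the E-utilities documentation).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def taxonomy_search_url(organism_name: str) -> str:
    """Build an esearch URL that looks up an organism name in NCBI Taxonomy.

    The server's reply (XML by default) lists the Taxonomy IDs that match.
    """
    params = {"db": "taxonomy", "term": organism_name}
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Example name only; use the organism you were assigned.
    print(taxonomy_search_url("Saccharomyces cerevisiae"))
```

The browser search described above gives the same information; the URL form is merely easier to record in your report.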

Return to the NCBI home page and navigate to the "Genomic Biology" section; continue with the link for Fungi under the Genome Projects Database. This should take you to a table of ongoing and completed fungal genome-sequencing projects. Find your organism name in this table; there may be one or more sequencing projects associated with it.

  • Decide which project is the most suitable one for analysis and record your decision. Report the strain and Taxonomy ID for this organism.

If you can't identify the criteria that make one project more or less useful for your task, or you don't know which ones are more or less important, you are of course welcome to discuss your questions on the list.

Click on the organism name to navigate to the Genome Project information page.

  • Comment briefly on the status of the data you are working with. Is the entire genome available, or only a partial sequence? How many chromosomes does this genome have? What is the status of its genome assembly and annotation? Has the mitochondrial genome been sequenced as well?

APSES domain transcription factors (1 mark)

Mbp1 is a large multidomain protein; it binds DNA through a small domain called the APSES domain. Many organisms have more than one transcription factor that contains an APSES domain. In the assignments, we will analyse how these APSES domains have evolved, to obtain a perspective on the evolution of regulatory systems in general. Accordingly, we should first define an APSES domain sequence and then use it to find all its relatives in each target organism.

Use the NCBI Entrez system to search for the string "apses" in the "Conserved Domains" database and access the entry for the APSES domain. You should find a number of aligned sequences on that page, each with its own GI number.

  • Identify the two sequences that come from Saccharomyces cerevisiae (the Mbp1 and Swi4 APSES domains).

Access the GenPept (NCBI protein) record for the Saccharomyces cerevisiae Mbp1 protein.

  • Obtain the FASTA sequence for the whole, full-length protein, save it and paste it into your assignment.

Working from the APSES domain alignment in CDD, define the sequence of the entire APSES domain in the Mbp1 protein.

  • Save the sequence of the Mbp1 APSES domain in FASTA format (i.e. with an appropriate header line) and paste it into your assignment. Comment on whether and how it differs from the sequence you find on the CDD page.
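
Cutting a domain out of a full-length sequence is easy to get wrong by one residue, because database coordinates are 1-based and inclusive while most programming languages index from 0. A minimal sketch follows, assuming you have read the domain boundaries off the CDD alignment yourself. The function names, the toy sequence, the boundaries and the header text are hypothetical placeholders, not Mbp1 data.

```python
def subsequence(seq: str, start: int, end: int) -> str:
    """Return residues start..end of seq, using 1-based, inclusive
    coordinates as in sequence databases and alignment displays."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates outside the sequence")
    return seq[start - 1:end]


def to_fasta(header: str, seq: str, width: int = 60) -> str:
    """Format a sequence as FASTA: a '>' header line followed by the
    sequence split into lines of at most `width` residues."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    protein = "MKTAYIAKQRQISFVKSHFSRQ"  # toy sequence, NOT Mbp1
    domain = subsequence(protein, 3, 10)  # hypothetical domain boundaries
    print(to_fasta("toy_protein_3-10 hypothetical domain", domain, width=5))
```

The 60-residue line width is a common convention; FASTA readers generally accept other widths as well.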

Navigate back to the Genome Project Database table for fungi and click on the "B" link next to your organism. This takes you to a page with a BLAST search form. Run a BLAST search with the full-length Saccharomyces cerevisiae Mbp1 protein sequence against the proteins of your organism only!

  • Record the parameters you have used for the search and the relevant search results.
  • List the accession numbers and names of all putative homologues. How many are there? How many (if any) do you expect? What do you conclude?
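
If your BLAST interface offers a tabular download of the hits, it can be less error-prone than copying them by hand. The sketch below assumes the standard 12-column, tab-separated BLAST layout (the same as BLAST+ -outfmt 6) and keeps each subject sequence once, with its best E-value. The cutoff of 1e-5 is an arbitrary example, not a recommendation: choosing and justifying a significance threshold is part of the assignment.

```python
import csv
import io

# Column names of the standard 12-column BLAST tabular format (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def significant_subjects(table_text: str, max_evalue: float = 1e-5) -> list:
    """Return (subject ID, best E-value) pairs at or below max_evalue,
    sorted from most to least significant."""
    best = {}
    for row in csv.reader(io.StringIO(table_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        subject, evalue = hit["sseqid"], float(hit["evalue"])
        # A subject can appear in several rows (one per HSP); keep its best E-value.
        if evalue <= max_evalue and evalue < best.get(subject, float("inf")):
            best[subject] = evalue
    return sorted(best.items(), key=lambda item: item[1])
```

For example, significant_subjects(open("hits.txt").read()) would list the hits from a saved table; the file name is a placeholder.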

Run a second BLAST search using only the Saccharomyces cerevisiae Mbp1 APSES domain sequence.

  • Record the parameters you have used for the search and the relevant search results.
  • List the accession numbers and names of all putative homologues. How many are there? Are the results different from your previous search? How? What do you conclude?
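
Comparing the two searches amounts to comparing two sets of subject identifiers. Here is a minimal sketch; the function name is hypothetical and the identifiers in the example are placeholders.

```python
def compare_hit_lists(full_length_hits, domain_hits) -> dict:
    """Split the subjects of two BLAST searches into shared and unique hits."""
    full, domain = set(full_length_hits), set(domain_hits)
    return {
        "both": sorted(full & domain),
        "full_length_only": sorted(full - domain),
        "domain_only": sorted(domain - full),
    }


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    print(compare_hit_lists(["hitA", "hitB"], ["hitB", "hitC"]))
    # {'both': ['hitB'], 'full_length_only': ['hitA'], 'domain_only': ['hitC']}
```

Hits found only with the full-length query, or only with the domain query, are the ones worth discussing in your answer.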

(Please contact me immediately in case you cannot find any significant alignments - you cannot continue with the assignment if you get stuck at this point.)

Align

Sequences and accession numbers (1 mark)

Retrieve the entire protein sequences of the significant hits that you found with the APSES domain search in your organism. The easiest way to do this is to click on the links on the BLAST results page. The NCBI does most of its internal cross-referencing with GI numbers; however, these are less useful for cross-referencing to other databases.

Find and record

  • the GI number;
  • the GenPept accession number(s) (NCBI);
  • the RefSeq identifier(s), if available;
  • the UniProt accession number(s) (EBI);
  • and, as always, report what you have done to find this information.
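
The identifier types listed above have characteristic formats, which helps when sorting out what a database record is showing you. The sketch below classifies an identifier by its format alone. The patterns are simplified assumptions: RefSeq protein prefixes such as NP_ and XP_, the accession format documented by UniProt, GenBank/GenPept protein accessions of three letters followed by five or seven digits, and all-digit GI numbers. A match says nothing about whether such a record actually exists.

```python
import re

# Simplified format patterns (assumptions, see above), checked in this order.
ID_PATTERNS = [
    # RefSeq protein: two-letter prefix ending in P, underscore, digits, optional version.
    ("RefSeq", re.compile(r"[ANXYWZ]P_\d+(\.\d+)?")),
    # UniProt accession format (6 or 10 characters).
    ("UniProt", re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}")),
    # GenBank/GenPept protein: three letters + five or seven digits, optional version.
    ("GenPept", re.compile(r"[A-Z]{3}\d{5}(\d{2})?(\.\d+)?")),
    # GI number: digits only.
    ("GI", re.compile(r"\d+")),
]


def classify_identifier(identifier: str) -> str:
    """Guess which kind of identifier this is from its format alone."""
    identifier = identifier.strip()
    for name, pattern in ID_PATTERNS:
        if pattern.fullmatch(identifier):
            return name
    return "unknown"
```

The order of the checks matters only for robustness; with these patterns no string matches more than one type.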

You can find these database identifiers in some cases when the appropriate cross-references have been entered into the annotations. Alternatively, you can run a BLAST search with a sequence to find the exact same sequence in the other database. Depending on what you are looking for, search either at the NCBI or at the EBI. But (!) restrict the search to your assigned organism! You will have to figure out how to do that and, of course, report the parameters you have used. I know that this BLAST method appears to be a maximally inefficient way to retrieve a cross-reference for a sequence. However, the databases frequently do not provide the appropriate cross-references, and the detour through a BLAST search is the only practical way to get them. Most unfortunate.
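
At the NCBI, one way to restrict a BLAST search to a single organism is an Entrez query on the taxonomy ID, such as txid4932[Organism:exp] for Saccharomyces cerevisiae; the organism or Entrez query fields of the Web form serve the same purpose. The sketch below only builds a submission URL and assumes the parameter names of the NCBI BLAST URL API (CMD, PROGRAM, DATABASE, QUERY, ENTREZ_QUERY); check them against the current NCBI documentation before relying on them. As far as that API is documented, a CMD=Put request returns a request ID (RID) that is then used with CMD=Get to fetch the results.

```python
from urllib.parse import urlencode

# NCBI BLAST URL API endpoint (assumed; see the NCBI BLAST documentation).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_submit_url(query: str, taxid: int, database: str = "nr") -> str:
    """Build a URL that submits a blastp search restricted to one taxon.

    query may be a raw sequence or an accession; taxid is an NCBI Taxonomy ID.
    """
    params = {
        "CMD": "Put",
        "PROGRAM": "blastp",
        "DATABASE": database,
        "QUERY": query,
        # Entrez query: keep only hits from this taxon and its descendants.
        "ENTREZ_QUERY": f"txid{taxid}[Organism:exp]",
    }
    return f"{BLAST_URL}?{urlencode(params)}"
```

Taxonomy ID 4932 (Saccharomyces cerevisiae) is only an example; substitute the ID you recorded for your organism, and choose the database (e.g. refseq_protein instead of nr) according to the identifier you are after.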

Discovering a significantly better (or at least significantly more interesting) way to obtain the cross-references and being the first one to post it on the mailing list probably will merit a bonus point.

Sequence alignment (1 mark)

Retrieve the FASTA sequence of the protein in your organism that you have found to be most similar to yeast Mbp1.

Use an online tool to generate an optimal full-length (global) alignment between the most similar protein and S. cerevisiae Mbp1. (BLAST does not generate optimal alignments! Use the correct one of the EMBOSS tools instead.) You have to figure out where to find a Web service that does such alignments, which algorithm to use and how to define reasonable parameters for the alignment.

  • Report your procedure, parameters, alignment and results, and comment on the quality of the alignment. Is the protein a full-length homologue of Mbp1?
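
To see what an optimal global alignment means in practice, here is a minimal Needleman-Wunsch sketch. It is deliberately simplified: it scores identities and mismatches with fixed values and uses a single linear gap penalty, whereas the EMBOSS tools score protein alignments with a substitution matrix and separate gap-opening and gap-extension penalties. It illustrates the dynamic-programming idea only and is not a substitute for the Web tool; the scoring values are arbitrary assumptions.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> tuple:
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the optimal score and the two aligned (gapped) strings.
    """
    n, m = len(a), len(b)

    def pair(i: int, j: int) -> int:
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),  # (mis)match
                              score[i - 1][j] + gap,              # gap in b
                              score[i][j - 1] + gap)              # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

For example, global_align("ACGT", "AGT") returns (1, 'ACGT', 'A-GT'): every residue of both sequences appears in the result, which is what distinguishes a global alignment from the local alignments BLAST reports.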


Analyse

Sequence annotation (2 marks)

Annotate the amino acid sequence of your organism's Mbp1 homologue with the following online tools (most of these can be found via links on http://www.expasy.org/):

  • predicted molecular weight
  • presence of transmembrane helices (TMpred or TMHMM)
  • presence of internal repeats
  • presence of signal sequences (SignalP 3.0)
  • prediction for localization of the protein (PSORT II - make sure you use the right psort program!)
  • prediction of functional motifs and patterns (ScanProsite or InterproScan)
  • coiled coils and leucine zippers (2Zip server)
  • RPS BLAST

Briefly state for each analysis your procedure, the important results (if not obvious from the output), what the results mean, and whether your results are consistent with your expectations about this protein.
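
The predicted molecular weight is the simplest of these annotations and can be checked by hand: for an unmodified chain it is the sum of the average residue masses plus one water molecule for the free termini. The sketch below assumes the 20 standard residues only (no ambiguity codes, no modifications); its result should agree closely with what the ExPASy tools report, but treat the Web tool's value as the one to cite.

```python
# Average residue masses in daltons (free amino acid minus one water),
# as commonly tabulated for average-mass calculations.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # average mass of H2O, added once for the chain termini


def molecular_weight(sequence: str) -> float:
    """Average molecular weight (Da) of an unmodified polypeptide."""
    sequence = sequence.upper()
    unknown = set(sequence) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + WATER
```

A quick sanity check: a single glycine gives about 75.07 Da, the molecular weight of free glycine.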

Homologous structure (1 mark)

The presence of a conserved APSES domain demonstrates that the APSES domain of your protein is homologous to all other APSES domains. We can expect the structures of all homologous APSES domains to be similar; i.e. if the structure of even one of them is known, we should be able to infer the approximate three-dimensional structure of any of the others. Indeed, structural information is available for APSES domains!

Identify the most appropriate coordinate file in the PDB for studying the structure, function and conservation of APSES domains, and download it.

  • Record how you have identified the file, what criteria you have used to define whether it is better suited for analysis than others, and paste the HEADER, TITLE, COMPND and SOURCE records from the file into your assignment.
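
The records requested above can be pulled out of a downloaded coordinate file with a few lines of code. In the PDB file format the record name occupies columns 1-6 of each line, and the sketch relies on that alone. The function name is hypothetical.

```python
# Title-section records requested in the assignment.
WANTED_RECORDS = ("HEADER", "TITLE", "COMPND", "SOURCE")


def header_records(pdb_text: str, records=WANTED_RECORDS) -> list:
    """Return all lines whose record name (columns 1-6) is in records.

    Continuation lines (e.g. a multi-line TITLE) carry the same record
    name, so they are kept too.
    """
    wanted = set(records)
    return [line for line in pdb_text.splitlines() if line[:6].strip() in wanted]
```

For example, "\n".join(header_records(open("apses.pdb").read())) gives the text to paste; the file name is a placeholder for whichever entry you chose.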

DNA binding site (3 marks)

The Mbp1 APSES domain has been shown to bind to DNA, and the residues involved in DNA binding have been characterized (Taylor et al. (2000) Biochemistry 39: 3943-3954). In particular, residues 50-74 have been proposed to comprise the DNA recognition domain.

  • Using VMD, generate a parallel stereo view of the protein structure that clearly shows the proposed Mbp1 DNA recognition domain coloured distinctly from the rest of the protein. Use a representation that includes the sidechains.
  • Generate a second VMD stereo image as above, but use a representation that emphasizes the protein's secondary structure.
  • Generate a third VMD stereo image that shows three representations combined: (1) the backbone, (2) the sidechains of residues that presumably contact DNA, distinctly coloured, and (3) a transparent surface of the entire protein. This image should show whether the residues annotated as DNA binding form a contiguous binding interface.

Paste the images into your assignment in a compressed format (not Windows BMP!): use medium-resolution JPEG, PNG or LZW-compressed TIFF. Briefly(!) summarize the VMD forms and parameters you have used.


DNA binding interfaces are expected to include a number of positively charged amino acids that might form salt bridges with the phosphate backbone.

  • Report whether this is the case here and which residues might be included.
  • Do the DNA binding residues form a contiguous surface that is compatible with a binding interface?
  • Consider the surface-exposed residues that could form part of the DNA binding interface of Mbp1 (i.e. the cationic residues you have described above and the exposed sidechains in between): are they conserved between Mbp1 and your protein?
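
The conservation question can be approached systematically from the pairwise alignment of Mbp1 with your protein. The sketch below lists every Lys and Arg within a residue range of the reference sequence (for example residues 50-74 of Mbp1) and reports what is aligned to each of them in the other sequence. It assumes gaps are written as "-" and that the residue numbering you pass in matches the numbering used in the paper or in the PDB file; the function names are hypothetical.

```python
CATIONIC = set("KR")  # Lys and Arg; His is mostly uncharged at neutral pH, so it is left out


def residue_to_column(aligned: str, number: int, numbering_start: int = 1) -> int:
    """Return the 0-based alignment column of residue `number` in a gapped
    sequence whose first residue carries the number numbering_start."""
    count = numbering_start - 1
    for column, char in enumerate(aligned):
        if char != "-":
            count += 1
            if count == number:
                return column
    raise ValueError(f"residue {number} is not in the aligned sequence")


def cationic_conservation(aligned_ref: str, aligned_other: str, first: int,
                          last: int, numbering_start: int = 1) -> list:
    """For every Lys/Arg among residues first..last of the reference sequence,
    report the residue aligned to it in the other sequence."""
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned sequences must have equal length")
    report = []
    for number in range(first, last + 1):
        column = residue_to_column(aligned_ref, number, numbering_start)
        ref_aa, other_aa = aligned_ref[column], aligned_other[column]
        if ref_aa not in CATIONIC:
            continue
        if other_aa == ref_aa:
            status = "identical"
        elif other_aa in CATIONIC:
            status = "conservative"  # K <-> R keeps the positive charge
        else:
            status = "changed"
        report.append((number, ref_aa, other_aa, status))
    return report
```

With hypothetical variable names, cationic_conservation(mbp1_aligned, homologue_aligned, 50, 74) would produce the table of positions to discuss; the surface exposure of each residue still has to be judged from the structure.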

[End of assignment]

If you have any questions at all, don't hesitate to mail me at boris.steipe@utoronto.ca or post your question to the Course Mailing List.