BIO Assignment 2 2011


Assignment 2 - Search, retrieve and annotate

Note: This assignment is currently inactive. Unannounced changes may be made at any time.

Introduction

Baker's yeast, Saccharomyces cerevisiae, is perhaps the most important model organism since it is a eukaryote that has been studied genetically and biochemically in great detail for many decades and it is easily manipulated with high-throughput experimental methods. We will use information from this model organism to study the conservation of function and sequence in other fungi whose genomes have been completely sequenced. This and the following assignments will revolve around a transcription factor that plays an important role in the regulation of the cell cycle: Mbp1, a key component of the MBF complex (Mbp1/Swi6) that regulates gene expression at the crucial G1/S-phase transition of the mitotic cell cycle and has been shown to bind to the regulatory regions of more than a hundred target genes.

One would assume that such control machinery would be conserved in other fungi, and it will be your task in these assignments to collect evidence on whether related molecular machinery is present in some of the newly sequenced fungal genomes.

(If you need to brush up on the concepts mentioned above, you could study the corresponding chapter in Lodish's Molecular Cell Biology. It is not strictly necessary to understand the details of the yeast cell cycle to complete the assignments, but it is highly recommended if all of this is to make sense.)

In this particular assignment you will go on a search and retrieve mission for information and annotation of Mbp1 homologues in a fungal genome, using common public databases and Web resources.


Preparation, submission and due date

Read carefully. Be sure you have understood all parts of the assignment and cover all questions in your answers! Sadly, we always get assignments back in which important aspects have simply been overlooked and marks are unnecessarily lost. Sadly, we always get assignments back in which important aspects have simply been overlooked and marks are unnecessarily lost. If you did not notice that the above sentence was repeated, you are not reading carefully enough.

Prepare a Microsoft Word document with a title page that contains:

  • your full name
  • your Student ID
  • your e-mail address
  • the organism name you have been assigned (see below)

Follow the steps outlined below. You are encouraged to write your answers in short answer form or point form, like you would document an analysis in a laboratory notebook. However, you must

  • document what you have done,
  • note what Web sites and tools you have used,
  • paste important data (sequences, alignments, other information, etc.).

If you do not document the process of your work, we will deduct marks. Try to be concise, not wordy! Use your judgement: are you giving us enough information so we could exactly reproduce what you have done?

Write your answers into separate paragraphs and give each its own title. Save your document with a filename of: A2_{lastname}.{firstname}.doc (for example, my first assignment would be named: A2_steipe.boris.doc - and don't include the brackets this time, please!)
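As a quick check of the naming convention, the filename can be assembled programmatically. The following is only an illustrative sketch: the helper name assignment_filename is hypothetical, and lowercasing the name parts is an assumption based on the example filename above.

```python
def assignment_filename(lastname: str, firstname: str, number: int = 2) -> str:
    """Build a filename of the form A2_lastname.firstname.doc."""
    # Assumption: names are lowercased and surrounding whitespace is dropped,
    # mirroring the example filename given in the text.
    last = lastname.strip().lower()
    first = firstname.strip().lower()
    # The curly brackets in the template are placeholders and are not emitted.
    return f"A{number}_{last}.{first}.doc"


print(assignment_filename("Steipe", "Boris"))
```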

Finally, e-mail the document to boris.steipe@utoronto.ca before the due date.

Your document must not contain macros. Please turn off and/or remove all macros from your Word document; we will disable macros, since they pose a security risk.

With the number of students in the course, we have to economize on processing the assignments. Thus we will not accept assignments that are not prepared as described above. If you have technical difficulties, contact me.

The due date for the assignment is Thursday, October 19, at 10:00 in the morning.

Grading

Don't wait until the last day to find out there are problems! Assignments that are received past the due date will have one mark deducted at the first minute of every twelve-hour period past the due date. Assignments received more than 5 days past the due date will not be assessed.
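The late-penalty rule can be made concrete with a short calculation. This is a sketch of one reading of the rule, not an official grading script: the function name late_deduction is hypothetical, and it assumes that a submission arriving exactly at the deadline counts as on time and that every started twelve-hour period costs one mark.

```python
from datetime import timedelta
from typing import Optional

PERIOD = timedelta(hours=12)   # one mark is deducted per started period
CUTOFF = timedelta(days=5)     # later submissions are not assessed


def late_deduction(delay: timedelta) -> Optional[int]:
    """Return the marks deducted for a delay, or None if not assessed."""
    if delay <= timedelta(0):
        return 0                      # assumption: at or before the deadline
    if delay > CUTOFF:
        return None                   # more than 5 days late
    # Ceiling division: a period counts from its first minute onwards.
    return -(-delay // PERIOD)


print(late_deduction(timedelta(minutes=1)))
print(late_deduction(timedelta(hours=12, minutes=1)))
```

Under these assumptions, a submission one minute late already loses one mark, and one that is twelve hours and one minute late loses two.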

Marks are noted below in the section headings of the tasks. A total of 10 marks will be awarded if your assignment answers all of the questions. A total of 2 bonus marks (up to a maximum of 10 overall) can be awarded for particularly interesting findings or insightful comments. A total of 2 marks can be subtracted for lack of form or for glaring errors. The marks you receive will

  • count directly towards your final marks at the end of term, for BCH441 (undergraduates), or
  • be divided by two for BCH1441 (graduates).
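The arithmetic of the marking scheme can be sketched as follows. The function name final_contribution is hypothetical, and the order of operations is an assumption, since the text does not say whether the cap of 10 is applied before or after deductions. Here the cap applies to task marks plus bonus, and the deduction for form or errors is subtracted afterwards.

```python
def final_contribution(task_marks: float, bonus: float = 0.0,
                       deduction: float = 0.0, graduate: bool = False) -> float:
    """Combine task marks, bonus and deductions into the course contribution."""
    bonus = min(max(bonus, 0.0), 2.0)           # at most 2 bonus marks
    deduction = min(max(deduction, 0.0), 2.0)   # at most 2 marks subtracted
    # Assumption: the cap of 10 applies to task marks plus bonus,
    # and deductions are taken off afterwards (never below zero).
    total = max(min(task_marks + bonus, 10.0) - deduction, 0.0)
    # BCH1441 (graduate) marks are divided by two; BCH441 marks count directly.
    return total / 2 if graduate else total


print(final_contribution(9, bonus=2))
print(final_contribution(9, bonus=2, deduction=1, graduate=True))
```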

   


Retrieve

   

The Genome (1 mark)

Access the Organism list to retrieve an organism name for this assignment. Navigate to the NCBI homepage at http://www.ncbi.nlm.nih.gov and enter the systematic name into the search field, select the taxonomy database and identify the organism that you have been assigned. If there is a "Taxonomy Browser" page for your organism, you may find a link to Genome Projects on that page. If not ...

... enter the NCBI Web-site and navigate to the "Genomic Biology" pages, continue with the link for Fungi under the Genome Projects Database. This should take you to a tabular view of ongoing and completed fungal genome sequencing projects.

Follow the link attached to the organism name to navigate to the Genome Project information.

Either way you should arrive at the Genome Project page for your organism.

 

  • Report the taxonomy ID for the strain whose genome has been sequenced (found on one of the previous pages).
  • Comment briefly on the status of the data you are working with: Is the entire genome available, or only a partial sequence? How many chromosomes does this genome have? What is the status of its genome assembly and annotation? Has the mitochondrial genome been sequenced as well?
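The taxonomy lookup described above is meant to be done through the Web interface, but the same query can also be expressed as a URL for NCBI's E-utilities. The sketch below only builds the query URL and makes no network request; the function name taxonomy_search_url is hypothetical, and Saccharomyces cerevisiae stands in for your assigned organism.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities "esearch" service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def taxonomy_search_url(organism: str) -> str:
    """Return an esearch URL that queries the taxonomy database by name."""
    # db selects the Entrez database; term is the search string.
    params = {"db": "taxonomy", "term": organism}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


print(taxonomy_search_url("Saccharomyces cerevisiae"))
```

Opening the resulting URL in a browser should return an XML document listing the matching taxonomy identifiers. You still need to document in your answer which pages and tools you actually used.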

 

 

APSES domain transcription factors (1 mark)


Mbp1 is a large multidomain protein; however, it and its relatives bind DNA through a small domain called the APSES domain. In the assignments, we will analyse how these APSES domains have evolved, since this gives us a picture of the evolution of regulatory systems in general. Accordingly, we should first define an APSES domain sequence and then use it to find all its relatives in each target organism.

Use the Entrez system to search for the string "apses" in the "Conserved Domains" database. You should find a number of aligned sequences on that page, each with its own GI identifier. Two of these sequences are from Saccharomyces cerevisiae (the Mbp1 and Swi4 APSES domains). Identify these two.

Navigate back to the Genome Project Database table for fungi and click on the "B" link next to your organism. This takes you to a page with a BLAST search form. Run a BLAST search with the Saccharomyces cerevisiae Mbp1 APSES domain sequence against the proteins encoded by your assigned organism's genome.


  • Record the parameters you have used for the search and the relevant search results.
  • List the accession numbers and names of all putative homologues. How many are there? How many (if any) do you expect? What do you conclude?
(Please contact me immediately in case you cannot find any significant alignments - you cannot continue with the assignment if you get stuck at this point.)
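If you export your BLAST hits as a tab-separated hit table, a few lines of code can help you list the significant ones. The sketch below assumes the standard 12-column tabular layout (query id, subject id, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, bit score). The names Hit and significant_hits are hypothetical, the E-value cutoff of 1e-5 is an arbitrary illustrative choice, and the sample rows are invented placeholders, not real search results.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    subject_id: str
    percent_identity: float
    evalue: float
    bit_score: float


def significant_hits(tabular_text: str, max_evalue: float = 1e-5) -> List[Hit]:
    """Parse 12-column tabular BLAST output and keep hits below the cutoff."""
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                       # skip blank and comment lines
        fields = line.split("\t")
        # Columns 2, 3, 11 and 12: subject id, % identity, E-value, bit score.
        hit = Hit(fields[1], float(fields[2]),
                  float(fields[10]), float(fields[11]))
        if hit.evalue <= max_evalue:
            hits.append(hit)
    return hits


# Invented placeholder rows for illustration only (not real BLAST results).
SAMPLE = (
    "query\tsubject_A\t62.5\t96\t36\t0\t5\t100\t12\t107\t3e-30\t115\n"
    "query\tsubject_B\t28.0\t50\t36\t0\t5\t54\t200\t249\t2.1\t25.0\n"
)

for hit in significant_hits(SAMPLE):
    print(hit.subject_id, hit.evalue)
```

Whatever cutoff you choose, report it together with the other search parameters, and justify why you consider the remaining hits to be homologues.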

Align

   

Sequences and accession numbers (1 mark)

Retrieve the entire protein sequences for your highly significant hits from the links on the BLAST results page. Find and list

  • the GenPept accession number(s) (NCBI),
  • the RefSeq identifier(s) if available,
  • and the UniProt accession number(s) (EBI).

For each of the homologous sequences, find these database identifiers

  • either by identifying the appropriate cross-references from the annotations
  • or by running a BLAST search against the sequence database. Depending on what you have and what you are looking for, either search at the NCBI or at the EBI. But (!) restrict the search to your assigned organism! You will have to figure out how to do that and of course report the parameters you have used. I know that this BLAST method appears to be a maximally inefficient way to retrieve a cross-reference for a sequence. However, frequently the databases are simply not providing the appropriate cross-references and the detour through a BLAST search is the only practical way to get them. Most unfortunate.

Discovering a significantly better (or at least significantly more interesting) way to obtain the cross-references and being the first one to post it on the list probably will merit a bonus point.

Sequence alignment (1 mark)


Analyse

   

Sequence annotation (2 marks)


Homologous structure (1 mark)


DNA binding site (3 marks)


[End of assignment]

If you have any questions at all, don't hesitate to mail me at boris.steipe@utoronto.ca or post your question to the Course Mailing List.