BIO Assignment 3 2011


Revision as of 19:43, 15 November 2006

   

Assignment 3 - Multiple Sequence Alignment

Please note: This assignment is currently inactive. Unannounced changes may be made at any time.  


 

Introduction

A careful multiple sequence alignment is a cornerstone for the annotation of a gene or protein...


Preparation, submission and due date

Read carefully. Be sure you have understood all parts of the assignment and cover all questions in your answers! Sadly, we always get assignments back in which people have simply overlooked crucial questions. Sadly, we always get assignments back in which people have not described procedural details. If you did not notice that the above were two different sentences, you are still not reading carefully enough.

Prepare a Microsoft Word document with a title page that contains:

  • your full name
  • your Student ID
  • your e-mail address
  • the organism name you have been assigned (see below)

Follow the steps outlined below. You are encouraged to write your answers in short-answer form or point form, as you would document an analysis in a laboratory notebook. However, you must

  • document what you have done,
  • note what Web sites and tools you have used,
  • paste important data (sequences, alignments, information etc.).

If you do not document the process of your work, we will deduct marks. Try to be concise, not wordy! Use your judgement: are you giving us enough information that we could exactly reproduce what you have done? If not, we will deduct marks. Avoid RTF and unnecessary formatting. Do not paste screen dumps. Keep the size of your submission below 1.5 MB.

Write your answers in separate paragraphs and give each its own title. Save your document with the filename A3_family name.given name.doc (for example, my assignment would be named A3_steipe.boris.doc - and please don't switch the order of your given name and family name!)

Finally, e-mail the document to boris.steipe@utoronto.ca before the due date.

Your document must not contain macros. Please turn off and/or remove all macros from your Word document; we will disable macros, since they pose a security risk.

With the number of students in the course, we have to economize on processing the assignments. Thus we will not accept assignments that are not prepared as described above. If you have technical difficulties, contact me.

The due date for the assignment is XXXXX at 10:00 in the morning.

Grading

Don't wait until the last day to find out there are problems! For assignments received past the due date, one mark will be deducted at the first minute of every twelve-hour period past the due date. Assignments received more than 5 days past the due date will not be assessed.
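The late-penalty rule can be made concrete with a short calculation. The sketch below is only an illustration and not the official grading procedure. It assumes that lateness is measured in minutes and that a mark is charged as soon as any part of a new twelve-hour period has begun.

```python
import math
from typing import Optional

PERIOD_MINUTES = 12 * 60       # length of one twelve-hour period
CUTOFF_MINUTES = 5 * 24 * 60   # beyond 5 days the assignment is not assessed


def late_deduction(minutes_late: float) -> Optional[int]:
    """Marks deducted for a submission received `minutes_late` after the due date.

    Illustrative assumption: a deduction is charged as soon as a twelve-hour
    period has started, so lateness of up to 12 h costs 1 mark, more than
    12 h and up to 24 h costs 2 marks, and so on.
    Returns None if the submission is more than 5 days late (not assessed).
    """
    if minutes_late <= 0:
        return 0                                   # on time
    if minutes_late > CUTOFF_MINUTES:
        return None                                # not assessed at all
    return math.ceil(minutes_late / PERIOD_MINUTES)
```

For example, a submission that is 30 hours late falls into the third twelve-hour period, so `late_deduction(30 * 60)` returns 3.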

Marks for each task are noted below in the section headings. A total of 10 marks will be awarded if your assignment answers all of the questions. Up to 2 bonus marks (to a maximum of 10 overall) can be awarded for particularly interesting findings or insightful comments. Up to 2 marks can be subtracted for lack of form or for glaring errors. The marks you receive will

  • count directly towards your final marks at the end of term, for BCH441 (undergraduates), or
  • be divided by two for BCH1441 (graduates).
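The text does not fully specify how regular, bonus and deducted marks combine, so this sketch states its assumptions explicitly. It applies the 10-mark cap after adding the bonus and subtracts the form/error deduction afterwards. It also does not let the total fall below zero. The function name and parameters are hypothetical.

```python
def final_marks(base: float, bonus: float = 0.0, deduction: float = 0.0,
                graduate: bool = False) -> float:
    """Hypothetical combination of the marking rules above.

    Assumptions (not stated in the assignment): the 10-mark cap is applied
    after the bonus is added, the deduction is subtracted after the cap,
    and the result is clamped at zero.
    """
    if not 0 <= base <= 10:
        raise ValueError("base marks must be between 0 and 10")
    bonus = min(max(bonus, 0.0), 2.0)          # at most 2 bonus marks
    deduction = min(max(deduction, 0.0), 2.0)  # at most 2 marks subtracted
    marks = min(base + bonus, 10.0) - deduction
    marks = max(marks, 0.0)
    return marks / 2 if graduate else marks    # BCH1441: divided by two
```

Under these assumptions, `final_marks(9, 2, 1)` gives 9.0 because the bonus is capped at 10 before the deduction. The same call with `graduate=True` gives 4.5.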

   

Retrieve

   

In Assignment 2 you retrieved the Saccharomyces cerevisiae Mbp1 protein sequence. Here I have compiled the most similar homologues from the organisms you have studied:

Source sequences

 

In your second assignment, the following proteins were found as the best matches to the yeast Mbp1 protein. Since there was some variation in the sequences you reported, I have generated the list de novo. I BLASTed against the RefSeq proteins of fungal genomes and kept only the top hit from each species. For each hit, I verified with a BLINK search which yeast protein was its best match (last column of the table). The UniProt accessions for all sequences were obtained with a single query to the new UniProt ID mapping service.


Organism                   Code   GI         RefSeq        UniProt accession  Most similar yeast gene
Aspergillus fumigatus      ASPFU  70986922   XP_748947     Q4WGN2             Mbp1
Aspergillus nidulans       ASPNI  67525393   XP_660758     Q5B8H6             Mbp1
Aspergillus terreus        ASPTE  115391425  XP_001213217  Q0CQJ5             Mbp1
Candida albicans           CANAL  68465419   XP_723071     Q5ANP5             Mbp1
Candida glabrata           CANGL  50286059   XP_445458     Q6FWD6             Mbp1
Cryptococcus neoformans    CRYNE  58266778   XP_570545     Q5KHS0             Mbp1
Debaryomyces hansenii      DEBHA  50420495   XP_458784     Q6BSN6             Mbp1
Eremothecium gossypii      EREGO  45199118   NP_986147     Q752H3             Mbp1
Gibberella zeae            GIBZE  46116756   XP_384396     Q4IEY8             Mbp1
Kluyveromyces lactis       KLULA  50308375   XP_454189     P39679             Mbp1
Magnaporthe grisea         MAGGR  39964664   XP_365024     ACC                Mbp1*
Neurospora crassa          NEUCR  85109541   XP_962967     Q7SBG9             Mbp1
Saccharomyces cerevisiae   SACCE  6320147    NP_010227     P39678             Mbp1
Schizosaccharomyces pombe  SCHPO  19113944   NP_593032     P41412             Mbp1
Ustilago maydis            USTMA  71024227   XP_762343     Q4P117             Mbp1
Yarrowia lipolytica        YARLI  50545439   XP_500257     Q6CGF5             Mbp1

* Note: The Magnaporthe grisea sequence is a full-length homologue; however, its C-terminal half is more similar to Swi6 than to Mbp1.
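The selection procedure described above amounts to a reciprocal-best-hit filter. It keeps the top hit per species and then checks which yeast protein each hit matches best. The sketch below assumes the forward BLAST hits and the reverse (BLINK) results have already been collected into plain Python data structures. It does not call any NCBI or UniProt service. The accession names in the example are made-up placeholders.

```python
from typing import Dict, List, Tuple

# (species code, accession, bit score) from a BLAST search with yeast Mbp1
Hit = Tuple[str, str, float]


def top_hit_per_species(hits: List[Hit]) -> Dict[str, Hit]:
    """Keep only the highest-scoring hit for each species."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        species, _, score = hit
        if species not in best or score > best[species][2]:
            best[species] = hit
    return best


def confirmed_orthologues(hits: List[Hit],
                          best_yeast_match: Dict[str, str],
                          query_gene: str = "Mbp1") -> Dict[str, str]:
    """Return {species: accession} for top hits whose own best match in
    yeast (assumed to be read off a reverse search) is the query gene."""
    result: Dict[str, str] = {}
    for species, (_, accession, _) in top_hit_per_species(hits).items():
        if best_yeast_match.get(accession) == query_gene:
            result[species] = accession
    return result
```

With toy input, a species is kept only if its top hit maps back to Mbp1. A top hit whose best yeast match is, say, Swi4 is dropped rather than replaced by the next-best hit.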



 

  • Download the sequences and generate a multi-FASTA file.
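Once the sequences have been downloaded, writing them out as multi-FASTA is simple. This is a minimal sketch that assumes the sequences are already available as strings and that the five-letter organism codes from the table serve as headers. Wrapping at 60 residues per line is a common convention, not a requirement stated here.

```python
from typing import Iterable, Tuple


def to_multi_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Format (header, sequence) pairs as multi-FASTA text.

    Each record starts with a '>' header line; the sequence is stripped of
    whitespace, upper-cased and wrapped to `width` residues per line.
    """
    lines = []
    for header, sequence in records:
        lines.append(">" + header)
        seq = "".join(sequence.split()).upper()   # drop spaces and newlines
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n" if lines else ""
```

For example, `to_multi_fasta([("SACCE", sequence)])` would produce one record, where `sequence` is the downloaded NP_010227 text. Pasted sequences that contain line breaks are joined before re-wrapping.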

 

But Mbp1 orthologues are not the only proteins that contain APSES domains. To find the others, a PSI-BLAST search was performed with the yeast Mbp1 APSES domain, defined as follows:

>Yeast Mbp1 APSES domain (AA 24..107 of NP_010227)
SIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKY
QGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDG

This retrieved a total of 95 proteins with sequences similar to the Mbp1 APSES domain.
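The domain definition "AA 24..107" uses 1-based, inclusive coordinates. It therefore covers 107 - 24 + 1 = 84 residues, which matches the length of the domain sequence shown above. The hypothetical helper below converts such coordinates into a Python slice. Applied to the full NP_010227 sequence with 24 and 107, it should return that domain; the full-length sequence is not reproduced here.

```python
# Yeast Mbp1 APSES domain as given in the text (AA 24..107 of NP_010227)
DOMAIN = ("SIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKY"
          "QGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDG")


def extract_region(sequence: str, start: int, end: int) -> str:
    """Return residues start..end of `sequence` (1-based, both ends included)."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates out of range")
    return sequence[start - 1:end]   # shift start by one; end is already exclusive


# An inclusive range 24..107 spans 107 - 24 + 1 = 84 residues.
assert len(DOMAIN) == 107 - 24 + 1
```

The same helper works for the SMART-defined boundaries of each protein. Mixing up 0-based and 1-based numbering is an easy way to shift every domain by one residue.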

 

  • Review the resulting file for the 95 proteins and make sure you understand the procedure that led to it. You may want to try to reproduce it.

 

The next step to obtain the necessary input data was to define the APSES domains in these sequences. The approach to this is summarized in the resulting multi-FASTA file of all APSES domains.



 

  • For all of these APSES domain proteins, plus the Mbp1 homologues, perform a batch search for the APSES domain as represented in the SMART database ...
  • Using the domain boundaries defined by SMART, generate a multi-FASTA file of all APSES domains. Your header lines should contain the five-letter code for the organism, the name of the most similar yeast gene, and the starting and ending amino acid numbers in the source sequence. They must NOT contain any non-alphanumeric character except for the leading ">".
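The header rule is easy to check automatically. In this sketch, the layout `code + gene + "aa" + start + "to" + end` is only an assumption for illustration; any alphanumeric layout that contains the required items would do. The function rejects anything after the ">" that is not an ASCII letter or digit.

```python
import re

ALNUM = re.compile(r"[A-Za-z0-9]+")   # ASCII letters and digits only


def apses_header(code: str, gene: str, start: int, end: int) -> str:
    """Build a FASTA header such as '>SACCEMbp1aa24to107'.

    The layout is hypothetical; only the character rule comes from the
    assignment: nothing but letters and digits after the leading '>'.
    """
    body = f"{code}{gene}aa{start}to{end}"
    if not ALNUM.fullmatch(body):
        raise ValueError(f"non-alphanumeric character in header: {body!r}")
    return ">" + body
```

The asterisk in "Mbp1*" from the table footnote would be rejected, so it has to be left out of the gene name in the header.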

Orthologues

 

Instruction

 

  • Task

 

Instruction

 

  • Task.

APSES domains

 

Instruction

 

  • Task

 

Instruction

 

  • Task.

   

Align

   

SUB section Heading (X marks)

 

Instruction

 

  • Task

 

Instruction

 

  • Task.

   

Analyse

   

SUB section Heading (X marks)

 

Instruction

 

  • Task

 

Instruction

 

  • Task.

   

[End of assignment]

If you have any questions at all, don't hesitate to e-mail me at boris.steipe@utoronto.ca or post your question to the Course Mailing List.