BIO Assignment 3 2011

Read carefully. Be sure you have understood all parts of the assignment and cover all questions in your answers! Sadly, we always get assignments back in which people have simply overlooked crucial questions. Sadly, we always get assignments back in which people have not described procedural details. If you did not notice that the above were two different sentences, you are still not reading carefully enough.

Prepare a Microsoft Word document with a title page that contains:

your full name
your Student ID
your e-mail address
the organism name you have been assigned (see below)

Follow the steps outlined below. You are encouraged to write your answers in short answer form or point form, like you would document an analysis in a laboratory notebook. However, you must

document what you have done,
note what Web sites and tools you have used,
paste important data: sequences, alignments, information etc.

If you do not document the process of your work, we will deduct marks. Try to be concise, not wordy! Use your judgement: are you giving us enough information so we could exactly reproduce what you have done? If not, we will deduct marks. Avoid RTF and unnecessary formatting. Do not paste screendumps. Keep the size of your submission below 1.5 MB.

Write your answers into separate paragraphs and give each its own title. Save your document with a filename of: A3_family name.given name.doc (for example, my first assignment would be named: A3_steipe.boris.doc - and please don't switch the order of your given name and family name!)

Finally, e-mail the document to boris.steipe@utoronto.ca before the due date.

Your document must not contain macros. Please turn off and/or remove all macros from your Word document; we will disable macros, since they pose a security risk.

With the number of students in the course, we have to economize on processing the assignments. Thus we will not accept assignments that are not prepared as described above. If you have technical difficulties, contact me.

The due date for the assignment is XXXXX at 10:00 in the morning.

Grading

Don't wait until the last day to find out there are problems! Assignments that are received past the due date will have one mark deducted at the first minute of every twelve hour period past the due date. Assignments received more than 5 days past the due date will not be assessed.

Marks are noted below in the section headings for each of the tasks. A total of 10 marks will be awarded if your assignment answers all of the questions. A total of 2 bonus marks (up to a maximum of 10 overall) can be awarded for particularly interesting findings or insightful comments. A total of 2 marks can be subtracted for lack of form or for glaring errors. The marks you receive will

count directly towards your final marks at the end of term, for BCH441 (undergraduates), or
be divided by two for BCH1441 (graduates).

Retrieve

In Assignment 2 you retrieved the Saccharomyces cerevisiae Mbp1 protein sequence. Here I have compiled the most similar homologues from the organisms you have studied:

Mbp1 homologues

Our first task is to compile a multi-FASTA file for all Mbp1 orthologues.

In your second assignment, you used BLAST to find the best matches to the yeast Mbp1 protein in your assigned organism's genome. Since there was some variation in the sequences you reported, I have generated a list de novo using the following procedure:

Retrieve the Mbp1 protein sequence by searching Entrez for Mbp1 AND "saccharomyces cerevisiae"[organism]
Click on the RefSeq tab to find the RefSeq ID "NP_010227"
Access the BLAST form for protein/protein BLAST and paste the RefSeq ID into the query field. Choose refseq as the database. Keep default parameters. Choose Fungi as an ENTREZ query limit in the Options section.
On the results page, check the checkbox next to the alignment for the top hit from each species we are studying.
Click on "Get selected sequences".
Verify that each sequence finds Mbp1 as the best match in the Saccharomyces cerevisiae genome by clicking on each "BLINK" (click for example) in the retrieved list. Scroll down the list; the top hit for a Saccharomyces cerevisiae protein should be Mbp1.
Obtain UniProt accessions for all sequences with a single query using the new UniProt ID mapping service. Simply paste all RefSeq IDs into the form.

*Organism*	`CODE`	GI	Refseq	Uniprot Accession	Most similar yeast gene
Aspergillus fumigatus	`ASPFU`	70986922	XP_748947	Q4WGN2	Mbp1
Aspergillus nidulans	`ASPNI`	67525393	XP_660758	Q5B8H6	Mbp1
Aspergillus terreus	`ASPTE`	115391425	XP_001213217	Q0CQJ5	Mbp1
Candida albicans	`CANAL`	68465419	XP_723071	Q5ANP5	Mbp1
Candida glabrata	`CANGL`	50286059	XP_445458	Q6FWD6	Mbp1
Cryptococcus neoformans	`CRYNE`	58266778	XP_570545	Q5KHS0	Mbp1
Debaryomyces hansenii	`DEBHA`	50420495	XP_458784	Q6BSN6	Mbp1
Eremothecium gossypii	`EREGO`	45199118	NP_986147	Q752H3	Mbp1
Gibberella zeae	`GIBZE`	46116756	XP_384396	Q4IEY8	Mbp1
Kluyveromyces lactis	`KLULA`	50308375	XP_454189	P39679	Mbp1
Magnaporthe grisea	`MAGGR`	39964664	XP_365024	ACC	Mbp1*
Neurospora crassa	`NEUCR`	85109541	XP_962967	Q7SBG9	Mbp1
Saccharomyces cerevisiae	`SACCE`	6320147	NP_010227	P39678	Mbp1
Schizosaccharomyces pombe	`SCHPO`	19113944	NP_593032	P41412	Mbp1
Ustilago maydis	`USTMA`	71024227	XP_762343	Q4P117	Mbp1
Yarrowia lipolytica	`YARLI`	50545439	XP_500257	Q6CGF5	Mbp1

* Note: This is a full-length homologue; however, the C-terminal half is more similar to Swi6 than to Mbp1.

Download all sequences, generate a multi-FASTA file and save it to your computer.

Hint: don't do this by hand, you can get the sequences all at once. Click here if you don't know how.
Don't submit the file but do record how you created it. (1 mark)
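One way to request several records at once is NCBI's E-utilities `efetch` endpoint, which accepts a comma-separated list of identifiers. The following is only a minimal sketch, not the procedure used to prepare this page: the helper name `efetch_fasta_url` is hypothetical, and only three of the RefSeq IDs from the table above are shown.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(refseq_ids):
    """Build one efetch URL that asks for all listed protein records as FASTA text."""
    params = {
        "db": "protein",
        "id": ",".join(refseq_ids),  # several IDs in a single request
        "rettype": "fasta",
        "retmode": "text",
    }
    # Keep commas unescaped so the ID list stays readable.
    return EFETCH + "?" + urlencode(params, safe=",")


# Three RefSeq IDs taken from the table (SACCE, ASPFU, ASPNI).
ids = ["NP_010227", "XP_748947", "XP_660758"]
url = efetch_fasta_url(ids)
print(url)
```

Opening such a URL in a browser, or saving the response to a file, should give the records concatenated in multi-FASTA format. Whichever route you use, record it in your answer.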

Explain whether these sequences are necessarily orthologues to yeast Mbp1 (defined through the "reciprocal best-match" criterion). Explain whether these sequences are necessarily orthologues to each other. (1 mark)
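To make the "reciprocal best-match" criterion concrete, here is a small sketch. The hit tables are invented toy data (the names `X1` and `X2` are hypothetical proteins, not real search results); the point is only that the best match has to hold in both directions.

```python
def is_reciprocal_best_match(gene_a, gene_b, best_in_b, best_in_a):
    """True if gene_a's best hit in genome B is gene_b AND
    gene_b's best hit in genome A is gene_a."""
    return best_in_b.get(gene_a) == gene_b and best_in_a.get(gene_b) == gene_a


# Hypothetical toy data: best hit of each query protein in the other genome.
yeast_to_other = {"Mbp1": "X1", "Swi4": "X1"}
other_to_yeast = {"X1": "Mbp1", "X2": "Mbp1"}

# Mbp1 -> X1 and X1 -> Mbp1: the match is reciprocal.
print(is_reciprocal_best_match("Mbp1", "X1", yeast_to_other, other_to_yeast))  # True

# Swi4 -> X1, but X1 -> Mbp1: the match holds in one direction only.
print(is_reciprocal_best_match("Swi4", "X1", yeast_to_other, other_to_yeast))  # False
```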

Other APSES domain sequences

Mbp1 orthologues are not the only proteins that contain APSES domains. In order to find all the rest, a PSI-BLAST search was performed with the yeast Mbp1 APSES domain, defined as follows:

>Yeast Mbp1 APSES domain (AA 24..107 of NP_010227)
SIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKY
QGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDG
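Domain boundaries such as "AA 24..107" are 1-based and inclusive, which is easy to get wrong when slicing strings in a 0-based language. The sketch below shows the conversion; the helper name `extract_domain` and the toy sequence are illustrative assumptions, and the full NP_010227 sequence is not reproduced here.

```python
def extract_domain(sequence, start, end):
    """Return residues start..end (1-based, inclusive) of a protein sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("domain boundaries fall outside the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return sequence[start - 1:end]


# The APSES domain exactly as given above.
APSES = ("SIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGFGKY"
         "QGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDG")

# Sanity check: residues 24..107 span 107 - 24 + 1 = 84 positions.
print(len(APSES))  # 84

# Toy example with a made-up sequence: residues 3..6 of ABCDEFGHIJ.
toy = "ABCDEFGHIJ"
print(extract_domain(toy, 3, 6))  # CDEF
```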

This retrieved (after editing) a total of 70 proteins with sequences similar to the Mbp1 APSES domain.

Review the resulting file for the 70 proteins and make sure you understand the procedure that led to it. You might perhaps try and reproduce this.

The next step to obtain the necessary input data was to define the APSES domains in these sequences. The approach to this is summarized in the resulting multi-FASTA file of all APSES domains.

For all of these proteins (the APSES domain proteins plus the Mbp1 homologues), perform a batch search for the APSES domain, as represented in the SMART database ...
Using the domain boundaries defined by SMART, generate a multi-FASTA file of all APSES domains. Your header lines should contain a five-letter code for the organism, the name of the most similar yeast gene, and the starting and ending amino acid numbers from the source sequence. They should NOT contain any non-alphanumeric characters except for the starting ">".
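The header rule can be checked mechanically before you use the file. This is a minimal sketch that assumes "alphanumeric" means ASCII letters and digits; the function name `is_valid_header` is hypothetical, and the example headers only illustrate the character rule, not a prescribed field layout.

```python
import re

# ">" followed by one or more ASCII letters or digits, and nothing else.
HEADER_RE = re.compile(r">[A-Za-z0-9]+")


def is_valid_header(line):
    """True if the line is '>' followed only by alphanumeric characters."""
    return HEADER_RE.fullmatch(line) is not None


print(is_valid_header(">SACCEMbp124107"))     # True
print(is_valid_header(">SACCE|Mbp1|24-107"))  # False: contains "|" and "-"
print(is_valid_header(">SACCE Mbp1 24 107"))  # False: contains spaces
```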

Orthologues

Instruction

Task

Instruction

Task.

APSES domains

Instruction

Task

Instruction

Task.

Align

SUB section Heading (X marks)

Instruction

Task

Instruction

Task.

Analyse

SUB section Heading (X marks)

Instruction

Task

Instruction

Task.

[End of assignment]

If you have any questions at all, don't hesitate to mail me at boris.steipe@utoronto.ca or post your question to the Course Mailing List.

