BIO Assignment Week 2

From "A B C"
Revision as of 00:08, 21 September 2012 by Boris (talk | contribs)

Assignment for Week 2
Scenario, Databases, Search and Retrieve

Note! This assignment is currently inactive. Major and minor unannounced changes may be made at any time.

 
 

Concepts and activities (and reading, if applicable) for this assignment will be topics on next week's quiz.



 

The Scenario

Baker's yeast, Saccharomyces cerevisiae, is perhaps the most important model organism. It is a eukaryote that has been studied genetically and biochemically in great detail for many decades, and it is easily manipulated with high-throughput experimental methods. We will use information from this model organism to study the conservation of function and sequence in other fungi whose genomes have been completely sequenced. The assignments are an exercise in model-organism reasoning: the transfer of knowledge from one well-studied organism to others.

This and the following assignments will revolve around a transcription factor that plays an important role in the regulation of the cell cycle: Mbp1 is a key component of the MBF complex (Mbp1/Swi6). This complex regulates gene expression at the crucial G1/S-phase transition of the mitotic cell cycle and has been shown to bind to the regulatory regions of more than a hundred target genes. It is therefore a DNA binding protein that acts as a control switch for a key cellular process.

One would speculate that such central control machinery is conserved in other fungi, and it will be your task in these assignments to collect evidence on whether related molecular components are present in some of the newly sequenced fungal genomes. Throughout the assignments we will use freely available tools to conduct bioinformatics investigations of sequences, structures and relationships that may ultimately answer questions such as:

  • Do related proteins exist in other organisms?
  • What functional features can we detect in the related proteins?
  • Do we have evidence that they may bind to similar sequence motifs?
  • Do we believe they may function in a similar way?

Task:
Access the information page on Mbp1 at the Saccharomyces Genome Database and read the summary paragraph on the protein's function!

(If you would like to brush up on the concepts mentioned above, you could study the corresponding chapter in Lodish's Molecular Cell Biology and/or read Nobel laureate Paul Nurse's review of the key concepts of the eukaryotic cell cycle. It is not strictly necessary to understand the details of the yeast cell cycle to complete the assignments, but it's obviously more satisfying to work with concepts that actually make some sense.)

For reference, this is the FASTA formatted sequence of Mbp1 from Saccharomyces cerevisiae:

>gi|6320147|ref|NP_010227.1| Mbp1p [Saccharomyces cerevisiae S288c]
MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF
GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDRKKAIRSASTSAIMET
KRNNKKAEENQFQSSKILGNPTAAPRKRGRPVGSTRGSRRKLGVNLQRSQSDMGFPRPAIPNSSISTTQL
PSIRSTMGPQSPTLGILEEERHDSRQQQPQQNNSAQFKEIDLEDGLSSDVEPSQQLQQVFNQNTGFVPQQ
QSSLIQTQQTESMATSVSSSPSLPTSPGDFADSNPFEERFPGGGTSPIISMIPRYPVTSRPQTSDINDKV
NKYLSKLVDYFISNEMKSNKSLPQVLLHPPPHSAPYIDAPIDPELHTAFHWACSMGNLPIAEALYEAGTS
IRSTNSQGQTPLMRSSLFHNSYTRRTFPRIFQLLHETVFDIDSQSQTVIHHIVKRKSTTPSAVYYLDVVL
SKIKDFSPQYRIELLLNTQDKNGDTALHIASKNGDVVFFNTLVKMGALTTISNKEGLTANEIMNQQYEQM
MIQNGTNQHVNSSNTDLNIHVNTNNIETKNDVNSMVIMSPVSPSDYITYPSQIATNISRNIPNVVNSMKQ
MASIYNDLHEQHDNEIKSLQKTLKSISKTKIQVSLKTLEVLKESSKDENGEAQTNDDFEILSRLQEQNTK
KLRKRLIRYKRLIKQKLEYRQTVLLNKLIEDETQATTNNTVEKDNNTLERLELAQELTMLQLQRKNKLSS
LVKKFEDNAKIHKYRRIIREGTEMNIEEVDSSLDVILQTLIANNNKNKGAEQIITISNANSHA

I have highlighted the protein's APSES domain (also known as a KilA-N domain), which is the DNA binding element of the sequence. Of course, such coloring is not part of the actual FASTA file which contains only a header and sequence letters.
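To make the "header plus sequence letters" structure concrete, here is a minimal R sketch that splits a FASTA record into its two parts. It uses only the first two sequence lines of the record above as an excerpt, and parseFasta() is a hypothetical helper written for this illustration; it is not part of the assignment or of any package.

```r
# Excerpt of the Mbp1 record above: the header line and the first
# two sequence lines only (illustration, not the full protein).
fasta <- c(
  ">gi|6320147|ref|NP_010227.1| Mbp1p [Saccharomyces cerevisiae S288c]",
  "MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF",
  "GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDRKKAIRSASTSAIMET"
)

# parseFasta() is a hypothetical helper: it assumes a single record
# whose first line is the header (starting with ">").
parseFasta <- function(lines) {
  isHeader <- grepl("^>", lines)
  stopifnot(sum(isHeader) == 1, isHeader[1])
  list(
    header   = sub("^>", "", lines[1]),               # drop the ">" marker
    sequence = paste(lines[!isHeader], collapse = "") # join sequence lines
  )
}

mbp1 <- parseFasta(fasta)
mbp1$header            # the descriptive header text
nchar(mbp1$sequence)   # number of residues in the excerpt
```

The line breaks inside the sequence carry no meaning; they only keep the file readable, which is why the sketch joins the sequence lines into one string.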


Choosing YFO (Your Favourite Organism)

The first task is to choose a species in which to conduct your explorations.


Many fungal genomes have been sequenced and more are added each year. For the purposes of the course assignments, we need a species

  • that has transcription factors with APSES domains;
  • whose genome has been completely sequenced;
  • for which records exist in the RefSeq database, NCBI's non-redundant collection of reference sequences.


To prepare such a list of species, I have searched the NCBI's RefSeq database for proteins whose sequences are similar to the APSES domain of Mbp1 and compiled the names of organisms that contain them.

 

Next, I would like to assign species from this list randomly to each student, but I'd also like to avoid having to make a fresh table of assignments every year.

Here is R code to accomplish this:

Task:


  • Read, try to understand and then execute the following R-code.
pickSpecies <- function(ID) {
	# this function randomly picks a fungal species
	# from a list. It is seeded by a student ID. Therefore
	# the pick is random, but reproducible.
	
	# first, define a list of species:
	Species <- c(
		"Ajellomyces dermatitidis (AJEDE)",
		"Arthroderma gypseum (ARTGY)",
		"Ashbya gossypii (ASHGO)",
		"Aspergillus clavatus (ASPCL)",
		"Aspergillus flavus (ASPFL)",
		"Botryotinia fuckeliana (BOTFU)",
		"Candida glabrata (CANGL)",
		"Chaetomium globosum (CHAGL)",
		"Clavispora lusitaniae (CLALU)",
		"Coccidioides immitis (COCIM)",
		"Coprinopsis cinerea (COPCI)",
		"Debaryomyces hansenii (DEBHA)",
		"Gibberella zeae (GIBZE)",
		"Kluyveromyces lactis (KLULA)",
		"Komagataella pastoris (KOMPA)",
		"Laccaria bicolor (LACBI)",
		"Lachancea thermotolerans (LACTH)",
		"Lodderomyces elongisporus (LODEL)",
		"Magnaporthe oryzae (MAGOR)",
		"Malassezia globosa (MALGL)",
		"Meyerozyma guilliermondii (MEYGU)",
		"Nectria haematococca (NECHA)",
		"Neosartorya fischeri (NEOFI)",
		"Paracoccidioides brasiliensis (PARBR)",
		"Penicillium chrysogenum (PENCH)",
		"Puccinia graminis (PUCGR)",
		"Pyrenophora teres (PYRTE)",
		"Scheffersomyces stipitis (SCHST)",
		"Schizophyllum commune (SCHCO)",
		"Phaeosphaeria nodorum (PHANO)",
		"Schizosaccharomyces japonicus (SCHJA)",
		"Sclerotinia sclerotiorum (SCLSC)",
		"Talaromyces stipitatus (TALST)",
		"Trichophyton rubrum (TRIRU)",
		"Uncinocarpus reesii (UNCRE)",
		"Vanderwaltozyma polyspora (VANPO)",
		"Verticillium albo-atrum (VERAL)",
		"Yarrowia lipolytica (YARLI)",
		"Zygosaccharomyces rouxii (ZYGRO)"
		)
	l <- length(Species)    # number of elements in the list
	set.seed(ID)            # seed the random number generator
	                        # with the student ID
	i <- runif(1, 0, 1)     # pick one random number between 0 and 1
	i <- l * i              # multiply with number of elements
	i <- ceiling(i)         # round up to nearest integer
	choice <- Species[i]    # pick the i'th element from list
	return(choice)
}
  • Execute the function pickSpecies() with your student ID as its parameter. Example:
 > pickSpecies(991234567)
 [1] "Candida glabrata (CANGL)"
  • Note down the species name and its five-letter abbreviation. Use this species whenever this or future assignments refer to YFO.
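The reason the pick is "random, but reproducible" is that set.seed() puts R's random number generator into a defined state. The following sketch illustrates this with the example ID from above; the variable names are chosen here for illustration only, and the species count of 39 is taken from the list in pickSpecies().

```r
# Example student ID from the call shown above.
id <- 991234567

set.seed(id)
first <- runif(1, 0, 1)    # one random number between 0 and 1

set.seed(id)               # re-seeding resets the generator state
second <- runif(1, 0, 1)

identical(first, second)   # TRUE: same seed, same "random" number

# The index arithmetic used in pickSpecies(), for a list of 39 species:
n <- 39
i <- ceiling(n * first)    # scale to the list length and round up
i >= 1 && i <= n           # the index falls inside the list
```

One caveat, stated as an assumption about your R installation rather than a guarantee: set.seed() expects a value that fits into R's integer type, so an ID larger than .Machine$integer.max (2147483647 on typical builds) may be rejected.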


 

Keeping a notebook

Consider it a part of your assignment to document your activities. This will be helpful, because the assignment is more or less integrated over the entire term, and later assignments will make use of earlier results. But it is also excellent practice for "real" research.