Difference between revisions of "BIO Assignment 1 2011"

From "A B C"
m
 
(35 intermediate revisions by 2 users not shown)
Line 1: Line 1:
<div style="padding: 5px; background: #FF4560;  border:solid 2px #000000;">
+
<!-- {{Template:Active}} -->
'''Note!'''
+
{{Template:Inactive}}
This assignment is currently inactive. Major and minor unannounced changes may be made at any time.
 
</div>
 
&nbsp;
 
 
 
&nbsp;
 
  
  
Line 16: Line 11:
 
</div>
 
</div>
  
<!-- Note: This assignment is currently active. All changes will be announced on the course mailing list. -->
 
  
 
<div style="padding: 2px; background: #F0F1F7;  border:solid 1px #AAAAAA; font-size:125%;color:#444444">
 
<div style="padding: 2px; background: #F0F1F7;  border:solid 1px #AAAAAA; font-size:125%;color:#444444">
Line 23: Line 17:
 
In this assignment we will introduce some key databases for bioinformatics that we will refer to throughout the course. You will receive more information on these databases as the course progresses, but you should be familiar in principle with their contents and services from the outset. Also, to enhance your course experience, you should familiarize yourself with a molecular graphics viewer and practice viewing molecules in stereo.
 
In this assignment we will introduce some key databases for bioinformatics that we will refer to throughout the course. You will receive more information on these databases as the course progresses, but you should be familiar in principle with their contents and services from the outset. Also, to enhance your course experience, you should familiarize yourself with a molecular graphics viewer and practice viewing molecules in stereo.
  
<div style="padding: 2px; background: #F0F1F7;  border:solid 1px #AAAAAA; font-size:125%;color:#444444">
 
Submission and due date
 
</div>
 
 
Prepare a Microsoft Word document with a title page that contains:
 
*your full name
 
*your Student ID
 
*your e-mail address
 
Copy the assessment part of the assignment into the document. save it with a filename of:
 
<code>A1_{lastname}.{firstname}.doc</code>
 
(for example my first assignment would be named: A1_steipe.boris.doc)
 
and e-mail the document to boris.steipe@utoronto.ca before the due date.
 
With the number of students in the course, we have to economize on processing the assignments. Thus we will not accept assignments that are not prepared as described above. If you have technical difficulties, contact me.
 
 
'''The due date for the assignment is Monday, October 1. at 11:00 in the morning.'''
 
 
<div style="padding: 2px; background: #F0F1F7;  border:solid 1px #AAAAAA; font-size:125%;color:#444444">
 
Grading
 
</div>
 
 
Marks for this assignment are self assessed. We expect your honesty, but you can expect that the skills you obtain through this assignment will also be required for the final exam.
 
 
Don't wait until the last day to find out there are problems! Assignments that are received past the due date will have one mark deducted and an additional mark for every full twelve hour period past the due date. Assignments received more than 5 days past the due date will not be assessed.  If you need an extension, you '''must''' arrange this beforehand.
 
  
The marks you receive will
+
{{Template:Preparation|
* count directly towards your final marks at the end of term, for BCH441 (undergraduates), or
+
care=Be sure you have understood all parts of the assignment and cover all questions in your answers!|
* be divided by two for BCH1441 (graduates).
+
num=1|
 +
ord=first|
 +
due = Thursday, September 25 at 10:00 in the morning}}
  
 
&nbsp;
 
&nbsp;
Line 62: Line 35:
  
 
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
 
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
===NAR DB and Web Service issues (1 mark)===
+
 
 +
=== Entrez and the NCBI (1 mark)===
 
</div>
 
</div>
Familiarize yourself with the scope and breadth of the tools and resources that are available on the Web.
+
The NCBI administers some of the world's most important databases, such as GenBank. In this section you should
 
+
*Explore the NCBI Web site, familiarize yourself with its key databases and explore the resources to become confident that you will find information that you are looking for.
# Access the [http://nar.oxfordjournals.org/content/vol35/suppl_1/index.dtl '''2007 NAR supplement issue on databases'''], browse the table of contents and read the abstract for at least one of the databases that you find particularly interesting. Follow the link to that database. Explore, and briefly note your experiences: was it clear what data is being maintained at the site? Was it clear how to use the database? Was it clear how this database could be useful for your research?
+
*Follow a protein's annotations into PubMed and familiarize yourself with PubMed's query syntax.
# Access the [http://nar.oxfordjournals.org/content/vol35/suppl_2/index.dtl 2007 '''NAR supplement issue on Web Services'''], browse the table of contents and read the abstract for at least one of the services that you find particularly interesting. Follow the link to that tool. Explore, and briefly note your experiences: was it clear what data the service uses, what it does and what its results mean? Was it clear how to use the service? Was it clear how this service could be useful for your research?
+
*Explore the Entrez search page, and learn how to limit queries and restrict searches
 
 
 
 
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
 
  
=== Entrez and the NCBI (2 marks)===
 
</div>
 
The NCBI adminsters some of the world's most important databases, such as GenBank.
 
  
 
# Access the '''NCBI''' website at http://www.ncbi.nlm.nih.gov/ Look for the '''site-map''' and browse the contents of this large site; find which databases and services are hosted here. Expect to spend at least half an hour to familiarize yourself with the site.
 
# Access the '''NCBI''' website at http://www.ncbi.nlm.nih.gov/ Look for the '''site-map''' and browse the contents of this large site; find which databases and services are hosted here. Expect to spend at least half an hour to familiarize yourself with the site.
# Access the '''Map viewer''' (under the Genomes section of the Databases division). Click on the link to ''Saccharomyces cerevisiae'' for a whole genome view, then click on the icon for chromosome IV for a more detailed view. Enter the region between 340,000 and 380,000 in the "Region shown" fields on the left. How many genes does this region contain? How many of these are protein genes?
+
# Access the '''Map viewer''' (under the '''Genomes''' section of the '''Databases''' division). Click on the link under ''Saccharomyces cerevisiae'' (Build 2.1) for a whole genome view, then click on the icon for chromosome IV for a more detailed view. Enter the region between 340,000 and 380,000 in the "Region shown" fields on the left. How many genes does this region contain? How many of these are protein genes?
 
## Click on '''MBP1''' to follow the link to its Entrez Gene page. Study the contents of the page. If you are not clear what the sections show you, click on one of the question marks. If you are still not clear, ask on our mailing list.
 
## Click on '''MBP1''' to follow the link to its Entrez Gene page. Study the contents of the page. If you are not clear what the sections show you, click on one of the question marks. If you are still not clear, ask on our mailing list.
### Follow the link to '''PubMed''' for this gene. You should find (at least) 25 publications. Click on the '''History''' tab to find the index of the query that got you here (eg. "#4"). Now search for those papers in your query that were published in 2007: enter <tt>#4 AND 2007[DP]</tt> into the search field and click "Go". Make yourself familiar with the [http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helppubmed.section.pubmedhelp.Search_Field_Descrip Search field descriptions and tags] (in particular <tt>[DP]</tt>, <tt>[AU]</tt>, <tt>[TI]</tt>, and <tt>[TA]</tt>), how you use the ''History'' to combine searches, and the use of <tt>AND</tt>, <tt>OR</tt>, <tt>NOT</tt> and brackets.
+
### Follow the link to '''PubMed''' for this gene. You should find (at least) 27 publications. Click on the '''History''' tab to find the index of the query that got you here (eg. "#4"). Now search for those papers in your query that were published in 2008: enter <tt>#4 AND 2008[DP]</tt> into the search field and click "Go". Make yourself familiar with the [http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helppubmed.section.pubmedhelp.Search_Field_Descrip Search field descriptions and tags] (in particular <tt>[DP]</tt>, <tt>[AU]</tt>, <tt>[TI]</tt>, and <tt>[TA]</tt>), how you use the ''History'' to combine searches, and the use of <tt>AND</tt>, <tt>OR</tt>, <tt>NOT</tt> and brackets.
 
## Back at the MapViewer pager, click on '''pr''' in the same row as the MBP1 gene to find a list of '''GenPept''' (protein) records for this gene. Follow the link to the '''RefSeq''' record for this protein: <tt>'''NP_010227'''</tt>. This is a flat-file record for the Mbp1 gene. Study the fields and the format. Then use the "Display" option in the header to show this protein sequence in a FASTA format, choose "send to ... Text" to get '''only''' the FASTA format. Make sure you understand the difference between GenBank/GenPept and RefSeq, between GI number, accession and locus (refer to the lecture slides as soon as they are posted).  
 
## Back at the MapViewer pager, click on '''pr''' in the same row as the MBP1 gene to find a list of '''GenPept''' (protein) records for this gene. Follow the link to the '''RefSeq''' record for this protein: <tt>'''NP_010227'''</tt>. This is a flat-file record for the Mbp1 gene. Study the fields and the format. Then use the "Display" option in the header to show this protein sequence in a FASTA format, choose "send to ... Text" to get '''only''' the FASTA format. Make sure you understand the difference between GenBank/GenPept and RefSeq, between GI number, accession and locus (refer to the lecture slides as soon as they are posted).  
 
# In the header bar of the MapViewer, click on the link to '''Entrez'''. Enter <tt>mbp1</tt> into the search field of the Entrez page and click "GO".
 
# In the header bar of the MapViewer, click on the link to '''Entrez'''. Enter <tt>mbp1</tt> into the search field of the Entrez page and click "GO".
 
## Increase the relevance of returned items by '''restricting your search''' to a particular organism. Access and read the [http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helpentrez.section.EntrezHelp.Entrez__the_Life_Sci Help pages for Entrez] and make sure you understand how to use limits and how to search in search field indexes. You will already have encountered similar concepts when you visited PubMed.
 
## Increase the relevance of returned items by '''restricting your search''' to a particular organism. Access and read the [http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helpentrez.section.EntrezHelp.Entrez__the_Life_Sci Help pages for Entrez] and make sure you understand how to use limits and how to search in search field indexes. You will already have encountered similar concepts when you visited PubMed.
 
##Enter: <tt>mbp1 AND "saccharomyces cerevisiae"[organism]</tt> into the Entrez search field and click "GO". Click on the CoreNucleotide link of the results.
 
##Enter: <tt>mbp1 AND "saccharomyces cerevisiae"[organism]</tt> into the Entrez search field and click "GO". Click on the CoreNucleotide link of the results.
## The RefSeq sequence in these results is the entire yeast chromosome IV (1.5 Mbp) which you probably don't want to explore unless you actually want to. Check the list for a different record that contains only the gene's (full-length) nucleotide sequence. There are (as of this writing) two such records. Explore either one of the two, these are nucleotide sequences in the GenBank flat file format.  
+
## The RefSeq record listed in the results contains the entire yeast chromosome IV (1.5 Mbp) which you probably don't want to explore unless you actually want to. The result is correct, since ''mbp1'' is one of the 787 genes annotated on that chromosome, but perhaps not what we had in mind when we queried for a nucleotide sequence of the ''mbp1'' gene. Check the results for a different record that contains only the ''mbp1'' gene's (full-length) nucleotide sequence. There are (as of this writing) two such records. Explore either one of the two, these are nucleotide sequences in the GenBank flat file format.  
 +
 
 +
;Document your activities in point form.
 +
 
  
  
Line 91: Line 62:
 
=== The EBI (1 mark) ===
 
=== The EBI (1 mark) ===
 
</div>
 
</div>
In many ways the European EBI is complementary to the US NCBI. A data-sharing agreement for instance guarantees that the contents of the EMBL Nucleotide Sequence Database, GenBank and the Japanese DDBJ are synchronized on a daily basis. But there are of course also unique and uniquely valuable resources at the EBI.
+
In many ways the European EBI is complementary to the US NCBI. A data-sharing agreement for instance guarantees that the contents of the EMBL Nucleotide Sequence Database, GenBank and the Japanese DDBJ are synchronized on a daily basis. But there are of course also unique and uniquely valuable resources at the EBI. In this part of the assignment
 +
*you should explore the EBI Web site, familiarize yourself with its contents and services and explore the resources to become confident you will find information that you are looking for.
 +
*You should read the 2can tutorial on database browsing and the UniProt knowledgebase.
 +
*You should compare a UniProt record with the corresponding GenPept record and use the Ensembl browser to access a gene report.
 +
 
  
 
#Enter the '''EBI Website''' at http://www.ebi.ac.uk/ Look for the site-map and explore the contents of this site, the databases, the services and its other offerings. Spend some time getting an idea of what is being offered here.
 
#Enter the '''EBI Website''' at http://www.ebi.ac.uk/ Look for the site-map and explore the contents of this site, the databases, the services and its other offerings. Spend some time getting an idea of what is being offered here.
 
# Visit the '''2can''' education support portal at http://www.ebi.ac.uk/2can/home.html . Explore its offerings, in particular, follow the links <u>Bioinformatics tutorials</u> &rarr; <u>Database browsing</u> and read the section on the different interface systems. You have encountered Entrez previously, now find out more about SRS, BioMart and UniProt Search.
 
# Visit the '''2can''' education support portal at http://www.ebi.ac.uk/2can/home.html . Explore its offerings, in particular, follow the links <u>Bioinformatics tutorials</u> &rarr; <u>Database browsing</u> and read the section on the different interface systems. You have encountered Entrez previously, now find out more about SRS, BioMart and UniProt Search.
 
# To learn more about the '''UniProt''' database: access the UniProt user manual at http://ca.expasy.org/sprot/userman.html and read through sections '''1''' and '''2''' of the manual.  
 
# To learn more about the '''UniProt''' database: access the UniProt user manual at http://ca.expasy.org/sprot/userman.html and read through sections '''1''' and '''2''' of the manual.  
##Contrast the contents of a Uniprot record with a GenPept record: for example [http://www.ebi.uniprot.org/uniprot-srv/flatView.do?proteinId=MBP1_YEAST&pager.offset= '''MBP1_YEAST'''] and [http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=6320147 '''NP_010227'''].
+
##Contrast the contents of a Uniprot record with a GenPept record: for example [http://www.uniprot.org/uniprot/P39678 '''MBP1_YEAST'''] and [http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=6320147 '''NP_010227'''].
# Follow the link to '''Ensembl''', click on ''saccharomyces cerevisiae'' and then on chromosome IV. Access the regions from basepair 340000 to 380000; contrast the display with the NCBI MapViewer. Identify the Mbp1 gene and click on it to retrieve its Gene report (under the systematic name: YDL056W). Can you find your way from this Gene report to the expressed protein sequence?
+
# Follow the link to '''Ensembl''', click on ''saccharomyces cerevisiae'' and then on chromosome IV. Access the regions from basepair 340000 to 380000; contrast the display with the NCBI MapViewer. Identify the Mbp1 gene and click on it to retrieve its Gene report (under the systematic name: YDL056W). Find your way from this Gene report to the expressed protein sequence and list the steps you have gone through.
 +
 
 +
;Document your activities in point form.
 +
 
  
  
Line 104: Line 82:
 
===The PDB (1 mark) ===
 
===The PDB (1 mark) ===
 
</div>
 
</div>
Visit the PDB website at http://www.pdb.org/
+
Visit the RCSB PDB website at http://www.pdb.org/ , explore the database and familiarize yourself with its contents.
  
Browse across the different sections and set yourself a specific objective so you are confident you know what this part is and does. Look for the "About PDB" page and explore the page. Explore the links on the "Education" page to see where you might fill in gaps in your knowledege of structural molecular biology. From the homepage, find a protein of your choice (eg. the yeast Swi6 or Mbp1 protein) and explore the information that is available for it. Expect to spend more than an hour on this task.
+
#Look for the "Getting started" page and explore the page.  
 +
#Explore the links on the "Education" page to see where you might fill in gaps in your knowledge of structural molecular biology, such as the '''Biological Units''' tutorial; read one or two of the excellent Molecule of the Month articles, such as the one on the TATA binding protein (July 2005).
 +
#From the homepage, search for the yeast Mbp1 protein (by keyword) and explore the information that is available in one of the entries that was retrieved.
  
For example, you might have:
+
;Document your activities in point form.
* worked through the PDB query tutorial
 
* browsed through the Molecule of the Month articles and studied the entry on TATA-box binding proteins
 
* and explored structures entries for 1SW6, 1E0B and 1BM8
 
* etc.
 
  
<div style="padding: 5px; background: #E9EBF3; border:solid 1px #AAAAAA;">
+
&nbsp;
=== KEGG (1 mark) ===
+
&nbsp;
</div>
+
&nbsp;
Data integration across a variety of databases is a major challenge for the data-management side of bioinformatics. You have come across two solutions above: the NCBI Entrez system and the EBI SRS system. Here is another solution to this problem: the EBI BioMart system.
 
* Access the BioMart server at http://www.biomart.org
 
* Follow at least two links into BioMart databases (e.g. ensembl and HapMap)
 
* Explore
 
  
 
<div style="padding: 5px; background: #BDC3DC;  border:solid 1px #AAAAAA;">
 
<div style="padding: 5px; background: #BDC3DC;  border:solid 1px #AAAAAA;">
 +
 
==Molecular graphics==
 
==Molecular graphics==
 
</div>
 
</div>
Access the Rasmol tutorial at http://biochemistry.utoronto.ca/steipe/bioinformatics/tutorials/rasmol_tutorial.html
 
Install Rasmol(Linux), RasMac (Macintosh), or RasTop(Windows) on your computer.
 
 
Work through the tutorial.
 
  
  
 
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
 
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
===VMD (1 marks)===
+
===VMD===
 
</div>
 
</div>
Access the Rasmol tutorial at http://biochemistry.utoronto.ca/steipe/bioinformatics/tutorials/rasmol_tutorial.html
+
Access the [[VMD|'''VMD tutorial''']]. Work through parts 1 (installation) and 2 (basic visualization).
Install Rasmol(Linux), RasMac (Macintosh), or RasTop(Windows) on your computer.
 
  
Work through the tutorial.
 
  
 
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
 
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
=== Stereo vision (3 marks):===
+
=== Stereo vision (1 mark):===
 
</div>
 
</div>
Use the hints given in the stereo vision section of the Rasmol tutorial and practice viewing molecules in stereo. Make sure that you use the Rasmol command
+
Access the '''[[Stereo Vision]]''' tutorial and practice viewing molecular structures in stereo.  
set stereo -5
+
 
to display molecules for divergent ("wall-eyed", not "cross-eyed") stereo view. Practice at least ...
+
Practice at least ...
 
* two times daily,
 
* two times daily,
 
* for 3-5 minutes each session,
 
* for 3-5 minutes each session,
 
* for at least twelve days (twentyfour sessions) between now and the due date of the assignment.
 
* for at least twelve days (twentyfour sessions) between now and the due date of the assignment.
Keep up your practice after the assignment. '''Stereo viewing will be required in the final exam.'''
 
  
You will receive 1 mark for every 8 sessions (4 days) of practice you have completed (max. 3 marks). Use different molecules and try them with different colouring and renderings. You will find a list suggesting interesting molecules on the tutorial page.
+
Keep up your practice after the assignment. '''Stereo viewing will be required in the final exam.'''  Practice with different molecules and try out different colours and renderings.
  
Record your progress on a sheet of paper. Make sure you also record the information for the supplementary questions you need to turn in (see below).
 
  
'''Note: do not go through this assignment mechanically. If you are not making any progress, contact me so we can help you on the right track.'''
+
You will receive 1 mark if you have completed at least 2 practice sessions per day, of at least 5 minutes per session.
  
<div style="padding: 5px; background: #BDC3DC;  border:solid 1px #AAAAAA;">
+
Record your progress on a sheet of paper. Make sure you also record the information for the supplementary questions you need to turn in (see below).
==Assessment==
 
</div>
 
 
 
&nbsp;
 
  
&nbsp;
+
'''Note: do not go through this assignment mechanically. If you are not making any progress with stereo vision, contact me so we can help you get on the right track.'''
  
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
+
Supplementary questions (these will not be marked):
Copy and paste from the following for your submission. Copy the most appropriate phrase for each section, fill in the blanks and answer the questions truthfully.</div>
 
&nbsp;
 
 
 
'''1.1'''
 
'''NAR database issue:'''
 
I have explored the database TOC and explored the following database:
 
________________________. My comments on clarity of contents, usability
 
and utility are as follows:
 
 
 
 
(0.5 marks) / or: I did not find the time to explore one of the databases (0 marks).
 
 
 
'''NAR Web service issue:'''
 
I have explored the Web service TOC and explored the following tool:
 
________________________. My comments on clarity of contents, usability
 
and utility are as follows:
 
 
 
 
(0.5 mark) / or:  I did not find the time to explore one of the Web services (0 marks).
 
&nbsp;
 
&nbsp;
 
  
 +
*In which session have you been able to visualize a 3D image in focus for the first time?
 +
*From which session on have you been able to view molecules in stereo comfortably?
 +
*How are you currently doing - can you view molecules in stereo on screen and on paper with ease / with some effort / with difficulty / rarely / not at all?
  
'''1.2'''
 
'''NCBI:'''
 
I have explored the NCBI Web site, familiarized myself with its
 
key databases and am confident that I will find information
 
that I am  looking for. I have also used the MapViewer, followed
 
a gene's annotations into PubMed and familiarized myself with
 
PubMeds query syntax.
 
(1 mark) / or: I did not find the time to do this part of the assignment (0 marks).
 
 
'''Entrez:'''
 
I have explored the Entrez search page, and learned how to limit queries
 
and restrict searches. I am confident that I will find the information
 
that I am  looking for. The full-length yeast Mbp1 nucleotide sequence I found
 
has the accession number ____________ .
 
(1 mark) / or: I did not find the time to do this part of the assignment (0 marks).
 
  
&nbsp;
 
 
&nbsp;
 
&nbsp;
  
'''1.3 EBI:'''
 
I have explored the EBI Web site, familiarized myself with its
 
contents and services and am confident that I will find information
 
that I am looking for. I have read the 2can tutorial on database browsing
 
and have read about the UniProt knowledgebase. Also, I have used the
 
Ensembl genome browser to access the Gene report for Mbp1. In order
 
to get from the Gene report to the protein sequence I have done the
 
following:
 
 
 
(1 mark) / or: I did not find the time to explore the EBI site and its offerings (0 marks).
 
&nbsp;
 
 
&nbsp;
 
&nbsp;
  
'''1.4 the PDB:'''
 
I have explored the PDB Web site in detail, familiarized myself
 
with its contents and am confident that I will find information
 
that I am looking for. A part of the site that I found particularily
 
interesting or useful is ________________________ (1 mark).
 
or
 
I did not find the time to explore the PDB site in detail (0 marks).
 
&nbsp;
 
&nbsp;
 
 
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
 
Copy and paste from the following statements the one that best characterizes your situation:
 
</div>
 
 
'''VMD:'''
 
I have successfully completed the first part of the VMD tutorial;
 
I understand the purpose and use of
 
all the forms used in the tutorial examples
 
and can confidently use the program for
 
my own visualization tasks (2 marks).
 
or
 
I have worked with the VMD tutorial. Nevertheless,
 
I am not confident with the forms and their use
 
and before I can use it for my own purposes
 
I will need additional training (1 mark).
 
or
 
I have not done this part of the assignment (0 marks).
 
&nbsp;
 
&nbsp;
 
 
 
<div style="padding: 5px; background: #E9EBF3;  border:solid 1px #AAAAAA;">
 
Copy the following statement into your assignment and fill in the blanks truthfully.
 
</div>
 
 
Stereo Vision:
 
I have practiced stereo viewing twice daily on _____ days
 
for a total of _____ practice sessions (maximum 3 marks for 24 sessions).
 
&nbsp;
 
&nbsp;
 
 
Supplementary questions (these will not be marked):
 
  
I have been able to visualize a 3D image in focus for the first time on day ___.
 
 
I have been able to view molecules in stereo comfortably since day ___.
 
 
Currently I am able to view molecules in stereo on screen and on
 
paper with ease / with some effort / with difficulty / rarely / not at all.
 
  
  

Latest revision as of 01:39, 29 October 2012

Note! This assignment is currently inactive. Major and minor unannounced changes may be made at any time.

 
 


   

Assignment 1 - Databases and Molecular Models


Introduction

In this assignment we will introduce some key databases for bioinformatics that we will refer to throughout the course. You will receive more information on these databases as the course progresses, but you should be familiar in principle with their contents and services from the outset. Also, to enhance your course experience, you should familiarize yourself with a molecular graphics viewer and practice viewing molecules in stereo.


Preparation, submission and due date

Read carefully.
Be sure you have understood all parts of the assignment and cover all questions in your answers!

Review the guidelines for preparation and submission of BCH441 assignments.

The due date for the assignment is Thursday, September 25 at 10:00 in the morning.

   

   


Key databases

   

Entrez and the NCBI (1 mark)

The NCBI administers some of the world's most important databases, such as GenBank. In this section you should

  • Explore the NCBI Web site, familiarize yourself with its key databases and explore the resources to become confident that you will find information that you are looking for.
  • Follow a protein's annotations into PubMed and familiarize yourself with PubMed's query syntax.
  • Explore the Entrez search page, and learn how to limit queries and restrict searches


  1. Access the NCBI website at http://www.ncbi.nlm.nih.gov/ Look for the site-map and browse the contents of this large site; find which databases and services are hosted here. Expect to spend at least half an hour to familiarize yourself with the site.
  2. Access the Map viewer (under the Genomes section of the Databases division). Click on the link under Saccharomyces cerevisiae (Build 2.1) for a whole genome view, then click on the icon for chromosome IV for a more detailed view. Enter the region between 340,000 and 380,000 in the "Region shown" fields on the left. How many genes does this region contain? How many of these are protein genes?
    1. Click on MBP1 to follow the link to its Entrez Gene page. Study the contents of the page. If you are not clear what the sections show you, click on one of the question marks. If you are still not clear, ask on our mailing list.
      1. Follow the link to PubMed for this gene. You should find (at least) 27 publications. Click on the History tab to find the index of the query that got you here (e.g. "#4"). Now search for those papers in your query that were published in 2008: enter #4 AND 2008[DP] into the search field and click "Go". Familiarize yourself with the Search field descriptions and tags (in particular [DP], [AU], [TI], and [TA]), with how to use the History to combine searches, and with the use of AND, OR, NOT and brackets.
    2. Back at the MapViewer page, click on pr in the same row as the MBP1 gene to find a list of GenPept (protein) records for this gene. Follow the link to the RefSeq record for this protein: NP_010227. This is a flat-file record for the Mbp1 protein. Study the fields and the format. Then use the "Display" option in the header to show this protein sequence in FASTA format, and choose "send to ... Text" to get only the FASTA format. Make sure you understand the difference between GenBank/GenPept and RefSeq, and between GI number, accession and locus (refer to the lecture slides as soon as they are posted).
  3. In the header bar of the MapViewer, click on the link to Entrez. Enter mbp1 into the search field of the Entrez page and click "GO".
    1. Increase the relevance of returned items by restricting your search to a particular organism. Access and read the Help pages for Entrez and make sure you understand how to use limits and how to search in search field indexes. You will already have encountered similar concepts when you visited PubMed.
    2. Enter: mbp1 AND "saccharomyces cerevisiae"[organism] into the Entrez search field and click "GO". Click on the CoreNucleotide link of the results.
    3. The RefSeq record listed in the results contains the entire yeast chromosome IV (1.5 Mbp), which is probably more than you want to explore. The result is correct, since mbp1 is one of the 787 genes annotated on that chromosome, but it is perhaps not what we had in mind when we queried for a nucleotide sequence of the mbp1 gene. Check the results for a different record that contains only the mbp1 gene's (full-length) nucleotide sequence. There are (as of this writing) two such records. Explore either one of the two; these are nucleotide sequences in the GenBank flat file format.
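
The query syntax used in the steps above (terms joined with AND, a field tag in square brackets, quotes around multi-word terms) can also be assembled programmatically. The following is a minimal sketch, not part of the assignment: it only builds a URL for NCBI's E-utilities esearch service and sends nothing over the network. The helper names `tagged` and `esearch_url` are made up for illustration, and the database name `nuccore` is an assumption about which Entrez database corresponds to the CoreNucleotide link.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Base address of NCBI's E-utilities "esearch" service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tagged(term, field=None):
    """Return an Entrez term, quoted if it contains spaces, with an optional [field] tag."""
    text = f'"{term}"' if " " in term else term
    return f"{text}[{field}]" if field else text


def esearch_url(db, *terms):
    """Join the terms with AND and build an esearch URL (no request is made)."""
    query = " AND ".join(terms)
    return ESEARCH + "?" + urlencode({"db": db, "term": query})


# The organism-restricted query from the assignment text.
query_terms = [tagged("mbp1"), tagged("saccharomyces cerevisiae", "organism")]
url = esearch_url("nuccore", *query_terms)  # "nuccore" is an assumed database name
print(url)

# Decoding the URL again recovers the query exactly as it would be typed into Entrez.
decoded_term = parse_qs(urlparse(url).query)["term"][0]
print(decoded_term)
```

The same `tagged` helper covers the PubMed date restriction: `tagged("2008", "DP")` yields `2008[DP]`. History references such as `#4` exist only within an interactive session and are not modelled here.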
Document your activities in point form.
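
To make the FASTA format mentioned above concrete: a record consists of one header line that starts with ">" followed by one or more lines of sequence. The sketch below splits FASTA text into header and sequence pairs. It is an illustration under stated assumptions; the function name `parse_fasta` is hypothetical, and the example record is a placeholder (the header text and the twenty residues are made up, they are not the real Mbp1 entry).

```python
def parse_fasta(text):
    """Split FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Placeholder record: header text and residues are illustrative only.
example = """>NP_010227 placeholder header
ACDEFGHIKL
MNPQRSTVWY
"""

records = parse_fasta(example)
for header, sequence in records:
    print(header, len(sequence))
```

Pasting the text you obtained with "send to ... Text" in place of `example` shows how the accession in the header relates to the sequence lines that follow it.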


The EBI (1 mark)

In many ways the European EBI is complementary to the US NCBI. A data-sharing agreement for instance guarantees that the contents of the EMBL Nucleotide Sequence Database, GenBank and the Japanese DDBJ are synchronized on a daily basis. But there are of course also unique and uniquely valuable resources at the EBI. In this part of the assignment

  • you should explore the EBI Web site, familiarize yourself with its contents and services and explore the resources to become confident that you will find information that you are looking for;
  • you should read the 2can tutorial on database browsing and the UniProt knowledgebase;
  • you should compare a UniProt record with the corresponding GenPept record and use the Ensembl browser to access a gene report.


  1. Enter the EBI Website at http://www.ebi.ac.uk/ Look for the site-map and explore the contents of this site, the databases, the services and its other offerings. Spend some time getting an idea of what is being offered here.
  2. Visit the 2can education support portal at http://www.ebi.ac.uk/2can/home.html . Explore its offerings; in particular, follow the links Bioinformatics tutorials → Database browsing and read the section on the different interface systems. You have encountered Entrez previously; now find out more about SRS, BioMart and UniProt Search.
  3. To learn more about the UniProt database: access the UniProt user manual at http://ca.expasy.org/sprot/userman.html and read through sections 1 and 2 of the manual.
    1. Contrast the contents of a UniProt record with a GenPept record: for example MBP1_YEAST and NP_010227.
  4. Follow the link to Ensembl, click on Saccharomyces cerevisiae and then on chromosome IV. Access the region from basepair 340000 to 380000; contrast the display with the NCBI MapViewer. Identify the Mbp1 gene and click on it to retrieve its Gene report (under the systematic name: YDL056W). Find your way from this Gene report to the expressed protein sequence and list the steps you have gone through.
Document your activities in point form.
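
When you contrast the Ensembl display of basepairs 340000 to 380000 with the MapViewer, the number of genes you count depends on whether a gene that straddles the region boundary is included. The sketch below makes that rule explicit by counting every annotated feature that overlaps the region. The feature names and coordinates are hypothetical placeholders, not real chromosome IV annotations, and `Feature` and `in_region` are names introduced only for this illustration.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    kind: str   # e.g. "protein" or "tRNA"


def in_region(features, region_start, region_end):
    """Return the features that overlap [region_start, region_end]."""
    return [f for f in features if f.start <= region_end and f.end >= region_start]


# Hypothetical annotations, chosen only to exercise the overlap rule.
features = [
    Feature("geneA", 335_000, 341_500, "protein"),  # straddles the left edge
    Feature("geneB", 352_000, 354_500, "protein"),  # fully inside
    Feature("geneC", 360_100, 360_180, "tRNA"),     # inside, not a protein gene
    Feature("geneD", 381_000, 383_000, "protein"),  # outside the region
]

hits = in_region(features, 340_000, 380_000)
protein_hits = [f for f in hits if f.kind == "protein"]
print(len(hits), "features overlap the region,", len(protein_hits), "of them protein genes")
```

A stricter rule that counts only genes lying entirely inside the region would replace the overlap test with `f.start >= region_start and f.end <= region_end`; the two browsers may not agree on which convention they display.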


The PDB (1 mark)

Visit the RCSB PDB website at http://www.pdb.org/ , explore the database and familiarize yourself with its contents.

  1. Look for the "Getting started" page and explore the page.
  2. Explore the links on the "Education" page to see where you might fill in gaps in your knowledge of structural molecular biology, such as the Biological Units tutorial; read one or two of the excellent Molecule of the Month articles, such as the one on the TATA binding protein (July 2005).
  3. From the homepage, search for the yeast Mbp1 protein (by keyword) and explore the information that is available in one of the entries that was retrieved.
Document your activities in point form.

     

Molecular graphics


VMD

Access the VMD tutorial. Work through parts 1 (installation) and 2 (basic visualization).


Stereo vision (1 mark):

Access the Stereo Vision tutorial and practice viewing molecular structures in stereo.

Practice at least ...

  • two times daily,
  • for 3-5 minutes each session,
  • for at least twelve days (twenty-four sessions) between now and the due date of the assignment.

Keep up your practice after the assignment. Stereo viewing will be required in the final exam. Practice with different molecules and try out different colours and renderings.


You will receive 1 mark if you have completed at least 2 practice sessions per day, of at least 5 minutes per session.

Record your progress on a sheet of paper. Make sure you also record the information for the supplementary questions you need to turn in (see below).

Note: do not go through this assignment mechanically. If you are not making any progress with stereo vision, contact me so we can help you get on the right track.

Supplementary questions (these will not be marked):

  • In which session have you been able to visualize a 3D image in focus for the first time?
  • From which session on have you been able to view molecules in stereo comfortably?
  • How are you currently doing - can you view molecules in stereo on screen and on paper with ease / with some effort / with difficulty / rarely / not at all?


 

 



[End of assignment]