Difference between revisions of "Database Exam Questions"
[[Image:Entrez_result.jpg|frame|none|This is a screenshot of the result of searching the NCBI Entrez database with the search string "sh3".]]
Revision as of 06:24, 11 December 2006
- One aspect of Bioinformatics concerns itself with the storage, organisation, and retrieval of biological information. The questions in this section consider the contents and use of some of the key abstractions (sequences, structures, graphs ...) that we deal with, and the databases we store them in.
2003
In the excerpt from the PDBsum database shown here, please comment briefly on the following points. Questions correspond to the numbers shown on the image. The region within the circle is an enlargement of the original, for better legibility.
2004 - PDB Format
Despite its many shortcomings and inconsistencies, the PDB format for coordinate datasets is still the most widely accepted format, chiefly due to the large number of legacy programs that use it, but also because it is human readable. The following is an excerpt from the PDB file of pea defensin (1JKZ.PDB).
[...]
ATOM    404  N   ALA A  28       1.084   7.614   2.493  1.00  0.00           N
ATOM    405  CA  ALA A  28       0.164   7.660   3.616  1.00  0.00           C
ATOM    406  C   ALA A  28       0.842   7.090   4.856  1.00  0.00           C
ATOM    407  O   ALA A  28       0.731   5.902   5.139  1.00  0.00           O
ATOM    408  CB  ALA A  28      -1.123   6.911   3.287  1.00  0.00           C
ATOM    409  H   ALA A  28       1.535   6.768   2.288  1.00  0.00           H
ATOM    410  HA  ALA A  28      -0.085   8.696   3.802  1.00  0.00           H
ATOM    411 1HB  ALA A  28      -1.278   6.918   2.218  1.00  0.00           H
ATOM    412 2HB  ALA A  28      -1.957   7.396   3.773  1.00  0.00           H
ATOM    413 3HB  ALA A  28      -1.047   5.891   3.634  1.00  0.00           H
[...]
- Which atom numbers correspond to the backbone atoms and which atom numbers correspond to the side chain of this amino acid?
- Describe the information in the following columns (indicated by the values in the first record): " N ", "A", "1.084", "1.00", "0.00".
- Are any of these columns optional, and if yes, would their absence shift the positions of the other columns?
- Briefly discuss the relationship between SEQRES records in a PDB file, the genetic sequence of a protein, and the sequence that can be derived from the coordinate records.
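The ATOM records above are laid out in fixed-width columns, which matters when reading them programmatically. The following is only a minimal sketch, assuming the standard PDB column layout for coordinate records; the function name `parse_atom_record` and the dictionary keys are illustrative choices, not part of any library.

```python
def parse_atom_record(line):
    """Split one PDB ATOM/HETATM line into fields by column position.

    Assumes the standard fixed-width coordinate record layout;
    the key names are hypothetical and chosen for readability.
    """
    line = line.ljust(80)  # pad so that slicing a short line is safe
    return {
        "record": line[0:6].strip(),
        "serial": int(line[6:11]),
        "atom_name": line[12:16],      # left unstripped: " N  " vs "1HB "
        "alt_loc": line[16].strip(),
        "res_name": line[17:20].strip(),
        "chain_id": line[21].strip(),
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "temp_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }


# First record of the 1JKZ excerpt, written out in fixed columns.
record = "ATOM    404  N   ALA A  28       1.084   7.614   2.493  1.00  0.00           N"
atom = parse_atom_record(record)
print(atom["serial"], repr(atom["atom_name"]), atom["chain_id"], atom["x"])
```

The sketch slices by position and never splits on whitespace, so a field that happens to be blank simply yields an empty string.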
Here is a sequence file for this protein.
>gi|20139322|sp|P81929|PSD1_PEA Defense-related peptide 1
KTCEHLADTYRGVCFTNASCDDHCKNKAHLISGTCHNWKCFCTQNC
- What is the name of this file format?
- Find the amino acid for which the coordinates were given above, in this sequence. (Write its one-letter code into your exam booklet together with the preceding and the following amino acid and underline it, e.g. ABC with the B underlined.)
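Looking up a residue by position can also be sketched in a few lines of Python. This is a rough illustration only: it assumes that the residue number in the coordinate file counts from 1 and matches the position in the sequence file, which is not guaranteed for every PDB entry. The helper names `read_sequence_record` and `residue_with_neighbours` are hypothetical.

```python
def read_sequence_record(text):
    """Return (header, sequence) for a single header-plus-sequence record."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a header line starting with '>'")
    # Sequence data may be wrapped over several lines; join them.
    return lines[0][1:], "".join(lines[1:])


def residue_with_neighbours(sequence, position):
    """Return the residue at a 1-based position plus one neighbour on each side."""
    i = position - 1
    if not 0 < i < len(sequence) - 1:
        raise ValueError("position needs a residue on both sides")
    return sequence[i - 1:i + 2]


sequence_text = """>gi|20139322|sp|P81929|PSD1_PEA Defense-related peptide 1
KTCEHLADTYRGVCFTNASCDDHCKNKAHLISGTCHNWKCFCTQNC"""

header, seq = read_sequence_record(sequence_text)
# Assumption: residue number 28 in the ATOM records is sequence position 28.
triplet = residue_with_neighbours(seq, 28)
print(len(seq), triplet)
```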
In the coordinate file of the immunoglobulin domain 2IMM.pdb you find the following record.
HETATM  877  O   HOH     1      -4.169  60.050  40.145  1.00  3.00           O
- What does this record describe?
- When you display the structure of 2IMM.pdb with RasMol, the protein is displayed as a wireframe model but you see nothing that corresponds to the above record. What do you need to do?
(Indeed, since the RasMol tutorial was a task of the first assignment, a question like this may turn up every now and then.)
2003 - Entrez
Briefly discuss each database that the following terms link to, what relationship the results have to the search term and what use one can make of the links.
- PubMed
- Protein
- CDD
- OMIM
2002
- Task
Comment