Expected Preparations:
Keywords: NWS (optimal global) and SW (optimal local) algorithms; alignment via EMBOSS tools in practice; interpretation of alignments
Objectives:
This unit will …
Outcomes:
After working through this unit you …
Deliverables:
Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don’t overlook these.
Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.
Your protein database: Add APSES domain annotations for MBP1_MYSPE proteins to your database.
Evaluation: NA: This unit is not evaluated for course marks.
This unit covers the concepts and algorithms for optimal pairwise sequence alignments.
Task…
Optimal pairwise sequence alignment is the mainstay of sequence comparison. To try our first alignments in practice, we will start by aligning Mbp1 and its MYSPE relative. For simplicity, I will call the two proteins MBP1_SACCE and MBP1_MYSPE throughout the remainder of the unit.
EMBOSS tools are a collection of standard sequence analysis programs. The most important ones are hosted at the EBI, but the EMBOSS explorer site hosts many more. The tools include Needleman-Wunsch (optimal global) and Smith-Waterman (optimal local) alignments.
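To make the difference between the two algorithms concrete, here is a minimal sketch in base R that computes only the optimal alignment score (no traceback) with a linear gap cost. The function name `alignScore()` and the scoring values are assumptions chosen for illustration; they are not the EMBOSS defaults, and EMBOSS itself uses a substitution matrix with affine gap penalties.

```r
# Minimal sketch: optimal alignment SCORE only, linear gap cost.
# match / mismatch / gap values are illustrative assumptions.
alignScore <- function(a, b, match = 1, mismatch = -1, gap = -2,
                       local = FALSE) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)

  # S[i + 1, j + 1] holds the best score for x[1..i] vs. y[1..j]
  S <- matrix(0, nrow = n + 1, ncol = m + 1)
  if (!local) {
    # global alignment: leading gaps are penalized
    S[, 1] <- (0:n) * gap
    S[1, ] <- (0:m) * gap
  }

  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      s <- if (x[i] == y[j]) match else mismatch
      best <- max(S[i, j] + s,        # x[i] aligned with y[j]
                  S[i, j + 1] + gap,  # x[i] aligned with a gap
                  S[i + 1, j] + gap)  # y[j] aligned with a gap
      if (local) best <- max(best, 0) # local: alignment may restart here
      S[i + 1, j + 1] <- best
    }
  }

  # global: score in the last cell; local: best score anywhere
  if (local) max(S) else S[n + 1, m + 1]
}

alignScore("GATTACA", "GCATGCT")                # global (Needleman-Wunsch)
alignScore("GATTACA", "GCATGCT", local = TRUE)  # local (Smith-Waterman)
```

The only differences between the two modes are the initialization of the first row and column, the floor at zero, and where the final score is read from the matrix.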
Task…
MBP1_SACCE and MBP1_MYSPE from your database that you have prepared in the BIN-Storing_data unit. Open the RStudio project and enter the code below, substituting the proper name for MYSPE where appropriate.

source("makeProteinDB.R")

# Print the MBP1_SACCE sequence
sel <- myDB$protein$name == "MBP1_SACCE"
myDB$protein$sequence[sel]

# Print the MBP1_MYSPE sequence
sel <- myDB$protein$name == paste0("MBP1_", biCode(MYSPE))
myDB$protein$sequence[sel]
(If this didn’t work, fix the problem. Did you give your sequence the right name in your database?)
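If the lookup returns nothing, the selection vector most likely did not match any name. The sketch below shows one way to diagnose this. It uses a small stand-in data frame rather than the real `myDB`, and the helper `getSeq()` as well as the names and sequence strings are hypothetical placeholders.

```r
# Stand-in for myDB$protein; names and sequences are short placeholders,
# not real database contents.
protein <- data.frame(
  name     = c("MBP1_SACCE", "MBP1_XXXXX"),
  sequence = c("MSNQIYSARY", "MAPPKYSAVY"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the sequence for exactly one matching name,
# or stop with a message that says what went wrong.
getSeq <- function(tbl, protName) {
  sel <- tbl$name == protName
  if (sum(sel) == 0) {
    stop("No protein named '", protName, "'. Names present: ",
         paste(tbl$name, collapse = ", "))
  }
  if (sum(sel) > 1) {
    stop("Name '", protName, "' is not unique (", sum(sel), " rows).")
  }
  tbl$sequence[sel]
}

getSeq(protein, "MBP1_SACCE")
```

Checking `sum(sel)` before subsetting distinguishes a misspelled name (zero matches) from a duplicated entry (more than one match).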
Task…
MBP1_SACCE and MBP1_MYSPE sequences again and run the program with default parameters.
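When interpreting the resulting alignment, one commonly reported figure is the percentage of identical positions. The sketch below computes it from two aligned, gapped strings. The helper `pctIdentity()` is hypothetical, the example strings are invented, and the convention assumed here (identical columns divided by the full alignment length, gaps included) is only one of several in use.

```r
# Hypothetical helper: percent identity of two aligned (gapped) strings.
# Assumed convention: identical columns / alignment length, gaps included.
pctIdentity <- function(alnA, alnB, gapChar = "-") {
  a <- strsplit(alnA, "")[[1]]
  b <- strsplit(alnB, "")[[1]]
  stopifnot(length(a) == length(b))  # aligned strings have equal length
  same <- (a == b) & (a != gapChar)  # gap-gap columns do not count
  100 * sum(same) / length(a)
}

# Invented example alignment, not real output
pctIdentity("MSNQ-IYSARY",
            "MSDQKIYS-RY")
```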
Biostrings has extensive functions for sequence alignments. They are generally well written and tightly integrated with the rest of Bioconductor’s functions. There are a few quirks, however: for example, alignments won’t work with lower-case sequences [1].
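One way to guard against the lower-case quirk is to normalize a sequence before passing it to an alignment function. This is a minimal base R sketch; the helper `cleanSeq()` is hypothetical, and the set of allowed characters (the 20 standard one-letter amino acid codes plus X) is an assumption that may need extending for your data.

```r
# Hypothetical helper: prepare a protein sequence string for alignment.
# 'allowed' is an assumption: 20 standard amino acid codes plus X.
cleanSeq <- function(s, allowed = "ACDEFGHIKLMNPQRSTVWYX") {
  s <- gsub("\\s", "", s)  # drop whitespace and line breaks
  s <- toupper(s)          # alignment code expects upper-case letters
  bad <- gsub(paste0("[", allowed, "]"), "", s)  # whatever is left is suspect
  if (nchar(bad) > 0) {
    stop("Unexpected characters in sequence: ",
         paste(unique(strsplit(bad, "")[[1]]), collapse = " "))
  }
  s
}

cleanSeq("msnqiysary\nsgvdvyefih")
```

Stopping on unexpected characters, rather than silently removing them, catches cases such as a FASTA header line that was pasted in along with the sequence.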
Task…
ABC-units R project. If you have loaded it before, choose File ▸ Recent projects ▸ ABC-Units. If you have not loaded it before, follow the instructions in the RPR-Introduction unit.
init() if requested.
BIN-ALI-Optimal_sequence_alignment.R and follow the instructions.
Note: take care that you understand all of the code in the script. Evaluation in this course is cumulative and you may be asked to explain any part of the code.
Fitch, W. M. (2000). “Homology: a personal view on some of the problems”. Trends in Genetics 16(5):227–231. [PMID: 10782117] [DOI: 10.1016/s0168-9525(00)02005-9]
If in doubt, ask! If anything about this content is not clear to you, do not proceed but ask for clarification. If you have ideas about how to make this material better, let’s hear them. We are aiming to compile a list of FAQs for all learning units, and your contributions will count towards your participation marks.
Improve this page! If you have questions or comments, please post them on the Quercus Discussion board with a subject line that includes the name of the unit.
[END]
[1] While this seems like an unnecessary limitation, given that we could easily write code to convert sequences to upper case when looking up values in the MDM, perhaps it is meant as an additional sanity check that we haven’t inadvertently included text in the sequence that does not belong there, such as the FASTA header line.