BIN-ALI-MSA

Multiple Sequence Alignment

(Multiple sequence alignment)

Abstract:

A carefully produced multiple sequence alignment is an indispensable, extraordinarily valuable asset for the analysis of sequence features. Fully automated methods are regularly inferior to knowledgeable manual curation of alignments. In this unit we will discuss the concepts, practice producing MSAs online and in R, and analyze, write and display alignments. The goal is to empower you to produce the best alignments possible.

Objectives:
This unit will ...

... introduce the benefits of multiple sequence alignments (MSA), the objective functions they pursue, algorithms and methods, practical considerations, and the analysis of alignments;
... demonstrate Web services that calculate MSAs;
... teach how to compute and analyze MSAs in R.

Outcomes:
After working through this unit you ...

... can critically assess available options for producing Multiple Sequence Alignments;
... are familiar with online and R programming tools to produce alignments;
... have aligned the full length sequence of the MYSPE Mbp1 orthologue to a selected set of reference sequences.

Deliverables:

Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.

Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.

Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.

Prerequisites:
This unit builds on material covered in the following prerequisite units:

Create a new page on the student Wiki as a subpage of your User Page.
Put all of your writing to submit on this one page.
When you are done with everything, go to the Quercus Assignments page and open the first Learning Unit that you have not submitted yet. Paste the URL of your Wiki page into the form, and click on Submit Assignment.

Your link can be submitted only once and cannot be edited afterwards, but you may change your Wiki page at any time. However, only the last version before the due date will be marked. All later edits will be silently ignored.

Short Report option

1. Create a new page on the student Wiki as a subpage of your User Page.
2. Write a short report on one of the following five topics: A, B, C, D, or E. (All reports must include the R code you wrote in an appendix.)

A - Publication quality plot

A.1 Create a publication quality figure and figure caption of an MSA of Mbp1 orthologue sequences, including MYSPE, that covers the APSES domain only. Produce this as a single page PDF using the msaPrettyPrint() function of the msa package (msa::msaPrettyPrint()), and upload it to the Student Wiki.

A.2 In your report, document the procedure and discuss how you have chosen the color parameters to illustrate interesting points about the domain.

B - Algorithm Comparison: MAFFT

B.1 At the EBI, produce an MSA of the full-length Mbp1 orthologues of the reference species plus MYSPE, using the MAFFT algorithm - a good, general purpose MSA algorithm.

B.2 Import the alignment to R and evaluate its quality relative to the MUSCLE alignment with default parameters. Report your findings.

C - Algorithm Comparison: WebPRANK

C.1 At the EBI, produce an MSA of the Mbp1 orthologues of the reference species plus MYSPE, using the WebPRANK algorithm, which has an interesting approach to defining indels from computed phylogenetic relationships.

C.2 Import the alignment to R and evaluate its quality relative to the MUSCLE alignment with default parameters. Report your findings.

D - Algorithm Comparison: PRALINE

D.1 PRALINE reportedly produces some of the best alignments due to its (slow) PSI-BLAST profile pre-processing step, which pulls in additional homologues to increase the information that goes into the alignment. Access the PRALINE Web Server and produce a high-quality MSA of the Mbp1 orthologues of the reference species plus MYSPE.

D.2 Import the alignment to R and evaluate its quality relative to the MUSCLE alignment with default parameters. Report your findings.

E - Algorithm Parameters: MUSCLE

E.1 MUSCLE has a large number of additional parameters to tweak alignments. Discuss their use, and try different variations on the MSA of the Mbp1 orthologues of the reference species plus MYSPE^[1].

E.2 Report on the results of your experiments.

3. When you are done, submit the link to your page via Quercus as described above.

R-code option

Alignments can get very long. It would be great to have an overview plot of the full-length alignment in one image. Your task is to write a function for that.

Submit code according to the following requirements. Make sure your code is documented and that you have tested your functions to be correct.

Write a function that takes an MsaAAMultipleAlignment object as input and produces a plot of the entire alignment.
Sections of gaps shall be shown as continuous lines (segments()).
Aligned residues shall be shown as rectangles (rect()).
Provide an option to define line colors (e.g. default: "lightgrey").
Provide an option to define fill colors for residue rectangles (e.g. default: "skyblue").
Provide an option to color alignment columns with a color gradient according to the alignment score instead.

Here is some code for inspiration of how to work with a color palette:

# v is the vector of moving-average scores computed from msaMScores
lev <- cut(v, labels = FALSE, breaks = 10)   # bin the scores into 10 levels
myPal <- colorRampPalette(c("#e8e8e8", "#d6d6d6", "#c4c4c4", "#b2b2b2",
                            "#f4a582", "#d6604d", "#b2182b"))
myCol <- myPal(max(lev))                     # one colour per level

barplot(msaMScores, col = myCol[lev], border = NA)
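
The snippet above assumes that a vector v of moving-average scores already exists. As a minimal sketch of one way to obtain it, the following uses a centred window that is truncated at both ends, so that v has the same length as its input. The function name movingAverage(), the window width k and the toy score vector are assumptions made for illustration; they are not taken from the course script.

```r
# Hypothetical helper: centred moving average with a window of width k.
# At the ends of the vector the window is truncated, so no NA values arise.
movingAverage <- function(x, k = 5) {
  half <- k %/% 2
  n <- length(x)
  vapply(seq_len(n),
         function(i) mean(x[max(1, i - half):min(n, i + half)]),
         numeric(1))
}

# Toy scores standing in for msaMScores (invented numbers).
msaMScores <- c(1, 2, 3, 4, 5, 4, 3, 2, 1, 0)
v <- movingAverage(msaMScores, k = 3)

# Bin the smoothed scores and map each bin to a colour.
lev <- cut(v, labels = FALSE, breaks = 10)
myPal <- colorRampPalette(c("#e8e8e8", "#b2182b"))
myCol <- myPal(max(lev))
```

Because v has the same length as msaMScores, myCol[lev] supplies exactly one colour per bar in a barplot() call like the one above.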

Create a new page on the student Wiki as a subpage of your User Page. Put your documented code and instructions there.
When you are done, submit the link to your page via Quercus as described above.

Option to write a "Self-Evaluation Question"

You can submit a "Self-Evaluation Question" for at most one of your assignments.

Write a "Self-evaluation Question" (with a model solution) that explores the interpretation of an MSA. The goal is for the learner to think about the biological interpretation of a multiple sequence alignment. Questions that I find interesting often explain the context of a biological fact (e.g. a phosphorylation site, a ligand binding site, a domain boundary, a frameshift mutation, etc.), then ask the learner to interpret how an MSA represents information about that fact. Apply the marking rubrics in spirit to satisfy yourself of the quality of your question. Use the format and code templates that you find on the Self evaluation questions page - but don't assume those examples are already models of excellent contributions. This will be a short-answer format question. Note: assume that approximately the same amount of work is expected for all evaluation options. Consequently, the standard of excellence for this option will be quite high.

Create a new page on the student Wiki as a subpage of your User Page. Develop your question there.
When you are done, submit the link to your page via Quercus as described above.

Multiple Sequence Alignment

In order to perform a multiple sequence alignment, we obviously need a set of homologous sequences. This is not trivial. All interpretation of MSA results depends absolutely on how the input sequences were chosen. Should we include only orthologues, or paralogues as well? Should we include only species with fully sequenced genomes, or can we tolerate that some orthologous genes are possibly missing for a species? Should we include all sequences we can lay our hands on, or should we restrict the selection to a manageable number of representative sequences? All of these choices influence our interpretation:

orthologues are expected to be functionally and structurally conserved;
paralogues may have divergent function but have similar structure;
missing genes may make paralogues look like orthologues; and
selection bias may weight our results toward sequences that are over-represented and do not provide a fair representation of evolutionary divergence.

MSAs on the web at the EBI

The EBI hosts a number of excellent MSA programs on their Website. Let's perform an MSA of full length MBP1 orthologues:

Task:

Navigate to the NCBI protein database and paste the MBP1 protein RefSeq IDs from our database into the search form:

NP_010227 NP_593032 XP_660758 XP_007682304 XP_955821 XP_001837394
XP_569090 XP_003327086 XP_011392621 XP_006957051

(add your MBP1_MYSPE RefSeq ID too!)

This will give you a page with links to the retrieved sequences. Click on Summary and choose FASTA (text) as the Format to retrieve all sequences at once as a multi-FASTA formatted page (this is useful, remember it!).
Open another browser window and navigate to the EBI MSA tools page.
Click on Launch T-coffee.
Copy the FASTA sequences from the NCBI page, and paste them into the form at the EBI's T-Coffee page. Click Submit.
The result should show you the aligned sequences, with three blocks of high similarity:
- The most N-terminal block is the APSES domain - the main DNA binding domain of these transcription factors.
- In the middle, we have Ankyrin domains: these are protein-protein interaction modules that Mbp1 uses to recruit other proteins to the bound complex.
- At the end, there is one additional, shorter segment of high similarity.

Explore the tabs that are available, in particular note that you can save the result to a file.
Click on the Download Alignment File tab to load the alignment as text into a browser window. Then save the file into your project directory with a filename of msaT.aln. (.aln is the standard extension for CLUSTAL formatted alignment files, so it helps if we give the file that extension. Of course, you know better than to rely on an extension to signal the file type and format.)
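
Because a file extension is no guarantee of a file's format, it can be worth checking the content before using it. The sketch below is a minimal base R illustration. It assumes that a CLUSTAL formatted file starts with a header line beginning with the word CLUSTAL, and that each alignment line consists of a sequence name followed by a chunk of aligned residues. The function names isClustal() and readClustal() and the toy file are hypothetical and only serve this example.

```r
# Hypothetical helper: TRUE if the first line of the file starts with "CLUSTAL".
isClustal <- function(path) {
  first <- readLines(path, n = 1)
  length(first) == 1 && grepl("^CLUSTAL", first)
}

# Hypothetical helper: read the aligned sequences into a named character vector.
readClustal <- function(path) {
  if (!isClustal(path)) stop("File does not look like CLUSTAL format: ", path)
  body <- readLines(path)[-1]
  # Sequence lines start with a name; conservation lines start with blanks.
  body <- body[grepl("^[^[:space:]]+[[:space:]]+[A-Za-z-]+", body)]
  fields <- strsplit(body, "[[:space:]]+")
  ids    <- vapply(fields, `[`, character(1), 1)
  chunks <- vapply(fields, `[`, character(1), 2)
  # Concatenate the chunks of each sequence across blocks, keeping file order.
  vapply(split(chunks, factor(ids, levels = unique(ids))),
         paste, character(1), collapse = "")
}

# Toy file with invented sequences, so that the example is self-contained.
tmp <- tempfile(fileext = ".aln")
writeLines(c("CLUSTAL W (1.83) multiple sequence alignment",
             "",
             "seqA      MKV-LT",
             "seqB      MKVALT",
             "          *** **",
             "",
             "seqA      GG-W",
             "seqB      GGAW",
             "          ** *"), tmp)

aln <- readClustal(tmp)
```

For the file saved in the task above, the corresponding call would be readClustal("msaT.aln"), provided the downloaded file follows the layout assumed here.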

MSAs in R

Let's move to our RStudio project to explore producing and analyzing multiple sequence alignments in R.

Task:

Open RStudio and load the ABC-units R project. If you have loaded it before, choose File → Recent projects → ABC-Units. If you have not loaded it before, follow the instructions in the RPR-Introduction unit.
Choose Tools → Version Control → Pull Branches to fetch the most recent version of the project from its GitHub repository with all changes and bug fixes included.
Type init() if requested.
Open the file BIN-ALI-MSA.R and follow the instructions.

Note: take care that you understand all of the code in the script. Evaluation in this course is cumulative and you may be asked to explain any part of code.

Sequence alignment editors

Really excellent software tools have been written that help you visualize and manually curate multiple sequence alignments. If anything, I think they tend to do too much. Past versions of the course have used Jalview, but I have heard good things about AliView (and if you are on a Mac, Seqotron might interest you, but I only cover software that is free and runs on all three major platforms).

Here, I am just mentioning the two alignment editors and encourage you to explore and use them. If you have experience with comparing them, let us know.

[Jalview] an integrated MSA editor and sequence annotation workbench from the Barton lab in Dundee. Lots of functions.
[AliView] from Uppsala: fast, lean, looks to be very practical.

However: we should spend a moment considering the kind of improvements manual editing of alignments can aim for.

Alignment Editing

A good MSA comprises only columns of residues that play similar roles in the proteins' mechanism and/or that evolve in a comparable structural context. Since the alignment reflects the result of biological selection and conservation, it has relatively few indels and the indels it has are usually not placed into elements of secondary structure or into functional motifs. For example, the contiguous features annotated for Mbp1 are expected to be left intact by a good alignment.

A poor MSA has many errors in its columns; these contain residues that actually have different functions or structural roles, even though they may look similar according to a (pairwise!) scoring matrix. A poor MSA also may have introduced indels in biologically irrelevant positions, to maximize spurious sequence similarities. Some of the features annotated for Mbp1 will be disrupted in a poor alignment and residues that are conserved may be placed into different columns.

Often errors or inconsistencies are easy to spot. The main goal of manual editing is to make an alignment biologically more plausible. Most commonly this means minimizing the number of rare evolutionary events that the alignment suggests and/or emphasizing the conservation of known functional motifs. Here are some examples:

Reduce number of indels

From a Probcons alignment:
0447_DEBHA    ILKTE-K-T---K--SVVK      ILKTE----KTK---SVVK
9978_GIBZE    MLGLN-PGLKEIT--HSIT      MLGLNPGLKEIT---HSIT
1513_CANAL    ILKTE-K-I---K--NVVK      ILKTE----KIK---NVVK
6132_SCHPO    ELDDI-I-ESGDY--ENVD      ELDDI-IESGDY---ENVD
1244_ASPFU    ----N-PGLREIC--HSIT  ->  ----NPGLREIC---HSIT
0925_USTMA    LVKTC-PALDPHI--TKLK      LVKTCPALDPHI---TKLK
2599_ASPTE    VLDAN-PGLREIS--HSIT      VLDANPGLREIS---HSIT
9773_DEBHA    LLESTPKQYHQHI--KRIR      LLESTPKQYHQHI--KRIR
0918_CANAL    LLESTPKEYQQYI--KRIR      LLESTPKEYQQYI--KRIR

Gaps marked in red were moved. The sequence similarity in the alignment does not change considerably; however, the total number of indels in this excerpt is reduced to 13 from the original 22.
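
The indel counts quoted above can be checked with a few lines of base R. This sketch treats every uninterrupted run of gap characters within a sequence as one indel event, which is an assumption about how the numbers were counted. The function name countGapRuns() is likewise made up for this illustration.

```r
# The Probcons excerpt before and after editing, copied from the example above.
before <- c("ILKTE-K-T---K--SVVK", "MLGLN-PGLKEIT--HSIT", "ILKTE-K-I---K--NVVK",
            "ELDDI-I-ESGDY--ENVD", "----N-PGLREIC--HSIT", "LVKTC-PALDPHI--TKLK",
            "VLDAN-PGLREIS--HSIT", "LLESTPKQYHQHI--KRIR", "LLESTPKEYQQYI--KRIR")
after  <- c("ILKTE----KTK---SVVK", "MLGLNPGLKEIT---HSIT", "ILKTE----KIK---NVVK",
            "ELDDI-IESGDY---ENVD", "----NPGLREIC---HSIT", "LVKTCPALDPHI---TKLK",
            "VLDANPGLREIS---HSIT", "LLESTPKQYHQHI--KRIR", "LLESTPKEYQQYI--KRIR")

# Hypothetical helper: total number of gap runs over all sequences.
countGapRuns <- function(aln) {
  runs <- gregexpr("-+", aln)   # start positions of gap runs, -1 if none
  sum(vapply(runs, function(m) sum(m > 0), integer(1)))
}

countGapRuns(before)  # 22
countGapRuns(after)   # 13
```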

Move indels to more plausible position

From a CLUSTAL alignment:
4966_CANGL     MKHEKVQ------GGYGRFQ---GTW      MKHEKVQ------GGYGRFQ---GTW
1513_CANAL     KIKNVVK------VGSMNLK---GVW      KIKNVVK------VGSMNLK---GVW
6132_SCHPO     VDSKHP-----------QID---GVW  ->  VDSKHPQ-----------ID---GVW
1244_ASPFU     EICHSIT------GGALAAQ---GYW      EICHSIT------GGALAAQ---GYW

The two characters marked in red were swapped. This does not change the number of indels but places the "Q" into a column in which it is more highly conserved (green). Progressive alignments are especially prone to this type of error.
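
The effect of such an edit on a single column can be inspected by tabulating the characters in that column before and after the change. The following is a minimal sketch that uses the CLUSTAL excerpt from above. The helper name columnTable() is hypothetical, and column 7 is simply the position that receives the "Q" in this excerpt.

```r
before <- c("MKHEKVQ------GGYGRFQ---GTW",
            "KIKNVVK------VGSMNLK---GVW",
            "VDSKHP-----------QID---GVW",
            "EICHSIT------GGALAAQ---GYW")
after <- before
after[3] <- "VDSKHPQ-----------ID---GVW"   # the edited SCHPO row

# Hypothetical helper: tabulate the characters found in one alignment column.
columnTable <- function(aln, col) table(substr(aln, col, col))

columnTable(before, 7)   # column 7 before the edit
columnTable(after, 7)    # column 7 after the edit: Q now occurs twice
```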

Conserve motifs

From a CLUSTAL W alignment:
6166_SCHPO      --DKRVA---GLWVPP      --DKRVA--G-LWVPP
XBP1_SACCE      GGYIKIQ---GTWLPM      GGYIKIQ--G-TWLPM
6355_ASPTE      --DEIAG---NVWISP  ->  ---DEIA--GNVWISP
5262_KLULA      GGYIKIQ---GTWLPY      GGYIKIQ--G-TWLPY

The first of the two residues marked in red is a conserved, solvent exposed hydrophobic residue that may mediate domain interactions. The second residue is the conserved glycine in a beta turn that cannot be mutated without structural disruption. Changing the position of a gap and insertion in one sequence improves the conservation of both motifs.
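
One simple way to see what this edit achieves is to list the columns in which all sequences carry the same residue. The sketch below does this for the CLUSTAL W excerpt above. The function name invariantColumns() is an assumption for illustration, and strict identity is only one of several possible conservation criteria.

```r
before <- c("--DKRVA---GLWVPP", "GGYIKIQ---GTWLPM",
            "--DEIAG---NVWISP", "GGYIKIQ---GTWLPY")
after  <- c("--DKRVA--G-LWVPP", "GGYIKIQ--G-TWLPM",
            "---DEIA--GNVWISP", "GGYIKIQ--G-TWLPY")

# Hypothetical helper: indices of columns in which all sequences carry the
# same residue (columns that consist of gaps are ignored).
invariantColumns <- function(aln) {
  m <- do.call(rbind, strsplit(aln, ""))   # one row per sequence
  which(apply(m, 2, function(col) col[1] != "-" && all(col == col[1])))
}

invariantColumns(before)  # 13
invariantColumns(after)   # 10 13
```

In this excerpt, column 13 holds the tryptophan in both versions, and column 10 is the glycine column that becomes invariant after the edit.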

An example of alignment editing for ankyrin domains.

The example below came from alignment editing in Jalview. Columns were coloured by hydrophobicity, and the examples were exported to HTML and then pasted into the page source. Note that the bottom row of the alignment contains a manually added sequence that represents secondary structure elements that were determined by X-ray crystallography of the Swi6 ankyrin domain.

MBP1_USTMA/341-368   -YGDQL-AD-IL-NFQDEGETPLTMARARS
MBP1B_SCHCO/470-498  -REDGDY-KS-FL-DLQDEHGDTALNIARVGN
MBP1_ASHGO/465-494   FSPQYRI-ET-LI-NAQDCKGSTPLHIAMNRD
MBP1_CLALU/550-586   GNQNGNSNDKE-LISKFLNHQDNEGNTAFHIAYNMS
MBPA_COPCI/514-542   -HEGDF-RS-LV-DLQDEHGDTAINIARVGN
MBP1_DEBHA/507-550   IRDSQEI-ENKLSLSDKELIAKFINHQDIDGNTAFHIVAYNLN
MBP1A_SCHCO/388-415  -YPKEL-AD-VL-NFQDEDGETALTMARCRS
MBP1_AJECA/374-403   TLPHQI-SM-L-SQDSNGDTALAKNGC
MBP1_PARBR/380-409   ILPHQI-SL-L-SQDSNGDTALAKNGC
MBP1_NEOFI/363-392   TCSQDEI-DL-L-SCQDSNGDTALVARNGA
MBP1_ASPNI/365-394   TFSPEV-DL-L-SCQDSVGDTAVLVARNGV
MBP1_UNCRE/377-406   MYPHEV-GL-L-ASQDSNGDTALTAKNGC
MBP1_PENCH/439-468   TCSQDEI-QM-L-SCQDQNGDTAVLVARNGA
MBPA_TRIVE/407-436   VFPRHEI-SL-L-SQDANGDTALTAKNGC
MBP1_PHANO/400-429   TWIPEV-TR-L-NAQDQNGDTAIMIARNGA
MBPA_SCLSC/294-313   -L-DARDINGNTAIHIAKNKA
MBPA_PYRIS/363-392   TWIPEV-TR-L-NADQNGDTAIMIARNGA
MBP1_/361-390        -NHSLGVLSQ-FM-DTQNEGDTALHILARSGA
MBP1_ASPFL/328-364   TEQPGEVITLGR-FISEIVNLRDQGDTALNLAGRARS
MBPA_MAGOR/375-404   QHDPNFV-Q-L-DAQDNDGNTAVHLAQRGS
MBP1_CHAGL/361-390   SRSADEL-Q-L-DSQDNEGNTAVHLAMRDA
MBP1_PODAN/372-401   VRQPEV-QA-L-DAQDEGNTALHLARVNA
MBP1_LACTH/458-487   FSPRYRI-EN-LI-NAQDQNGDTAVHLAQNGD
MBP1_FILNE/433-460   -YPQEL-AD-VI-NFQDEGETALTIARARS
MBP1_KLULA/477-506   FTPQYRI-DV-LI-NQDNDGNSPLHYATNKD
MBP1_SCHST/468-501   AKDPDNK-KD-LIAKFINHQDSDGNTAFHICSHNLN
MBP1_SACCE/496-525   FSPQYRI-EL-L-NTQDKNGDTALHIASKNGD
CD00204/1-19         -NARDEDGRTPLHLASNGH
CD00204/99-118       -V-NARDKDGRTPLHLAKNGH
1SW6/203-232         LDLKWI-AN-ML-NAQDSNGDTCLNIARLGN
SecStruc/203-232     t_H-H-_-_t_H_

Aligned sequences before editing. The algorithm has placed gaps into the Swi6 helix LKWIIAN and the four-residue gaps before the block of well aligned sequence on the right are poorly supported.

MBP1_USTMA/341-368   -YGDQLAD-ILNFQDEGETPLTMARARS
MBP1B_SCHCO/470-498  -REDGDYKS-FLDLQDEHGDTALNIARVGN
MBP1_ASHGO/465-494   FSPQYRIET-LINAQDCKGSTPLHIAMNRD
MBP1_CLALU/550-586   GNQNGNSNDKE-LISKFLNHQDNEGNTAFHIAYNMS
MBPA_COPCI/514-542   -HEGDFRS-LVDLQDEHGDTAINIARVGN
MBP1_DEBHA/507-550   IRDSQEIENKLSLSDKELIAKFINHQDIDGNTAFHIVAYNLN
MBP1A_SCHCO/388-415  -YPKELAD-VLNFQDEDGETALTMARCRS
MBP1_AJECA/374-403   TLPHQISM-LSQDSNGDTALAKNGC
MBP1_PARBR/380-409   ILPHQISL-LSQDSNGDTALAKNGC
MBP1_NEOFI/363-392   TCSQDEIDL-LSCQDSNGDTALVARNGA
MBP1_ASPNI/365-394   TFSPEVDL-LSCQDSVGDTAVLVARNGV
MBP1_UNCRE/377-406   MYPHEVGL-LASQDSNGDTALTAKNGC
MBP1_PENCH/439-468   TCSQDEIQM-LSCQDQNGDTAVLVARNGA
MBPA_TRIVE/407-436   VFPRHEISL-LSQDANGDTALTAKNGC
MBP1_PHANO/400-429   TWIPEVTR-LNAQDQNGDTAIMIARNGA
MBPA_SCLSC/294-313   -LDARDINGNTAIHIAKNKA
MBPA_PYRIS/363-392   TWIPEVTR-LNADQNGDTAIMIARNGA
MBP1_/361-390        NHSLGVLSQ-FMDTQNEGDTALHILARSGA
MBP1_ASPFL/328-364   TEQPGEVITLGRFISE-IVNLRDQGDTALNLAGRARS
MBPA_MAGOR/375-404   QHDPNFVQ-LDAQDNDGNTAVHLAQRGS
MBP1_CHAGL/361-390   SRSADELQ-LDSQDNEGNTAVHLAMRDA
MBP1_PODAN/372-401   VRQPEVQA-LDAQDEGNTALHLARVNA
MBP1_LACTH/458-487   FSPRYRIEN-LINAQDQNGDTAVHLAQNGD
MBP1_FILNE/433-460   -YPQELAD-VINFQDEGETALTIARARS
MBP1_KLULA/477-506   FTPQYRIDV-LINQDNDGNSPLHYATNKD
MBP1_SCHST/468-501   AKDPDNKD-LIAKFINHQDSDGNTAFHICSHNLN
MBP1_SACCE/496-525   FSPQYRIEL-LNTQDKNGDTALHIASKNGD
CD00204/1-19         -NARDEDGRTPLHLASNGH
CD00204/99-118       -VNARDKDGRTPLHLAKNGH
1SW6/203-232         LDLKWIAN-MLNAQDSNGDTCLNIARLGN
SecStruc/203-232     t_H-_t_H_

Aligned sequence after editing. A significant cleanup of the frayed region is possible. Now there is only one insertion event, and it is placed into the loop that connects two helices of the 1SW6 structure.

Notes

[1] A good example of how systematic tweaking of parameters can improve alignments is: Long et al. (2016) Determination of optimal parameters of MAFFT program based on BAliBASE3.0 database. Springerplus 5:736. (pmid: 27376004)

[2] "indel": insertion / deletion – a difference in sequence length between two aligned sequences that is accommodated by gaps in the alignment. Since we can't tell from the comparison of two sequences whether such a change was introduced by insertion into or deletion from the ancestral sequence, we join both into a portmanteau.

About ...

Author:

Boris Steipe <boris.steipe@utoronto.ca>

Created:

2017-08-05

Modified:

2020-10-07

Version:

1.1

Version history:

1.1 Edit policy update
1.0 2020 Updates
0.1 First stub

This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.

