Glossary

From "A B C"
Revision as of 22:20, 18 November 2006 by Boris (talk | contribs)


E-value

Expectation-value of a BLAST search

The E-value reported for each BLAST hit is the number of alternative alignments, with the same or better total score, that would be expected to occur in the database purely by chance. Thus, the lower the E-value, the more significant the match. The value depends on the quality and length of the alignment, as well as on the size of the database.
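The dependence on score and database size can be made concrete with the standard Karlin-Altschul relation E = m * n * 2^(-S'), where S' is the bit score, m the query length and n the database length. The sketch below is illustrative only: the function name and the input numbers are assumptions, and BLAST itself uses edge-corrected "effective" lengths, so these values will not reproduce a real report exactly.

```python
def expected_chance_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value from a bit score: E = m * n * 2**(-S').

    Uses raw lengths; BLAST applies effective-length corrections,
    so treat the result as an order-of-magnitude estimate.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Hypothetical inputs: a 100-residue query against two database sizes.
    small_db = expected_chance_hits(50.0, 100, 1_000_000)
    large_db = expected_chance_hits(50.0, 100, 2_000_000)
    print(f"E (1M residues): {small_db:.2e}")
    print(f"E (2M residues): {large_db:.2e}")
```

Doubling the database doubles the E-value, while each additional bit of score halves it.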


FASTA format

FASTA is a simple, ASCII-based text-file format for biological sequences. At a minimum, a FASTA file comprises a header line that begins with the ">" character, followed by one or more lines containing nucleic acid or protein sequence in one-letter code. It is the most common input format for bioinformatics analysis programs and services.
Example
>gi|3402004|pdb|1MB1|  Mbp1 From Saccharomyces Cerevisiae
MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF
GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDHHHHHH
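
As a minimal sketch of the layout described above, the following Python function writes one record as a ">" header line followed by wrapped sequence lines. The function name and the default width of 70 characters (the width used in the example above) are assumptions; the format itself does not mandate a particular line length.

```python
def format_fasta(header: str, sequence: str, width: int = 70) -> str:
    """Return one FASTA record: a '>' header line, then the sequence
    wrapped into lines of at most `width` characters."""
    if width < 1:
        raise ValueError("width must be a positive integer")
    lines = [">" + header]
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical record, used only to show the wrapping behaviour.
    print(format_fasta("example_protein demo record", "MSNQIYSARYSGVDVYEF" * 6), end="")
```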

(Detailed information)


HSP

High-scoring Sequence Pair

The fundamental unit of BLAST output. An HSP consists of two sequence fragments of arbitrary but equal length (adjusted with gaps, if necessary) whose alignment is locally maximal and whose alignment score meets or exceeds a threshold or cutoff score.

Example
>ref|NP_012165.1| Transcriptional repressor that binds to promoter sequences of 
the cyclin genes, CYS3, and SMF2; expression is induced by 
stress or starvation during mitosis, and late in meiosis; member 
of the Swi4p/Mbp1p family; potential Cdc28p substrate; 
Xbp1p [Saccharomyces cerevisiae]
Length=647

 Score = 50.7 bits (122),  Expect = 1e-06, Method: Composition-based stats.
 Identities = 16/43 (37%), Positives = 27/43 (62%), Gaps = 4/43 (9%)

Query  42   KVQGGFGKYQGTWVPLNIAKQLAEK--FSVYDQLKPLF--DFT  80
            +++GG+ K QGTW+P+ I++ L  +  F +   L P+F  DF 
Sbjct  347  RIRGGYIKIQGTWLPMEISRLLCLRFCFPIRYFLVPIFGPDFP  389
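
The "Identities" and "Gaps" figures in the example above can be recomputed from the two aligned strings. The sketch below assumes both strings have equal length and use "-" for gaps; the function name is hypothetical. "Positives" is not computed because it requires a substitution matrix.

```python
from typing import Tuple


def alignment_stats(query: str, subject: str) -> Tuple[int, int, int]:
    """Count identical columns and gapped columns in a pairwise alignment.

    Returns (identities, gaps, alignment_length).
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    gaps = sum(1 for q, s in zip(query, subject) if q == "-" or s == "-")
    return identities, gaps, len(query)


if __name__ == "__main__":
    # Aligned fragments copied from the HSP example above.
    query = "KVQGGFGKYQGTWVPLNIAKQLAEK--FSVYDQLKPLF--DFT"
    sbjct = "RIRGGYIKIQGTWLPMEISRLLCLRFCFPIRYFLVPIFGPDFP"
    ident, gaps, length = alignment_stats(query, sbjct)
    print(f"Identities = {ident}/{length} ({100 * ident / length:.0f}%)")  # 16/43 (37%)
    print(f"Gaps = {gaps}/{length} ({100 * gaps / length:.0f}%)")          # 4/43 (9%)
```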


multi FASTA file

A sequence file that contains more than one FASTA-formatted sequence. The records are simply concatenated, one after another. This is a common input format for multiple sequence alignment and motif-finding programs.
Example
>Homeobox associated Leucine Zipper from gi|3868845  (134..178)
KQTEVDCELLRKCCASLTEENRRLQMEVDQLRALSTTQLHFSDFV
>Homeobox associated Leucine Zipper from gi|21264431 (168..212)
KQTEVDCEFLKKCCETLADENIRLQKEIQELKTLKLTQPFYMHMP
>Homeobox associated Leucine Zipper from gi|6634483  (212..256)
KQTEVDCELLKRCCETLTDENRRLHRELQELRALKLATAAAAPHH
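
Because the records are simply concatenated, a reader only needs to start a new record at each ">" line. The following Python generator is a minimal sketch of that idea, not a full parser: the function name is an assumption, blank lines are skipped, and any sequence text before the first header is ignored.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_multi_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a multi-FASTA file."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # close the previous record
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    # First two records from the example above.
    text = """>Homeobox associated Leucine Zipper from gi|3868845  (134..178)
KQTEVDCELLRKCCASLTEENRRLQMEVDQLRALSTTQLHFSDFV
>Homeobox associated Leucine Zipper from gi|21264431 (168..212)
KQTEVDCEFLKKCCETLADENIRLQKEIQELKTLKLTQPFYMHMP
"""
    for name, seq in read_multi_fasta(text.splitlines()):
        print(name, len(seq))
```

The same function accepts an open file handle, since iterating over a text file yields its lines.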

(Detailed information)