Glossary
E-value
- Expectation-value of a BLAST search
The E-value reported for each BLAST hit is the number of distinct alignments, with a score equal to or better than that of the hit, that would be expected to occur in a database of this size purely by chance. Thus, the lower the E-value, the more significant the match. The E-value falls exponentially as the alignment score rises (the score reflects both the quality and the length of the alignment), and it grows in proportion to the size of the database.
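The dependence on score and database size comes from the Karlin-Altschul statistics used by BLAST: E = K m n e^(-λS), where S is the raw score, m and n are the query and database lengths, and λ and K are constants of the scoring system. Converting S to a bit score S' = (λS - ln K) / ln 2 (the "bits" figure BLAST prints next to each score) simplifies this to E = m n 2^(-S'). The sketch below is a minimal illustration, assuming plain lengths for m and n; BLAST itself uses length-corrected "effective" lengths, and composition-based statistics adjust λ, so it will not reproduce reported E-values exactly. The function names and the query/database sizes in the usage lines are hypothetical.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalize a raw score S to a bit score S' = (lambda*S - ln K) / ln 2.

    lam and k are the Karlin-Altschul constants of the scoring system;
    no default values are assumed here.
    """
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits: float, m: int, n: int) -> float:
    """Expected number of chance alignments scoring >= `bits`: E = m * n * 2**(-S').

    m is the query length and n the total database length, in residues.
    This sketch uses them as given, without BLAST's effective-length corrections.
    """
    return m * n * 2.0 ** (-bits)


# Hypothetical sizes: a 100-residue query against a 1,000,000-residue database.
# Each extra bit of score halves E; doubling the database doubles it.
for bits in (20.0, 30.0, 40.0, 50.0):
    print(f"{bits:5.1f} bits -> E = {e_value(bits, 100, 1_000_000):.3g}")
```

Because the score sits in the exponent while the database size enters only as a factor, a modest gain in score outweighs even a large growth of the database.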
FASTA format
- FASTA is a simple, ASCII-based text-file format for biological sequences. At minimum, a FASTA file comprises a header line, beginning with the ">" character, followed by one or more lines containing the nucleic acid or protein sequence in one-letter code. It is the most common input format for bioinformatics analysis programs and services.
- Example
>gi|3402004|pdb|1MB1| Mbp1 From Saccharomyces Cerevisiae
MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF
GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDHHHHHH
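Reading the format takes only a few lines: every ">" starts a new record, and the sequence lines that follow are joined. The sketch below assumes the whole text fits in memory; `read_fasta` is a hypothetical helper, not a library function (for production work, Biopython's `SeqIO.parse(handle, "fasta")` is the usual choice).

```python
from typing import Iterator, List, Optional, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text.

    The header is returned without its leading '>'; sequence lines are
    concatenated with all whitespace removed.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append("".join(line.split()))
    if header is not None:
        yield header, "".join(chunks)


example = """>gi|3402004|pdb|1MB1| Mbp1 From Saccharomyces Cerevisiae
MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF
GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDHHHHHH
"""

for header, seq in read_fasta(example):
    print(header, len(seq))
```

Since the parser simply starts a new record at each ">", the same function also reads files containing several sequences.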
HSP
- High-scoring Segment Pair
The fundamental unit of BLAST output. An HSP consists of two sequence fragments of arbitrary but equal length (in gapped alignments, equal length is achieved by inserting gap characters) whose alignment is locally maximal, meaning its score cannot be improved by extending or trimming it, and whose alignment score meets or exceeds a threshold or cutoff score.
- Example
>ref|NP_012165.1| Transcriptional repressor that binds to promoter sequences of the cyclin genes, CYS3, and SMF2; expression is induced by stress or starvation during mitosis, and late in meiosis; member of the Swi4p/Mbp1p family; potential Cdc28p substrate; Xbp1p [Saccharomyces cerevisiae]
Length=647

 Score = 50.7 bits (122),  Expect = 1e-06, Method: Composition-based stats.
 Identities = 16/43 (37%), Positives = 27/43 (62%), Gaps = 4/43 (9%)

Query  42   KVQGGFGKYQGTWVPLNIAKQLAEK--FSVYDQLKPLF--DFT  80
            +++GG+ K QGTW+P+ I++ L  +  F +   L P+F  DF
Sbjct  347  RIRGGYIKIQGTWLPMEISRLLCLRFCFPIRYFLVPIFGPDFP  389
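The Identities/Positives/Gaps line of an HSP can be recomputed from its three alignment rows. This sketch assumes the midline convention BLAST uses (the residue letter for an identity, "+" for a positive-scoring substitution, a blank otherwise) and truncates percentages to integers, which reproduces the 62% shown for 27/43 in the example; `hsp_summary` is a hypothetical name, and a full implementation would instead score each column with the substitution matrix.

```python
def hsp_summary(query: str, midline: str, subject: str) -> str:
    """Rebuild the 'Identities/Positives/Gaps' line for one HSP.

    `query` and `subject` are the aligned rows with gaps written as '-';
    `midline` is the middle row without the 'Query  42   ' margin.
    """
    n = len(query)
    if len(subject) != n or len(midline) > n:
        raise ValueError("alignment rows must have equal length")
    midline = midline.ljust(n)  # trailing blanks are often trimmed in output
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    positives = sum(1 for c in midline if c != " ")  # letters and '+'
    # A column never has a gap in both rows, so the sum counts gap columns.
    gaps = query.count("-") + subject.count("-")

    def field(label: str, count: int) -> str:
        # Integer percentage, truncated rather than rounded (assumption)
        return f"{label} = {count}/{n} ({100 * count // n}%)"

    return ", ".join([field("Identities", identities),
                      field("Positives", positives),
                      field("Gaps", gaps)])


query_row = "KVQGGFGKYQGTWVPLNIAKQLAEK--FSVYDQLKPLF--DFT"
mid_row   = "+++GG+ K QGTW+P+ I++ L  +  F +   L P+F  DF"
sbjct_row = "RIRGGYIKIQGTWLPMEISRLLCLRFCFPIRYFLVPIFGPDFP"

print(hsp_summary(query_row, mid_row, sbjct_row))
```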
multi-FASTA file
- A sequence file that contains more than one FASTA-formatted sequence. The records are simply concatenated, each starting with its own ">" header line. This is a common input format for multiple sequence alignment and motif-finding programs.
- Example
>Homeobox associated Leucine Zipper from gi|3868845 (134..178)
KQTEVDCELLRKCCASLTEENRRLQMEVDQLRALSTTQLHFSDFV
>Homeobox associated Leucine Zipper from gi|21264431 (168..212)
KQTEVDCEFLKKCCETLADENIRLQKEIQELKTLKLTQPFYMHMP
>Homeobox associated Leucine Zipper from gi|6634483 (212..256)
KQTEVDCELLKRCCETLTDENRRLHRELQELRALKLATAAAAPHH
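Because a multi-FASTA file is just concatenated records, writing one amounts to a loop over (header, sequence) pairs. In this sketch, `write_multi_fasta` is a hypothetical helper, and the 60-residue line width is a common convention assumed here, not a requirement of the format.

```python
from typing import Iterable, List, Tuple


def write_multi_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Concatenate (header, sequence) pairs into one multi-FASTA string.

    Each record starts with its own '>' header line; the sequence is split
    into lines of at most `width` characters.
    """
    if width < 1:
        raise ValueError("width must be positive")
    lines: List[str] = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


zippers = [
    ("Homeobox associated Leucine Zipper from gi|3868845 (134..178)",
     "KQTEVDCELLRKCCASLTEENRRLQMEVDQLRALSTTQLHFSDFV"),
    ("Homeobox associated Leucine Zipper from gi|21264431 (168..212)",
     "KQTEVDCEFLKKCCETLADENIRLQKEIQELKTLKLTQPFYMHMP"),
    ("Homeobox associated Leucine Zipper from gi|6634483 (212..256)",
     "KQTEVDCELLKRCCETLTDENRRLHRELQELRALKLATAAAAPHH"),
]

print(write_multi_fasta(zippers), end="")
```

Each 45-residue fragment fits on a single line at this width; longer sequences are wrapped over several lines under their header.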