Difference between revisions of "Glossary"
(→HSP) |
(→HSP) |
||
Line 21: | Line 21: | ||
====HSP==== | ====HSP==== | ||
;High Scoring Pair | ;High Scoring Pair | ||
− | The fundamental ''unit'' of BLAST output. An HSP consists of an ungapped, local alignment result. | + | The fundamental ''unit'' of BLAST output. An HSP consists of an ungapped, local alignment result. HSPs are extended by the algorithm to so-called BLAST hits. |
;Example | ;Example | ||
Query 42 KVQGGFGKYQGTWVPLNIAKQLAEK | Query 42 KVQGGFGKYQGTWVPLNIAKQLAEK | ||
− | +++GG+ K QGTW+P+ I++ L + | + | +++GG+ K QGTW+P+ I++ L + |
− | Sbjct 347 | + | Sbjct 347 RIRGGYIKIQGTWLPMEISRLLCLR |
====multi FASTA file==== | ====multi FASTA file==== |
Revision as of 23:26, 18 November 2006
E-value
- Expectation value of a BLAST search
The E-value reported for each BLAST hit is the number of alternative alignments, with the same or a better total score, that could be expected to occur in the database purely by chance. Thus, the lower the E-value, the more significant the match. The value depends on the quality and length of the alignment, as well as on the size of the database.
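The dependence on score and database size can be made concrete with the Karlin-Altschul estimate commonly used to describe BLAST statistics, E = K · m · n · e^(−λS), where m is the query length, n the database length, and S the raw alignment score. The sketch below is only an illustration: the function name `expected_hits` is hypothetical, the default values of `lam` and `k` are placeholders (real values depend on the scoring matrix and gap penalties and are reported in BLAST output), and the effective-length corrections that BLAST applies are ignored.

```python
import math

def expected_hits(raw_score, query_len, db_len, lam=0.267, k=0.041):
    """Karlin-Altschul estimate E = K * m * n * exp(-lambda * S).

    lam and k are illustrative placeholders, not values taken from the glossary.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)

small_db = expected_hits(raw_score=60, query_len=250, db_len=1_000_000)
large_db = expected_hits(raw_score=60, query_len=250, db_len=10_000_000)
better = expected_hits(raw_score=80, query_len=250, db_len=1_000_000)

# Same score in a ten times larger database: ten times more chance hits expected.
# Higher score in the same database: fewer chance hits expected.
print(small_db, large_db, better)
```

Under these assumptions, E grows linearly with the database size and falls exponentially with the score, which matches the qualitative statement above.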
FASTA format
- FASTA is a simple, ASCII-based text file format for biological sequences. At minimum, a FASTA file comprises a header line beginning with the ">" character, followed by one or more lines containing a nucleic acid or protein sequence in one-letter code. This is the most common input format for bioinformatics analysis programs and services.
- Example
>gi|3402004|pdb|1MB1| Mbp1 From Saccharomyces Cerevisiae
MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF
GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDHHHHHH
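As a minimal sketch of the layout described above, the following Python function writes one record as a ">" header line followed by sequence lines of fixed width. The function name `format_fasta`, the default width of 70 columns, and the header and sequence used in the call are illustrative assumptions; the format itself does not prescribe a particular line width.

```python
def format_fasta(header, sequence, width=70):
    """Return one FASTA record: a '>' header line, then wrapped sequence lines."""
    lines = [">" + header]
    # Cut the sequence into chunks of at most `width` characters.
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"

# Hypothetical header and a made-up 90-residue sequence, for illustration only.
demo_sequence = "MSNQIYSARYSGVDVYEFIHSTGSIMKRKK" * 3
record = format_fasta("demo_protein hypothetical example", demo_sequence)
print(record, end="")
```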
HSP
- High Scoring Pair
The fundamental unit of BLAST output. An HSP consists of an ungapped, local alignment result. HSPs are extended by the algorithm to so-called BLAST hits.
- Example
Query  42  KVQGGFGKYQGTWVPLNIAKQLAEK
           +++GG+ K QGTW+P+ I++ L  +
Sbjct  347 RIRGGYIKIQGTWLPMEISRLLCLR
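Because an HSP as defined here is ungapped, its two segments have the same length and can be compared position by position. The following sketch counts identical positions for the query and subject segments of the example; the function name `hsp_identity` is hypothetical, and the count covers only exact identities (the letters in the middle line), not the conservative substitutions marked "+", which would require a scoring matrix.

```python
def hsp_identity(query, subject):
    """Count identical positions in an ungapped alignment of equal-length segments."""
    if len(query) != len(subject):
        raise ValueError("an ungapped alignment needs equal-length segments")
    matches = sum(1 for q, s in zip(query, subject) if q == s)
    return matches, len(query)

query = "KVQGGFGKYQGTWVPLNIAKQLAEK"
subject = "RIRGGYIKIQGTWLPMEISRLLCLR"
matches, length = hsp_identity(query, subject)
print(f"Identities = {matches}/{length} ({matches / length:.0%})")
# prints: Identities = 10/25 (40%)
```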
multi FASTA file
- A sequence file that contains more than one FASTA-formatted sequence. The sequences are simply concatenated. This is a common input format for multiple sequence alignment or motif-finding programs.
- Example
>Homeobox associated Leucine Zipper from gi|3868845 (134..178)
KQTEVDCELLRKCCASLTEENRRLQMEVDQLRALSTTQLHFSDFV
>Homeobox associated Leucine Zipper from gi|21264431 (168..212)
KQTEVDCEFLKKCCETLADENIRLQKEIQELKTLKLTQPFYMHMP
>Homeobox associated Leucine Zipper from gi|6634483 (212..256)
KQTEVDCELLKRCCETLTDENRRLHRELQELRALKLATAAAAPHH
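Since the records are simply concatenated, a reader only has to start a new record at every line that begins with ">". The following is a minimal sketch of such a reader, not a replacement for an established parser; the function name `parse_fasta` and the two short records in `example` are hypothetical.

```python
def parse_fasta(text):
    """Split (multi) FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined without separators
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

# Two made-up records, for illustration only.
example = """>seq1 first hypothetical record
KQTEVDCELLRKCC
ASLTEENRRLQ
>seq2 second hypothetical record
KQTEVDCEFLKKCC
"""
for header, sequence in parse_fasta(example):
    print(header.split()[0], len(sequence))
```

A file with a single record is handled the same way and yields a list with one entry.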