Difference between revisions of "Glossary"
Revision as of 22:21, 18 November 2006
E-value
- Expectation value of a BLAST search
The E-value reported for each BLAST hit is the number of alignments with the same or a better score that would be expected to occur purely by chance when searching a database of this size. Thus, the lower the E-value, the more significant the match. The value depends on the alignment score, which reflects the quality and length of the alignment, and on the size of the search space, i.e. the length of the query and the total length of the database.
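The dependence on score and search space can be made concrete with the Karlin-Altschul relation E = K·m·n·e^(-λS), where m is the query length, n the total number of residues in the database, S the raw alignment score, and λ and K parameters of the scoring system (matrix plus gap penalties). The sketch below is a minimal illustration under those assumptions; the function names are hypothetical, and real BLAST additionally uses edge-corrected (effective) lengths and, in some modes, composition-based statistics, so its reported values will not match these numbers exactly.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Convert a raw score S into a normalized bit score S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue_from_raw(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """E = K * m * n * exp(-lambda * S) for query length m and database length n."""
    return k * m * n * math.exp(-lam * raw_score)


def evalue_from_bits(bits: float, m: int, n: int) -> float:
    """E = m * n * 2**(-S'): the same quantity expressed through the bit score."""
    return m * n * 2.0 ** (-bits)


if __name__ == "__main__":
    # Hypothetical search: a 100-residue query against a database of 10**8 residues.
    for bits in (30.0, 40.0, 50.0):
        print(f"{bits:.0f} bits -> E = {evalue_from_bits(bits, 100, 10**8):.3g}")
```

Because E grows linearly with n, the same alignment is less significant in a larger database, and each additional bit of score halves the E-value.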
FASTA format
- FASTA is a simple, ASCII-based text format for biological sequences. At minimum, a FASTA file consists of a header line beginning with the ">" character, followed by one or more lines of nucleic acid or protein sequence in one-letter code. It is one of the most common input formats for bioinformatics analysis programs and services.
- Example
>gi|3402004|pdb|1MB1| Mbp1 From Saccharomyces Cerevisiae
MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF
GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDHHHHHH
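Reading the format needs little more than recognizing the ">" header lines and joining the sequence lines that follow each one. A minimal Python sketch, assuming every record starts with a header line and that blank lines can be ignored; `read_fasta` is a hypothetical helper, not part of any library. It also copes with files holding several records (see multi FASTA file).

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream.

    The header is returned without its leading '>'. Each sequence line is
    stripped of surrounding whitespace and the lines are concatenated, so a
    sequence wrapped over several lines comes back as one string.
    """
    header = None
    chunks = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between or inside records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header line")
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # emit the final record


if __name__ == "__main__":
    example = io.StringIO(">seq1 toy protein\nMSNQIYSARY\nSGVDVY\n")
    for name, seq in read_fasta(example):
        print(name, len(seq))
```

For production work, an established parser such as Biopython's `Bio.SeqIO.parse(handle, "fasta")` is the safer choice.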
HSP
- High-scoring Segment Pair
The fundamental unit of BLAST output. An HSP consists of two sequence fragments of arbitrary but equal length (made equal by inserting gaps where necessary) whose alignment is locally maximal and whose alignment score meets or exceeds a threshold or cutoff score.
- Example
>ref|NP_012165.1| Transcriptional repressor that binds to promoter sequences of the cyclin genes, CYS3, and SMF2; expression is induced by stress or starvation during mitosis, and late in meiosis; member of the Swi4p/Mbp1p family; potential Cdc28p substrate; Xbp1p [Saccharomyces cerevisiae]
Length=647
Score = 50.7 bits (122), Expect = 1e-06, Method: Composition-based stats.
Identities = 16/43 (37%), Positives = 27/43 (62%), Gaps = 4/43 (9%)
Query  42   KVQGGFGKYQGTWVPLNIAKQLAEK--FSVYDQLKPLF--DFT  80
            +++GG+ K QGTW+P+ I++ L  +  F +   L P+F  DF
Sbjct  347  RIRGGYIKIQGTWLPMEISRLLCLRFCFPIRYFLVPIFGPDFP  389
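The statistics line of an HSP can be recomputed from its three alignment rows. This sketch assumes the usual BLAST text layout: the middle row shows the residue for an identity, "+" for a non-identical pair with a positive substitution score, and a space otherwise, while "-" in the query or subject row marks a gap. `hsp_summary` is a hypothetical helper; percentages are truncated to whole numbers, which reproduces the 62% shown for 27/43 in the example above.

```python
def hsp_summary(query: str, midline: str, subject: str) -> str:
    """Rebuild the 'Identities / Positives / Gaps' line for one HSP.

    `query` and `subject` are the aligned rows (equal length, '-' for gaps);
    `midline` is the BLAST middle row, where '+' marks positive non-identities.
    """
    if not query or len(query) != len(subject):
        raise ValueError("aligned query and subject rows must be non-empty and of equal length")
    if len(midline) > len(query):
        raise ValueError("midline is longer than the alignment")
    length = len(query)
    # Identical, non-gap columns.
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    # Positives include identities plus the '+' columns of the midline.
    positives = identities + midline.count("+")
    # Columns with a gap in either row.
    gaps = sum(1 for q, s in zip(query, subject) if q == "-" or s == "-")

    def pct(n: int) -> int:
        return 100 * n // length  # integer truncation, as in the example output

    return (f"Identities = {identities}/{length} ({pct(identities)}%), "
            f"Positives = {positives}/{length} ({pct(positives)}%), "
            f"Gaps = {gaps}/{length} ({pct(gaps)}%)")


if __name__ == "__main__":
    print(hsp_summary("KVQGGFGKYQGTWVPLNIAKQLAEK--FSVYDQLKPLF--DFT",
                      "+++GG+ K QGTW+P+ I++ L  +  F +   L P+F  DF",
                      "RIRGGYIKIQGTWLPMEISRLLCLRFCFPIRYFLVPIFGPDFP"))
```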
Multi-FASTA file
- A sequence file that contains more than one FASTA-formatted sequence. The records are simply concatenated, each starting with its own ">" header line. This is a common input format for multiple sequence alignment and motif-finding programs.
- Example
>Homeobox associated Leucine Zipper from gi|3868845 (134..178)
KQTEVDCELLRKCCASLTEENRRLQMEVDQLRALSTTQLHFSDFV
>Homeobox associated Leucine Zipper from gi|21264431 (168..212)
KQTEVDCEFLKKCCETLADENIRLQKEIQELKTLKLTQPFYMHMP
>Homeobox associated Leucine Zipper from gi|6634483 (212..256)
KQTEVDCELLKRCCETLTDENRRLHRELQELRALKLATAAAAPHH
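Since a multi FASTA file is just concatenated records, writing one amounts to emitting a header line and the sequence lines for each record in turn. A minimal sketch, assuming records are given as (header, sequence) pairs; `write_multi_fasta` and its 60-character default line width are illustrative choices, not a fixed part of the format.

```python
from typing import Iterable, Tuple


def write_multi_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Concatenate (header, sequence) records into one multi-FASTA string.

    Each record begins with its own '>' header line; the sequence is wrapped
    at `width` characters per line.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    zippers = [
        ("Homeobox associated Leucine Zipper from gi|3868845 (134..178)",
         "KQTEVDCELLRKCCASLTEENRRLQMEVDQLRALSTTQLHFSDFV"),
        ("Homeobox associated Leucine Zipper from gi|6634483 (212..256)",
         "KQTEVDCELLKRCCETLTDENRRLHRELQELRALKLATAAAAPHH"),
    ]
    print(write_multi_fasta(zippers), end="")
```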