Difference between revisions of "Glossary"

From "A B C"
Line 21:
  ====HSP====
  ;High Scoring Pair
- The fundamental ''unit'' of BLAST output. An HSP consists of an ungapped, local alignment result.
+ The fundamental ''unit'' of BLAST output. An HSP consists of an ungapped, local alignment result. HSPs are extended by the algorithm to so-called BLAST hits.

  ;Example
- >ref|NP_012165.1| Transcriptional repressor that binds to promoter sequences of
- the cyclin genes, CYS3, and SMF2; expression is induced by
- stress or starvation during mitosis, and late in meiosis; member
- of the Swi4p/Mbp1p family; potential Cdc28p substrate;
- Xbp1p [Saccharomyces cerevisiae]
- Length=647
-
-   Score = 50.7 bits (122),  Expect = 1e-06, Method: Composition-based stats.
-   Identities = 16/43 (37%), Positives = 27/43 (62%), Gaps = 4/43 (9%)
-
    Query  42  KVQGGFGKYQGTWVPLNIAKQLAEK
-              +++GG+ K QGTW+P+ I++ L  + F +  L P+F  DF
+              +++GG+ K QGTW+P+ I++ L  +
-   Sbjct  347  RIRGGYIKIQGTWLPMEISRLLCLRFCFPIRYFLVPIFGPDFP  389
+   Sbjct  347  RIRGGYIKIQGTWLPMEISRLLCLR

  ====multi FASTA file====

Revision as of 23:26, 18 November 2006


E-value

Expectation-value of a BLAST search

The E-value reported for each BLAST hit is the number of alternative alignments with the same or a better total score that would be expected to occur in the database purely by chance. Thus, the lower the E-value, the more significant the match. The value depends on the quality and length of the alignment, as well as on the size of the database.
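
The dependence on score and database size can be sketched with the standard Karlin-Altschul relation E = m * n * 2^(-S'), where S' is the bit score, m the query length and n the database length. This is only an approximation of what BLAST reports: the program uses length-corrected "effective" values for m and n, and the function name and the lengths below are assumptions chosen for illustration.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value from a bit score: E = m * n * 2**(-S').

    Uses raw lengths; BLAST itself applies effective-length corrections,
    so reported E-values will differ somewhat from this estimate.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Hypothetical query and database lengths, for illustration only.
    for db_len in (1_000_000, 100_000_000):
        print(db_len, expected_hits(50.0, 100, db_len))
```

The same bit score yields a larger E-value in a larger database, and each additional bit of score halves the E-value.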

FASTA format

FASTA is a simple, ASCII-based text-file format for biological sequences. At a minimum, a FASTA file comprises a header line that begins with the ">" character, followed by one or more lines of nucleic acid or protein sequence in one-letter code. This is the most common input format for bioinformatics analysis programs and services.

Example
>gi|3402004|pdb|1MB1|  Mbp1 From Saccharomyces Cerevisiae
MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF
GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDHHHHHH
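
As a minimal sketch of the layout just described, the following function writes one record: a ">" header line followed by the sequence wrapped into fixed-width lines. The function name, the default width of 70 and the sample record are assumptions for illustration; the format itself does not mandate a particular line width.

```python
def format_fasta(header: str, sequence: str, width: int = 70) -> str:
    """Return one FASTA record: '>' + header, then the sequence wrapped
    into lines of at most `width` characters."""
    if width < 1:
        raise ValueError("width must be at least 1")
    lines = [">" + header]
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical identifier and sequence, for illustration only.
    print(format_fasta("seq1 example peptide", "MSNQIYSARYSGVDVYEFIH", width=10), end="")
```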

(Detailed information)


HSP

High Scoring Pair

The fundamental unit of BLAST output. An HSP consists of an ungapped, local alignment result. HSPs are extended by the algorithm to so-called BLAST hits.

Example
Query  42   KVQGGFGKYQGTWVPLNIAKQLAEK
            +++GG+ K QGTW+P+ I++ L  + 
Sbjct  347  RIRGGYIKIQGTWLPMEISRLLCLR
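
To show how the three lines of such an alignment relate, here is a small sketch that counts identities and positives for the ungapped example above. It assumes the usual BLAST convention for the middle line (a letter marks an identical residue, "+" marks a conservative substitution); the function name is hypothetical.

```python
QUERY = "KVQGGFGKYQGTWVPLNIAKQLAEK"
MIDLINE = "+++GG+ K QGTW+P+ I++ L  +"
SBJCT = "RIRGGYIKIQGTWLPMEISRLLCLR"


def hsp_counts(query: str, midline: str, sbjct: str):
    """Return (identities, positives, length) for an ungapped HSP.

    Identities are counted by comparing the two sequences directly;
    positives are identities plus the '+' marks on the middle line.
    """
    if not (len(query) == len(midline) == len(sbjct)):
        raise ValueError("all three alignment rows must have equal length")
    identities = sum(q == s for q, s in zip(query, sbjct))
    positives = identities + midline.count("+")
    return identities, positives, len(query)


if __name__ == "__main__":
    ident, pos, length = hsp_counts(QUERY, MIDLINE, SBJCT)
    print(f"Identities = {ident}/{length}, Positives = {pos}/{length}")
```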

multi FASTA file

A sequence file that contains more than one FASTA-formatted sequence. The sequences are simply concatenated. This is a common input format for multiple sequence alignment or motif-finding programs.

Example
>Homeobox associated Leucine Zipper from gi|3868845  (134..178)
KQTEVDCELLRKCCASLTEENRRLQMEVDQLRALSTTQLHFSDFV
>Homeobox associated Leucine Zipper from gi|21264431 (168..212)
KQTEVDCEFLKKCCETLADENIRLQKEIQELKTLKLTQPFYMHMP
>Homeobox associated Leucine Zipper from gi|6634483  (212..256)
KQTEVDCELLKRCCETLTDENRRLHRELQELRALKLATAAAAPHH
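
Because the records are simply concatenated, a reader only needs to start a new record at each ">" line. The generator below is a minimal sketch of that idea, not a full parser: the name read_fasta and the short sample records are assumptions for illustration, and blank lines are skipped.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a multi FASTA file."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between or within records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    # Hypothetical two-record input, for illustration only.
    text = ">first\nKQTEV\nDCELL\n>second\nKQTEVDCEFL\n"
    for name, seq in read_fasta(text.splitlines()):
        print(name, len(seq))
```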

(Detailed information)