Difference between revisions of "Glossary"

From "A B C"

Revision as of 22:20, 18 November 2006


E-value

Expectation-value of a BLAST search

The E-value reported for each BLAST hit is the number of alternative alignments, with the same or better total score, that would be expected to occur within the database purely by chance. Thus, the lower the E-value, the more significant the match. The value depends on the quality and length of the alignment, as well as on the size of the database.
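
The dependence on score and database size can be sketched with the standard Karlin-Altschul relation E = m * n * 2^(-S'), where S' is the bit score, m the query length and n the total database length. This is a minimal illustration, not BLAST's actual procedure: real searches use effective (edge-corrected) lengths and may apply composition-based adjustments, so reported values will differ. The function name `expect_value` and the numbers below are assumptions chosen for illustration only.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance alignments scoring at least bit_score.

    Implements E = m * n * 2**(-S'), with m the query length, n the total
    database length and S' the normalized (bit) score. Raw lengths are used
    here; BLAST itself uses effective lengths (assumption of this sketch).
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Illustrative numbers only (not taken from a real search).
small_db = expect_value(50.0, query_len=100, db_len=1_000_000)
large_db = expect_value(50.0, query_len=100, db_len=10_000_000)

# Same score, tenfold larger database: the E-value is tenfold larger.
print(small_db, large_db)
```

Each additional bit of score halves the E-value, while doubling the database size doubles it, which is why the same alignment can be significant in a small database and not in a large one.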


FASTA format

FASTA is a simple, ASCII-based text-file format for biological sequences. At a minimum, a FASTA file comprises a header line, beginning with the ">" character, followed by one or more lines containing nucleic acid or protein sequence in one-letter code. This is the most common input format for bioinformatics analysis programs and services.
Example
>gi|3402004|pdb|1MB1|  Mbp1 From Saccharomyces Cerevisiae
MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF
GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDHHHHHH



HSP

High-scoring Sequence Pair

The fundamental unit of BLAST output. An HSP consists of two sequence fragments of arbitrary but equal length (i.e., padded with gaps where necessary) whose alignment is locally maximal and whose alignment score meets or exceeds a threshold or cutoff score.

Example
>ref|NP_012165.1| Transcriptional repressor that binds to promoter sequences of 
the cyclin genes, CYS3, and SMF2; expression is induced by 
stress or starvation during mitosis, and late in meiosis; member 
of the Swi4p/Mbp1p family; potential Cdc28p substrate; 
Xbp1p [Saccharomyces cerevisiae]
Length=647

 Score = 50.7 bits (122),  Expect = 1e-06, Method: Composition-based stats.
 Identities = 16/43 (37%), Positives = 27/43 (62%), Gaps = 4/43 (9%)

Query  42   KVQGGFGKYQGTWVPLNIAKQLAEK--FSVYDQLKPLF--DFT  80
            +++GG+ K QGTW+P+ I++ L  +  F +   L P+F  DF 
Sbjct  347  RIRGGYIKIQGTWLPMEISRLLCLRFCFPIRYFLVPIFGPDFP  389
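
The "Identities" and "Gaps" figures of the HSP above can be recomputed directly from the two aligned strings. The following is a minimal sketch; the function name `hsp_stats` is hypothetical, and the "Positives" count is omitted because it requires a substitution matrix (such as BLOSUM62) that is not part of this entry.

```python
def hsp_stats(query: str, sbjct: str) -> dict:
    """Count alignment length, identical columns and gap characters.

    Both strings must be the gapped fragments of one HSP, so they have
    equal length; '-' marks a gap.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned fragments must have equal length")
    columns = list(zip(query, sbjct))
    # A column is an identity if both residues match and neither is a gap.
    identities = sum(1 for q, s in columns if q == s and q != "-")
    # A column counts as a gap if either fragment has a gap character.
    gaps = sum(1 for q, s in columns if q == "-" or s == "-")
    return {"length": len(columns), "identities": identities, "gaps": gaps}


query = "KVQGGFGKYQGTWVPLNIAKQLAEK--FSVYDQLKPLF--DFT"
sbjct = "RIRGGYIKIQGTWLPMEISRLLCLRFCFPIRYFLVPIFGPDFP"
stats = hsp_stats(query, sbjct)
print(stats)  # {'length': 43, 'identities': 16, 'gaps': 4}
```

These counts correspond to "Identities = 16/43" and "Gaps = 4/43" in the BLAST report.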


multi FASTA file

A sequence file that contains more than one FASTA-formatted sequence. The sequences are simply concatenated. This is a common input format for multiple sequence alignment or motif-finding programs.
Example
>Homeobox associated Leucine Zipper from gi|3868845  (134..178)
KQTEVDCELLRKCCASLTEENRRLQMEVDQLRALSTTQLHFSDFV
>Homeobox associated Leucine Zipper from gi|21264431 (168..212)
KQTEVDCEFLKKCCETLADENIRLQKEIQELKTLKLTQPFYMHMP
>Homeobox associated Leucine Zipper from gi|6634483  (212..256)
KQTEVDCELLKRCCETLTDENRRLHRELQELRALKLATAAAAPHH
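
Because the records are simply concatenated, a multi FASTA file can be split again at each ">" header line. The following is a minimal sketch of such a reader, assuming the whole file fits in memory; the function name `parse_fasta` and the sample records are hypothetical, and established libraries handle many more edge cases.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA or multi FASTA text into (header, sequence) pairs."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first header")
        else:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical two-record input; the first sequence spans two lines.
sample = ">seq1 first example\nACGT\nTTGA\n>seq2 second example\nGGCC\n"
for name, seq in parse_fasta(sample):
    print(name, len(seq))
```

A single-sequence FASTA file is handled by the same function and yields a list with one pair.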
