Revision as of 03:48, 28 October 2012
Information theory
This is an introduction to information theory for the bioinformatics lab.
Contents
$$H = - \sum_{i=0}^n p_i \log_{2} p_i$$
$$I = H_{ref} - H_{obs}$$
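As a rough illustration, the two quantities above (Shannon entropy H, and information I as the difference between a reference entropy and the observed entropy) can be computed for a single alignment column as follows. This is a minimal sketch, not a method given on this page: the function names are hypothetical, and it assumes that H_ref is the entropy of a uniform distribution over the alphabet (log2 of 4 for DNA), which the page does not specify.

```python
import math
from collections import Counter


def shannon_entropy(probabilities):
    """H = -sum(p_i * log2(p_i)); terms with p_i == 0 are skipped (they contribute 0)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def column_frequencies(column):
    """Observed relative frequency of each symbol in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return {symbol: n / total for symbol, n in counts.items()}


def information(column, alphabet_size=4):
    """I = H_ref - H_obs.

    ASSUMPTION: H_ref is the entropy of a uniform distribution over
    `alphabet_size` symbols (4 for DNA, 20 for protein).
    """
    h_ref = math.log2(alphabet_size)
    h_obs = shannon_entropy(column_frequencies(column).values())
    return h_ref - h_obs


if __name__ == "__main__":
    for col in ("AAAA", "AACC", "ACGT"):
        print(col, information(col))
```

Under that assumption, a fully conserved column carries the maximum information (2 bits for DNA), while a column in which all four nucleotides are equally frequent carries none.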
Further reading and resources
Wang & Samudrala (2006) Incorporating background frequency improves entropy-based residue conservation measures. BMC Bioinformatics 7:385. (pmid: 16916457)
BACKGROUND: Several entropy-based methods have been developed for scoring sequence conservation in protein multiple sequence alignments. High scoring amino acid positions may correlate with structurally or functionally important residues. However, amino acid background frequencies are usually not taken into account in these entropy-based scoring schemes. RESULTS: We demonstrate that using a relative entropy measure that incorporates amino acid background frequency results in improved performance in identifying functional sites from protein multiple sequence alignments. CONCLUSION: Our results suggest that the application of appropriate background frequency information may lead to more biologically relevant results in many areas of bioinformatics.
Dou et al. (2010) Several appropriate background distributions for entropy-based protein sequence conservation measures. J Theor Biol 262:317-22. (pmid: 19808039)
Amino acid background distribution is an important factor for entropy-based methods which extract sequence conservation information from protein multiple sequence alignments (MSAs). However, MSAs are usually not large enough to allow a reliable observed background distribution. In this paper, we propose two new estimations of background distribution. One is an integration of the observed background distribution and the position-specific residue distribution, and the other is a normalized square root of observed background frequency. To validate these new background distributions, they are applied to the relative entropy model to find catalytic sites and ligand binding sites from protein MSAs. Experimental results show that they are superior to the observed background distribution in predicting functionally important residues.
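The relative entropy idea mentioned in both abstracts can be illustrated with a short sketch. It scores one alignment column as the sum of p_i * log2(p_i / q_i), where p_i is the observed frequency of residue i in the column and q_i is its background frequency. This is only a toy version under stated assumptions: the function name and the three-letter background distribution are hypothetical, and it omits the pseudocounts, gap handling, and sequence weighting that a real scoring scheme would likely need. It is not the exact procedure of either paper.

```python
import math
from collections import Counter


def relative_entropy(column, background):
    """Relative entropy (in bits) of a column's residue distribution
    against a background distribution.

    `background` maps each residue to its background frequency q_i > 0.
    Residues absent from the column contribute 0 to the sum.
    """
    counts = Counter(column)
    total = sum(counts.values())
    score = 0.0
    for residue, n in counts.items():
        p = n / total
        q = background[residue]
        score += p * math.log2(p / q)
    return score


if __name__ == "__main__":
    # HYPOTHETICAL toy background over three residues, for illustration only.
    toy_background = {"A": 0.5, "L": 0.375, "W": 0.125}
    print(relative_entropy("AAAA", toy_background))  # conserved common residue
    print(relative_entropy("WWWW", toy_background))  # conserved rare residue
```

With this toy background, a column conserved for the rare residue W scores higher than a column conserved for the common residue A, which is the kind of distinction a plain entropy score without background frequencies cannot make.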