Difference between revisions of "Information theory"

<div class="reference-box">[http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf Shannon's "Mathematical Theory of Communication"] (at Bell Labs)</div>
  
 

Revision as of 14:30, 28 October 2012

Information theory


This page is a placeholder, or under current development; it is here principally to establish the logical framework of the site. The material on this page is correct, but incomplete.


This is an introduction to information theory for the bioinformatics lab.



 


Further reading and resources

Wang & Samudrala (2006) Incorporating background frequency improves entropy-based residue conservation measures. BMC Bioinformatics 7:385. (pmid: 16916457)

BACKGROUND: Several entropy-based methods have been developed for scoring sequence conservation in protein multiple sequence alignments. High scoring amino acid positions may correlate with structurally or functionally important residues. However, amino acid background frequencies are usually not taken into account in these entropy-based scoring schemes. RESULTS: We demonstrate that using a relative entropy measure that incorporates amino acid background frequency results in improved performance in identifying functional sites from protein multiple sequence alignments. CONCLUSION: Our results suggest that the application of appropriate background frequency information may lead to more biologically relevant results in many areas of bioinformatics.
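
The contrast drawn in this abstract can be illustrated with a minimal sketch. It assumes that the "relative entropy measure" is the Kullback-Leibler divergence between the residue frequencies observed in an alignment column and a background distribution; the function names and the three-letter `toy_background` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def column_frequencies(column):
    """Observed residue frequencies in one alignment column (gaps '-' ignored)."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return {}
    counts = Counter(residues)
    return {r: n / len(residues) for r, n in counts.items()}

def shannon_entropy(column):
    """Shannon entropy in bits; 0 means the column is fully conserved."""
    freqs = column_frequencies(column)
    return -sum(p * math.log2(p) for p in freqs.values())

def relative_entropy(column, background):
    """Kullback-Leibler divergence (bits) of the column from the background.

    Every residue seen in the column must have a nonzero background frequency.
    """
    freqs = column_frequencies(column)
    return sum(p * math.log2(p / background[r]) for r, p in freqs.items())

# Hypothetical toy background over three residues (not real amino acid frequencies).
toy_background = {"A": 0.5, "W": 0.125, "L": 0.375}
```

With this toy background, the fully conserved columns `"AAAA"` and `"WWWW"` both have a Shannon entropy of zero, but their relative entropies differ: conservation of the rare residue `W` scores higher than conservation of the common residue `A`. Plain entropy cannot make this distinction.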

Dou et al. (2010) Several appropriate background distributions for entropy-based protein sequence conservation measures. J Theor Biol 262:317-22. (pmid: 19808039)

Amino acid background distribution is an important factor for entropy-based methods which extract sequence conservation information from protein multiple sequence alignments (MSAs). However, MSAs are usually not large enough to allow a reliable observed background distribution. In this paper, we propose two new estimations of background distribution. One is an integration of the observed background distribution and the position-specific residue distribution, and the other is a normalized square root of observed background frequency. To validate these new background distributions, they are applied to the relative entropy model to find catalytic sites and ligand binding sites from protein MSAs. Experimental results show that they are superior to the observed background distribution in predicting functionally important residues.
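
The second estimate mentioned in this abstract can be sketched as follows. The abstract does not give the formula, so this assumes that "normalized square root" means taking the square root of each observed background frequency and rescaling so that the values sum to 1. The function names are hypothetical. The first estimate (the integration with position-specific distributions) is not sketched, because the abstract does not say how the two distributions are combined.

```python
import math
from collections import Counter

def observed_background(alignment):
    """Residue frequencies pooled over all columns of an alignment (gaps ignored)."""
    counts = Counter(r for seq in alignment for r in seq if r != "-")
    total = sum(counts.values())
    return {r: n / total for r, n in counts.items()}

def sqrt_normalized_background(observed):
    """Assumed form: q_i = sqrt(f_i) / sum_j sqrt(f_j)."""
    roots = {r: math.sqrt(f) for r, f in observed.items()}
    norm = sum(roots.values())
    return {r: v / norm for r, v in roots.items()}
```

Under this reading, the square root flattens the distribution: an observed background of `{"A": 0.64, "L": 0.36}` becomes roughly `{"A": 0.571, "L": 0.429}`. Rare residues therefore receive a larger share than their raw counts in a small alignment would give them.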