Optimality of the Genetic Code: an R Exploration

Contents
Further Reading
Questions, comments
References

Expected Preparations:

Biomolecules: The molecules of life; The genetic code; Nucleic acids; Amino acids; Protein folding; Post-translational modifications and protein biochemistry; Membrane proteins; Biological function.
The Central Dogma: Regulation of transcription and translation; Protein biosynthesis and degradation; Quality control.
Evolution: Theory of evolution; Variation, neutral drift and selection.
[BIN] Sequence

If you are not already familiar with the prior knowledge listed above, you need to prepare yourself from other information sources.
The units listed above are part of this course and contain important preparatory material.

Keywords: Simulating genetic code optimality

Objectives:

This unit will …

… introduce the concept of estimating evolutionary pressure on the genetic code by quantifying the effect of mutations;
… demonstrate how a computational experiment is conducted;
… teach some programming techniques for working with sequences and sequence variations.

Outcomes:

After working through this unit you …

… are familiar with the concept of an optimized genetic code;
… can set up a computational experiment;
… can write code to mutate and translate sequences.

Deliverables:

Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.

Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don’t overlook these.

Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.

Evaluation:

NA: This unit is not evaluated for course marks.

This unit explores R code to test the idea that the genetic code is not random.
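To give a sense of the kind of code involved, here is a minimal base-R sketch of translating and mutating a sequence. It is not taken from the course script: the names `stdCode`, `translate()` and `pointMutate()` and the example sequence are assumptions made for illustration, and the code table is simply written out in the usual T, C, A, G order.

```r
# Standard genetic code, written out in T, C, A, G order (first base slowest).
bases <- c("T", "C", "A", "G")
grid <- expand.grid(b3 = bases, b2 = bases, b1 = bases,
                    stringsAsFactors = FALSE)
codons <- paste0(grid$b1, grid$b2, grid$b3)
aaString <- paste0("FFLLSSSSYY**CC*W", "LLLLPPPPHHQQRRRR",
                   "IIIMTTTTNNKKSSRR", "VVVVAAAADDEEGGGG")
stdCode <- setNames(strsplit(aaString, "")[[1]], codons)

# Translate a DNA string (length a multiple of 3) into one-letter amino acids.
# (hypothetical helper, not the function used in the course script)
translate <- function(dna, code = stdCode) {
  starts <- seq(1, nchar(dna), by = 3)
  cod <- substring(dna, starts, starts + 2)
  paste(code[cod], collapse = "")
}

# Replace one randomly chosen nucleotide with a different nucleotide.
# (hypothetical helper)
pointMutate <- function(dna) {
  pos <- sample.int(nchar(dna), 1)
  old <- substr(dna, pos, pos)
  substr(dna, pos, pos) <- sample(setdiff(bases, old), 1)
  dna
}

set.seed(112358)
orf <- "ATGGCTAAAGGTTGA"      # made-up example sequence
print(translate(orf))         # "MAKG*"
mut <- pointMutate(orf)
print(mut)
print(translate(mut))         # may or may not differ from the original
```

Because the code is degenerate, many single-nucleotide changes leave the translated sequence unchanged. Quantifying how often and how severely a mutation changes the protein is the starting point for asking whether the code is better than random.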

Task…

Open RStudio and load the ABC-units R project. If you have loaded it before, choose File ▸ Recent projects ▸ ABC-Units. If you have not loaded it before, follow the instructions in the RPR-Introduction unit.
Choose Tools ▸ Version Control ▸ Pull Branches to fetch the most recent version of the project from its GitHub repository. This keeps your data and code up to date whenever we update the project or fix bugs.
Type init() if requested.
Open the file RPR-Genetic_code_optimality.R and follow the instructions.

Note: take care that you understand all of the code in the script. Evaluation in this course is cumulative and you may be asked to explain any part of the code.
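A computational experiment of this kind can be sketched in a few lines of base R. The version below is only an illustration under stated assumptions, not the procedure in RPR-Genetic_code_optimality.R: it scores a code by the mean squared change in Kyte-Doolittle hydropathy over all single-nucleotide substitutions between sense codons, and it generates alternative codes by shuffling the 20 amino acids among the existing codon blocks while keeping the stop codons fixed. The cost measure, the shuffling scheme and the names `codeCost()` and `shuffleCode()` are all choices made here for the example.

```r
# Standard genetic code in T, C, A, G order (first base slowest).
bases <- c("T", "C", "A", "G")
grid <- expand.grid(b3 = bases, b2 = bases, b1 = bases,
                    stringsAsFactors = FALSE)
codons <- paste0(grid$b1, grid$b2, grid$b3)
aaString <- paste0("FFLLSSSSYY**CC*W", "LLLLPPPPHHQQRRRR",
                   "IIIMTTTTNNKKSSRR", "VVVVAAAADDEEGGGG")
stdCode <- setNames(strsplit(aaString, "")[[1]], codons)

# Kyte-Doolittle hydropathy values (assumed cost basis for this sketch).
kd <- c(A = 1.8, R = -4.5, N = -3.5, D = -3.5, C = 2.5, Q = -3.5, E = -3.5,
        G = -0.4, H = -3.2, I = 4.5, L = 3.8, K = -3.9, M = 1.9, F = 2.8,
        P = -1.6, S = -0.8, T = -0.7, W = -0.9, Y = -1.3, V = 4.2)

# All ordered codon pairs that differ at exactly one position (64 * 9 rows).
pairs <- do.call(rbind, lapply(codons, function(cd) {
  nb <- character(0)
  for (pos in 1:3) {
    for (b in setdiff(bases, substr(cd, pos, pos))) {
      x <- cd
      substr(x, pos, pos) <- b
      nb <- c(nb, x)
    }
  }
  cbind(from = cd, to = nb)
}))

# Mean squared hydropathy change per substitution; pairs with stops are ignored.
codeCost <- function(code) {
  a <- code[pairs[, "from"]]
  b <- code[pairs[, "to"]]
  keep <- a != "*" & b != "*"
  mean((kd[a[keep]] - kd[b[keep]])^2)
}

# Shuffle amino acids among codon blocks; block structure and stops stay fixed.
shuffleCode <- function(code) {
  aa <- setdiff(unique(code), "*")
  perm <- setNames(sample(aa), aa)   # old amino acid -> new amino acid
  perm["*"] <- "*"
  setNames(perm[code], names(code))
}

set.seed(112358)
N <- 1000
stdCost <- codeCost(stdCode)
randCosts <- replicate(N, codeCost(shuffleCode(stdCode)))

# Fraction of shuffled codes that score at least as well as the standard code.
print(mean(randCosts <= stdCost))
```

A small fraction would suggest that the standard code is unusually robust under this particular cost measure. The result depends on both the measure and the way alternative codes are generated, so treat it as one experiment among many possible ones.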

Further Reading

Symmetry is one of the essential and most visible patterns that can be seen in nature. Starting from the left-right symmetry of the human body, all types of symmetry can be found in crystals, plants, animals and nature as a whole. Similarly, principles of symmetry are also some of the fundamental and most useful tools in modern mathematical natural science that play a major role in theory and applications. As a consequence, it is not surprising that the desire to understand the origin of life, based on the genetic code, forces us to involve symmetry as a mathematical concept. The genetic code can be seen as a key to biological self-organisation. All living organisms have the same molecular bases - an alphabet consisting of four letters (nitrogenous bases): adenine, cytosine, guanine, and thymine. Linearly ordered sequences of these bases contain the genetic information for synthesis of proteins in all forms of life. Thus, one of the most fascinating riddles of nature is to explain why the genetic code is as it is. Genetic coding possesses noise immunity, which is the fundamental feature that allows genetic information to be passed on from parents to their descendants. Hence, since the time of the discovery of the genetic code, scientists have tried to explain the noise immunity of the genetic information. In this chapter we will discuss recent results in mathematical modelling of the genetic code with respect to noise immunity, in particular error-detection and error-correction. We will focus on two central properties: Degeneracy and frameshift correction.

Koonin, Eugene V and Artem S Novozhilov. (2017). “Origin and Evolution of the Universal Genetic Code”. Annual Review of Genetics 51:45–62.
[PMID: 28853922] [DOI: 10.1146/annurev-genet-120116-024713]

Abstract …

The standard genetic code (SGC) is virtually universal among extant life forms. Although many deviations from the universal code exist, particularly in organelles and prokaryotes with small genomes, they are limited in scope and obviously secondary. The universality of the code likely results from the combination of a frozen accident, i.e., the deleterious effect of codon reassignment in the SGC, and the inhibitory effect of changes in the code on horizontal gene transfer. The structure of the SGC is nonrandom and ensures high robustness of the code to mutational and translational errors. However, this error minimization is most likely a by-product of the primordial code expansion driven by the diversification of the repertoire of protein amino acids, rather than a direct result of selection. Phylogenetic analysis of translation system components, in particular aminoacyl-tRNA synthetases, shows that, at a stage of evolution when the translation system had already attained high fidelity, the correspondence between amino acids and cognate codons was determined by recognition of amino acids by RNA molecules, i.e., proto-tRNAs. We propose an experimentally testable scenario for the evolution of the code that combines recognition of amino acids by unique sites on proto-tRNAs (distinct from the anticodons), expansion of the code via proto-tRNA duplication, and frozen accident.

Questions, comments

If in doubt, ask! If anything about this content is not clear to you, do not proceed but ask for clarification. If you have ideas about how to make this material better, let’s hear them. We are aiming to compile a list of FAQs for all learning units, and your contributions will count towards your participation marks.

Improve this page! If you have questions or comments, please post them on the Quercus Discussion board with a subject line that includes the name of the unit.

References


Boris Steipe
