

The methods and topics of life science research have changed profoundly in the last decade. Genomic science has shifted our attention from hypothesis-driven studies of individual molecules to phenomenological approaches that target comprehensive views of the living cell and its dynamics. Proteomics is setting its sights on a description of the entire complement of cellular components and their interactions.

This development has been technology-driven to a large extent; as a corollary, the applications domain, especially the biotechnology industry, contributes an unusually large fraction of its momentum. We - as basic scientists in the traditional centres of collaboration, education and non-directed discovery - are challenged not to fall behind: experience shows that sustained progress in the private sector critically depends on the academic sector to provide an uninterrupted stream of insight and innovation in the long term.

With this in mind, our program of biomolecular engineering research emphasizes the investigation of engineering principles and strategies that may contribute to an understanding of biomolecular systems at a level that goes beyond phenomenological descriptions and short-term application issues. Our topics span bioinformatics, protein engineering and biomolecular nanotechnology. At the core of the program are the analysis and integration of genomics and proteomics data to guide protein engineering, the experimental validation of our predictions, and the application of results to molecular medicine and bio-nanotechnology in a mid- to long-term timeframe.

The cohesive element in our research projects is the quest to understand complexity in biomolecular systems. Complexity arises from the context-dependent behaviour of system components, and we observe it in many hierarchical layers of structure formation and generation of function, from the genome to the living cell. We focus our work mainly on proteins, since protein folding is the quintessential paradigm of self-organising molecular systems. Based on our concepts to address complexity, we develop strategies and algorithms to analyse proteins and engineer them in predictable ways. We apply our understanding to interesting model proteins and biomolecular assemblies, and we aim to spin out successful concepts to biotechnology and medicine.

"Engineering" implies the rational application of well understood principles with a predictable outcome. Since structured biomolecules are complex systems, the effects of changes are difficult to predict. "Protein Engineering" may thus be regarded as an oxymoron, a contradiction in terms. Nevertheless, we have been able to provide at least some rationality to the engineering of antibody domains and to the designed construction of biomolecular nanoassemblies. We anticipate further progress in the application of our strategies.

Layers of biological complexity: (a) Protein sequences are represented in the genome, but transcriptional regulation already provides the first unknown: which genes are transcribed under which circumstances. As the transcript is further processed from left to right in this schematic, regulated alternative splicing will significantly change the mRNA message of most genes. Finally, after translation, the polypeptide folds to a structure that cannot be predicted from knowledge of the sequence information. Thus, while cellular behavior as a result of protein function as a result of protein structure as a result of protein sequence is wholly represented in the genome, we have no way of deriving the former from the latter unless the entire context is completely specified at the same time. This is a little like having to know the answer in order to be able to ask a question. (b) The problem of hierarchical layers of complexity, where one layer cannot be defined before the entire underlying layer is specified, is complicated further when individual proteins assemble structurally into functional molecular machines, or functionally into metabolic or signalling pathways and other higher-order cellular subsystems. Dynamic switching by posttranslational modifications becomes an issue, and protein sorting into different compartments must be taken into account, among many other aspects. (c) No less complex, and no more predictable from first principles, is the self-assembly of cells into organs and organisms during development.
Yet the entire process is self-organized, robust and reproducible.

Theory and Bioinformatics

Theoretical and applied bioinformatics provides core technologies for our work. We have contributed two strategies to address the complexity issues that limit rational protein engineering.

The Canonical Sequence Approximation

A first-order approximation is to view amino acids as context-independent elements of protein structure. The hypothesis of a canonical sequence approximation, which we have developed, views mutation and selection of the immunoglobulin sequence repertoire by analogy with the concept of an ensemble in statistical thermodynamics. To the degree that mutations are independent and randomly distributed, the most probable distribution of amino acid residues (states) in a canonical immunoglobulin sequence will be described by Boltzmann's law, where the concept of "energy" is replaced by the "fitness" of a domain in selection. To a large degree, the contribution to fitness will be a free energy contribution to the thermodynamic stability of the protein. In the simplest application of this hypothesis, the consensus residues of a domain sequence are predicted to be the most stabilizing residues in their respective positions. This is essentially a mean-field approach, in which amino acid residues are approximated to interact with a context that is averaged over a large number of specific sequences by evolution.

The Immunoglobulin Domain "Ensemble": If a number of immunoglobulin domains are superimposed, at low resolution they all look very similar, despite significant sequence differences in typically more than 20 % of their residues. In the Canonical Sequence Approximation, we imagine this low-resolution superposition to comprise an averaged environment for individual amino acid changes, which are otherwise random and independent. The fitness of each domain (approximated by its thermodynamic stability) is assumed to be (nearly) constant - severely disruptive mutations are not observed in the sequence database since they do not lead to functional immunoglobulins. Under these assumptions, the most probable amino acid frequency distribution is a Boltzmann distribution and the consensus residue is the fittest (most stable) choice. We have corroborated this by experiment.
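The consensus prediction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in our work: the toy alignment, the function names and the pseudocount are all hypothetical, and the score is a Boltzmann-style quantity in units of an effective selection "temperature", not a measured free energy.

```python
import math
from collections import Counter

# Hypothetical toy alignment; each string is one aligned domain sequence.
ALIGNMENT = [
    "DIQMTQSP",
    "DIVMTQSP",
    "EIVLTQSP",
    "DIQMTQTP",
    "DIVMTQSP",
]

def column_frequencies(alignment):
    """Relative residue frequencies for every alignment column."""
    n = len(alignment)
    return [{aa: count / n for aa, count in Counter(col).items()}
            for col in zip(*alignment)]

def consensus(freqs):
    """Most frequent residue per column (ties broken alphabetically)."""
    return "".join(max(sorted(f), key=f.get) for f in freqs)

def pseudo_energy(freqs, position, residue, pseudocount=1e-3):
    """Boltzmann-style score, -ln(f_residue / f_consensus), at a 0-based position.

    Zero for the consensus residue, positive for rarer residues. The
    pseudocount (an assumption) avoids log(0) for unobserved residues.
    """
    f = freqs[position]
    f_cons = max(f.values())
    return -math.log(max(f.get(residue, 0.0), pseudocount) / f_cons)

def suggest_mutations(sequence, freqs):
    """List (1-based position, current residue, consensus residue) for every
    position where the sequence deviates from the consensus."""
    cons = consensus(freqs)
    return [(i + 1, wt, c)
            for i, (wt, c) in enumerate(zip(sequence, cons)) if wt != c]

if __name__ == "__main__":
    freqs = column_frequencies(ALIGNMENT)
    print("consensus:", consensus(freqs))
    for pos, wt, cons_res in suggest_mutations("EIVLTQSP", freqs):
        score = pseudo_energy(freqs, pos - 1, wt)
        print(f"{wt}{pos}{cons_res}: wild-type residue scores {score:.2f} above consensus")
```

Each deviation from the consensus is treated independently of its neighbours, which is exactly the mean-field simplification the approximation makes.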

Motif engineering

A second-order approximation considers local interactions of amino acid residues only. The concept is the same as above, but this time sequences are aligned from recurring, similar structural fragments taken from a database of unrelated protein structures. We have compiled consensus sequences for these structural motifs and we have been able to show that these sequences can be used for protein engineering.

A Structural Motif: This stereo view shows a structural motif found as the result of a database query, in which the anchoring residues of an immunoglobulin variable domain loop are used as a search template on a set of unrelated protein structures. The query returns fragments of different lengths (thin tubes) whose anchor-residue conformation is nevertheless equivalent to that of the immunoglobulin loop (thick tubes). They fall into a cluster of similar conformations and a few isolated alternatives. Whenever we observe clusters of conformations such as these, they invariably have very strong sequence propensities in specific positions - corresponding to locally interacting residues. Consensus sequences from these clusters can be compiled and we have shown that such consensus sequences recreate the amino acid interactions independently of their context.
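A query of this kind can be illustrated with a small Python sketch. It is a simplified stand-in, not our search procedure: it compares anchor residues by their pairwise C-alpha distances rather than by superposition, so it cannot distinguish mirror images, and the function names, the two-residue flanks and the 0.5 Å tolerance are assumptions made for the example.

```python
import math
from itertools import combinations

def distance_profile(points):
    """All pairwise distances between 3D points, in a fixed order."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]

def profile_deviation(p, q):
    """Root-mean-square difference between two distance profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)) / len(p))

def find_anchor_matches(template_anchors, chain, loop_lengths, n_flank=2, tol=0.5):
    """Scan a chain of C-alpha coordinates for fragments whose flanking
    anchor residues resemble the template anchors.

    template_anchors: 2 * n_flank coordinates (N-terminal flank, then C-terminal flank)
    chain:            list of (x, y, z) tuples, one per residue
    loop_lengths:     loop lengths to try between the two flanks
    Returns (start_index, loop_length, deviation) tuples, best match first.
    """
    target = distance_profile(template_anchors)
    hits = []
    for loop_len in loop_lengths:
        frag_len = 2 * n_flank + loop_len
        for start in range(len(chain) - frag_len + 1):
            frag = chain[start:start + frag_len]
            anchors = frag[:n_flank] + frag[-n_flank:]
            dev = profile_deviation(distance_profile(anchors), target)
            if dev <= tol:
                hits.append((start, loop_len, dev))
    return sorted(hits, key=lambda h: h[2])
```

Because only the anchors are compared, fragments with loops of different length can match the same template, which is what allows clusters like the one in the figure to be collected before their sequences are aligned.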

Future work

Future bioinformatics projects in Toronto will generalize our motif concept and apply it to an exhaustive analysis of recurring patterns in proteins, assess strategies for applied bioinformatics and map out concepts of reverse engineering the cell in order to integrate biological knowledge.

Protein Engineering

Rational protein engineering is our approach to validating our theoretical concepts, and it provides tools and modules for future engineering projects.


We consider intrabodies to be exemplary results of our rational protein engineering strategies. The immunoglobulin domain is characterised by a conserved structural disulfide bond. Structural disulfides do not form in the reducing environment of the cellular cytoplasm, and attempts to express immunoglobulin domains in the cytoplasm regularly lead to protein aggregation and degradation. We have applied our stability engineering strategies to the construction of intracellular antibodies (intrabodies). The combination of individual, planned mutations has increased the stability of our prototype domains - a variable domain of a light chain - from 13 kJ/mol to over 36 kJ/mol. With these hyperstable prototype domains we have shown that it is indeed stability - and not the absence of the disulfide bond as such - that limits folding in the cytoplasm, and that the absence of the disulfide bond can be compensated by stability engineering. We were able to use our hyperstable frameworks for the construction of intrabodies in a number of examples, ranging from model domains to a designed catalytic intrabody. This opens the cytoplasm of the living cell for recombinant antibody biotechnology.

Immunoglobulin stability engineering: This schematic shows the location of a number of point mutations that were predicted by the Canonical Sequence Approximation and which we have constructed in an immunoglobulin VL domain. The mutations are stabilizing, as predicted, in 8 of 11 cases and none of the mutations were severely destabilizing. In fact, assuming prior knowledge of domain associations, the predictions would not be made for the positions labelled in gray, since these are involved in specific domain-domain interactions that are absent in the experimental system we use. Excluding these residues increases the prediction success to 7 of 8. Since the effect of the individual exchanges is almost perfectly additive, a combination of these mutations stabilizes the domain so much that the structural disulfide bridge is no longer required for folding. Such domains can be expressed as intrabodies. The method is straightforward, does not require knowledge of the structure, is applicable to scFvs as well as isolated domains and improves expression yields and solubility at the same time.
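To give a feeling for what the stability gain from 13 kJ/mol to over 36 kJ/mol means, the following Python sketch converts a free energy of unfolding into the equilibrium fraction of unfolded molecules. It assumes a simple two-state folding equilibrium at 25 °C; both the model and the temperature are assumptions made for illustration, as are the function names. The additive combination of mutations is expressed as a plain sum of individual contributions.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)

def combined_stability(base_dg, ddg_values):
    """Stability of a multiple mutant if individual contributions are
    perfectly additive (all values in kJ/mol)."""
    return base_dg + sum(ddg_values)

def fraction_unfolded(dg_unfolding, temperature=298.15):
    """Equilibrium fraction of unfolded molecules in a two-state model.

    dg_unfolding is the free energy of unfolding in kJ/mol (positive = stable);
    temperature is in kelvin (25 degrees C assumed by default).
    """
    k_fold = math.exp(dg_unfolding / (R_KJ * temperature))
    return 1.0 / (1.0 + k_fold)

if __name__ == "__main__":
    # The two stabilities quoted in the text for the prototype VL domain.
    for dg in (13.0, 36.0):
        print(f"dG = {dg:4.1f} kJ/mol -> fraction unfolded ~ {fraction_unfolded(dg):.1e}")
```

The exponential dependence is the point of the sketch: a few additive contributions of modest size change the unfolded population by orders of magnitude.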


Autofluorescent proteins have been modified in past work in my laboratory with rational and random mutagenesis and extensively studied by single-molecule- and femtosecond-resolution spectroscopy in various collaborations. As one example, we have been able to construct mutants that represent the three major, discrete, ground-state isoforms observed in wild-type GFP in equilibrium. Our minimal, isosteric mutants have allowed us to characterize these equilibrium states in detail.

Dissecting GFP spectra: These spectra show the absorption of wild-type Green Fluorescent Protein and its deconvolution into the protonated A-state, the deprotonated B-state and the deprotonated I-state. The latter differs from the B-state in that a conformational rearrangement that donates a hydrogen bond to the charged, deprotonated phenolic oxygen of the fluorophore has not yet taken place. The deconvolution was made possible by the generation of three isosteric mutants of GFP, in which elements of the hydrogen-bonding network of the fluorophore were perturbed (T203V, E222Q and the corresponding double mutant). Being able to deconvolute the spectra into their underlying populated sub-states has allowed us to quantify subtle changes of the fluorophore environment.
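One generic way to set up such a decomposition is linear least-squares unmixing against reference spectra, sketched below in Python. This is an assumption about how the problem could be posed, not a description of the analysis we performed: the Gaussian band shapes, their positions and widths, and the mixing weights are synthetic placeholders, and the fit does not enforce non-negative weights.

```python
import math

def gaussian_band(wavelengths, center, width):
    """Synthetic absorption band (placeholder for a measured reference spectrum)."""
    return [math.exp(-0.5 * ((w - center) / width) ** 2) for w in wavelengths]

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def unmix(spectrum, references):
    """Weights w that minimize |spectrum - sum_k w[k] * references[k]|^2,
    obtained from the normal equations."""
    k = len(references)
    gram = [[sum(p * q for p, q in zip(references[i], references[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(p * s for p, s in zip(references[i], spectrum)) for i in range(k)]
    return solve_linear(gram, rhs)

# Synthetic example: three overlapping bands standing in for the A, B and I states.
WAVELENGTHS = list(range(350, 551, 2))          # nm
REFERENCES = [
    gaussian_band(WAVELENGTHS, 395.0, 25.0),    # placeholder "A-state" band
    gaussian_band(WAVELENGTHS, 475.0, 15.0),    # placeholder "B-state" band
    gaussian_band(WAVELENGTHS, 495.0, 12.0),    # placeholder "I-state" band
]
TRUE_WEIGHTS = [0.6, 0.3, 0.1]                  # hypothetical populations
MIXTURE = [sum(w * ref[i] for w, ref in zip(TRUE_WEIGHTS, REFERENCES))
           for i in range(len(WAVELENGTHS))]

if __name__ == "__main__":
    print([round(w, 3) for w in unmix(MIXTURE, REFERENCES)])
```

With measured data the references would be the spectra of the individual states, which is why mutants that each represent one state make the decomposition possible in the first place.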


PH-domains have been chosen as platforms for a hybrid approach of rational and evolutionary engineering. Our goal was to rationally construct a framework for the display of peptide epitopes and then engineer large libraries of epitopes for evolutionary screening of proteins with novel functions. This is best constructed as a bipartite architecture of stable frameworks and variable loop structures; one example of such an architecture is the Pleckstrin-Homology (PH) domain. Such a library was successfully constructed and validated in terms of expression and stability. To improve our frameworks, we have constructed circular-permuted domains with improved expression and secretion behavior. Ultimately we will use such synthetic epitope carriers as tools for functional proteomics - to modulate the activity of expressed proteins in vivo and provide effectors and interacting reagents for the purification and analysis of targets at the same time.

PH-domain architecture: The bipartite architecture of PH domains (variable loops on structurally conserved framework) makes them well suited for epitope display. This schematic shows how the variable loops of the PH-domain (blue) align with the three major antigen-binding loops of an immunoglobulin domain (green). We propose that comparable structural variability can be realized with the PH-domain, which is only half as large as an Fv-fragment, is monomeric and requires no structural disulfide bridge for folding.

Future work

Future protein engineering work will develop some of the principles we have defined into applications. One direction we will take is to apply our scaffold libraries to functional proteomics; the other is to investigate the potential of our intrabodies in molecular medicine.

Molecular Systems Engineering

This is the term we give to an integration of our theoretical concepts, our protein engineering results and new ideas about constructing functional assemblies.

Biomolecular Nanoassemblies

Nanobiotechnology is an extension of our engineering interests to higher-order biomolecular assemblies. Biomolecules possess properties that make them attractive for nanotechnology. They can self-assemble into complex structures, a wide variety of functions is found in nature, they are comparatively easy to prepare and to modify, and they can be optimized through evolutionary engineering. Their main disadvantage is the lack of recognized principles for the rational design of biomolecules - and especially biomolecular assemblies - with new, desired properties. To circumvent this problem, we have dissociated the aspects of "function" and "organization" in the construction of biomolecular nanoassemblies. Proteins as functional modules are spatially organized on a nanometer scale by site-specific binding to DNA-scaffold structures. First results have been obtained with a prototype of a modular biomolecular sensor constructed according to these principles.

A biomolecular nanoassembly: This concept sketch shows the spatial arrangement of a sensing IG domain / DNA binding protein fusion (gray), its cognate DNA scaffold (blue) and a fluorophore effector module (purple). These modular components comprise a sensor, realized as a biomolecular nanomachine.

Future work

Since we have recently demonstrated the principle of our nanoassembly strategy, we are now exploring options for building non-trivial applications.


Publications: Most of our publications are available as PDFs from our publications page. If something is not there, please e-mail.


U of T