What Innovation Is Not

Innovation is seen as something as American as apple pie, and everybody from the US President on down is talking about it. From university presidents and corporate leaders to Silicon Valley tycoons, all agree that we need more of it. Against this background of hype, my colleague and fellow scientist Alex Lancaster finds that Scott Berkun’s book “The Myths of Innovation” is a refreshing and unpretentious take on this overused buzzword.

© The Digital Biologist

Complex adaptive systems pioneer John Holland passes away at 86

Sad to see the passing of John Holland, one of the great thinkers in the field of complex adaptive systems. I greatly admired his work on evolutionary optimization and his unconventional approach. In the early 2000s I even published a research paper of my own, directly inspired by his work, describing an evolutionary computational approach to the phase problem in X-ray crystallography. He was a great scientist and a great communicator.

My colleague Alex Lancaster, who was similarly inspired and influenced by Holland’s groundbreaking work, wrote a very nice piece on his blog to mark his passing. You can read it here.

© The Digital Biologist

The Future Of Research

These are difficult times for researchers. In inflation-adjusted terms, research funding is actually down compared with recent years, and everybody is talking about the apparent surplus of researchers being produced by graduate school and postdoctoral training programs. If you care about the future of research, the Symposium on the Future of Research will interest you.

This event is exceptional insofar as it is being organized and run by the people who are at the heart of this crisis – the postdocs themselves. Attendees should therefore expect to hear a range of insights and perspectives on the crisis that is far more wide-ranging than the scientific establishment voices we typically hear in the media. It’s great to see a group of Boston area postdocs from several of the region’s excellent schools taking matters into their own hands.

So if you care about the future of research, would like to hear from those on the front line of this issue, or even want to add your own voice to the conversation, you should register for the symposium, which will be held on the Boston University campus on October 2nd and 3rd, 2014.

© The Digital Biologist | All Rights Reserved

The art of deimmunizing therapeutic proteins

The consideration of potential immunogenicity is an essential component of the development workflow for any protein molecule destined for use as a therapeutic in a clinical setting. If a patient develops an immune response to the molecule, in the best-case scenario the patient’s own antibodies can neutralize the drug, blunting or even completely ablating its therapeutic activity. In the worst-case scenario, the immune response to the drug can endanger the health or even the life of the patient.

Thanks to the incredible molecular diversity that can be achieved by VDJ recombination in antibody-producing lymphocytes (B-cells), the antibody repertoire of even a single individual is so vast (as many as 10¹¹ distinct antibodies) that it is difficult to imagine ever being able to design all potential antibody (or B-cell) epitopes out of a protein while still preserving its structure and function. There is however a chink in the antibody defense’s armor that can be successfully exploited to make therapeutic proteins less visible to the immune system – the presentation of antigens to T-cells by antigen-presenting cells (APCs), a critical first step in the development of an adaptive immune response to an antigen.

Protein antigens captured by antigen-presenting cells such as B-cells are digested into peptide fragments that are subsequently presented on the cell surface as a complex of the peptide bound to a dual-chain receptor coded for by the family of Major Histocompatibility Complex (MHC) Class II genes. If this peptide/MHC II complex is recognized by a T-cell antigen receptor of one of the population of circulating T-helper (Th) cells, the B-cell and its cognate T-cell will form a co-stimulatory complex that activates the B-cell, causing it to proliferate. Eventually, the continued presence of the antigen that was captured by the surface-bound antibody on the B-cell will result not only in the proliferation of that particular B-cell clone, but also in the production of the free circulating form of the antibody (it should be noted that antibody responses to an antigen are typically polyclonal in nature, i.e. a family of cognate antibodies is generated against a specific antigen). It is through this stimulatory T-cell pathway that the initial detection of an antigen by the B-cell is escalated into a full antibody response. Incidentally, one of the major mechanisms of self-tolerance by the immune system is also facilitated by this pathway, via the suppression of T-cell clones that recognize self-antigens presented to the immune system during the course of its early development.

This T-helper pathway is therefore a key process in mounting an antibody-based immune response to a protein antigen. While the repertoire of structural epitopes that can be recognized by B-cells is probably far too vast for it to be practical to design a viable therapeutic protein completely free of them, the repertoire of peptides that are recognized by the family of MHC Class II receptors and presented to T-cells (T-cell epitopes), while still considerable in scope, is orders of magnitude smaller than the set of potential B-cell epitopes.

So as designers of therapeutic proteins and antibodies, how can we take advantage of this immunological “short-cut” to make our molecules more “stealthy” with regard to our patients’ immune systems?

The solution lies in remodeling any peptide sequences within our molecules that are determined to have a significant binding affinity for the MHC Class II receptors. The two chains of an MHC Class II receptor form a binding cleft on the surface of an APC into which peptide sequences of approximately 9 amino acids can fit. The ends of the cleft are actually open, so longer peptides can be bound, but the binding cleft itself is only long enough to sample about 9 amino acid side chains. It is this cleft with the bound peptide that is presented on the surface of an APC for recognition by T-cells.

The genetic evolution of MHC Class II alleles in humans is such that about 50 very common alleles account for more than 90% of all the MHC Class II receptors found in the human population. There are, of course, many more alleles in the population as a whole, but they become ever rarer as you go down the list from the 50 most common ones, with some of the rarer alleles being entirely confined to very specific populations and ethnicities. What this means for us as engineers of therapeutic proteins is that if we can predict potential T-cell epitopes for the 50 or so most common MHC Class II alleles, we can predict the likelihood of a given peptide sequence being immunogenic for the vast majority of the human population.

It actually turns out that some researchers have published experimental peptide binding data for the 50 most common MHC Class II alleles, and their results are very encouraging for the would-be immuno-engineer. The peptide binding motif of the MHC II receptor essentially consists of 9 pockets, each of which has a variable binding affinity across the 20 amino acid side chains that is independent of the side chains bound in the other 8 pockets. This last property is of particular importance because it means that we can calculate the relative MHC II binding affinity of any particular 9-mer peptide by simple summation of the discrete binding pocket/side chain affinities, rather than having to consider the vast combinatorial space of binding affinities that would arise if the amino acid binding affinity of each pocket depended upon the side chains bound in the other 8 pockets.
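To make the additive scoring idea concrete, here is a minimal Java sketch of a 9-mer scanner. To be clear, this is an illustration of the principle rather than a reproduction of any real tool: the class name, the per-pocket affinity matrix and the reporting threshold are all hypothetical placeholders that a real application would replace with experimentally derived values for each common MHC Class II allele.

```java
// Illustrative sketch only -- NOT the author's actual software. Scores
// every 9-mer window of a protein sequence against one MHC Class II
// allele by summing independent per-pocket amino acid affinities.
public class MhcIIScanner {

    // affinity[p][aa - 'A'] = binding contribution of amino acid 'aa'
    // (one-letter code) in pocket p (0..8). In practice one such matrix
    // would be loaded for each of the ~50 common MHC Class II alleles.
    private final double[][] affinity;
    private final double threshold; // hypothetical reporting cutoff

    public MhcIIScanner(double[][] affinity, double threshold) {
        this.affinity = affinity;
        this.threshold = threshold;
    }

    /** Additive score of a 9-mer: the sum of its per-pocket contributions. */
    public double score(String nineMer) {
        double total = 0.0;
        for (int p = 0; p < 9; p++) {
            total += affinity[p][nineMer.charAt(p) - 'A'];
        }
        return total;
    }

    /** Report every 9-mer window whose predicted score exceeds the threshold. */
    public void scan(String sequence) {
        for (int i = 0; i + 9 <= sequence.length(); i++) {
            String window = sequence.substring(i, i + 9);
            double s = score(window);
            if (s > threshold) {
                System.out.printf("Potential T-cell epitope at position %d: %s (score %.2f)%n",
                        i, window, s);
            }
        }
    }
}
```

Because the pocket contributions are independent, only 9 × 20 matrix entries per allele are needed to score every possible 9-mer, and the scan is linear in the length of the protein sequence.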

This is the point at which a computer and some clever software can be enormously helpful. While I was employed at a major biotechnology company, I created software that could use a library of this kind of MHC II peptide affinity data to scan the peptide sequences of protein drugs and antibodies that we were developing for the clinic. The software not only predicted the regions of the peptide sequence containing potential T-cell epitopes, but also used other structural and bioinformatics algorithms to help the scientist successfully re-engineer the molecule to reduce its immunogenicity while preserving its structure and function.

This last phrase explains why I used the word “art” in the title of this article.

What we learned from experience was that while it is relatively easy to predict T-cell epitopes in a peptide sequence, re-engineering those sequences while preserving the structure and function of the protein is the much greater challenge.

Based upon this experience, it was no surprise to me that the great majority of the thousands of lines of Java code that I wrote in developing our deimmunization software was dedicated to functionality that guided the scientist in selecting amino acid substitutions with the highest probability of preserving the structure and function of the protein. Even with this software however, the essential elements in this process were still the eyes and brain of the scientist, guided by training and experience in protein structure and biochemistry.

In other words, the art and craft of the experienced protein engineer.

Much like the old joke “My car is an automatic but I still have to be there”, the software could not substitute for the knowledge and experience of a skilled protein engineer, but it could make her life a lot easier by suggesting amino acid substitutions with a high probability of being structurally and functionally conservative, and by keeping track of all the changes and their impact upon the sequence and structure.
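To give a flavor of what such a guidance layer might look like, here is a hedged Java sketch that enumerates single amino acid substitutions within a flagged 9-mer, keeps those that push the predicted MHC II score below threshold, and ranks the survivors by a caller-supplied conservativeness measure (in practice something like a BLOSUM62 substitution score). The class and method names are my own inventions for illustration; the actual software described above is not public.

```java
// Hedged sketch of a substitution-guidance step: propose single amino
// acid changes to a flagged 9-mer that reduce its predicted MHC II
// score, ranked by how conservative each change is.
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

public class SubstitutionAdvisor {

    private static final String AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY";

    /** One candidate edit: position in the 9-mer, original and new residue. */
    public record Suggestion(int position, char from, char to, double newScore) {}

    /**
     * @param scorer     additive 9-mer scorer (see the MhcIIScanner sketch above)
     * @param similarity (original, replacement) -> conservativeness score;
     *                   in practice a substitution matrix such as BLOSUM62
     */
    public static List<Suggestion> suggest(String nineMer,
                                           MhcIIScanner scorer,
                                           double threshold,
                                           ToDoubleBiFunction<Character, Character> similarity) {
        List<Suggestion> out = new ArrayList<>();
        for (int pos = 0; pos < 9; pos++) {
            char original = nineMer.charAt(pos);
            for (char candidate : AMINO_ACIDS.toCharArray()) {
                if (candidate == original) continue;
                String mutant = nineMer.substring(0, pos) + candidate
                        + nineMer.substring(pos + 1);
                double s = scorer.score(mutant);
                if (s < threshold) { // this edit abolishes the predicted epitope
                    out.add(new Suggestion(pos, original, candidate, s));
                }
            }
        }
        // Most conservative substitutions first; the final call stays
        // with the protein engineer.
        out.sort((a, b) -> Double.compare(
                similarity.applyAsDouble(b.from(), b.to()),
                similarity.applyAsDouble(a.from(), a.to())));
        return out;
    }
}
```

The ranking merely puts the most structurally plausible options at the top of the list; choosing among them is exactly where the art described above comes in.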

The software really showed its value in the improvement it brought to our success rate in converting computational designs into successful molecules in the laboratory. For any given project with a new biologic, we would typically design a handful of variants to be tested in the lab, of which one or two might have all the properties we were shooting for. Once we started using the software, there was a noticeable increase in the proportion of our designs that tested well in the lab. This was interesting to me insofar as it showed that while the software could not replace the scientist’s knowledge and experience, it could certainly enhance and augment their application to the problem at hand – probably by keeping track of the many moving parts in the deimmunization process, leaving the scientist free to think more carefully about the actual science.

In spite of all this technological support however, a successful deimmunization depends heavily upon skill and experience in protein engineering, and there’s arguably still as much art in successfully re-engineering T-cell epitopes as there is science in predicting them.

© The Digital Biologist | All Rights Reserved

Ebola: The next big frontier for protease inhibitor therapies?

While I was on vacation this summer, the news was full of stories about the Ebola Virus outbreak in Africa and the health workers who had contracted the virus through working with the infected population there. Then on the heels of all of this comes a very timely paper, “High Content Image-Based Screening of a Protease Inhibitor Library Reveals Compounds Broadly Active against Rift Valley Fever Virus and Other Highly Pathogenic RNA Viruses”, in the journal PLOS Neglected Tropical Diseases.

While the primary pathogen tested in the article is the Rift Valley Fever Virus, the library of protease inhibitors was also screened for efficacy against Ebola Virus and a range of other related RNA viruses, and shown to have activity against those pathogens as well.

Given the incredible successes we have seen with the use of protease inhibitors in other virally induced diseases like HIV/AIDS, it is tempting to wonder whether there might be a similarly promising new medical frontier for protease inhibitors in the treatment of these extremely dangerous viral hemorrhagic fevers.

Interestingly, most of the compound screening for these kinds of antiviral therapies in the last couple of years has been focused upon signaling molecules like kinases, phosphatases and G-Protein Coupled Receptors (GPCRs). The use of protease inhibitors as antiviral compounds therefore represents something of a departure from the mainstream in this research field. The authors of the current study however felt that the success of protease inhibitors in the treatment of other diseases was grounds to merit a study of their efficacy against RNA viruses.

When I think about all of the people who are still alive today thanks to the use of protease inhibitors to control their HIV/AIDS, the early signs of a similarly efficacious class of compounds for treating hemorrhagic fevers, described in this new research article, definitely give one hope for a future in which there are successful therapies for deadly (and really scary) diseases like Ebola.

© The Digital Biologist | All Rights Reserved

An excellent R&D team is more like a jazz ensemble than an orchestra

As digital biologists, the world of research and development is, for many of us, our professional arena. Here is an article I recently published on LinkedIn in which I talk about the “round hole” that managing an R&D program presents to the “square peg” of the traditional management model typically applied to the problem, particularly in some of the larger R&D organizations.

© The Digital Biologist | All Rights Reserved

The limitations of deterministic modeling in biology

Over the three centuries that have elapsed since its invention, calculus has become the lingua franca for describing dynamic systems mathematically. Across a vast array of applications, from the behavior of a suspension bridge under load to the orbit of a communications satellite, calculus has often been the intellectual foundation upon which our science and technology have advanced. It was only natural therefore that researchers in the relatively new field of biology would eagerly embrace an approach that has yielded such transformative advances in other fields. The application of calculus to the deterministic modeling of biological systems, however, can be problematic for a number of reasons.

The distinction typically drawn between biology and fields such as chemistry and physics is that biology is the study of living systems, whereas the objects of interest to the chemist and the physicist are “dead” matter. This distinction is far from a clear one to say the least, especially when you consider that much of modern biological research is very much concerned with the “dead” objects from which living systems are constructed. In this light, the term “molecular biology” could almost be considered an oxymoron.

From the perspective of the modern biologist, what demands to be understood are the emergent properties of the myriad interactions of all this “dead” matter, at that higher level of abstraction that can truly be called “biology”. Under the hood as it were, molecules diffuse, collide and react in a dance whose choreography can be described by the rules of physics and chemistry – but the living systems composed of these molecules adapt, reproduce and respond to their ever-changing environments in staggeringly subtle, complex and orchestrated ways, many of which we have barely even begun to understand.

The dilemma for the biologist is that the kind of deterministic models applied to such great effect in other fields are often a very poor description of the biological system being studied, particularly when it is “biological” insights that are being sought. On the other hand, after three centuries of application across almost every conceivable domain of dynamic analysis, calculus is a tried and tested tool whose analytical power is hard to ignore.

One of the challenges confronting biologists in their attempts to model the dynamic behaviors of biological systems is having too many moving parts to describe. This problem arises from the combinatorial explosion of species and states that confounds the mathematical biologist’s attempts to use differential equations to model anything but the simplest of cell signaling or metabolic networks: a single protein with just ten independent binary modification sites, for example, already has 2¹⁰ = 1,024 distinct states, each in principle requiring its own equation. The two equally unappealing options available under these circumstances are to build a very descriptive model of one tiny corner of the system being studied, or to build a very low-resolution model of the larger system, replete with simplifying assumptions, approximations and even wholesale omissions. I have previously characterized this situation as an Uncertainty Principle for Traditional Mathematical Approaches to Biological Modeling, in which you can have scope or resolution, but not both at the same time.
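A trivial back-of-envelope calculation, sketched below in Java for concreteness, shows how quickly this explosion outruns any practical system of equations. The arithmetic is exact; the “one ODE per species” framing follows standard mass-action modeling, and the class name is just an illustrative invention.

```java
// Back-of-envelope illustration of the combinatorial explosion: a single
// protein with n independent binary modification sites (e.g. phosphorylated
// or not) has 2^n distinct states, and a deterministic mass-action ODE
// model needs one equation per state.
public class StateCounter {
    public static void main(String[] args) {
        for (int sites = 1; sites <= 12; sites++) {
            long states = 1L << sites; // 2^n states for n binary sites
            System.out.printf("%2d sites -> %,6d species (ODEs)%n", sites, states);
        }
        // A complex of two such proteins multiplies the counts together:
        // (2^n)^2 states, which is why even modest signaling networks
        // overwhelm hand-built systems of differential equations.
    }
}
```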

There is a potentially even greater danger in this scenario, one that poses a threat to the scientific integrity of the modeling study itself: decisions about what to simplify, aggregate or omit from the model in order to make its assembly and execution tractable require a set of a priori hypotheses about what are, and what are not, the important features of the system – i.e. they require a decision about which subset of the model’s components will determine its overall behavior before the model has ever been run.

Interestingly, there is also a potential upside to this rather glum scenario. The behavior of a model with such omissions and/or simplifications may fail even to approximate our observations in the laboratory. This could be evidence that the features we omitted or simplified are actually far more important to the model’s behavior than we initially suspected. Of course, they might not be, and the disconnect between the model and the observations may be entirely due to other flaws in the model, in its underlying assumptions, or even in the observations themselves. The point worth noting here, however – and one that many researchers with minimal exposure to modeling often fail to appreciate – is that even incomplete or “wrong” models can be very useful and illuminating.

The role of stochasticity and noise in the function of biological systems is an area that is only just starting to be widely recognized and explored, and it is an aspect of biology that is not captured by the kind of modeling approaches, based upon the bulk properties of the system, that we are discussing here. A certain degree of noise is a characteristic of almost any complex system, and there is a tendency to think of it only in terms of its nuisance value – like the kind of unwanted noise that reduces the fidelity of an audio reproduction or a cellphone conversation. There is however evidence that biological noise might actually be useful to organisms under certain circumstances – even something that can be exploited to confer an evolutionary advantage.

The low copy number of certain genes, for example, will yield noisy expression patterns in which the fluctuations in the rates of gene expression are significant relative to the overall gene expression “signal”. Under conditions of biological stress, certain microorganisms can exploit the stochasticity inherent in the expression of low-copy-number stress-related genes to subdivide, in effect, into smaller micro-populations differentiated by their stress responses. This is a form of hedging strategy that spreads, and thereby mitigates, risk – not unlike the kind of strategy used in the world of finance to reduce the risk of major losses in an investment portfolio.
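For readers who would like to see this kind of noise for themselves, below is a minimal Java implementation of the classic Gillespie stochastic simulation algorithm for a birth-death gene expression model. The algorithm itself is standard, but the rate constants are arbitrary illustrative values rather than measurements of any particular gene.

```java
// Minimal Gillespie stochastic simulation of a birth-death gene
// expression model: production at constant rate k, degradation at rate
// gamma per molecule. At low copy numbers the fluctuations are large
// relative to the mean (k/gamma).
import java.util.Random;

public class BirthDeathSSA {
    public static void main(String[] args) {
        double k = 2.0;     // production rate (molecules per unit time) -- illustrative
        double gamma = 0.1; // degradation rate per molecule -- illustrative
        int n = 0;          // current copy number
        double t = 0.0, tEnd = 200.0;
        Random rng = new Random(42);

        while (t < tEnd) {
            double birth = k;           // propensity of a production event
            double death = gamma * n;   // propensity of a degradation event
            double total = birth + death;
            // Exponentially distributed waiting time to the next event
            t += -Math.log(1.0 - rng.nextDouble()) / total;
            // Pick the event in proportion to its propensity
            if (rng.nextDouble() * total < birth) n++; else n--;
            System.out.printf("%.2f\t%d%n", t, n);
        }
    }
}
```

At a steady-state mean of k/γ = 20 molecules the copy number distribution is Poisson, so the relative fluctuations scale as 1/√20 ≈ 22% – exactly the kind of significant noise, relative to the expression “signal”, described above.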

In spite of all these shortcomings, I certainly don’t want to leave anybody with the impression that deterministic modeling is a poor approach, or that it has no place in computational biology. In the right context, it is an extremely powerful and useful approach to understanding the dynamic behaviors of systems. It is however important to recognize that many of the emergent properties (adaptation, for example) of the physico-chemical systems that organisms fundamentally are – properties that sit much further towards the biology end of the conceptual spectrum running from physics through chemistry to biology – are often ill-suited to analysis using the deterministic modeling approach. Given the nature of this intellectual roadblock, it may well be time for computational biologists to look beyond differential equations as their modeling tool of choice, and to develop new approaches to biological modeling better suited to the task at hand.

© The Digital Biologist | All Rights Reserved

Creation of a bacterial cell controlled by a chemically synthesized genome

A new research article by Gibson et al. that appeared in Science this month describes, if not quite a wholly synthetic organism, at least a big step towards one.

Here is the authors’ own abstract.

We report the design, synthesis, and assembly of the 1.08-mega-base pair Mycoplasma mycoides JCVI-syn1.0 genome starting from digitized genome sequence information and its transplantation into a M. capricolum recipient cell to create new M. mycoides cells that are controlled only by the synthetic chromosome. The only DNA in the cells is the designed synthetic DNA sequence, including “watermark” sequences and other designed gene deletions and polymorphisms, and mutations acquired during the building process. The new cells have expected phenotypic properties and are capable of continuous self-replication.

Inspires me to read some Aldous Huxley again 🙂

© The Digital Biologist | All Rights Reserved

Not yet as popular as Grumpy Cat but …

… in the first quarter of 2014, “The Digital Biologist” was read in 70 countries around the world. So while it’s very unlikely that the high-octane, rollercoaster world of computational biology will ever have the pulling power of internet memes like a comically sour-faced kitty or the lyrical stylings of Justin Bieber, what we lack in numbers we certainly make up for in geographical diversity. Can Grumpy Cat or Justin Bieber claim a following in Liechtenstein, for example? Neither can we actually, but we can make that claim for Sierra Leone (if you can really call a single reader a “following” – thank you, mysterious Sierra Leone Ranger, whoever you are :-).

Anyway, wherever you are visiting from, thank you all so much for your readership, and don’t forget that you can also join the conversation via the LinkedIn Digital Biology Group and on our Facebook page.

© The Digital Biologist | All Rights Reserved