Python For The Life Sciences: Update

After about a year of work and barring a few last-minute tweaks to the layout, it looks like we finally have our book Python For The Life Sciences, ready to publish. UPDATE 10/6/2016: THE BOOK IS NOW PUBLISHED AND AVAILABLE AT LEANPUB

I think it’s fair to say that we got a little carried away with the project, and the book has ended up somewhat larger than we thought it would be, coming in at over 300 pages. It is certainly not the slim, concise introduction to biocomputing with Python that we had envisaged at the outset.

The book does however cover an incredible range of life science research topics from biochemistry and gene sequencing, to molecular mechanics and agent-based models of complex systems. We hope that there’s something in it for anybody who’s a life scientist with little or no computer programming experience, but who would love to learn to code.

For the latest news on the book, including the free revisions and updates that come with every purchase, sign up for the (zero spam) Python For The Life Sciences Mailing List.

© The Digital Biologist

Python For Handling 96-Well Plate Data and Automation

There’s hardly a life science lab you can walk into these days without seeing a ton of 96-well plates and instruments that read and handle them. That’s why we’ve dedicated an entire chapter of our forthcoming book Python For The Life Sciences to the humble 96-well plate.

The chapter introduces the use of Python for handling laboratory assay plates of many different sizes and configurations. It shows the reader how to read plate assay data from files formatted as comma-separated values (CSV), how to implement basic row and column computations, how to plot multi-well plates with the wells color-coded by their properties, and even how to implement the high-level code needed to drive instruments and robots through devices like Arduinos.
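To give a flavor of what this looks like in practice, here is a minimal sketch of the kind of plate-handling code the chapter builds up. It is not the book’s actual code: the file name and the assumed layout (a header row of column numbers, followed by eight rows that each start with their row letter and contain twelve numeric readings) are illustrative assumptions only.

    import csv
    import statistics

    ROWS = "ABCDEFGH"        # standard 96-well plate row labels
    COLS = range(1, 13)      # standard 96-well plate columns 1-12

    def read_plate(path):
        """Read a 96-well plate CSV into a dict keyed by well ID, e.g. 'B7'."""
        plate = {}
        with open(path, newline="") as handle:
            reader = csv.reader(handle)
            next(reader)                      # skip the header row of column numbers
            for row_label, row in zip(ROWS, reader):
                # the first field is the row letter, the rest are the 12 readings
                for col, value in zip(COLS, row[1:]):
                    plate[f"{row_label}{col}"] = float(value)
        return plate

    def row_means(plate):
        """Mean reading for each plate row A-H."""
        return {r: statistics.mean(plate[f"{r}{c}"] for c in COLS) for r in ROWS}

    def column_means(plate):
        """Mean reading for each plate column 1-12."""
        return {c: statistics.mean(plate[f"{r}{c}"] for r in ROWS) for c in COLS}

    if __name__ == "__main__":
        plate = read_plate("assay_plate.csv")   # hypothetical file name
        print(row_means(plate))
        print(column_means(plate))

With the plate held in a simple dictionary like this, the color-coded plate plots and the per-well bookkeeping needed to drive an instrument become straightforward extensions.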

And this is just one of about 20 chapters designed to introduce the life scientist who wants to learn how to code to the wonderful and versatile Python programming language.

Almost all of the code and examples in the book are biology-based. In addition to teaching the Python programming language, the book aims to inspire life scientist readers to bring the power of computation to their own research, by demonstrating the application of Python to real-world examples drawn from a wide range of biological research disciplines.

The book includes code and examples covering next-generation sequencing, molecular modeling, biomarkers, systems biology, chemical kinetics, population dynamics, evolution and much more.

Python For The Life Sciences should be available as an eBook this fall (2016), so if you’re a life scientist interested in bringing a computational skill set to your research and your career, visit the book’s web page and sign up to our (no spam) mailing list for updates about the book’s progress and publication.

© The Digital Biologist

Big Data Does Not Equal Big Knowledge

… the life science Big Data scene is largely Big Hype. This is not because the data itself is not valuable, but rather because its real value is almost invariably buried under mountains of well-meaning but fruitless data analytics and data visualization. The fancy data dashboards that big pharmaceutical companies spend big bucks on to handle their big data are, for the most part, little more than eye candy whose colorful renderings convey an illusion of progress without the reality of it.

Read the full article on LinkedIn.

© The Digital Biologist

A sample chapter from our forthcoming book “Python For The Life Sciences”

My business partner Alex Lancaster and I are very excited about this early release of a sample chapter from our forthcoming book Python For The Life Sciences. The book is written primarily for life scientists with little or no experience writing computer code, who would like to develop enough programming knowledge to create software and algorithms that advance or accelerate their own research. These are scientists who are currently using spreadsheets and calculators to handle their data, but who have promised themselves that when the opportunity arises, they will learn to write code. If this describes your situation, your wait is over and opportunity is knocking. This could very well be just the book you have been waiting for!

In short, this is the book that we would like to have read when we were learning computational biology.

The aim of this book

The aim of this book is to teach the working biologist enough Python to get started using this incredibly versatile programming language in their own research, whether in academia or in industry. It also aims to furnish a Python foundation upon which the biologist can build, by extrapolating from the broad set of Python fundamentals that the book provides.

What this book is not

This book is not another comprehensive guide to the Python programming language, nor is it intended to be a Python language reference. There are already plenty of those out there, and easily accessible online. For this reason, you will find that there are many (many) aspects and areas of the Python language that are not covered. In a similar vein, this book is not intended to be a life science primer for programmers and computer scientists.

A tour of computational biology beyond bioinformatics

This book is all about using computational tools to jumpstart your biological imagination. We show the reader the range of quantitative biology questions that can be addressed using just one language, with examples drawn from across the life sciences. The examples are deliberately eclectic, running from bioinformatics, structural biology and systems biology through to models of cellular dynamics, ecology, evolution and artificial life.

Like a good tour, these biological examples were deliberately chosen to be simple enough not to impede the reader’s ability to assimilate the Python coding principles being presented, while each scientific problem still illustrates a simple yet powerful principle or idea. By covering a wide variety of examples from different parts of biology, we also hope that the reader will identify common features between different kinds of models and data, and encounter unfamiliar yet useful ideas and approaches. We provide pointers and references to other code, software, books and papers where the reader can explore each area in greater depth.

We believe that exploring biological data and biological systems should be fun! We want to take you from the nuts-and-bolts of writing Python code, to the cutting edge as quickly as possible, so that you can get up and running quickly on your own creative scientific projects.

The sample chapter shows how to use Python to mine and understand data from transcription factor networks, and you can get it here.

© The Digital Biologist

Are you still using calculators and spreadsheets for research projects that would be much better tackled with computer code?

With my co-author Alex Lancaster, I am in the latter stages of writing a book, Python for the Life Sciences, which will be an accessible introduction to Python programming for biologists with no prior experience in coding who would like to bring a computational approach to their own research. In the book, we are trying to cover a broad enough array of biological application areas for it to be a valuable guide and inspiration to life scientists in fields as diverse as NGS, systems biology, genomics, protein engineering, evolutionary biology and so on.

It has been our experience that there are a lot of researchers working in diverse areas of biology currently using calculators and spreadsheets for their work, but who would be able to do a great deal more using a versatile scripting and programming language like Python.

If you are one of these people, we would really love to hear from you.

What are the life science research problems that you would tackle computationally, if you were able to use code?

You can reply to this request either directly in the comments (where your response will be publicly visible) or, if you prefer to do so privately, via info@amberbiology.com.

If your area of interest is something we have not covered in the chapters we have already written, and we think it could be sufficiently interesting to the life science research community, we will endeavor to include it in the book, and also in the Python training sessions that we will be running for researchers following the book’s release.

Many Thanks to all who take the time to respond.

© The Digital Biologist

The Central Role Of User Experience Design In Scientific Modeling

Data is not knowledge.

Data can reveal relationships between events – correlations that may or may not be causal in nature; but by itself, data explains nothing without some form of conceptual model that assimilates it into an intellectual framework for reasoning about it.

Computational modeling is not in the mainstream of life science research in the way that it is in other fields such as physics and engineering. And while all scientific concepts are implicitly models, most biologists have had relatively little experience of the kind of explicit modeling that we’re talking about here. In fields like biology where exposure to computational models is more limited, there is a tendency to consider their utility largely in terms of their ability to make predictions – but what often gets overlooked is the fact that models also facilitate the communication and discussion of concepts by serving as cognitive frameworks for understanding them.

Next to the challenge of representing the sheer complexity of biological systems, this cognitive element of modeling may be the single biggest reason why modeling is not in the mainstream of the life sciences. Most biological models use idioms borrowed from other fields such as physics, where modeling is both more mature, and firmly in the mainstream of research.

For a model to be truly useful and meaningful in a particular field of intellectual activity, it needs to support the conceptual idioms by which ideas and knowledge are shared by those in the field.

In other words, it should be possible to put questions to the model that are couched in the conceptual idiom of the field, and to receive similarly structured answers. To the extent that this is not true of a model, there will be some degree of cognitive disconnect between the model and the user which will impede the meaningful interaction of the user with the model.

Nowhere can this be more clearly seen than in the field of software design. Software applications make extensive use of cognitive models in order to facilitate a meaningful and intuitive interaction with the user. As a very simple example – software that plays digital music reproduces the play, forward and reverse buttons that were common on physical media devices like cassette and VHS players. This is because almost everybody has the same expectations about how these interface components are to be used, based upon their prior experience with these devices. As an aside, it’s interesting to reflect on the fact that while the younger generation may see these interaction motifs everywhere in the user interfaces of software media players, many of them will never have seen the original devices whose mechanical interfaces inspired their design.

The psychology and design that determines these interactions with the objects and devices that we use, is such an important area of study that it has given rise to an entire field that is commonly referred to as User Experience (UX) or User Experience Design. UX lies at the intersection of psychology, design and engineering and is concerned with the way that humans interact with everything in the physical world from a sliding door to the instrument panel of an airliner – and of course, their analogs in the virtual world; web browsers, electronic books, photo editing software, online shopping carts and so on.

Affordances and signifiers are the currency of UX design, facilitating the interaction between the user and the object or software. If you consider an affordance as a means of interaction (like the handle on a door for example), signifiers are signs for the user that suggest how the affordances might work. To use our very simple door handle example – a handle that consists of a flat metal plate  on the door suggests that the door be pushed open. A handle consisting of a metal loop more strongly suggests that the door should be pulled open. For the purposes of illustration, this is just a very superficial and simple example of the kind of cognitive facilitation that effective UX design can support. By contrast, consider the role that UX design plays in highly complex, human-built systems whose interactions with the user are predicated on multiple and often interdependent conceptual models, each of enormous complexity in its own right. In some cases, a single, erroneous interaction with such a system might even destroy the system and/or lead to the loss of human life.

So what does all of this have to do with scientific modeling?

By facilitating a cognitive connection between the user and an object, a device or a piece of software, effective UX design makes the interaction easier, more intuitive and more meaningful. Insofar as a computational model is being used to develop a conceptual framework that explains data, effective UX design similarly facilitates the cognitive leap from data to knowledge.

To be very clear, what we’re discussing here is user experience writ large. It encompasses considerations of the user experience design of any software that a researcher might be using to implement a model, but also a great deal more besides. The conceptual model being used to describe a biological system has a user experience component in and of itself that, when it works, provides a cognitive handle by which the system being modeled can be understood.

In a non-computational approach to understanding the system, for example, this might be manifested in something as simple as the ability to draw an explicative diagram of the system on a piece of paper. In biology, think of the kind of pathway diagrams that biologists often draw to explain cell signaling (there’s even one in this article). In physics, the Feynman diagram used to intuitively describe the behavior of subatomic particles is a perfect example of brilliant user experience design that provides a cognitive handle on a complex conceptual model.

Where the conceptual model is implemented on a computational platform, then, and to the extent that the conceptual model can be mapped to the software, areas of overlap between the user experience design of the model and that of the software are inevitable and often inextricable.

As we have already seen, a very common theme in the user experience design of software, is the replication of components of the physical world that create an intuitive and familiar framework for the user – think for example of the near universal adoption of conventions like files and folders in computer file-handling systems, borrowed directly from office environments that pre-date the use of computers. Such an approach can be a very useful tool for enhancing the user experience.

As the VP of Biology at a venture-funded software startup building a collaborative, cloud-computing platform to model complex biological pathways, a major part of my role in the company was to serve as the product manager for the software. In practice, this actually comprised two roles. The first was an internal role as the interface between the company’s biology team tasked with developing the applications for our product, and our software engineering team who were tasked with building the product. The second was an external-facing role as a product evangelist and the liaison between our company and the life science research community – the potential client base for whom we were building our product.

One component of our cloud-computing platform was an agent-based simulation module for modeling cell signaling pathways. The ‘players’ in these simulations were, as you would expect, mostly proteins involved in cell signaling pathways – kinases, phosphatases and the like, and any kind of phosphoprotein whose cellular activity is typically modulated by the post-translational modification events that such enzymes mediate.

As a simulation proceeded on the cloud, it could be tracked by the user through a range of different visualizations in their web browser. One of these displayed the concentrations of the different molecular species present in the simulation, over time. This was initially presented as a graph like this:

[Figure: concentration-versus-time graph of the molecular species in a running simulation]

But if you think about the way that a biologist in the laboratory would do this experiment, this presentation of the results, while being information-rich, would not be what he or she was used to. The analogous lab experiment would probably involve sampling the reaction mixture at regular intervals and for example, running these aliquots as a time series on a gel to visualize their fluctuations over the course of the experiment.

My initial proposal that we add a visual element to the graph that reproduced what the biologist would see if they were to run the reaction mixture from a particular time point on a gel was met with some degree of skepticism from the software engineers.

To be fair, it has to be said at this point that any good software engineering team (consisting of developers, business analysts, product managers etc.) will always (and should) set a high bar for the approval of new features in the code, especially where there is any significant cost in time, money or resources required for their implementation. We were fortunate in our company to have just such an excellent software engineering team, so their initial resistance to this idea was not wholly unexpected. The main argument against it was that it would not be an information-rich visual presentation of the simulation results in the way that the graph already was, and furthermore, that it was redundant since this information was already presented at a much higher resolution in the graph.

However, when in my capacity as external liaison with our potential client base I tested the response of the life science research community to a mock-up of this feature, the results were amazingly positive.

[Figure: mock-up of the simulation interface combining the graph with a simulated Western blot display and time slider]

We asked biologists who agreed to be interviewed to compare the version of the simulation interface that contained only the graph with a mock-up of an updated version (shown above) that also contained a simulated Western blot display, with a time slider that could be moved across the graph to show what the Western blot gel would look like at each sampled time point.

Their responses were striking. What we heard most often from them (and I’m aggregating and paraphrasing the majority response here), was that the version of the interface with the Western blot display made a great deal more sense to them because it helped them to make the mental leap between the data being output from the model and what the model was actually telling them. Perhaps most importantly – in their minds it also reinforced the idea of the computational simulation as a virtual experiment whose results could help guide their decisions about which physical experiments to do in the lab.

Despite this new visualization not being information-rich, as the software engineers had rightly pointed out, its ability to frame the output from the simulation model in an idiom that was meaningful to the biologist created a richer and deeper cognitive connection between the biologist-modeler and the biology being represented and explored in the model.

Recognizing that modeling will only truly become part of the mainstream of life science research, as it is in physics, if it is done in an idiom appropriate to biology, we took that idea very seriously. It permeated every aspect of the development of our collaborative computational modeling platform, especially since it was also clear from our own product and market research that biologists were no more willing to become mathematicians or computer scientists in order to use models in their own research than people are willing to become mechanics in order to drive cars.

Take a look, for example, at this cartoon a biologist drew of a cell signaling pathway (thanks Russ). It illustrates perfectly the paradigm of an interconnected network of signaling proteins that is, in essence, the consensus model in the biology community for how cell signaling works. At some level, it matters little that we cannot consider this to be a realistic, physical model of cell signaling, since it implies the existence of static ‘biological circuits’ that do not really exist in the cell. In using this model, biologists are not suggesting that at all. The model does a very good job of representing, conceptually, the network of interactions that determines the functional properties of a cell signaling pathway.

There are some obvious intuitive benefits to this model (and many more subtle ones). For example, if we were to try to trace the network edges from one protein (node) to another and discovered that they were not connected by any of the other proteins, we could infer that none of the states available to the first protein could ever have an influence on the states of the second.

Here, for comparison, is the analogous representation of that same cell signaling pathway, assembled on our cloud computing platform using a set of lexical rules that describe each of the ‘players’ and their interactions. Even the underlying semantic formalism that we used as a kind of biological assembly language to represent the players (usually proteins) and their interactions was couched in terms of a familiar and relatively small set of biological events (binding, unbinding, modification etc.) that are in themselves sufficient to represent almost everything that happens in a cell at the level of its signaling pathways.
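Purely to make this idea concrete, here is a toy illustration in Python of what a small, declarative set of bind/modify/unbind rules might look like, and of the kind of intuitive question (can Raf influence Erk?) that such a representation makes easy to ask. This is emphatically not our platform’s actual rule language; the rule format and the connectivity check are illustrative assumptions only.

    # A toy, declarative description of a signaling pathway as a small
    # vocabulary of events. Each rule is (event, actor, target[, modification]).
    rules = [
        ("bind",   "Raf", "Mek"),
        ("modify", "Raf", "Mek", "P"),     # Raf phosphorylates Mek
        ("unbind", "Raf", "Mek"),
        ("bind",   "Mek", "Erk"),
        ("modify", "Mek", "Erk", "P"),     # Mek phosphorylates Erk
        ("unbind", "Mek", "Erk"),
    ]

    def can_influence(rules, source, target):
        """Trace the directed edges implied by the rules to ask whether
        `source` can ever influence the states of `target`."""
        edges = {(rule[1], rule[2]) for rule in rules}
        frontier, seen = {source}, set()
        while frontier:
            node = frontier.pop()
            seen.add(node)
            frontier |= {b for a, b in edges if a == node and b not in seen}
        return target in seen

    print(can_influence(rules, "Raf", "Erk"))   # True: Raf -> Mek -> Erk

The point of the sketch is simply that a rule-based description carries the same intuitive, network-like affordances as the biologist’s cartoon, while remaining precise enough for a machine to execute.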

In summary then, insofar as computational models facilitate thinking and reasoning about the biological systems we study and collect data from, they can help us much more effectively if they allow us to work in the idioms that are familiar and appropriate to our field. This notion can be more fully grasped by considering its antithesis – the use of ordinary differential equations (ODEs) to model biological systems, which still tends to be the dominant paradigm for biological modeling despite being an exceedingly opaque, unintuitive and largely incompatible approach for modeling systems at a biological scale.

It is also clear that software developers need to work closely with experts who have specialized domain knowledge if they are to create computational modeling platforms that will not only be effective for their particular domain, but also widely adopted by its practitioners. In the case of biology, it was clear to us when we were developing our modeling platform, that its success would depend in no small part, on the appeal that it could make to the imagination and intuition of the biologist. With computational modeling as with software development, even the most meticulously crafted of tools will have little or no impact or utility in its field if a cognitively dissonant user experience results in it rarely or never being used.

© The Digital Biologist

Complex adaptive systems pioneer John Holland passes away at 86

Sad to see the passing of John Holland, one of the great thinkers in the field of complex adaptive systems. I greatly admired his work on evolutionary optimization and his unconventional approach. In the early 2000s I even published a research paper of my own, describing an evolutionary computational approach to the phase problem in x-ray crystallography that was directly inspired by his work. He was a great scientist and a great communicator.

My colleague Alex Lancaster who was similarly inspired and influenced by Holland’s groundbreaking work, wrote a very nice piece to mark his passing, on his blog. You can read it here.

© The Digital Biologist

The art of deimmunizing therapeutic proteins

The consideration of potential immunogenicity is an essential component in the development workflow of any protein molecule destined for use as a therapeutic in a clinical setting. If a patient develops an immune response to the molecule, then in the best-case scenario the patient’s own antibodies neutralize the drug, blunting or even completely ablating its therapeutic activity. In the worst-case scenario, the immune response to the drug can endanger the health or even the life of the patient.

Thanks to the incredible molecular diversity that can be achieved by VDJ recombination in antibody-producing lymphocytes (B-cells), the antibody repertoire of even a single individual is so vast (as many as 10^11 distinct antibodies) that it is difficult to imagine ever being able to design all potential antibody (or B-cell) epitopes out of a protein while still preserving its structure and function. There is however a chink in the antibody defense’s armor that can be successfully exploited to make therapeutic proteins less visible to the immune system – the presentation of antigens to T-cells by antigen-presenting cells (APCs), a critical first step in the development of an adaptive immune response to an antigen.

Protein antigens captured by antigen-presenting cells such as B-cells are digested into peptide fragments that are subsequently presented on the cell surface as a complex of the peptide bound to a dual-chain receptor coded for by the family of Major Histocompatibility Complex (MHC) Class II genes. If this peptide/MHC II complex is recognized by a T-cell antigen receptor on one of the population of circulating T-helper (Th) cells, the B-cell and its cognate T-cell will form a co-stimulatory complex that activates the B-cell, causing it to proliferate. Eventually, the continued presence of the B-cell antigen that was captured by the surface-bound antibody on the B-cell will result not only in the proliferation of that particular B-cell clone, but also in the production of the free circulating form of the antibody (it should be noted that antibody responses to an antigen are typically polyclonal in nature, i.e. a family of cognate antibodies is generated against a specific antigen). It is through this stimulatory T-cell pathway that the initial detection of an antigen by the B-cell is escalated into a full antibody response to the antigen. Incidentally, one of the major mechanisms of self-tolerance by the immune system is also facilitated by this pathway, via the suppression of T-cell clones that recognize self-antigens presented to the immune system during the course of its early development.

This T-helper pathway is therefore a key process in mounting an antibody-based immune response to a protein antigen. And while the repertoire of structural epitopes that can be recognized by B-cells is probably far too vast for it to be practical to design a viable therapeutic protein that is completely free of them, the repertoire of peptides that are recognized by the family of MHC Class II receptors and presented to T-cells (T-cell epitopes), while still considerable in scope, is orders of magnitude smaller than the set of potential B-cell epitopes.

So, as designers of therapeutic proteins and antibodies, how can we take advantage of this immunological “short-cut” to make our molecules more “stealthy” with regard to our patients’ immune systems?

The solution lies in remodeling any non-self peptide sequences within our molecules that are determined to have a significant binding affinity for the MHC Class II receptors. The two chains of an MHC Class II receptor form a binding cleft on the surface of an APC into which peptide sequences of approximately 9 amino acids can fit. The ends of the cleft are actually open, so longer peptides can be bound, but the binding cleft itself is only long enough to sample about 9 amino acid side chains. It is this cleft with the bound peptide that is presented on the surface of an APC for recognition by T-cells.

The genetic evolution of MHC Class II alleles in humans is such that there are about 50 very common alleles that account for more than 90% of all the MHC Class II receptors found in the human population. There are of course, many more alleles in the entire human population, but they become ever rarer as you go down the list from the 50 most common ones, with some of the rarer alleles being entirely confined to very specific populations and ethnicities. What this means for us as engineers of therapeutic proteins is that if we can predict potential T-cell epitopes for the 50 or so most common MHC Class II alleles, we can predict the likelihood of a given peptide sequence being immunogenic for the vast majority of the human population.

It actually turns out that some researchers have published experimental peptide binding data for the 50 most common MHC Class II alleles and their results are very encouraging for the would-be immuno-engineer. The peptide binding motif of the MHC II receptor essentially consists of 9 pockets, each of which has a variable binding affinity across the 20 amino acid side chains that is independent of the side chains bound in the other 8 pockets. This last property is of particular importance because it means that we can calculate the relative MHC II binding affinity for any particular 9-mer peptide by the simple summation of the discrete binding pocket/side chain affinities, rather than having to consider the vast combinatorial space of binding affinities that would be possible if the amino acid binding affinity of each pocket was dependent upon the side chains bound in the other 8 pockets.
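Because each pocket’s affinity is independent of the other eight, the scoring scheme described above reduces to a simple sum over the nine positions of a peptide window, which is easy to express in a few lines of Python. The sketch below is only an illustration of the arithmetic: the affinity matrix is randomly generated placeholder data for a single hypothetical allele, not real published binding data, and in practice the scan would be repeated for each of the ~50 common MHC Class II alleles.

    import random

    AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

    # Placeholder pocket affinity matrix: pocket_affinity[i][aa] is the (made-up)
    # contribution of amino acid `aa` bound in pocket i, for i = 0..8.
    random.seed(0)
    pocket_affinity = [
        {aa: random.uniform(-1.0, 1.0) for aa in AMINO_ACIDS} for _ in range(9)
    ]

    def score_9mer(peptide):
        """Sum the independent pocket/side-chain affinities for a 9-residue peptide."""
        assert len(peptide) == 9
        return sum(pocket_affinity[i][aa] for i, aa in enumerate(peptide))

    def scan_sequence(sequence):
        """Slide a 9-residue window along a protein sequence and score each window."""
        return [
            (start, sequence[start:start + 9], score_9mer(sequence[start:start + 9]))
            for start in range(len(sequence) - 8)
        ]

    if __name__ == "__main__":
        seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"   # an arbitrary toy sequence
        top_hits = sorted(scan_sequence(seq), key=lambda hit: -hit[2])[:3]
        for start, window, score in top_hits:
            print(start, window, round(score, 2))

The highest-scoring windows across the panel of common alleles are the candidate T-cell epitopes that the re-engineering effort described below then tries to remove.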

This is the point at which a computer and some clever software can be enormously helpful. While I was employed at a major biotechnology company, I created software that could use a library of this kind of MHC II peptide affinity data, in order to scan the peptide sequences of protein drugs and antibodies that we were developing for the clinic. The software not only predicted the regions of the peptide sequence containing potential T-Cell epitopes, but it also used other structural and bioinformatics algorithms to help the scientist to successfully re-engineer the molecule to reduce its immunogenicity while preserving its structure and function.

This last phrase explains why I used the word “art” in the title of this article.

What we learned from experience was that while it is relatively easy to predict T-cell epitopes in a peptide sequence, reengineering the sequences while preserving the structure and function of the protein is the much greater challenge.

Based upon this experience, it was no surprise to me that the great majority of the thousands of lines of Java code that I wrote developing our deimmunization software was dedicated to functionality that guided the scientist in selecting amino acid substitutions with the highest probability of preserving the structure and function of the protein. Even with this software, however, the essential elements in the process were still the eyes and brain of the scientist, guided by training and experience in protein structure and biochemistry.

In other words, the art and craft of the experienced protein engineer.

Much like the old joke “My car is an automatic but I still have to be there” – the software could not substitute for the knowledge and experience of a skilled protein engineer, but it could make her life a lot easier by suggesting amino acid substitutions with a high probability of being structurally and functionally conservative; and by keeping track of all the changes and their impact upon the sequence and structure.

The software really showed its value in the improvement it brought to our success rate in converting computational designs into successful molecules in the laboratory. For any given project with a new biologic, we would typically design a set of variants to be tested in the lab, of which one or two might have all the properties we were shooting for. Once we started using the software, there was a noticeable increase in the proportion of our designs that tested well in the lab. This was interesting to me insofar as it showed that while the software could not replace the scientist’s knowledge and experience, it could certainly enhance and augment their application to the problem at hand – probably by keeping track of the many moving parts in the deimmunization process, leaving the scientist free to think more carefully about the actual science.

In spite of all this technological support however, a successful deimmunization depends heavily upon skill and experience in protein engineering, and there’s arguably still as much art in successfully re-engineering T-cell epitopes as there is science in predicting them.

© The Digital Biologist | All Rights Reserved

The limitations of deterministic modeling in biology

Over the three centuries that have elapsed since its invention, calculus has become the lingua franca for describing dynamic systems mathematically. Across a vast array of applications, from the behavior of a suspension bridge under load to the orbit of a communications satellite, calculus has often been the intellectual foundation upon which our science and technology have advanced. It was only natural, therefore, that researchers in the relatively new field of biology would eagerly embrace an approach that has yielded such transformative advances in other fields. The application of calculus to the deterministic modeling of biological systems, however, can be problematic for a number of different reasons.

The distinction typically drawn between biology and fields such as chemistry and physics is that biology is the study of living systems whereas the objects of interest to the chemist and the physicist are “dead” matter. This distinction is far from a clear one to say the least, especially when you consider that much of modern biological research is very much concerned with these “dead” objects from which living systems are constructed. In this light then, the term “molecular biology” could almost be considered an oxymoron.

From the perspective of the modern biologist, what demands to be understood are the emergent properties of the myriad interactions of all this “dead” matter, at that higher level of abstraction that can truly be called “biology”. Under the hood as it were, molecules diffuse, collide and react in a dance whose choreography can be described by the rules of physics and chemistry – but the living systems that are composed of these molecules, adapt, reproduce and respond to their ever changing environments in staggeringly subtle, complex and orchestrated ways, many of which we have barely even begun to understand.

The dilemma for the biologist is that the kind of deterministic models applied to such great effect in other fields, are often a very poor description of the biological system being studied, particularly when it is “biological” insights that are being sought. On the other hand, after three centuries of application across almost every conceivable domain of dynamic analysis, calculus is a tried and tested tool whose analytical power is hard to ignore.

One of the challenges confronting biologists in their attempts to model the dynamic behaviors of biological systems is having too many moving parts to describe. This problem arises from the combinatorial explosion of species and states that confounds the mathematical biologist’s attempts to use differential equations to model anything but the simplest of cell signaling or metabolic networks. The two equally unappealing options available under these circumstances are to build a very descriptive model of one tiny corner of the system being studied, or to build a very low resolution model of the larger system, replete with simplifying assumptions, approximations and even wholesale omissions. I have previously characterized this situation as an Uncertainty Principle for Traditional Mathematical Approaches to Biological Modeling in which you are able to have scope or resolution, but not both at the same time.
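A quick back-of-the-envelope calculation shows how fast this combinatorial explosion bites. The numbers below are purely illustrative and not drawn from any particular pathway:

    # Purely illustrative arithmetic for the combinatorial explosion described above.

    # A single protein with n independent modification sites (each either
    # modified or not) can occupy 2**n distinct states.
    n_sites = 10
    states_per_protein = 2 ** n_sites                 # 1,024 states

    # A complex assembled from k such proteins multiplies their state spaces.
    k = 3
    states_per_complex = states_per_protein ** k      # ~10**9 distinct species

    print(states_per_protein, states_per_complex)

Since a deterministic model needs one differential equation per distinct chemical species, even a modest signaling module can quickly become intractable to write down, let alone to parameterize.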

There is a potentially even greater danger in this scenario, one that poses a threat to the scientific integrity of the modeling study itself: decisions about what to simplify, aggregate or omit from the model in order to make its assembly and execution tractable require a set of a priori hypotheses about which features of the system are, and which are not, important – in other words, a decision about which subset of the model’s components will determine its overall behavior, made before the model has ever been run.

Interestingly, there is also a potential upside to this rather glum scenario. The behavior of a model with such omissions and/or simplifications may fail even to approximate our observations in the laboratory. This could be evidence that the features of the model we omitted or simplified are actually far more important to its behavior than we initially suspected. Of course, they might not be, and the disconnect between the model and the observations may be entirely due to other flaws in the model, in its underlying assumptions, or even in the observations themselves. The point worth noting here, however – and one that many researchers with minimal exposure to modeling often fail to appreciate – is that even incomplete or “wrong” models can be very useful and illuminating.

The role of stochasticity and noise in the function of biological systems is an area that is only just starting to be widely recognized and explored, and it is an aspect of biology that is not captured using the kind of modeling approaches based upon the bulk properties of the system that we are discussing here. A certain degree of noise is a characteristic of almost any complex system, and there is a tendency to think of it only in terms of its nuisance value – like the kind of unwanted noise that reduces the fidelity of an audio reproduction or a cellphone conversation. There is however evidence that biological noise might even be useful to organisms under certain circumstances – even something that can be exploited to confer an evolutionary advantage.

The low copy number of certain genes, for example, will yield noisy expression patterns in which the fluctuations in the rates of gene expression are significant relative to the overall gene expression “signal”. Certain microorganisms, under conditions of biological stress, can exploit the stochasticity inherent in the expression of stress-related genes with low copy numbers to essentially subdivide into smaller micro-populations differentiated by their stress responses. This is a form of hedging strategy that spreads and thereby mitigates risk, not unlike the kind of strategy used in the world of finance to reduce the risk of major losses in an investment portfolio.
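For readers who have not met stochastic simulation before, here is a minimal Gillespie-style sketch of a single species produced and degraded at low copy number. The rate constants are arbitrary and the model is not intended to represent any particular gene or organism; it is only meant to show the kind of fluctuations that a deterministic, bulk-concentration model averages away.

    import math
    import random

    def gillespie_birth_death(k_make=0.5, k_decay=0.05, t_end=200.0, seed=1):
        """Stochastic simulation of: 0 -> X (rate k_make), X -> 0 (rate k_decay * n)."""
        random.seed(seed)
        t, n = 0.0, 0
        trajectory = [(t, n)]
        while t < t_end:
            a_make = k_make                        # propensity of the production event
            a_decay = k_decay * n                  # propensity of the decay event
            a_total = a_make + a_decay
            # the waiting time to the next event is exponentially distributed
            t += -math.log(1.0 - random.random()) / a_total
            # choose which event fires, weighted by its propensity
            if random.random() * a_total < a_make:
                n += 1
            else:
                n -= 1
            trajectory.append((t, n))
        return trajectory

    if __name__ == "__main__":
        for t, n in gillespie_birth_death()[::50]:
            print(f"t = {t:7.2f}  copies = {n}")

Running it a few times with different seeds makes the point immediately: each trajectory wanders around the deterministic steady state (k_make/k_decay, about 10 copies here) with fluctuations that are large relative to the mean – exactly the low-copy-number regime in which the hedging strategies described above can operate.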

In spite of all these shortcomings, I certainly don’t want to leave anybody with the impression that deterministic modeling is a poor approach, or that it has no place in computational biology. In the right context, it is an extremely powerful and useful way to understand the dynamic behaviors of systems. It is however important to recognize that many of the emergent properties of organisms (adaptation, for example) – properties that sit much further towards the biology end of the conceptual spectrum running from physics through chemistry to biology, even though organisms are at a fundamental level physico-chemical systems – are often ill-suited to analysis using a deterministic modeling approach. Given the nature of this intellectual roadblock, it may well be time for computational biologists to consider looking beyond differential equations as their modeling tool of choice, and to develop new approaches to biological modeling that are better suited to the task at hand.

© The Digital Biologist | All Rights Reserved

Ten Simple Rules for Effective Computational Research

I have written at some length about what I feel is necessary to make computational modeling really practical and useful in the life sciences. In the article Biologists Flirt With Models, for example, that appeared in Drug Discovery World in 2009, and in the light-hearted video that I made for the Google Sci Foo Conference, I have argued for computational models that can be encoded in the kind of language that biologists themselves use to describe their systems of interest, and which deliver their results in a similarly intuitive fashion. It is clear that the great majority of biologists are interested in asking biological questions rather than solving theoretical problems in the field of computer science.

Similarly, it is important that these models can translate data (of which we typically have an abundance) into real knowledge (for which we are almost invariably starving). If Big Data is to live up to its big hype, it will need to deliver “Big Knowledge”, preferably in the form of actionable insights that can be tested in the laboratory. Beyond their ability to translate data into knowledge, models are also excellent vehicles for the collaborative exchange and communication of scientific ideas.

With this in mind, it is really gratifying to see researchers in the field of computational biology reaching out to the mainstream life science research community in an effort to address these kinds of issues, as in the article “Ten Simple Rules for Effective Computational Research” that appeared in a recent issue of PLOS Computational Biology. The ten simple rules presented in the article touch upon many of the issues that we have discussed here, although they are for the most part much easier said than done. Rule 3, for example – “Make Your Code Understandable to Others (and Yourself)” – is something of a doozy that may ultimately require biologists to abandon the traditional mathematical approaches borrowed from other fields and create their own computational languages for describing living systems.

To be fair to the authors of the article however, recognizing that there is a problem is an invaluable first step in dealing with it, even if you don’t yet have a ready solution – and for that I salute them.

Postscript: Very much on topic, this article about the challenges facing the “Big Data” approach subsequently appeared in Wired on April 11th.

© The Digital Biologist | All Rights Reserved