If you really want to understand how a biological pathway works, build one.

Our consulting firm, Amber Biology, was recently commissioned by a biotechnology company to build a dynamic, agent-based model of the biological pathways that the company's scientists were studying as part of their research and development program. As I presented our model to the company's researchers, I could sense their anticipation at seeing, perhaps for the first time, the concerted activities of all of the genes, proteins and other molecules that they had been studying. It's something of an epiphany to witness what was essentially a bunch of data points suddenly organized and integrated into a biological system whose behavior is an emergent property of all that data.

With the model in hand, we were immediately able to do some very interesting virtual "what-if" experiments with it. What if the activity of this regulatory protein were only at 50% of its level in normal, healthy cells? How would the response of this pathway differ in a cell where this gene was deficient or knocked out? Moreover, as the discussion developed around the results from the model, some of the less commonly considered functions of models (beyond their ability just to make predictions) started to come to the fore: the generation and testing of hypotheses, the conception and development of new laboratory experiments, and the ability of the model itself to serve as a vehicle for communication and the discussion of ideas. Significant insights can be gained from having a working, dynamic model of the biological pathways that you are studying. Within the roughly one hour that my presentation to this company lasted, we were able to demonstrate (among other things) the detailed chain of cause and effect by which a clinically observed deficiency in a particular regulatory molecule can lead to a disease state, and also that the model correctly predicts a biomarker for this disease state derived from the relative levels of production of certain metabolites generated by the pathway.
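
To give a flavor of what such a virtual "what-if" experiment looks like in practice, here is a minimal sketch in Python. It is emphatically not the model we built for our client: the pathway (a single regulator-dependent conversion of substrate into product), the function name `simulate_pathway`, and every parameter value are hypothetical, chosen only to illustrate the idea of rerunning the same agent-based simulation with the regulator at normal activity, at 50% activity, and knocked out.

```python
import random


def simulate_pathway(regulator_activity, n_substrate=1000, steps=50,
                     base_conversion_prob=0.02, seed=0):
    """Toy agent-based run of a hypothetical one-step pathway.

    Each substrate molecule is an agent. At every time step it is converted
    to product with probability base_conversion_prob * regulator_activity.
    All names and numbers here are illustrative assumptions, not measured
    biochemical parameters.
    """
    rng = random.Random(seed)  # fixed seed so a scenario can be rerun exactly
    p_convert = base_conversion_prob * regulator_activity
    substrate = n_substrate
    product = 0
    for _ in range(steps):
        # Ask every remaining substrate agent whether it reacts this step.
        converted = sum(1 for _ in range(substrate) if rng.random() < p_convert)
        substrate -= converted
        product += converted
    return product


# The three "what-if" scenarios: regulator activity relative to a normal cell.
scenarios = {"normal": 1.0, "half activity": 0.5, "knockout": 0.0}
results = {name: simulate_pathway(activity) for name, activity in scenarios.items()}

for name, product in results.items():
    print(f"{name}: {product} product molecules after 50 steps")
```

Even in a toy like this, the question is posed in biological terms (what happens if the regulator is deficient?) rather than in terms of individual rate constants, which is the shift in conversation that a real pathway model makes possible at a much larger scale.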

From the perspective of providing answers to biological questions, the data that you need to build such a model are exactly that: just data. Without some kind of conceptual model with which to reason about them, biochemical parameters such as molar concentrations and kinetic rates for the binding, unbinding and enzymatic modification of molecules do not really constitute "knowledge" at the biological level, any more than measurements of the position of Mars constitute "knowledge" at the astronomical level. Framed within the context of a model, however, these observations can become a foundation for reasoning about the system at a higher, biological level, moving the conversation from the microscopic domain of molecules, kinetic rates, affinities and so on to considerations of cellular responses, biological dysfunction and disease states.

In this era of automated, digitized laboratory instruments and data storage in the cloud, there is certainly no shortage of biological data - yet the investment by most life science research organizations in the kind of infrastructure or expertise needed to turn data into real knowledge has certainly not kept pace with their ability to generate new data. To be perfectly clear - the data are valuable, but data alone seldom yield any kind of actionable insights that would, for example, assist anyone trying to develop a therapeutic intervention for a biological system. Rather than being organized and integrated into conceptual models that could explain them, the data often have their real value buried under mountains of well-meaning but fruitless data analytics and data visualization. For the kind of complex biological systems that life scientists study - the kind of systems that produce big data sets whose complexity mirrors their own - it's a long shot to hope to draw much in the way of meaningful biological insight from some purely mathematical fitting, filtering, and visualization of such data.

Computational modeling is certainly not in the mainstream of life science research in the way that it is in other fields such as physics and engineering, and a lot of this probably has to do with the difficulty of overcoming the challenges posed by the sheer complexity of biological systems. The most commonly used dynamic modeling paradigms for biological systems, such as ordinary differential equations (ODEs), have largely been borrowed from these other fields, where they have enjoyed great success. Their application to biology, however, is more problematic - principally because of their inability to adequately represent the complexity of the systems that biologists would like to model, but also because of the disconnect between the causal descriptions that biologists typically use to describe the systems they study and the more abstract (and opaque) idioms used to construct and query such models. Despite this, more recent approaches, including agent-based and rule-based modeling platforms developed specifically for biology, are starting to mature to the point at which we may soon witness a much greater level of adoption of computational modeling in the mainstream of life science research, and a much wider recognition of its role as an invaluable adjunct to laboratory-based research and development programs.

© 2018, The Digital Biologist