Programming biology

A few years back, I was fortunate enough to be hired as VP of Biology at Plectix BioSystems, an exciting start-up company that was funded by a couple of West Coast venture capital firms. We were developing an approach to modeling complex cellular signaling and metabolic pathways, based upon a modeling language called Kappa. Kappa has a very sparse and intuitive syntax that is both machine and human readable, whilst also adhering to a strict semantic formalism that allows the modeler to compute many interesting properties of the modeled system. For example, even in a single cell signaling pathway like EGFR, there are upwards of 10^30 possible states in the system, where a "state" is defined as a molecular species that may involve individual or complexed molecules, all of which potentially carry any number of post-translational modifications. With traditional ODE-based modeling, it is impossible to model such systems in fine detail, because of this combinatorial explosion of molecular species and the requirement for a rate equation for every possible molecular species in the model. Even if you can build such a model by other means (such as in Kappa), you might run an awful lot of simulations and still never observe a state that you are interested in, leaving you to wonder whether that state is unreachable in the model you have constructed, or whether you simply have not adequately sampled the model's astronomically vast state space in your simulations. The semantic formalism of languages like Kappa, however, allows the modeler to do the kind of static analysis that computer language compilers perform, and so to get ready analytical answers to questions like "Is state x possible in this system?" without the need for eons of simulation.
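
To make the combinatorial explosion and the reachability question a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the three sites "a", "b" and "c", the toy rules, and the function names are hypothetical and are not taken from Kappa or from any real EGFR model. The brute-force search at the end only works because the toy system has eight states; it illustrates the question being asked, not the static analysis that answers it without enumeration.

```python
from collections import deque

# Hypothetical protein with three modification sites, each either
# unmodified ("u") or phosphorylated ("p").
SITES = ("a", "b", "c")

# Toy rules (assumed for illustration): each entry is
# (site that gets phosphorylated, required state of other sites).
RULES = [
    ("a", {}),                    # a can always be phosphorylated
    ("b", {"a": "p"}),            # b requires a to be phosphorylated
    ("c", {"a": "u", "b": "p"}),  # c requires a unmodified and b phosphorylated
]


def count_species(n_sites, states_per_site=2):
    """Number of distinct species for one protein with n_sites sites."""
    return states_per_site ** n_sites


def successors(state):
    """Yield every state reachable from `state` by applying one rule."""
    current = dict(zip(SITES, state))
    for site, context in RULES:
        context_ok = all(current[s] == v for s, v in context.items())
        if current[site] == "u" and context_ok:
            nxt = dict(current)
            nxt[site] = "p"
            yield tuple(nxt[s] for s in SITES)


def reachable_states(start):
    """Breadth-first search over the explicit state space."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    # The species count doubles with every additional two-state site.
    for n in (3, 10, 30, 100):
        print(f"{n:>3} sites: {count_species(n):.3e} possible species")

    states = reachable_states(("u", "u", "u"))
    target = ("u", "p", "p")
    print(f"{len(states)} of {count_species(len(SITES))} states are reachable")
    print(f"{target} reachable: {target in states}")
```

In this toy system only 3 of the 8 possible states can ever occur, because site c needs site a to be unmodified while site b needs it to be phosphorylated, and nothing in the rules removes a phosphate. No amount of simulation would ever show the target state, which is exactly the situation where an analytical answer is preferable to more sampling. Note also that a single protein with 100 independent two-state sites already has about 1.3 x 10^30 possible species.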

There are, however, some drawbacks with languages like Kappa. The biologist effectively needs to learn a new programming language that is very highly specialized for the task at hand, and therefore not easy to interface with the other kinds of analysis of the model and its results that one would generally like to do on the computer. For example, after running simulations, one would generally want to graph the results and perform some kind of statistical analysis of them. Something we considered at Plectix BioSystems was to create an Application Programming Interface (API) that would allow Kappa to be interfaced with other popular programming languages like Python or Java. We never got as far as doing this, however, before the rug got pulled out from under our feet by the big economic crash of 2007-2008 that left our VCs scrambling to reduce their financial exposure.

Recently, however, a collaboration of academic groups at Vanderbilt and Harvard universities has created just such an API, allowing Kappa and the closely related BioNetGen Language (BNGL) to interface with Python. The new API is called PySB, and it offers the biological modeler access to the vast array of libraries and applications that are available in Python, particularly in the areas of science and mathematics, which are very well represented within the Python community. An additional advantage of this approach is that it facilitates biological modeling using the Object Oriented Programming (OOP) paradigm that has proved so successful in enabling computer programmers to write more robust and accessible code, and to better manage the complexity of very large applications and code libraries. Kappa and BNGL are both, in their own way, attempts by biologists to manage the vast complexity of the cell signaling and metabolic systems that they are striving to understand, and the ability to frame the myriad components of these complex systems in terms of OOP objects and classes could prove to be extremely valuable. Just as the OOP paradigm has helped programmers to manage bigger and more complex code libraries and to collaborate more seamlessly on their design and construction, one might hope that an OOP approach to modeling biological systems will prove to be similarly enabling for the biological modeler.
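
As a rough illustration of what framing model components as objects and classes might look like, here is a small sketch in plain Python. It is emphatically not PySB's API: the `Monomer`, `Rule` and `Model` classes below, their fields, and the ligand and receptor example are all hypothetical names invented for this sketch, and it assumes only the Python standard library.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Monomer:
    """A molecular component with named sites (hypothetical class)."""
    name: str
    sites: tuple = ()


@dataclass(frozen=True)
class Rule:
    """A named interaction between monomers with a rate constant."""
    name: str
    participants: tuple
    rate: float


@dataclass
class Model:
    """A container that keeps the components of a model consistent."""
    name: str
    monomers: dict = field(default_factory=dict)
    rules: list = field(default_factory=list)

    def add_monomer(self, monomer):
        if monomer.name in self.monomers:
            raise ValueError(f"duplicate monomer: {monomer.name}")
        self.monomers[monomer.name] = monomer
        return monomer

    def add_rule(self, rule):
        # Reject rules that mention monomers the model does not declare.
        for participant in rule.participants:
            if participant.name not in self.monomers:
                raise ValueError(f"undeclared monomer: {participant.name}")
        self.rules.append(rule)
        return rule


def build_binding_model():
    """Assemble a toy ligand-receptor model from reusable objects."""
    model = Model("ligand_receptor")
    ligand = model.add_monomer(Monomer("L", ("r",)))
    receptor = model.add_monomer(Monomer("R", ("l", "y")))
    model.add_rule(Rule("L_binds_R", (ligand, receptor), rate=1e-3))
    return model


if __name__ == "__main__":
    model = build_binding_model()
    print(f"{model.name}: {len(model.monomers)} monomers, {len(model.rules)} rule(s)")
```

The point of the sketch is the design choice rather than the particular classes: because the components are ordinary objects, a model can check its own consistency as it is built, and pieces such as `build_binding_model` can be shared, reused and extended in the same way as any other code.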

Speaking as somebody who has been personally involved in the evolution of these biological modeling approaches, I am very happy and encouraged to see this kind of approach being pursued. If modeling is ever to be a part of mainstream research in the life sciences, in the way that it is in physics and engineering, the tools that we develop for it will need to be more flexible, accessible and robust than they are now. PySB represents, in my opinion, an important and valuable step in this direction.

You can read about the development of PySB in this article that was published recently in Molecular Systems Biology by the collaborating teams.

  © The Digital Biologist | All Rights Reserved