If you're a biological modeler, chances are that two words keep you up at night and have, on occasion, even given you serious pause to question the wisdom of your choice of profession.
Those two words are combinatorial complexity.
For anyone not entirely familiar with this concept, imagine one of the simplest possible biomolecular systems: a kinase K that can bind to and phosphorylate a substrate S at either of two positions, a and b, as shown in this first diagram. S has 4 phosphorylation states (unphosphorylated, phosphorylated at a only, at b only, or at both), so even this very simple system can produce 13 possible molecular species: K unbound (1), S unbound in any of its 4 states (4), K bound at a with S in any of its 4 states (4), and K bound at b with S in any of its 4 states (4).
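The count above can be checked by brute-force enumeration. The sketch below is a minimal illustration that assumes, as in the diagram, that at most one K binds a given S and that each site is either unphosphorylated (`u`) or phosphorylated (`p`). The function name `enumerate_species` and the tuple encoding of species are hypothetical choices made only for this example. Allowing n sites instead of 2 gives 1 + (n + 1)·2^n species, so the count roughly doubles with every site added.

```python
from itertools import product


def enumerate_species(n_sites=2):
    """Enumerate every molecular species in the toy kinase/substrate system.

    Assumptions (for illustration only): one kinase K, one substrate S with
    `n_sites` phosphorylation sites, and at most one K bound to a given S.
    """
    # Every phosphorylation state of S: 'u' = unphosphorylated, 'p' = phosphorylated.
    phospho_states = list(product("up", repeat=n_sites))

    species = [("K",)]  # K unbound
    # S unbound, in each of its phosphorylation states.
    species += [("S", state) for state in phospho_states]
    # K bound at one site, with S in each of its phosphorylation states.
    species += [("K:S", site, state)
                for site in range(n_sites)
                for state in phospho_states]
    return species


if __name__ == "__main__":
    print(len(enumerate_species()))  # 13 for the two-site system
    for n in range(1, 6):
        # Closed form: 1 (free K) + 2**n (free S) + n * 2**n (K:S complexes).
        print(n, len(enumerate_species(n)), 1 + (n + 1) * 2 ** n)
```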
Taking the traditional biological modeling approach of using ordinary differential equations (ODEs), you would therefore have to write 13 rate equations to describe this system. So far so good.
But now let's add the phosphatase P that dephosphorylates the sites a and b on S. If we do a similar analysis of the possible molecular species for our new system, now taking into account the possible bound and unbound states of both P and K on S, we discover that adding this single agent P yields 21 new molecular species on top of the 13 that we already had! Furthermore, because the ODEs in our model are interdependent, we will also need to rewrite our original set of rate equations.
If it takes all this work to describe what is almost the simplest imaginable kind of system, how many equations would we need for a real biological system? How many rate equations would we need to describe the canonical epidermal growth factor receptor (EGFR) signaling pathway, for example?
Hold on to your hats ... drum roll ... somewhere north of 10^30 equations.
Yikes!
All that said, however, biological modelers have built models of complex cellular systems like the EGFR pathway, so how on earth have they done it? The answer is by simplifying the system, typically either by ignoring features that are presumed to have minimal impact on the system's behavior, or by aggregating features into a less granular description of the system, again on the presumption that this will not significantly affect the model's behavior.
The danger inherent in such approaches is that they require a set of a priori hypotheses about which features of the system are and are not important, i.e. they require deciding which aspects of the model will least affect its behavior before the model has ever been run.
The famous Uncertainty Principle that we all learned in high school physics states that it is impossible to determine both the position and the momentum of an electron simultaneously with arbitrary precision. A recasting of this principle for traditional biological modeling might be:
"Scope or resolution, but not both at the same time"
One could argue that whereas the original physical principle is absolute, in biological modeling the limitation is one of technology: "If we had a big enough, fast enough computer ..." etc. Perhaps, but when you compare the storage and processing time required to solve a system of 10^30 equations with the scale of our universe, the biological modeling version of the principle seems pretty darn absolute to me.
Did I hear someone say "quantum computer"?
Just let me know when they've built one that could address this problem and I will gladly publish an update :-)
The author, Gordon Webster, has spent his career working at the intersection of biology and computation and specializes in computational approaches to life science research and development.
© The Digital Biologist | All Rights Reserved
I agree that combinatorial complexity is a serious problem when modeling biological signal transduction pathways, at least if you use ODEs as the modeling framework. Rule-based models are an excellent way to circumvent combinatorial complexity.
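To make the rule-based idea concrete, here is a minimal sketch of rule-based network generation for the kinase/substrate toy system from the post. It assumes three hypothetical rule templates (K binds any site, K unbinds, bound K phosphorylates the site it occupies), each of which ignores the state of every other site, plus the assumption that at most one K binds a given S. This is a toy written for illustration, not the syntax or algorithm of any particular rule-based tool: it simply applies the rules until no new species appear. The point is that the rule set stays the same size as sites are added, while the species list it generates grows exponentially (13 species for two sites, 33 for three).

```python
def apply_rules(species, n_sites):
    """Apply the three hypothetical rule templates to one species.

    Species encoding: ("K",) is free kinase, ("S", phos) is free substrate,
    ("K:S", phos, site) is kinase bound at `site`; `phos` is a tuple of bools.
    Each rule tests only the site it acts on, never the other sites.
    """
    products = []
    if species[0] == "S":
        phos = species[1]
        # Rule 1 (binding): K + S -> K:S at any site. Free K is assumed present.
        products += [("K:S", phos, site) for site in range(n_sites)]
    elif species[0] == "K:S":
        _, phos, site = species
        # Rule 2 (unbinding): K:S -> K + S; free K is already in the network.
        products.append(("S", phos))
        # Rule 3 (phosphorylation): bound K phosphorylates the site it occupies.
        if not phos[site]:
            products.append(("K:S", phos[:site] + (True,) + phos[site + 1:], site))
    return products


def generate_network(n_sites=2):
    """Expand the species set from the seed species until nothing new appears."""
    seeds = [("K",), ("S", (False,) * n_sites)]
    species = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for sp in frontier:
            for new_sp in apply_rules(sp, n_sites):
                if new_sp not in species:
                    species.add(new_sp)
                    next_frontier.append(new_sp)
        frontier = next_frontier
    return species


if __name__ == "__main__":
    for n in (2, 3, 4):
        # Same 3 rule templates every time: 13, 33 and 81 species.
        print(n, len(generate_network(n)))
```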
However, I would like to add another two words that form an even bigger challenge for biological modelers: parameter values. The general lack of experimentally determined (in vivo) quantitative parameter values is a serious problem when simulating and analyzing the dynamics of biological models, irrespective of the modeling framework (deterministic, stochastic, hybrid) and the biological process (signaling, metabolism, etc.). In fact, the network stoichiometry, rate law definitions and parameter values are the 3 pillars that define the dynamics of a biological model. In practice, not all parameters can be determined experimentally, certainly not in vivo. Modelers are therefore forced to make an educated guess about one or more parameters or to apply computationally intensive brute-force methods.
Although combinatorial complexity is a serious problem in signaling pathways, I personally believe that (the lack of) parameter values is a more general challenge when building and understanding biological models.
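As an illustration of the brute-force route, the sketch below estimates a single unknown rate constant by grid search. Everything in it is hypothetical: a first-order dephosphorylation step dx/dt = -k·x (solved in closed form), synthetic noise-free "data" generated with k = 0.3, and a grid of 100 candidate values. With m unknown parameters and g grid points each, the same approach needs g^m simulations, which is why brute force quickly becomes computationally intensive.

```python
import math


def simulate(k, times, x0=1.0):
    """Closed-form solution of dx/dt = -k * x (a hypothetical first-order step)."""
    return [x0 * math.exp(-k * t) for t in times]


def grid_search(times, observed, k_grid):
    """Return the grid value of k minimizing the sum of squared errors (SSE)."""
    best_k, best_sse = None, math.inf
    for k in k_grid:
        predicted = simulate(k, times)
        sse = sum((p - o) ** 2 for p, o in zip(predicted, observed))
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k, best_sse


if __name__ == "__main__":
    times = [0, 1, 2, 4, 8]
    observed = simulate(0.3, times)            # synthetic, noise-free "measurements"
    k_grid = [i / 100 for i in range(1, 101)]  # candidate k from 0.01 to 1.00
    best_k, best_sse = grid_search(times, observed, k_grid)
    print(best_k, best_sse)  # recovers k = 0.3 with zero error
```

With real, noisy measurements the minimum SSE would not be zero, and the grid resolution would limit how precisely k can be recovered.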
Posted by: Mark | Sunday, February 12, 2012 at 08:34 AM
I agree with the above comment. I'm new to modelling biological systems and have already experienced building models that grow exponentially in size with control loops. The models reach critical mass and then end in frustration, with no parameter values for any of the digital mess.
For me, this has happened when there's a weak underlying question behind the model, or none at all, other than "I don't know how the whole system works and I'm hoping my computer will tell me."
The truly elegant models are those such as the Hodgkin-Huxley neuron model, which condense the system down to the fundamental underlying question and then drop out a single equation or two that replicates the whole thing.
This, however, is the true mastery of the art!
Personally, I think we are seeing a failing in hypothesis driven science in favor of technology driven science.
I believe the most important question a modeller can ask is "what is the question?"
Posted by: Steve | Monday, May 28, 2012 at 05:28 PM