If you’re a biological modeler, chances are there are two words that keep you up at night and that, on occasion, might even have made you seriously question the wisdom of your choice of profession.
Those two words are combinatorial complexity.
For anyone not entirely familiar with this concept, imagine one of the simplest possible biomolecular systems, with a kinase K that can bind to and phosphorylate a substrate S at either of two positions a and b, as shown in this first diagram. Even this very simple system can produce 13 possible molecular species: K unbound, S unbound in one of its 4 phosphorylation states, K bound at a with S in one of its 4 states, K bound at b with S in one of its 4 states.
Taking the traditional biological modeling approach of using ordinary differential equations (ODEs), you would therefore have to write 13 rate equations to describe this system. So far so good.
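For readers who like to check the arithmetic, here is a minimal sketch that enumerates those species by brute force. It is an illustration only, not part of any modeling package: the function name and the string labels are made up for this example, and it assumes (as the count above does) that at most one K is bound to a given S at any time.

```python
from itertools import product

def kinase_substrate_species(sites=("a", "b")):
    """Enumerate species for a kinase K and a substrate S with the given sites.

    Assumption (matching the count in the text): at most one K is bound
    to any one S molecule at a time.
    """
    species = ["K"]  # the free kinase
    # Each site is either unphosphorylated (0) or phosphorylated (1).
    for phos in product((0, 1), repeat=len(sites)):
        state = ",".join(
            f"{site}~{'p' if p else 'u'}" for site, p in zip(sites, phos)
        )
        species.append(f"S({state})")  # free substrate in this state
        for bound_site in sites:
            # K bound at one site, with S in this phosphorylation state
            species.append(f"K@{bound_site}.S({state})")
    return species

species = kinase_substrate_species()
print(len(species))  # 13: one rate equation per species in the ODE model
```

Passing a longer tuple of (hypothetical) site names shows how quickly the list grows: under the same assumption, the count is 1 + 2^n × (n + 1) for n sites.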
But now let’s add a phosphatase P that dephosphorylates the sites a and b on S. If we do a similar analysis of the possible molecular species for our new system, now taking into account the possible bound and unbound states of both P and K on S, we discover that the addition of this single agent P yields 21 new molecular species in addition to the 13 that we already had! Furthermore, since we are working with a model of interdependent ODEs, the species we already had now take part in new reactions, so we will also need to rewrite our original set of rate equations.
If it takes all this work to describe what is almost the simplest imaginable kind of system, how many equations would we need for a real biological system? How many rate equations would we need to describe, for example, the canonical epidermal growth factor receptor (EGFR) signaling pathway?
Hold on to your hats … drum roll … somewhere north of 10^30 equations.
All this said, however, biological modelers have built models of complex cellular systems like the EGFR pathway, so how on earth have they done it? The answer is by simplifying the system, typically either by ignoring features that are presumed to have minimal impact on the system’s behavior, or by aggregating features to create a less granular description of the system, again under the presumption that this will not significantly affect the model’s behavior.
The danger inherent in such approaches is that they require a set of a priori hypotheses about which features of the system are important and which are not. In other words, they require a decision about which aspects of the model will least affect its behavior before the model has ever been run.
The famous Uncertainty Principle that we all learned in high school physics states that it is impossible to simultaneously determine, with arbitrary precision, both the position and the momentum of a particle such as an electron. A recasting of this principle for traditional biological modeling might be:
“Scope or resolution, but not both at the same time”
One could argue that whereas the original physical principle is absolute, in the case of biological modeling the limitation is one of technology – “If we had a big enough, fast enough computer …” etc. Perhaps, but when you compare the storage and processing time required to solve a system of 10^30 equations with the scale of our universe, the biological modeling version of the principle seems pretty darn absolute to me.
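To put some very rough numbers on that, here is a back-of-the-envelope sketch. Everything other than the equation count, taken here as 10^30 (ten to the thirtieth power), is an assumption made purely for illustration: one 64-bit number stored per species, a hypothetical machine performing 10^18 operations per second, and an absurdly generous single operation per equation per time step.

```python
N_EQUATIONS = 10**30        # equation count for the EGFR pathway, as quoted in the text
BYTES_PER_VALUE = 8         # assumption: one 64-bit float per species concentration
OPS_PER_SECOND = 10**18     # assumption: a hypothetical exascale machine
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Memory needed just to hold the current concentration of every species.
storage_bytes = N_EQUATIONS * BYTES_PER_VALUE
storage_zettabytes = storage_bytes / 10**21

# Time to advance the model by ONE time step, at one operation per equation
# (a real ODE solver needs many operations per equation per step).
seconds_per_step = N_EQUATIONS / OPS_PER_SECOND
years_per_step = seconds_per_step / SECONDS_PER_YEAR

print(f"{storage_zettabytes:.0e} zettabytes to hold the state vector")
print(f"{years_per_step:,.0f} years per time step")
```

Under these assumptions, that is 8 billion zettabytes of memory and roughly 32,000 years of computing for a single time step, before we have simulated anything of interest.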
Did I hear someone say “quantum computer”?
Just let me know when they’ve built one that could address this problem and I will gladly publish an update 🙂
© The Digital Biologist