Tutorial: How to build an agent-based model of a cell signaling network


If you're a life science researcher studying a complex biological system such as a cell signaling network, either as part of an academic research project, or with a view to the commercial development of a new therapeutic or diagnostic platform - there are enormous benefits to having a dynamic model of the system that you are studying as an adjunct to your laboratory-based R&D program. In addition to the most commonly recognized ability of models to make predictions, they can also serve as invaluable intellectual frameworks within which you can organize your data and reason about them. Furthermore, models are excellent vehicles for the communication of ideas and for the development and testing of new hypotheses.

In this tutorial, we will demonstrate the approach that we like to use to build models of cell signaling networks for the clients of our digital biology consulting firm. In order to demonstrate the basic principles of agent-based modeling while keeping this tutorial concise, we will model one of the simplest but widely occurring cell signaling motifs in biology - the Goldbeter-Koshland loop.

We have discussed at length on The Digital Biologist, the limitations of the widely-used deterministic modeling approaches such as ordinary differential equations, in their application to biology, - so here we will instead develop an example of the kind of agent-based, stochastic model that we use in the life science consulting work that we do for our clients. A few years back, I was fortunate enough to be hired by a Cambridge-based startup funded by two West venture capital firms, to participate in the development of a collaborative cloud-based biological modeling platform. Underpinning this modeling platform was a language called Kappa that is in effect, a programming language for biology. Via the semantic formalism expressed in a very simple set of rules that include synthesis, degradation, binding, unbinding, and modification (as shown in the diagram), Kappa is able to represent the processes that occur in most biological systems, at a very granular level. We won't have time to include a tutorial on Kappa's syntax here, but if you take a look at the diagram below and browse some of the Kappa documentation, you can get a sense of how it works. We will also discuss the Kappa syntax a little when we get to the details of actually building the model. The main thing to keep in mind is that every Kappa rule expresses a reaction that has a left and right-hand side, and describes a unidirectional or reversible transformation in which one or more of these synthesis, degradation, binding, unbinding, and modification processes occur. There are other, similar biological modeling languages you could also choose from, such as the BioNetGen language (which has a very similar syntax to Kappa), or the Systems Biology Markup Language (SBML), but for the purposes of this tutorial, we will use Kappa since it's the biological modeling language that we know best, as well as being the one that we use for our consulting projects. Whichever biological modeling language or platform you choose, the core principles and techniques of agent-based modeling demonstrated in this tutorial, will be largely the same, with the differences being for the most part, syntactical.


In this example, we are going to use Kappa to build a dynamic model of our cell signaling system - in other words, a simulation of the behavior of the modeled system over time. This is just one of the ways that agent-based modeling languages like Kappa can be used. In effect, we are creating a "game" in which the "players" (agents) and the rules that govern how they interact, are defined ahead of time. Once we have populated the playing field with the players that we want (in their initial states, which we can also define), we blow the whistle and watch the game unfold. This is the dynamic aspect of agent-based modeling, but there is also a static analysis aspect in which certain very important and useful properties of the system being modeled, can be computed directly from the structure of the model itself, as defined by its rules. The semantic formalism of languages like Kappa allows the modeler to do the kind of static analysis that computer language compilers perform, to get ready analytical answers to questions like "Is state x possible in this system?", without recourse to simulation. When building a model of a biological system like the EGFR signaling pathway for example, in which it is possible to generate upwards of 10^30 unique states, this question is decidedly non-trivial! You might run an awful lot of simulations of your modeled system and still never observe the state that you are interested in - leading you to wonder whether that state is unreachable in the model you have constructed, or that you simply have not adequately sampled the model's astronomically vast state space in your simulations so far. So while we will be focusing upon building and running a dynamic model in this article, it is worth keeping in mind that this is only half the story when it comes to building models with formal languages like Kappa - the static analysis component being an extremely powerful and in many ways, very complementary approach to the use of dynamic simulations.

Before we move on to the actual biology, it's worth spending a few minutes to discuss the algorithm we're going to use for the dynamic simulation of our cell-signaling network of Kappa agents and rules. The Gillespie algorithm, generates a stochastic, but kinetically correct trajectory for the modeled system via its consideration of event probabilities within the model. Based upon a set of initial conditions in which the quantities and states of all agents are defined at the outset, the likelihood for each possible event in the system is computed based upon the quantity of current states in the system that match the starting condition for the rule that describes that particular event, as well as the kinetic rate applied to the rule itself. The total number of possible events that can occur per unit time (determined by the units of time in the supplied kinetic rates) is then used to select a time interval from a normal distribution whose mean is the expected time interval for the next event. Part of the beauty of the Gillespie algorithm is the fact that the simulation time step is not fixed, but rather, it is based upon the current event probability distribution - for example, the simulation does not waste lots of computational resources computing event probabilities for time intervals that are too short for there to be any significant probability of an event occurring. Once the event probability distribution and the next time step interval have been calculated, a random event is selected based upon the the event probability distribution. This means that it is not necessarily the most likely event that will be selected, and it is in this weighted, random selection of the next event and its elapsed time interval, that we see the stochastic nature of the Gillespie simulation. This means that the algorithm, while yielding a kinetically correct trajectory for the modeled system, will also capture its stochastic behaviors including noise. This can be extremely important in the modeling of biological systems, since this kind of stochastic simulation can reproduce for example, the noise seen in gene expression and the resulting heterogeneity observed in populations of identical cells. Once this weighted random event and time interval have been selected, the event is executed, the states in the model are updated to reflect the changes generated by the event, and the simulation clock is advanced by the selected time interval. Then the whole cycle is repeated. This description of the Gillespie algorithm is a very brief overview of the approach, but the best way to really understand how it works, is to implement it yourself in a programming language like Python. This is such a valuable exercise for understanding stochastic simulation methods, that we actually teach our students how to implement the Gillespie algorithm in Python, in our Python For Life Scientists workshop course that we teach on-site for our clients.

Now that we've discussed our modeling approach and the language that we're going to use, let's take a look at the cell signaling network we're going to build. In order to make the model complex enough to reproduce some interesting real-world behaviors of cell signaling networks, but not so large that the complexity of the model overwhelms the clarity of the tutorial, we are going to model a very simple but biologically important cell signaling motif, that is widely seen in cell signaling networks.

The Goldbeter-Koshland (GK) loop is a very common motif in cell signaling pathways. The GK loop is a system whose kinetics allow it to switch between dramatically different equilibrium states, with only a very small change in its input conditions. The classic example of this is the cell signaling module shown in the diagram, in which a kinase and a phosphatase work in tandem to phosphorylate and dephosphorylate a substrate respectively. When the activities of the kinase and phosphatase are roughly equal, the equilibrium state of this system is a roughly even population mix of phosphorylated and unphosphorylated substrate. However, a very slight increase in the relative activity of either the kinase or the phosphatase, can switch the equilibrium state of the system to a population of predominantly phosphorylated or unphosphorylated substrate, depending upon whether it is the relative activity of the kinase or the phosphatase that is incrementally increased. The GK loop is in essence a kind of biological amplifier that allows the cell to generate a large response to a very small stimulus. To quote the authors of the original article:

"This amplification of the response to a stimulus can provide additional sensitivity in biological control, equivalent to that of allosteric proteins with high Hill coefficients".

GK loops can even be "stacked" in layers in a cell signaling network, wherein the phosphororylated substrate of one layer acts as the regulator for the kinase and/or the phosphatase of the next. The resulting cascade of GK loops can provide the cell with a kind of "photomultiplier" functionality in which even the tiniest of initial stimuli can be amplified to generate a huge response whose signal strength is many orders of magnitude larger than the initial stimulus.

And so to the model itself ...

Perhaps the most obvious feature of an agent-based model are the agents or "players" in the simulation. We will use three types of agents in our model - a kinase (to phosphorylate the substrate), a phosphatase (to dephosphorylate the substrate) and of course, the substrate itself. Kappa models require us to furnish a specification of each agent type in our model, so here's the Kappa code that describes the three agent types.

/* agents */
%agent: K(x)
%agent: P(x)
%agent: S(x{u, p})

All agents in Kappa models have sites (a list of which are enclosed in parentheses after the agent's name) and sites can have states (expressed in curly brackets). Here we have three agent descriptions, K (kinase) with a site x, P (phosphatase) with a site x and S (substrate) also with a site x. Site names must be unique on each agent but different agents can use the same site name. Later on we will see that all binding and unbinding interactions between agents also occur through their sites which can carry flags that denote their binding to other sites (usually, although not always, on other agents). We have specified that the site x on the substrate can also have two states u and p. These states are arbitrary labels that can mean whatever you want them to, but we have chosen u and p to represent the unphosphorylated and phosphorylated states respectively.

Another feature of Kappa is that like any good programming language, it allows us to define numerical variables that we can use in our code, instead of having to enter the numerical value everywhere we need it. We will use these variables to define the molar concentrations of our various agents in the cell like this:

/* concentrations */
%var: 'conc_K'                1.0e-6
%var: 'conc_P'                1.0e-6
%var: 'conc_S'                1.0e-5
%var: 'fraction_K'            1.0

The values used here are typical protein concentrations for a cell, and you will notice that there is a 10-fold excess of substrate over the kinase and phosphatase. We have also defined a value fraction_k that we can use as a multiplier for the kinase concentration, giving us a convenient "lever" in the model with which to vary the concentration of the kinase up or down (more on this later).

It should be noted that although we have defined molar concentrations for our agents, we are going to be dealing with actual, countable quantities of agents in our stochastic simulation, so we will need to convert these values from molar concentrations to actual numbers of molecules in the cell. In order to do this, we will also need to consider the volume of the cell that we are going to model. Our simulation volume could be the entire cell, but this is unnecessary and in practice, to save time and computing resources, we typically model some fraction of a cell volume. A typical eukaryotic cell volume is about 2.5 x 10^12 L so we will create a variable to represent this and to represent the fractional cell volume that we wish to use for our simulation volume. 

In order to convert from molar concentrations to number of molecules, we need to use Avogadro's Number and the simulation volume like this:


where n is the number of molecules of the agent a in the simulation, [a] is the molar concentration of the agent, A is Avogadro's number and V is the simulation volume. The quantity AV is a very useful one to compute since as we'll see later on, we will also need it to convert molar rate constants to stochastic rate constants. Since we are expressing our simulation volume as some fraction of a cell volume, we can define the variable 'stochastic' in our model as a useful stochastic conversion factor for our chosen simulation volume, like this:

/* Conversions from concentration-based to stochastic parameters */
%var: 'avogadro' 6.022E23
%var: 'cell_volume' 2.25E-12
%var: 'cell_volume_fraction' 0.001
%var: 'stochastic' 'avogadro' * 'cell_volume' * 'cell_volume_fraction'

The other place in our model that we will need this stochastic conversion factor, is for the conversion of molar reaction rates to stochastic rates, so now we can also add these kinetic rates to our model as variables:

/* rates */
%var: 'kon_K_S'    1.0e5 / stochastic   // M-1 s-1  on-rate of kinase binding to substrate
%var: 'koff_K_S'   1.0e-3               // s-1      off-rate of kinase unbinding from substrate
%var: 'kon_P_Sp'   1.0e5 / stochastic   // M-1 s-1  on-rate of phosphatase binding to substrate
%var: 'koff_P_Sp'  1.0e-3               // s-1      off-rate of phosphatase unbinding from substrate
%var: 'kphos'      0.1                  // s-1      rate of kinase phosphorylation of bound substrate
%var: 'kdephos'    0.1                  // s-1      rate of phosphatase dephosphorylation of bound substrate
%var: 'koff_K_Sp'  1.0e-2               // s-1      off-rate of kinase unbinding from phosphorylated substrate
%var: 'koff_P_S'   1.0e-2               // s-1      off-rate of phosphatase unbinding from unphosphorylated substrate

The first thing you will probably notice is that we are dividing all of the molar rate constants by our stochastic conversion factor. The expression for the conversion from kinetic to stochastic rate constants is:


where γ is the stochastic rate constant, k is the kinetic rate constant, AV is our stochastic conversion factor, and m is the molecularity of the reaction (usually 1 (unimolecular) or 2 (bimolecular), since we do not typically need to represent higher order reactions in our models). You will notice that for a bimolecular reaction such as binding, the kinetic rate constant is divided by our stochastic conversion factor AV, but for unimolecular reactions such as unbinding, the expression simply evaluates to the kinetic rate constant. This is why in the model code, you will see that we are dividing the kinetic rates by the stochastic conversion factor for the bimolecular binding reactions, but not for the unimolecular unbinding or modification reactions.

So with this discussion of rates as a segue, let's get to the Kappa rules that will define the interactions and processes in our dynamic model, and to which these rates will be applied in the simulation.

/* rules */
'K S assoc' K(x[.]), S(x[.]{u}) <-> K(x[1]), S(x[1]{u}) @ kon_K_S, koff_K_S
'P S assoc' P(x[.]), S(x[.]{p}) <-> P(x[1]), S(x[1]{p}) @ kon_P_Sp, koff_P_Sp
'K S mod' K(x[1]), S(x{u}[1]) -> K(x[1]), S(x{p}[1]) @ kphos
'P S mod' P(x[1]), S(x{p}[1]) -> P(x[1]), S(x{u}[1]) @ kdephos
'K S dissoc' K(x[1]), S(x{p}[1]) -> K(x[.]), S(x{p}[.]) @ koff_K_Sp
'P S dissoc' P(x[1]), S(x{u}[1]) -> P(x[.]), S(x{u}[.]) @ koff_K_Sp

At the heart of the Kappa language are its rules that define the possible interactions and behaviors of the agents in the model. Kappa syntax requires each rule to have a name (in quotes), a left hand side agent pattern and a right hand side agent pattern separated by a single or double-headed arrow (depending upon whether the rule is reversible or not), a forward rate and optionally, a reverse rate (if the rule is reversible). The forward and reverse rates follow the '@' symbol at the end of the rule. If we look at the first rule in the list as an example, we can see that it defines the reversible binding of the kinase (K) and the unphosphorylated substrate (S). The name of this rule is 'K S assoc'. The left hand side agent pattern:

K(x[.]), S(x[.]{u})

consists of the kinase K with its site x unbound, as indicated by the period in the square brackets (the site x on K does not have any specified states) - and the substrate S, also with its site x unbound, and in the unphosphorylated (u) state. The right hand side agent pattern consists of the same pair of agents, but now bound at their x sites, as indicated by the index 1 that appears in both of the agents' binding descriptions for their respective x sites. 

K(x[1]), S(x[1]{u})

This index can be thought of as the unique label for the edge that would connect the two agents' sites in a graph diagram of the network of agent interactions. In effect, you can think of a binding rule as creating a new edge in the network graph. You will also notice that the unphosphorylated (u) state of the substrate x site remains unchanged. This rule specifies a reversible binding reaction in which the kinase and the unphosphorylated substrate are able to associate and dissociate according to the rates 'kon_K_S' and 'koff_K_S' respectively. The phosphorylation of the substrate by the bound kinase is handled separately in the modification rule 'K S mod'.

'K S mod' K(x[1]), S(x{u}[1]) -> K(x[1]), S(x{p}[1]) @ kphos

In the 'K S mod' rule, the left hand side agent pattern is actually the bound kinase/substrate complex that was formed as the right hand side of the first rule, and the single arrow tells us that this new rule is unidirectional - resulting in the modification in its right hand side agent pattern, of the state of the substrate x site from u to p.

One important fact to consider is that the first, reversible kinase/substrate binding rule that we looked at, does not allow for the dissociation of the kinase from the phosphorylated substrate, since it specifies an unphosphorylated state for the site x on the substrate S. We could have written this rule without the state of the site x specified, in which case it would have described the reversible association of the kinase with the substrate, whether or not the substrate was phosphorylated at x. This kind of scenario is referred to in the Kappa documentation as the "Don't write, don't care" principle. If a specific state or binding is not mentioned in a rule, then it will not be checked when searching for patterns of agents that match the rule's agent patterns. This would have the consequence that the kinase would bind both the phosphorylated and unphosphorylated substrate, which we do not want. Furthermore, choosing not to write this rule more generically with regard to the specification of the substrate site x,  requires us to write another rule to explicitly handle the dissociation of the kinase from the phosphorylated substrate - and this in turn allows us to use a different rate constant for the dissociation of the kinase when the substrate is phosphorylated. This is a good example of the kind of choices that often need to be made when building models, and in our example, we chose to include an extra rule to handle the particular case of the dissociation of the kinase when the substrate is phosphorylated. Incidentally, if we did not include an additional rule to handle the dissociation of the kinase from the phosphorylated substrate, all of the kinase would quickly become irreversibly bound to the phosphorylated substrate. This would result in the simulation getting 'stuck' since no phosphatase would be able to bind the phosphorylated substrate (binding at Kappa sites is exclusive) and no further substrate modification would be possible. With that in mind, here is the additional rule needed to handle the dissociation of the kinase from the phosphorylated substrate.

'K S dissoc' K(x[1]), S(x{p}[1]) -> K(x[.]), S(x{p}[.]) @ koff_K_Sp

This choice to use the extra rule both prevents the kinase from binding the phosphorylated substrate, and allows us to have the kinase dissociate more rapidly from the phosphorylated substrate. If you look at the full set of rules, you will see that we also handled the binding of the phosphatase and the substrate in an analogous manner.

The six rules shown above capture the behaviors of our dynamic system - the kinase and the phosphatase reversibly associating with and modifying the substrate. All we need now to be able to run the model, is a set of initial conditions that define the number of each type of agent and their starting states (whether they are bound, unbound, modified etc. ). Here are the model's initial conditions in Kappa syntax.

/* initial conditions */
%init: conc_K * fraction_K * stochastic K(x[.])
%init: conc_P * stochastic P(x[.])
%init: conc_S * stochastic S(x{u}[.])

Once again we are using our stochastic conversion factor to convert from the molar concentrations that we entered as variables, to absolute numbers of molecules. We are also using our 'fraction_K' variable to control the level of kinase. Currently it is set to 1.0, so that the kinase and phosphatase levels are equal, but later on we will vary it up or down in order to create a small excess or depletion in the level of kinase, in order to demonstrate the hypersensitivity of the response of the GK loop to the relative levels of kinase and phosphatase activity in the system. You can see from the code for the initial conditions, that we are going to start our simulation with all of the substrate in its unphosphorylated state.

One of the great advantages of agent-based models like this one, is that you can track the trajectories of every single agent in the system at the most granular level of detail. The Kappa language syntax includes directives for setting up specified observables that are agents, agent groups or complexes of agents that can be tracked throughout the simulation. These can be specified right down to the details of the binding status and states of their individual sites, allowing the modeler to track very closely what is happening over the course of the simulation. For the purposes of our GK model, we would obviously like to track the response of the GK system to the relative levels of kinase and phosphatase activity - in other words, the quantities of phosphorylated and unphosphorylated substrate that the system produces over time. To this end we will define two observables to track during the simulation, like this:

%obs: 'Unphosphorylated Substrate'  |S(x{u})|
%obs: 'Phosphorylated Substrate'    |S(x{p})|

Each observable is a named agent pattern enclosed in straight brackets to denote that we would like to track their respective quantities. You will note that we have not specified the binding status of the substrate site x in either case, which means that we don't care whether it is bound or unbound substrate that is being included in the statistics (you will recall Kappa's "don't write, don't care" approach).

So finally, let's run our GK model with the 'fraction_K' variable set to 1.0 so that the levels of kinase and phosphatase in the simulated system are equal, and see what we get.

The graph above shows the evolution over time, of the substrate (S) in its phosphorylated (orange) and unphosphorylated (blue) states. The horizontal axis represents time in a sense, but more correctly, it is actually the number of events that have occurred in the simulation. Since the time interval between events in the Gillespie algorithm is not constant, the x-axis is a kind of irregular time axis. Strictly speaking, the Gillespie algorithm does not have or need its own internal units for time, but rather, these are determined by the units of time for the stochastic rates that are supplied to the algorithm.

Starting with all of the substrate in its unphosphorylated state, you can see that the amount of phosphorylated substrate quickly rises as  a result of the kinase activity in the system. The phosphatase activity is obviously minimal until the point at which there starts to be some phosphorylated substrate for it to dephosphorylate. Eventually, with the combined activities of the kinase and the phosphatase, the system settles into an equilibrium state in which the population of substrate molecules is equally divided between the phosphorylated and unphosphorylated states. It's important to understand that in a stochastic model such as this, an equilibrium state is a dynamic state just as it is in the real biological system that it models. There is a tendency to think of equilibrium states as conditions under which there is no longer any change occurring within the system, and this is also the impression that you get when you see the flat equilibrium curves in deterministic models. While it is true that the system's mean state perturbations might be at or close to zero, you can see from the graph below that shows a close-up of the GK model's equilibrium state, that there is still a very dynamic exchange of the substrate between its phosphorylated and unphosphorylated states.


To demonstrate the hypersensitivity of the GK loop to the relative levels of kinase and phosphatase activity in the system, let's try slightly increasing the kinase level. We will use our convenient model "lever" - the variable 'fraction_K' that we defined earlier, incrementing its value from 1.0 to 1.1. This will raise the kinase level by 10% because if you recall, the variable 'fraction_K' appears in our definition of the kinase initial condition, like this:

%init: conc_K * fraction_K * stochastic K(x[.])

Let's run the simulation with 'fraction_K' at 1.1 and see what the equilibrium state of our GK system looks like now.

As a result of the very small increase in the level of kinase relative to the phosphatase, the GK system now behaves dramatically differently, settling into an equilibrium state in which the population of phosphorylated substrate substantially predominates. This dramatic response to such a small change in the initial conditions is quite remarkable and it provides the cell with a powerful mechanism to respond strongly to changes in its environment such as the depletion of nutrients, changes in the levels of signaling and messenger molecules, sensory stimuli or changes in the interactions with its neighboring cells.

Similarly, if we now try decreasing the kinase level by 10% by setting  'fraction_K' to 0.9 and running the simulation again, we see a similarly dramatic response in the other direction - in favor of the population of unphosphorylated substrate.

The GK loop is essentially acting as a kind of sensitive cellular switch, enabling the cell to flip efficiently and rapidly between states as it responds to its environment. To obtain such a strong response to changing conditions under standard Michaelis-Menten kinetics for normal enzymes, would require for example, something like a fluctuation of about two orders of magnitude in the concentration of the enzyme's ligand. Even an allosteric enzyme displaying a considerable degree of cooperativity in its ligand binding, would still typically require a several-fold change in its ligand concentration to effect such a substantial, differential response.

In the interests of constraining the length and complexity of this tutorial, we have demonstrated how to build an agent-based model of a tiny and very simple cell signaling network. In our consulting and previous work, we have successfully applied Kappa modeling to much larger and more complex cell signaling systems across a range of biological areas including the role of insulin in glucose homeostasis, dopamine signaling, the development of drugs to treat osteoarthritis, immune regulation, and the roles of Her-2 receptor overexpression and wnt signaling in cancer, to name but a few examples.

Computational modeling is not yet in the mainstream of life science research in the way that it is in other fields, but we believe that it is an invaluable adjunct to any life science R&D program. A very small cadre of biologists have applied traditional, mathematical modeling approaches borrowed from other disciplines, but biological modeling is still a relatively new and sparsely populated field. With the advent of biological modeling approaches like Kappa that are much better suited to the representation of the stochastic, heterogeneous and astronomically complex systems of interest to biologists, and which allow biologists to construct models in a much more familiar idiom than mathematical equations - it is our hope that computational modeling  will soon become as indispensable a part of the life scientist's research toolbox as it is for the physicist or the engineer.

If you have any questions about the GK model demonstrated here or would like the Kappa code for the model to try it out for yourself, please contact me.

© 2018, The Digital Biologist

#computationalbiology #systemsbiology #modeling #simulation #stochasticmodeling #deterministicmodeling #differentialequations #complexadaptivesystems #cellsignaling #cellsignalingpathway #cellsignalingnetwork #cellbiology #kappa