Python For The Life Sciences: Update

After about a year of work, and barring a few last-minute tweaks to the layout, it looks like we finally have our book Python For The Life Sciences ready to publish. UPDATE 10/6/2016: THE BOOK IS NOW PUBLISHED AND AVAILABLE AT LEANPUB

I think it’s fair to say that we got a little carried away with the project, and the book has ended up somewhat larger than we expected, coming in at over 300 pages. It is certainly not the slim, concise, quick introduction to biocomputing with Python that we envisaged at the outset.

The book does, however, cover an incredible range of life science research topics, from biochemistry and gene sequencing to molecular mechanics and agent-based models of complex systems. We hope there’s something in it for any life scientist with little or no computer programming experience who would love to learn to code.

For the latest news on the book, including all the free revisions and updates that are included with any purchase, sign up for the (zero spam) Python For The Life Sciences Mailing List.

© The Digital Biologist

Python For Handling 96-Well Plate Data and Automation

There’s hardly a life science lab you can walk into these days without seeing a ton of 96-well plates and the instruments that read and handle them. That’s why we’ve dedicated an entire chapter of our forthcoming book Python For The Life Sciences to the humble 96-well plate.

The chapter introduces the use of Python for handling laboratory assay plates of many different sizes and configurations. It shows the reader how to read plate assay data from files formatted as comma-separated values (CSV), how to implement basic row and column computations, how to plot multi-well plates with the wells color-coded by their properties, and even how to write the high-level code necessary for driving instruments and robots through devices like Arduinos.
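As a taste of what that looks like in practice, here is a minimal sketch (not code from the book; the data values, layout and function names are invented for illustration) that parses plate-reader CSV output into a well-indexed dictionary and computes per-row means:

```python
import csv
import io
from statistics import mean

# A toy stand-in for a plate reader's CSV export (real exports vary by
# instrument): one line per plate row A-H, twelve values per line.
# Only rows A and B are shown here to keep the example short.
PLATE_CSV = """\
0.12,0.15,0.11,0.14,0.13,0.12,0.16,0.15,0.13,0.12,0.14,0.11
0.45,0.48,0.44,0.47,0.46,0.45,0.49,0.48,0.46,0.45,0.47,0.44
"""

def read_plate(csv_text):
    """Parse plate-reader CSV into a {well: value} dict, e.g. {'A1': 0.12, ...}."""
    plate = {}
    for row_letter, fields in zip("ABCDEFGH", csv.reader(io.StringIO(csv_text))):
        for column, field in enumerate(fields, start=1):
            plate[f"{row_letter}{column}"] = float(field)
    return plate

def row_means(plate):
    """Mean value of each plate row that has data."""
    rows = {}
    for well, value in plate.items():
        rows.setdefault(well[0], []).append(value)  # well[0] is the row letter
    return {letter: mean(values) for letter, values in rows.items()}

plate = read_plate(PLATE_CSV)
print(row_means(plate))
```

From a structure like this, column statistics, normalization against control wells, or a color-coded plot of the plate are all only a few lines away.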

And this is just one of about 20 chapters designed to introduce the life scientist who wants to learn how to code to the wonderful and versatile Python programming language.

Almost all of the code and examples in the book are biology-based. In addition to teaching the Python programming language, the book aims to inspire the life scientist reader to bring the power of computation to his or her research, by demonstrating the application of Python to real-world examples from across a wide range of biological research disciplines.

The book includes code and examples covering next-generation sequencing, molecular modeling, biomarkers, systems biology, chemical kinetics, population dynamics, evolution and much more.

Python For The Life Sciences should be available as an eBook this fall (2016), so if you’re a life scientist interested in bringing a computational skill set to your research and your career, visit the book’s web page and sign up for our (no spam) mailing list for updates about the book’s progress and publication.


The Central Role Of User Experience Design In Scientific Modeling

Data is not knowledge.

Data can reveal relationships between events – correlations that may or may not be causal in nature; but by itself, data explains nothing without some form of conceptual model through which it can be assimilated into an intellectual framework that allows one to reason about it.

Computational modeling is not in the mainstream of life science research in the way that it is in other fields such as physics and engineering. And while all scientific concepts are implicitly models, most biologists have had relatively little experience of the kind of explicit modeling that we’re talking about here. In fields like biology where exposure to computational models is more limited, there is a tendency to consider their utility largely in terms of their ability to make predictions – but what often gets overlooked is the fact that models also facilitate the communication and discussion of concepts by serving as cognitive frameworks for understanding them.

Next to the challenge of representing the sheer complexity of biological systems, this cognitive element of modeling may be the single biggest reason why modeling is not in the mainstream of the life sciences. Most biological models use idioms borrowed from other fields such as physics, where modeling is both more mature, and firmly in the mainstream of research.

For a model to be truly useful and meaningful in a particular field of intellectual activity, it needs to support the conceptual idioms by which ideas and knowledge are shared by those in the field.

In other words, it should be possible to put questions to the model that are couched in the conceptual idiom of the field, and to receive similarly structured answers. To the extent that this is not true of a model, there will be some degree of cognitive disconnect between the model and the user which will impede the meaningful interaction of the user with the model.

Nowhere can this be more clearly seen than in the field of software design. Software applications make extensive use of cognitive models in order to facilitate a meaningful and intuitive interaction with the user. As a very simple example – software that plays digital music reproduces the play, forward and reverse buttons that were common on physical media devices like cassette and VHS players. This is because almost everybody has the same expectations about how these interface components are to be used, based upon their prior experience with these devices. As an aside, it’s interesting to reflect on the fact that while the younger generation may see these interaction motifs everywhere in the user interfaces of software media players, many of them will never have seen the original devices whose mechanical interfaces inspired their design.


The psychology and design that determine our interactions with the objects and devices we use are such an important area of study that they have given rise to an entire field, commonly referred to as User Experience (UX) or User Experience Design. UX lies at the intersection of psychology, design and engineering, and is concerned with the way that humans interact with everything in the physical world, from a sliding door to the instrument panel of an airliner – and of course, with their analogs in the virtual world: web browsers, electronic books, photo editing software, online shopping carts and so on.

Affordances and signifiers are the currency of UX design, facilitating the interaction between the user and the object or software. If you consider an affordance as a means of interaction (like the handle on a door, for example), signifiers are signs that suggest to the user how the affordances might work. To use our very simple door example – a handle that consists of a flat metal plate on the door suggests that the door should be pushed open, while a handle consisting of a metal loop more strongly suggests that it should be pulled open. This is just a superficial and simple illustration of the kind of cognitive facilitation that effective UX design can support. By contrast, consider the role that UX design plays in highly complex, human-built systems whose interactions with the user are predicated on multiple and often interdependent conceptual models, each of enormous complexity in its own right. In some cases, a single erroneous interaction with such a system might even destroy the system and/or lead to the loss of human life.

So what does all of this have to do with scientific modeling?

By facilitating a cognitive connection between the user and an object, a device or a piece of software, effective UX design makes the interaction easier, more intuitive and more meaningful. Insofar as a computational model is being used to develop a conceptual framework that explains data, effective UX design similarly facilitates the cognitive leap from data to knowledge.

To be very clear, what we’re discussing here is user experience writ large. It encompasses considerations of the user experience design for any software that a researcher might be using to implement a model, but also a great deal more besides. The conceptual model being used to describe a biological system has a user experience component in and of itself that, when it works, provides a cognitive handle by which the system being modeled can be understood.

In a non-computational approach to understanding the system, for example, this might be manifest in something as simple as the ability to draw an explicative diagram of the system on a piece of paper. In biology, think of the kind of pathway diagrams that biologists often draw to explain cell signaling (there’s even one in this article). In physics, the Feynman diagram used to intuitively describe the behavior of subatomic particles is a perfect example of brilliant user experience design that provides a cognitive handle on a complex conceptual model.

Where the conceptual model is implemented on a computational platform, then, the user experience design of the model and that of the software will inevitably overlap, often inextricably, to the extent that the conceptual model can be mapped onto the software.

As we have already seen, a very common theme in the user experience design of software is the replication of components of the physical world that create an intuitive and familiar framework for the user – think for example of the near universal adoption of conventions like files and folders in computer file-handling systems, borrowed directly from office environments that pre-date the use of computers. Such an approach can be a very useful tool for enhancing the user experience.

As the VP of Biology at a venture-funded software startup building a collaborative, cloud-computing platform to model complex biological pathways, a major part of my role in the company was to serve as the product manager for the software. In practice, this actually comprised two roles. The first was an internal role as the interface between the company’s biology team tasked with developing the applications for our product, and our software engineering team who were tasked with building the product. The second was an external-facing role as a product evangelist and the liaison between our company and the life science research community – the potential client base for whom we were building our product.

One component of our cloud-computing platform was an agent-based simulation module for modeling cell signaling pathways. The ‘players’ in these simulations were, as you would expect, mostly proteins involved in cell signaling pathways – kinases, phosphatases and so on – and any kind of phosphoprotein whose cellular activity is typically modulated by the post-translational modification events that proteins like kinases and phosphatases mediate.
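To give a rough flavor of what an agent-based simulation of this kind does (the sketch below is purely illustrative and is not our platform’s code; the probabilities, population size and time step are invented), imagine each substrate molecule as an agent whose phosphorylation state a kinase and a phosphatase can flip with some probability per time step:

```python
import random

random.seed(42)  # fixed seed so the stochastic run is reproducible

# Illustrative per-time-step probabilities; a real simulator would derive
# these from measured rate constants and concentrations.
P_PHOSPHORYLATE = 0.10    # chance a kinase phosphorylates an unmodified substrate
P_DEPHOSPHORYLATE = 0.05  # chance a phosphatase removes the phosphate again

def simulate(n_substrate=100, n_steps=50):
    """Track the count of phosphorylated substrate molecules at each time step."""
    phosphorylated = [False] * n_substrate  # one modification state per agent
    trace = []
    for _ in range(n_steps):
        for i in range(n_substrate):
            if not phosphorylated[i] and random.random() < P_PHOSPHORYLATE:
                phosphorylated[i] = True
            elif phosphorylated[i] and random.random() < P_DEPHOSPHORYLATE:
                phosphorylated[i] = False
        trace.append(sum(phosphorylated))
    return trace

trace = simulate()
print(trace)  # one phospho-substrate count per time step
```

A trace like this, one per molecular species, is exactly the kind of time course that the visualizations described below were built to display.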

As a simulation proceeded on the cloud, it could be tracked by the user through a range of different visualizations in their web browser. One of these displayed the concentrations of the different molecular species present in the simulation, over time. This was initially presented as a graph like this:

[Figure: graph of the concentrations of the molecular species in the simulation over time]

But if you think about the way that a biologist in the laboratory would do this experiment, this presentation of the results, while information-rich, would not be what he or she was used to. The analogous lab experiment would probably involve sampling the reaction mixture at regular intervals and, for example, running these aliquots as a time series on a gel to visualize their fluctuations over the course of the experiment.

My initial proposal that we add a visual element to the graph that reproduced what the biologist would see if they were to run the reaction mixture from a particular time point on a gel, was met with some degree of skepticism from the software engineers.

To be fair, it has to be said at this point that any good software engineering team (developers, business analysts, product managers and so on) always will – and should – set a high bar for the approval of new features, especially where their implementation carries any significant cost in time, money or resources. We were fortunate in our company to have just such an excellent software engineering team, so their initial resistance to this idea was not wholly unexpected. The main argument against it was that it would not be an information-rich visual presentation of the simulation results in the way that the graph already was and, furthermore, that it was redundant, since the same information was already presented at much higher resolution in the graph.

When, however, in my capacity as external liaison with our potential client base, I tested the response of the life science research community to a mock-up of this feature, the results were amazingly positive.

[Figure: mock-up of the simulation interface with the simulated Western blot display]

We asked biologists who agreed to be interviewed to compare the version of the simulation interface that contained only the graph with a mock-up of an updated version (shown above) that also contained a simulated Western blot display, with a time slider that could be moved across the graph to show what the Western blot gel would look like at each sampled time point.

Their responses were striking. What we heard most often (and I’m aggregating and paraphrasing the majority response here) was that the version of the interface with the Western blot display made a great deal more sense to them, because it helped them to make the mental leap between the data being output from the model and what the model was actually telling them. Perhaps most importantly – in their minds it also reinforced the idea of the computational simulation as a virtual experiment whose results could help guide their decisions about which physical experiments to do in the lab.

Although the new visualization was not information-rich, as the software engineers had rightly pointed out, its ability to frame the output of the simulation model in an idiom meaningful to the biologist created a richer and deeper cognitive connection between the biologist-modeler and the biology being represented and explored in the model.

Recognizing that modeling will only ever really become part of the mainstream of life science research, as it is in physics, if it is done in an idiom appropriate to biology, we took that idea very seriously. It permeated every aspect of the development of our collaborative computational modeling platform, especially since our own product and market research made clear that biologists were no more willing to become mathematicians or computer scientists in order to use models in their research than people are willing to become mechanics in order to drive cars.

Take a look, for example, at this cartoon a biologist drew of a cell signaling pathway (thanks Russ). It illustrates perfectly the paradigm of an interconnected network of signaling proteins that is, in essence, the biology community’s consensus model for how cell signaling works. At some level, it matters little that we cannot consider this a realistic physical model of cell signaling, since it implies the existence of static ‘biological circuits’ that do not really exist in the cell. In using this model, biologists are not suggesting that at all. The model does a very good job of representing, conceptually, the network of interactions that determines the functional properties of a cell signaling pathway.

There are some obvious intuitive benefits to this model (and many more very subtle ones). For example, if we were to try to trace the network edges from one protein (node) to another and discovered that they were not connected by any of the other proteins, we could infer that none of the states available to the first protein, could ever have an influence on the states of the second.
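That inference is simple graph reachability. A toy version in Python (the pathway below is a hypothetical wiring chosen for illustration, not the cartoon’s actual network) makes the point:

```python
from collections import deque

# A toy signaling network as an adjacency list. Edges are undirected here
# for simplicity; real influence maps are usually directed.
PATHWAY = {
    "Raf": {"Mek"},
    "Mek": {"Raf", "Erk"},
    "Erk": {"Mek"},
    "Jak": {"Stat"},
    "Stat": {"Jak"},
}

def connected(network, a, b):
    """Breadth-first search: can a chain of interactions link protein a to b?"""
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return True
        for neighbor in network.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False

print(connected(PATHWAY, "Raf", "Erk"))   # True: linked through Mek
print(connected(PATHWAY, "Raf", "Stat"))  # False: no chain of interactions
```

If two proteins sit in disconnected components of the network, no state of one can ever influence the other – the inference described above, made mechanical.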

Here, for comparison, is the analogous representation of that same cell signaling pathway, assembled on our cloud computing platform using a set of lexical rules that describe each of the ‘players’ and their interactions. Even the underlying semantic formalism that we used as a kind of biological assembly language to represent the players (usually proteins) and their interactions was couched in terms of a familiar and relatively small set of biological events (binding, unbinding, modification etc.) that are in themselves sufficient to represent almost everything that happens in a cell at the level of its signaling pathways.
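The flavor of such a formalism can be suggested with a toy Python encoding (a loose sketch of the general idea, not our platform’s actual rule language; the protein and site names are invented): a small vocabulary of events – bind, unbind, modify – applied to named players is enough to tell a little signaling story.

```python
from dataclasses import dataclass, field

@dataclass
class Protein:
    """A 'player' with its current binding partners and modifications."""
    name: str
    partners: set = field(default_factory=set)       # names of bound partners
    modifications: set = field(default_factory=set)  # e.g. phosphorylated sites

def bind(a, b):
    """Record that two players have bound each other."""
    a.partners.add(b.name)
    b.partners.add(a.name)

def unbind(a, b):
    """Record that two bound players have separated."""
    a.partners.discard(b.name)
    b.partners.discard(a.name)

def modify(target, site):
    """Record a post-translational modification at a named site."""
    target.modifications.add(site)

# Three events: Mek binds its substrate Erk, phosphorylates it, and releases it.
mek, erk = Protein("Mek"), Protein("Erk")
bind(mek, erk)
modify(erk, "pT185")
unbind(mek, erk)

print(erk.modifications)  # Erk keeps its phosphorylation after Mek lets go
```

The appeal of this style is that each rule reads like the biology it describes, rather than like the mathematics used to solve it.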

In summary then, insofar as computational models facilitate thinking and reasoning about the biological systems we study and collect data from, they help us far more effectively when they allow us to work in the idioms that are familiar and appropriate to our field. This notion can be more fully grasped by considering its antithesis: the use of ordinary differential equations (ODEs), which remains the dominant paradigm for biological modeling despite being an exceedingly opaque and unintuitive approach, ill-suited to modeling systems at a biological scale.

It is also clear that software developers need to work closely with experts who have specialized domain knowledge if they are to create computational modeling platforms that will not only be effective in their particular domain, but also widely adopted by its practitioners. In the case of biology, it was clear to us when we were developing our modeling platform that its success would depend in no small part on the appeal it could make to the imagination and intuition of the biologist. With computational modeling, as with software development, even the most meticulously crafted of tools will have little or no impact or utility in its field if a cognitively dissonant user experience results in it rarely or never being used.


Welcome to “The Digital Biologist”

With new technology comes new ways of getting stuff done, and sometimes it creates new career possibilities as well. Just as the invention of the moving picture gave rise to the cinematographer, the technological advances that are now making it possible to model and simulate complex living systems on a computer are giving rise to a brand new profession – that of The Digital Biologist. I number myself amongst the members of this new profession, and although they might not identify themselves as such, there are actually already quite a few other digital biologists out there as well.

“But wait”, you might be tempted to say. “Haven’t people already been doing biology on computers for some time now?”, to which my answer would be “Not really”. Biology (with a big B) is the study of the remarkable properties of complex living systems that set them apart from all of the dead stuff of which our universe seems to be primarily composed – their ability to self-organize, to grow and reproduce, to adapt to their environments, and so on. Until now, most of the models that we have used to describe biological systems have been essentially borrowed from the older and more established fields of physics and chemistry. In fact, although biologists often speak of relatively macroscopic concepts like reproduction and adaptation, the burgeoning wealth of data that they are producing in the laboratory is overwhelmingly physicochemical in nature. Such a detailed and microscopic description of living systems, while undoubtedly valuable, deals mostly with the kind of chemical processes that also take place in non-living systems, albeit in a far less structured and concerted manner. This data alone, captured at a scale an order of magnitude or two below that at which the actual “biology” of these systems becomes apparent, does not really describe their biology any better than a three-dimensional map of the neurons in a brain captures human consciousness.

The traditional computational models for biology, borrowed from the fields of physics, chemistry and mathematics, are most commonly applied at the same level of detail as the physicochemical data that is collected in the laboratory, which makes for a good fit between the modeling and the experimental work. Unfortunately, however, they only allow the biologist to build either extremely detailed models of tiny portions of living systems, or rather vague and low-resolution models of more macroscopic regions. The convergence of these traditional fields with computer science, however, is starting to bear fruit in the form of computational modeling approaches that have been designed with the astronomical complexity of biological systems in mind, and which can serve as intellectual frameworks capable of transforming this wealth of physicochemical data into real biological knowledge and insight.

For the very first time, the biologist – the digital biologist – is being presented with the opportunity to capture the essential properties and behaviors of a complex living system in a computational model that is as meaningful and relevant from a biological (with a big B) perspective as the physicist’s model of an atom or the engineer’s model of a suspension bridge.

Welcome to the age of the Digital Biologist!
