The Future of Research is now a thing

To be fair, it always was a ‘thing’, but now that The Future of Research has made it official by becoming a fully-fledged non-profit organization, it’s now really a ‘Thing’ (with a big T). This is no small feat for a group that started out as a loosely-knit assembly of grassroots activists who share a common interest in improving things for academic researchers. In only three years, they have built both national and international recognition for their movement. They were named 2015 People of the Year by Science Careers, and they have landed a 2-year, $300,000 grant from the Open Philanthropy Project to support their work assisting junior scientists in grassroots efforts to change science policy.

So congratulations to The Future of Research on becoming a Thing with a big T. Anyone who cares about the future of academic research, and about the working conditions and job prospects of those who pursue careers in it, should consider joining the very important conversation that The Future of Research has started.

As history has taught us over and over, systems that are broken, dysfunctional and unfair rarely, if ever, transform or dismantle themselves from the top down; change comes instead from the bottom up. Those who benefit the most from such systems are also usually their gatekeepers, and they will generally strive (consciously or unconsciously) to preserve them, since they potentially have the most to lose from changing them. This is precisely why grassroots organizations like The Future of Research are so important as instruments of social change.

© The Digital Biologist

Big Data Does Not Equal Big Knowledge


… the life science Big Data scene is largely Big Hype. This is not because the data itself is not valuable, but rather because its real value is almost invariably buried under mountains of well-meaning but fruitless data analytics and data visualization. The fancy data dashboards that big pharmaceutical companies spend big bucks on for handling their big data are, for the most part, little more than eye candy whose colorful renderings convey an illusion of progress without the reality of it.

Read the full article on LinkedIn.


Python For The Life Sciences: Table of contents now available

Our book Python For The Life Sciences is now nearing publication, and we anticipate a publication date sometime in early summer 2016. Quite a number of people have asked us to release a table of contents for the book, so without further ado, here is the first draft of the table of contents.

If you would like to receive updates about the book, please sign up for our book mailing list.

Python at the bench:
In which we introduce some Python fundamentals and show you how to ditch those calculators and spreadsheets and let Python relieve the drudgery of basic lab calculations (freeing up more valuable time to drink coffee and play Minecraft).

Building biological sequences:
In which we introduce basic Python string and character handling and demonstrate Python’s innate awesomeness for handling nucleic acid and protein sequences.

Of biomarkers and Bayes:
In which we discuss Bayes’ Theorem and implement it in Python, illustrating in the process why even your doctor might not always estimate your risk of cancer correctly.

Reading, parsing and handling biological sequence data files:
Did we already mention how great Python is for handling biological sequence data? In this chapter we expand our discussion to sequence file formats like FASTA.

Regular expressions for genomics:
In which we show how to search even the largest of biological sequences quickly and efficiently using Python Regular Expressions – and in the process, blow the lid off the myth that Python has to be slow because it is an interpreted language.

Biological sequences as Python objects:
Just when you thought you had heard the last of sequences, we explore the foundational concept of object-oriented programming in Python, and demonstrate a more advanced and robust approach to handling biological sequences using Python objects.

Slicing and dicing genomic data:
In which we demonstrate how easy it is to use Python to create a simple next-generation sequencing pipeline – and how it can be used to extract data from many kinds of genomic sources, up to and including whole genomes.

Managing plate assay data:
In which we use Python to manage data from that trusty workhorse of biological assays, the 96-well plate.

Python for structural biology and molecular modeling:
In which we demonstrate Python’s ability to implement three-dimensional mathematics and linear algebra for molecular mechanics. It’s nano but it’s still biology folks!

Modeling biochemical kinetics:
In which we use Python to recreate what happens in the biochemist’s beaker (minus the nasty smells), and to model the cooperative binding effects of allosteric proteins.

Systems biology data mining:
In which we demonstrate how to parse and interrogate network data using Python sets, and in the process, tame the complex network “hairball”.

Modeling cellular systems:
In which we introduce the Gillespie algorithm to model biological noise and switches in cells, and use Python to implement it and visualize the results – along with some pretty pictures to delight the eye.

Modeling development with cellular automata:
In which we use the power of cellular automata to grow some dandy leopard skin pants using Turing’s model of morphogenesis with Python 2D graphics. Note to our readers: no leopards were harmed in the writing of this chapter.

Modeling development with artificial life:
In which we introduce Lindenmayer systems (L-systems) to grow virtual plants, drawn using Python’s implementation of Logo-style turtle graphics. Don’t worry, these plants will not invade your garden (but they might take over your computer).

Predator-prey dynamics in ecology:
In which we release chickens and foxes into an ecosystem and let ‘em duke it out in a state space that is visualized using Python’s animation features.

Modeling virus population dynamics with agent-based simulation:
In which we create a virtual zombie apocalypse with agents that have internal state and behaviors. These are definitely smarter-than-usual zombies, and they illustrate an area in which Python’s object-oriented programming really shines.

Modeling evolution:
In which we use the Wright-Fisher model to demonstrate natural selection in action, and show how being the “fittest” doesn’t always mean that you will “win”. Think Homer Simpson winning a game of musical chairs.
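The Bayes chapter above is only described here, not shown. As a rough illustration of the kind of calculation it refers to, here is a minimal sketch of our own (not code from the book) that applies Bayes’ theorem to a diagnostic test. The prevalence, sensitivity and specificity values are hypothetical numbers chosen purely for illustration, and the function name is our own.

```python
def posterior_probability(prevalence, sensitivity, specificity):
    """P(disease | positive test) by Bayes' theorem.

    prevalence  -- P(disease) in the tested population
    sensitivity -- P(positive | disease)
    specificity -- P(negative | no disease)
    """
    true_positives = sensitivity * prevalence                   # P(positive and disease)
    false_positives = (1.0 - specificity) * (1.0 - prevalence)  # P(positive and no disease)
    return true_positives / (true_positives + false_positives)


# Hypothetical screening scenario: a rare condition and a test that sounds quite accurate
p = posterior_probability(prevalence=0.01, sensitivity=0.90, specificity=0.91)
print(f"P(disease | positive test) = {p:.3f}")  # prints 0.092
```

Even with a test that catches 90% of true cases, fewer than one in ten positives in this hypothetical scenario would actually have the disease, because the much larger healthy population generates most of the positive results.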


New drug approvals slide in 2013

After the sense that the life science sector had perhaps turned an important corner in 2012, based upon what was seen as a meaningful uptick in the rate of new drug approvals, 2013 actually turned out to be something of a dud: a return to the bleak "business as usual" scenario of poor returns on hefty investments that has beset the industry for some years now. The full report is available online at the Fierce Biotech website, but for those of you in a hurry, the gist of the news is summarized in the report's introductory paragraphs.

"The wave of new drug approvals that had been building at the FDA has broken. According to the official tally of new drug and biologics approvals at the agency, the biopharma industry registered only 27 OKs for new entities in 2013–a sharp plunge from 2012's high of 39 that once again raises big questions about the productivity and sustainability of the world's multibillion-dollar R&D business.

After 2012 some experts boasted that the industry had turned a corner, with the agency boasting that it was outstripping the Europeans in the speed and number of new drug approvals. But for 2013 the numbers look a lot closer to the bleak average of 24 new approvals per year seen in the first decade of the millennium than the 35 per year projected by McKinsey through 2016.

The agency says it was hampered by a sharp drop in the number of new drug applications, forcing a sudden plunge in the annual total–even after starting the year with a new breakthrough therapy designation (BTD) designed to speed the arrival of major therapeutic advancements."

 © The Digital Biologist | All Rights Reserved

Another HER-family signaling model

There has been a great deal of interest in the signaling of the HER family of receptors as a result of the central role that they appear to play in the proliferation of certain epithelial cancers. Some biotechnology companies have even built innovative portfolios of biotherapeutics around some of the quantitative ideas that have been derived from models of this signaling.

Now a new study published in the August 2013 issue of PLOS Computational Biology takes the modeling of the complex HER family signaling pathways a step further, with a model based upon a comprehensive data set comprising the relative abundance and phosphorylation levels of HER receptors in a panel of human mammary epithelial cells. Here is an excerpt from the authors’ own summary of the article.

"We constructed an integrated mathematical model of HER activation, and trafficking to quantitatively link receptor expression levels to dimerization and activation. We parameterized the model with a comprehensive set of HER phosphorylation and abundance data collected in a panel of human mammary epithelial cells expressing varying levels of EGFR/HER1, HER2 and HER3. Although parameter estimation yielded multiple solutions, predictions for dimer phosphorylation were in agreement with each other. We validated the model using experiments where pertuzumab was used to block HER2 dimerization. We used the model to predict HER dimerization and activation patterns in a panel of human mammary epithelial cells lines with known HER expression levels in response to stimulations with ligands EGF and HRG. Simulations over the range of expression levels seen in various cell lines indicate that: i) EGFR phosphorylation is driven by HER1-HER1 and HER1-HER2 dimers, and not HER1-HER3 dimers, ii) HER1-HER2 and HER2-HER3 dimers both contribute significantly to HER2 activation with the EGFR expression level determining the relative importance of these species, and iii) the HER2-HER3 dimer is largely responsible for HER3 activation. The model can be used to predict phosphorylated dimer levels for any given HER expression profile. This information in turn can be used to quantify the potencies of the various HER dimers, and can potentially inform personalized therapeutic approaches."


Shopping for innovation

The commercial life science sector is a challenging arena, to say the least, and not a place for the risk-averse. The freedom to try stuff and fail is a prerequisite for success. Biological systems are inherently and astronomically complex, and our attempts to modulate their behavior to clinically beneficial ends are always fraught with some degree of uncertainty. Managing the risk and complexity of the drug development process is arguably the major challenge faced by the commercial life science sector, and this has been a recurring theme at The Digital Biologist; see for example A new prescription for the pharmaceutical industry, The truly staggering cost of inventing new drugs and A new report by "The Economist" laments the lack of innovation in the biopharma sector.

One of the interesting trends that I have highlighted in these discussions is the increased emphasis, among many larger companies, on licensing in new intellectual property from smaller specialized R&D firms as opposed to developing it in-house. The most recent announcement by Merck Serono of its decision to effectively double its own venture investments in external biotechs aligns well with this trend and is an example of an alternative kind of R&D outsourcing. Rather than re-investing this money in its own in-house R&D, Merck Serono is instead seeking to tap the rich vein of entrepreneurial innovation that exists amongst the smaller, more specialized biotech companies and startups in the Boston biotech cluster.

Such an approach potentially addresses a couple of the perennial problems that the larger life science companies face. First, creating a genuinely entrepreneurial environment within large organizations is challenging, especially given their innately hierarchical management structures. Second, while the risk associated with such venture-funded biotech investments is significant, it is arguably a more flexible and manageable risk for Merck Serono than expanding its own internal pipeline in such a challenging climate for commercial life science, with all of the concomitant overheads and commitments that this would entail.

At the heart of these changes, there seems to be something of a dilemma for the companies involved. While they would all like to enjoy the kind of kudos and prestige that come from doing cutting-edge research, at least some of them may end up having to accept that what they really do best (and which cannot easily be replicated by the smaller companies) are the long and resource-intensive steps from early clinical to market.


Biologists Flirt With Models

This is an updated version of an article that I published in 2009. Alas, in the three years or so that have passed since I wrote it, little seems to have changed, beyond the fact that the crisis in the pharmaceutical industry has deepened to the point that even the biggest companies in the sector are starting to question whether their current business model is sustainable.


The enormous challenge posed by the complexity of biological systems represents a potential intellectual impasse to researchers and threatens to stall future progress in basic biology and healthcare. In recent years, increasing reliance on correlative approaches to biology has failed to resolve this situation. The burgeoning volumes of laboratory data gathered in support of these approaches pose more questions than they answer until such time as they can be assimilated as real knowledge. Modeling can provide the kind of intellectual frameworks needed to transform data into knowledge, yet very few modeling methodologies currently exist that are applicable to the large, complex systems of interest to biologists. Early developments in biological modeling, driven largely by the convergence in systems biology of complementary approaches from other disciplines, hold the potential to forge a knowledge revolution in biology and to bring modeling into the mainstream of biological research in the way that it is in other fields. This will require the development of modeling platforms that allow biological systems to be described using idioms familiar to biologists, that encourage experimentation and that leverage the connectivity of the internet for scientific collaboration and communication.

The challenge of biological complexity

Researchers in biology and the life sciences have yet to embrace modeling in the way that their peers in the fields of physics, chemistry and engineering have done. There are some very good reasons for this, not the least of which is that the systems of interest to biologists tend to be far more refractory to modeling than the systems that are studied in these other fields. It is much more straightforward, for example, to capture in lines of computer code the orbit of a satellite or the load on a suspension bridge than it is to build a computational model of even the simplest and most well-studied of biological organisms. At the molecular level the fundamental processes that occur in living systems can also be described in terms of physics and chemistry. However, at the more macroscopic scales at which these systems can be studied as “biological” entities, the researcher is confronted with an enormous number of moving parts, a web of interactions of astronomical complexity, significant heterogeneity between the many “copies” of the system and a degree of stochasticity that challenges any intuitive notion of how living systems function, let alone survive and thrive in hostile environments. Biology is, in a word, messy.

Complexity seems to be the mot du jour in biology right now, and the field arguably stands at a crossroads from which significant future progress will largely depend upon the ability of its researchers to at least tame the monster of complexity, if not to master it. Driven by the rapid development of the technology available to scientists in the laboratory, the quantity, scope and resolution of biological data continue to grow at an accelerating pace. Invaluable though this burgeoning new wealth of data may be, it often poses more questions than it answers until such time as it can be assimilated as real knowledge. The expectations for the Human Genome Project, for example, were as huge as the tsunami of data that it generated, spawning a whole new genomics industry within the life sciences sector and even moving some leaders in the field to predict cures for cancer within a few years. Almost a decade has passed since the first working draft of the human genome was completed, and while there have certainly been some medical benefits from this work, I think it is fair to say that the impact has not been on anything like the scale that was initially anticipated, largely because we underestimated how difficult it is to translate such a large and complex body of data into real knowledge. Even today, our understanding of the human genome remains far from complete, and the research to fill the gaps in our knowledge continues apace. The as yet unfulfilled promise of genomics is also reflected in the fact that many of the biotechnology companies that were founded to commercialize its medical applications have disappeared almost as rapidly as they arose. This is not to say that the Human Genome Project was in any way a failure; quite the opposite. In its stated goal of mapping the human genome it succeeded admirably, even ahead of schedule, and the data that it generated will only become more valuable with time as we become better able to understand what they are telling us. The crucial lesson for biology, however, is that as our capacity to make scientific observations and measurements grows, the need to deal with the complexity of the systems studied becomes more, not less, of an issue, requiring the concomitant development of the means by which to synthesize knowledge from the data. Real knowledge is much more than just data. It does not come solely from our ability to make measurements, but rather from the intellectual frameworks that we create to organize the data and to reason with them.

Models: Frameworks to turn data into knowledge

So what are these intellectual frameworks? Modeling is one of the most powerful and widely used forms of intellectual framework in science and engineering. There is a tendency to think of models only in terms of explicit modeling: the creation of tangible representations of real-world objects, such as a clay model of a car in a wind tunnel or a simulation of a chemical reaction on a computer. In actuality, however, scientists of all persuasions are (and always have been) modelers, whether or not they recognize this fact or would actually apply the label to themselves. All scientific concepts are essentially implicit models, since they are descriptions of things and not the things themselves. The advancement of science has been largely founded upon the relentless testing and improvement of these models, and their rejection when they fail as consistent descriptions of the world that we observe through experimentation. As in other fields, implicit models are in fact already prevalent in biology and are applied in the daily research of even the most empirical of biologists. Every experimental biologist who constructs a plasmid or designs a DNA primer for a PCR reaction is working from an implicit model of the gene as a dimer of self-describing polymers defined by a 4-letter alphabet of paired, complementary monomers. Interestingly, the double-helical structure of DNA published by Watson and Crick in 1953 remained essentially just a hypothetical model until it could actually be observed by x-ray crystallography in the 1970s, yet it earned them a Nobel Prize a full decade earlier as a result of the profound and wide-ranging synthesis and integration of knowledge that it brought to the field.
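The implicit model behind primer design can be made explicit in a few lines of code. This is a minimal sketch, assuming plain A/C/G/T sequences (no IUPAC ambiguity codes or RNA bases); the function name and the example sequence are hypothetical. It encodes the two rules of the model: each base pairs with a fixed complement, and the two strands run antiparallel.

```python
# Watson-Crick pairing rules: A<->T, C<->G
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the partner strand, written 5' to 3' (complement each base, then reverse)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# A hypothetical target region; a reverse PCR primer is drawn from this partner strand
template = "ATGGCGTACGTTAGC"
print(reverse_complement(template))                            # GCTAACGTACGCCAT
print(reverse_complement(reverse_complement(template)) == template)  # True
```

The second line of output is the "self-describing" property in miniature: either strand fully specifies the other.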

It is no accident that the majority of the successes that explicit modeling approaches have enjoyed in biology tend to be confined to a rather limited set of circumstances in which already established modeling methodologies are applicable. One example is at the molecular level where the quantitative methods of physics and chemistry can be successfully applied to objects of relatively low complexity (by comparison with even a single living cell). Modeling approaches can also be successfully applied to biological systems that exhibit behavior that can be captured by the language of classical mathematics, as for example in the cyclic population dynamics of the interaction of a predatory species with its prey, expressed in the Lotka-Volterra equations. Unfortunately for the modern biologist, many if not most of the big questions in biology today deal with large, complex systems that do not lend themselves readily to the kind of modeling approaches just described. How do cells make decisions based upon the information processed in cell signaling networks? How does phenotype arise? How do co-expressing networks of genes affect one another? These are the kinds of questions for which the considerable expenditure of time, effort and resources to collect the relevant data typically stands in stark contrast to the relative paucity of models with which to organize and understand these data.
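To make the Lotka-Volterra example concrete: with a prey population x and a predator population y, the equations are dx/dt = αx - βxy and dy/dt = δxy - γy. The sketch below integrates them in plain Python with a fixed-step fourth-order Runge-Kutta scheme (our own choice of integrator; any ODE solver would do). The parameter values and starting populations are hypothetical, chosen only to produce the characteristic cycles.

```python
def lotka_volterra(x, y, alpha, beta, delta, gamma):
    """Return (dx/dt, dy/dt) for prey population x and predator population y."""
    dx = alpha * x - beta * x * y   # prey breed, and are eaten in proportion to predator-prey encounters
    dy = delta * x * y - gamma * y  # predators grow by eating prey, and die off at a constant rate
    return dx, dy


def simulate(x0, y0, params, dt=0.01, steps=5000):
    """Integrate the equations with fixed-step fourth-order Runge-Kutta; return (t, x, y) tuples."""
    x, y = x0, y0
    trajectory = [(0.0, x, y)]
    for i in range(1, steps + 1):
        k1 = lotka_volterra(x, y, *params)
        k2 = lotka_volterra(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1], *params)
        k3 = lotka_volterra(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1], *params)
        k4 = lotka_volterra(x + dt * k3[0], y + dt * k3[1], *params)
        x += dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        trajectory.append((i * dt, x, y))
    return trajectory


# Hypothetical parameters (alpha, beta, delta, gamma) and starting populations
params = (1.0, 0.1, 0.075, 1.5)
traj = simulate(x0=10.0, y0=5.0, params=params)
for t, prey, predators in traj[::500]:
    print(f"t={t:5.1f}  prey={prey:7.2f}  predators={predators:6.2f}")
```

With these values the populations cycle around the fixed point x = γ/δ = 20, y = α/β = 10, with predator peaks lagging behind prey peaks. Two species and four parameters are easy to write down this way, which is exactly why this kind of classical formulation struggles to scale to a signaling network with hundreds of interacting components.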

Correlation versus causation

One response to the challenges posed by biological complexity in both academia and the healthcare industry has been the pursuit of more correlative approaches to biology in which an attempt is made to sidestep the complexity issue altogether by focusing (at least initially) on the data alone. If the data are measured carefully enough and can be weighted and scaled meaningfully with respect to one another, parametric divergences that can be detected between similar biological systems under differing conditions may reveal important clues about the underlying biology as well as identify the critical components in the system. This is very much a discovery process that aims in effect to pinpoint the needle in the haystack from the outside, without all the mess and fuss of having to get in and pull the haystack apart. These phenomenological approaches have been in widespread use for some time in both academia and industry in the form of genomic and proteomic profiling, biomarker discovery and drug target identification, and have also given rise to a plethora of high-throughput screening methodologies.

While these correlative approaches have enjoyed some successes, they tend to compound the central problem, alluded to earlier, of generating data without knowledge. Moreover, their limitations are now starting to become apparent. In recent years, for example, steadily intensifying investment and activity in these areas by drug companies has seen the approval rate of new drugs continue to fall as levels of R&D investment soar. The field of biomarkers has also seen a similar stagnation, despite years of significant investment in correlative approaches. Cancer biomarkers are a prime example of this stagnation. Since we don’t yet have a good handle on the subtle chains of cause and effect that divert a cell down the path towards neoplasia, we are forced to wait until there are obvious alarm bells ringing, signaling that something has already gone horribly wrong. The ovarian cancer marker CA125, for example, only achieves any significant prognostic accuracy once the cancer has already progressed beyond the point at which therapeutic intervention would have had a good chance of being effective. To use an analogy from the behavioral sciences, broken glass and blood on the streets are the “markers” of a riot already in progress, but what you really need for successful intervention are the early signs of unrest in the crowd, before any real damage is done. The lack of new approaches has also created a situation in which many of the biomarkers in current use are years or even decades old, and most of them have not been substantially improved upon since their discovery. Much more useful than the current PSA test for prostate cancer, for example, would be a test with which a physician could confidently triage prostate cancer patients into those who would probably require medical intervention and those whose disease is relatively quiescent and who would likely die of old age before their prostate cancer ever became a real health problem.

Given the general lack of useful mechanistic models or suitable intellectual frameworks for managing biological complexity, the tendency to fall back on phenomenology is easy to understand. Technology in the laboratory continues to advance, and the temptation to simply measure more data to try to get to where you need to be grows ever stronger as the barriers to doing so get lower and lower. We should also not underestimate the bias in the private sector towards activities that generate data, since they are far more quantifiable in terms of milestones and deliverables than the kind of research needed to translate that data into knowledge; this also makes these data-generating activities much more amenable to the kinds of R&D workflow management models that are common in many large biotechnology and pharmaceutical organizations. In effect, what we have witnessed in biology over the last decade or so is a secular movement away from approaches that deal with underlying causation, in favor of approaches that emphasize correlation. However, true to the famous universal law that there’s no such thing as a free lunch, the price to be paid for avoiding biological complexity in this way is a significant sacrifice of knowledge about the mechanism of action in the system being studied. Any disquieting feeling in the healthcare sector that it is probably a waste of time and money to simply invest more heavily in current approaches is perhaps the result of an uneasy acknowledgement that much of the low-hanging fruit has already been picked, and that any significant future progress will depend upon a return to more mechanistic approaches to disease and medicine.

Modeling biological systems

With biology at such an intellectual impasse, there has arguably never been a more opportune time for biologists to incorporate modeling into their research. Biological modeling has to date tended to be almost exclusively the realm of theoretical biology, but as platforms for generating and testing hypotheses, models can also be an invaluable adjunct to experimental work. This is already well recognized in other fields where modeling is more established in the mainstream of research. Astronomers, for example, use computer-based gravitational models to accurately position telescopes for observations of celestial objects, and civil engineers routinely test the load-bearing capacity of new structures using computers before and during their construction, comparing the simulation results with experimental data to ensure that the limits of the structure’s design are well understood.

One misconception that is common amongst scientists who are relatively new to modeling is that models need to be complete to be useful. Many (arguably all) of the models that are currently accepted by the scientific community are incomplete to some degree or other, but even an incomplete model will often have great value as the best description that we have to date of the phenomena that it describes. Scientists have also learned to accept the incomplete and transient nature of such models, since it is recognized that they provide a foundation upon which more accurate or even radically new (and hopefully better) models can be arrived at through the diligent application of the scientific method. Physicists understand this paradigm well. For example, the discrepancies that astronomers observe when trying to accurately map the positions and motions of certain binary star systems to the standard Newtonian gravitational model can actually provide the means to discover new planets orbiting distant stars. The divergence of the astronomical observations from the gravitational model can be used to predict not only the amount of mass that is unaccounted for (in this case, a missing planet), but also its position in the sky. Models, therefore, can clearly have predictive value, even when they diverge from experimental observations and appear to be “wrong”.

For models to be successful in fulfilling their role as intellectual frameworks for the synthesis of new knowledge, it is essential that the chosen modeling system be transparent and flexible. If these requirements are satisfied, a partial model can still be of great value, since the missing pieces of the model are a fertile breeding ground for new hypotheses and the model itself can even provide a framework for testing them. What is meant by transparent and flexible in this case? Transparency here refers to the ease with which the model can be read and understood by the modeler (or a collaborator). Flexibility is a measure of how easily the model can be modified, for example when new knowledge becomes available or existing knowledge turns out to be wrong. A model that is hard to read and understand is also difficult to modify and, very importantly in this age of interconnectedness, difficult to share with others. The importance of this last point cannot be overstated, since one of the most often ignored and underestimated benefits of models is their utility as vehicles for collaboration and communication. A model that must be completely rewritten in order to add or remove a single component is not flexible: once it is built, there is a significant barrier to modifying it that discourages experimentation. Flexibility also implies that models should be capable of being built in a simple, stepwise fashion, based only upon what is known, free of any significant constraints on scope and resolution or any a priori hypotheses about how the assembled system will behave. The co-existence of these last two properties is essential for flexibility in a modeling system, since decisions about which components of a model to omit in order to keep it within workable limits (e.g. for the purposes of memory, storage or execution) must of necessity be predicated upon some pre-existing notions of how the assembled system will behave and which components will be critical for that behavior. It is interesting to note that biological models based upon classical mathematical approaches generally fall far short of these ideals with respect to both transparency and flexibility.

A cornerstone of any modeling approach should also be the recognition that biological systems are dynamic entities – in fact “biology” is essentially the term that we apply to the complex, dynamic behavior that results from the combinatorial expression of their myriad components. For this reason, models that can truly capture the “biology” of these complex systems are also going to need to be dynamic representations. This entails a big step beyond the computer-generated pathway maps that biologists working in cell signaling are already starting to use. These pathway maps are essentially static models in which the elements of causality and time are absent. They are a useful first step, but just like a street map, a pathway map does not provide any information about the actual flow of traffic. An ideal modeling platform would offer a “Play” button on such maps, allowing the biologist to set the system in motion and explore the its dynamic properties. Notwithstanding these requirements for transparency, flexibility and dynamics, the biology community in large part are unlikely to adopt modeling approaches that require them to become either mathematicians or computer scientists, irrespective of how much math and/or computer science there might be “under the hood”. How fewer cars would there be on our roads if all drivers were required to be mechanics? Ideally biologists will be able to couch their questions to these new frameworks in an idiom that approximates the descriptions of biological systems that they are used to working with – and in a similar fashion, receive results from them whose biological interpretation is clear. Finally, let us not forget that thanks to the internet, we live in an era of connectivity that offers hitherto unimaginable possibilities for communication and collaboration. 
The monster of biological complexity is, in all likelihood, too vast ever to be tamed by any single research group or laboratory working in isolation, which is why collaboration will be key. With knowledge and data distributed widely throughout the global scientific community, the tiny pieces of a colossal puzzle are scattered among many individual researchers, who now have the opportunity to connect and work together as never before, and to assemble a richer and more complete picture of the machinery of life than we have ever seen. Potentially huge opportunities could therefore be squandered if future modeling platforms are not purposefully designed around the connectivity and semantic web potential that the internet offers. It is interesting to reflect on a possible future in which the monster of biological complexity, the emergent property of a multitude of tiny interactions within living cells, is eventually tamed by an analogous emergent property of the semantic web: the concerted scientific efforts of an interconnected, global research community.
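The difference between a static pathway map and a model with a “Play” button, described above, can be sketched in a few lines. What follows is a minimal, hypothetical Python illustration, not any existing modeling platform: a made-up three-node cascade (a signal `S` activates `A`, which in turn activates `B`) is stored as a plain wiring diagram, the “street map”, and a hypothetical `play` function then integrates it forward in time with simple Euler steps. The kinetics (activation proportional to the active upstream fraction times the inactive downstream fraction, plus first-order deactivation), the rate constants and the node names are all assumptions chosen purely for illustration.

```python
from typing import Dict, List, Tuple

# A static "pathway map": it records only who activates whom, with no time or rates.
# The three-node cascade S -> A -> B is a hypothetical example.
pathway_map: Dict[str, List[str]] = {"S": ["A"], "A": ["B"], "B": []}


def play(
    wiring: Dict[str, List[str]],
    k_act: float,
    k_deact: float,
    initial: Dict[str, float],
    clamped: Tuple[str, ...] = ("S",),
    dt: float = 0.01,
    steps: int = 2000,
) -> List[Dict[str, float]]:
    """Press "Play": integrate the map forward in time with explicit Euler steps.

    Each node's state is its active fraction (0 to 1). The kinetics are an
    illustrative assumption: an active upstream node activates the inactive
    part of each target at rate k_act, and every non-clamped node deactivates
    at rate k_deact. Clamped nodes (here, the input signal) are held constant.
    """
    state = dict(initial)
    history = [dict(state)]
    for _ in range(steps):
        # Compute every rate from the current state before updating anything.
        rates = {node: 0.0 for node in state}
        for upstream, targets in wiring.items():
            for target in targets:
                rates[target] += k_act * state[upstream] * (1.0 - state[target])
        for node in state:
            if node in clamped:
                rates[node] = 0.0
            else:
                rates[node] -= k_deact * state[node]
        # Advance all nodes one time step.
        for node in state:
            state[node] += dt * rates[node]
        history.append(dict(state))
    return history


if __name__ == "__main__":
    trajectory = play(pathway_map, k_act=1.0, k_deact=0.5,
                      initial={"S": 1.0, "A": 0.0, "B": 0.0})
    for t_index in (0, 100, 500, 2000):
        print(f"t = {t_index * 0.01:5.1f}", trajectory[t_index])
```

With `k_act=1.0` and `k_deact=0.5`, setting the rates to zero gives steady active fractions of 2/3 for `A` and 4/7 for `B`, and the simulated trajectory settles toward those values. That is exactly the kind of information (how fast, how far, in what order) that the wiring diagram alone cannot provide.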

© The Digital Biologist | All Rights Reserved

Welcome to “The Digital Biologist”

With new technology come new ways of getting things done, and sometimes new career possibilities as well. Just as the invention of the moving picture gave rise to the cinematographer, the technological advances that are now making it possible to model and simulate complex living systems on a computer are giving rise to a brand new profession: that of The Digital Biologist. I number myself amongst the members of this new profession and, although they might not identify themselves as such, there are already quite a few other digital biologists out there as well.

“But wait,” you might be tempted to say, “haven’t people already been doing biology on computers for some time now?” To which my answer would be: “Not really.” Biology (with a big B) is the study of the remarkable properties of complex living systems that set them apart from all of the dead stuff of which our universe seems to be primarily composed: their ability to self-organize, to grow and reproduce, to adapt to their environments, etc. Until now, most of the models that we have used to describe biological systems have essentially been borrowed from the older and more established fields of physics and chemistry. In fact, although biologists often speak of relatively macroscopic concepts like reproduction and adaptation, the burgeoning wealth of data that they are producing in the laboratory is overwhelmingly physicochemical in nature. Such a detailed and microscopic description of living systems, while undoubtedly valuable, deals mostly with the kind of chemical processes that also take place in non-living systems, albeit in a far less structured and concerted manner. This data alone, captured at a scale an order of magnitude or two below that at which the actual “biology” of these systems becomes apparent, does not really describe their biology any better than a three-dimensional map of the neurons in a brain captures human consciousness.

The traditional computational models for biology borrowed from the fields of physics, chemistry and mathematics are most commonly applied at the same level of detail as the physicochemical data collected in the laboratory, which makes for a good fit between the modeling and the experimental work. Unfortunately, however, they restrict the biologist to building either extremely detailed models of tiny portions of living systems or rather vague, low-resolution models of more macroscopic regions. Meanwhile, the convergence of ideas at the intersections of these traditional fields with computer science is starting to bear fruit in the form of computational modeling approaches that have been designed with the astronomical complexity of biological systems in mind, and which can serve as intellectual frameworks capable of transforming this wealth of physicochemical data into real biological knowledge and insight.

For the very first time, the biologist – the digital biologist – is being presented with the opportunity to capture the essential properties and behaviors of a complex living system in a computational model that is as meaningful and relevant from a biological (with a big B) perspective as the physicist’s model of an atom or the engineer’s model of a suspension bridge.

Welcome to the age of the Digital Biologist!
