You are about to embark on a systems biology project which will involve some modelling. Here are a few tips to make this adventure more productive and more pleasant.
1- Think ahead
Do not start building the model without knowing where you are going. What do you want to achieve by building this model? Is it only a quick exercise, a one-off? Or do you want this model to become an important part of your current and future projects? Will the model evolve with your questions and the data you acquire? A model with a handful of variables, created to quickly explore an idea, and a model that will be parameterized with experimental measurements, whose predictions will be tested and that will be further expanded are two completely different beasts. Following the 9 tips below in the former case is
2- Focus on the biology
A good systems biology model aims to be anchored in biological knowledge, and even (generally) reflects the biological mechanisms underlying the behaviours of the system. We are using modelling to understand biology, and not using biology as an illustration of modelling techniques (which is a perfectly respectable activity, but not the focus of this blog post). In order to do so, the model must be built from the processes we want to represent (hence complying with the Minimum Information Requested in the Annotation of Models). Therefore, try to build up your model from reactions (or transitions if this is a Petri Net, rules for a Rule-based model, influences for a Logic model), rather than directly writing the equations that control the evolution of the variables.
Another aspect worth a thought is the existence of different “compartments”. In systems biology, compartments are the “spaces” that contain the biological entities represented by your variables (the word has a slightly different meaning in PKPD modeling, where it means the variable itself). Because compartments can have different sizes, and because these sizes can change and can affect other aspects of the model, it is important to represent them correctly rather than ignoring them altogether, as was the case for decades.
Many tools have been developed to help you build models that way, such as (but absolutely not limited to) CellDesigner and the excellent COPASI. These software tools are in general very user-friendly and more approachable for biologists. A large list of tools is available from the SBML software guide.
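To make the point about compartment sizes concrete, here is a minimal Python sketch. The compartment names, volumes and amounts are purely illustrative assumptions, not values from any published model. It shows that moving the same amount of a molecule between two compartments of different sizes changes the two concentrations by different magnitudes.

```python
# Hypothetical compartments with illustrative volumes (in litres).
volumes = {"cytosol": 1.0e-12, "nucleus": 1.0e-13}


def concentration(amount_mol, compartment):
    """Convert an amount (mol) into a concentration (mol/L) for a compartment."""
    return amount_mol / volumes[compartment]


def transfer(amounts, source, target, moved_mol):
    """Move an amount between compartments; the total amount is conserved."""
    updated = dict(amounts)
    updated[source] -= moved_mol
    updated[target] += moved_mol
    return updated


# Illustrative starting amounts (mol) of one molecular species.
amounts = {"cytosol": 1.0e-18, "nucleus": 0.0}
after = transfer(amounts, "cytosol", "nucleus", 1.0e-19)

delta_cytosol = (concentration(after["cytosol"], "cytosol")
                 - concentration(amounts["cytosol"], "cytosol"))
delta_nucleus = (concentration(after["nucleus"], "nucleus")
                 - concentration(amounts["nucleus"], "nucleus"))
```

With these assumed volumes, the nuclear concentration rises ten times more than the cytosolic concentration falls, which is exactly the kind of effect that is lost when compartments are ignored.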
3- Document as you build
Bookkeeping is a cornerstone of any professional activity, and lab notebooks are scientists’ best friends. Modeling is no exception. If you do not log why you created a variable or a reaction, what biological entities they represent, how you chose the initial values or the boundaries for
Several software tools, such as COPASI, provide means of annotating every single element of a model, either with free
4- Choose a consistent naming scheme
This sounds like a mundane concern. But it is not! The names of variables and parameters are the first layer of documentation (see tip 3). They also anchor your model in biology (tip 2). A naming scheme that is logical and consistent while easy to remember and use will also greatly facilitate future extensions of your model (tip 1). NB: we do not want to open a debate “identifiers versus accession number versus usable name” or the pros and cons of semantics in identifiers (see the paper by McMurry et al for a great discussion on that topic). Here, we are talking about the short names one sees in equations, model graphs,
Avoid very long names if not needed (“adenosine triphosphate”), but do not be over-parsimonious (“a”). “
5- Choose granularity and independent variables wisely
Two mistakes are often made when it comes to
The second, mirroring,
It is therefore paramount to choose the right level of granularity. There is no simple and universal solution, and extreme cases can be encountered. In d’Alcantara et al 2003, calmodulin is represented by two variables (total concentration and concentration of active molecules). In Stefan et al 2008, calmodulin is represented by 96 variables (all calcium binding combinations plus binding to other proteins and different structural conformations). However, both papers study the same question.
The right answer is to pick the variable granularity depending on the questions asked and the data available. A rule of thumb is to start with a small number of variables, that can be matched (directly or via mathematical transformations) with the quantities you have measurements for. Then you can progressively make your model more complex and expressive as
6- Create your relationships
Once you have defined your variables, you can create the necessary relationships, which are all the mathematical constructs that link variables and parameters together. Graphical software such as CellDesigner or GINsim lets you draw the diagrams representing the processes or the influences, respectively.
Note that some software tools provide shorthand notations that let you create variables and parameters directly when writing the reactions. This is very handy for creating small models instantly. However, I would refrain from doing so if you want to document your model properly (it also makes it easier to create spurious variables and “dangling ends” through typos in the variable names).
Working on the relationships after defining the variables also makes the model easy to modify. You can add or remove a reaction without having to go through the entire model, as you would with a list of ordinary differential equations.
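As a rough illustration of defining variables first and relationships second, the following Python sketch assembles the rates of change from a list of reactions. The data layout, the function names (`check_declared`, `derivatives`) and the parameter `k_hydrolysis` are hypothetical choices made for this example; they are not the format of CellDesigner, COPASI or any other tool.

```python
# Variables and parameters are declared first (illustrative values).
species = {"ATP": 2.0, "ADP": 0.1, "P": 0.1}
parameters = {"k_hydrolysis": 0.5}

# Relationships come second: each reaction lists its stoichiometry and rate law.
reactions = [
    {
        "name": "hydrolysis",
        "reactants": {"ATP": 1},
        "products": {"ADP": 1, "P": 1},
        "rate": lambda s, p: p["k_hydrolysis"] * s["ATP"],
    },
]


def check_declared(reactions, species):
    """Reject reactions that mention an undeclared species (e.g. a typo)."""
    for reaction in reactions:
        for name in list(reaction["reactants"]) + list(reaction["products"]):
            if name not in species:
                raise KeyError(
                    f"{name!r} in reaction {reaction['name']!r} is not declared"
                )


def derivatives(state, reactions, parameters):
    """Sum the contribution of every reaction to each species' rate of change."""
    dxdt = {name: 0.0 for name in state}
    for reaction in reactions:
        flux = reaction["rate"](state, parameters)
        for name, coefficient in reaction["reactants"].items():
            dxdt[name] -= coefficient * flux
        for name, coefficient in reaction["products"].items():
            dxdt[name] += coefficient * flux
    return dxdt


check_declared(reactions, species)
rates = derivatives(species, reactions, parameters)
```

Adding or removing a reaction only means editing the `reactions` list; the differential equations are never written by hand, and a misspelt species name is caught instead of silently creating a new variable.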
7- Choose your math
The beauty of mathematical models is that you can explore a large diversity of possible linkages between molecular species, the actual mechanisms hidden behind the “arrow” representing a process. A transformation of X in a compartment into Y in another compartment can be controlled for instance by a constant flux (don’t do that!), a passive diffusion, a rate-limited transport, or even exotic higher order kinetics. At that point we could write: [insert clone of tip 5 here]. Indeed, while the mathematical expressions you choose can be arbitrarily complex, the more parameters you have, the harder it will be to find proper values for them.
If the model is carefully designed, switching between kinetics should not be too difficult. A useful habit is to preferentially use global parameters (whose scope is the entire model/module) rather than parameters defined for a given reaction/mathematical expression. Doing so will, of course, ease the use of the parameter in different expressions, but also facilitate the documentation and ease future model extensions, for instance where a parameter no longer has a fixed value but is affected by other things happening in the model.
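One possible way to keep kinetics interchangeable is sketched below in Python. All names and values (`GLOBAL_PARAMETERS`, `k_transport`, `Vmax`, `Km`) are assumptions made for the illustration, and the saturable Michaelis-Menten form is only one way to express a rate-limited transport.

```python
# Global parameters, visible to every rate law in the model (illustrative values).
GLOBAL_PARAMETERS = {"k_transport": 0.2, "Vmax": 1.0, "Km": 0.5}


def passive_diffusion(x_source, x_target, p):
    """Flux proportional to the concentration difference."""
    return p["k_transport"] * (x_source - x_target)


def rate_limited_transport(x_source, x_target, p):
    """Saturable flux, written here in Michaelis-Menten form (an assumption)."""
    return p["Vmax"] * x_source / (p["Km"] + x_source)


# Switching kinetics means changing one key, not rewriting the model.
KINETICS = {
    "passive_diffusion": passive_diffusion,
    "rate_limited_transport": rate_limited_transport,
}


def transport_flux(kind, x_source, x_target, p=GLOBAL_PARAMETERS):
    """Evaluate the transport flux with the chosen kinetics."""
    return KINETICS[kind](x_source, x_target, p)


flux_diffusion = transport_flux("passive_diffusion", 1.0, 0.5)
flux_saturable = transport_flux("rate_limited_transport", 0.5, 0.0)
```

Because every rate law reads from the same parameter table, a parameter can later be replaced by a value computed elsewhere in the model without touching the individual expressions.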
8- Plug holes and check for mistakes
Now that you have your shiny model, you need to make sure you did not forget to close a porthole that would sink it. Do you have rate-laws generating negative concentrations? Conversely, does your model generate umpteen amounts of certain molecules which are not consumed, resulting in preposterous concentrations? Software such as COPASI has checks for this kind of thing. In the example below, I created a reaction that consumes ATP to produce ADP and P, with a constant flux. This would result in infinite concentrations of ADP and infinitely negative concentrations of ATP. COPASI catches it, albeit returning a message that could be clearer.
Ideally, a model should be “homeostatic”. All molecular species should be produced and consumed. Pure “inputs” should be produced by creation/import reactions, while pure “outputs” should be consumed by degradation/export reactions. Simulating such a model should not lead to any timecourse tending to either +∞ or -∞.
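The two checks discussed in this tip can be approximated with a few lines of Python. This is a simplified sketch under my own assumptions (hypothetical function names, a bare-bones reaction format); it is not how COPASI performs its checks. The first function lists species that are only consumed or only produced; the second shows how a constant flux drives a concentration below zero.

```python
def unbalanced_species(reactions):
    """List species that are consumed but never produced, and vice versa."""
    produced, consumed = set(), set()
    for reaction in reactions:
        consumed.update(reaction["reactants"])
        produced.update(reaction["products"])
    return {
        "never_produced": sorted(consumed - produced),
        "never_consumed": sorted(produced - consumed),
    }


def atp_after_constant_flux(atp_initial, flux, dt, n_steps):
    """Euler integration of ATP consumed at a constant flux, whatever its level."""
    atp = atp_initial
    for _ in range(n_steps):
        atp -= flux * dt
    return atp


# A model with a single hydrolysis reaction leaves every species unbalanced.
open_model = [{"reactants": ["ATP"], "products": ["ADP", "P"]}]

# Adding a regeneration reaction closes the holes.
closed_model = open_model + [{"reactants": ["ADP", "P"], "products": ["ATP"]}]

report_open = unbalanced_species(open_model)
report_closed = unbalanced_species(closed_model)

# Starting from 2.0 with a flux of 1.0 per time unit, ATP is negative after 3 time units.
atp_final = atp_after_constant_flux(2.0, 1.0, 0.5, 6)
```

A structural check like this one is only a first filter: a species can be both produced and consumed and still accumulate without bound, so simulating the model remains necessary.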
9- Create output
“A picture is worth a thousand words”, and the impact of the results you obtained with such a nice model will be greater if served in clear, attractive and expressive figures. Timecourses are useful. But they are not always the best way to present the key message. You want to show the effect of parameter values on molecular species’ steady-states? Try parameter scanning plots, and their derivatives, such as bifurcation plots. Try phase-portraits. Distributions of concentrations during stochastic simulations or after ensemble simulations can be represented with histograms. And why be limited to 2D plots? Use 3D plots and surfaces instead, possibly in conjunction with interactive display (plot.ly …).
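To give an idea of what feeds a parameter scanning plot, here is a small Python sketch. It assumes a toy model invented for this example (constant production `k_prod`, first-order degradation `k_deg`), simulates it to steady state for several values of `k_deg`, and collects the pairs that a plotting library of your choice could then display.

```python
def simulate_to_steady_state(k_prod, k_deg, x0=0.0, dt=0.01, n_steps=20000):
    """Euler integration of dx/dt = k_prod - k_deg * x over n_steps * dt time units."""
    x = x0
    for _ in range(n_steps):
        x += dt * (k_prod - k_deg * x)
    return x


def scan_degradation_rate(k_deg_values, k_prod=1.0):
    """Return (parameter value, steady-state level) pairs for a parameter scan."""
    return [
        (k_deg, simulate_to_steady_state(k_prod, k_deg))
        for k_deg in k_deg_values
    ]


# Illustrative scan over the degradation rate constant.
scan_result = scan_degradation_rate([0.1, 0.2, 0.5, 1.0, 2.0])
```

For this toy model the steady state is simply `k_prod / k_deg`, which makes it easy to verify the scan before moving on to models where no analytical solution is available.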
10- Save your work!
Finally, and this is quite important, save often and save all versions. Models are code, and code must be versioned. You never know when you will realize you made a mistake and will want to go back a few steps and start exploring a different direction. You certainly do not want to start all over again. Recent work explored ways of comparing model versions (see the works from the Waltemath group for instance). But we are still some way off an accurate “diff and merge” of the kind available for text and programming code. The safest way is to save separately all the significant versions of a model.
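If you are not using a version control system, saving each significant version separately can be as simple as the Python sketch below. The function name `save_version` and the file naming scheme (a label followed by a short content hash) are assumptions made for this illustration.

```python
import hashlib
from pathlib import Path


def save_version(model_text, directory, label):
    """Write one model version to its own file, named after its content hash.

    Identical content always maps to the same file, so an existing
    version is never overwritten by a different one.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(model_text.encode("utf-8")).hexdigest()[:12]
    path = directory / f"{label}_{digest}.txt"
    if not path.exists():
        path.write_text(model_text, encoding="utf-8")
    return path
```

Calling `save_version` after each significant change leaves a trail of files you can return to; a proper version control system additionally records when and why each change was made.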