More than accurate translations

By Nicolas Gambardella

[French version]

Can we deliver a more than accurate translation?

Delivering an accurate translation is the core mission of a language translator. Any professional translator should achieve this, and any failure to do so is tantamount to professional negligence. Accurate translation is also the gold standard against which automated translation is assessed. However, should this not be considered the minimum? If so, what is “more than accurate”?

To answer those questions, we must first define what we mean by accurate translation. To translate a text accurately, we must conserve the semantics of the source document. Firstly, we must convey the meaning of the words or expressions, within the context of sentences, paragraphs, and the entire text. In addition to choosing the right words, this includes respecting the correct spelling in the target language. Secondly, we must follow the rules of grammar and punctuation scrupulously. Following these two principles will provide an adequate translation, useful in most contexts. On simple non-technical texts, this is sometimes achieved by AI-based machine translation, such as Google Translate or DeepL.

Is that sufficient? Can you expect more from a professional translator? Of course, you can. And you must!

An excellent translation is more than accurate. On top of conveying the meaning of the source, it should deliver the message as intended by its authors.

To do so, the translator must sometimes make decisions regarding the level of technicality to adopt. These choices are particularly important in the biomedical domain, where the granularity of concepts and their relationships differ between languages (although the translator will face them in most technical domains). For instance, there is not always a one-to-one mapping between the English and French descriptions of anatomical parts or symptoms. French doctors also tend to use more technical terms when talking to patients than British doctors. Therefore, to conserve the same impact, a given source document will have to be translated slightly differently if the intended audience is, e.g., a surgeon who is supposed to reproduce a procedure, a physician who needs to understand a condition, patients looking for information underpinning therapeutic decisions, or the general public. “Disease burden” should be translated into “charge de morbidité” in an epidemiological document, but probably into “impact de la maladie” in a marketing presentation.

Such technical choices rely on past expertise, which is why translators have specialities and why they become better with time, like good wine. But they also emerge from dedicated research, conducted for each translation project. A good example is the translation of safety data sheets (the documents describing the characteristics, possible health effects and precautions to be taken with a chemical compound or a drug). Both the headings and the contents are coded and country-specific. Knowledge of both languages will be sufficient to communicate the meaning of the text, but the result of the translation will not be a valid document. To produce a valid one, the translator must read the specifications of such safety data sheets in both the source and target languages. This is one of the areas where human translation cannot yet, and probably not for a while, be replaced by machine translation.

The meaning of words, the semantics, is not the only factor to take into account when polishing a translation, though. The tone of the text and the specific dialect to use (whether actual language or specialist circle’s jargon) will also strongly affect the delivery of a message. Depending on the type of document, the length of sentences, the rhythm, and the punctuation might need tuning to reach the target population. The aesthetic of a text, its general catchiness, is a cornerstone of marketing. And this holds whether one translates brochures, websites, or… research publications and grant applications!

Finally, the cherry on the cake, which perhaps differentiates a specialist linguist from a mere translator, is the correction of the source document. This is something that must be done tactfully, and perhaps only after the translator and the client have established some level of trust. Such corrections might be of a proofreading nature (corrections of typos) or more profound, including factual corrections or advice on delivery.

All this will contribute to a more than accurate translation. And all this is, currently and for the foreseeable future, out of reach of the most advanced Machine Translation approaches. 

Why using a test that detects 90% of cases can sometimes be a coin toss

By Nicolas Gambardella

[English version]

Testing is at the core of most, if not all, of the strategies proposed to fight the Covid-19 pandemic. The “identify and eliminate” family of approaches relies on identifying people infected by the SARS-CoV-2 virus and on isolating or treating them. The “acquire immunity” family of approaches relies on identifying people who were infected in the past and are now immune to the disease, so that they can be released. Finally, screening strategies also affect the estimation of how lethal this disease is (see the note at the end of this post).

As I write these lines (13 April 2020), the British government has just rejected all the blood antibody tests it assessed, that is, the tests that identify people who were in contact with the virus in the past and are presumed to be immune. At the same time, one can read many reports of “unreliable tests” that detect “only one-third of cases”. How is it that professionals designed such “bad” tests? How good must a test be to be useful? And why is a test that correctly spots 90% of infected people no better than a coin toss at telling whether you are really infected or not?

Let’s get straight to the point, so that you can stop reading and go back to more enjoyable lockdown activities if you wish. Then we will bring in the maths.

If we have a test that correctly identifies 90% of infected people (a sensitivity of 90%) and correctly reports as negative 90% of non-infected people (a specificity of 90%), but at the same time 90% of the whole population was never infected (a prevalence of 10%), and we then test a random sample of this population, we will get the same number of true and false positives. In other words, if you test positive, the chances that you are really immune are… 50%! You can easily see this with the following picture.

The pale blue background represents the population that was not infected, while the pale pink background represents the population that was infected (the prevalence). The pink people test positive, while the blue people test negative. As you can see, there is the same number of pink people (9) on the pale pink and pale blue backgrounds. Yes, the test is positive for 9 out of 10 infected people, whereas it is positive for only 1 out of 10 non-infected people. But there are 9 non-infected people for each infected person, which tips the balance the other way.

This was only one example, simplified because I assumed equal sensitivity and specificity. For a test detecting the presence of something, sensitivity would generally be lower than specificity (missing something is more likely than reporting something that is not there). Besides, how do the figures change when we modify the prevalence, that is, the proportion of the population that was infected? Let’s get to the maths.

The calculation is based on Bayes’ theorem, named after the Reverend Thomas Bayes. This post is not about the theorem itself, its meaning or its proof. If you want to know more, the YouTube channel 3Blue1Brown offers excellent videos on the subject (in English):

The quick proof of Bayes’ theorem

Bayes theorem

For today, just accept the following statement:

Your chances of really being infected if your test is positive are equal to the chances of being infected multiplied by the chances of getting a positive test if you are infected, relative to the population whose test came back positive (whether those people were infected or not).

In mathematical terms, we would write (P(X) being the “probability of X” and the vertical bar “|” representing a conditional probability, namely the probability that the term on the left is true if the term on the right is):

P(Infected | Positive) = P(Infected) x P(Positive | Infected) / P(Positive)

This equation, Bayes’ theorem, comes from the fact that:
P(Positive) x P(Infected | Positive) = P(Infected) x P(Positive | Infected)
This is obvious when looking at the picture below. Whether we draw the left circle first and then the right one, or the other way around, we get the same intersection.

The denominator, P(Positive), representing all the people who tested positive, is the sum of the people who correctly tested positive after infection and of those who incorrectly tested positive although they were not infected:

P(Positive) = P(Infected) x P(Positive | Infected) + P(NotInfected) x P(Positive | NotInfected)

This probability, P(Infected | Positive), is particularly important in the case of antibody tests. Nobody wants to tell someone they are immune if they are not!

In the same way, we can compute the chances that someone with a negative test is indeed not infected. This is very important at the beginning of the epidemic, when we want to prevent infected people from spreading the disease.

P(NotInfected | Negative) = P(NotInfected) x P(Negative | NotInfected) / P(Negative)

The denominator, P(Negative), representing all the people who tested negative, is the sum of the people who correctly tested negative while not being infected and of those who incorrectly tested negative although they were infected:

P(Negative) = P(NotInfected) x P(Negative | NotInfected) + P(Infected) x P(Negative | Infected)

Let’s see what we get with numerical values. We have three parameters and their complements. Let’s say we have a disease affecting 5% of the population (the prevalence).
P(Infected) = 0.05
P(NotInfected) = 0.95

80% of infected people are recognized by the test (the sensitivity).
P(Positive | Infected) = 0.8
P(Negative | Infected) = 0.2

95% of the people who are not infected do not test positive (the specificity).
P(Negative | NotInfected) = 0.95
P(Positive | NotInfected) = 0.05

So, if you test positive, what are the chances that you are really immune?

0.05 x 0.8 / (0.05 x 0.8 + 0.95 x 0.05) = 0.457

46%! In other words, there is a 54% chance that you are not immune even though your test is positive… Similarly, if your test is negative, the chances that you are infected are about 1.1% (0.05 x 0.2 / (0.05 x 0.2 + 0.95 x 0.95) = 0.011). This seems negligible, but it can be enough to let an infectious patient out. Moreover, this figure increases with the prevalence. By how much? The plot below shows how the probabilities of being correctly tested positive or negative change as the infected proportion of the population increases.

This is rather depressing. One way to improve the results is obviously to have better tests. However, the “return on investment” dwindles as the quality of the tests improves. Another solution is to test several times, if possible with different tests. This is, for example, the basis of the combined test for trisomy 21. I will let you compute the probabilities in the case of two tests giving identical results.

A note on the lethality of Covid-19

Why did I write above that the accuracy of tests was relevant for estimating the lethality of the disease? Below is a plot of the ratio of the number of deaths to the number of cases against the number of tests per million people, for all countries that reported at least one death and at least ten tests (data from 10 April 2020).

It is fairly clear that there is a correlation. The more tests there are, the lower the estimated fatality. This shows that we probably overestimate the lethality of the disease and underestimate its prevalence (and therefore its infectiousness). Whether or not this result is accurate, the ability to correctly infer the real number of infected or immune people is quite crucial. In addition, the sensitivity and specificity of the tests used by the different countries must be taken into account when estimating the prevalence and the fatality rate.

Why can using a test that detects 90% of cases be no better than the flip of a coin?

By Nicolas Gambardella

[French version]

Testing is at the core of most, if not all, strategies proposed to fight the Covid-19 pandemic. The “identify and squash” family of approaches relies on identifying people infected by the SARS-CoV-2 virus and isolating and/or treating them. The “get immune” family of approaches relies on identifying people who were infected in the past and are now immune to the disease, so we can release them. Finally, testing strategies also affect the estimation of how lethal this disease is (see note at the end).

As I write this post (13 April 2020), the UK government has just rejected all the blood antibody tests it assessed, that is, the tests that identify people who were in contact with the virus in the past and are supposedly immune. In a similar vein, we can see many reports of “unreliable tests”, catching “only one-third of the cases”. How come professionals designed such “bad” tests? How good must a test be to be useful? And why is a test that correctly spots 90% of infected people no better than the flip of a coin at telling if you are actually infected or not?

There is a short and a long answer. I will give the short one first, so you can stop reading and go back to more enjoyable confinement activities if you so wish.

If we have a test that correctly identifies 90% of the people who were infected (a sensitivity of 90%), and correctly reports as negative 90% of people who were not infected (a specificity of 90%), but at the same time 90% of the whole population was never infected (a prevalence of 10%), and then we test a random sample of this population, we will get the same number of true and false positives. In other words, if you test positive, the chance that you are actually immune is… 50%! You can easily grasp that in the picture below.

The light blue background represents the population that has not been infected, while the light pink background represents the population that has been infected (the prevalence). The blue people test negative, while the pink people test positive. As you can see, we get the same number of pink people (9) on light pink and light blue backgrounds. Yes, the test comes back positive for 9 out of 10 infected people, while it comes back positive for only 1 out of 10 non-infected people. But there are 9 non-infected people for each infected one, which tips the balance the other way.
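The same counting can be done in a few lines of code. This is only a sketch: the population of 100 people and the variable names are my own illustration, chosen so that the numbers match the 90% / 90% / 10% example.

```python
# Hypothetical population of 100 people, using the figures of the example.
population = 100
prevalence = 0.10    # 10% of people were infected
sensitivity = 0.90   # 90% of infected people test positive
specificity = 0.90   # 90% of non-infected people test negative

infected = population * prevalence        # people who were infected
not_infected = population - infected      # people who were not

true_positives = infected * sensitivity              # infected, positive test
false_positives = not_infected * (1 - specificity)   # not infected, positive test

# Share of positive tests that belong to people who really were infected
share_true = true_positives / (true_positives + false_positives)

print(f"true positives:  {true_positives:.0f}")
print(f"false positives: {false_positives:.0f}")
print(f"chance of being infected given a positive test: {share_true:.0%}")
```

Because the non-infected group is nine times larger, its small error rate produces as many positives as the infected group does.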

Now, that was just one example, simplified since I assumed equal sensitivity and specificity. For a test detecting the presence of something, sensitivity would typically be lower than specificity (missing something will be more probable than reporting something that is not there). Also, how do the figures change when we change the prevalence, that is, the proportion of the population that got infected? Let’s get to the actual calculations.

The basis for such calculations is Bayes’ theorem, named after the Reverend Thomas Bayes. This post is not about the theorem itself, its meaning or its proof. If you are interested in knowing more, the YouTube channel 3Blue1Brown provides excellent videos on the topic:

The quick proof of Bayes’ theorem

Bayes theorem

For our purpose, you just have to accept the following statement:

Your chances of actually being infected if you tested positive are equal to the chances of being infected in the first place multiplied by the chances of testing positive if actually infected, scaled to the size of the population that tested positive (whether actually infected or not).

In mathematical terms, we would write:
(P(X) means “Probability of X”, the vertical bar “|” represents a conditional probability, the probability that what is on the left side is true given that what is on the right side is true)

P(Infected | Positive) = P(Infected) x P(Positive | Infected) / P(Positive)

This equation, Bayes’ theorem, comes from the fact that:
P(Positive) x P(Infected | Positive) = P(Infected) x P(Positive | Infected)
This is obvious from the image below. Whether you draw the left circle first and then the right one, or the other way around, the overlapping surface is still the same.

The denominator, P(Positive), representing all people who tested positive, is the sum of the people who rightly tested positive while being infected and the people who wrongly tested positive while not being infected:

P(Positive) = P(Infected) x P(Positive | Infected) + P(NotInfected) x P(Positive | NotInfected)

This probability, P(Infected | Positive), is particularly important in the case of antibody tests. We do not want to tell a person they are immune if they are not!

Similarly, we can compute the chances that someone who tested negative is actually not infected. That is very important at the beginning of the epidemic, when we want to stop infected people from spreading the disease.

P(NotInfected | Negative) = P(NotInfected) x P(Negative | NotInfected) / P(Negative)

The denominator, P(Negative), representing all people who tested negative, is the sum of the people who rightly tested negative while not being infected and the people who wrongly tested negative while in fact being infected:

P(Negative) = P(NotInfected) x P(Negative | NotInfected) + P(Infected) x P(Negative | Infected)
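The two formulas translate directly into code. Below is a minimal sketch; the function names follow the usual terms for these quantities (positive and negative predictive value) rather than anything used in this post, and the input values are those of the short-answer example.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(Infected | Positive), from Bayes' theorem."""
    true_pos = prevalence * sensitivity                # P(Infected) x P(Positive | Infected)
    false_pos = (1 - prevalence) * (1 - specificity)   # P(NotInfected) x P(Positive | NotInfected)
    return true_pos / (true_pos + false_pos)           # the denominator is P(Positive)


def negative_predictive_value(prevalence, sensitivity, specificity):
    """P(NotInfected | Negative), from Bayes' theorem."""
    true_neg = (1 - prevalence) * specificity          # P(NotInfected) x P(Negative | NotInfected)
    false_neg = prevalence * (1 - sensitivity)         # P(Infected) x P(Negative | Infected)
    return true_neg / (true_neg + false_neg)           # the denominator is P(Negative)


# The 90% / 90% / 10% example from the short answer
ppv = positive_predictive_value(0.10, 0.90, 0.90)
npv = negative_predictive_value(0.10, 0.90, 0.90)
print(f"P(Infected | Positive)    = {ppv:.3f}")
print(f"P(NotInfected | Negative) = {npv:.3f}")
```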

Let’s see what we get with actual values. We have three parameters and their complements. Let’s say we have a disease affecting 5% of the population (the prevalence).
P(Infected) = 0.05
P(NotInfected) = 0.95

80% of infected people are caught by the test (its sensitivity).
P(Positive | Infected) = 0.8
P(Negative | Infected) = 0.2

95% of the people who are not infected will not be tested positive (the specificity).
P(Negative | NotInfected) = 0.95
P(Positive | NotInfected) = 0.05

Now, if you are tested positive, what are the chances you are actually immune?

0.05 x 0.8 / (0.05 x 0.8 + 0.95 x 0.05) = 0.457

46%! In other words, there is a 54% chance that you are not actually immune despite being labeled as such by the test… Conversely, if you test negative, the chances that you are actually infected are about 1.1% (0.05 x 0.2 / (0.05 x 0.2 + 0.95 x 0.95) = 0.011). The number looks pretty small, but this can be sufficient to “leak” an infectious patient outside. And this number grows as the prevalence does. By how much? The plot below depicts how the probabilities of being correctly tested positive and negative evolve as the infected proportion of the population increases.
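To explore how these probabilities move with the prevalence, one can tabulate them. The sketch below keeps the sensitivity (80%) and specificity (95%) used above; the list of prevalence values is an arbitrary choice of mine for illustration.

```python
def ppv(prev, sens, spec):
    """P(Infected | Positive)."""
    return prev * sens / (prev * sens + (1 - prev) * (1 - spec))


def npv(prev, sens, spec):
    """P(NotInfected | Negative)."""
    return (1 - prev) * spec / ((1 - prev) * spec + prev * (1 - sens))


SENSITIVITY = 0.80
SPECIFICITY = 0.95

rows = []
for percent in (1, 5, 10, 20, 50):   # prevalence in percent, arbitrary grid
    prev = percent / 100
    rows.append((percent,
                 ppv(prev, SENSITIVITY, SPECIFICITY),
                 npv(prev, SENSITIVITY, SPECIFICITY)))

for percent, p_pos, p_neg in rows:
    print(f"prevalence {percent:>2}%   "
          f"P(Infected | Positive) = {p_pos:.3f}   "
          f"P(NotInfected | Negative) = {p_neg:.3f}")
```

The row for a prevalence of 5% reproduces the 0.457 computed above. As the prevalence rises, a positive result becomes more trustworthy and a negative result less so.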

That looks pretty grim, doesn’t it? One way of improving the results is obviously to have better tests. However, the “return on investment” becomes increasingly limited as the quality of tests improves. Another solution lies in multiple testing, if possible with different tests. This is, for instance, the basis of the combined test for Down’s Syndrome. I will let you work out the math for the case where two independent tests give the same result.
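For readers who would rather check their answer, here is one possible way of doing it. This sketch assumes that the two tests are independent given the true infection status and, for simplicity, that both have the same sensitivity (80%) and specificity (95%) as above. The probability obtained after the first positive result simply becomes the starting probability for the second one.

```python
def update_on_positive(prior, sensitivity, specificity):
    """Probability of being infected after one positive result (Bayes' theorem)."""
    true_pos = prior * sensitivity
    false_pos = (1 - prior) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


prior = 0.05  # prevalence before any test

after_one = update_on_positive(prior, 0.80, 0.95)
# The posterior of the first test is the prior of the second one
after_two = update_on_positive(after_one, 0.80, 0.95)

print(f"after one positive test:  {after_one:.3f}")
print(f"after two positive tests: {after_two:.3f}")
```

If the two tests share a cause of error (for instance, cross-reaction with the same antibody), the independence assumption fails and the gain will be smaller.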

Note about Covid-19’s lethality

Why did I write above that the accuracy of testing was relevant for estimating the lethality of the disease (the Infection Fatality Rate, IFR)? Below is a plot of the ratio of the number of deaths to the number of cases against the number of tests per million people, for all countries that reported at least 1 death and at least 10 tests (data from 10 April 2020).

It is pretty clear that there is a correlation: the more tests are done, the lower the estimated fatality. This shows that we probably overestimate the lethality of the disease, and underestimate its prevalence (and therefore its infectiousness). Whether this result is accurate or not, the ability to correctly infer the actual number of people infected and/or immune is pretty crucial. Moreover, the sensitivity and specificity of the tests used by different countries should be taken into account when estimating prevalence and fatality rate.

Tips for translating a novel

By Nicolas Gambardella

In a previous blog post, I already covered a few tips for new translators. These, of course, apply to the translation of any text document. At aSciStance, I specialize in technical documents, in particular from the health and life sciences sectors. However, I have a secret life. In the evenings, I translate sci-fi novels. Besides the rules described before, there are a few dos and don’ts that apply when translating a novel. Here are some, in no particular order.

The most important thing when translating a novel (and presumably writing it in the first place) is to keep the reader enthralled. This generally requires an easy and smooth reading (I will put Lovecraft and Joyce aside…). As a result, the form becomes very important, and you do not necessarily need to stick 100% to the source. A word-for-word translation will be close to unreadable. Moreover, sentence segmentation tends to vary between languages. Therefore, some splitting and merging will be unavoidable. If the translation of a long proposition with many adjectives results in a boring or confusing piece of text, do not hesitate to replace it with a terse and punchy alternative. Conversely, depending on the source and target language, you might want to expand a single word into a lengthier piece of text. Such an expansion might also be needed if a piece of information is common knowledge in the population using the source language but not in the population reading the target one (for instance, historical events or monuments).

You should therefore not hesitate to “find your voice”. The actual story is obviously paramount. However, the rhythm, the tone of the dialogs, the level of language, all participate in telling this story. These will change between languages. When I translated The Night of the Purple Moon (NOPM), I chose to define three different levels of language for three different groups of teenagers. The main protagonists were brought up in an upper-middle-class setting, where the father was a librarian. Their language is correct, but not too posh. To set some of the unruly boys apart, I used a more familiar language register, even a bit of slang (although profanities were a no-no). Conversely, a couple of children were foreigners, coming from a country with different levels of deference. They learned English from books and use a very polite, slightly old-fashioned language (for instance, calling their parents “mother and father” rather than “mom and dad”).

That said, each novel possesses a specific rhythm, tone, terminology, and “feeling”. Sometimes they are part of an author’s trademark and should be respected as much as possible. Lovecraft’s stories would have a very different impact if his “wholly abominable and unspeakable horrors” had been translated into smooth and easy-to-read pieces. If you choose to change some of those characteristics, make sure to be consistent throughout.

Immerse yourself in the novel’s universe. What counts in a story is self-coherence, not accuracy. Particularly if you are translating a science-fiction novel, like NOPM, and have, like me, a biomedical background, you should not be offended that bacteria or viruses can survive the cold void of space – and the constant radiation – or that they can kill a human by recognizing sex hormones they never encountered before. After all, in science-fiction, there is the word fiction…
Do not try to be too exact either. In an imaginary setting, translate 100 miles into 100 km, rather than 160.934 km. It just means “quite a long distance”. Except, of course, if this distance is important for the story. As any ultra-marathon runner will tell you, having to travel 100 km or 100 miles are two very different endeavors.

However, do not hesitate to correct factual errors made by the author that you think could bother some readers. Obviously, only do so with the author’s permission. I will not list the instances where I did that in NOPM (you will have to read the English and French versions). But sometimes such a correction can kill two birds with one stone. In NOPM, the source mentioned that the germs were coming with the space dust. Space dust was not very clear to me. If we were talking about cosmic dust, it is a bit too thin to contain germs. Moreover, “poussière de l’espace” sounds a bit childish in French. However, comet dust fits well with the story and sounds better in French, as “poussière de comète” (although to be fair “poussière cosmique” sounds even cooler! #NoteForFutureTranslations)

Try to be consistent but not repetitive. If a certain item is always referred to by a certain name in the source, try to always use the same term in the translation. In NOPM, the organisms that killed humans are always called “germs”. I chose to use “microbes” and stuck to it. I did not use “germes” or “bactéries”. Similarly, in Colony East (the sequel to NOPM, which I am translating as I am writing this blog post), I chose to translate “pills” as “comprimés”, and I do not use “pilules” or “médicament”. Such consistency facilitates the reading, in particular for younger readers.

However, use such consistency sparingly when it comes to entire expressions. It is sometimes quite annoying to find exactly the same description, or the same bit of dialog, several times. This is particularly true if the occurrences are in the same chapter. This problem is made worse by Translation Memory-based CAT tools, and you should be cautious when using such tools (which I do; I use CafeTran Espresso).

Which brings me to the final pieces of advice. Now that you have translated your text, check it, check it, check it.

1) Use a Machine Translation engine (such as DeepL) to reverse translate your work. Are there inconsistencies between the result and the original text? Does that reveal a potential for confusion in the reader’s mind?

2) Proofread your work with dedicated software. I use three of them at the moment, Grammarly, Grammalecte, and LanguageTool. Yes, you are a fantastic linguist. But even the keenest eye might miss the occasional typo or doublet.

3) Read back every chapter after completing the translation. Read them aloud. Reading a piece of text aloud forces you to slow down, be more attentive to every word, and better detect subtle grammatical errors.

Here are a few links to other relevant web pages. Please feel free to suggest others in your comments.

Renormalizing data with Arcsinh instead of log

By Nicolas Gambardella

Do you need to quickly normalize data but are bothered by null or negative values? You can use the inverse hyperbolic sine function, arsinh, instead of a simple log function. This approach also allows for treating small and large values differently. Arsinh is defined as:

arsinh(x) = ln(x + sqrt(x² + 1))

Firstly, since x+sqrt(x²+1) is always strictly positive, arsinh is defined for all real values, contrary to log, which is only defined for strictly positive numbers. Furthermore, for small values of x, the function stays close to ln(x+1), something often used to work around zero measurements. For large values of x, arsinh(x) approaches ln(2x), so it progresses as log(x).
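These properties are easy to check numerically. The following sketch uses only Python’s standard math module; the helper arsinh is written out from the definition purely for illustration, since math.asinh already provides the function.

```python
import math


def arsinh(x):
    """Inverse hyperbolic sine, written out from its definition."""
    return math.log(x + math.sqrt(x * x + 1))


# Defined for negative values and zero, unlike log
for x in (-2.0, 0.0):
    print(f"x = {x:>5}: arsinh = {arsinh(x):.6f}, math.asinh = {math.asinh(x):.6f}")

# Small values: close to ln(x + 1)
x = 0.01
print(f"x = {x}: arsinh = {arsinh(x):.6f}, ln(x + 1) = {math.log1p(x):.6f}")

# Large values: close to ln(2x) = ln(x) + ln(2)
x = 1e4
print(f"x = {x}: arsinh = {arsinh(x):.6f}, ln(2x) = {math.log(2 * x):.6f}")
```

In practice, math.asinh (or numpy.arcsinh on arrays) is preferable to the hand-written formula, which loses precision for large negative x.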

Let’s say we have a dataset that is quite noisy, with unevenly spread sampling, and that includes an unwanted baseline. Here is a made-up dataset:

To create it, we generated 1000 lognormal-distributed sampling values x. The variable value is equal to the sampling value, plus a random noise in which the standard deviation varies as the ratio of sqrt(x)/x (biological noise), plus a noisy constant technical baseline (5 plus a normal noise with SD=0.01).
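A comparable dataset can be generated along these lines. This is a sketch and not the code used for the figures: the lognormal parameters (mu=0, sigma=1), the seed, and the reading of the biological noise as a normal noise with SD equal to sqrt(x)/x are my assumptions for illustration.

```python
import math
import random

random.seed(42)  # arbitrary seed, only to make the sketch reproducible

N = 1000
BASELINE = 5.0

# Sampling values x: lognormal, with assumed parameters mu=0 and sigma=1
x = [random.lognormvariate(0.0, 1.0) for _ in range(N)]


def measure(xi):
    """Sampling value plus biological noise plus noisy technical baseline."""
    biological = random.gauss(0.0, math.sqrt(xi) / xi)  # SD taken as sqrt(x)/x
    technical = BASELINE + random.gauss(0.0, 0.01)      # 5 plus noise with SD=0.01
    return xi + biological + technical


raw = [measure(xi) for xi in x]

print(f"{len(raw)} values, min = {min(raw):.2f}, max = {max(raw):.2f}")
```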

We are clever, and notice the background noise, so we subtract it:

Now, the first issue is that plenty of values are negative. In some cases, a log normalization will simply fail. Sometimes, the normalization will proceed, ditching values (as R says, “Warning message: NaNs produced”). As can be seen below, there is a large area sparsely populated on the left, for low values of x.

If, on the contrary, we use arsinh, we rescue all those values.
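The difference can be illustrated by counting how many values survive each transformation. The sketch below regenerates a small made-up dataset under the same assumptions as before (lognormal sampling with mu=0 and sigma=1, arbitrary seed, baseline already subtracted); safe_log is a hypothetical helper that mimics a log returning a missing value where it is undefined.

```python
import math
import random

random.seed(0)  # arbitrary seed, only to make the sketch reproducible

# Made-up baseline-subtracted values: sampling value + biological noise
# + what remains of the technical noise once the baseline of 5 is removed
values = []
for _ in range(1000):
    x = random.lognormvariate(0.0, 1.0)
    biological = random.gauss(0.0, math.sqrt(x) / x)
    residual = random.gauss(0.0, 0.01)
    values.append(x + biological + residual)


def safe_log(v):
    """Natural log, or None where log is undefined (v <= 0)."""
    return math.log(v) if v > 0 else None


log_values = [safe_log(v) for v in values]
arsinh_values = [math.asinh(v) for v in values]

lost = sum(v is None for v in log_values)
print(f"log keeps {len(values) - lost} of {len(values)} values")
print(f"arsinh keeps {len(arsinh_values)} of {len(values)} values")
```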


Arsinh is used, for instance, in flow cytometry and in mass spectrometry. It is one of the transformations offered by the R package bestNormalize.

Merry Christmas in all languages

By Nicolas Gambardella

It is that time of the year again. Although the “modern” celebrations are supposed to be of relevance for Christians only, this is really a celebration of the solstice (whether the summer or the winter one depending on the side of the globe you are living in). So let’s wish each other a merry day.

Several web pages list ways of saying Merry Christmas in many languages. However, those pages are generally incomplete and sometimes incorrect (I loved the one that wrote “Martha snores” instead of Merry Christmas in a minority language). Here, I list all the Merry Christmas I could collect, with links to the respective Wikipedia pages. I tried to present them in the original script as well as the Latin one, although some are hard to write in Unicode.
Please contact me if you disagree with one of my choices, or if you are aware of a missing entry. The entries I consider validated (using various sources) are in boldface.

Acholi
Uganda
Krismasi mkunjufu

Afrikaans
South Africa, Namibia
Geseënde Kersfees

Ahtna (central)
Alaska
C’ehwggelnen Dzaen

Akan
Ghana, Ivory Coast, Benin
Afishapa

Albanian
Albania, Kosovo
Gëzuar Krishtlindjet

Aleut
Native American, Alaska
Kamgan Ukudigaa

Alsatian
France
E güeti Wïnâchte

Alutiiq/Sugpiak
Native American
Nunaniqsaakici Aʀusistuami

Amharic
Ethiopia
Melikam Gena
መልካም ገና

Apachean
Apache, Navajo
Gozhqq Késhmish

Arabic (yet to be developed into different forms)
Eid Milad Majid
عيد ميلاد مجيد

Aragonese
Spain
Feliz nadal

Armenian
Armenia
Shnorhavor Surb Tsnund
Շնորհավոր Սուրբ Ծնունդ

Aromanian
Balkans
Cărciun hărios

Ashanti
Ghana
afehyia pa

Assamese
India (Assam)
meri khristmas
মেৰী খ্রীষ্টমাচ

Asturian
Spain
Bones Navidae

Astur-leonese
Spain
Felís Ñavidá

Aymara
Peru, Bolivia, Chile
Sooma nawira-ra

Azerbaijani
Azerbaijan
Milad bayramınız mübarək

Bambara/Bamanankan
Mali
Ala ka Noeli diya(?)

Basque
France, Spain
Eguberri on

Batak-Karo
Indonesia
Selamat wari Nata

Bavarian
Germany
Šene Veinåhd

Belarusian
Belarus
z Kaljádami
З Калядамі

Bemba
Zambia, Dem Rep Congo, Tanzania, Botswana
Kristu abe nenu muli ino nshiku nkulu ya Mwezi

Bengali
Bangladesh, India
shubho bôṛodin
শুভ বড়দিন

Berber (orig alph is neo-Tifinagh)
MENA
Tameghra tameggazt
ⵜⴰⵎⴻⵖⵔⴰ ⵜⴰⵎⴻⴳⴳⴰⵣⵜ

Bikol/Bicolano
Philippines
Maogmang Pasko

Bildts
Netherlands
Noflike Korsttydsdagen

Bislama
Vanuatu
Mi wisim yufala eerywan one gutfala Krismes

Blackfoot
Native American
I’Taamomohkatoyiiksistsikomi
ᖱᒣᖳᒐᒉᑊᖿᒪᔪᖱᖽᐧᒡᒧᐧᖾᒍ

Bosnian
Bosnia & Herzegovina
Srećan Božić

Breton
France
Nedeleg laouen

Bulgarian
Bulgaria
Vesela Koleda
Весела Коледа

Burmese
Myanmar
pyawshwinsaw hkarahchcamaat hpyitparhcay
ပျော်ရွှင်သောခရစ္စမတ်ဖြစ်ပါစေ

Cantonese
China
Seng Dan Fai Lok
聖誕快樂

Cape Verdean
Cape Verde
Boas Festas

Catalan
Andorra, Spain
Bon Nadal

Cebuano
Philippines
Maayong pasko

Celtic
Wales
Nadolig Llawen

Central Dusun (Bunduliwan)
Malaysia
C’ehwggelnen Dzaen

Chabacano/Chavacano
Philippines
Feliz Pascuas

Chamorro
Guam, Marianas
Felis Nåbidåt

Chechen
Chechen Republic
Kerlaču şarca
Керлачу шарца

Cherokee
Native American US
ulihelisdi danisdayohihv
ᎤᎵᎮᎵᏍᏗ ᏓᏂᏍᏓᏲᎯᎲ

Chewa/Chichewa/Nyanja
Zambia, Malawi, Mozambique, Zimbabwe
Khrisimasi yabwino

Cheyenne
Native American, US
Hoesenestotse

Choctaw
Native American, US
Yukpa, Nitak Hollo Chito

Chuukese
Caroline Islands
Neekirissimas annim

Coastal Kadazan
Malaysia
Kotobian tadau Krismas

Comanche
Native American, US
Tsaa Nʉʉsukatʉ̱ Waa Himarʉ

Coptic
Egypt
Picristos afmansf
Ⲡⲓⲭⲣⲓⲥⲧⲟⲥ ⲁⲫⲙⲁⲛⲥⲫ

Cornish
United Kingdom
Nadelik Lowen

Cree
Native American, US
Mitho Makosi Kesikansi
ᒥᑐ ᒪᑯᓯ ᑫᓯᑲᓐᓯ

Corsican
France
Bon Natale

Creek/Muscogee
Native American, US
Afvcke Nettvcakorakko

Crimean Tatar
Crimea
Yañı yılıñız hayırlı olsun
Янъы йылынъыз хайырлы олсун

Croatian
Croatia
Sretan Božić

Czech
Czech Republic
Veselé Vánoce

Cuyonon
Philippines
Malipayeng Paskoa

Dagbani
Ghana
Ni ti Burunya Chou

Danish
Denmark, Germany
Glædelig Jul

Dari/Farsi
Afghanistan
Christmas Mubarak
کرسمس مبارک

Dutch
Belgium, Netherlands
Vrolijk Kerstfeest

Elfdalian
Sweden
Guäd Juäld

Edo
Nigeria
Iselogbe

Emilian-romagnol
Italy
Bon Nadèl

English
United Kingdom, USA
Merry Christmas

Erzya
Russia
Od ije dy Roştova marto
Од ие ды Роштова марто

Esperanto
N/A
Feliĉan Kristnaskon

Estonian
Estonia
Häid jõule

Ewe
Ghana, Togo
Blunya na wo

Extremaduran
Spain
Felís Naviá

Faroese
Denmark
Gleðilig Jól

Fijian
Fiji
Marau na Kerisimasi

Filipino
Philippines
Maligayang Pasko

Flemish
Belgium
vroolek kerstfeejst

Finnish
Finland
Hyvää Joulua

French
France, Monaco, Belgium, Switzerland, Canada, Africa
Joyeux Noël

Frisian, Frysk, West-Frisian
Netherlands
Noflike Krystdagen

Friulian
Italy
Bon Nadâl

Fula/Fulani
Niger, Nigeria, Benin, Cameroon, Chad, Sudan, Togo, Guinea, Sierra Leone
Jabbama be salla Kirismati

Galician
Spain
Bo Nadal

Gallo
France
Bon Nouao

Garhwali
India
जसीलो क्रिसमस र जसीलो नै विरबै

Garifuna
Caribbean
Buiti fedu

Gascon
France
Gaujos Nadau

Georgian
Georgia
šobas gilocavt
შობას გილოცავთ

German
Austria, Germany, Liechtenstein, Switzerland
Fröhliche Weihnachten

Gitxsan
Canada
Hisgusgitxwsim Ha’niisgats Christ ganhl Ama Sii K’uuhl

Gothic
Faha weiha naht
𐍆𐌰𐌷𐌰 𐍅𐌴𐌹𐌷𐌰 𐌽𐌰𐌷𐍄

Greek
Greece, Cyprus
Kalá hristúyenna
Καλά Χριστούγεννα

Greenlandic
Greenland
Juullimi pilluarit

Guaraní
Paraguay
Avyaitete ahï ko Tupa ray árape qyraï Yy Kapyryin rira

Guarayu
Paraguay
Imboeteipri tasecoi Tupa i vave

Guinea-Bissau Creole
Guinea-Bissau, Senegal, Gambia
Imboeteipri tasecoi Tupa i vave

Gujarati
India
Ānandī nātāla
આનંદી નાતાલ

Gwichʼin
Alaska
Drin tsal zhit shoh ohlii

Haitian Creole
Haiti
Jwaye Nowèl

Hausa
Niger, Nigeria, Ghana, Benin, Cameroon, Ivory Coast, Togo
Barka da Kirsimatikuma

Hawaiian
Hawaii
Mele Kalikimaka

Hebrew
Israel
Chag molad sameach
חג מולד שמח

Hiligaynon/Ilonggo
Philippines
Malipayon nga Paskwa

Hindi
India
śubh krismas
शुभ क्रिस्मस

Hmong
China
Nyob zoo hnub yug Yesxus

Hungarian
Hungary
Boldog Karácsonyt

Iban
Malaysia, Indonesia, Brunei
Selamat Hari Krismas

Ibibio
Nigeria
Idara ukapade isua

Icelandic
Iceland
Gleðileg jól

Igbo
Nigeria
E keresimesi Oma

Ilocano
Philippines
Naragsak Nga Pasku

Indonesian
Indonesia
Selamat Natal

Inupiaq
Alaska
Quvianaq Agaayuniqpak

Inuktitut
Alaska
Kuvianak Inovia
ᑯᕕᐊᓇᒃ ᐃᓄᕕᐊ

Irish
Ireland
Nollaig Shona

Iroquoian
Canada
Ojenyunyat Sungwiyadeson homungradon nagwutut & Ojenyunyat osrasay

Italian
Italy
Buon Natale

Jamaican
Jamaica
Merri crissmus

Jämtlandic
Sweden
Gojuln

Japanese
Japan
Meri Kurisumasu
メリークリスマス

Javanese
Indonesia
Sugeng Natal
ꦱꦸꦒꦼꦁꦫꦶꦪꦪꦤꦠꦭ꧀ꦭꦤ꧀ꦮꦂꦱꦲꦼꦁꦒꦭ꧀ ꦱ꧀ꦭꦩꦼꦠ꧀ꦤꦠꦭ꧀ꦭꦤ꧀ꦠꦲꦸꦤ꧀ꦲꦚꦂ

Jèrriais
Jersey
Bouan Noué

Judaeo-Spanish/Ladino
Israel
Noel alegre i felis anyo muevo
נויל אליגרי אי פ׳יליס אנייו

Jula/Dyula/Dioula
Burkina Faso
la ye Nowɛli diya

Jingpho
Myanmar
Ngwi pyaw ai X’mas rai u ga

Kalmyk
Russia
Tsagaan Sar ölzätä boltxa
Цаһан Сар өлзәтә болтха

Kannada
India
kris mas habbada shubhaashayagalu
ಕ್ರಿಸ್ ಮಸ್ ಹಬ್ಬದ ಶುಭಾಷಯಗಳು

Kapampangan
Philippines
Masayang Pasku

Kaqchikel
Guatemala
Dios tik’ujie’ avik’in

Karachay-Balkar
Russia, Turkey
Džangy džylyġyz oġurlu bolsun
Джангы джылыгъыз огъурлу болсун

Karelian
Russia
Rastavanke sinun
Раставанке синун

Kashubian
Poland
Wèsołych gódów

Kazakh
Kazakhstan
Rojdestvo quttı bolsın
Рождество құтты болсын

Khmer
Cambodia
rikreay thngai bonyanauel
រីករាយ​ថ្ងៃបុណ្យ​ណូអែល

Khoekhoe
Africa
!Gâi!gâxa !khub!naes tsî ǀkhaehesa ǀasa kurib

Kinyarwanda
Rwanda
Noheri nziza

Komi
Russia
Vyl’ voön da bur Röštvoön
Выль воöн дa бур Рöштвоöн

Konkani
India
Khushal Borit Natala

Korean
Korea
jeulgeoun seongtanjeol
즐거운 성탄절

Koyukon
Alaska
Denaahuto’ Hoolaahn Dedzaahn Sodeelts’eeyh

Kurdish Kurmanji
Turkey, Iran, Iraq, Syria
Kirîsmes pîroz

Kurdish Sorani
Iraq, Iran
jachny krismiset be khoshy bet
ﺟﻪﮊﻧﻰ ﻛﺮﻳﺴﻤﻴﺴﺖ ﺑﻪ خۆشى بێت

Kyrgyz
Kyrgyzstan
Caratkannın tuısımen
Жаратканнын туысымен

Ladin
Italy
Bun Nadèl

Lakota
USA
Wanikiya tonpi wowiyuskin

Lao
Laos
suksan wan kharitsamāt
ສຸກສັນວັນຄຣິດສມາດ

Latin
Italy
Felix dies Nativitatis

Latvian/Lettish
Latvia
Priecīgus Ziemassvētkus

Lingala
DR Congo, Rep Congo, Central African Republic, Angola
Mbotama Malamu

Lithuanian
Lithuania
Linksmų Kalėdų

Lombard
Italy, Switzerland
Bon Nedal

Low Saxon
Germany
Frohe Wiehnachten

Lozi
Zambia
Kilisimusi ye munati ni matohonolo a silimo/mwaha o munca

Luganda
Uganda
Seku Kulu

Lule Sámi
Sweden, Norway
Buorre javla

Lushootseed
USA
Haʔɬ pədx̌aʔx̌aʔ

Luxembourgish
Luxembourg
Schéi Krëschtdeeg

Macedonian
North Macedonia
Sreḱen Božiḱ
Среќен Божиќ

Magahi
India
bada din aayo naya saal mubaarak
बड़ा दिन आयो नया साल मुबारक

Malagasy
Madagascar
Tratry ny Krismasy

Malay
Brunei, Indonesia, Malaysia, Singapore, Thailand
Selamat hari Natal

Malayalam
India
kristumas āśansakaḷ
ക്രിസ്തുമസ് ആശംസകള്‍

Maltese
Malta
IL-Milied It-tajjeb

Mandarin
China
Shèngdàn kuàilè
圣诞快乐

Meitei/Manipuri
India
Yāi-phə-bə sə-ji-bu che-rāo-bə oi-rə-sə-nu
ꯌꯥꯏꯐꯕ ꯁꯖꯤꯕꯨ ꯆꯩꯔꯥꯑꯣꯕ ꯑꯣꯏꯔꯁꯅꯨ

Manx (Gaelic)
Isle of Man
Nollick Ghennal

Māori
New Zealand
Meri Kirihimete

Marathi
India
Śubha nātāḷa
शुभ नाताळ

Marshallese
Marshall Islands
Monono ilo raaneoan Nejin

Masurian
Poland
Wesołéch Gód

Michif
Canada, USA
Gayayr Nwel

Mizo
India, Burma
Krismas Chibai

Moksha
Russia
Roštuva marxta
Роштува мархта

Moldovan
Moldova
Craciunun Fericit

Monégasque
Monaco
Bon Natale

Mongolian
Mongolia
Zul saryn mend hürgeje
Зул сарын мэнд хүргэе

Montenegrin
Montenegro
Hristos se rodi
Христос се роди

Mozarabic
Spain
Buen natal
ون نتل

Nahuatl
Mexico
Cualli netlācatilizpan

Naskapi
Canada
miywaaitaakun mikusaanor

Navajo/Dine
USA
Yáʼátʼééh Késhmish

Ndebele – Northern
Zimbabwe, South Africa
Izilokotho Ezihle Zamaholdeni

Nepali
Nepal, India
Krasmasakō śubhakāmanā
क्रस्मसको शुभकामना

Newari/Nepal Bhasa
Nepal, India
भिं ख्रिस्मस

Niuean
Niue, Cook islands, Tonga
Monuina a aho kilisimasi mo e tau foou

Norman
Jersey
Un bouan Noué

Norwegian
Norway
God Jul

Occitan
France, Monaco, Italy, Spain
Polit Nadal

Ogoni
Nigeria
Eenyie Mea Krist Ne Eenyie Aagbaa

Ojibwe/Chippewa
Canada
Niibaa’ anami’egiizhigad & Aabita Biboo

Okinawan
Japan
merī kurisumasu
メリークリスマス

Old English
United Kingdom
Blīþe Gēol

Oneida
USA
Wanto’wan & Hoyan

Onhan
Philippines
Malipayon nga Paskwa

Oriya/Odia
India
Nababarṣara subhechā
ନବବର୍ଷର ସୁଭେଚ୍ଛା

Ossetian
Russia
Cyppurcy Bærægbonæn
Цыппурсы Бӕрӕгбонӕн

Otomi
Mexico
Njohya ar pa ‘mu̲i ne njohya ‘na’yo nje̲ya

Palauan
Palau
Ungil Kurismas

Pangasinan
Philippines
Maabig ya pasko

Papiamento
Aruba, Curaçao, and Bonaire
Bon Pasco

Pashto
Afghanistan
De Krismas akhtar de bakhtawar
د كرسمس ﺍﺧﺘﺮ ﺩ

Pennsylvania German/Dutch
USA
En frehlicher Grischtdaag

Persian
Afghanistan
kerismas mobârak
کریسمس مبارک‎

Polish
Poland
Wesołych Świąt

Portuguese
Portugal, Brazil, Cape Verde, Guinea-Bissau, Mozambique, Angola and São Tomé and Príncipe
Feliz Natal

Punjabi
India
Mairī krisamasa
ਮੈਰੀ ਕ੍ਰਿਸਮਸ

Qʼanjobʼal
Guatemala, Mexico
chi woche swatx’ilal hak’ul yet jun yalji Komami’

Quechua
Peru, Bolivia, Chile
Sumaj kausay kachun Navidad ch’sisipi

Rapa-Nui
Easter Island
Mata-Ki-Te-Rangi. Te-Pito-O-Te-Henua

Rarotongan/Cook Islands Māori
Cook Islands
Kia orana e kia manuia rava i teia Kiritimeti e te Mataiti Ou

Romani
Europe
Baxtalo Krećuno

Romansh
Switzerland
Bellas festas da Nadal

Romanian
Romania
Crăciun Fericit

Russian
Russia
S Rozhdestvom
С Рождеством

Rusyn
Eastern Europe
Chrystos roždajesja
Христос рождаєся

Sámi – Northern
Norway, Sweden, Finland and Russia
Buorit Juovllat

Sámi – Southern
Norway, Sweden, Finland and Russia
Buerie jåvle

Sámi – Lule
Norway, Sweden, Finland and Russia
Buorre javla

Samoan
Samoan Islands
Maunia Le Kilisimasi

Sanskrit
India
Kristamasaparvaṇaḥ śubhēcchāḥ
क्रिस्तमसपर्वणः शुभेच्छाः

Sardinian
Italy
Bon nadale

Scots
United Kingdom
Blythe yuil

Scottish Gaelic
United Kingdom
Nollaig chridheil

Seneca
USA
a:o’-e:sad yos-ha:-se:’

Serbian
Serbia
Srećan Božić
Христос се роди

Sesotho/Sotho
Lesotho
Keresemese e monate le mahlohonolo a selemo se setjha

Seychellois
Seychelles
Bonn e Erez Ane (PLACEHOLDER SINCE THAT IS ACTUALLY NEW YEAR)

Shona
Zimbabwe
Muve neKisimusi

Sicilian
Italy
Bon Natali

Silesian
Czech Republic
Radosnych Godōw

Sindhi
India
ڪرسمس جون واڌايون

Sinhala/Singhalese
Sri Lanka
subha natthalak
සුභ නත්තලක්

Slovak
Slovakia
Veselé vianoce

Slovenian
Slovenia
Vesel Božič

Soga/Lasoga
Uganda
Mwisuka Sekukulu

Somali
Somalia, Djibouti
Kirismas Wacan

Sorbian Lower
Germany
Wjasołe gódy

Sorbian Upper
Germany
Wjesołe hody

Sotho – Northern
South Africa
Mahlogonolo a Keresemose

Spanish
Spain
Feliz Navidad

Sranan Tongo
Suriname
Swit’ Kresneti

Sundanese
Indonesia
Wilujeng Natal

Swahili
Kenya, Tanzania, Uganda, Rwanda, Burundi, Malawi, Somalia, Zambia, Mozambique, Democratic Republic of the Congo
Heri ya Krismasi

Swazi
South Africa
Khisimusi lomuhle

Swedish
Sweden
God Jul

Swiss German
Switzerland
Schöni Wienachte

Tagalog
Philippines
Maligayang Pasko

Tahitian
Polynesia
‘Ia ‘oa’oa i te Noera ‘e ‘ia maita’i i te mau ‘ōro’a matahiti ‘āpī

Tamil
India, Sri Lanka
Kiṟistumas nalvāḻttukkaḷ
கிறிஸ்துமஸ் நல்வாழ்த்துக்கள்

Tajik
Tajikistan, Uzbekistan
Dimoƣcoqī Mavludi Iso
Димоғчоқӣ Мавлуди

Tanaina/Denaʼina
Canada
Natukda Nuuphaa

Tatar
Tatarstan
Raştua bäyräme belän
Раштуа бәйрәме белән

Telugu
India
Santōṣakaramaina krisṭhmas
సంతోషకరమైన క్రిస్ఠ్మస్

Tetum
Timor
Ksolok loron natal nian

Tewa
USA
Hihchandi Núuphaa

Thai
Thailand
S̄uk̄hs̄ạnt̒ wạn khris̄t̒mās̄
สุขสันต์วันคริสต์มาส

Tigrinya
Eritrea, Ethiopia
Rhus Be’al Ldetn Hadsh Ametn
ርሑስ በዓል ልደትን ሓድሽ ዓመትን።

Tlingit
USA, Canada
Xristos Khuwdziti kax sh kaxtoolxetl

Tokelauan
Tokelau, Swains Island
Manuia te Kilihimahi

Tok Pisin
Papua New Guinea
Bikpela hamamas blong dispela Krismas go long yu

Tongan
Tonga
Kilisimasi fiefia mo ha ta’u fo’ou monū’ia

Tsonga
Mozambique, South Africa
A ku vi Khisimusi lerinene naswona a ku vi lembe lerintshwa lerinene

Tsotsil
Mexico
Xmuyubajuk ti avo’one ti ta k’ine xchu’uk ti ta ach’ jabile

Tswana
Southern Africa
Keresemose e e monate le ngwaga o o itumedisang

Turkish
Turkey, Cyprus
Mutlu Noeller

Turkmen
Turkmenistan
Täze ýylyňyz gutly bolsun

Tutchone
Canada
t’ohudinch’i Hulin Dzenu & Eyum nan ek’an nenatth’at danji te yesohuthin ch’e hadaatle

Tuvaluan
Tuvalu
Manuia te Kilisimasi mo te Tausaga Fou

Twi
Ghana
Afenhyia pa

Udmurt
Russia
Vyl’ Aren, no Tolsur
Выль Арен, но Толсур

Ukrainian
Ukraine
z Rizdvóm
з Різдвом

Urdu
Pakistan, India
krismas mubārak
کرسمَس مبارک

Uyghur
China, Kazakhstan
Rojistıwa bayrımıngızgä mubaräk
روجىستىۋا بايرىمىڭىزگە مۇبارەك

Uzbek
Uzbekistan
Rojdestvo bayramingiz qutlug
Рождество байрамингиз қутлуғ

Venda
South Africa
Ḓuvha ḽa mabebo a Murena ḽavhuḓi

Venetian
Italy
Bon nadale

Veps
Russia
Raštvoidenke i Udenke Vodenke

Vietnamese
Vietnam
Chúc mừng Giáng sinh

Võro
Estonia
Rõõmsit joulupühhi

Votic
Russia
Yvää uutta vootta

Walloon
Belgium
djoyeus Noyé

Waray
Philippines
Maupay nga Pasko

Welsh
United Kingdom
Nadolig llawen

Westrobothnian
Scandinavia
Gow juwl

Wolof
Senegal, Gambia, Mauritania
Mangui lay ndioukeul ci Noël bi

Xhosa
South Africa, Zimbabwe, Lesotho
Krismesi emnandi

Yiddish
World
a freylekhn nitl
אַ פֿריילעכן ניטל

Yolngu
Australia
Kritjmatj yiŋgathirri ga dhuŋgarra dhawurruŋga yiŋgathirri

Yoruba
West Africa
Ẹ ku Ayọ Keresimesi

Yucatec Maya
Mexico, Guatemala, Belize
Utzul mank’inal

Yupik
Alaska, Russia
Angliq Alussistuaq

Yup’ik
Alaska
Alussistuaqegtaarmek piamken

Zazaki
Turkey
Serra to ya newî pîroz bo

Zulu
South Africa
Jabulele uKhisimusi


Less is More

By Nicolas Gambardella

In scientific texts, less is often more. Fewer figures and tables mean more clarity; fewer experiments and results mean more impact. This might seem counter-intuitive since more information should always be better, right? Moreover, whether as preliminary data in a grant application or as results in a research paper, we all want to describe all the great experiments we ran, the clever analyses we came up with, and the conclusions we derived. However, we also want – and need – to convey excellence.

Except for truly groundbreaking research papers, where a single result matters, overshadowing everything else, the final impact of a paper or a grant application on the reader will reflect the average quality of every independent result. If you performed three or four excellent experiments, and they are sufficient to demonstrate your point, every additional result of less novelty or perfection will decrease the average final impact. Here are five points you should reflect on when writing a scientific text.

1) Put yourself in your reader’s shoes

When producing any kind of material for public consumption, whether text or other types of documents, in science or any other field, we should never forget that the content should be geared toward the audience, not ourselves. As such, when writing an article or a grant application, we should always keep the potential reader in mind.

  • What is interesting for you is not necessarily interesting for them
  • You do not want to bore them stiff
  • You do not want to make important facts or conclusions hard to find
  • You want them to remember the main message
  • You want them to remember the WOW feeling they had when reading
  • You do not want them to think meh at any time, about any result

2) Do not describe the entire journey that led to the final set of results

Imagine you ordered a wedding cake. The baker experienced three mishaps before getting it right. The first mix was wrong and was not baked properly; the second cake was overcooked; the design of the third one failed. Do you expect the baker to deliver all four attempts on the wedding day or only the fourth one?

Each scientific text tells a story and unfolds along a storyline. This is necessary to bring the reader to the conclusions we want to share. However, this storyline is a logical construct, built to make the point clearer and easier to understand. It does not need to be the actual story, as it happened in the lab. There is no need to describe all the false starts, the dead ends, the mishaps, all the iterations with optimization (see below). Just tell the readers what you found, using the final or best experiments you performed.

3) Do not clutter the main body with negative results

Negative results are important, and we should not hide them. However, there is no need to put them in the main body of a text, except if they are revealing new insights. The overall message of a paper or grant application should always be positive, optimistic and forward-looking. If you want to report negative results, or warn others of dead ends, to spare their time, energy, and expenses, why not put these results in supplementary materials, on your website or deposit them in the relevant public database?

4) Do not clutter the text with sub-par or trivial results

There is no need to explore a question exhaustively in the main text of a research paper. You should choose your main message and build up the best case for it, using only what is necessary to demonstrate your point. Yes, I am sure there are many other interesting aspects worth presenting and discussing in your experiments or your datasets. However, by devoting too much space to those secondary questions, you will dilute the primary conclusion, making it more difficult to identify and less impactful.

5) Do not load the main text with set-up procedures

You should mention the tests you performed and the validation procedures you put in place. This is important. But is it crucial to show all of them in the main body of your text? You probably ran dozens of experiments to find the right dose for your drug or marker, or to optimize your buffers or culture medium. This took lots of effort and time. But is it as important for the reader as the final dose or culture medium? After all, as with all professionals, we assume that you performed due diligence. Would you list all the search expressions you used in PubMed to perform the necessary bibliographic search during the project?

Remember, the people you want to impress most are editors and reviewers (for a paper) and members of grant panels (for an application). These people are often senior scientists, which means they have a limited amount of time available for each text. Furthermore, they are not always technical experts on the very question tackled in each of these texts (whether this is a pathological or desirable feature of modern research assessment is beyond the scope of this post). Most of them will only read the main meat, and probably quickly. If you want to provide background information, provide it in supplementary materials or on a website.

In conclusion, in order to maximize impact, build a story as long as necessary and as short as possible. Remember: The perceived excellence of a piece of work is the average of the excellence of each of its components.
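
To make the averaging argument concrete, here is a minimal arithmetic sketch. The scores are purely hypothetical (an imagined 0-10 rating a reader might give each result), and the simple mean is only the rule of thumb stated above, not a measured model of how reviewers judge.

```python
from statistics import mean

# Hypothetical quality scores (0-10) a reader might assign to each result.
core_results = [9, 9, 9]   # the excellent experiments that make the point
extra_results = [5, 5]     # additional, less novel or less polished results

focused = mean(core_results)                 # paper with the core results only
padded = mean(core_results + extra_results)  # same paper plus the extras

print(f"focused paper: {focused:.1f}")
print(f"padded paper:  {padded:.1f}")
```

Under these assumed scores, adding the two weaker results lowers the average from 9 to 7.4, even though the paper contains strictly more information.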

Is Machine Translation a threat or an opportunity?

By Nicolas Gambardella

Machine Translation (MT) is one of the most discussed topics in the world of translators at the moment (on par with collapsing fees). Most of the arguments revolve around either its usefulness or the threat it poses to the professional human translators. We briefly touched on it within a previous post, but we would like to go a bit deeper here and provide some ideas about making the most of MT within the current translation workflow.

What is MT?

Wikipedia tells us that Machine translation is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one language to another (warning, this Wikipedia page is quite outdated, as evidenced by the tiny mention of the neural network-based approach). Within the world of translation, this means the automatic translation of a piece of text by software that analyzes the source, without human intervention. This is different from (and complementary to) systems based on translation memories.

This post is not a technical essay on the inner workings of MT, and we are not going to explain how the translation is actually done. Many approaches were proposed over the years, with increasing success. However, the paradigmatic change happened – as in many other domains – when people started to use “deep learning”, i.e. using cascades of artificial neural networks trained on a huge amount of data (for more technical information you can read Google’s Neural Machine Translation (NMT) paper in arXiv). Suddenly, one could actually copy-paste an e-mail or a webpage into a translation tool and understand what it was about. Sure, the result is not perfect. Let’s be frank, it is often quite bad and sometimes funny. But it is understandable, and more or less looks like what a human with an intermediate level in a foreign language could produce when translating a text about a topic they know nothing about. And the spelling and grammar are better than in many of the e-mails, text messages and Facebook posts we are all subjected to daily. The latest massive improvement came with the DeepL system, which trains the network using the Linguee database of existing translations.

How does the professional translation world work?

In order to understand the disruption brought by MT, it is useful to recapitulate how a large part of the professional translation world is organized. There are exceptions to what we describe below, fields of translation where people interact differently, such as companies with embedded translation offices, authors dealing directly with their translators, etc. We are not concerned with these, although MT presumably has a large impact there as well. First of all, there are three different jobs involved in the production of a translated document: 1) Translation per se; 2) Editing (for which the source document is needed), where one checks that the translation is accurate and that all the requirements are followed (e.g. no translation of person and product names); and 3) Proofreading (for which the source document is not needed), where one checks spelling, grammar, punctuation, etc. This is the so-called TEP workflow.

Typically, when someone, the end client, is in need of a translation, they will either contact a translation company or will post a job advert on one of the many possible websites, either non-specialised – such as Upwork or Freelancer.com – or specialized in translation – such as TranslatorsBase or TranslatorCafé. The companies can be real translation companies, performing in-house translation, or agencies, outsourcing the work. In most cases though, some outsourcing will be involved since very few companies have enough employees to cover all language pairs and expertise in all fields. Such outsourcing will be done through the company’s own network of freelancers, via professional platforms such as ProZ or using the sites mentioned above. Now, sometimes, the outsourcing process does not stop here, and a cascade of subcontracting unfolds, with decreasing fees at each step of the ladder. Unfortunately, as the fees decrease, so does the quality of the translation. This is why a revision step is put in place by the outsourcers. This can be just a proofreading exercise, fixing spelling, punctuation and the occasional grammar issue. Or it can turn into a heavier editing task, correcting translation mistakes. In the case of an outsourcing cascade, this can effectively become a retranslation.

How is MT affecting the translation pipeline?

Before the advent of NMT, MT produced text so bad that it took a professional translator longer to fix it than to retranslate from scratch. A machine-translated text was also immediately obvious, even when compared to bad human translations. All that has now changed. The quality of the produced translations has increased dramatically (at least in certain cases, as we discuss below), and large amounts of text can be translated very quickly. While the free online versions generally limit every single translation to a few thousand characters, one can extend that via APIs (with or without fees, see for instance the R package deeplr).

This triggered two consequences, one ethical, one unethical, but both unfortunate. The first consequence is that some agencies think they can stop outsourcing the human translation part of a job and only pay for the revision one. The second consequence is that some freelancers pretend to translate themselves while they just use MT and a superficial revision. To be honest, in the latter case we are generally at the bottom of the subcontracting cascade, and the human translation would be quite bad anyway. In both cases, the result is a text that requires editing rather than proofreading. In the first case, agencies are honest and openly admit the fact, offering jobs of Machine Translation Post Editing (MTPE). But, and this is the crux of the problem, in both cases, the rate offered is at the level of proofreading rather than editing.

Improved MT also brought another change to the working practices of a professional translator. Many translators use Computer Aided Translation tools. Typically, such a tool divides the source text into segments that are translated separately. Those tools now give access to MT engines that suggest segment translations, as an alternative to Translation Memories (even if one could argue that DeepL is somehow linked to an uber TM, in the form of the Linguee database).

The luddites

Understandably, the world of professional translation has been shaken by the sudden rise of NMT. In a couple of years, what was seen as a promising field of research became a game-changer. The reaction in such situations is always the same. It broadly follows the Five Stages of Grief. Because of the past history of the field, most translators went through the denial period. Many are still stuck there. Using the cases where MT performs badly – albeit not worse than a casual translator not doing their homework – as evidence, such people reject its relevance entirely. A portion of the community moved on to the bargaining phase (trying to avoid or compete with MT), and some are even in the acceptance phase. However, a very vocal part of the community is currently in the anger phase. In some sense, they are similar to the Luddites who refused industrialization for fear that it would suppress their jobs. However, since they cannot break the MT engines, they turn their anger towards the translators using them. They are mistaken in exactly the same way as the 19th-century Luddites. They fear that the change of paradigm will remove the need for skilled workers and replace them with unskilled cheap ones. Exactly the opposite will happen, as it did a few centuries ago when automation created highly skilled jobs and removed the lowly paid manual ones. The segment of the translation community that will be the most affected by MT is the domain of non-technical, low-quality translation, while the skills of specialised human translators will be more recognized than they were when lost in an ocean of mediocre translators. Which brings us to the strengths and weaknesses of MT.

How good is MT?

So, machine translation has improved tremendously, but how good is it for practical purposes? Sure, we all came across funny translations, and we can all do with a good laugh. However, for simple texts, the result is OK. DeepL’s translations into French of the following sentences are almost perfect: “The sky is grey. It is likely to rain”, “Postman Pat’s truck is red”, “The Luddites were a secret oath-based organization”, “Jeremy Corbyn is the leader of the labour party”. In the case of the first sentence, DeepL actually chooses a correct but suboptimal translation (Il est probable qu’il pleuve). However, it picks the right one (Il va probablement pleuvoir) if we add a double quote at the end, which reveals one of the problems specific to its approach, namely oversensitivity to local context in existing translations. That said, Google Translate always picks the suboptimal solution.

This suggests a range of situations where MT could be used: everyday discourse, children’s stories, factual descriptions, and news. What do those situations have in common? The language is simple, and must be understood by everyone. These are “layperson translations”.

Now, by contrast, MT fails with highly specialised and technical documents, when the language requires a pre-existing particular knowledge from the reader, not shared by the entire population. Why is that? Because MT cannot cope with several situations, including the following:

  • When a word has several widely different meanings, and the source text does not use the most frequent one. For instance, in the ecclesiastic world, the French word “coule” designates a garment worn by monks. Now, MT will always believe “coule” is a verb meaning either some liquid moving downwards, or something that gets submerged by water. The proposed translations will be flow, run, pour, sink, cast (if what is flowing is metal or cement), stream, trickle, or even founder. It will never be cowl.
  • Idioms that use different words or expressions in different languages. Here we find the famous “il pleut comme vache qui pisse” translated into “it rains cats and dogs”. Same underlying meaning, totally different expression. In general, all such figurative expressions tend to be translated literally by MT, resulting in completely meaningless sentences.
  • Meronymy/Holonymy, that is when the word used in a language represents part of the thing which the equivalent word in another language represents. I am not talking about synecdoche here, which is a figure of speech that uses the part for the whole or the other way around.
  • Hyponymy and hypernymy, that is when a word in a language represents a generalization of the thing represented by the word in the other language. For instance, “seagull” is a layperson English word representing a subset of the family Laridae. In French, there is no such layperson term. Instead one will use either “goéland” representing the genus Larus which are big birds, or “mouette” representing several genera of the subfamily Larinae which are small birds. MT has no way to know which one the author of the source text meant (even if the previous sentence clarified the issue).
  • Complex relationships. In English, the temporal bone of the skull is separated into parts coming from different embryological origins (the squamous, petrous and tympanic bones). In French, the temporal bone is separated into regions of the adult structure, the “écaille”, “rocher”, and “mastoid”. It is impossible to translate one into the other. One has to reconstruct the entire description.
  • Context-dependent translation. MT typically focuses on a word and its immediate surroundings. For instance, a human translator will understand that in the following sentences “La fille regarda les jouets qu’on lui avait offert. Son ballon était bleu et son vélo rouge”, the ball and the bike are the girl’s. But MT cannot determine that. Both GT and DeepL translate it into: “The girl looked at the toys that had been given to her. His balloon was blue and his bike red.” (which by the way is a great example of unintended but real sexism).

I am certain there are other areas where MT performs unevenly or badly (for instance when it comes to household names, slang, etc.).

Among the other issues presented by MT are two problems that mirror each other. Since the MT engines have no memory of the entire text, the same word can be translated differently in different parts. Sometimes it does not matter, as with “stream” and “trickle” in the example above. Sometimes it does, if we get sometimes “stream” and sometimes “cowl”! Conversely, because MT engines were built on a given training set, they tend to produce texts that are boring in terms of vocabulary and “robotic” in terms of style. To be fair, this is much less of a problem with DeepL than with GT. Also, the problem is worse with translation memories, so MT might even be an improvement here.
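
The first of these two problems, the same source word being rendered differently across a text, is easy to screen for mechanically before the human review. Below is a minimal sketch, assuming the translator has the text as aligned source/target segments and a small list of terms to watch. The function name, the example segments, and the glossary are all hypothetical, and the matching is naive substring search, so it will miss inflected forms and can produce false hits.

```python
from collections import defaultdict

def find_inconsistent_terms(aligned_segments, glossary):
    """Return source terms that were rendered by more than one target term.

    aligned_segments: list of (source_segment, target_segment) pairs.
    glossary: dict mapping a source term to the candidate target renderings
              to look for (hypothetical, supplied by the translator).
    """
    seen = defaultdict(set)
    for source, target in aligned_segments:
        source_lc, target_lc = source.lower(), target.lower()
        for term, candidates in glossary.items():
            if term.lower() not in source_lc:
                continue  # term absent from this source segment
            for candidate in candidates:
                if candidate.lower() in target_lc:
                    seen[term].add(candidate)
    # Keep only the terms with two or more distinct renderings.
    return {term: sorted(found) for term, found in seen.items() if len(found) > 1}

# Invented segments reusing the "coule" example from the list above.
segments = [
    ("Le moine porte une coule.", "The monk wears a cowl."),
    ("Sa coule est noire.", "His stream is black."),
]
glossary = {"coule": ["cowl", "stream", "trickle"]}

print(find_inconsistent_terms(segments, glossary))
```

A check like this only flags candidates for attention; deciding which rendering is right remains the translator's job.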

Two ways of using MT in professional translation

At the heart of the debate and disagreement around MT in the professional translation setting lies a lack of clarity on the way it is and/or should be used. At the moment, there are two very different ways of using MT for translation:
1) using MT to perform the whole translation, and asking third parties to review the results.
2) using MT as part of the toolkit to perform translations, for instance, to provide starting points or alternatives for segments, in parallel to translation memories.

Many agencies, or publishers, think MT is ready for 1), while it is not. Let’s be really really clear here: MT is not the key to automatically – and cheaply – translate corpora of texts, either articles or books, etc.

Furthermore, reviewing translations performed that way is extremely difficult. It is by no means a proofreading exercise, but rather an editing exercise. We had to edit large texts which comprised parts translated by MT and parts translated by a human who clearly was not a native speaker of the target language. Both types were difficult to edit. However, there was one crucial difference: while the human-translated parts presented a horrendous style and many grammatical mistakes, the MT parts presented WRONG translations. In most cases, this is much worse. For instance, in the biomedical domain, a tiny misunderstanding might lead to dreadful consequences.

Conversely, many professional translators think or claim that MT is not ready for 2), and cut themselves off from a very useful tool. We wholeheartedly adopted 2). We think much improvement is needed, and that it is possible (see below). We believe translators, like any professionals, need to take control of their tools. When a farmer works their field, they use various technologies. But one rarely sees third parties, completely unaware of what was done to the ground and how it was done, coming to evaluate the work. They just buy the product. We think MT should be used by translators, not blindly, but in a controlled manner. Then, we will be able to learn from it, but also to help it grow to become an even more useful tool.

How to use MT efficiently

  • Use MT on a segment-by-segment basis rather than for the whole text (the definition of what makes a segment is left to the imagination or the preferences of the reader/translator).
  • Never accept a proposed translation blindly. Check all the important words, as well as tenses and agreements.
  • Make full use of the alternatives provided, for instance by DeepL. The proposed choice is statistical, but often the right or more accurate one is within the first 3-5 alternatives.
  • Once a significant chunk of text is translated, re-read it in its entirety to make the style more homogeneous and reduce repetitions. To be fair, this is not specific to MT, and should always be done.
  • Back-translate the text from the target to the source language, in order to spot possible ambiguities or mistranslations.
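
The consistency problem mentioned earlier (the same source word translated differently in different parts of a text) is one of the things a full re-read is meant to catch. As a rough aid, and only as a sketch, a few lines of Python can flag candidate inconsistencies. The segments and the list of candidate renderings below are hypothetical, and plain substring matching is a deliberate simplification that ignores inflection and word boundaries.

```python
from collections import defaultdict


def find_inconsistent_terms(segments, candidates):
    """Report source terms rendered in more than one way across segments.

    segments: list of (source_text, target_text) pairs.
    candidates: dict mapping a source term to the target renderings to look for.
    """
    seen = defaultdict(set)
    for source, target in segments:
        for term, renderings in candidates.items():
            if term.lower() not in source.lower():
                continue  # this segment does not contain the source term
            for rendering in renderings:
                if rendering.lower() in target.lower():
                    seen[term].add(rendering)
    # Keep only the terms for which several renderings were found.
    return {term: sorted(found) for term, found in seen.items() if len(found) > 1}


# Hypothetical segments and candidate renderings, for illustration only.
segments = [
    ("Le ruisseau traverse le village.", "The stream runs through the village."),
    ("Le ruisseau était à sec.", "The trickle had dried up."),
    ("Elle a mis un nœud dans ses cheveux.", "She put a bow in her hair."),
]
candidates = {"ruisseau": ["stream", "trickle", "brook"], "nœud": ["bow", "knot"]}

inconsistent = find_inconsistent_terms(segments, candidates)
print(inconsistent)
```

Such a check only points at places worth a second look; whether two renderings are a harmless variation or a real error remains the translator's call.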

What do you think? Are you using Machine Translation at the moment? Which systems? How?

10 errors to avoid when starting as a translator

By Nicolas Gambardella

Many people start in the translation business without corresponding professional training. This is absolutely fine, and it is in fact a good way of using language skills acquired either during a professional activity or a travelling life. However, as amateurs, they probably all tend to make the same mistakes. Here we list a few of them.

1) Believing that a translation job is just … translating

A translation job is much more than converting a text from a source language to a target language. Glossaries and a bit of grammar polishing would almost be sufficient for that. However, a translator must convey the “content” of the source document. That involves, of course, translating the words. But it also, and foremost, involves producing a text that carries the same message. Doing so requires understanding what the text is about, in detail and with all its subtleties. This is why all translators have their specialities, and although most translators can do an OK job with any text in their paired languages, they really excel only within a few niches.

Conveying the proper meaning is sometimes at odds with keeping to a strict translation of the words themselves. Depending on the domain covered, one wants to massage the text to make it more readable and respect the form of the source text. With the exception of legal documents – where one must absolutely stick to the original, even if the result seems quite heavy – some sentence restructuring and expression switching is needed to make the result more palatable, and also truly equivalent in the target language. Finally, in the artistic domain, one wants to respect the style of the original, terse or verbose, dull or vivid, mainstream or abstruse. Lovecraft did not write like Stephen King despite hovering in the same literary space.

2) Starting the translation immediately

In order to translate a text accurately, we cannot start the work straight away. We must read the entire text beforehand, to make sure we understand what it is about, have an idea of the specialized knowledge we might need to acquire, and grasp what the goal of the authors was. Such a preliminary read will only marginally increase the time spent on a text. Or at least it should, otherwise we are probably not spending enough time on the job! Reading a 100,000-word book before starting the translation might seem daunting, but the required time is still far less than what we will spend accurately translating those 100,000 words. And the gain down the line in terms of translation speed and accuracy largely makes up for the extra effort. During this initial read, we should make notes of anything we do not immediately get, any word or expression we did not come across in the past, and make sure we do fully understand it.

3) Trusting machine translation

Machine translation has seen astounding progress in the past few years. Software such as Google Neural Machine Translation and (even more) DeepL has really transformed the activity, to the point that, in many cases, the result not only sounds like it has been produced by a native speaker, but is also better than a translation made by a casual translator, i.e. someone who would make most of the errors listed here … (By the way, this makes the ridiculous translations used in some places, such as Stansted airport, even more pathetic. It beggars belief that nowadays people produce voice announcements that barely make sense, and even check-in machines that speak nonsense languages using random words assembled into sentences with no grammar whatsoever.)

However, machine translation is still mostly good for straight texts, without nuance, technical jargon, or stylistic oddities. It still relies too much on word-for-word translation, or translation of short segments. This often results in wrong choices in the case of homonyms in the source language, wrong splitting of clauses in long sentences, lots of repetitions, etc. Also, machines seem to ignore basic facts of life, such as that only females give birth. So the translation of “They gave birth to their babies” is invariably “Ils ont donné naissance à leurs bébés” and not “Elles ont donné naissance à leurs bébés”. More disturbingly, when we want to translate “he ate his date”, instead of “il a mangé sa datte”, Google Translate provides “Il a mangé son rendez-vous” and DeepL even decides to add slang with the delightful “Il a mangé son rencard”. Not very vegan.

That said, machine translation is generally a good feeder for Computer Assisted Translation, which brings us to the next mistake.

4) Blindly trusting the segment-based text proposed by our CAT software

Computer Assisted Translation speeds up translation massively. It saves all the time spent translating and typing trivial pieces of text such as “the red car”, “his name was Joe” and “the sky was gray and it was likely to rain”. However, CAT cannot be trusted blindly. CAT translation is based on segmentation. The text is split into small parts, containing one or a few sentences. The software then suggests translations for each segment.

Firstly, some of those translations might come from machine translation, e.g. Google Translate or DeepL. Thus, see point 3. But very often the translations come from Translation Memories (TMs). Translation memories come with their own problems. Sometimes the translations proposed are plainly wrong, with missing words or wrong sentence parsing (resulting in wrong associations for adjectives or verbs, for instance). Another important issue is error propagation. If a segment was badly translated once, and this translation was recorded in TMs, it will be proposed in future translations.

A very important issue is that the translation proposed for a segment is based purely on that segment, independently of the content of the other segments of the text. There is rarely enough context in a single segment to discriminate between different meanings of a term.

Finally, the segmentation largely follows the punctuation in the source language. Depending on the translation, for instance in literary works where one needs to keep a style and rhythm, the optimal split might be different in the target language. Fortunately, CAT tools offer segment split/merge facilities.

5) Assuming the source document is right

This is a thorny issue. The basic position is that the source language document is correct, and we need to faithfully translate it. But this is not necessarily the case. Everyone makes mistakes, even the most thorough writers. Some mistakes are easy to spot and to correct, and many should not affect the translation, such as unambiguous spelling errors. However, others will be much harder to detect. For instance, words with similar pronunciations in English (the ubiquitous “complimentary” for “complementary”, “add” for “had”, “your” for “you’re” or the dreadful “of” for “have”), or absence of accents (or incorrect ones) in French, will lead to completely wrong translations. In many cases, the context will provide a quick answer, but sometimes a bit more brain juice is needed. We should always double check that we understood the text correctly, and that our chosen translation is the only possible one.

Finally, horror, some “errors” are made on purpose, for stylistic reasons. In the case of a novel or a play, wrong grammar or vocabulary might be part of the plot or a defining feature of a character. In that case, we probably must provide a translation that contains a correct equivalent of the initial erroneous text …

6) Forgetting to double check the punctuation

OK, that might actually be a specific version of the previous error. Translators are linguists, and like all linguists, we are in love with punctuation (aren’t we?). Is there anything that beats the Oxford comma as a favorite topic of conversation (except perhaps split infinitives)? Surprisingly enough, this is not the case for every person, or even every writer. Punctuation can be a life saver in the case of very long and complex sentences. It can also be a killer when it is absent or, heaven forbid, wrongly placed. For instance, observe the following bit of text:
“an off-flavour affecting negatively the positive fruity and floral wine aromas known as Brett character.”

What is the “Brett character”? (Enlightened disciples of Bacchus, lower your hands.) Is it the positive fruity and floral wine aromas? Or is it the off-flavour? It is, in fact, the latter, a metallic taste given by some yeast (from the genus Brettanomyces). Of course, the answer would be much clearer if the source sentence was:

“an off-flavour affecting negatively the positive fruity and floral wine aromas, known as “Brett character”.”

But let’s not add punctuation to Guillaume Apollinaire’s poetry, and keep Le Pont Mirabeau free of punctuation. Actually, the following translation of La Tour Eiffel might be one of the truest poetry translations ever, respecting the meaning, the style, and the shape.

7) Not paying attention to the mainstream use bias

This error is often a side-effect of using CAT tools with TMs or MT. The proposed translations will often rely on the most frequent meaning of a term, and its most frequent translation. This is not necessarily the meaning which is the right one, or the best one, for the current source document.

Sometimes, this is just irritating. For instance, in a literary text talking about “petits détours”, CAT will keep suggesting “small detours”. While this is correct, it does not fully convey the idea carried by “petits” here. It is too bland, too quantitative, and “little detours” is the best translation, as shown here, here and here.

However, the mistake can be more severe. Google Translate tells us the story of a dreadful mum, “She put a bow in her daughter’s hair” being translated into “Elle a mis un arc dans les cheveux de sa fille”. That must have hurt terribly. As was the case for the poor lad who “entered a ball” and ended up “entré dans un ballon” (GT) or even “entré dans une balle” (DeepL), instead of “entré dans un bal”. Not much room to dance there. Sometimes, the mainstream use is actually overridden by the politically correct one, and the saucy “he was nibbling at her tit” is translated into “il mordillait sa mésange”. Unless we are talking about a cat, that is a disturbing image instead of a titillating one. While those examples were a bit jokey, some cases are harder to spot. Someone who planted “Indian flags” in their garden will almost always end up, in French, exhibiting their nationalism rather than their love of irises.

In some cases, the various meanings have similar frequencies in daily use, and different tools provide alternative suggestions. DeepL will suit plumbers, providing “installer un compteur” for “To set up a counter”, while Google Translate will lean towards merchants with “mettre en place un comptoir”.

8) Trying to stick 100% to the words of the source text

The true meaning of a word goes beyond its definition in a thesaurus. Words carry different weights in different languages. The rude word meaning faeces is used as an interjection in almost every language. However, the level of rudeness is different in all western European countries, and sometimes choosing another rude word of the adequate level is better (no, we will not provide examples). And of course, there are very few cases where anyone should translate “it rains cats and dogs” into “il pleut des chats et des chiens”. One should always translate it into “il pleut comme vache qui pisse” (it rains as if a cow was pissing). While the new image is not so much better, at least no animal is hurt.

9) Trying to stick 100% to the structure of the source text

Trying to reproduce the structure of the source document exactly is very tempting, and is encouraged by the segmentation process of CAT tools. However, this is lazy. English sentences are known to be shorter than French ones. Therefore, translating a sentence from the latter language might require several in the former. Let’s not speak of German, where an entire sentence might end up in a single word! As usual, first comes the meaning, then the rhythm, then the style. Not only does this require merging or splitting sentences, it might also require swapping clauses or sentences.

10) Not reading back the complete resulting translation

Last but not least, we should never forget to attentively re-read the entire translation. In the profession, proofreading is often mentioned as an activity disconnected from translation. But no translation work should be considered complete without a proofreading step! This is even more important if CAT software was used. Such tools are known to promote “sentence salads”, where texts heterogeneous in style and vocabulary result from drawing on the memory of many previous translations.

What about yourself? Which mistakes did you make when learning how to become an accurate and efficient translator?

10 tips to model a biological system

By Nicolas Gambardella

You are about to embark on a systems biology project which will involve some modelling. Here are a few tips to make this adventure more productive and more pleasant.

1- Think ahead

Do not start building the model without knowing where you are going. What do you want to achieve by building this model? Is it only a quick exercise, a one-off? Or do you want this model to become an important part of your current and future projects? Will the model evolve with your questions and the data you acquire? A model with a handful of variables, created to quickly explore an idea, and a model that will be parameterized with experimental measurements, whose predictions will be tested and that will be further expanded, are two completely different beasts. Following the 9 tips below in the former case is overkill, a waste of time. However, cutting corners in the latter case will cause you unending pain as your project unfolds.

2- Focus on the biology

A good systems biology model aims at being anchored in biological knowledge, and even (generally) reflects the biological mechanisms underlying the behaviours of the system. We are using modelling to understand biology, and not using biology as an illustration of modelling techniques (which is a perfectly respectable activity, but not the focus of this blog post). In order to do so, the model must be built from the processes we want to represent (hence complying with the Minimum Information Requested in the Annotation of Models). Therefore, try to build up your model from reactions (or transitions if this is a Petri Net, rules for a Rule-based model, influences for a Logic model), rather than writing directly the equations controlling the evolution of variables.

Another aspect which is worth a thought is the existence of different “compartments”. In systems biology, compartments are the “spaces” that contain the biological entities represented by your variables (the word has a slightly different meaning in PKPD modelling, where it means the variable itself). Because compartments can have different sizes, and because these sizes can change and can be used to affect other aspects of the models, it is important to represent them correctly, rather than ignoring them altogether, which was the case for decades.

Many tools have been developed to help you build models that way, such as (but absolutely not limited to) CellDesigner and the excellent COPASI. These software tools are in general very user-friendly and more approachable for biologists. A large list of tools is available from the SBML software guide.

3- Document as you build

Bookkeeping is a cornerstone of any professional activity, and lab notebooks are scientists’ best friends. Modelling is no exception. If you do not log why you created a variable or a reaction, what biological entities they represent, how you chose the initial values or the boundaries for a parameter estimation, you will make your life hell down the line. You will not be able to interpret the results of simulations, to modify the model, to share it with collaborators, to write a publication, etc. This documentation must be started as soon as you begin building the model. Memory fades quickly, and motivation even quicker. The biggest self-delusion (or plain lie) is “I am efficient and focused now, and I must get results done. I will clean up and document the model later.” You will most probably never clean up and document the model. And if you do, you will suffer greatly, trying to remember why the heck you made those choices before.

Several software tools, such as COPASI, provide means of annotating every single element of a model, either with free text, or with controlled annotations. Alternatively, you can use regular electronic notebooks, or Google docs and spreadsheets if you work with others, etc. Anything goes, as long as you do create this documentation. Note that you can later share model and documentation at once, either with the documentation included in the model (for instance in SBML notes and annotation elements) or with model and documentation shared as a single COMBINE Archive.

4- Choose a consistent naming scheme

This sounds like a mundane concern. But it is not! The names of variables and parameters are the first layer of documentation (see tip 3). They also anchor your model in biology (tip 2). A naming scheme that is logical and consistent while easy to remember and use will also greatly facilitate future extensions of your model (tip 1). NB: we do not want to open a debate “identifiers versus accession number versus usable name” or the pros and cons of semantics in identifiers (see the paper by McMurry et al for a great discussion on that topic). Here, we are talking of the short names one sees in equations, model graphs, etc.

Avoid very long names if not needed (“adenosine triphosphate”), but do not be over-parsimonious (“a”). “atp” is fine. Short, explicit, clear for most people within a given context. Reuse common practices if possible, even if they are not official. Uppercase K is mostly used for equilibrium constants, lowercase k for rate constants. A model using “Km” for the rate constant of DNA methylase and “kd” for its dissociation constant from DNA would be quite confusing. Be consistent in the naming. If non-covalent complexes are denoted with an underscore, A_B being a complex between A and B, and hyphens denote covalent modifications, A-P representing the phosphorylated form of A, do not use CDP for the phosphorylated form of the complex between C and D (or, heaven forbid, C-D_P!!!).

5- Choose granularity and independent variables wisely

Two mistakes are often made when it comes to describing biological systems mathematically. The first one is a variant of the “spherical cow”. In order to facilitate the manipulation of the model, it is very tempting to create as few independent variables as possible (by variable, we mean here the things we can measure and predict). Those variables can be combinations of others, combinations sometimes only valid in specific conditions. Such simplifications make exploring the different behaviours easier, for instance with phase portraits and bifurcation plots. A famous example is the two-variable version of the cell cycle model by John Tyson in 1991. However, the hidden constraints might not allow the model to reproduce all the behaviours displayed by the biological system. Moreover, reverse engineering the variables to interpret the results could be difficult.

The second, mirroring, mistake is to try modelling the biological system in exquisite detail, representing all our knowledge and thus creating too many variables. Even if the mechanisms underlying the interactions between those variables were known (which is most often not the case), the resulting model often contains too many degrees of freedom, effectively allowing any behaviour to be reproduced with some parameter values (making it impossible to falsify). It also becomes very difficult to accurately determine the values of all parameters based on a limited number of independent measurements.

It is therefore paramount to choose the right level of granularity. There is no simple and universal solution, and extreme cases can be encountered. In d’Alcantara et al 2003, calmodulin is represented by two variables (total concentration and concentration of active molecules). In Stefan et al 2008, calmodulin is represented by 96 variables (all calcium-binding combinations plus binding to other proteins and different structural conformations). However, both papers study the same question.

The right answer is to pick the variable granularity depending on the questions asked and the data available. A rule of thumb is to start with a small number of variables that can be matched (directly or via mathematical transformations) with the quantities you have measurements for. Then you can progressively make your model more complex and expressive as you move on, while keeping it identifiable.

6- Create your relationships

Once you have defined your variables, you can create the necessary relationships, which are all the mathematical constructs that link variables and parameters together. Graphical software such as CellDesigner or GINsim lets you draw the diagrams representing the processes or the influences, respectively.

Note that some software tools provide shorthand notations which let you create variables and parameters directly when writing the reactions. This is very handy for creating small models instantly. However, I would refrain from doing so if you want to document your model properly (it also makes it easier to create spurious variables and “dangling ends” through typos in the variable names).

Working on the relationships after defining the variables also makes it easy to modify the model. You can add or remove a reaction without having to go through the entire model as you would with a list of ordinary differential equations.
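
As a minimal sketch of why a reaction-based description is easier to modify than a hand-written list of differential equations, the Python snippet below assembles the rates of change from a list of reactions. The species names, rate laws, and parameter values are hypothetical and only serve the illustration; this is not how any particular tool is implemented.

```python
def derivatives(reactions, state, params):
    """Assemble d[species]/dt by summing the contribution of each reaction."""
    rates = {species: 0.0 for species in state}
    for reaction in reactions:
        flux = reaction["rate"](state, params)
        for species, count in reaction["reactants"].items():
            rates[species] -= count * flux  # reactants are consumed
        for species, count in reaction["products"].items():
            rates[species] += count * flux  # products are created
    return rates


# Hypothetical toy model: A and B bind reversibly to form the complex A_B.
binding = {
    "reactants": {"A": 1, "B": 1},
    "products": {"A_B": 1},
    "rate": lambda s, p: p["kon"] * s["A"] * s["B"],
}
unbinding = {
    "reactants": {"A_B": 1},
    "products": {"A": 1, "B": 1},
    "rate": lambda s, p: p["koff"] * s["A_B"],
}
# A process added later: degradation of the complex.
degradation = {
    "reactants": {"A_B": 1},
    "products": {},
    "rate": lambda s, p: p["kdeg"] * s["A_B"],
}

state = {"A": 2.0, "B": 1.0, "A_B": 0.5}
params = {"kon": 1.0, "koff": 0.5, "kdeg": 0.1}

before = derivatives([binding, unbinding], state, params)
# Adding a process is one more entry in the list; no equation is rewritten by hand.
after = derivatives([binding, unbinding, degradation], state, params)
print(before)
print(after)
```

With explicit equations, adding the degradation would have meant finding and editing the right-hand side for A_B by hand; here the change stays local to one reaction.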

7- Choose your math

The beauty of mathematical models is that you can explore a large diversity of possible linkages between molecular species, the actual mechanisms hidden behind the “arrow” representing a process. A transformation of X in a compartment into Y in another compartment can be controlled for instance by a constant flux (don’t do that!), a passive diffusion, a rate-limited transport, or even exotic higher-order kinetics. At that point, we could write: [insert clone of tip 5 here]. Indeed, while the mathematical expressions you choose can be arbitrarily complex, the more parameters you have, the harder it will be to find proper values for them.

If the model is carefully designed, switching between kinetics should not be too difficult. A useful habit to adopt is to preferentially use global parameters (whose scope is the entire model/module) rather than parameters defined for a given reaction/mathematical expression. Doing so will, of course, ease the use of the parameter in different expressions, but also facilitate the documentation and ease future model extensions, for instance where a parameter no longer has a fixed value but is affected by other things happening in the model.
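
To make the idea of switching kinetics concrete, here is a small Python sketch of three candidate rate laws for the transfer of X into Y, all reading from a single set of global parameters. The function names, parameter names, and values are assumptions made for the example, and the two compartments are assumed to have the same size so that concentrations can be compared directly.

```python
# Global parameters, shared by every rate law (all names and values are hypothetical).
params = {"v": 0.3, "k_diff": 0.1, "Vmax": 1.0, "Km": 2.0}


def constant_flux(x, y, p):
    """Flux independent of concentrations (it can drive x below zero)."""
    return p["v"]


def passive_diffusion(x, y, p):
    """Net flux proportional to the concentration difference between compartments."""
    return p["k_diff"] * (x - y)


def saturable_transport(x, y, p):
    """Rate-limited transport, saturating at Vmax when x is much larger than Km."""
    return p["Vmax"] * x / (p["Km"] + x)


rate_laws = {
    "constant flux": constant_flux,
    "passive diffusion": passive_diffusion,
    "saturable transport": saturable_transport,
}

x, y = 2.0, 0.5  # concentrations of X and Y
fluxes = {name: law(x, y, params) for name, law in rate_laws.items()}
for name, flux in fluxes.items():
    print(f"{name}: {flux:.3f}")
```

Because every rate law has the same signature and draws on the same parameter set, swapping one for another is a one-line change, and a parameter such as Vmax could later be replaced by an expression depending on the rest of the model.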

8- Plug holes and check for mistakes

Now that you have your shiny model, you need to make sure you did not forget to close a porthole that would sink it. Do you have rate laws generating negative concentrations? Conversely, does your model generate umpteen amounts of certain molecules which are not consumed, resulting in preposterous concentrations? Software like COPASI has checks for this kind of thing. In the example below, I created a reaction that consumes ATP to produce ADP and P, with a constant flux. This would result in infinite concentrations of ADP and infinitely negative concentrations of ATP. COPASI catches it, albeit returning a message that could be clearer.

Ideally, a model should be “homeostatic”. All molecular species should be produced and consumed. Pure “inputs” should be produced by creation/import reactions, while pure “outputs” should be consumed by degradation/export reactions. Simulating the model should not lead to any timecourse tending to either +∞ or -∞.
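
The ATP example can be reproduced in a few lines of Python. This is only a sketch, using a crude fixed-step Euler integration and made-up values, but it shows how a constant flux drives ATP below zero while a mass-action rate law does not, and how a simple check on the simulated timecourse catches the problem.

```python
def simulate(rate_law, atp=1.0, adp=0.0, dt=0.1, steps=50):
    """Fixed-step Euler simulation of ATP -> ADP + P; returns (atp, adp) pairs."""
    trajectory = [(atp, adp)]
    for _ in range(steps):
        flux = rate_law(atp)
        atp -= flux * dt
        adp += flux * dt
        trajectory.append((atp, adp))
    return trajectory


def first_negative(trajectory):
    """Index of the first time point with a negative ATP concentration, or None."""
    for index, (atp, _) in enumerate(trajectory):
        if atp < 0:
            return index
    return None


# Hypothetical rate laws and values, for illustration only.
constant = simulate(lambda atp: 0.5)           # constant flux: ignores how much ATP is left
mass_action = simulate(lambda atp: 0.5 * atp)  # flux vanishes as ATP is depleted

min_atp_constant = min(atp for atp, _ in constant)
min_atp_mass_action = min(atp for atp, _ in mass_action)
print("constant flux, lowest ATP:", min_atp_constant)
print("mass action, lowest ATP:", min_atp_mass_action)
print("constant flux goes negative at step:", first_negative(constant))
```

Running the same check on every species after a long simulation is a cheap way of spotting holes before investing time in parameter estimation.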

9- Create output

“A picture is worth a thousand words”, and the impact of the results you obtained with such a nice model will be greater if served in clear, attractive and expressive figures. Timecourses are useful. But they are not always the best way to present the key message. You want to show the effect of parameter values on molecular species’ steady-states? Try parameter scanning plots, and their derivatives, such as bifurcation plots. Try phase-portraits. Distributions of concentrations during stochastic simulations or after ensemble simulations can be represented with histograms. And why be limited to 2D-plots? Use 3D plots and surfaces instead, possibly in conjunction with interactive display (plot.ly …).
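
A parameter scan is simple to produce. The sketch below, written in plain Python under the assumption of a toy model where X is produced at a constant rate and degraded proportionally to its concentration, computes the steady state of X for several values of the degradation constant. The model, the function name, and the values are hypothetical; the resulting pairs can be handed to whatever plotting library you prefer.

```python
def steady_state(k_prod, k_deg, x0=0.0, dt=0.01, tol=1e-9, max_steps=1_000_000):
    """Integrate dX/dt = k_prod - k_deg * X until X stops changing."""
    x = x0
    for _ in range(max_steps):
        dx = (k_prod - k_deg * x) * dt
        x += dx
        if abs(dx) < tol:
            break  # the change per step is negligible: steady state reached
    return x


# Scan the degradation constant (hypothetical values) with k_prod fixed at 1.0.
scan = [(k_deg, steady_state(1.0, k_deg)) for k_deg in (0.5, 1.0, 2.0, 4.0)]
for k_deg, x_ss in scan:
    print(f"k_deg = {k_deg}: steady-state X = {x_ss:.4f}")
```

For this toy model the steady state is known analytically (k_prod / k_deg), which makes it a convenient sanity check before scanning a model where no closed form exists.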

10- Save your work!

Finally, and this is quite important, save often and save all versions. Models are code, and code must be versioned. You never know when you will realize you made a mistake and will want to go back a few steps and start exploring a different direction. You certainly do not want to start all over again. Recent work explored ways of comparing model versions (see the works from the Waltemath group for instance). But we are still some way off being able to accurately “diff and merge” models as is done with text and programming code. The safest way is to save separately all the significant versions of a model.
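
One possible way of saving every significant version separately, sketched here in Python, is to name each file after a label and a short hash of its content, so that a changed model never overwrites a previous one. The function name, the file extension, and the toy model strings are assumptions made for the example; a version control system would serve the same purpose.

```python
import hashlib
import tempfile
from pathlib import Path


def save_version(model_text, directory, label):
    """Write model_text to a file named after its label and a hash of its content."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(model_text.encode("utf-8")).hexdigest()[:12]
    path = directory / f"{label}_{digest}.xml"
    if not path.exists():  # identical content is stored only once
        path.write_text(model_text, encoding="utf-8")
    return path


# Throwaway directory and toy model contents, for the demonstration only.
archive = tempfile.mkdtemp()
v1 = save_version("<model>k1 = 0.1</model>", archive, "toy_model")
v2 = save_version("<model>k1 = 0.2</model>", archive, "toy_model")
v1_again = save_version("<model>k1 = 0.1</model>", archive, "toy_model")

print(sorted(p.name for p in Path(archive).iterdir()))
```

Saving the same content twice returns the same file, while any change, however small, produces a new one that sits next to the older versions.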

Have fun modelling!