April 2019 – aSciStance blog

10 errors to avoid when starting as a translator

By Nicolas Gambardella

Many people start in the translation business without a corresponding professional training. This is absolutely fine, and it is in fact a good way of using one’s language skills acquired either during a professional activity or a travelling life. However, as amateurs, they probably all tend to make the same mistakes. Here we list a few of them.

1) Believing that a translation job is just … translating

A translation job is much more than converting a text from a source language to a target language. Glossaries and a bit of grammar polishing would almost be sufficient for that. However, a translator must convey the “content” of the source document. That involves of course translating the words. But it also, and foremost, involves producing a text that carries the same message. And to do so requires to understand what the text is about, in details and with all its subtleties. This is why all translators have their specialities, and although most translators can do an OK job with any text in their paired languages, they really excel only within a few niches.

Conveying the proper meaning is sometimes at odd with keeping to a strict translation of the words themselves. Depending on the domain covered, one wants to massage the text to make it more readable and respect the form of the source text. With the exception of legal documents – where one must absolutely stick to the original, even if the result seems quite heavy – some sentence restructuring and expression switching is needed to make the result more palatable, and also truly equivalent in the target language. Finally, in the artistic domain, one wants to respect the style of the original, terse or verbose, dull or vivid, mainstream or abstruse. Lovecraft did not write like Stephen King despite hovering in the same literature space.

2) Starting the translation immediately

In order to translate a text accurately, we cannot start the work straight away. We must read the entire text beforehand, to make sure we understand what it is about, have an idea of the specialized knowledge we might need to acquire, and what was the goal of the authors. Such a preliminary read will only marginally increase the time spent on a text. Or at least it should, otherwise we are probably not spending enough time on the job! Reading a 100 000 words book before starting the translation might seem daunting, but the required time is still far less than what we will spend accurately translating those 100 000 words. And the gain down the line in terms of translation speed and accuracy largely makes up for the extra effort. During this initial read, we should make notes of anything we do not immediately get, any word or expression we did not come across in the past, and make sure we do fully understand it.

3) Trusting machine translation

Machine translation has seen astounding progress in the past few years. Software such as the Google Neural Machine Translation and (even more) DeepL , really transformed the activity to a point that, in many cases, the result really sounds like it has been produced by a native speaker, but is also better than a translation made by a casual translator, i.e. someone who would make most of the errors listed here … (By the way, this makes even more pathetic the ridiculous translations used in some places such as Stansted airport. It beggars belief that nowadays people produced voice announcements that barely make sense, and even check-in machines that speak some nonsense languages using random words assembled in sentences with no grammar whatsoever).

However, machine translation is still mostly good for straight texts, without nuance, technical jargon, and stylistic oddities. It is still too much based on word for word translation, or translation of short segments. This often results in wrong choices in case of homonyms in the source language, wrong split of propositions in long sentences, lots of repetitions etc. Also, machines seem to ignore basic life facts, such as only female give birth. So the translation of “They gave birth to their babies” is invariably “Ils ont donné naissance à leurs bébés” and not “Elles ont donné naissance à leurs bébés”. More disturbingly, when we want to translate “he ate his date”, instead of “il a mangé sa date”, Google Translate provides “Il a mangé son rendez-vous” and DeepL even decides to add up slang to the delightful “Il a mangé son rencard“. Not very vegan.

That said, machine translation is generally a good feeder for Computer Assisted Translation, which brings us to the next mistake.

4) Blindly trusting the segment-based text proposed by our CAT software

Computer Assisted Translation speeds up translation massively. It saves all the time spent translating and typing trivial pieces of text such as “the red car”, “his name was Joe” and “the sky was gray and it was likely to rain”. However, CAT cannot be trusted blindly. CAT translation is based on segmentation. The text is split in small parts, containing one or a few sentences. The software then suggest translations for each segment.

Firstly, some of those translations might come from machine translation, e.g. Google Translate or DeepL. Thus, see point 3. But very often the translations come from Translation Memories. Translation memories come with their own problems. Sometimes the translations proposed are plainly wrong, with missing words or wrong sentence parsing (resulting in wrong adjective associations for adjectives or verbs for instance). Another important issue is error propagation. If a segment was badly translated once, and this translation was recorded in TMs, it will be proposed in future translations.

A very important issue is the fact that the translations proposed for a segment is done purely on this segment, independently of the content of other segments of the text. There is rarely enough context in a single segment to discriminate between different meanings of a term.

Finally, the segmentation largely follows the punctuation in the source language. Depending on the translation, for instance in literary works where one needs to keep a style and rhythm, the optimal split might be different in the target language. Fortunately, CAT tools offer segment split/merge facilities.

5) Assuming the source document is right

This is a thorny issue. The basic position is that the source language document is correct, and we need to faithfully translate it. But this is not necessarily the case. Everyone makes mistakes, even the most thorough writers. Some mistakes are easy to spot and to correct, and many should not affect the translation, such as unambiguous spelling errors. However, others will be much harder to detect. For instance, words with similar pronunciations in English (the ubiquitous “complimentary” for “complementary”, “add” for “had”, “your” for “you’re” or the dreadful “of” for “have”), or absence of accents (or incorrect ones) in French, will lead to completely wrong translations. In many case, the context will provide a quick answer, but sometimes a bit more brain juice is needed. We should always double check that we understood the text correctly, and that our chosen translation is the only one.

Finally, horror, some “errors” are made on purpose, for stylistic reasons. In the case of a novel or a play, wrong grammar or vocabulary might be part of the plot or a defining feature of a character. In that case, we probably must provide a translation that contain a correct equivalent of the initial erroneous text …

6) Forgetting to double check the punctuation

OK, that might actually be a specific version of the previous error. Translators are linguists, and as all linguists, we are in love with punctuation (aren’t we?). Is there anything that beats the Oxford comma as a favorite topic for conversation? (except perhaps split infinitives) Surprisingly enough, this is not the case of every person, or even every writer. Punctuation can be a life saver in the case of very long and complex sentences. It can also be a killer in case it is absent, or, heaven forbid, wrongly placed. For instance, observe the following bit of text:
“an off-flavour affecting negatively the positive fruity and floral wine aromas known as Brett character.”

What is the “Brett character”? (enlightened disciples of Bacchus, lower your hand). Is it the positive fruity and floral wine aromas? Or is it the off-flavour? It is, in fact, the latter, a metallic taste given by some yeast (from the genus Brettanomyces). Of course, the answer would be much clearer if the source sentence was:

“an off-flavour affecting negatively the positive fruity and floral wine aromas, known as “Brett character”.”

But let’s not add punctuation to Guillaume Appolinaire’s poetry, and keep Le Pont Mirabeau free of punctuation. Actually, the following translation of La Tour Eiffel might be one of the truest poetry translation ever, respecting the meaning, the style, and the shape.

7) Not paying attention to the mainstream use bias

This error is often a side-effect of using CAT tools with TMs or MT. The proposed translations will often rely on the most frequent meaning of a term, and its most frequent translation. This is not necessarily the meaning which is the right one, or the best one, for the current source document.

Sometimes, this is just irritating. For instance, in a literary text talking about “petits détours”, CAT will keep suggesting “small detours”. While this is correct, it does not fully convey the idea carried by “petits” here. It is too bland too quantitative, and “little detours” is the best translation, as shown here, here and here.

However, the mistake can be more severe. Google Translate tells us the story of a dreadful mum, “She put a bow in her daughter’s hair” being translated into “Elle a mis un arc dans les cheveux de sa fille”. That must have hurt terribly. As was the case for the poor lad who “entered a ball” and ended up “entré dans un ballon” (GT) or even “entré dans une balle” (DeepL), instead of “entré dans un bal”. Not much room to dance there. Sometimes, the mainstream use is actually overridden by the politically correct one, and the saucy “he was nibbling at her tit” is translated into “il mordillait sa mésange”. Except if we are talking about a cat, that is a disturbing image instead of a titillating one. While those examples were a bit joky, some cases are harder to spot. Someone who planted “Indian flags” in their garden will almost always end up in French exhibiting their nationalism rather than their love of irises.

In some cases, the various meanings have similar frequencies in daily use, and different tools provide alternative suggestions. DeepL will suits plumbers providing “installer un compteur” for “To set up a counter”, while Google Translate will lean towards merchants with “mettre en place un comptoir“.

8) Trying to stick 100% to the words of the source text

The true meaning of a word goes beyond its definition in a thesaurus. They carry different weight in different languages. The rude word meaning faeces is used as an interjection in almost every language. However, the level of rudeness is different in all western European countries, and sometimes choosing another rude word of the adequate level is better (no, we will not provide examples). And of course, there are very few cases where anyone should translate “it rains cats and dogs” into “il pleut des chats et des chiens”. One should always translate it into “il pleut comme vache qui pisse” (it rains as if a cow was pissing). While the new image is no so much better, at least no animal is hurt.

9) Trying to stick 100% to the structure of the source text

Trying to reproduce absolutely the structure of the source document is very tempting and encouraged by the segmentation process of CAT tools. However, this is lazy. English sentences are known to be shorter than French ones. Therefore, translating a sentence from the latter language might require several in the former. Let’s not speak of German where an entire sentence might end up in a single word! As usual, first comes the meaning, then the rhythm, then the style. Not only this requires to merge/split sentences, it might also require swapping propositions or sentences.

10) Not reading back the complete resulting translation

Last but not least, we should never forget to re-read attentively the entire translation. In the profession, proofreading is often mentioned as an activity disconnected from translation. But no translation work should be considered complete without a proofreading step! This is even more important if CAT software were used. They are known to promote “sentence salads”, where heterogeneous texts, in style and vocabulary, are caused by using the memory of many previous translations.

What about yourself? Which mistake did you make when learning how to become an accurate and efficient translator?

10 tips to model a biological system

By Nicolas Gambardella

You are about to embark on a system biology project which will involve some modelling. Here are a few tips to make this adventure more productive and pleasant.

1 – Think ahead

Do not start building the model without knowing where you are going. What do you want to achieve by building this model? Is it only a quick exercise, a one-off? Or do you want this model to become an important part of your current and future projects? Will the model evolve with your questions and the data you acquire? A model with a handful of variables, created to explore an idea quickly, and a model that will be parameterised with experimental measurements, whose predictions will be tested and that will be further expanded are two completely different beasts. Following the 9 tips below in the former case is overkill, a waste of time. However, in the latter case, cutting corners will cause you unending pain when your project unfolds.

2- Focus on the biology

A good systems biology model aims to be anchored in biological knowledge and even (generally) reflects the system’s behaviours’ biological mechanisms. We are using modelling to understand biology and not using biology to illustrate modelling techniques (which is a perfectly respectable activity but not the focus of this blog post). In order to do so, the model must be built from the processes we want to represent (hence complying with the Minimum Information Requested in the Annotation of Models). Therefore, try to build up your model from reactions (or transitions if this is a Petri Net, rules for a Rule-based model, influences for a Logic model), rather than writing directly the equations controlling the evolution of variables.

Another aspect worth a thought is the existence of different “compartments”. In systems biology, compartments are the “spaces” that contain the biological entities represented by your variables (the word has a slightly different meaning in PKPD modelling, meaning the variable itself). Because compartments can have different sizes, these sizes can change, and they can be used to affect other aspects of the models, it is important to represent them correctly, rather than ignoring them altogether, which was the case for decades.

Many tools have been developed to help you build models that way, such as (but absolutely not limited to) CellDesigner and the excellent COPASI. These software tools are, in general, very user-friendly and more approachable for biologists. An extensive list of tools is available from the SBML software guide.

3- Document as you build

Bookkeeping is a cornerstone of any professional activity, and lab notebooks are scientists’ best friends. Modelling is no exception. If you do not log why you created a variable or a reaction, what biological entities they represent, how you chose the initial values or the boundaries for parameter estimation, you will make your life down the line hell. You will not be able to interpret the results of simulations, modify the model, share it with collaborators, write a publication, etc. You must start this documentation as soon as you begin building the model. Memory fades quickly, and motivation even quicker. The biggest self-delusion (or plain lie) is “I am efficient and focused now, and I must get results done. I will clean up and document the model later.” You will most probably never clean up and document the model. And if you do, you will suffer greatly, trying to remember why the heck you made those choices before.

Several software tools, such as COPASI, provide means of annotating every single element of a model, either with free text, or with controlled annotations. Alternatively, you can use regular electronic notebooks, Google docs, and spreadsheets if you work with others etc. Anything goes, as far as you do create this documentation. Note that you can later share model and documentation at once, either with the documentation included in the model (for instance, in SBML notes and annotation elements) or with model and documentation shared as a single COMBINE Archive.

4- Choose a consistent naming scheme

This sounds like a mundane concern. But it is not! Variable and parameter names are the first layer of documentation (see tip 3). It also anchors your model in biology (tip 2). A naming scheme that is logical and consistent while easy to remember and use will also greatly facilitate future extensions of your model (tip 1). NB: we do not want to open a debate “identifiers versus accession number versus usable name” or the pros and cons of semantics in identifiers (see the paper by McMurry et al for a great discussion on that topic). Here, we talk about the short names one sees in equations, model graphs, etc.

Avoid very long names if not needed (“adenosine triphosphate”), but do not be over-parsimonious (“a”). “ATP” is fine. Short, explicit, clear for most people within a given context. Reuse common practices if possible, even if they are not official. For instance, uppercase K is mostly used for equilibrium constants, lowercase k for rate constants. A model using “Km” for the rate constant of DNA methylase and “kd” for its dissociation constant from DNA would be pretty confusing. Be consistent in the naming. If non-covalent complexes are denoted with an underscore, A_B being a complex between A and B, and hyphens denote covalent modifications, A-P representing the phosphorylated form of A, do not use CDP for the phosphorylated form of the complex between C and D (or, heaven forbid, C-D_P !!!)

5- Choose granularity and independent variables wisely

We often make two mistakes when describing systems in biology mathematically. The first one is a variant of the “spherical cow“. In order to facilitate the manipulation of the model, it is very tempting to create as few independent variables as possible (by variable, we mean here the things we can measure and predict). Those variables can be combinations of others, combinations sometimes only valid in specific conditions. Such simplifications make exploring the different behaviours easier, for instance, with phase portraits and bifurcation plots. A famous example is the two variable version of the cell cycle model by John Tyson in 1991. However, the hidden constraints might not allow the model to reproduce all the behaviours displayed by the biological system. Moreover, reverse engineering the variables to interpret the results could be difficult.

The second, mirroring, mistake is to try modelling the biological system in exquisite details, representing all our knowledge and thus creating too many variables. Even if the mechanisms underlying the interactions between those variables were known (which is most often not the case), the resulting model often contains too many degrees of freedom, effectively allowing any behaviour to be reproduced with some parameter values (making it impossible to falsify). It also becomes challenging to accurately determine the values of all parameters based on a limited number of independent measurements.

It is therefore paramount to choose the right level of granularity. There is no universal and straightforward solution, and we can encounter extreme cases. d’Alcantara et al 2003 represented calmodulin is represented by two variables (total concentration and concentration of active molecules). In Stefan et al 2008, calmodulin is represented by 96 variables (all calcium-binding combinations plus binding to other proteins and different structural conformations). Nevertheless, both papers study the biological phenomenon.

The right answer is to pick the variable granularity depending on the questions asked and the data available. A rule of thumb is to start with a small number of variables that can be matched (directly or via mathematical transformations) with the quantities you have measurements for. Then you can progressively make your model more complex and expressive as you move on while keeping it identifiable.

6- Create your relationships

Once you have defined your variables, you can create the necessary relationships, which are all the mathematical constructs that link variables and parameters together. Graphical software such as CellDesigner or GINsim permit to draw the diagrams representing the processes or the influences respectively.

Note that some software tools provide shorthand notations that permit the creation of variables and parameters directly when writing the reactions. This is very handy for creating small models instantly. However, I would refrain from doing so if you want to document your model properly (it also makes it easier to create spurious variables and “dangling ends” through typos in the variable names).

Working on the relationships after defining the variables also easily permits the model’s modification. For example, you can add or remove a reaction without having to go through the entire model as you would with a list of ordinary differential equations.

7- Choose your math

The beauty of mathematical models is that you can explore a large diversity of possible linkages between molecular species, actual mechanisms hidden behind the “arrow” representing a process. A transformation of X in a compartment into Y in another compartment can be controlled for instance by a constant flux (don’t do that!), a passive diffusion, a rate-limited transport, or even exotic higher-order kinetics. At that point, we could write: [insert clone of tip 5 here]. Indeed, while the mathematical expressions you choose can be arbitrarily complex, the more parameters you have, the harder it will be to find proper values for them.

If the model is carefully designed, switching between kinetics should not be too difficult. A useful habit to take is to preferentially use global parameters (which scope is the entire model/module) rather than parameters defined for a given reaction/mathematical expression. Doing so will, of course, ease the use of the parameter in different expressions and facilitate the documentation and ease future model extensions, for instance, where a parameter no longer has a fixed value but is affected by other things happening in the model.

8- Plug holes and check for mistakes

Now that you have your shiny model, you need to make sure you did not forget to close a porthole that would sink it. Do you have rate laws generating negative concentrations? Conversely, does your model generate umpteen amounts of certain molecules which are not consumed, resulting in preposterous concentrations? Software like COPASI have checks for this kind of thing. In the example below, I created a reaction that consumes ATP to produce ADP and P, with a constant flux. This would result in infinite concentrations of ADP and infinitely negative concentrations of ATP. COPASI catches it, albeit returning a message that could be clearer.

Ideally, a model should be “homeostatic”. All molecular species should be produced and consumed. Pure “inputs” should be produced by creation/import reactions, while pure “outputs” should be consumed by degradation/export reactions. Simulating the model would not lead to any timecourse tending to either +∞ or -∞.

9- Create output

“A picture is worth a thousand words”, and the impact of the results you obtained with such a nice will be greater if served in clear, attractive and expressive figures. Timecourses are useful. But they are not always the best way to present the key message. You want to show the effect of parameter values on molecular species’ steady-states? Try parameter scanning plots, and their derivatives, such as bifurcation plots. Try phase-portraits. Distributions of concentrations during stochastic simulations or after ensemble simulations can be represented with histograms. And why being limited to 2D-plots? Use 3D plots and surfaces instead, possibly in conjunction with interactive display (plot.ly …).

10- Save your work!

Finally, and this is quite important, save often and save all versions. Models are code, and code must be versioned. You never know when you will realise you made a mistake and will want to go back a few steps and start exploring a different direction. You certainly do not want to start all over again. Recent work explored ways of comparing model versions (see the works from the Waltemath group, for instance). But we are still some way off the possibility of accurately “diff and merge” as it is done on text and programming code. The safest way is to save all the significant versions of a model separately.

Month: April 2019