24/11/2022
Why are doctors male and nurses female? It seems like a ridiculous question, but when it comes to machine translation it’s a major problem.
In this article, Manuel Lardelli and Dr. Dagmar Gromann talk about the problem of gender bias in machine translation. They presented their talk "Gender-fair (machine) translation", in which they draw on the perspectives of both machine translation and translation studies to survey studies addressing gender beyond the binary of male and female.
We ask a number of questions regarding gender bias in machine translation and the challenges in addressing this problem. Let’s dive in.
When translating a text into another language, it is important to correctly refer to the gender of individuals or groups of individuals. Translating a reference to a woman as male, or to a non-binary person as female, is a mistranslation or, in this case, an instance of misgendering.
Mistranslations of this kind are partly due to how languages differ in the way they encode gender. For instance, English is a notional gender language: only some (few) nouns and the third person singular pronouns are gendered. Other languages, such as Italian and German, are highly inflected, and gender appears in articles, adjectives and other word classes. The English phrase "a tired doctor", for example, must become either the masculine "un dottore stanco" or the feminine "una dottoressa stanca" in Italian.
In addition, some nouns evoke stereotypical associations and are linked to one specific gender (e.g. doctor-male, nurse-female). This becomes a problem whenever individuals must be referenced in translation but their gender is unknown.
Large existing text corpora frequently show a bias towards one gender. Specifically, male forms dominate while female forms are far less frequent. Gender-fair language is even more strongly underrepresented, because strategies to address non-binary individuals in language are fairly recent.
An example in English is the use of singular they to refer to an individual whose gender is unknown. Many other languages have no such option, and masculine generics are often used instead. Machine translation models trained on biased corpora learn the bias from the data, which leads to misgendering in the models' output.
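To make this concrete, here is a minimal sketch of how such bias can be probed with a public off-the-shelf English-to-German model. The model choice and the commented outputs are illustrative assumptions on our part, not part of the talk; any public English-German system could be substituted:

```python
# Probing an off-the-shelf English->German model for gender bias.
# The model name is an illustrative assumption, not the authors' setup.
from transformers import pipeline

translator = pipeline("translation_en_to_de",
                      model="Helsinki-NLP/opus-mt-en-de")

sentences = [
    "The doctor said they would call back.",  # singular they
    "The nurse said they would call back.",
]

for src in sentences:
    out = translator(src)[0]["translation_text"]
    # A biased model typically resolves "the doctor" to the masculine
    # "der Arzt" and "the nurse" to the feminine "die Krankenschwester",
    # silently erasing the gender-neutral singular they.
    print(f"{src} -> {out}")
```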
Misgendering in machine translation (MT) automatically propagates the gender bias present in the training material. On the one hand, this produces translations that are simply erroneous in their gender references. On the other hand, the widespread and often unreflecting use of MT, e.g. on social media or websites, can erase non-binary genders from texts without users noticing. It is also important to highlight that several studies have shown that misgendering causes emotional pain to the individuals concerned. Addressing gender bias is therefore essential for correct and acceptable machine translations.
At the NeTTT conference, we presented a literature overview of studies in the field of gender-fair (machine) translation that specifically address non-binary individuals. We put "machine" in brackets because we tried to bring together both translation studies (TS) and machine translation.
Our findings show that non-binary genders are still largely neglected in both TS and MT. Research from TS is limited to a few language pairs, e.g. English-Spanish. There are also very few peer-reviewed publications on the topic, and many studies are bachelor's or master's theses. This points to a greater interest among young scholars who are not yet established in their research field.
In MT, many researchers recognize that gender is not binary, but they still do not address the topic because preliminary work is lacking. We found only two studies that attempted to debias machine translation beyond the binary, and their results were generally modest.
Current approaches that address gender bias in MT focus on a binary conception of gender, i.e., ensuring that male and female gender references are translated correctly. Methods range from tagging gendered sequences in the source text, so that the model can be steered towards the desired gender-specific language, to fine-tuning or adapting already trained models on gender-specific texts.
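As an illustration of the tagging family of approaches, the sketch below prepends a pseudo-token encoding the referent's gender to the source sentence. The tag vocabulary and the function are hypothetical simplifications, not a specific published system:

```python
# A sketch of the "tagging" approach: a gender tag is prepended to the
# source sentence so the model can condition on it. The tag set is a
# hypothetical simplification.
GENDER_TAGS = {"feminine": "<F>", "masculine": "<M>", "neutral": "<N>"}

def tag_source(sentence: str, referent_gender: str) -> str:
    """Prepend a pseudo-token encoding the referent's gender.

    During training, tags are derived from annotations in the parallel
    corpus; at inference time, the user (or an upstream component)
    supplies the desired gender, steering the decoder towards the
    matching inflected forms in the target language.
    """
    return f"{GENDER_TAGS[referent_gender]} {sentence}"

# The same English source, steered towards different target outputs.
print(tag_source("The doctor is tired.", "feminine"))  # <F> The doctor is tired.
print(tag_source("The doctor is tired.", "neutral"))   # <N> The doctor is tired.
```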
The main challenge in enabling gender-fair MT, that is, the correct referencing of all genders in a non-binary conception, is the lack of suitable text corpora. Furthermore, a stronger consideration of gender beyond the binary in MT research would enable more innovative techniques to debias MT models.
It has become very clear that gender-fair language strategies, that is, strategies that address all or no genders, are highly specific to the language in question, since they depend on its grammatical system. For German, different and considerably more strategies have been proposed than for English or Italian. One key challenge is therefore to build a flexible MT solution that can offer more than one gender-fair language strategy in its output, as sketched below.
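To illustrate why such flexibility matters, here are a few of the gender-fair strategies that have been proposed for German, applied to a single source word. The strategy names and the lookup function are illustrative assumptions about what a flexible system might expose:

```python
# Several gender-fair renderings of English "teachers" in German.
# The strategies are real conventions; the lookup structure is a
# hypothetical simplification of a flexible MT system's interface.
GERMAN_STRATEGIES = {
    "gender_star":  "Lehrer*innen",   # asterisk includes all genders
    "gender_colon": "Lehrer:innen",   # colon variant of the same idea
    "neutral":      "Lehrende",       # nominalised participle, no gender
}

def render_teachers(strategy: str) -> str:
    """Return the target form for the requested gender-fair strategy."""
    return GERMAN_STRATEGIES[strategy]

print(render_teachers("gender_star"))  # Lehrer*innen
print(render_teachers("neutral"))      # Lehrende
```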
We would also need evaluation metrics that specifically address gender bias, since this is not taken into account in standard translation quality assessment. Once again, the very few approaches that try to detect gender bias and assess translation outputs accordingly are generally limited to male and female gender.
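As a rough idea of what such a metric could look like, the sketch below computes a WinoMT-style gender accuracy: given the referent's known gender and the gendered forms expected in the target language, it counts how often the MT output realises the correct form. The data structure and the whitespace matching are simplifying assumptions:

```python
# A minimal sketch of a gender-accuracy metric for MT output.
# The Example structure and token matching are simplifying assumptions.
from dataclasses import dataclass

@dataclass
class Example:
    mt_output: str    # translated sentence
    gold_gender: str  # "feminine", "masculine", ...
    forms: dict       # gender -> gendered word expected in the output

def gender_accuracy(examples: list[Example]) -> float:
    """Share of outputs containing the correctly gendered form."""
    correct = 0
    for ex in examples:
        tokens = ex.mt_output.lower().split()
        if ex.forms[ex.gold_gender].lower() in tokens:
            correct += 1
    return correct / len(examples)

data = [
    Example("Die Ärztin ist müde.", "feminine",
            {"feminine": "Ärztin", "masculine": "Arzt"}),
    Example("Der Arzt ist müde.", "feminine",   # misgendered output
            {"feminine": "Ärztin", "masculine": "Arzt"}),
]
print(gender_accuracy(data))  # 0.5
```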
This article is part of a series that takes a deeper look at the research presented at the 2022 NeTTT conference. You can find the rest here:
MT is not the future but the now: Highlights from the NeTTT conference (Day 1)
Context is key in MT: Highlights from the NeTTT conference (Day 2)
Towards better MT: Highlights from the NeTTT conference (Day 3)