29/11/2022
Machine translation, like many technologies, promises to widen access, in this case across languages. But can it deliver on that promise, or will it end up perpetuating inequalities?
In this article, we catch up with Matt Riemland, who presented his talk “Machine translation and technocracy: Mitigating issues of power parity in MT for low-resource languages” at the NeTTT conference this year. Here, he talks about how machine translation can affect speakers of low-resource languages, and what needs to be done to mitigate the risks the technology poses to these marginalized communities.
My presentation focused on the ways in which low-resource machine translation (MT) systems may intertwine with social inequalities in humanitarian contexts. Although researchers have long envisioned that machine translation would be a tool used for humanitarian purposes, progress towards these altruistic ambitions has been somewhat disappointing.
MT has greatly enhanced translation practices in commercial industries, of course, but it’s still largely absent from humanitarian work, by which I’m primarily referring to global development initiatives and crisis responses. (There are notable exceptions, such as Translators without Borders’ Gamayun initiative.)
Part of the reason for this gap is that the marginalized (typically Global South) communities targeted by these humanitarian efforts generally speak low-resource languages. Essentially, there is far less training data available for these languages, so they’re much less conducive to modern data-driven MT architectures.
A lot of recent research attempts to solve—or at least mitigate—the training data scarcity problem from a technical standpoint. But there hasn’t been as much focus on the potential social consequences of actually implementing MT systems where they’re assumed to be useful.
This limited focus reflects a common critique of mainstream global development—namely, the singular reliance on technical solutions and the disregard for their social impacts. In fact, research has shown that development initiatives that introduce new information and communication technologies (ICTs) may actually worsen inequalities for marginalized communities when they are designed without consideration of social impacts.
My presentation contemplated what conditions would be necessary for low-resource MT systems to avoid exacerbating social inequalities, and suggested some ways these systems could help local communities take control of their own well-being.
I adopted that term from Chipidza and Leidner, who define power parity as “equality in the control of resources and information”. According to them, the implementation of ICTs—like machine translation—into development settings can boost power parity by satisfying two conditions. First, ICTs must strengthen local communities’ ability to define their own views, needs, and goals. And second, local communities’ use of ICTs must not require a long-term reliance on resources, funding, or expertise provided by external actors.
Technologies that fail these criteria will lead to an imbalance of power. Low-resource MT in humanitarian settings is particularly susceptible to these risks, given the enormous gap between language technologies for high-resource and low-resource languages, as well as the humanitarian sector’s strong bias toward lingua francas like English.
Again, it goes back to the two conditions for achieving power parity in development-oriented ICTs, which can be called “effective voicing” and “resource independence”. A long-standing criticism of mainstream development practices is that they ultimately divide stakeholders into two distinct groups, perpetuating a severe power imbalance between them. There are technocratic development planners (typically from the Global North) who design and control development initiatives, and then there are local communities (typically from the Global South) who are relegated to passive beneficiaries of these efforts.
Improperly designed ICTs like machine translation may create a situation in which local communities’ voices are even more filtered through the lens of dominant groups, and in which the technology’s continued use entails a perpetual reliance on outside expertise and/or resources.
For machine translation, an example of improper design might be a system whose interface isn’t user-friendly, and/or whose effectiveness depends on a level of training or technical knowledge that local communities never reach.
There’s been a fair amount of recent research highlighting the importance of MT literacy among end users. It’s not hard to imagine a scenario in which ill-equipped users of low-resource MT never come to operate the system effectively on their own.
If they don’t understand how the technology works, what kinds of texts it is suitable for, or how to post-edit MT output effectively, they won’t get much use out of it. There is also the risk that translation errors get overlooked by post-editors who aren’t accustomed to working with MT output.
Well, imagine a future scenario in which a large, well-funded NGO (operating in a major international language like Spanish) implements an MT system that allows it to translate materials about its development projects from Spanish into the language of a local indigenous community. If the NGO is uncritical about its unilateral decision-making, and if its goal is merely to inform the indigenous community members of whatever the NGO decides, the organization may consider this one-way communication channel to be a success. The development planners may believe that they’re promoting inclusivity by translating into a low-resource language, when in practice they’ve simply reinforced the idea that the indigenous community is merely there to receive instructions.
In order to mitigate the risk of low-resource MT exacerbating inequalities, there needs to be close collaboration with marginalized language communities. Even tech giants with seemingly unlimited funding, resources, and expertise recognize the need for local linguists’ involvement in the process of improving and expanding their massive state-of-the-art MT systems.
Research teams from Google, Meta (Facebook), and Microsoft all emphasize that incorporating low-resource languages into multilingual neural machine translation models necessitates substantial input from local linguists. But free-to-use online NMT services still come with a number of significant risks that end users aren’t necessarily aware of. And even though these free MT services are increasing their coverage of low-resource languages, it would be much better to develop MT systems that marginalized language communities can operate somewhat independently.
If MT experts want to assist local communities from the outside, they could jumpstart or guide the creation of low-resource MT systems, then promote independent use by offering workshops or developing open-access training materials, such as INTERACT’s crisis translation tutorials or the recently published book Machine translation for everyone (2022). Any external support would aim to help marginalized language communities eventually achieve independent management of MT systems.
Ultimately, the responsibility of mitigating MT risks would fall on the communities themselves, but that increased responsibility would go hand in hand with an increased ability to control their own development.
This article is part of a series that takes a deeper look at the research presented at the 2022 NeTTT conference. You can find the rest here:
MT is not the future but the now: Highlights from the NeTTT conference (Day 1)
Context is key in MT: Highlights from the NeTTT conference (Day 2)
Towards better MT: Highlights from the NeTTT conference (Day 3)