News

May 4, 2026

Making Invisible Languages Exist

When artificial intelligence ignores a language, it accelerates its disappearance. Montreal researchers are tackling this problem, and their work could concretely change the lives of millions of people.

Imagine not being able to describe your symptoms to a doctor. Not because you don’t know what you’re feeling, but because your language simply doesn’t exist in the digital tools available. No reliable automatic translator. No voice assistant that understands you. Someone who speaks your language has to accompany you to every appointment just so you can receive proper care.

This is the daily reality for millions of people whose language is absent from the digital world. And that absence won’t resolve itself. It’s getting worse, as artificial intelligence establishes itself as the invisible infrastructure of our societies, reproducing and amplifying existing linguistic inequalities.

According to the United Nations, an average of two Indigenous languages disappear every month worldwide. In Canada, half of the fifty to ninety First Nations languages could go extinct within the next ten to fifteen years. Meanwhile, the large language models that power voice assistants, translation tools, and medical and educational interfaces are trained almost exclusively on high-volume European languages.

This is not a technical inevitability. It’s the result of choices, says David Adelani, researcher and professor at McGill University.

“Someone in San Francisco decides which languages matter. Does that person even consider that fifty million people in Africa speak a language that is completely invisible to technology?”

A data problem, a power problem 

AI runs on data. Languages that are underrepresented online are ignored by models. The result is a paradox that few people notice: Estonian, spoken by one million people, receives better technological support than Hausa, spoken by tens of millions, simply because Estonia generates abundant digital data. The size of a linguistic community doesn’t determine its place in AI. Its presence on the web does.

For Canada’s Indigenous languages, the challenge runs even deeper. Many are traditionally oral. Written forms were often imposed on them through colonization. Developing digital tools for these languages cannot simply replicate existing models. It requires an entirely different approach.

That is precisely the approach being developed by Adelani and his doctoral student Marie Maltais, through a project funded by IVADO’s IAR3 program. 

Building together

Marie Maltais grew up in Montreal without ever really hearing about the Indigenous languages present in her own city. It was through sociolinguistics at Université de Montréal that the reality caught up with her. “You realize that for so many other people, their language isn’t recognized at all,” she says. “And it’s also deeply important to them — for their education, their governance, their culture.” 

That realization shaped her entire path, from the University of Edinburgh — where minority Celtic languages hold a central place — to Adelani’s lab at McGill.

What drew her there was a philosophy of research that is rare in the field: communities are not data providers. They co-build the tools, guide the technological choices, and decide how to govern what they produce. 

David Adelani has been applying this approach for years across Africa. His datasets have been used by African startups to develop their own models. His latest project, AfriqueLLM, covers twenty African languages and represents one of the most complete attempts to date to create an open large language model dedicated to African languages.

Alongside that construction work, his evaluation research now spans more than 200 languages across Africa, Asia and the Americas. The finding is consistent: existing large models perform significantly worse in these languages than in dominant ones.

But he is emphatic about a point that major tech initiatives consistently overlook: collecting data is not enough. “If the community doesn’t have the technical know-how to use its own data, the technology will be sold back to them,” he says. What he wants to build is the opposite: communities capable of shaping their own tools. 

Marie Maltais lives this reality in her current research with an Irish university. What matters, she says, is not just collecting data, but maintaining regular human contact with the communities that produce it. “It’s not the same as being able to ask my questions face to face,” she says. For communities whose oral tradition is central to their identity, that direct contact isn’t a formality. It’s the foundation of trust.

What it could change 

IVADO’s funding has allowed David Adelani to explore a question he couldn’t otherwise have pursued: can tools be created for endangered languages using what already exists — dictionaries, grammars, lexicons — without waiting for massive corpora that may never materialize? It’s an approach directly applicable to Canada’s Indigenous languages.

The potential applications are concrete. A text synthesis and simplification tool, usable on a phone, so a child can learn in their mother tongue. A fully voice-based system, with no text transcription, for communities whose identity is rooted in oral tradition. And that mother who can’t go to the doctor alone: a tool that would let her communicate with a health professional in her own language, without an intermediary, without an added layer of friction in an already difficult moment.

“The idea is that the tools created should be useful to the communities themselves and respond to an existing need. Going to see a doctor is already stressful,” says Maltais. “If we can remove that point of friction, that’s really what I’m working towards.”  

Because not all of these languages are doomed. Some are critically endangered; others are beginning a journey of reclamation. In Canada, more than one in four speakers of Indigenous languages learned their language as a second language, proof that transmission can resume even after it has been broken. Young Indigenous people from across the country are answering the call to revitalize the languages of their ancestors, and in some cases communities are working to revive languages with few or no remaining fluent speakers. In this context, accessible digital tools designed with communities wouldn’t only help slow a disappearance; they could contribute to the awakening of dormant languages.

Partnerships yet to be built

The collaboration with Canadian Indigenous communities remains to be built. No structured partnerships exist yet. But the methods are proven, the commitment is real, and the IAR3 program aims to build those bridges.

Technology is not the only answer to protecting endangered languages. But the right tools, built for communities, placed in the right hands and mastered by them, can help some survive. Not merely as heritage to be preserved, but as living languages that parents pass on to their children, that students use to learn, that patients use to communicate with medical staff without a go-between. 

That is where the true reversal lies: not in placing technology in the hands of communities, but in putting them in a position to design it according to their own needs and choose what they do with it. 

David Ifeoluwa Adelani is an IVADO Professor and Assistant Professor at the McGill University School of Computer Science, a member of Mila, and a CIFAR AI Chair. Marie Maltais is a doctoral student in computational linguistics at McGill University and a member of Mila. Both collaborate with IVADO through the IAR3 program, NLP cluster, low-resource and endangered languages stream.