Date: 2025-08-08

Nine years ago, Google’s CEO Sundar Pichai first pledged that artificial intelligence would make information “universally accessible” to everyone, regardless of language.

He has continued to repeat that promise ever since, fuelling expectations around the world that technology would finally bridge linguistic divides and provide equal access to knowledge for all.

Yet for those who speak any of Africa’s more than 2,000 languages, that promise remains distant.

Millions across the continent still find that the advanced AI tools transforming agriculture, education, and daily life cannot understand or communicate in their own languages.

According to research, ChatGPT, which has 800 million weekly active users worldwide, recognises only 10 to 20 per cent of sentences written in Hausa, which is spoken by over 94 million Nigerians.

The same goes for other widely spoken African languages such as Yoruba, Igbo, Swahili, and Somali, all of which remain severely underrepresented in mainstream AI models despite having tens of millions of speakers.

But why have so many African languages been overlooked by today’s most powerful AI tools, and what does this reveal about who gets to shape the digital future?

‘Low resource’ languages

One of the main reasons for African languages’ exclusion from AI is what researchers call the “low-resource” problem.

In this context, “low-resource” refers to the scarcity of online materials such as websites, books, and transcripts available in those languages.

Most large language models (LLMs) rely on huge volumes of such digital data to learn and generate text, and the vast majority of this data is in English (a high-resource language) or a handful of other widely spoken global languages in the West.

“Our measure for progress and research agenda is based on what works for Western languages,” says Hellina Hailu Nigatu, an AI researcher focused on LLMs at the University of California, Berkeley.

The lack of training data leaves AI models like ChatGPT or Gemini struggling to recognise, generate or even meaningfully “see” African languages, no matter how many people speak them.

“African languages are categorised as ‘low-resource’ and are usually excluded, or even when they are included, systems perform poorly on them,” she tells TRT World.

This classification system, which divides the world’s languages into “high-resource” and “low-resource” categories, has become the industry’s preferred framework for discussing this disparity.

Commercial incentives, systemic bias and cost issues

Another reason for underrepresentation is the priorities of global AI research and development.

Research shows that large language model (LLM) outputs lean towards “Western stereotypes”.

The standards are set mostly by Western tech companies and academic institutions, which focus on languages with the largest online footprints and direct most funding towards a small group of “high-resource” languages.

As a result, African languages are rarely prioritised for investment or innovation.

Commercial incentives also play a major role. Since the immediate economic returns from African language markets are limited, companies have little motivation to dedicate time and resources to improving AI support for these languages.

This structural bias is reinforced by the datasets used to train AI models.

Even when African languages are included, the systems often adopt Western cultural assumptions, sometimes misrepresenting local contexts or perpetuating stereotypes.

The findings align with broader research on algorithmic bias.