
Improving Discoverability and Impact of African Resources

Technology has come a long way, and new inventions appear every day. Today, people have virtual assistants on their phones and can ask them whatever they want. Your car’s navigation system can guide you with turn-by-turn instructions. But what if these solutions were in a language you don’t speak or understand? What if you had to learn a new language to use technology properly? For many, this isn’t hypothetical; it’s their reality.

There are over 2,000 languages spoken in Africa, but only a handful are represented in online resources. This isn’t because they lack the ability to convey information, entertainment, or knowledge effectively. Rather, it’s because they have not been given priority. Imagine if people could talk to Siri in Igbo, Dholuo, or Xhosa and have it understand them. This kind of accessibility remains a distant dream for many African language speakers.

Right now, someone somewhere in Africa is trying to conduct research in or on the language they speak every day. Picture a student researcher at a university in Uganda looking for Luganda speech datasets for his end-of-year project, or a program manager in Lesotho running social welfare programs in Sesotho. They are looking for language resources that would help them build solutions that bring convenience or information to millions of people. As they go about their work, they struggle to find the language resources they need. This situation is all too familiar to Lanfrica’s founder, Chris Emezue.

While working on a research project on Fon, a language spoken in Benin, Chris and his team struggled to find any datasets to use. Assuming they were the first to take on such an undertaking, they started from scratch and built their own model. However, after working tirelessly for months and eventually publishing their paper, they discovered that someone else had been working on a similar project for Fon. If only they had known earlier, they could have collaborated and saved valuable time and effort.

“Usually, what we can’t find is deemed not to exist.”

The months they worked on that project highlighted a major problem in building language technologies for under-represented languages: the lack of discoverable, accessible information on less-popular languages. This issue isn’t unique to Fon; it affects countless languages and their speakers around the world. It is not that no work is being done. Many individual researchers, linguists, research communities, and universities are working on African languages, but the resulting resources remain in silos, which makes them hard to discover. As a result, anyone who wants to use these resources either reinvents the wheel or spends so much time searching that they give up. The outcome is duplicate resources of varying quality rather than robust, shared ones.

At Lanfrica, we are committed to changing this narrative. We envision a world where anyone can find the resources they need without hassle. Our platform aggregates resources on African languages, making them easily discoverable and accessible. We believe that by connecting and linking these resources, we can help researchers avoid unnecessary duplication of effort and foster greater collaboration. This, in turn, will lead to more language technologies that respect and leverage the unique characteristics of African languages.

“We can’t rely on our current information systems, database and archive managers as they have a blindspot in curating African resources and showing them to the relevant persons.”

In addition to building the platform, we take an active role in collaborating with research communities to build more datasets for African languages. For example, one of our recent projects, NaijaVoices, has created 1,800 hours of speech data for Nigeria’s three main languages (Yoruba, Igbo and Hausa). This is the largest dataset of its kind and is a significant resource for researchers and technologists. Such projects bring us closer to seeing more innovative speech-related projects emerge and shape core industries in Africa such as agriculture, healthcare, and education.

We also want to see newer generations appreciate their languages rather than abandon them for Western ones because of the perceived, and sometimes real, opportunities those languages offer. Languages like English, Spanish, and French are well-known globally, while widely spoken African languages such as Yoruba, Xhosa, and Amharic remain largely unheard of. We want a world where people can participate fully in all facets of life in the language they understand: expressing themselves, sharing tales, educating themselves, and seeing themselves reflected in it.

One resource at a time, one language at a time, we are working towards the ideal future we have in mind.


Follow us on X or join the community Slack channel to become part of this journey!