
How To Scale Language Data Ecosystems to Drive Industrial Development Growth

Recommendations from a discussion paper co-designed with the G7 Italian Presidency and implemented by the UNDP, with Lanfrica as an expert contributor.

Africa is home to over 2,500 languages, yet they remain significantly underrepresented in the digital space. As the world embraces artificial intelligence (AI), sectors such as healthcare, education, and financial technology (fintech) are changing. By 2030, AI is expected to add $15.7 trillion to the global economy. However, Africa remains largely underserved by this development.

Currently, most AI chatbots are trained on only about 100 of the world’s 7,000+ languages, with African languages rarely included. AI systems that do not understand local languages or cultural contexts cannot serve African users effectively. This issue becomes clearer every day. For example, an AI life coach might be free and widely used in the U.S. but might remain inaccessible in many parts of Africa.

Lanfrica was one of the 70 innovators from 17 African countries who collaborated with UNDP and the G7 Italian Presidency to address this disparity. Together, we analysed the current state of multilingual AI and co-authored a discussion paper on embedding linguistic diversity into AI systems to unlock greater economic development. 

We identified four key areas that, if supported and implemented, could help Africa build stronger and fairer AI systems that promote language diversity and inclusion.

1. Amplify Awareness and Build Momentum

Government and public engagement in local language digitisation remains low. This creates poor policy environments and limits support. Some governments are taking proactive steps, which is encouraging. Nigeria’s Ministry of Communications has partnered with local startup Awarri to build the country’s first government-backed large language model (LLM). South Africa continues to uphold multilingualism through initiatives like PanSALB.

2. Foster Collaboration Among Innovators

Across the continent, communities like Masakhane, researchers, and nonprofits are actively documenting and digitising languages. These efforts are often isolated. Improving visibility and collaboration across innovation hubs can improve outcomes, reduce duplication, and foster the co-creation of impactful solutions among funders and researchers.

The discussion paper also advocates support for initiatives like Lanfrica and AfricArxiv, which catalogue and maintain logs of ongoing work, past research, and existing datasets. These catalogues make existing work easier to discover and help collaborators avoid duplicating effort.

3. Advance Inclusive Data Collection and Cataloguing

We face limited technical and linguistic expertise, and building robust datasets requires time and trust. Open platforms like DVoice (by ToumAI Analytics) enable the collection and transcription of speech data in languages such as Swahili, Hausa, and Wolof. We need more resources for linguists, translators, and community members to scale this work and create context-aware solutions, such as SMS or USSD tools in areas with limited internet access.

4. Scale Data-Sharing and Secure Data Rights

Data governance remains a significant challenge. Many projects rely on untracked or poorly licensed data, which raises ethical concerns. Solutions include adopting community-led licensing models, piloting responsible data frameworks, and compensating contributors fairly. Initiatives like Mozilla’s licensing work in Kenya and Equiano Institute’s provenance research are paving the way.

The work ahead is crucial and demands shared responsibility. Governments, institutions, civil society, and innovators must take deliberate steps to create an environment that fosters the development of ethical and inclusive AI. While African-led solutions are emerging, their growth and adoption depend on systems that offer a fair playing field. Without this, we risk being left behind once again.

These recommendations were developed during the AI Hub for Sustainable Development’s Local Language Partnership Accelerator Pilot, which explored how effective and ethical partnerships can accelerate the growth and adoption of Language AI technologies for sustainable local innovations.

You can download the full discussion paper here.