About Lanfrica
About the Lanfrica technology
We aim to organize under-represented African knowledge to accelerate its discovery and impact. This means connecting all under-represented resources, wherever they are and whatever format they are in.
How Lanfrica Works
"Lanfrica" is a comprehensive solution for mapping the African AI ecosystem. Broadly speaking, it consists of three interconnected modules: Finding, Organizing, and the Platform.
At the heart of our methodology is a focus on metadata. For every resource we index, we work solely with its metadata, which is the descriptive information about the record. This metadata-centric approach allows us to link to resources while respecting the copyright, ownership, source, and infrastructure of the original record.
Here is an example of a record linked on Lanfrica, displaying its associated metadata: https://lanfrica.com/record/african-storybook-project
1. Finding
This module handles the large-scale, semi-automated effort of discovering and identifying African AI resources across the web. Our primary method is metadata harvesting. We built an in-house metadata harvesting engine that uses the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) at scale, which lets us connect and link disparate sources of information under one unified system.
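To make the harvesting step concrete, here is a minimal sketch of one OAI-PMH `ListRecords` exchange using only the Python standard library. It is not Lanfrica's internal engine: the function names, the returned dictionary keys, and the choice of the `oai_dc` (Dublin Core) metadata format are assumptions made for illustration. The verb, the `metadataPrefix` and `resumptionToken` arguments, and the XML namespaces come from the OAI-PMH 2.0 and Dublin Core specifications.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for an OAI-PMH repository (hypothetical helper)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it is sent with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (records, resumption_token) parsed from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            "description": rec.findtext(".//dc:description", default="", namespaces=NS),
        })
    # An empty or missing token means the repository has no further pages.
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token.strip() or None)
```

A harvester would fetch `list_records_url(base_url)`, parse the response, and repeat with the returned token until it is `None`. Only metadata is read at any point; the underlying resource stays with its original host.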
A central part of Lanfrica's work is determining which resources from these vast sources are relevant to African AI. For example, arXiv hosts more than 2 million papers, but only a tiny percentage of them relate, directly or indirectly, to African AI.
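This page does not describe how relevance is decided, so the following is only an illustrative first-pass filter, not Lanfrica's method. It assumes harvested records are dictionaries with `title` and `description` keys and uses a deliberately tiny, hypothetical term list to flag candidates for later review.

```python
import re

# Hypothetical, deliberately small term list; a real one would be far larger.
AFRICAN_TERMS = ("africa", "african", "swahili", "yoruba", "hausa", "amharic", "igbo", "zulu")

# Whole-word, case-insensitive match on any term in the list.
_TERM_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, AFRICAN_TERMS)) + r")\b",
    re.IGNORECASE,
)


def matched_terms(record):
    """Return the sorted, lower-cased terms found in a record's title and description."""
    text = f"{record.get('title', '')} {record.get('description', '')}"
    return sorted({m.group(1).lower() for m in _TERM_RE.finditer(text)})


def is_candidate(record):
    """True if the record mentions at least one term and deserves a closer look."""
    return bool(matched_terms(record))
```

Keyword matching of this kind only narrows the pool. It will miss relevant records that never name a language or region, and it will flag some irrelevant ones, so the result is best treated as a list of candidates rather than a final decision.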
2. Organizing
Here, we organize the vast African AI resources to provide value. Some of this work entails determining the resource's type (e.g., dataset, paper), its domain, the African languages it covers, its modality (e.g., text, speech), and performing some quantitative analyses. At Lanfrica, we prioritize reliability and quality. It is crucial that the information displayed on our platform can be trusted.
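One way to picture the attributes listed above is as a small metadata record. The sketch below is an assumption for illustration: the `Record` class, its field names, and the `count_by_language` helper are hypothetical and do not reflect Lanfrica's actual schema.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Record:
    """Metadata about a resource; the resource itself stays at its source URL."""
    title: str
    url: str                 # link back to the original host
    resource_type: str       # e.g. "dataset", "paper"
    domain: str              # e.g. "language", "health"
    languages: list[str] = field(default_factory=list)   # African languages covered
    modalities: list[str] = field(default_factory=list)  # e.g. "text", "speech"


def count_by_language(records):
    """Count how many records cover each language (a simple quantitative analysis)."""
    counts = Counter()
    for record in records:
        # Use a set so a language listed twice on one record is counted once.
        counts.update(set(record.languages))
    return counts
```

Keeping only descriptive fields and a URL mirrors the metadata-centric approach described earlier: the record points to the resource without copying it.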
3. The Platform
The final module is our Open Access Platform, accessible at https://lanfrica.com. This platform serves as a gateway to African innovation by allowing users (across different segments like developers, engineers, researchers, decision-makers, and policymakers) to discover the full spectrum of African AI datasets, papers, models, and use cases.
Our Focus
The focus of Lanfrica is illustrated in the image below. Our aim is to connect and bring together all the different tools, datasets, materials, and communities used to build inclusive and empowering technologies for under-represented communities. We call all of these "resources".

Types of Records We Link
When we say "resources" or "records", what we refer to includes (but is not limited to) the following:
Datasets
These are machine learning/AI datasets, both raw and processed, used to train machine learning technologies. "Raw" datasets are data that have not been processed into a machine learning dataset (or are not in a form a machine learning practitioner would use immediately) but are still useful. Many African datasets exist in raw form, and part of creating inclusive technologies involves bringing these raw datasets to light. A good example of a raw dataset brought to light is this.
Linguistic tools
These can include dictionaries, corpora, keyboards, lexical databases, or other text- and speech-based materials essential for language study and technology. These resources often help preserve and promote African and indigenous languages, supporting tasks like language learning, translation, and speech recognition. One example is a bilingual corpus used to develop a machine translation system for a local language. A more concrete example is a custom keyboard layout designed to accurately type characters in a language like Amharic or Tigrinya.
Papers
These are scholarly articles or peer-reviewed publications presenting original research, methodologies, or theoretical discussions. They delve deeply into specialized topics, often contributing significant new insights to language technology, health, finance, agriculture, or related fields.
Articles
These are more general pieces of writing, such as blog posts, news stories, or online media, that provide accessible information or commentary on specific subjects. Articles can be more approachable than academic papers and often serve as a bridge between technical research and a wider audience. One example is a news article highlighting recent policy changes to support local language education.
Policy documents
These are official or organizational guidelines, regulations, and strategic frameworks that shape how languages and technologies are developed and adopted. Policy documents can significantly influence research funding, implementation strategies, and long-term preservation efforts for African and indigenous languages. One example is a government directive mandating the use of local languages in public services, which boosts their visibility and usage.
Language Focus
Our current primary focus centers on African language resources and indigenous knowledge systems, working to amplify and preserve the rich linguistic heritage of African communities.
While we currently concentrate on African languages, our long-term vision includes expanding to support the languages and indigenous knowledge of other communities worldwide, creating a truly inclusive platform for under-represented linguistic resources.
Domain Focus
Our platform connects records across specific domains where there is significant potential for impact and collaboration:
Language: we link resources involving African languages, with particular emphasis on language technology and natural language processing (NLP) applications. This includes datasets, models, tools, and research that can be used to build translation systems, speech recognition applications, text analysis tools, and other language technologies that serve African language communities.
Health: we link health-related resources, including medical datasets, clinical research, public health studies, policy documents, and healthcare tools that address health challenges and opportunities across African contexts.
Finance: we link resources related to financial inclusion, mobile banking, microfinance, economic development, and financial technology solutions relevant to African markets and communities.
Agriculture: we link agricultural resources, including crop research, farming techniques, climate adaptation strategies, food security studies, and agricultural technology innovations that support sustainable farming practices and food systems.
Climate: we link climate-related resources, including environmental datasets and climate modeling research about the African continent.