The African Storybook (ASb) is a multilingual literacy initiative that works with educators and children to publish openly licensed picture storybooks for early reading in the languages of Africa. An initiative of Saide, the ASb has an interactive website that enab...

A first set of African language embeddings: word embeddings and document embeddings. Currently supporting Sepedi (Northern Sotho, nso) and Setswana (tsn).

In promoting a multilingual South Africa, the government is encouraging people to speak more than one language. To comply with this initiative, people choose to learn languages that they do not speak as a home language. The African languages are mostly ...

AfriCLIRMatrix is a test collection for cross-lingual information retrieval research in 15 diverse African languages. This resource comprises English queries with query–document relevance judgments in 15 African languages automatically mined from Wikipedia...

Language diversity in NLP is critical in enabling the development of tools for a wide range of users. However, there are limited resources for building such tools for many languages, particularly those spoken in Africa. For search, most existing datasets feature few ...

Language identification (LID) is a crucial precursor for NLP, especially for mining web data. Problematically, most of the world's 7000+ languages today are not covered by LID technologies. We address this pressing issue for Africa by introducing AfroLID, a neural ...

AfroLID is a powerful neural toolkit for African language identification that covers 517 African languages....

Datasets are foundational to many breakthroughs in modern artificial intelligence. Many recent achievements in the space of natural language processing (NLP) can be attributed to the finetuning of pre-trained models on a diverse set of tasks that enables a large la...

Unlike major Western languages, most African languages are very low-resourced. Furthermore, the resources that do exist are often scattered and difficult to obtain and discover. As a result, the data and code for existing research has rarely been shared, meaning re...

This version of the Bloom Library data is developed specifically for the language modeling task. It includes data from nearly 400 languages across 35 language families, with many of the languages represented being extremely low resourced languages. Note: If you sp...

CCAligned consists of parallel or comparable web-document pairs in 137 languages aligned with English. These web-document pairs were constructed by performing language identification on raw web-documents, and ensuring corresponding language codes were corresponding...

Cross-lingual document alignment aims to identify pairs of documents in two distinct languages that are of comparable content or translations of each other. In this paper, we exploit the signals embedded in URLs to label web documents at scale with an average preci...

When it comes to scientific communication and education, language matters. The ability for science to be discussed in local indigenous languages can not only help expand knowledge to those who do not speak English or French as a first language but also can integrat...

The development of linguistic resources for use in natural language processing is of utmost importance for the continued growth of research and development in the field, especially for resource-scarce languages. In this paper we describe the process and challenges ...

The Open Resource Term Bank (OERTB) project supports the collaborative development and dissemination of terminological resources, thereby promoting the use of African languages in teaching and learning at higher education institutions....
