Join our mailing list to get updates on our events, news, and the latest from the world of African language resources.

Your email is safe with us. We promise not to spam!
Please, consider giving your feedback on using Lanfrica so that we can know how best to serve you. To get started, .
X
Filter

Filter Records

Languages

Loading...

Tasks

Loading...

Record Types

Loading...

Tags

Loading...

The African Storybook (ASb) is a multilingual literacy initiative that works with educators and children to publish openly licensed picture storybooks for early reading in the languages of Africa. An initiative of Saide, the ASb has an interactive website that enab...

Expand Abstract

Extracting functional themes and topics from a large text corpus manually is usually infeasible. There is a need to build text mining techniques such as topic modeling, which provide a mechanism to infer topics from a corpus of text automatically. This paper discus...

Expand Abstract

Recent advances in the pre-training of language models leverage large-scale datasets to create multilingual models. However, low-resource languages are mostly left out in these datasets. This is primarily because many widely spoken languages are not well represente...

Expand Abstract

MAD-X adapters trained on AfroXLMR-base, it has the same configuration as XLMR-base....

Expand Abstract

Multilingual pre-trained language models (PLMs) have demonstrated impressive performance on several downstream tasks for both high-resourced and low-resourced languages. However, there is still a large performance drop for languages unseen during pre-training, espe...

Expand Abstract

African Voices is a collaborative project that aims to collects high-quality speech (tts) datasets and synthesizers for all African languages. You can search datasets and synthesizers by language. You can also synthesize text from your synthesizer of choice. Additi...

Expand Abstract

Language identification (LID) is a crucial precursor for NLP, especially for mining web data. Problematically, most of the world's 7000+ languages today are not covered by LID technologies. We address this pressing issue for Africa by introducing AfroLID, a neural ...

Expand Abstract

AfroLID is a powerful neural toolkit for African languages identification which covers 517 African languages....

Expand Abstract

This dataset consists of 193 agricultural keywords in English and Luganda. 64 keywords are in English and 129 keywords are in Luganda. The list of keywords was compiled by obtaining the counts of the most used agricultural words in Luganda radio discussions and onl...

Expand Abstract

This English-Luganda parallel sentence corpus was created by a team of researchers from AI & Data science research Lab at Makerere University with a team of Luganda teachers, students and freelancers. The collaborative work which involves generating English sentenc...

Expand Abstract

Transfer learning has led to large gains in performance for nearly all NLP tasks while making downstream models easier and faster to train. This has also been extended to low-resourced languages, with some success. We investigate the properties of transfer learning...

Expand Abstract

BibleTTS is a large high-quality open Text-to-Speech dataset with up to 80 hours of single speaker, studio quality 48kHz recordings for each language. We release aligned speech and text for six languages spoken in Sub-Saharan Africa, with unaligned data for four ad...

Expand Abstract

BibleTTS is a large, high-quality, open speech dataset for ten languages spoken in Sub-Saharan Africa. The corpus contains up to 86 hours of aligned, studio quality 48kHz single speaker recordings per language, enabling the development of high-quality text-to-speec...

Expand Abstract

This version of the Bloom Library data is developed specifically for the language modeling task. It includes data from nearly 400 languages across 35 language families, with many of the languages represented being extremely low resourced languages. Note: If you sp...

Expand Abstract

Neural machine translation (NMT) has achieved great successes with large datasets, so NMT is more premised on high-resource languages. This continuously underpins the low resource languages such as Luganda due to the lack of high-quality parallel corpora, so even ‘...

Expand Abstract