Join our mailing list to get updates on our events, news, and the latest from the world of African language resources.

Your email is safe with us. We promise not to spam!
Please, consider giving your feedback on using Lanfrica so that we can know how best to serve you. To get started, .
X
Filter

Filter Records

Languages

Loading...

Tasks

Loading...

Record Types

Loading...

Tags

Loading...

The African Storybook (ASb) is a multilingual literacy initiative that works with educators and children to publish openly licensed picture storybooks for early reading in the languages of Africa. An initiative of Saide, the ASb has an interactive website that enab...

Expand Abstract

Recent advances in the pre-training of language models leverage large-scale datasets to create multilingual models. However, low-resource languages are mostly left out in these datasets. This is primarily because many widely spoken languages are not well represente...

Expand Abstract

We describe the creation of a massively parallel corpus based on 100 translations of the Bible. We discuss some of the difficulties in acquiring and processing the raw material as well as the potential of the Bible as a corpus for natural language processing. Final...

Expand Abstract

Language identification (LID) is a crucial precursor for NLP, especially for mining web data. Problematically, most of the world's 7000+ languages today are not covered by LID technologies. We address this pressing issue for Africa by introducing AfroLID, a neural ...

Expand Abstract

AfroLID is a powerful neural toolkit for African languages identification which covers 517 African languages....

Expand Abstract

This is a multilingual parallel corpus created from translations of the Bible compiled by Christos Christodoulopoulos and Mark Steedman. 102 languages, 5,148 bitexts total number of files: 107 total number of tokens: 56.43M total number of sentence fragments: 2.84...

Expand Abstract

BibleTTS is a large high-quality open Text-to-Speech dataset with up to 80 hours of single speaker, studio quality 48kHz recordings for each language. We release aligned speech and text for six languages spoken in Sub-Saharan Africa, with unaligned data for four ad...

Expand Abstract

BibleTTS is a large, high-quality, open speech dataset for ten languages spoken in Sub-Saharan Africa. The corpus contains up to 86 hours of aligned, studio quality 48kHz single speaker recordings per language, enabling the development of high-quality text-to-speec...

Expand Abstract

A dataset of over 700 different languages providing audio, aligned text and word pronunciations. On average each language provides around 20 hours of sentence-lengthed transcriptions. Data is mined from read New Testaments from http://www.bible.is/

This paper describes the CMU Wilderness Multilingual Speech Dataset. A dataset of over 700 different languages providing audio, aligned text and word pronunciations. On average each language provides around 20 hours of sentence-lengthed transcriptions. We describe ...

Expand Abstract

Names generally identify an individual and set him/her apart from other individuals. The Ewe (ISO639-3 [ewe])-speaking people of Ghana are known to have names which reflect an idea that a name giver or name bearer intends to put across. These names could be given d...

Expand Abstract

Gboard is a virtual keyboard app developed by Google for Android and iOS devices.

Bolingo Consult is a female-led LSP that makes localization for African Languages a seamless pro-cess. We have experience in navigating the complexities in localization for African languages. Our services range from translation, interpretation and media localizat...

Expand Abstract

Recent advances in the pre-training of language models leverage large-scale datasets to create multilingual models. However, low-resource languages are mostly left out in these datasets. This is primarily because many widely spoken languages are not well represente...

Expand Abstract

The Mandla dictionary is an online open source crowd sourced dictionary for African languages. It features definitions, etymology, and example sentences written in the native language using both indigenous scripts and latin script, as well as parallel definitions w...

Expand Abstract