This English-Luganda parallel sentence corpus was created by a team of researchers from the AI & Data Science Research Lab at Makerere University, together with a team of Luganda teachers, students, and freelancers. The collaborative work which involves generating English sentenc...

Common Voice is Mozilla's initiative to help teach machines how real people speak. The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.

This project collected text and speech corpora for three languages in Kenya: Kiswahili, Dholuo and 3 Luhya dialects (Lumarachi, Logooli and Lubukusu). Primary data was collected from the respective language communities, which included Indigenous stories and narrati...

This project developed a part-of-speech (POS) tagged dataset for 2 languages in Kenya: Dholuo and 3 Luhya dialects (Lumarachi, Lulogooli, and Lubukusu). The project tagged approximately 143,000 words, which includes about 50,000 words for Dholuo, 27,900 words for Lu...

This speech dataset includes both read and spontaneous speech recordings, recorded in Kenya with native Swahili speakers. In total this dataset includes 27 hours 31 minutes 50 seconds of speech data from 26 speakers, that is, 19 females and 7 males. The recordings ...

This research developed the Kencorpus Swahili Question Answering Dataset (KenSwQuAD) from raw Swahili data. Swahili is a low-resource language spoken predominantly in Eastern Africa, with speakers in other parts of the world as well. Question Answering datase...

This project produced a parallel corpus between Swahili and 2 other Kenyan languages: Dholuo and Luhya. The Luhya language has several dialects. In the project, 3 dialects were chosen as a start: Lumarachi, Logooli and Lubukusu. A total of 12,400 sentences were tr...

Kinyarwanda speech-to-text model based on the Coqui architecture

Hugging Face demo space to try out the Coqui and SpeechBrain models for Kinyarwanda speech recognition.

Dataset made of more than 17 hours of Kinyarwanda studio recordings

Text-to-Speech model for Kinyarwanda

This is a dataset of publicly available Tanzania Hansard documents in Kiswahili. It contains 2735 PNG images of pages from PDF documents, and text files containing transcripts obtained from the OCR tool tesseract-ocr. The images are obtained via scanning pdf fil...

Machine translation (MT) systems are now able to provide very accurate results for high-resource language pairs such as English and German. However, for many low-resource languages, MT is still under active research. We propose to develop and share publicly an eval...

A Kinyarwanda chatbot answering COVID-19 related questions

NaijaSenti is an open-source collection of sentiment and emotion corpora for four major Nigerian languages. This project was supported by Lacuna Fund initiatives.