The African Storybook (ASb) is a multilingual literacy initiative that works with educators and children to publish openly licensed picture storybooks for early reading in the languages of Africa. An initiative of Saide, the ASb has an interactive website that enab...

We created a novel dataset, ANTC (African News Topic Classification), for 4 African languages. We obtained data from three different news sources: VOA, BBC and Isolezwe. From the VOA data we created datasets for Lingala and Somali. We obtained the topics from dat...

Right dislocation (Cheng & Downing 2012) and movement to a low FocP (van der Wal 2006) are competing analyses of Immediately-After-Verb (IAV) focus. In this paper, I discuss novel Lubukusu IAV focus data which shows that 1) IAV focus requires movement to a low FP...

Recent advances in the pre-training of language models leverage large-scale datasets to create multilingual models. However, low-resource languages are mostly left out in these datasets. This is primarily because many widely spoken languages are not well represente...

We introduce a speech corpus containing multilingual code-switching compiled from South African soap operas. The corpus contains English, isiZulu, isiXhosa, Setswana and Sesotho speech, paired into four language-balanced subcorpora containing English-isiZulu, Engli...

We describe the creation of a massively parallel corpus based on 100 translations of the Bible. We discuss some of the difficulties in acquiring and processing the raw material as well as the potential of the Bible as a corpus for natural language processing. Final...

This website contains information about African languages and other African-language-related resources. Currently, mostly South African languages are covered, as well as Kiswahili and Cilubà. It is estimated that there are between 2000 and 3000 languages ...

In promoting a multilingual South Africa, the government is encouraging people to speak more than one language. In order to comply with this initiative, people choose to learn languages which they do not speak as a home language. The African languages are mostly ...

The development of the African Wordnet (AWN) has reached a stage of maturity where the first steps towards an application can be attempted. The AWN is based on the expand method, and to compensate for the general resource scarceness of the African languages, variou...

AfriCLIRMatrix is a test collection for cross-lingual information retrieval research in 15 diverse African languages. This resource comprises English queries with query–document relevance judgments in 15 African languages automatically mined from Wikipedia...

Language diversity in NLP is critical in enabling the development of tools for a wide range of users. However, there are limited resources for building such tools for many languages, particularly those spoken in Africa. For search, most existing datasets feature few ...

This repository contains code to reproduce "Better Quality Pre-training Data and T5 Models for African Languages", which appears in the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP). AfriTeVa V2 was trained on 20 languages (16 African La...

AfroLID is a powerful neural toolkit for African language identification which covers 517 African languages....

Language identification (LID) is a crucial precursor for NLP, especially for mining web data. Problematically, most of the world's 7000+ languages today are not covered by LID technologies. We address this pressing issue for Africa by introducing AfroLID, a neural ...

Code for the EMNLP 2021 Paper AfroMT: Pretraining Strategies and Reproducible Benchmarks for Translation of 8 African Languages.