The African Storybook (ASb) is a multilingual literacy initiative that works with educators and children to publish openly licensed picture storybooks for early reading in the languages of Africa. An initiative of Saide, the ASb has an interactive website that enab...


This website contains information about African languages and other African-language-related resources. Currently, mainly the South African languages are covered, as well as Kiswahili and Cilubà. It is estimated that there are between 2000 and 3000 languages ...


Language identification (LID) is a crucial precursor for NLP, especially for mining web data. Problematically, most of the world's 7000+ languages today are not covered by LID technologies. We address this pressing issue for Africa by introducing AfroLID, a neural ...


AfroLID is a powerful neural toolkit for African language identification that covers 517 African languages....


Morphological analysis involves investigating the syntactic class of a word but can also extend to the decomposition and syntactic analysis of its underlying morpheme composition. This is especially relevant to languages with an agglutinative writing system where m...


For this project, we collected and annotated data to develop language resources for the four official South African Nguni languages written with a conjunctive orthography. The data for these four languages is parallel to allow for comparative (computational) lingui...


Part-of-speech tagging (POS tagging) is the process of assigning a label to each word in a text to indicate its lexical category based on the context in which it appears. The POS tagging problem is considered a mostly solved problem in languages with a lot of NLP re...


Founded in 1988, the Folio Group has grown from a tiny start-up into the major-league language service provider that it is today. This is largely driven by our reputation for reliability, technical expertise, fast turnaround and meticulous accuracy. Folio is recogn...


Gboard is a virtual keyboard app developed by Google for Android and iOS devices.

This paper describes the named entity language resources developed as part of a development project for the South African languages. The development efforts focused on creating protocols and annotated data sets with at least 15,000 annotated named entity tokens for...


Almost none of the 2,000+ languages spoken in Africa have widely available automatic speech recognition systems, and the required data is also only available for a few languages. We have experimented with two techniques which may provide pathways to large vocabular...


Bolingo Consult is a female-led LSP that makes localization for African languages a seamless process. We have experience in navigating the complexities of localization for African languages. Our services range from translation, interpretation and media localizat...


The Mandla dictionary is an online, open-source, crowdsourced dictionary for African languages. It features definitions, etymology, and example sentences written in the native language using both indigenous scripts and Latin script, as well as parallel definitions w...


This repository contains links to data and code to fetch and reproduce the data described in our EMNLP 2021 paper titled "MassiveSumm: a very large-scale, very multilingual, news summarisation dataset". A (massive) multilingual dataset consisting of 92 diverse lang...


Current research in automatic summarisation is unapologetically anglo-centered, a persistent state of affairs that also predates neural net approaches. High-quality automatic summarisation datasets are notoriously expensive to create, posing a challenge for any la...
