Join our mailing list to get updates on our events, news, and the latest from the world of African language resources.

Your email is safe with us. We promise not to spam!
Please, consider giving your feedback on using Lanfrica so that we can know how best to serve you. To get started, .
X
Filter

Filter Records

Languages

Loading...

Tasks

Loading...

Record Types

Loading...

Tags

Loading...

The African Storybook (ASb) is a multilingual literacy initiative that works with educators and children to publish openly licensed picture storybooks for early reading in the languages of Africa. An initiative of Saide, the ASb has an interactive website that enab...

Expand Abstract

We describe the creation of a massively parallel corpus based on 100 translations of the Bible. We discuss some of the difficulties in acquiring and processing the raw material as well as the potential of the Bible as a corpus for natural language processing. Final...

Expand Abstract

This website contains information about African Languages, and other African Language related resources. Currently mostly only the South African languages are covered, as well as Kiswahili and Cilubà. It is estimated that there are between 2000 and 3000 languages ...

Expand Abstract

Language identification (LID) is a crucial precursor for NLP, especially for mining web data. Problematically, most of the world's 7000+ languages today are not covered by LID technologies. We address this pressing issue for Africa by introducing AfroLID, a neural ...

Expand Abstract

AfroLID is a powerful neural toolkit for African languages identification which covers 517 African languages....

Expand Abstract

This is a multilingual parallel corpus created from translations of the Bible compiled by Christos Christodoulopoulos and Mark Steedman. 102 languages, 5,148 bitexts total number of files: 107 total number of tokens: 56.43M total number of sentence fragments: 2.84...

Expand Abstract

Morphological analysis involves investigating the syntactic class of a word but can also extend to the decomposition and syntactic analysis of its underlying morpheme composition. This is especially relevant to languages with an agglutinative writing system where m...

Expand Abstract

This corpus is an attempt to recreate the dataset used for training XLM-R. This corpus comprises of monolingual data for 100+ languages and also includes data for romanized languages (indicated by *_rom). This was constructed using the urls and paragraph indices pr...

Expand Abstract

CCAligned consists of parallel or comparable web-document pairs in 137 languages aligned with English. These web-document pairs were constructed by performing language identification on raw web-documents, and ensuring corresponding language codes were corresponding...

Expand Abstract

Cross-lingual document alignment aims to identify pairs of documents in two distinct languages that are of comparable content or translations of each other. In this paper, we exploit the signals embedded in URLs to label web documents at scale with an average preci...

Expand Abstract

The development of linguistic resources for use in natural language processing is of utmost importance for the continued growth of research and development in the field, especially for resource-scarce languages. In this paper we describe the process and challenges ...

Expand Abstract

For this project, we collected and annotated data to develop language resources for the four official South African Nguni languages written with a conjunctive orthography. The data for these four languages is parallel to allow for comparative (computational) lingui...

Expand Abstract

Parts of Speech Tagging (POS Tagging) is a process of assigning labels to each word in text, to indicate its lexical category based on the context it appears in in text. The POS tagging problem is considered a mostly solved problem in languages with a lot of NLP re...

Expand Abstract

Open Resource Term Bank (OERTB) project is to support the collaborative development and dissemination of terminological resources, and thereby promoting the use of African languages in teaching and learning at higher education institutions....

Expand Abstract

Founded in 1988, the Folio Group has grown from a tiny start-up into the major-league language service provider that it is today. This is largely driven by our reputation for reliability, technical expertise, fast turnaround and meticulous accuracy. Folio is recogn...

Expand Abstract