Join our mailing list to get updates on our events, news, and the latest from the world of African language resources.

Your email is safe with us. We promise not to spam!
Please, consider giving your feedback on using Lanfrica so that we can know how best to serve you. To get started, .
X
Filter

Filter Records

Languages

Loading...

Tasks

Loading...

Record Types

Loading...

Tags

Loading...

We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of Speech benchmark. FLEURS is an n-way parallel speech dataset in 102 languages built on top of the machine translation FLoRes-101 benchmark, with approximately 12 hours of speech s...

Expand Abstract

Founded in 1988, the Folio Group has grown from a tiny start-up into the major-league language service provider that it is today. This is largely driven by our reputation for reliability, technical expertise, fast turnaround and meticulous accuracy. Folio is recogn...

Expand Abstract

Gboard is a virtual keyboard app developed by Google for Android and iOS devices.

The Mandla dictionary is an online open source crowd sourced dictionary for African languages. It features definitions, etymology, and example sentences written in the native language using both indigenous scripts and latin script, as well as parallel definitions w...

Expand Abstract

Today, the exponential rise of large models developed by academic and industrial institutions with the help of massive computing resources raises the question of whether someone without access to such resources can make a valuable scientific contribution. To explor...

Expand Abstract

No Language Left Behind (NLLB) is a first-of-its-kind, AI breakthrough project that open-sources models capable of delivering high-quality translations directly between any pair of 200+ languages — including low-resource languages like Asturian, Luganda, Urdu and m...

Expand Abstract

NLLB project uses data from three sources : public bitext, mined bitext and data generated using backtranslation. Details of different datasets used and open source links are provided in details here.

Driven by the goal of eradicating language barriers on a global scale, machine translation has solidified itself as a key focus of artificial intelligence research today. However, such efforts have coalesced around a small subset of languages, leaving behind the va...

Expand Abstract

PanLex, a project of The Long Now Foundation, aims to enable the translation of lexemes among all human languages in the world. By focusing on lexemic translations, rather than grammatical or corpus data, it achieves broader lexical and language coverage than relat...

Expand Abstract

PanLex is a nonprofit whose mission is to overcome language barriers to human rights, information, and opportunities. We believe that nobody should have their rights restricted because of the language they speak.

Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. {`}Low-resourced{'}-ness is a complex problem going beyond data availability and reflects systemic problems in socie...

Expand Abstract

The melodic tone system of Kikamba, as described by Roberts-Kohno (2000; 2014), stands out as particularly complex within the context of recent crosslinguistic work on melodic tone in Bantu (Odden & Bickmore 2014; Bickmore 2015). It is unique, for example, in posse...

Expand Abstract

This is a collection of translated sentences from Tatoeba 359 languages, 3,403 bitexts total number of files: 750 total number of tokens: 65.54M total number of sentence fragments: 8.96M

The creation of FLORES-200 doubles the existing language coverage of FLORES-101. Given the nature of the new languages, which have less standardization and require more specialized professional translations, the verification process became more complex. This requir...

Expand Abstract