Fon-French Dataset

The FFR Dataset is an ongoing project to collect, clean, and store corpora of Fon and French sentences for Fon-French machine translation. Fon (also called Fongbe) is an indigenous African language spoken mostly in Benin by about 1.7 million people. Because training data is crucial to the performance of a machine learning model, the project aims to compile the largest set of training corpora for the research and design of translation and NLP models involving Fon. So far, we have generated 117,029 parallel Fon-French sentences. Please read the Documentation file for more information about this dataset.

CONNECTED RECORDS

software
paper

LANGUAGES

TASKS

machine translation