Ìtàkúròso: Exploiting Cross-Lingual Transferability for Natural Language Generation of Dialogues in Low-Resource, African Languages

In this study, we investigate the possibility of cross-lingual transfer from a state-of-the-art (SotA) deep monolingual model, DialoGPT, to six African languages, and compare with two baselines: BlenderBot 90M (another SotA model) and a simple seq2seq model. The languages are Swahili, Wolof, Hausa, Nigerian Pidgin English, Kinyarwanda, and Yoruba. Natural language generation (NLG) of dialogues is known to be a challenging task for many reasons, and it becomes more challenging still for African languages, which are low-resource in terms of data. For each language, we translate and train on a small portion of the multi-domain MultiWOZ dataset. Besides intrinsic evaluation (i.e. perplexity), we conduct human evaluation of single-turn conversations using majority voting, and measure inter-annotator agreement (IAA) using Fleiss' kappa and credibility tests. The results support the hypothesis that deep monolingual models learn some abstractions that generalise across languages: we observe human-like conversations in five of the six languages. As expected, the degree of transfer varies across languages. The language with the most transferable properties is Nigerian Pidgin English, with a human-likeness score of 78.1%, of which 35.5% are unanimous; its credibility IAA unanimous score is 66.7%. The main contributions of this paper are the representation of under-represented African languages and the demonstration of the cross-lingual transferability hypothesis. We also provide the datasets and host the model checkpoints/demos on the HuggingFace hub for public access.
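The human evaluation above reports inter-annotator agreement via Fleiss' kappa. As a quick illustration of how such IAA scores are computed, here is a minimal sketch; the rating matrix is invented for illustration and is not the paper's annotation data:

```python
# Hedged sketch: Fleiss' kappa for inter-annotator agreement (IAA).
# The input is a matrix of per-item category counts: N items x k categories,
# with every item rated by the same number of raters n.

def fleiss_kappa(ratings):
    """Compute Fleiss' kappa from a list of per-item category-count rows."""
    n_items = len(ratings)
    n_raters = sum(ratings[0])  # raters per item (constant across items)
    total = n_items * n_raters

    # Observed agreement: mean of per-item agreement P_i
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ) / n_items

    # Chance agreement P_e from the marginal category proportions
    k = len(ratings[0])
    p_e = sum(
        (sum(row[j] for row in ratings) / total) ** 2 for j in range(k)
    )
    return (p_bar - p_e) / (1 - p_e)

# Perfect agreement: 3 items, 3 raters, 2 categories ("human-like" / "not")
print(fleiss_kappa([[3, 0], [0, 3], [3, 0]]))  # → 1.0
```

A kappa of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement; a "unanimous" score, as reported above, counts items on which every annotator chose the same label.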