NECAT-CLWE: A Simple But Efficient Parallel Data Generation Approach for Unsupervised and Semi-Supervised Neural Machine Translation

Many languages lack sufficient data to train high-quality translation systems, particularly those based on state-of-the-art neural machine translation architectures. Recently, it has been demonstrated that using an exact copy of the monolingual target data as the source data improves the quality of translation systems, allowing them to benefit from proper nouns and similar words that do not require translation. However, an exact copy of the target data contaminates the source data with target-language terms that do need translation. In this paper, we therefore describe a similar but more effective parallel data generation approach for improving low-resource neural machine translation, which combines named entity copying with approximate translation using cross-lingual word embeddings (NECAT-CLWE). The work will be evaluated on low-resource English-Hausa neural machine translation.
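
The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a set of named entities is already available, and that source-language and target-language word embeddings have been mapped into one shared space. The function names, the toy two-dimensional vectors, and the example sentence are all hypothetical and chosen purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_source_word(vector, source_emb):
    """Return the source-language word whose embedding is closest to `vector`."""
    return max(source_emb, key=lambda word: cosine(vector, source_emb[word]))


def make_pseudo_source(target_tokens, named_entities, target_emb, source_emb):
    """Build a synthetic source sentence from a monolingual target sentence.

    Named entities (and words with no embedding) are copied unchanged;
    every other token is replaced by its nearest cross-lingual neighbour.
    """
    pseudo_source = []
    for token in target_tokens:
        if token in named_entities or token not in target_emb:
            pseudo_source.append(token)  # copy: no translation needed or possible
        else:
            pseudo_source.append(nearest_source_word(target_emb[token], source_emb))
    return pseudo_source


# Toy, made-up embeddings in a shared 2-D space (assumption for illustration).
target_emb = {"ya": [1.0, 0.0], "tafi": [0.0, 1.0]}
source_emb = {"he": [0.9, 0.1], "went": [0.1, 0.9]}
named_entities = {"Musa", "Kano"}

target_sentence = ["Musa", "ya", "tafi", "Kano"]
pseudo_source = make_pseudo_source(target_sentence, named_entities, target_emb, source_emb)
print(pseudo_source)  # ['Musa', 'he', 'went', 'Kano']
```

Pairing each synthetic sentence with the original target sentence would give one pseudo-parallel training example. The word-by-word output is only an approximate translation, but unlike an exact copy it leaves only the named entities in the target language.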