NECAT-CLWE: A Simple But Efficient Parallel Data Generation Approach for Unsupervised and Semi-Supervised Neural Machine Translation

Many languages lack sufficient data to train high-quality translation systems, particularly those based on cutting-edge neural machine translation architectures. Recently, it has been demonstrated that using an exact copy of the monolingual target data as the source data improves the quality of translation systems, allowing them to benefit from proper nouns and similar words that do not require translation. However, using an exact copy of the target data contaminates the source data with target-language terms that do need translation. We therefore describe in this paper a similar but more effective parallel data generation approach for improving low-resource neural machine translation, using named entity copying and approximate translations from cross-lingual word embeddings (NECAT-CLWE). The work will be evaluated on low-resource English-Hausa neural machine translation.
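
The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the hand-made 3-dimensional embeddings, the hard-coded named-entity set, and the fallback of copying any token that has no embedding are all assumptions made for illustration. The sketch copies named entities unchanged and replaces every other target-side token with its nearest neighbour, by cosine similarity, in a shared cross-lingual embedding space.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_source_word(target_word, tgt_emb, src_emb):
    """Return the source word closest to target_word in the shared space,
    or None when target_word has no embedding."""
    vec = tgt_emb.get(target_word)
    if vec is None:
        return None
    return max(src_emb, key=lambda w: cosine(vec, src_emb[w]))


def make_pseudo_source(target_tokens, named_entities, tgt_emb, src_emb):
    """Build a synthetic source sentence from a target sentence."""
    pseudo = []
    for tok in target_tokens:
        if tok in named_entities:
            pseudo.append(tok)  # named entity: copy unchanged
            continue
        candidate = nearest_source_word(tok.lower(), tgt_emb, src_emb)
        # Assumption: tokens without an embedding are copied as-is.
        pseudo.append(candidate if candidate is not None else tok)
    return pseudo


# Toy, hand-made vectors (hypothetical, not trained embeddings).
tgt_emb = {"ya": [1.0, 0.0, 0.0], "tafi": [0.0, 1.0, 0.0]}
src_emb = {
    "he": [0.9, 0.1, 0.0],
    "went": [0.1, 0.9, 0.0],
    "city": [0.0, 0.0, 1.0],
}

pseudo = make_pseudo_source(
    ["Musa", "ya", "tafi", "Kano"],   # monolingual target-side sentence
    {"Musa", "Kano"},                 # assumed output of an NER step
    tgt_emb,
    src_emb,
)
print(pseudo)  # ['Musa', 'he', 'went', 'Kano']
```

The resulting pseudo-source sentence would be paired with the original target sentence as a synthetic training example. In a real setting the named-entity set would come from a tagger and the embeddings from an aligned cross-lingual model.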

LANGUAGES

Hausa
