MULTILINGUAL ADAPTIVE FINE-TUNING (MAFT)
We introduce MAFT as an approach to adapt a multilingual PLM to a new set of languages. Adapting PLMs has been shown to be effective for a new domain (Gururangan et al., 2020)
or language (Pfeiffer et al., 2020; Alabi et al., 2020; Adelani et al., 2021). While previous work on
multilingual adaptation has mostly focused on autoregressive sequence-to-sequence models such as
mBART (Tang et al., 2020), in this work, we adapt non-autoregressive masked PLMs on monolingual corpora covering 20 languages. Crucially, during adaptation we use the same training objective that was used during pre-training, i.e., masked language modeling (MLM). The models resulting from MAFT are then fine-tuned on supervised downstream NLP tasks. We apply MAFT only to smaller models (XLM-R-base, AfriBERTa, and XLM-R-miniLM), since one of our goals is to reduce model size, whereas adapting XLM-R-large would require considerably more compute and train more slowly. We refer to the model obtained by applying MAFT to XLM-R-base as AfroXLMR-base, and to the one obtained from XLM-R-miniLM as AfroXLMR-mini.
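To make the MAFT step concrete, the following is a minimal sketch of continued MLM training with the Hugging Face Transformers library, starting from XLM-R-base. The corpus path, hyperparameters, and output directory name are illustrative assumptions rather than the exact settings used in our experiments; the same procedure applies to AfriBERTa and XLM-R-miniLM by changing the model name.

    # Sketch of MAFT: continued masked language modeling on monolingual text
    # in the target languages, using the same MLM objective as pre-training.
    # Paths and hyperparameters below are illustrative, not the paper's settings.
    from datasets import load_dataset
    from transformers import (
        AutoModelForMaskedLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    model_name = "xlm-roberta-base"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)

    # Monolingual corpora for the target languages, stored as plain-text files
    # (one sentence or paragraph per line); the path is hypothetical.
    raw = load_dataset("text", data_files={"train": "maft_corpus/*.txt"})

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=512)

    tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

    # Dynamic masking with the standard 15% masking rate used for MLM.
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

    args = TrainingArguments(
        output_dir="afro-xlmr-base",        # illustrative output name
        per_device_train_batch_size=8,
        gradient_accumulation_steps=4,
        learning_rate=5e-5,
        num_train_epochs=3,
        save_steps=10_000,
        fp16=True,
    )

    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=tokenized["train"],
        data_collator=collator,
    )
    trainer.train()
    trainer.save_model("afro-xlmr-base")
    tokenizer.save_pretrained("afro-xlmr-base")

The saved checkpoint can then be loaded like any other masked PLM (e.g., with AutoModelForSequenceClassification or AutoModelForTokenClassification) and fine-tuned on the supervised downstream tasks.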