No Language Left Behind (NLLB) is a first-of-its-kind AI project that open-sources models capable of delivering high-quality translations directly between any pair of 200+ languages, including low-resource languages such as Asturian, Luganda, and Urdu. It aims to help people communicate with anyone, anywhere, regardless of their language preferences.

To enable the community to leverage and build on top of NLLB, we open-source all our evaluation benchmarks (FLORES-200, NLLB-MD, Toxicity-200), language identification (LID) models and training code, LASER3 encoders, data mining code, MMT training and inference code, and our final NLLB-200 models and their smaller distilled versions, for easier use and adoption by the research community.

This code repository contains instructions for getting the datasets, optimized training and inference code for MMT models, and training code for LASER3 encoders, as well as instructions for downloading and using the final large NLLB-200 model and the smaller distilled models. In addition to supporting more than 200x200 translation directions, we also provide reliable evaluations of our model on all possible translation directions on the FLORES-200 benchmark. By open-sourcing our code, models, and evaluations, we hope to foster even more research in low-resource languages, leading to further improvements in the quality of low-resource translation through contributions from the research community.
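As a rough illustration of what evaluating "all possible translation directions" involves, the sketch below enumerates ordered source-target pairs for a small sample of languages. The helper names `translation_directions` and `count_directions` are hypothetical and not part of the NLLB repository, and the three language codes are written in the FLORES-200 style as an assumption. With n languages there are n * (n - 1) ordered pairs of distinct languages, and 200 is used here only as a round illustrative number.

```python
from itertools import permutations


def translation_directions(languages):
    """Return every ordered (source, target) pair of distinct languages."""
    return list(permutations(languages, 2))


def count_directions(n):
    """Number of ordered pairs of distinct languages among n languages."""
    return n * (n - 1)


# Illustrative sample only; the codes are assumed to follow FLORES-200 naming.
sample = ["ast_Latn", "lug_Latn", "urd_Arab"]
pairs = translation_directions(sample)

for source, target in pairs:
    print(f"{source} -> {target}")

print(len(pairs))             # 3 * 2 = 6 ordered pairs
print(count_directions(200))  # 200 * 199 = 39800 ordered pairs
```

Each direction is counted separately because translating from Asturian into Urdu is evaluated independently of translating from Urdu into Asturian.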