Lanfrica

ANTC — African News Topic Classification Dataset

We created a novel dataset, ANTC (African News Topic Classification), for 4 African languages. We obtained data from three different news sources: VOA, BBC, and isolezwe. From the VOA data we created datasets for Lingala and Somali. We obtained the topics from data released by Palen-Michel et al. (2022) and used the provided URLs to get the news category from the websites. For Pidgin and isiZulu, we scraped news topics directly from the respective news websites (BBC Pidgin and isolezwe, respectively) based on their category. We noticed that some news topics are not mutually exclusive across categories; therefore, we filtered out such topics with multiple labels. Also, we ensured that each category has at least 200 samples. The categories include, but are not limited to, Africa, Entertainment, Health, and Politics. The pre-processed datasets were divided into training, development, and test sets using stratified sampling with a ratio of 70:10:20. Appendix A.2 has more details about the dataset size and news topic information.
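
The filtering and splitting steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `stratified_split`, the `(text, labels)` input shape, and the choice to drop categories below the 200-sample threshold are all assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_split(samples, ratios=(0.7, 0.1, 0.2), min_per_category=200, seed=0):
    """Split (text, labels) pairs into train/dev/test sets, category by category.

    Hypothetical helper: `labels` is a list of category names per article.
    """
    # Keep only articles with exactly one label (multi-label items are filtered out).
    by_category = defaultdict(list)
    for text, labels in samples:
        if len(labels) == 1:
            by_category[labels[0]].append(text)

    rng = random.Random(seed)
    train, dev, test = [], [], []
    for category in sorted(by_category):
        texts = by_category[category]
        # Assumption: categories with too few samples are dropped entirely.
        if len(texts) < min_per_category:
            continue
        rng.shuffle(texts)
        n_train = round(len(texts) * ratios[0])
        n_dev = round(len(texts) * ratios[1])
        # Splitting inside each category keeps label proportions equal across splits.
        train += [(t, category) for t in texts[:n_train]]
        dev += [(t, category) for t in texts[n_train:n_train + n_dev]]
        test += [(t, category) for t in texts[n_train + n_dev:]]
    return train, dev, test
```

Splitting within each category, instead of over the pooled data, is what makes the sampling stratified: every split sees each category in roughly the same 70:10:20 proportion.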

topic classification

A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation

Recent advances in the pre-training of language models leverage large-scale datasets to create multilingual models. However, low-resource languages are mostly left out in these datasets. This is primarily because many widely spoken languages are not well represented on the web and therefore excluded from the large-scale crawls used to create datasets. Furthermore, downstream users of these models are restricted to the selection of languages originally chosen for pre-training. This work investigates how to optimally leverage existing pre-trained models to create low-resource translation systems for 16 African languages. We focus on two questions: 1) How can pre-trained models be used for languages not included in the initial pre-training? and 2) How can the resulting translation models effectively transfer to new domains? To answer these questions, we create a new African news corpus covering 16 languages, of which eight languages are not part of any existing evaluation dataset. We demonstrate that the most effective strategy for transferring both to additional languages and to additional domains is to fine-tune large pre-trained models on small quantities of high-quality translation data.

A grammar of Pichi

Pichi is an Afro-Caribbean English-lexifier Creole spoken on the island of Bioko, Equatorial Guinea. It is an offshoot of 19th century Krio (Sierra Leone) and shares many characteristics with West African relatives like Nigerian Pidgin, Cameroon Pidgin, and Ghanaian Pidgin English, as well as with the English-lexifier creoles of the insular and continental Caribbean. This comprehensive description presents a detailed analysis of the grammar and phonology of Pichi. It also includes a collection of texts and wordlists. Pichi features a nominative-accusative alignment, SVO word order, adjective-noun order, prenominal determiners, and prepositions. The language has a seven-vowel system and twenty-two consonant phonemes. Pichi has a two-tone system with tonal minimal pairs, morphological tone, and tonal processes. The morphological structure is largely isolating. Pichi has a rich system of tense-aspect-mood marking, an indicative-subjunctive opposition, and a complex copular system with several suppletive forms. Many features align Pichi with the Atlantic-Congo languages spoken in the West African littoral zone. At the same time, characteristics like the prenominal position of adjectives and determiners show a typological overlap with its lexifier English, while extensive contact with Spanish has left an imprint on the lexicon and grammar as well.

Adapters for African languages -- based on AfroXLMR

MAD-X adapters trained on AfroXLMR-base, which has the same configuration as XLMR-base.

natural language processing

Adapting Pre-trained Language Models to African Languages via Multilingual Adaptive Fine-Tuning

Multilingual pre-trained language models (PLMs) have demonstrated impressive performance on several downstream tasks for both high-resourced and low-resourced languages. However, there is still a large performance drop for languages unseen during pre-training, especially African languages. One of the most effective approaches to adapt to a new language is language adaptive fine-tuning (LAFT): fine-tuning a multilingual PLM on monolingual texts of a language using the pre-training objective. However, adapting to each target language individually takes large disk space and limits the cross-lingual transfer abilities of the resulting models because they have been specialized for a single language. In this paper, we perform multilingual adaptive fine-tuning (MAFT) on the 17 most-resourced African languages and three other high-resource languages widely spoken on the African continent to encourage cross-lingual transfer learning. To further specialize the multilingual PLM, we removed vocabulary tokens from the embedding layer that correspond to non-African writing scripts before MAFT, thus reducing the model size by around 50%. Our evaluation on two multilingual PLMs (AfriBERTa and XLM-R) and three NLP tasks (NER, news topic classification, and sentiment classification) shows that our approach is competitive with applying LAFT on individual languages while requiring significantly less disk space. Additionally, we show that our adapted PLM also improves the zero-shot cross-lingual transfer abilities of parameter-efficient fine-tuning methods.
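
The vocabulary-reduction step mentioned in this abstract can be illustrated with a small sketch. It is only an approximation under stated assumptions: the set of retained scripts in `KEPT_SCRIPTS`, the helper names `token_is_kept` and `prune_vocab`, and the use of Unicode character names to guess a token's script are all choices made for this example, not details taken from the paper.

```python
import unicodedata

# Assumption: scripts to retain. The abstract does not list them.
KEPT_SCRIPTS = ("LATIN", "ETHIOPIC", "ARABIC")

def token_is_kept(token):
    """Return True if every alphabetic character belongs to a retained script."""
    for ch in token:
        if ch.isalpha():
            # Unicode names start with the script, e.g. "CYRILLIC SMALL LETTER A".
            name = unicodedata.name(ch, "")
            if not name.startswith(KEPT_SCRIPTS):
                return False
    # Digits, punctuation, and special markers pass through unchanged.
    return True

def prune_vocab(vocab, embeddings):
    """Drop tokens in other scripts, together with their embedding rows.

    `vocab` is a list of token strings; `embeddings` is a list of rows
    aligned with `vocab` (hypothetical plain-list stand-in for a real matrix).
    """
    kept = [i for i, tok in enumerate(vocab) if token_is_kept(tok)]
    return [vocab[i] for i in kept], [embeddings[i] for i in kept]
```

Because the embedding matrix has one row per vocabulary token, removing rows for unused scripts shrinks the model without touching the transformer layers.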

natural language processing

AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages

This repository contains the code for the paper Small Data? No Problem! Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages which appears in the first workshop on Multilingual Representation Learning at EMNLP 2021. AfriBERTa was trained on 11 languages - Afaan Oromoo (also called Oromo), Amharic, Gahuza (a mixed language containing Kinyarwanda and Kirundi), Hausa, Igbo, Nigerian Pidgin, Somali, Swahili, Tigrinya and Yorùbá. AfriBERTa was evaluated on NER and text classification spanning 10 languages (some of which it was not pretrained on). It outperformed mBERT and XLM-R on several languages and is very competitive overall.

language modeling

AfriSenti

AfriSenti is the largest sentiment analysis dataset for under-represented African languages, covering 110,000+ annotated tweets in 14 African languages (Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese, Nigerian Pidgin, Oromo, Swahili, Tigrinya, Twi, Xitsonga, and Yoruba).

The datasets are used in the first Afrocentric SemEval shared task, SemEval 2023 Task 12: Sentiment analysis for African languages (AfriSenti-SemEval). AfriSenti allows the research community to build sentiment analysis systems for various African languages and enables the study of sentiment and contemporary language use in African languages.

AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages

Africa is home to over 2000 languages from over six language families and has the highest linguistic diversity among all continents. This includes 75 languages with at least one million speakers each. Yet, there is little NLP research conducted on African languages. Crucial in enabling such research is the availability of high-quality annotated datasets. In this paper, we introduce AfriSenti, which consists of 14 sentiment datasets of 110,000+ tweets in 14 African languages (Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese, Nigerian Pidgin, Oromo, Swahili, Tigrinya, Twi, Xitsonga, and Yorùbá) from four language families annotated by native speakers. The data is used in SemEval 2023 Task 12, the first Afro-centric SemEval shared task. We describe the data collection methodology, annotation process, and related challenges when curating each of the datasets. We conduct experiments with different sentiment classification baselines and discuss their usefulness. We hope AfriSenti enables new work on under-represented languages.

AfroLID: A Neural Language Identification Tool for African Languages

AfroLID is a powerful neural toolkit for African language identification which covers 517 African languages.

AfroLID: A Neural Language Identification Tool for African Languages

Language identification (LID) is a crucial precursor for NLP, especially for mining web data. Problematically, most of the world's 7000+ languages today are not covered by LID technologies. We address this pressing issue for Africa by introducing AfroLID, a neural LID toolkit for 517 African languages and varieties. AfroLID exploits a multi-domain web dataset manually curated from across 14 language families utilizing five orthographic systems. When evaluated on our blind Test set, AfroLID achieves an F1 score of 95.89. We also compare AfroLID to five existing LID tools that each cover a small number of African languages, finding it to outperform them on most languages. We further show the utility of AfroLID in the wild by testing it on the acutely under-served Twitter domain. Finally, we offer a number of controlled case studies and perform a linguistically-motivated error analysis that allow us to both showcase AfroLID's powerful capabilities and limitations.

Analysing the effects of transfer learning on low-resourced named entity recognition performance

Transfer learning has led to large gains in performance for nearly all NLP tasks while making downstream models easier and faster to train. This has also been extended to low-resourced languages, with some success. We investigate the properties of transfer learning between 10 low-resourced languages, from the perspective of a named entity recognition task, specifically how much adaptive fine-tuning improves performance, the efficacy of zero-shot transfer as well as the effect of learning on the contextual embeddings computed from the model. Our results give some insight into zero-shot performance as well as the impact of different training schemes and data overlap between the training and testing languages. Particularly, we find that models with the best generalisation to other languages suffer in individual language performance, while models that perform well on a single language often do so at the expense of generalising to others. In the interest of reproducibility, we publicly release our source code and models.

Ancestor-to-Creole Transfer is Not a Walk in the Park

We aim to learn language models for Creole languages for which large volumes of data are not readily available, and therefore explore the potential transfer from ancestor languages (the 'Ancestry Transfer Hypothesis'). We find that standard transfer methods do not facilitate ancestry transfer. Surprisingly, unlike for non-Creole languages, a very distinct two-phase pattern emerges for Creoles: as our training losses plateau and language models begin to overfit on their source languages, perplexity on the Creoles drops. We explore whether this compression phase can lead to practically useful language models (the 'Ancestry Bottleneck Hypothesis'), but falsify this as well. Moreover, we show that Creoles exhibit this two-phase pattern even when training on random, unrelated languages. Thus Creoles seem to be typological outliers, and we speculate whether there is a link between the two observations.

ASR-Nigeria-Pidgin

This is the official repository containing the implementation of an Automatic Speech Recognition system from Nigerian Pidgin to English.

pidgin, nigerian

BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages

We introduce BRIGHTER: a new emotion recognition dataset collection in 28 languages that originate from 7 distinct language families. Many of these languages are considered low-resource, and are mainly spoken in regions characterised by a limited availability of NLP resources (e.g., Africa, Asia, Latin America). Our contributions: (1) A linguistically diverse multilingual dataset: BRIGHTER consists of nearly 100k emotion-annotated instances in 28 languages, predominantly from Africa, Asia, Eastern Europe, and Latin America. The dataset spans 7 language families and covers a variety of domains, including social media, speeches, news, literature, and reviews. Each instance is multi-labeled with one or more of six emotion classes (joy, sadness, anger, fear, surprise, disgust) or as neutral, and annotated with one of four emotion intensity levels, ranging from 0 to 3. (2) Baseline evaluation: We provide an initial set of monolingual and crosslingual experiments, benchmarking Large Language Models (LLMs) for multi-label emotion identification and intensity prediction. Our results highlight the performance disparities across languages, showing that LLMs struggle with perceived emotions in text, especially for low-resource languages, and often perform better when prompted in English.

BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages

People worldwide use language in subtle and complex ways to express emotions. While emotion recognition (an umbrella term for several NLP tasks) significantly impacts different applications in NLP and other fields, most work in the area is focused on high-resource languages. This has led to major disparities in research and proposed solutions, especially for low-resource languages that suffer from the lack of high-quality datasets. In this paper, we present BRIGHTER, a collection of multi-labeled emotion-annotated datasets in 28 different languages. BRIGHTER covers predominantly low-resource languages from Africa, Asia, Eastern Europe, and Latin America, with instances from various domains annotated by fluent speakers. We describe the data collection and annotation processes and the challenges of building these datasets. Then, we report different experimental results for monolingual and crosslingual multi-label emotion identification, as well as intensity-level emotion recognition. We investigate results with and without using LLMs and analyse the large variability in performance across languages and text domains. We show that BRIGHTER datasets are a step towards bridging the gap in text-based emotion recognition and discuss their impact and utility.
