Please, consider giving your feedback on using Lanfrica so that we can know how best to serve you. To get started, .
X

BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages

We introduce BRIGHTER: a new emotion recognition dataset collection in 28 languages that originate from 7 distinct language families. Many of these languages are considered low-resource, and are mainly spoken in regions characterised by a limited availability of NLP resources (e.g., Africa, Asia, Latin America). Our contribuitions: A linguistically diverse multilingual dataset: BRIGHTER consists of nearly 100k emotion-annotated instances in 28 languages, predominantly from Africa, Asia, Eastern Europe, and Latin America. The dataset spans 7 language families and covers a variety of domains, including social media, speeches, news, literature, and reviews. Each instance is multi-labeled with six emotion classes — joy, sadness, anger, fear, surprise, disgust, and neutral — and annotated within four emotion intensity levels, ranging from 0 to 3. Baseline Evaluation: We provide an initial set of monolingual and crosslingual experiments, benchmarking Large Language Models (LLMs) for multi-label emotion identification and intensity prediction. Our results highlight the performance disparities across languages, showing that LLMs struggle with perceived emotions in text, especially for low-resource languages, and often perform better when prompted in English.


Link Other Links

CONNECTED RECORDS

LANGUAGES

TASKS

TAGS