First things to do:
- Join our WhatsApp group to facilitate communication: https://chat.whatsapp.com/F7wUN10vLjuJWwKnApaNNk
- Watch the recording of the info session to learn more about the project.
Background: Web Languages Project
Welcome! This is a crowd-sourced effort to improve the crawling of low-resource languages, and the resulting dataset is public. Common Crawl recognizes many languages, but for many of them it does not have enough content. We are interested in languages from all over the world. If you choose to help, you will be creating lists of websites related to languages that you read or speak.
Why This Matters for Nigeria’s Digital Landscape
Nigeria, with over 500 indigenous languages, is one of the world’s most linguistically diverse nations. While some of our languages, such as Yoruba, Hausa, and Igbo, have an established online presence, many others need better documentation and digital representation. In this short sprint, we aim to create a comprehensive map of Nigerian languages on the web, supporting language preservation and digital inclusion.
What We Need You To Do
Join us for a collaborative documentation sprint focused on discovering and recording online spaces where Nigerian languages and/or cultures live. We’ll document:
- News websites in Nigerian languages
- Cultural and historical resources
- Government services offering local language options
- Educational platforms
- Community forums and social media spaces
- Digital language learning resources
How to Participate
- Go to this form
- Select the Nigerian language of your choice. There are over 500 options, so use the search box to filter for the language you are looking for.
- You will then see several categories. For each category, enter URLs of pages in your chosen language that cover that category. For example, in the “News” section, enter URLs of news websites in your language.
- Add as many links as you can, one per line.
- If you don’t know any links for a section, just leave it blank.
Below is an example of what the form looks like for the “News” section:
What happens to my contributions?
- Your submissions will be added to the Web Languages Project under the corresponding language. All contributions are released under the CC0 license.
- If you indicate that you would like to be credited as a contributor, your name (and link) will be displayed in the GitHub README for the language.
- Your submissions will also be linked on Lanfrica Records, as part of an effort to connect African digital resources.
Remuneration
- This is a voluntary position, so there will be no financial compensation of any kind.
- We are considering issuing certificates if they would be useful to the community.
Deadline
- Friday, 20 December 2024 at 23:59
Join Us
Help us create a digital map of Nigeria’s linguistic diversity! Whether you speak Hausa, Igbo, Yoruba, Ibibio, Edo, Fulfulde, Kanuri, Tiv, or any other Nigerian language, or simply know its culture without speaking it, your knowledge is valuable for this project.
About Lanfrica
Lanfrica catalogues and links African language resources to make African works easier to discover.
About Common Crawl
We believe that everyone should have the opportunity to indulge their curiosities, analyze the world, and pursue brilliant ideas. Small startups or even individuals can now access high-quality crawl data that was previously available only to large search engine corporations.
Researchers, entrepreneurs, and developers gain unrestricted access to a wealth of information, enabling them to explore, analyze, and create novel applications and services. This, in turn, can lead to the development of groundbreaking technologies, data-driven solutions, and insights that would have otherwise been inaccessible or expensive to obtain.
Common Crawl has revolutionized access to web data, providing an open repository that anyone can use. This extensive database allows researchers, developers, and analysts to access vast amounts of web information without the need for costly web crawling or data gathering. The availability of this data fosters innovation and helps drive forward various technological advancements.