NLP is progressing fast, but the world tends to be mostly moved by larger trends (like very large language models/datasets or research from prestigious conferences). As a consequence, smaller yet impactful efforts do not get much attention. This talk series provides a platform for anyone to share their efforts in NLP, no matter how small. We especially love work involving one or more African or low-resource languages, but we welcome everything!

Lanfrica Talks

Showcasing efforts in language technologies.



About Lanfrica Talks

Why are we doing this?

The impact of language technologies is growing quickly, but the world tends to be mostly moved by larger trends (like very large language models/datasets or research papers from prestigious conferences). As a consequence, smaller, yet impactful efforts do not get much attention.

The Lanfrica Talks series highlights and showcases language technology efforts (research, projects, software, applications, datasets, models, initiatives, etc.) geared towards under-represented languages around the world! We aim to create an online library of knowledge around language technologies and the advancement of African low-resource languages.

Is this only for African languages?

No.

We are equally interested in efforts targeting (or that can be transferred to) endangered languages and low-resource languages, that is, languages with little data, few societal/research efforts or technologies, and little recognition.

Who can give a talk here?

We try to be as inclusive as possible.
Everyone is welcome: students, beginners and experts in NLP, people from outside NLP, linguists, senior researchers, lecturers, engineers, scientists, etc. We do not place any constraint on your level or field of research.

What can I present/talk about?

There are many things you can give a presentation on.

Here are some examples:
– research you are working on (whether it is finished or a work in progress)
– projects involving African languages that you or your organization/institution have done or are doing
– a pitching session (for example, a 5-minute pitch after which you get the opinions of the audience)
– a dataset you created (no matter how small)
– thesis presentations (Bachelors, Masters, PhD)
– a talk about a critical question, with follow-up discussions
– a coding tutorial (for example, a project you did on translation or ASR from English to an African language)
– a deep dive into some theory
– a linguistics-focused session
– any more ideas you may have

Why should I give a presentation/talk here?

There are several advantages you get from giving a talk here:
– advertising your work/research: you benefit from a diverse audience of students, researchers, engineers, linguists, etc., including both people in your field of research and people in other fields. This enables you to get very useful, diverse feedback. Also, sharing what you are doing is the best way to get the world to know and benefit from your efforts.
– finding collaboration and/or funding opportunities for your project/research/idea.
– getting useful feedback: a unique feature of this space is that you also get to meet indigenous communities in Africa and beyond who might be directly impacted by your project (and therefore are best positioned to give you useful feedback).
– putting your work (or yourself) on the map: as we record all talks and put them on YouTube, your effort will be somewhere easily accessible many years later!
– practicing a conference/workshop talk: this can be a good venue to practice a presentation you plan to give at another venue. Practice makes perfect!

Why should I attend these talks?

– you learn and expand your knowledge: by attending these talks, you get to know about important contributions, ideas, resources, etc. in the space of African languages. For the first time, there is a place for people from different fields (linguistics, science, philosophy, cultural studies, NLP, computer science, etc.) to learn from one another.
– networking opportunities: this offers you the opportunity to meet people from different parts of the world, working on exciting topics involving African languages. All this can lead to getting project/research collaborations, internships and other job opportunities.

I want to present/give a talk. What do I do?

Please reach out (via email) or fill out this form.

We are here to help, so if you have issues or questions, feel free to write to us ([email protected]).

What is the format of each talk/presentation?

Time: the time for each talk is flexible. Depending on what you want to present/talk about, we can do up to 30 minutes for short talks and up to 50 minutes for long talks. This is just an estimate; we will decide together with you what duration works best.

Frequency: our talks are bi-weekly.

Presentation style: we accommodate any style you prefer. We’ve found that using slides with adequate illustrations (pictures, less text) works well.

Recording: a major goal of this talk series is to showcase your work for the general public, now and in the future. Therefore, unless you strongly disagree, we will record the presentation/talk and upload it to our YouTube channel, so that people in the future can benefit from what you have to offer (that is our main goal!).

I have more questions. How do I ask them?

That’s wonderful! We are looking forward to questions.

You can reach us in whichever of the following ways is most convenient for you:
– Join our Slack and ask your questions in the channel called talks. That’s where we meet and discuss things related to Lanfrica Talks.
– Join our mailing list and ask your questions as an email conversation.
– Send an email directly to [email protected]


Upcoming Talk

Abstract

NLP systems are limited by the availability of text data, and because machine-readable text exists only in a few hundred languages, most of the world’s languages are under-represented in modern language technologies.

Text data exists in many more languages! However, it is locked away in printed books and handwritten documents, and training a high-performance optical character recognition (OCR) system to extract the text is challenging for most under-resourced languages.

In this talk, I will describe two methods for improving text recognition in low-resource settings using automatic OCR post-correction. The first is a multi-source encoder-decoder model with structural biases to efficiently learn from limited data. The second is a semi-supervised learning technique that uses raw unlabeled images to improve performance without additional manual annotation. The method combines self-training with automatically derived lexica through the use of weighted finite-state automata (WFSA) to improve post-correction. I will present an empirical evaluation on multiple under-resourced languages to illustrate the effectiveness of the proposed approaches, and discuss future applications in which the extracted texts are used to expand multilingual NLP models to many more languages.

Bio
Shruti Rijhwani is a Research Scientist at Google. She recently graduated with a Ph.D. from Carnegie Mellon University, where her thesis focused on improving optical character recognition for endangered languages (which is what she will be talking about in this presentation!). Her broad research interests lie in improving language technologies for under-represented languages and communities. She was awarded the Bloomberg Fellowship to support her Ph.D. research and was named to the Forbes 30 Under 30 list in Science for her work on building NLP for endangered languages.


Schedule


Recorded Talks

You can either watch our talks on our YouTube channel or listen to the corresponding podcast episodes on your preferred platform. To get the complete visual experience, including the presentation slides, we recommend watching on our YouTube channel.


Meet the organizing team

Chris Emezue divides his time between studying, doing research on structure learning/causal inference at the Mila Quebec AI Institute, ML advocacy at Hugging Face, AfricaNLP research with Masakhane, and improving Lanfrica.
LinkedIn, Twitter

Bonaventure Dossou holds a Bachelor of Science in Mathematics and a Master of Science in Data Engineering. He is a Deep Learning Researcher at the Mila Quebec AI Institute, working under the supervision of Professor Yoshua Bengio. He previously interned at Roche Canada and at Google AI. His research areas include machine and deep learning and their applications in computer vision, natural language processing for healthcare, and African languages.
LinkedIn, Twitter, Facebook

Olanrewaju Samuel is a linguist who aims to create usable datasets for solving NLP problems. He works on the phonetics of linguistic phenomena.
LinkedIn, Twitter