{"id":5147,"date":"2026-05-01T19:46:53","date_gmt":"2026-05-01T19:46:53","guid":{"rendered":"https:\/\/lanfrica.com\/blog\/?p=5147"},"modified":"2026-05-01T21:33:15","modified_gmt":"2026-05-01T21:33:15","slug":"licensing-as-a-barrier-to-the-usability-of-african-language-datasets","status":"publish","type":"post","link":"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/","title":{"rendered":"Licensing as a Barrier to the Usability of African Language Datasets"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Artificial intelligence is transforming economies globally and has become part of daily life. It is estimated that within the next decade, AI will contribute nearly $7 trillion to global GDP. However, much of this growth is anticipated in the West, raising concerns about the inequities in AI. Africa, home to over 2,000 diverse languages, currently has only 42 of its languages represented in AI applications. As AI becomes increasingly embedded in critical sectors such as healthcare, agriculture, and education, this linguistic gap leaves billions of users underserved.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">As a result, there has been a growing push to make AI more inclusive and context-aware, ensuring that more people can benefit from it. This push has created a huge demand for diverse, high-quality datasets that reflect local languages and contexts, making African language datasets crucial for inclusive AI. That demand has not gone unanswered. African research communities and organisations have spent the better part of a decade creating and cataloguing language datasets and improving their visibility so that Africa is not left behind in the age of AI. Despite this progress, many of these datasets remain underutilised. 
This is because the challenge is no longer only about creation and visibility, but about whether the datasets can actually be used.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Take the example of a researcher who creates a multilingual dataset for healthcare use in the Democratic Republic of Congo. This dataset features carefully gathered and labelled data in Congolese Swahili, Lingala, and Tshiluba. The researcher makes the dataset publicly available, intending it to be used in healthcare AI tools and built upon by others.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Eventually, a healthcare company or research team, searching for language resources, discovers it. The dataset appears highly relevant and promising. However, upon closer inspection, the company finds that no license is attached. This immediately raises concerns about legal usability. Without clear terms specifying how they can use the dataset, the company ultimately decides to pass on it and continues exploring other options.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Despite its quality and relevance, the dataset goes unused for one simple reason: it carries no license. Unfortunately, this situation is not unique.<\/p>\n\n\n\n<h2 class=\"wp-block-heading has-large-font-size\">Investigating the Barriers to Usability of African Language Datasets<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">To understand what actually prevents African language datasets from being used, we conducted an audit in February 2026 of African language datasets created between 2002 and 2026. The aim was to take a snapshot of the ecosystem and look beyond visibility. 
We wanted to understand not just whether a dataset could be found, but whether it carried the basic information needed for someone to safely and confidently use it.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The audit was supported by <a href=\"https:\/\/www.gsma.com\/\">GSMA<\/a> and implemented by Lanfrica Labs, in collaboration with the <a href=\"https:\/\/www.masakhane.io\/masakhane-african-languages-hub\/about\">Masakhane Research Foundation<\/a>, <a href=\"https:\/\/zindi.africa\/\">Zindi<\/a>, and <a href=\"https:\/\/lelapa.ai\/\">Lelapa AI<\/a>. Since it was not possible to audit every dataset in the ecosystem, we built a representative sample of 300 datasets drawn from different sources.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To build this sample, we drew from different parts of the ecosystem:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Datasets from partner organisations and platforms known to create, host, or support African language resources, including Zindi, Lelapa AI, <a href=\"https:\/\/datacollective.mozillafoundation.org\/datasets\">Mozilla Data Collective<\/a>, and the Masakhane community.<\/li>\n\n\n\n<li>Large African language dataset projects, including <a href=\"https:\/\/docs.lanfrica.com\/insights\/understanding-the-african-next-voices-datasets\">African Next Voices<\/a>, <a href=\"https:\/\/naijavoices.com\/\">NaijaVoices<\/a>, and <a href=\"https:\/\/aclanthology.org\/2023.jlcl-2.1\/\">KenCorpus<\/a>, to understand how major dataset-building efforts handle usability and documentation.<\/li>\n\n\n\n<li>Random samples from <a href=\"https:\/\/lanfrica.com\/en\/discover\">Lanfrica\u2019s catalogue<\/a>, allowing us to pick from other corners of the ecosystem and reduce bias toward only the most popular, well-documented, or widely cited datasets.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">This gave us a broad enough sample to study not only whether African language datasets exist, but whether they are ready to be used in 
real research, development, and AI deployment contexts.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For each dataset, our skilled annotators looked for five core pieces of information: its original source, creation date, publication date, hosting platform, and license type. These metadata were chosen because together they provide important details about a dataset&#8217;s provenance and whether it can be legally reused.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To ensure that the most accurate and complete information was captured, the annotators looked beyond the details immediately visible on dataset hosting platforms. They traced datasets back to their original sources, reviewed project documentation and dataset descriptions, and cross-referenced associated research papers or publications where necessary.<\/p>\n\n\n\n<h2 class=\"wp-block-heading has-large-font-size\">Beneath the surface lies a fragmented licensing ecosystem<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">In our research, we observed that datasets created in the early 2010s were typically shared without explicit licenses. However, around 2017, as awareness of open data began to grow in Africa, more dataset creators started to include licenses when publishing their work. Despite this progress, it hasn&#8217;t been consistent. 
Over the past five years, the visibility of licenses has varied, averaging around 50%, with no clear upward trend in sight.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1920\" height=\"1080\" src=\"https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/4-1.png\" alt=\"License visibility has varied over time, from 2002 to early 2026.\" class=\"wp-image-5185\" srcset=\"https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/4-1.png 1920w, https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/4-1-300x169.png 300w, https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/4-1-1024x576.png 1024w, https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/4-1-768x432.png 768w, https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/4-1-1536x864.png 1536w\" sizes=\"(max-width: 1920px) 100vw, 1920px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" style=\"font-size:28px\">A number of datasets are practically unusable<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">One in every three datasets in our audit did not include a license. This leaves them legally ambiguous, because the absence of licensing information does not carry a universal meaning.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">US-based companies, for example, would typically treat a dataset without an attached license as one whose owner retains all rights, so reuse requires explicit permission. In contrast, within the EU, the same situation might be considered an implicit dedication to the public domain. This ambiguity can discourage would-be users from reusing the data at all.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The consequences go beyond legal implications. Missing licenses can slow down model development, data sharing, benchmarking, and the creation of more robust datasets. 
Every unlicensed dataset is a valuable resource that the larger AI community cannot safely build on.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" style=\"font-size:24px\">  What leads to license omissions?<\/h4>\n\n\n\n<h4 class=\"wp-block-heading\" style=\"font-size:22px\"><em>     1. Platform design gaps<\/em><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Some commonly used hosting platforms lack guidance for dataset creators on ensuring data usability. GitHub, for example, does not require a license field or provide structured metadata or documentation frameworks.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Microsoft\u2019s AI for Good Research Lab conducted an audit on data uploaded to GitHub between 2008 and 2023 and found that 75% of the datasets lacked clear licensing. With GitHub being the largest hosting platform in our sample, accounting for 93 of the 292 datasets, this pattern is likely reflected in our sample. This highlights how platform design choices, such as the absence of mandatory licensing fields, can systematically contribute to missing licenses at scale.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" style=\"font-size:22px\"><em>    2. Legal uncertainty on ownership<\/em><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Dataset creators are often uncertain about their legal rights to license the data they\u2019ve compiled. This is particularly common for datasets compiled from web sources, crowdsourcing platforms, or speech participants. Where copyright rules are unclear, inconsistent, or poorly enforced, determining ownership and licensing rights becomes difficult. As a result, creators may opt to forgo specifying licenses entirely, which can lead to significant legal ambiguities.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" style=\"font-size:22px\"><em>    3. 
Misconceptions About \u201cOpen\u201d Data<\/em><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Some researchers, with the intention to encourage wider collaboration and reuse, assume that omitting a license makes their dataset freely usable. However, in practice, the absence of a license creates legal ambiguity rather than openness, unintentionally restricting its use.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" style=\"font-size:28px\">Commonly used licenses<\/h3>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" width=\"1920\" height=\"1080\" src=\"https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/image-1-1.png\" alt=\"\" class=\"wp-image-5186\" srcset=\"https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/image-1-1.png 1920w, https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/image-1-1-300x169.png 300w, https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/image-1-1-1024x576.png 1024w, https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/image-1-1-768x432.png 768w, https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/image-1-1-1536x864.png 1536w\" sizes=\"(max-width: 1920px) 100vw, 1920px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The licenses used ranged from fully open to permissive and proprietary, reflecting the diverse intentions and contexts of dataset creators. Open and permissive licenses were the most common. Creative Commons alone accounted for 143 out of 202 licensed datasets, with the MIT License appearing in 13 and Apache 2.0 in 9. These licenses reduce legal barriers by enabling broad access, reuse, and modification, forming the foundation of open AI ecosystems.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">At the other end of the spectrum, more restrictive frameworks were also present. 
The Linguistic Data Consortium license appeared in 19 datasets, IARPA Babel licenses in 4, and 3 datasets used custom arrangements, each carrying specific conditions that limit how and by whom the data can be used.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Even among datasets that do carry licenses, the lack of standardisation creates its own barrier. Developers building AI systems often combine multiple datasets to improve accuracy and robustness, and navigating conflicting usage terms adds unnecessary complexity to a process that is already technically demanding.<\/p>\n\n\n\n<figure class=\"styled-quote\">\n  <blockquote>\n    <p>\n      Across the Zindi community, we\u2019ve seen firsthand that the challenge is no longer just building African language datasets\u2014it&#8217;s making them truly usable. Without clear, consistent licensing, even the most valuable datasets risk sitting idle instead of powering research and real-world AI solutions.\n    <\/p>\n  <\/blockquote>\n\n  <figcaption>\n    &#8211; Megan Yates, CTO &amp; Co-Founder, Zindi\n\n  <\/figcaption>\n<\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" style=\"font-size:28px\">The communities behind Africa\u2019s NLP progress<\/h3>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" width=\"1920\" height=\"1080\" src=\"https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/image-2-1.png\" alt=\"The vibrant ecosystem of dataset creators in Africa.\" class=\"wp-image-5188\" srcset=\"https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/image-2-1.png 1920w, https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/image-2-1-300x169.png 300w, https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/image-2-1-1024x576.png 1024w, https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/image-2-1-768x432.png 768w, https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/image-2-1-1536x864.png 1536w\" sizes=\"(max-width: 1920px) 100vw, 1920px\" \/><\/figure>\n\n\n\n<p 
class=\"wp-block-paragraph\">Since 2019, grassroots research communities have been taking the initiative to build the tools and resources that form the backbone of the African NLP ecosystem. That ecosystem has continued to grow, with more communities and researchers collaborating to address the scarcity, quality, and evolution of African language resources.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Our audit identified over 70 distinct dataset creators among the 292 datasets we examined. These creators include globally recognised open data initiatives such as <a href=\"https:\/\/commonvoice.mozilla.org\/en\">Mozilla Common Voice<\/a>, an open-source voice dataset powered by the voices of volunteer contributors around the world, and the Linguistic Data Consortium, which supports language-related research and technology development by producing and distributing linguistic resources.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We also found African-driven NLP communities like Masakhane, which focuses on machine translation research and has created resources for more than 30 African languages through open-source collaboration, and Zindi, which hosts regular challenges aimed at solving real NLP problems on the continent. Also noteworthy were universities across three continents and smaller specialised organisations that are diligently working to place African languages at the forefront of AI development.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The spirit and resolve are clearly there. What is missing is the infrastructure, governance, and policy support needed to turn this work into sustainable, usable, and reproducible datasets.<\/p>\n\n\n\n<h2 class=\"wp-block-heading has-large-font-size\">What is the way forward?<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The global market for licensed AI training data was valued at around $4.8 billion in 2025 and is expected to grow to over $22.6 billion by 2034. 
Yet Africa, Latin America, and the Middle East together account for only 9.1% of this market, despite rising demand for diverse and multilingual datasets. Without deliberate action, Africa risks remaining on the margins of an economy it is already helping to build while capturing little of its value.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Closing this gap requires coordinated action among researchers, platforms, organisations, and communities.<\/p>\n\n\n\n<figure class=\"styled-quote\">\n  <blockquote>\n    <p>\n      The African language gap in AI is solvable, but not by any single company acting alone. Through the African AI Language Initiative, the GSMA is convening the operators, developers, and data partners who can turn fragmented efforts into shared infrastructure \u2014 and those who move now will help shape a market of more than a billion users for the next decade.\n    <\/p>\n  <\/blockquote>\n\n  <figcaption>\n    &#8211; Louis Powell, Director \u2013 AI Technologies @GSMA\n\n  <\/figcaption>\n<\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" style=\"font-size:24px\"><strong>Improving Awareness of Data Ownership and Licensing<\/strong><\/h3>\n\n\n\n<ol class=\"wp-block-list\"><\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">One of the clearest findings from this audit is that missing licenses are often the result of a lack of knowledge rather than a deliberate choice. Many dataset creators are unaware of the available licensing frameworks and, more importantly, of how licensing decisions affect data ownership.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This is why licensing awareness must go beyond usability alone and empower dataset creators to make informed decisions about ownership, control, and long-term value. Initiatives from communities such as Masakhane, Zindi, and Mozilla are starting to move the ecosystem in this direction. 
However, broader support and awareness are still needed.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" style=\"font-size:24px\"><strong>Collaborating with Hosting Platforms to Improve Dataset Documentation<\/strong><\/h3>\n\n\n\n<ol start=\"2\" class=\"wp-block-list\"><\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">African language datasets are hosted across a wide range of platforms, and understanding where creators tend to publish is an important step toward improving dataset documentation and licensing. Our audit revealed some of the major repositories for African language datasets.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1920\" height=\"1080\" src=\"https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/image-3-1.png\" alt=\"Where African dataset creators host their datasets\" class=\"wp-image-5187\" srcset=\"https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/image-3-1.png 1920w, https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/image-3-1-300x169.png 300w, https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/image-3-1-1024x576.png 1024w, https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/image-3-1-768x432.png 768w, https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/image-3-1-1536x864.png 1536w\" sizes=\"(max-width: 1920px) 100vw, 1920px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">These platforms handle documentation and licensing differently, which creates friction for dataset creators, particularly because there is often a lack of clear guidance to help navigate these differences. As a result, inconsistencies in licensing and documentation crop up, ultimately affecting dataset usability.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To address this, research groups and organisations should work with these repositories to improve documentation processes. 
This can be done by introducing guided prompts on licensing options and documentation standards at the point of upload, which could reduce the occurrence of missing or unclear licenses.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" style=\"font-size:24px\"><strong>Supporting Mutually Beneficial Licensing Initiatives<\/strong><\/h3>\n\n\n\n<ol start=\"3\" class=\"wp-block-list\"><\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">Current licensing frameworks were not designed for the collaborative, community-led reality of African language dataset creation. Much of this work happens under financial and technical constraints, yet the communities involved are rarely recognised or rewarded when their data is used.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This makes it urgent to support new licensing approaches that not only promote open access but also focus on data ownership, fairness, and long-term economic benefits for African researchers and communities.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/licensingafricandatasets.com\/\">The NOODL (Nwulite Obodo Open Data License)<\/a> aims to balance open access with community benefit, ensuring contributors are not excluded from the value their data creates. <a href=\"https:\/\/aclanthology.org\/2025.acl-long.1487.pdf\">The Esethu Framework<\/a>, developed by Lelapa AI in collaboration with <a href=\"https:\/\/waywithwords.ai\/\">Way With Words<\/a> and <a href=\"https:\/\/www.dsfsi.co.za\/\">Data Science for Social Impact (DSFSI)<\/a>, introduces an economic model where licensing revenue is reinvested into dataset expansion, supporting both data growth and community returns. 
Additionally, projects such as NaijaVoices have adopted custom licensing models that protect community interests while enabling reuse of datasets.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Supporting and funding these approaches allows African researchers to be active participants in the AI value chain.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" style=\"font-size:24px\"><strong>Harmonising Copyright Law Regionally<\/strong><\/h3>\n\n\n\n<ol start=\"4\" class=\"wp-block-list\"><\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">Copyright laws vary significantly across African countries, which creates uncertainty for both dataset creators and users. This fragmentation makes it difficult to understand ownership, reuse rights, and cross-border data sharing, especially for datasets that span multiple languages and jurisdictions.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Addressing this will require stronger coordination at the regional level, building on the African Union Data Policy Framework\u2019s focus on trust, interoperability, and coordination. This would reduce uncertainty, make cross-border collaboration easier, and support safer, more reliable reuse of shared language resources.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Artificial intelligence is transforming economies globally and has become part of daily life. It is estimated that within the next decade, AI will contribute nearly $7 trillion to global GDP. However, much of this growth is anticipated in the West, raising concerns about the inequities in AI. 
Africa, home to over 2,000 diverse languages, currently [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":5171,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[89,1],"tags":[112,50,114,115,113],"class_list":["post-5147","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-community","category-lanfrica","tag-african-language-datasets","tag-african-languages","tag-data-ownership","tag-gsma","tag-licensing"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.9 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Licensing as a Barrier to the Usability of African Language Datasets - Lanfrica Blog<\/title>\n<meta name=\"description\" content=\"Our study into African language datasets found a key critical piece that is affecting how and if they can be used in real-world applications\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Licensing as a Barrier to the Usability of African Language Datasets - Lanfrica Blog\" \/>\n<meta property=\"og:description\" content=\"Our study into African language datasets found a key critical piece that is affecting how and if they can be used in real-world applications\" \/>\n<meta property=\"og:url\" content=\"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/\" \/>\n<meta property=\"og:site_name\" content=\"Lanfrica Blog\" \/>\n<meta property=\"article:published_time\" content=\"2026-05-01T19:46:53+00:00\" \/>\n<meta 
property=\"article:modified_time\" content=\"2026-05-01T21:33:15+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/The-missing-piece-in-usability-of-african-language-datasets.jpeg\" \/>\n\t<meta property=\"og:image:width\" content=\"2240\" \/>\n\t<meta property=\"og:image:height\" content=\"1260\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Lanfrica\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@lanfrica\" \/>\n<meta name=\"twitter:site\" content=\"@lanfrica\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Lanfrica\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"11 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/\"},\"author\":{\"name\":\"Lanfrica\",\"@id\":\"https:\/\/lanfrica.com\/blog\/#\/schema\/person\/6106dfacf7d12e2e642e8b269bc6d08d\"},\"headline\":\"Licensing as a Barrier to the Usability of African Language 
Datasets\",\"datePublished\":\"2026-05-01T19:46:53+00:00\",\"dateModified\":\"2026-05-01T21:33:15+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/\"},\"wordCount\":2189,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/lanfrica.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/The-missing-piece-in-usability-of-african-language-datasets.jpeg\",\"keywords\":[\"African language datasets\",\"african languages\",\"Data ownership\",\"GSMA\",\"Licensing\"],\"articleSection\":[\"Community\",\"Lanfrica\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/\",\"url\":\"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/\",\"name\":\"Licensing as a Barrier to the Usability of African Language Datasets - Lanfrica Blog\",\"isPartOf\":{\"@id\":\"https:\/\/lanfrica.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/The-missing-piece-in-usability-of-african-language-datasets.jpeg\",\"datePublished\":\"2026-05-01T19:46:53+00:00\",\"dateModified\":\"2026-05-01T21:33:15+00:00\",\"description\":\"Our study into 
African language datasets found a key critical piece that is affecting how and if they can be used in real-world applications\",\"breadcrumb\":{\"@id\":\"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/#primaryimage\",\"url\":\"https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/The-missing-piece-in-usability-of-african-language-datasets.jpeg\",\"contentUrl\":\"https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/The-missing-piece-in-usability-of-african-language-datasets.jpeg\",\"width\":2240,\"height\":1260,\"caption\":\"The missing piece in usability of African language datasets\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/lanfrica.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Licensing as a Barrier to the Usability of African Language Datasets\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/lanfrica.com\/blog\/#website\",\"url\":\"https:\/\/lanfrica.com\/blog\/\",\"name\":\"Lanfrica 
Blog\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/lanfrica.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/lanfrica.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/lanfrica.com\/blog\/#organization\",\"name\":\"Lanfrica\",\"url\":\"https:\/\/lanfrica.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lanfrica.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/ww2.lanfrica.com\/wp-content\/uploads\/2022\/05\/cropped-favicon-1.png\",\"contentUrl\":\"https:\/\/ww2.lanfrica.com\/wp-content\/uploads\/2022\/05\/cropped-favicon-1.png\",\"width\":512,\"height\":512,\"caption\":\"Lanfrica\"},\"image\":{\"@id\":\"https:\/\/lanfrica.com\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/lanfrica\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/lanfrica.com\/blog\/#\/schema\/person\/6106dfacf7d12e2e642e8b269bc6d08d\",\"name\":\"Lanfrica\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lanfrica.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/83f4a38b43593dcd229a29dc1c23bbc0ca0dd5cc875c1dca530aae3be531325e?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/83f4a38b43593dcd229a29dc1c23bbc0ca0dd5cc875c1dca530aae3be531325e?s=96&d=mm&r=g\",\"caption\":\"Lanfrica\"},\"sameAs\":[\"https:\/\/lanfrica.com\/blog\"],\"url\":\"https:\/\/lanfrica.com\/blog\/author\/lanfrica\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Licensing as a Barrier to the Usability of African Language Datasets - Lanfrica Blog","description":"Our study into African language datasets found a key critical piece that is affecting how and if they can be used in real-world applications","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/","og_locale":"en_US","og_type":"article","og_title":"Licensing as a Barrier to the Usability of African Language Datasets - Lanfrica Blog","og_description":"Our study into African language datasets found a key critical piece that is affecting how and if they can be used in real-world applications","og_url":"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/","og_site_name":"Lanfrica Blog","article_published_time":"2026-05-01T19:46:53+00:00","article_modified_time":"2026-05-01T21:33:15+00:00","og_image":[{"width":2240,"height":1260,"url":"https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/The-missing-piece-in-usability-of-african-language-datasets.jpeg","type":"image\/jpeg"}],"author":"Lanfrica","twitter_card":"summary_large_image","twitter_creator":"@lanfrica","twitter_site":"@lanfrica","twitter_misc":{"Written by":"Lanfrica","Est. 
reading time":"11 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/#article","isPartOf":{"@id":"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/"},"author":{"name":"Lanfrica","@id":"https:\/\/lanfrica.com\/blog\/#\/schema\/person\/6106dfacf7d12e2e642e8b269bc6d08d"},"headline":"Licensing as a Barrier to the Usability of African Language Datasets","datePublished":"2026-05-01T19:46:53+00:00","dateModified":"2026-05-01T21:33:15+00:00","mainEntityOfPage":{"@id":"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/"},"wordCount":2189,"commentCount":0,"publisher":{"@id":"https:\/\/lanfrica.com\/blog\/#organization"},"image":{"@id":"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/#primaryimage"},"thumbnailUrl":"https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/The-missing-piece-in-usability-of-african-language-datasets.jpeg","keywords":["African language datasets","african languages","Data ownership","GSMA","Licensing"],"articleSection":["Community","Lanfrica"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/","url":"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/","name":"Licensing as a Barrier to the Usability of African Language Datasets - Lanfrica 
Blog","isPartOf":{"@id":"https:\/\/lanfrica.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/#primaryimage"},"image":{"@id":"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/#primaryimage"},"thumbnailUrl":"https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/The-missing-piece-in-usability-of-african-language-datasets.jpeg","datePublished":"2026-05-01T19:46:53+00:00","dateModified":"2026-05-01T21:33:15+00:00","description":"Our study into African language datasets found a key critical piece that is affecting how and if they can be used in real-world applications","breadcrumb":{"@id":"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/#primaryimage","url":"https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/The-missing-piece-in-usability-of-african-language-datasets.jpeg","contentUrl":"https:\/\/lanfrica.com\/blog\/wp-content\/uploads\/2026\/05\/The-missing-piece-in-usability-of-african-language-datasets.jpeg","width":2240,"height":1260,"caption":"The missing piece in usability of African language datasets"},{"@type":"BreadcrumbList","@id":"https:\/\/lanfrica.com\/blog\/licensing-as-a-barrier-to-the-usability-of-african-language-datasets\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/lanfrica.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Licensing as a Barrier to the Usability of African Language 
Datasets"}]},{"@type":"WebSite","@id":"https:\/\/lanfrica.com\/blog\/#website","url":"https:\/\/lanfrica.com\/blog\/","name":"Lanfrica Blog","description":"","publisher":{"@id":"https:\/\/lanfrica.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/lanfrica.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/lanfrica.com\/blog\/#organization","name":"Lanfrica","url":"https:\/\/lanfrica.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lanfrica.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/ww2.lanfrica.com\/wp-content\/uploads\/2022\/05\/cropped-favicon-1.png","contentUrl":"https:\/\/ww2.lanfrica.com\/wp-content\/uploads\/2022\/05\/cropped-favicon-1.png","width":512,"height":512,"caption":"Lanfrica"},"image":{"@id":"https:\/\/lanfrica.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/lanfrica"]},{"@type":"Person","@id":"https:\/\/lanfrica.com\/blog\/#\/schema\/person\/6106dfacf7d12e2e642e8b269bc6d08d","name":"Lanfrica","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lanfrica.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/83f4a38b43593dcd229a29dc1c23bbc0ca0dd5cc875c1dca530aae3be531325e?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/83f4a38b43593dcd229a29dc1c23bbc0ca0dd5cc875c1dca530aae3be531325e?s=96&d=mm&r=g","caption":"Lanfrica"},"sameAs":["https:\/\/lanfrica.com\/blog"],"url":"https:\/\/lanfrica.com\/blog\/author\/lanfrica\/"}]}},"_links":{"self":[{"href":"https:\/\/lanfrica.com\/blog\/wp-json\/wp\/v2\/posts\/5147","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lanfrica.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lanfrica.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"
author":[{"embeddable":true,"href":"https:\/\/lanfrica.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/lanfrica.com\/blog\/wp-json\/wp\/v2\/comments?post=5147"}],"version-history":[{"count":27,"href":"https:\/\/lanfrica.com\/blog\/wp-json\/wp\/v2\/posts\/5147\/revisions"}],"predecessor-version":[{"id":5190,"href":"https:\/\/lanfrica.com\/blog\/wp-json\/wp\/v2\/posts\/5147\/revisions\/5190"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/lanfrica.com\/blog\/wp-json\/wp\/v2\/media\/5171"}],"wp:attachment":[{"href":"https:\/\/lanfrica.com\/blog\/wp-json\/wp\/v2\/media?parent=5147"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lanfrica.com\/blog\/wp-json\/wp\/v2\/categories?post=5147"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lanfrica.com\/blog\/wp-json\/wp\/v2\/tags?post=5147"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}