Lanfrica Data Governance Policy

Last updated: 7th May, 2025

A. Introduction

This Data Governance Policy is meant to help Lanfrica users and resource/data owners understand: what African language resources we link and catalogue on Lanfrica; why we link and catalogue them; how we do so, i.e., how we identify and select the links we provide; how we protect the resources; and how we ensure ethical handling of the resources linked/catalogued on Lanfrica while fostering a community-driven approach to African language resource sharing.

B. About Lanfrica

Our digital world is a rich tapestry of ideas, languages, cultures, and knowledge. However, our access to and understanding of these resources is skewed: some gain significant visibility, while others remain under-represented and obscure (even when available on the web). Our understanding is largely defined by what's findable. In today's fast-paced digital age, online discoverability is essential: if information cannot be found, it is often perceived as nonexistent and consequently under-utilized. Lanfrica connects and organizes under-represented knowledge. It connects and aggregates hidden but valuable digital resources from diverse sources, making them findable. In so doing, Lanfrica accelerates the discovery and utilization of language resources, which is key to unlocking breakthroughs in science and language technology for under-served language communities.

Our strategy is to aggregate "resources" from multiple knowledge sources (e.g. repositories such as arXiv, AfricaArXiv, and Zenodo) to ensure a broad collection of African language data. We work to support NLP research, data accessibility, and responsible data governance for African languages. We do this to reduce the difficulty of discovering African language resources by creating a centralised hub. If a researcher, for example, is looking for a natural language dataset in a particular language, Lanfrica will point them to the different sources on the web that have such datasets in that language.

C. Scope of Policy and Terms

This Data Governance Policy applies to:
  1. Lanfrica users i.e., any individual or entity accessing and utilizing the catalogued resources for research, development, or other purposes.
  2. Data/resource owners whose resources/datasets have been linked/catalogued on the platform.
  3. Lanfrica platform administrators responsible for curating, managing, and ensuring compliance with data governance principles.
  4. Partner organisations i.e., entities collaborating with Lanfrica to expand and enhance the platform's data infrastructure.

Terms used in this Data Governance Policy shall have the following meanings:
  • "Resources" means the resources listed in Part D.
  • "User" means a person who uses the Lanfrica platform.
  • "NLP Researcher" means a person who creates algorithms and models that allow machines to comprehend, process, and produce human language, facilitating interaction between humans and computers.

D. Resources Linked in Lanfrica

Lanfrica aggregates and organizes the following resources:
  1. African linguistic datasets
  2. African language models
  3. Text corpora
  4. African language dictionaries
  5. African language translation tools
  6. African language learning tools
  7. African language libraries
  8. African language news sites
  9. African language Bible translations
  10. Computational linguistic publications
  11. Publications on sociological, legal, and political topics
  12. Media coverage on African languages
  13. African language policy documents

E. Methodology Applied in Linking and Cataloguing Resources

To curate and identify African language resources for Lanfrica, we begin the process by defining resource categories to ensure a clear scope. We classify resources into corpora (text, speech, and multimodal data), NLP models, lexical resources (dictionaries and word embeddings), research papers, and practical tools such as APIs and spell checkers. This categorization helps us to systematically organise and link relevant materials on the platform.

The next step is source identification and data collection. This involves gathering resources from various reputable sources. These include: academic databases like Google Scholar, ACL Anthology, and arXiv; institutional repositories such as those of universities, research centers, and AI labs; government and NGO reports; and open-source platforms such as Hugging Face and GitHub. Community contributions from researchers, linguists, and NLP practitioners are also essential, as are insights from social media and blogs where language research is discussed.

Once resources are identified, we undertake verification and quality assessment to ensure credibility and usefulness. This process includes checking the authenticity of the source, evaluating the relevance of the resource to African languages, and verifying language coverage, especially for underrepresented languages. Additionally, licensing terms and accessibility are reviewed to ensure ethical usage.

To maintain consistency, we structure data and standardize metadata: each resource is tagged with key metadata such as language(s) covered, source, license type, availability, and last updated date.
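
As an illustration only, the metadata tagging described above could be sketched in Python as follows. This is a minimal sketch and not Lanfrica's actual schema: the class names, the `title` and `url` fields, the `missing_fields` helper, and the example record are all hypothetical. Only the category names and the five metadata fields (languages, source, license type, availability, last updated date) come from this policy.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Category(Enum):
    """Resource categories named in Part E (identifiers are illustrative)."""
    CORPUS = "corpus"
    NLP_MODEL = "nlp_model"
    LEXICAL_RESOURCE = "lexical_resource"
    RESEARCH_PAPER = "research_paper"
    TOOL = "tool"


@dataclass
class ResourceRecord:
    """One catalogued link plus its standardized metadata (hypothetical schema)."""
    title: str              # assumption: not listed in the policy
    url: str                # assumption: link to the external source
    category: Category
    languages: list[str]    # language(s) covered
    source: str             # hosting repository or platform
    license_type: str
    availability: str
    last_updated: date

    def missing_fields(self) -> list[str]:
        """Return the names of metadata fields that are still empty."""
        checks = {
            "languages": self.languages,
            "source": self.source.strip(),
            "license_type": self.license_type.strip(),
            "availability": self.availability.strip(),
        }
        return [name for name, value in checks.items() if not value]


# Hypothetical record with the license not yet reviewed.
record = ResourceRecord(
    title="Example Yoruba news corpus",
    url="https://example.org/yoruba-news",
    category=Category.CORPUS,
    languages=["Yoruba"],
    source="Zenodo",
    license_type="",
    availability="open",
    last_updated=date(2025, 5, 7),
)
print(record.missing_fields())
```

A check of this kind would let a curator spot records whose license or availability still needs review before the link is published.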

F. Ownership of Resources

We acknowledge and respect the collaborative efforts and creative contributions of data owners - communities, researchers, institutions, contributors, linguists, data scientists and users whose resources have been linked and catalogued on the platform. Accordingly, we do not claim ownership of the resources linked/catalogued on the Lanfrica platform; they remain the property of the original data providers. Our role is to facilitate broader accessibility for the African community. If a data contributor requests removal of their content, we will process the request in a timely manner.

G. Intellectual Property (Limitation of Liability)

Lanfrica only links and catalogues African language resources hosted on other platforms/websites. For this reason, we will not be liable for copyright infringement by reason of linking users to online resources containing infringing material or infringing activity, if:

  • We do not have actual knowledge that the resource or activity is infringing;
  • In the absence of such knowledge, we are not aware of facts or circumstances from which infringing activity is apparent;
  • Upon obtaining such knowledge or awareness, we act expeditiously to remove or disable the resource;
  • We do not receive a financial benefit directly attributable to the infringing activity;
  • Upon notification of claimed infringement, we respond expeditiously to remove, or disable access to, the resource that is claimed to be infringing or to be the subject of infringing activity.

Additionally, we do not claim copyright ownership of the resources linked and catalogued on Lanfrica. Our platform operates as an aggregator that provides links to external sources and repositories. For this reason, copyright ownership remains with the original authors, publishers, or hosting institutions.

We reserve the right to remove any content that is found to violate intellectual property laws or is subject to a valid takedown request. Additionally, users who access and utilize resources from the platform must adhere to the licensing terms set by the original data providers.

H. Contributor Responsibilities

Contributors who share resources with Lanfrica must ensure they have the necessary rights and permissions to do so. This includes verifying that the data does not infringe on any third-party intellectual property rights or other rights and that it complies with applicable laws, including those related to privacy and data protection.

I. Lanfrica Roles and Responsibilities

Lanfrica has several key roles and responsibilities regarding the resources linked and catalogued on its platform, ensuring ethical, legal, and transparent data governance. These include:

  • Maintaining Data Integrity and Accuracy - We are responsible for verifying metadata, ensuring correct attribution, and updating resource links to maintain accuracy and usability.
  • Handling Removal Requests - When data contributors request removal of their resources, we are responsible for responding in a timely and fair manner.

J. Data Owners' Rights

Data owners can decide whether they want their resources linked on Lanfrica. If you wish to have your resource delinked from the platform, you can contact us at [email protected]. We will remove your resource from our platform or refrain from linking it.