Cookies are used on the Lanfrica website to ensure you get the best experience.
Some words, like “the” or “and” in English, are used a lot in speech and writing. For most Natural Language Processing applications, you will want to remove these very frequent words. This is usually done using a list of “stopwords” which has been complied by hand. This project uses the source texts provided by the African Storybook Project as a corpus and provides a number of tools to extract frequency lists and lists of stopwords from this corpus for the 60+ languages covered by ASP.