This paper describes the CMU Wilderness Multilingual Speech Dataset, a dataset covering over 700 different languages that provides audio, aligned text, and word pronunciations. On average, each language provides around 20 hours of sentence-length transcriptions. We describe our multi-pass alignment techniques and evaluate the results by building speech synthesizers on the aligned data. Most of the resulting synthesizers are good enough for deployment and use. The tools used for this work are released as open source, and instructions on how to apply such alignment to novel languages are given.