Meta has created an AI language model that (in a refreshing change of pace) isn’t a ChatGPT clone. The company’s Massively Multilingual Speech (MMS) project can recognize over 4,000 spoken languages and produce speech (text-to-speech) in over 1,100. As with most of its other publicly announced AI projects, Meta is open-sourcing MMS today to help preserve language diversity and encourage researchers to build on its foundation. “Today, we are publicly sharing our models and code so that others in the research community can build upon our work,” the company wrote. “Through this work, we hope to make a small contribution to preserve the incredible language diversity of the world.”
Speech recognition and text-to-speech models typically require training on thousands of hours of audio with accompanying transcription labels. (Labels are crucial to machine learning, allowing the algorithms to correctly categorize and “understand” the data.) But for languages that aren’t widely used in industrialized nations — many of which are in danger of disappearing in the coming decades — “this data simply does not exist,” as Meta puts it.
Meta used an unconventional approach to collecting audio data: tapping into audio recordings of translated religious texts. “We turned to religious texts, such as the Bible, that have been translated in many different languages and whose translations have been widely studied for text-based language translation research,” the company said. “These translations have publicly available audio recordings of people reading these texts in different languages.” By incorporating these unlabeled recordings of the Bible and similar texts, Meta’s researchers increased the number of languages the model covers to over 4,000.
If you’re like me, that approach may raise your eyebrows at first glance, as it sounds like a recipe for an AI model heavily biased toward Christian worldviews. But Meta says that isn’t the case. “While the content of the audio recordings is religious, our analysis shows that this does not bias the model to produce more religious language,” Meta wrote. “We believe this is because we use a connectionist temporal classification (CTC) approach, which is far more constrained compared with large language models (LLMs) or sequence-to-sequence models for speech recognition.” Furthermore, even though most of the religious recordings were read by male speakers, that didn’t introduce a male bias either: the model performs equally well on female and male voices.
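To make that argument concrete: CTC trains an acoustic model to emit a character (or a special “blank” symbol) for every audio frame and then marginalizes over all possible frame-to-character alignments, so the model can only score transcriptions of the audio it hears rather than generating free-form text the way an LLM decoder can. The minimal PyTorch sketch below illustrates the general CTC training objective Meta names; the dimensions, vocabulary, and tensors are made up for illustration and are not taken from the MMS models.

```python
import torch
import torch.nn as nn

# Hypothetical setup: a small character vocabulary with index 0 reserved
# for the CTC "blank" token. None of these sizes come from MMS.
vocab_size = 32          # assumed alphabet size
time_steps = 50          # number of acoustic frames from the encoder
batch_size = 4
target_len = 12          # length of each reference transcript

# Frame-level log-probabilities from an acoustic encoder (e.g. speech
# features followed by a linear layer); shape (time, batch, vocab),
# which is the layout nn.CTCLoss expects.
log_probs = torch.randn(time_steps, batch_size, vocab_size).log_softmax(dim=-1)

# Reference transcripts as integer token IDs; no frame-level alignment is needed.
targets = torch.randint(1, vocab_size, (batch_size, target_len), dtype=torch.long)
input_lengths = torch.full((batch_size,), time_steps, dtype=torch.long)
target_lengths = torch.full((batch_size,), target_len, dtype=torch.long)

# CTC sums over every valid frame-to-token alignment, so the model only has
# to predict which characters occur in the audio, not invent new content.
ctc_loss = nn.CTCLoss(blank=0)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```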
After training an alignment model to make the data more usable, Meta used wav2vec 2.0, the company’s “self-supervised speech representation learning” model, which can train on unlabeled data. Combining unconventional data sources with a self-supervised speech model led to impressive outcomes. “Our results show that the Massively Multilingual Speech models perform well compared with existing models and cover 10 times as many languages.” Specifically, Meta compared MMS against OpenAI’s Whisper and says MMS exceeded expectations. “We found that models trained on the Massively Multilingual Speech data achieve half the word error rate, but Massively Multilingual Speech covers 11 times more languages.”
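For context on that “half the word error rate” claim: word error rate (WER) counts the word-level substitutions, insertions, and deletions needed to turn a system’s transcript into the reference transcript, divided by the number of words in the reference. The snippet below is a standard textbook implementation of the metric for illustration, not Meta’s evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein edit distance over words, computed with dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Two errors ("sat" -> "sit", dropped "the") against a six-word reference: WER ~= 0.33.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```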
Meta cautions that its new models aren’t perfect. “For example, there is some risk that the speech-to-text model may mistranscribe select words or phrases,” the company wrote. “Depending on the output, this could result in offensive and/or inaccurate language. We continue to believe that collaboration across the AI community is critical to the responsible development of AI technologies.”
Now that Meta has released MMS for open-source research, it hopes the project can help reverse the trend of technology whittling the world’s languages down to the 100 or fewer most often supported by Big Tech. It sees a world where assistive technology, TTS and even VR/AR tech let everyone speak and learn in their native tongues. It said, “We envision a world where technology has the opposite effect, encouraging people to keep their languages alive since they can access information and use technology by speaking in their preferred language.”