Meta has launched SeamlessM4T, a new speech-to-text model that can translate almost 100 languages, as the company continues its effort to build a universal translator.
SeamlessM4T stands for Massively Multilingual and Multimodal Machine Translation, and the company says it can translate speech-to-text and text-to-text in almost 100 languages. For speech-to-speech and text-to-speech actions, it recognizes 100 input languages and converts them to 35 output languages.
It is published under a Creative Commons CC BY-NC 4.0 license, allowing researchers to iterate on it.
Along with SeamlessM4T, Meta has also released the metadata for its SeamlessAlign open translation dataset.
“Building a universal language translator, like the fictional Babel Fish in The Hitchhiker’s Guide to the Galaxy, is challenging because existing speech-to-speech and speech-to-text systems only cover a small fraction of the world’s languages,” Meta said.
The Hitchhiker’s Guide Babel Fish, as envisioned by author Douglas Adams, is a fish that you can place in your ear to instantly understand any language. If you are a Doctor Who fan, you could compare Meta’s tool to the translation matrix in the TARDIS that converts even strange words into English.
Meta said that SeamlessM4T represents “a significant advance” because this new model performs the entire translation task in one go, unlike other large translation models that split translation across different systems.
One of the interesting features of SeamlessM4T, if it works as described, is its supposed ability to recognize when a speaker is code-switching, that is, moving between two or more languages in a sentence. For example, Meta demonstrated in a video that the model immediately differentiates between Hindi, Telugu, and English. I haven’t tested the model, but I frequently code-switch between my two native languages (Filipino and English), as do most people who speak different languages, and from personal experience, it’s not something most AI voice recognition software detects quickly.
SeamlessM4T builds on previous Meta translation models. Last year, Meta launched its No Language Left Behind text-to-text machine translation model, which supported 200 languages. It also developed SpeechMatrix, a dataset for multilingual speech-to-speech translation, and Massively Multilingual Speech for speech recognition. Meta demonstrated its universal voice translator last year, converting spoken Hokkien, a widely used language in China that has no official writing system, into English.
Language translation is important to companies like Meta, which employ thousands of people to moderate a deluge of Facebook and Instagram posts in different languages. Quite often, non-core languages have smaller teams and end up relying on automatic moderation that works poorly with those languages. AI, if given access to a dataset of these smaller languages, can be a tool for companies like Meta to improve moderation.
To build SeamlessM4T, Meta said it redesigned its Fairseq sequence modeling toolkit to create lighter models and handle more data.
While developing SeamlessM4T, Meta said it built a system that identifies toxic or sensitive words. Meta defines toxic words as cases where the “translation may incite hatred, violence, profanity, or abuse.” The goal is to detect when the output translation contains toxicity that was not present in the original material.
“We filtered out imbalanced toxicity in the training data. If the input or the output contained different amounts of toxicity, we removed that training sequence,” Meta said.
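The filtering rule Meta describes can be illustrated with a minimal sketch. This is not Meta's implementation: the function names, the simple word-list matching, and the "equal counts" reading of "different amounts of toxicity" are all assumptions made for illustration.

```python
import string


def toxicity_count(text, toxic_words):
    """Count tokens in `text` that appear in a (hypothetical) toxic word list."""
    tokens = (tok.strip(string.punctuation) for tok in text.lower().split())
    return sum(1 for tok in tokens if tok in toxic_words)


def filter_imbalanced(pairs, source_toxic, target_toxic):
    """Keep only (source, target) pairs with matching toxicity counts.

    Assumption: "different amounts of toxicity" is read as unequal counts
    of listed words on the two sides of the pair.
    """
    return [
        (src, tgt)
        for src, tgt in pairs
        if toxicity_count(src, source_toxic) == toxicity_count(tgt, target_toxic)
    ]
```

Under these assumptions, a pair whose translation contains a listed word that its source does not would be dropped, while pairs that are clean on both sides, or equally toxic on both sides, would be kept.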
The researchers also tried to clean up data sets that mistranslated some profanity so that the model more accurately detects when it is being used.
Meta also says the model can recognize and quantify gender bias in translations. SeamlessM4T can check whether a sentence used a gendered form of a word, say doctor in Spanish, and assign a feminine pronoun in a target language without equivalent gendered grammar if necessary. Taking a similar approach to toxicity, Meta said that SeamlessM4T counts how many times a translation adds gendered words to terms that were not gender-specific in the original language, for example, when it automatically assumes that a doctor is male even though the original language makes no gender distinction.
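The counting idea can be sketched roughly as follows. This is a toy illustration rather than Meta's metric: the function names, the word lists passed in, and the per-pair definition of "added gender" are hypothetical.

```python
import string


def gendered_count(text, gendered_words):
    """Count tokens in `text` that appear in a (hypothetical) gendered word list."""
    tokens = (tok.strip(string.punctuation) for tok in text.lower().split())
    return sum(1 for tok in tokens if tok in gendered_words)


def added_gender_rate(pairs, source_gendered, target_gendered):
    """Fraction of pairs where the translation introduces gender.

    Assumption: a pair counts as "added gender" when the source has no
    listed gendered words but the translation has at least one.
    """
    if not pairs:
        return 0.0
    added = sum(
        1
        for src, tgt in pairs
        if gendered_count(src, source_gendered) == 0
        and gendered_count(tgt, target_gendered) > 0
    )
    return added / len(pairs)
```

A real evaluation would need language-specific morphology rather than flat word lists, but the sketch shows the shape of the comparison: gender that appears only on the output side is what gets counted.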
Meta has been releasing many of its AI models to developers and researchers more or less openly. It recently released AudioCraft, code that enables text-to-sound generation. Meta also provided access to its Llama 2 large language model.