Google's Translate can now listen to a language and turn it into an audio translation in the voice of the original speaker
- The tool can convert language without the need for a text-based process
- It also retains the original speaker's voice in the translated audio clip
- Currently, the Google Translate system uses automatic speech recognition
- This transcribes speech as text, which is then translated into another language
- Now it can translate speech from one language directly into speech in another language, without relying on a text representation in either language
Google has announced a new translation tool that converts one language into another and retains the speaker's original voice.
The tech giant's new system works without first converting the speech to text.
The first-of-its-kind tool is able to do this while retaining the original speaker's voice and making the result sound 'more realistic', says the tech giant.
Google claims that the system, called 'Translatotron', will be able to retain the original speaker's voice after translation and at the same time understand words better.
It can translate speech from one language directly into speech in another language, without relying on an intermediate text representation in either language, as cascade systems require.
'Translatotron is the first end-to-end model that translates speech from one language directly into another,' wrote Google in a blog post.
The Google Translate system currently uses three phases: automatic speech recognition, which transcribes speech as text; machine translation, which translates this text into another language; and text-to-speech synthesis, which uses the translated text to generate speech.
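To make the three-phase cascade concrete, here is a minimal Python sketch of how such a pipeline chains together. Everything in it is a toy stand-in assumed for illustration: "audio" is modelled as a list of spoken words, and the function names and the two-word dictionary are hypothetical, not Google's code.

```python
from typing import List

# Assumption: audio is modelled as a list of spoken words so the sketch
# runs without any speech libraries. A real system works on waveforms.
Audio = List[str]

# Hypothetical toy lexicon standing in for a machine translation model.
TOY_DICTIONARY = {"hola": "hello", "mundo": "world"}


def recognize_speech(audio: Audio) -> str:
    """Phase 1 (automatic speech recognition): transcribe speech as text."""
    return " ".join(audio)


def translate_text(text: str) -> str:
    """Phase 2 (machine translation): translate the text word by word."""
    return " ".join(TOY_DICTIONARY.get(word, word) for word in text.split())


def synthesize_speech(text: str) -> Audio:
    """Phase 3 (text-to-speech): generate speech from the translated text."""
    return text.split()


def cascade_translate(audio: Audio) -> Audio:
    """Run the three phases in order; only text passes between them."""
    transcript = recognize_speech(audio)
    translated = translate_text(transcript)
    return synthesize_speech(translated)


print(cascade_translate(["hola", "mundo"]))
```

Because only text is handed from one phase to the next, nothing about how the original speaker sounded reaches the final phase, and a mistake in an early phase carries through to the later ones.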
The tech giant now says it will use a single model without the need for text.
'This system avoids dividing the task into separate phases,' said the blog post from Google AI software engineers Ye Jia and Ron Weiss.
According to the company, this means a faster translation speed and fewer errors.
The system retains the speaker's voice by using spectrograms, a visual representation of the sound waves, as input.
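For readers unfamiliar with spectrograms, the following Python sketch shows one simple way to compute one: slice the audio into overlapping frames and measure how much of each frequency is present in every frame. It is a plain magnitude spectrogram built with the standard library only; the frame size, hop length and test tone are assumptions chosen for illustration, and the features Google's model actually uses may differ.

```python
import cmath
import math
from typing import List


def magnitude_spectrogram(
    samples: List[float], frame_size: int = 64, hop: int = 32
) -> List[List[float]]:
    """Return one row of frequency-bin magnitudes per overlapping frame."""
    # Hann window: tapers each frame's edges to reduce spectral leakage.
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
        for n in range(frame_size)
    ]
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        row = []
        # Naive discrete Fourier transform, non-negative frequencies only.
        for k in range(frame_size // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            row.append(abs(acc))
        rows.append(row)
    return rows


# Illustrative input: a pure tone that completes 8 cycles per 64 samples.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = magnitude_spectrogram(tone)

# The strongest frequency bin in the first frame should match the tone.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin)
```

Plotting the rows as an image, with time on one axis and frequency on the other, gives the familiar visual representation of the sound.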
HOW DOES TRANSLATOTRON WORK?
Translatotron is based on a sequence-to-sequence network that takes source spectrograms, a visual representation of the sound waves, as input and generates spectrograms of the translated content in the target language.
It also uses two other separately trained components: a neural vocoder that converts output spectrograms to waveforms and, optionally, a speaker encoder that can be used to preserve the character of the source speaker's voice in the synthesized translated speech.
During training, the sequence-to-sequence model uses a multitask objective to predict source and target transcripts at the same time as generating target spectrograms.
However, no transcripts or other intermediate text representations are used during inference.
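The data flow described above can be sketched structurally in Python. This is not Google's implementation: the components are passed in as plain functions, and the names `multitask_loss`, `translate` and `aux_weight`, along with the way the losses are weighted, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Spectrogram = List[List[float]]
Waveform = List[float]


@dataclass
class TrainingLosses:
    target_spectrogram: float  # main task: generate the target spectrogram
    source_transcript: float   # auxiliary task: predict the source text
    target_transcript: float   # auxiliary task: predict the target text


def multitask_loss(losses: TrainingLosses, aux_weight: float = 1.0) -> float:
    """Training only: add the transcript losses to the spectrogram loss.

    aux_weight is a hypothetical knob; the weighting is an assumption.
    """
    auxiliary = losses.source_transcript + losses.target_transcript
    return losses.target_spectrogram + aux_weight * auxiliary


def translate(
    source: Spectrogram,
    seq2seq: Callable[[Spectrogram, Optional[List[float]]], Spectrogram],
    vocoder: Callable[[Spectrogram], Waveform],
    speaker_encoder: Optional[Callable[[Spectrogram], List[float]]] = None,
) -> Waveform:
    """Inference: spectrogram in, waveform out, with no text in between."""
    # The optional speaker encoder summarises the source speaker's voice.
    voice = speaker_encoder(source) if speaker_encoder else None
    target = seq2seq(source, voice)
    return vocoder(target)


# Stub components so the sketch runs; real ones are trained neural networks.
def stub_seq2seq(spec, voice):
    return [row[::-1] for row in spec]


def stub_vocoder(spec):
    return [sum(row) for row in spec]


print(translate([[1.0, 2.0], [3.0, 4.0]], stub_seq2seq, stub_vocoder))
print(multitask_loss(TrainingLosses(1.0, 0.5, 0.25), aux_weight=0.5))
```

The point of the split is that the transcript predictions appear only in the training loss, while the `translate` path never produces or consumes text.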
The model then generates spectrograms of the translated speech, also relying on a neural vocoder and a speaker encoder, which means that the vocal characteristics of the speaker remain the same after translation.
Google admitted that the system must be refined by further training the algorithm.
Sound clips published in the post were 'more realistic' than a typical machine voice, but still unmistakably computer-generated.