Google has announced that its new translation tool will translate speech in one language into another without an intermediate text-based step. The first-of-its-kind tool does this while retaining the voice of the original speaker and sounding 'more realistic'

Google's translation tool can now listen to speech in one language and turn it into an audio translation in the voice of the original speaker

  • The tool can convert language without the need for a text-based process
  • It also retains the original voice of the person in the audio clip of the new language
  • Currently, the Google Translate system starts with automatic speech recognition
  • This transcribes speech into text, which is then translated into another language
  • The new tool can translate speech from one language directly into speech in another language, without relying on a text representation in either language

Google has announced a new translation tool that converts one language into another and retains the speaker's original voice.

The tech giant's new system works without first converting the speech to text.

The first-of-its-kind tool does this while retaining the original speaker's voice and sounding 'more realistic', says the tech giant.

Google claims that the system, called 'Translatotron', will be able to retain the original speaker's voice after translation and at the same time understand words better.

It can translate speech from one language directly into speech in another language, without relying on an intermediate text representation in either language, as cascade systems require.

'Translatotron is the first end-to-end model that translates speech from one language directly into another,' wrote Google in a blog post.

The Google Translate system currently uses three phases: automatic speech recognition, which transcribes speech as text; machine translation, which translates this text into another language; and text-to-speech synthesis, which generates speech from the translated text.
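The three phases described above can be pictured as a chain of functions, each consuming the previous one's output. The sketch below is illustrative only: the function names and the toy stand-in phases are assumptions made for this example and are not Google's code or API.

```python
from typing import Callable, List

Waveform = List[float]  # hypothetical stand-in for raw audio samples


def cascade_translate(
    audio: Waveform,
    recognize: Callable[[Waveform], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str], Waveform],
) -> Waveform:
    """Run the three phases in order; text is the only thing passed between them."""
    source_text = recognize(audio)        # phase 1: speech -> text
    target_text = translate(source_text)  # phase 2: text -> translated text
    return synthesize(target_text)        # phase 3: translated text -> speech


# Toy stand-ins so the sketch runs; real phases would be trained models.
def toy_recognize(audio: Waveform) -> str:
    return "hola" if audio else ""


def toy_translate(text: str) -> str:
    return {"hola": "hello"}.get(text, text)


def toy_synthesize(text: str) -> Waveform:
    # Pretend each character becomes one audio sample.
    return [float(ord(ch)) for ch in text]


result = cascade_translate([0.1, 0.2], toy_recognize, toy_translate, toy_synthesize)
```

Because only text crosses each boundary, anything not captured in the transcript, such as the speaker's voice, is lost after the first phase, and a mistake in one phase is passed on to the next.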

The tech giant now says it will use a single model without the need for text.

'This system avoids dividing the task into separate phases,' said the blog post from Google AI software engineers Ye Jia and Ron Weiss.

According to the company, this means a faster translation speed and fewer errors.

The system retains the speaker's voice by using spectrograms, a visual representation of the sound waves, as input.
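For readers unfamiliar with spectrograms, the sketch below shows one common way to compute one: the audio is cut into short overlapping frames, and the strength of each frequency in each frame is measured. This is a minimal pure-Python illustration with assumed frame sizes, not the feature extraction Google describes using.

```python
import cmath
import math
from typing import List


def magnitude_spectrogram(
    samples: List[float], frame_size: int = 8, hop: int = 4
) -> List[List[float]]:
    """Return one row per time frame; each row holds magnitudes per frequency bin."""
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # Taper the frame with a Hann window to reduce edge effects.
        windowed = [
            x * (0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size))
            for n, x in enumerate(frame)
        ]
        row = []
        # Direct discrete Fourier transform over the non-negative frequency bins.
        for k in range(frame_size // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            row.append(abs(acc))
        rows.append(row)
    return rows


# A pure tone that completes two cycles per 8-sample frame.
tone = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
spec = magnitude_spectrogram(tone)
```

In this toy case every row of `spec` has its largest value at frequency bin 2, matching the tone that was fed in. Real systems use much longer frames and a fast Fourier transform rather than this direct sum.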

HOW DOES TRANSLATOTRON WORK?

Translatotron is based on a sequence-to-sequence network that takes source spectrograms, a visual representation of the sound waves, as input and generates spectrograms of the translated content in the target language.

It also uses two other separately trained components: a neural vocoder that converts output spectrograms to waveforms and, optionally, a speaker encoder that can be used to preserve the character of the source speaker's voice in the synthesized translated speech.

During training, the sequence-to-sequence model uses a multitask objective to predict source and target transcripts at the same time as generating target spectrograms.

However, no transcripts or other intermediate text representations are used during inference.
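The flow of data at inference time, as described above, can be sketched as follows. The type names, function names and toy components are assumptions made for illustration; the real components are trained neural networks.

```python
from typing import Callable, List, Optional

Spectrogram = List[List[float]]  # hypothetical: one list of values per time frame
Waveform = List[float]
Embedding = List[float]


def translate_speech(
    source_spec: Spectrogram,
    seq2seq: Callable[[Spectrogram, Optional[Embedding]], Spectrogram],
    vocoder: Callable[[Spectrogram], Waveform],
    speaker_encoder: Optional[Callable[[Waveform], Embedding]] = None,
    reference_audio: Optional[Waveform] = None,
) -> Waveform:
    """Spectrogram in, waveform out; no text is produced along the way."""
    speaker = None
    if speaker_encoder is not None and reference_audio is not None:
        # Optional step: summarise the source speaker's voice.
        speaker = speaker_encoder(reference_audio)
    target_spec = seq2seq(source_spec, speaker)
    return vocoder(target_spec)


# Toy stand-ins so the sketch runs.
def toy_seq2seq(spec: Spectrogram, speaker: Optional[Embedding]) -> Spectrogram:
    offset = speaker[0] if speaker else 0.0
    return [[value + offset for value in frame] for frame in spec]


def toy_vocoder(spec: Spectrogram) -> Waveform:
    return [sum(frame) for frame in spec]


def toy_speaker_encoder(audio: Waveform) -> Embedding:
    return [sum(audio) / len(audio)]


plain = translate_speech([[1.0, 2.0], [3.0, 4.0]], toy_seq2seq, toy_vocoder)
voiced = translate_speech(
    [[1.0, 2.0], [3.0, 4.0]], toy_seq2seq, toy_vocoder,
    speaker_encoder=toy_speaker_encoder, reference_audio=[0.5, 1.5],
)
```

The speaker encoder is an optional argument here because the article describes it as an optional component; without it, the translation is still produced but nothing steers the output towards the source speaker's voice.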

The system retains the speaker's voice by using spectrograms, a visual representation of the sound waves, as input. It then generates spectrograms of the translated speech, also relying on a neural vocoder and a speaker encoder, which means that the speaker's voice remains the same after translation

It then generates spectrograms of the translated speech, also relying on a neural vocoder and a speaker encoder, which means that the vocal characteristics of the speaker remain the same after translation.

Google admitted that the system must be refined by further training the algorithm.

Sound clips published in the post were 'more realistic' than a typical machine voice, but still unmistakably computer-generated.
