#Translatotron is not a dorky name, it's maybe the best translator ever #GoogleResearch

A couple of days ago, Google presented Translatotron. The name may not be the best, but the idea is amazing:

Google researchers trained a neural network to map audio “voiceprints” from one language to another. After translating the original audio, Translatotron can retain the voice and tone of the original speaker. It converts audio input directly to audio output, without an intermediate text representation.
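To make the "direct audio to audio" idea a bit more concrete, here is a minimal sketch of how the data flows in a conventional cascade (speech recognition, then text translation, then speech synthesis) compared with a single end-to-end model. Everything in it is an assumption for illustration: the function names, the `Spectrogram` type and the toy stand-ins are hypothetical placeholders, not Translatotron's real API or model.

```python
from typing import Callable, List

# Hypothetical type for illustration: a list of frames, each a list of frequency bins.
Spectrogram = List[List[float]]


def cascade_translate(
    audio: Spectrogram,
    recognize: Callable[[Spectrogram], str],
    translate_text: Callable[[str], str],
    synthesize: Callable[[str], Spectrogram],
) -> Spectrogram:
    """Conventional cascade: speech -> text -> translated text -> speech."""
    source_text = recognize(audio)            # speech recognition
    target_text = translate_text(source_text)  # text-to-text translation
    return synthesize(target_text)            # text-to-speech


def direct_translate(
    audio: Spectrogram,
    model: Callable[[Spectrogram], Spectrogram],
) -> Spectrogram:
    """End-to-end: one model maps source speech straight to target speech."""
    return model(audio)


# Toy stand-ins (not real models) so the sketch can be run.
def toy_recognize(audio: Spectrogram) -> str:
    return "hola"


def toy_translate_text(text: str) -> str:
    return {"hola": "hello"}[text]


def toy_synthesize(text: str) -> Spectrogram:
    return [[float(len(text))]]


def toy_model(audio: Spectrogram) -> Spectrogram:
    return [[frame[0] + 1.0] for frame in audio]
```

The cascade needs three separate components and passes through text twice, so anything that only lives in the audio (like the speaker's voice) is lost along the way. The direct version never leaves the audio domain, which is what makes keeping the original voice possible.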

Model architecture of Translatotron.

As usual, the best way to understand this is to see Translatotron in action. Let’s take a look at the following audio samples.

Input (Spanish)
Reference translation (English)
Baseline cascade translation
Translatotron translation (canonical voice)
Translatotron translation (original speaker’s voice)

The full set of audio samples is available here: https://google-research.github.io/lingvo-lab/translatotron/#fisher_1

This is an amazing technology, and also a great starting point for scenarios where it’s important to keep the original speaker’s vocal characteristics. And let me be honest, it’s also scary if you think about fake voice scenarios.

Source: Introducing Translatotron: An End-to-End Speech-to-Speech Translation Model