Hi!
A couple of days ago, Google presented Translatotron. The name isn't great, but the idea is amazing:
Google researchers trained a neural network to map audio "voiceprints" from one language to another. When it translates the original audio, Translatotron retains the voice and tone of the original speaker. It converts audio input directly to audio output, without the usual intermediate text steps (speech recognition, text translation, and then text-to-speech).

As usual, the best way to understand this is to hear Translatotron in action.
A full set of sample audio clips is available here: https://google-research.github.io/lingvo-lab/translatotron/#fisher_1
This is amazing technology, and also a great starting point for scenarios where it's important to preserve the original speaker's vocal characteristics. And let me be honest: it's also scary if you think about fake voice scenarios.
Happy coding!
Greetings @ Toronto
El Bruno
Source: Introducing Translatotron: An End-to-End Speech-to-Speech Translation Model