Adobe’s Project VoCo, two years in the making, is designed to make audio editing “really easy for the average person,” according to Zeyu Jin, an audio researcher and intern at Adobe’s Creative Technologies Lab, as reported by Mental Floss.
The program debuted as one of 11 experimental projects at Adobe Sneaks, an event where the company shows off new technology “that doesn’t have a place in a product yet—or may never,” as Adobe Senior Research Scientist Stephen DiVerdi explains it.
Project VoCo needs only an audio sample and a transcript of the recording; you then edit the transcript and let the program handle the audio, instead of cropping and stitching the recording together yourself. If you need to edit out curses or misspoken words, it’s just a matter of searching the text of the transcript.
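The core idea of text-driven editing can be illustrated with a toy sketch. This is not Adobe’s actual implementation; it just assumes (hypothetically) that each transcript word has been aligned to a time span in the recording, so that deleting a word from the text tells you exactly which audio segment to cut:

```python
# Toy sketch of transcript-driven audio editing (NOT Project VoCo's
# real code): a hypothetical word-level alignment maps each transcript
# word to its (start, end) time span in the recording in seconds.
alignment = [
    ("the", 0.0, 0.2),
    ("quick", 0.2, 0.5),
    ("darn", 0.5, 0.8),   # the word we want to edit out
    ("fox", 0.8, 1.1),
]

def segments_to_keep(alignment, words_to_remove):
    """Return the audio spans that survive after deleting the given words."""
    return [(start, end) for word, start, end in alignment
            if word not in words_to_remove]

kept = segments_to_keep(alignment, {"darn"})
print(kept)  # the spans an editor would then crop and stitch together
```

In this sketch, removing “darn” from the transcript leaves the spans `(0.0, 0.2)`, `(0.2, 0.5)`, and `(0.8, 1.1)`; the program’s job is then to crop and join those spans, the tedious step VoCo automates.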
More impressively, the program can analyze a person’s voice and create new speech that sounds just like them, by cobbling together syllables and sounds the person used in the initial recording.
It doesn’t take much data for the program to be able to synthesize someone’s voice—it can do it with 10 minutes of audio, though for a really good mimic, 30 minutes is better.
In the ideal use case, you could use the program to fix speeches, podcasts, or voice-overs that contain a mistake in the initial recording. Because audio is so sensitive, changes in the sound of the room or in the person’s voice (say, if they have a cold) make it next to impossible to re-record just one segment of a clip; to make it sound really good, you have to re-record the whole thing. Here, you can make corrections that sound seamless. That said, the ability to create audio of someone’s voice saying words that never came out of their mouth is ripe for serious misuse. But the Adobe researchers say it’s not unlike the ability to Photoshop misleading images, like the fake viral images that circulate on the web.
Still, Jin says they “are looking for a technological solution to prevent misuse. We are investigating deep learning detectors to find the edited part [of the audio]” and create some sort of watermark for it.