Is Project VoCo a Loco Idea?
If you thought Autotune signalled the death of music, you'll probably want to avert your eyes. Adobe, the maker of Photoshop (aka the Autotune of photography), has teamed up with Princeton University on Project VoCo. Adobe developer Zeyu Jin debuted the technology during the Sneak Peeks segment at Adobe's MAX conference. He performed a simple yet effective trick on co-host Jordan Peele (one half of comedic duo Key & Peele) by transforming a snippet of conversation between him and Keegan-Michael Key. Jin said the program needs about 20 minutes of speech to learn a voice, but the demo focused on one particular line, in which Key talks about kissing his dogs and his wife.
He dismantled the sentence with surgical precision, literally cutting the phrase with a scissor tool. First, he reordered the words so that Key kissed his wife before his dogs. Okay, nice trick, considering he didn't actually copy and paste the audio; he just typed in the words. But then came the real shocker: Jin replaced 'wife' with 'Jordan'. Scandalous! For two reasons: Key & Peele aren't romantically involved, and putting words in someone's mouth has obvious ethical ramifications. Also, as far as we know (there's not much information about VoCo yet), Jin's program synthesised the word 'Jordan' from the roughly 20 minutes of speech it had learned from, matching the original tone, meter and pitch. It didn't suddenly lurch into a robotic voice mid-sentence; the edit was virtually undetectable.
It's early days for the technology. Jin's version looked like it was built using the GUI from Photoshop version one, but it did the job. Adobe pitched it as a tool for editing voiceover content. For instance, if a voiceover artist says 'perfect' and you want to swap it for something a little less generic, like 'flawless', you can simply type the new word in. But who knows where it might end up.
Voice manipulation technology has come a long way since the early days of Autotune. Celemony's Melodyne took a deep dive into polyphonic correction; Synchro Arts' Revoice Pro is a godsend for matching dubbed vocals to pre-recorded ones; there are stompboxes that let you create multiple harmonies from a single vocal take in realtime; and plug-ins like Output's Exhale let you create vocals from scratch in a way that goes well beyond a typical choir patch. Project VoCo pushes voice manipulation even further. Technology has already put drums, ethnic instruments, symphonic orchestras and unattainable hardware at our fingertips, so who knows, maybe before too long we'll be synthesising lead singers who are perfectly in tune and in time. And they'll say exactly what we tell them to.