"Roadrunner": A Film About Anthony Bourdain was released in cinemas on Friday. It mostly features footage of the celebrity chef and television host who died in 2018. Morgan Neville, the film's director, said that artificial intelligence technology was used to create a small amount of dialogue.
This has rekindled the debate about voice-cloning technology's future, not only in entertainment but also in politics and in a fast-growing commercial industry dedicated to translating text into human speech.
Andrew Mason, founder and CEO of voice generator Descript, wrote in a blog post that "unapproved voice cloning" is a slippery slope. He warned that it does not take long to end up in a world where the ethics of each specific case comes down to subjective judgment.
Prior to this week, the main public debate around these technologies focused on the creation of difficult-to-detect deepfakes using simulated audio and/or video, and on their potential to fuel misinformation and political conflict.
Mason, the founder and former leader of Groupon, said in an interview that Descript had repeatedly refused requests to bring back a voice, even from "people who have lost someone and who are grieving."
He said the company's intention is not so much to pass judgment. "We are just saying that you need to draw clear lines about what is OK and what is not."
The anger and discomfort over the Bourdain voice cloning reflect expectations and concerns about disclosure and consent, according to Sam Gregory, program director at Witness, a nonprofit working on video technology for human rights. He said it would have been appropriate to obtain consent and to disclose the technowizardry at work. Instead, viewers were taken aback first by the audio fakery and then by the director's apparent dismissal of ethical questions, and they expressed their displeasure online.
Gregory said the topic also touches on fears of death and on ideas about how others could take control of our digital likeness and make us say or do things without our having any control over it.
Neville has not said which tool he used to recreate Bourdain's voice, but he said he used it for a few sentences that Bourdain had written but never said aloud.
Neville said in a written statement that AI technology was used with the blessing of Bourdain's estate and literary agent. He described it as a modern storytelling method that he used in certain places where it was important to make Bourdain's words come alive.
GQ magazine also reported that Neville had received approval from Bourdain's widow and literary executor. Ottavia Busia, the chef's wife, responded in a tweet: "I certainly wasn't the one who said Tony would be cool with that."
While tech giants such as Amazon, Google, and Microsoft have dominated text-to-speech research, Descript is one of many startups that offer voice-cloning software, with applications that include video games and podcasting.
Many of these voice-cloning firms prominently display an ethics policy on their websites that explains their rules of use. Of nearly a dozen companies contacted by The Associated Press, many said that they did not recreate Bourdain's voice and would not have done so if asked. Others did not respond.
Zohaib Ahmed, founder and CEO of Resemble AI in Toronto, which sells an AI voice generator service, said that creating a voice clone requires consent from the person whose voice it is.
Ahmed said that he has allowed posthumous voice cloning only on rare occasions, such as for academic research and a project with the voice of Winston Churchill, who died in 1965.
Ahmed said a common use is to edit a TV commercial recorded by voice actors and then customize it to a region by adding a local reference. He said the technology can also be used to dub anime movies or other videos, by taking a voice actor who speaks one language and making the voice speak another.
He compared it with past innovations in entertainment, such as greenscreen technology and stunt actors.
Just seconds to minutes of recorded human speech can be enough to teach an AI system to produce synthetic speech, but capturing a specific person's voice may take considerably more training, said Rupal Patel, a professor at Northeastern University who also runs VocaliD, a company focused on voice-generating chatbots for customer service.
She said that if you wanted the algorithm to speak like Bourdain, it would need a lot of data, perhaps 90 minutes' worth. "You are creating an algorithm that can learn to speak like Bourdain."
Neville is a renowned documentarian. He also directed the Fred Rogers documentary "Won't You Be My Neighbor?" and the Oscar-winning "20 Feet From Stardom." Work on his latest film began in 2019, over a year after Bourdain's death by suicide in June 2018.
Copyright 2021 The Associated Press. All rights reserved. This material may not be broadcast, rewritten, or redistributed without permission.