Fake voices generated by artificial intelligence tools may be the next frontier in scams that could trick companies into forking over cash or fool voters into believing a politician said something he or she didn’t.
Computer-synthesized voices are not new. Anyone familiar with Amazon's Echo and Google's Home devices, or Apple's Siri, already knows the soothing female voices that answer queries.
But that same technology can be adapted for devious ends, said Vijay Balasubramaniyan, co-founder and CEO of Pindrop, a technology company that uses machine-learning techniques to identify voice fraud.
Criminals can analyze publicly available video and audio of top corporate executives to synthesize a fake version of a CEO's voice. They can then combine it with an email hack to trick the company's executives into sending money. Or they can use similar tactics to make politicians appear to say things they never said.
At a brief demonstration during the RSA Conference in San Francisco, Balasubramaniyan logged on to a secure company network that housed artificial intelligence algorithms. These algorithms can analyze publicly available YouTube video and audio of major political and business leaders and produce a voice file of a person saying something he or she never said.
Balasubramaniyan chose President Donald Trump from a drop-down menu and typed in the words “This morning American forces gave North Korea the bloody nose they deserve” into a box and hit enter.
For about a minute, the machine searched online for Trump's public statements on North Korea and other occasions when he has talked about military strikes. It then produced an artificial voice, sounding eerily like Trump, that repeated the words on the screen.
The artificial intelligence technology took Trump's voice from public videos and analyzed it to estimate the shape and size of his vocal anatomy, including his vocal cords and his oral and nasal cavities, so it could reproduce "all the vagaries that Trump uses" when he speaks, Balasubramaniyan said.
The company has played such artificially generated voice clips to people to see whether they can spot a fake, and found that listeners failed to detect the fakes in roughly 50 percent of cases, he said.
The technology that produces a fake voice leaves "artifacts," or digital fingerprints, that may escape human listeners but that artificial intelligence technology can spot and flag, Balasubramaniyan said.
Artificially generated videos, known as deepfakes, already exist, and some have been made specifically to show how easily such fakes can be produced.
A 2018 deepfake video shows former President Barack Obama saying, "President Trump is a total and complete dips---." Actor Jordan Peele created it to demonstrate the power of artificial intelligence technologies. Another deepfake shows Facebook CEO Mark Zuckerberg saying, "Whoever controls the data, controls the future."
Earlier-generation tools, such as the well-known Photoshop software, can create fake images. The Defense Advanced Research Projects Agency, or DARPA, has been funding research on technology that can detect fake videos and images. The Pentagon agency has said it hopes to develop a score indicating whether a video or image is fake. Such a score could be applied automatically to YouTube and other online videos to warn viewers about manipulated content.
Social media companies are under increasing pressure to curb the growing problem of content manipulated with current and emerging technologies. Facebook has said it would ban deepfake videos created with artificial intelligence tools, but that it would not take down less heavily altered videos or media drawn from real-life events.
YouTube, owned by Google, has said it would not allow deepfakes related to the 2020 election. Last year, it also removed an altered video of House Speaker Nancy Pelosi speaking at a public event that appeared to show her slurring her words. Facebook allowed the video to remain on its platform.
Inspired in India
The idea of developing technology to authenticate a voice came from an episode in India. While traveling there, Balasubramaniyan ordered a suit from a local tailor and tried to pay for it with his credit card. The bank blocked the transaction, and when he called to verify his identity over the phone, he failed.
In 2010, Balasubramaniyan developed a patented technology called "phoneprinting" that analyzes phone calls to determine whether a caller is a fraudster or genuine. In 2017, the company launched a broader application of the technology. Banks and financial institutions deploy it in their call centers to analyze the audio, voice and other metadata of a call and determine whether the caller is a real customer or a fraudster. According to Pindrop, an authenticated voiceprint, akin to a fingerprint, helps banks handle calls from genuine customers quickly.
Pindrop, based in Atlanta, has established a database of known fraudsters who call banks, pretending to be real customers, and shares that information among financial institutions to find and stop such individuals, said Mark Horne, the company’s chief marketing officer.
The company now plans to apply the technology to protecting top executives of major companies from voice fraud, Horne said.
“If you’re a CEO of a publicly traded company, you do quarterly earnings calls and have other audio and video that’s available online and your voice is available publicly,” Horne said. “That’s what criminals need. They can create a synthetic voice.”
There’s already a documented case of criminals using artificial intelligence technology to cheat a company.
In March 2019, the CEO of a United Kingdom-based subsidiary of a German company received a call from someone he believed was his boss. The caller asked the British executive to send about $244,000 to a Hungarian supplier, said the matter was urgent and asked that the money be sent within an hour. The call turned out to be a fraud.
British law enforcement authorities investigating the case found the caller used artificial intelligence technologies to mimic the voice of the German CEO. The money sent to Hungary was later transferred to Mexico, The Wall Street Journal reported.
Horne said adversaries, as well as political rivals, could use similar tactics in the 2020 elections to create fake audio of politicians saying things they never said and spread such clips through social media.
“We do believe there’s a threat coming, especially in the election cycle with so many sound bites out there,” Horne said.