The Deepfakes Analysis Unit (DAU) examined a video in which the Israeli actress Gal Gadot of “Wonder Woman” fame is apparently sharing investment tips on “The Tonight Show”, an American talk show hosted by Jimmy Fallon. After putting the video through A.I. detection tools and escalating it to our detection and forensic partners based in California, we were able to assess that the voices of Gadot and Fallon heard in the video were not theirs but synthetic speech produced using generative A.I.
The 50-second video in English was sent to the DAU tipline for assessment. The word “Tesler” next to the letter “T”, rendered in blue lettering, was visible on the bottom right throughout the video. In several parts of the video, there was a mismatch between the lip movements of Ms. Gadot and Mr. Fallon and the corresponding words that were heard. Their delivery sounded flat, with no intonation, and no audience laughter or applause could be heard in the background; overall, the audio sounded synthesised.
The video featured Fallon for only a few seconds in two separate sequences. Gadot’s sequences were longer, and some visuals in her two appearances were identical but paired with different audio tracks. Another noticeable feature of Gadot’s audio was the absence of the distinct accent that she has as a non-native English speaker.
Given the string of inconsistencies, we undertook a reverse image search using screenshots from the video under investigation, which led us to the original video, published on the YouTube channel of The Tonight Show on Oct. 6, 2017. In that video and the manipulated video, the setting, the clothes, and the body language of Gadot and Fallon are exactly the same. However, the original video carried no trace of the word “Tesler” or the letter “T” presented in a logo-like format, and its audio track bore no similarity to that of the manipulated video.
To check whether the audio was synthetic or not, we ran the video through a series of A.I. detection tools.
The voice detection tool of Loccus.ai, a company that specialises in artificial intelligence solutions for voice safety, returned results which indicated that the probability of the audio being real was negligible at 0.01 percent, which suggested that there was a very high percentage of synthetic speech in the video.
We also used TrueMedia’s deepfake detector, which overall categorised the video as “highly suspicious”, indicating a high probability that A.I. was used in its production. In a further breakdown of the analysis, a 100 percent confidence score was given to three sub-categories, “A.I. generated audio detection”, “face manipulation”, and “deepfake face detection”, all of which point to traces of A.I. in the video.
HIVE AI’s deepfake video detection tool recognised portions of the video where A.I. manipulation was most apparent. The audio detection tool identified strong indicators of audio manipulation using A.I. throughout the length of the video.
We also ran the video through the audio detection tool of our California-based partner DeepTrust to discern moments in the video that had a high percentage of synthetic audio.
The heat-map generated by the tool indicates that there is a 59.3 percent probability of A.I.-generated audio in the video; this is an average percentage that the tool computed based on the patterns of generated speech it picked up from the video. The green strips in the heat-map stand for real audio and the red strips for A.I.-generated audio. The red strips could also signify background noise, talking, or commotion like cheering or music in those portions of the audio.
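To illustrate how a single average figure can relate to a strip-by-strip heat-map, here is a minimal sketch in Python. It is not DeepTrust's method, which is not described in this report. It assumes, hypothetically, that a tool assigns each audio segment a score between 0 and 1 for the likelihood of A.I. generation, colours a segment red when the score reaches an assumed threshold of 0.5, and reports the mean of the scores as a percentage. The function name and the sample scores are invented for illustration.

```python
from statistics import mean


def summarise_segments(scores, threshold=0.5):
    """Summarise hypothetical per-segment scores (0 to 1) for A.I.-generated audio.

    Returns the average score as a percentage and a red/green label per segment.
    The 0.5 threshold is an assumption, not a documented value of any tool.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    # Red marks a segment flagged as likely A.I.-generated, green as likely real.
    labels = ["red" if s >= threshold else "green" for s in scores]
    average_percent = round(100 * mean(scores), 1)
    return average_percent, labels


# Illustrative values only; these are not taken from the video or the report.
example_scores = [0.9, 0.2, 0.8, 0.5]
average, strips = summarise_segments(example_scores)
print(average, strips)
```

In a scheme like this, the average reflects the scores of all segments together, so it should be read alongside the individual strips and not as a verdict on its own.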
To get another expert to weigh in on the nature of the manipulation, we escalated the video to a lab run by the team of Dr. Hany Farid, a professor of computer science at the University of California, Berkeley, who specialises in digital forensics and A.I. detection. As per their analysis, they consider the entire audio track in the video to be A.I.-generated.
They added that it seemed like a case where someone had looped parts of the original video on top of a fake audio track with little attempt at aligning the lip movements to the audio. They also noted that it was very likely that the audio had been produced with the help of ElevenLabs, which specialises in text-to-speech A.I. voice generation.
Based on our findings and expert analysis, we can conclude that the conversation between Gadot and Fallon presented in this video was fabricated and that the fake audio was produced using A.I.
(Written by Debraj Sarkar and edited by Pamposh Raina.)
Kindly Note: The manipulated audio/video files that we receive on our tipline are not embedded in our assessment reports because we do not intend to contribute to their virality.