Joe Biden’s “Magical Pistachio” Video Has A.I. Voice

DAU Secretariat

The Deepfakes Analysis Unit (DAU) analysed a video in which U.S. President Joe Biden appears to be narrating an anecdote about a “magical pistachio”. After putting the video through A.I. detection tools and seeking analysis from our partners and experts, we were able to establish that the video was manipulated using A.I.-generated audio.

The 28-second video in English was sent to the DAU tipline for assessment. A logo visible on the bottom right corner of the video frame resembles that of MSNBC, an American news television channel. Superimposed text next to it runs across the width of the frame, reiterating the subject of Mr. Biden’s supposed story; right above it, a “breaking news” super can be seen.

We noticed obvious signs of visual and audio inconsistencies in the video. The pace of Biden’s speech seemed much faster and more robotic than his characteristic style of public delivery, which is marked by pauses between words and sentences. The voice does resemble his natural voice; however, it has a certain synthetic quality to it.

The video is slightly blurry, especially around the face. The movement of his lips appears to align with the words that can be heard; however, his teeth seem to intermittently disappear and reappear as his mouth opens and closes. At the 13-second mark, despite an open mouth, the teeth are not visible.

We undertook a reverse image search using screenshots from the video, which led us to the original video, published on Dec. 23, 2022, on the MSNBC website. The original video shows Biden addressing the public ahead of Christmas. The backdrop, clothing, and the body language of Biden, including hand gestures and head movement, are identical in the original and the manipulated video. The camera angle is also the same in both videos.

We also noticed that the original video and the manipulated video both feature a live clock in the top left corner of the video frame. On comparing the time codes on the digital clock in both videos, we were able to establish that it was highly likely that the initial 28-second segment from the original video was clipped and used to create the fabricated video.
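
The clock comparison described above can be expressed as a simple check. What follows is only a minimal sketch, not the procedure the DAU actually used: the clock readings are placeholder values rather than the times shown in either video, the `HH:MM:SS` format is an assumption, and `to_seconds` and `clocks_match` are hypothetical helper names.

```python
def to_seconds(clock: str) -> int:
    """Convert an on-screen clock reading such as '10:00:14' to seconds."""
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def clocks_match(original: dict, suspect: dict, tolerance: int = 1) -> bool:
    """Return True if every sampled clock reading in the suspect clip is within
    `tolerance` seconds of the original's reading at the same playback offset."""
    for offset, reading in suspect.items():
        if offset not in original:
            return False  # no comparable frame was sampled from the original
        if abs(to_seconds(reading) - to_seconds(original[offset])) > tolerance:
            return False
    return True


# Placeholder readings: playback offset in seconds -> on-screen clock.
# These are illustrative values, NOT the times shown in either video.
original_readings = {0: "10:00:00", 14: "10:00:14", 28: "10:00:28", 60: "10:01:00"}
suspect_readings = {0: "10:00:00", 14: "10:00:14", 28: "10:00:28"}

print(clocks_match(original_readings, suspect_readings))
```

If the readings agree at every sampled offset from the start of both videos, that is consistent with the clip having been taken from the opening segment of the original; a constant non-zero difference would instead suggest a later segment.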

The reverse image search also led us to a video posted from a YouTube channel; it is identical to the video we received on the tipline. While the caption under the video says, “Not real. Made with AI”, we could not establish if the account that posted the video also created the clip. A Google keyword search helped us trace the transcript of Biden’s televised address, published on the White House website. The transcript does not make any reference to the supposed “magical pistachio” story.
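
A transcript check of this kind can also be done programmatically once the text has been saved locally. The snippet below is a minimal sketch under that assumption; `normalise` and `phrases_found` are hypothetical helper names, and the sample string is placeholder text, not a quotation from the White House transcript.

```python
import re


def normalise(text: str) -> str:
    """Lowercase the text and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def phrases_found(transcript: str, phrases: list[str]) -> dict[str, bool]:
    """Report, for each phrase, whether it occurs as whole words in the transcript."""
    haystack = f" {normalise(transcript)} "
    return {phrase: f" {normalise(phrase)} " in haystack for phrase in phrases}


# Placeholder text standing in for the published transcript (not a quotation).
sample_transcript = "Placeholder transcript text about the holiday season."

print(phrases_found(sample_transcript, ["magical pistachio", "pistachio"]))
```

Normalising both sides keeps the match tolerant of capitalisation, curly quotes and line breaks, which often differ between a web page and a saved copy.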

We wanted to see if A.I. had been used to fabricate the video, so we put the video through A.I. detection tools.

The voice detection tool of Loccus.ai, a company that specialises in artificial intelligence solutions for voice safety, returned results indicating that the probability of the audio being real was negligible, at 0.03 percent, which points to the use of an A.I.-generated audio track in the video.

*Screenshot of the analysis from Loccus.ai’s audio detection tool*

Hive AI’s deepfake video detection tool marked “yes deepfake” on Biden’s face throughout the video, pointing to a high likelihood of the use of A.I. to manipulate the video. Its audio tool also indicated very strong signs of A.I. tampering in the audio featured in the video.

*Screenshot of the analysis from Hive AI’s deepfake video detection tool*

We also ran the video through TrueMedia’s deepfake detector, which suggested substantial evidence of manipulation in the video. In a further breakdown of the analysis, it gave a 100 percent confidence score to “face manipulation” and an 86 percent confidence score to “generative convolutional vision transformer”, both of which are subcategories that point to the use of A.I. to fabricate faces featured in a video, in this case Biden’s face.

The tool also gave a 100 percent confidence score to the subcategories of “audio analysis” and “A.I. generated audio detection”, both of which highlight a very strong possibility of the use of an A.I.-generated audio track in the video.

*Screenshot of the overall analysis from TrueMedia’s deepfake detection tool*

*Screenshot of the audio and video analysis from TrueMedia’s deepfake detection tool*

We wanted to get a further analysis on the audio track, so we put it through the A.I. speech classifier of ElevenLabs, a company specialising in voice A.I. research and deployment. It returned results indicating a 98 percent probability that the audio track used in the video was generated using their software.

*Screenshot of the analysis from A.I. speech classifier of ElevenLabs*
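
For readers who want the numeric results in one place, the scores quoted in this report can be collected as follows. This is only an illustrative sketch: the grouping into “audio” and “video” is our own labelling, `summarise` is a hypothetical helper, and the tools measure different things (the ElevenLabs figure, for instance, is the probability that the audio came from their software), so the numbers are not directly comparable. Hive AI is left out because no percentage is quoted for it here.

```python
from collections import defaultdict

# (tool or subcategory, modality, score in percent) as quoted in this report.
# Loccus.ai reported a 0.03 percent probability that the audio is real,
# so the complement is used to express it on the same "likely synthetic" scale.
REPORTED = [
    ("Loccus.ai", "audio", 100.0 - 0.03),
    ("TrueMedia: face manipulation", "video", 100.0),
    ("TrueMedia: generative convolutional vision transformer", "video", 86.0),
    ("TrueMedia: audio analysis", "audio", 100.0),
    ("TrueMedia: A.I. generated audio detection", "audio", 100.0),
    ("ElevenLabs speech classifier", "audio", 98.0),
]


def summarise(scores):
    """Return the lowest and highest quoted score for each modality."""
    by_modality = defaultdict(list)
    for _tool, modality, percent in scores:
        by_modality[modality].append(percent)
    return {modality: (min(vals), max(vals)) for modality, vals in by_modality.items()}


for modality, (low, high) in summarise(REPORTED).items():
    print(f"{modality}: {low:.2f} to {high:.2f} percent")
```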

We reached out to ElevenLabs to get their analysis. They confirmed to the DAU that the audio is synthetic, implying that A.I. was used to generate the audio track. They noted that the user who broke their “terms of use” while generating the synthetic audio using their software has been identified as a bad actor through their in-house automated moderation system.

They added that they have banned the user from using any of their tools in the future. However, they mentioned that the audio track was generated before they had introduced additional safeguards against impersonations, including “no-go voices”, which helps them detect and prevent the creation of content with voices that are deemed especially high-risk.

Given the unnatural movement of Biden’s lips while enunciating, we wanted to seek an expert view on whether it is a case of a lip-sync deepfake. We escalated the video to our partners at RIT’s DeFake Project. Saniat Sohrawardi from the project told the DAU that, given that there is a real video which has the same visuals as the doctored video and the head doesn’t move differently, it would have to be a lip-sync deepfake.

Mr. Sohrawardi said that the most accessible option for creating a video like this would have been the wav2lip code repository, which would be especially good with Biden’s face. He was referring to a speech-to-lip generation code repository, accessible online, that helps in creating highly accurate lip-sync videos using pre-trained models, essentially aiding in the creation of lip-sync deepfakes.

On the basis of our findings and analyses from experts, we can conclude that the words being attributed to Biden were not uttered by him and that a synthetic audio track was used over original visuals to fabricate the video.

(Written by Debopriya Bhattacharya and Areeba Falak, with inputs from Debraj Sarkar, and edited by Pamposh Raina.)

Kindly Note: The manipulated audio/video files that we receive on our tipline are not embedded in our assessment reports because we do not intend to contribute to their virality.

You can read below the fact-checks related to this piece published by our partners:

Viral Video Of US President Joe Biden Talking About Magical Pistachio Is Actually Deepfake

Deepfake Video of Joe Biden Talking About Pistachios Goes Viral as Real
