A.I. Voice Used to Create Fake “Bitcoin” Audio of Supriya Sule

Published: November 22, 2024 | Updated: November 29, 2024
Screengrabs of three of the four audio clips analysed by the DAU

Updated on Nov. 23, 2024: The Deepfakes Analysis Unit updated the report with additional expert inputs.

The Deepfakes Analysis Unit (DAU) analysed four audio recordings apparently linked to Supriya Sule, Member of Parliament from Maharashtra and leader of the Nationalist Congress Party (Sharadchandra Pawar), and Nana Patole, leader of the Indian National Congress (INC), also from Maharashtra. The audios allude to cryptocurrency and cash transactions. After running them through A.I. detection tools and seeking expert analysis, we were able to conclude that three of the four audios were synthetic. The duration of the fourth was too short for a conclusive analysis.

The supposed audio recordings were escalated to the DAU by multiple fact-checking partners as they went viral on Nov. 20, which was voting day for the state assembly election in Maharashtra. The results are expected on Nov. 23 as per the Election Commission of India. The NCP-SP and INC are part of a coalition contesting the election against the Bharatiya Janata Party (BJP) and its allies, who are running the state government.

These four audios have been circulating on multiple social media platforms, including X and Facebook. In one of them, a 33-second audio in English, a female voice that sounds like Ms. Sule’s can be heard. Though her name is not mentioned anywhere in that audio, the accompanying text and, in some cases, a still photograph of hers have been used with it. She has refuted the claims made in the audio through a tweet posted from her verified account on X.

The sound levels in that recording are not consistent; they drop and pick up at various points. Overall, the audio sounds hurried and scripted, without any natural breathing sounds, and it lacks the intonation and pitch that can be heard in her recorded speeches and interviews. The female voice refers to some “Gaurav”, a common Indian male name, without mentioning a last name. There is also a mention of some “Gupta”, a common Indian last name, without an associated first name; however, a male pronoun is used with that name.

In a second audio, a purported conversation between Mr. Patole and another man, only one male voice can be heard asking someone named “Amitabh”, a popular Indian male name, about cash; there is no mention of a last name. This audio carries barely five seconds of speech, delivered in a strange accent that sounds like a mix of Hindi and Marathi. Here too, the supposed identifier for the INC leader’s voice is a still image of his that has been used with the audio. Graphics accompanying the audio suggest that it is a conversation between Patole and someone named Amitabh Gupta, who is apparently from the police.

In a tweet from his verified handle on X, Patole, a member of Maharashtra’s Legislative Assembly, has stated that it is not his voice that can be heard in the audio clip.

The other two audios, which have been linked to the supposed audios of the politicians, are recordings of purported conversations in English between two men, one of whom is alleged to be a police officer and the other an employee at some company. No conversation can be heard in either of these audios; instead, in each one a distinct male voice talks about cash and “bitcoin” transactions. Some names are mentioned, but neither Sule’s nor Patole’s finds a mention.

One recording is 16 seconds long and the other 42 seconds. The sound level in the longer audio is not consistent. The accents in both recordings sound odd; the speech is fast, scripted, and lacks pauses as well as human breathing sounds. Graphics accompanying the audios identify these voices as those of Amitabh Gupta and Gaurav Mehta.

All four audios lack background noise, among the other oddities mentioned above. To discern the extent of A.I. manipulation in the audio files, we ran them through A.I. detection tools.

The voice tool of Hiya, a company that specialises in artificial intelligence solutions for voice safety, indicated that there is a 97 percent probability that an A.I.-generated voice was used in the audio being attributed to Sule.

Screenshot of the analysis from Hiya’s audio detection tool for Sule’s supposed audio

For Patole’s supposed audio track, the tool returned results indicating that there is a 71 percent probability of the audio track being real.

Screenshot of the analysis from Hiya’s audio detection tool for Patole’s supposed audio

Hive AI’s audio detection tool indicated that the audio file supposedly carrying Sule’s voice was manipulated with A.I. The tool, however, did not point to any A.I. manipulation in Patole’s supposed audio track.

The deepfake detector of our partner TrueMedia suggested substantial evidence of manipulation in Sule’s supposed audio track but was uncertain about the level of manipulation in Patole’s case.

Breaking up the overall analysis for the audio associated with Sule, the tool gave a 100 percent confidence score to the “A.I.-generated audio detector” subcategory, 97 percent to “audio authenticity detector”, and 95 percent to “voice anti-spoofing analysis”. The “voice biometric and voiceprinting analysis” subcategory received a 72 percent score. All the subcategories analyse audio for evidence that it was created by an A.I. audio generator or by cloning.

Screenshot of the analysis from TrueMedia’s deepfake detection tool for Sule’s supposed audio

For Patole’s supposed audio track, the tool gave an overall result of “uncertain”. The subcategory of “voice anti-spoofing analysis” received a 99 percent confidence score; “voice biometric and voiceprinting analysis” and “audio authenticity detector” received confidence scores of 30 and 29 percent respectively. The confidence score for the “A.I.-generated audio detector” subcategory was even lower at 6 percent.

Screenshot of the analysis from TrueMedia’s deepfake detection tool for Patole’s supposed audio

We also ran Sule’s and Patole’s audio files through DeepFake-O-Meter, an open platform for deepfake image, video, and audio detection developed by the Media Forensics Lab (MDFL) at the University at Buffalo (UB). The tool offers a choice of detectors through which a media file, in this case audio, can be run for analysis.

AASIST (2021) and RawNet2 (2021) focus on detecting audio impersonations, voice clones, replay attacks, and other types of audio spoofs. The Linear Frequency Cepstral Coefficient (LFCC)-Light Convolutional Neural Network (LCNN) model classifies genuine versus synthetic speech to detect audio deepfakes. RawNet3 (2023) allows for nuanced detection of synthetic audio, and RawNet2-Vocoder (2023) is useful in identifying synthesised speech.

RawNet2 (2021), LFCC-LCNN, RawNet3 (2023), and RawNet2-Vocoder work well for analysing single-speaker audio tracks, while AASIST (2021) is not limited to single-speaker analysis. We ran the audios being associated with Sule and Patole through these detectors.
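For readers curious about what a feature such as LFCC actually captures, here is a minimal, illustrative Python sketch of extracting linear-frequency cepstral coefficients with NumPy. This is not the pipeline any of the detectors above actually use; the frame sizes, filterbank count, and coefficient count are arbitrary assumptions chosen for the example.

```python
import numpy as np

def lfcc(signal, n_fft=512, hop=256, n_filters=20, n_coeffs=13):
    """Toy linear-frequency cepstral coefficient (LFCC) extractor.

    Unlike mel-spaced MFCCs, the triangular filterbank here is spaced
    linearly in frequency, preserving detail in the high band where
    synthesis artefacts often appear.
    """
    # Frame the signal with a Hann window.
    win = np.hanning(n_fft)
    frames = np.array([signal[s:s + n_fft] * win
                       for s in range(0, len(signal) - n_fft + 1, hop)])

    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # Triangular filterbank with LINEARLY spaced centre frequencies.
    n_bins = power.shape[1]
    edges = np.linspace(0, n_bins - 1, n_filters + 2).astype(int)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, c, hi = edges[i], edges[i + 1], edges[i + 2]
        fbank[i, lo:c + 1] = np.linspace(0.0, 1.0, c - lo + 1)
        fbank[i, c:hi + 1] = np.linspace(1.0, 0.0, hi - c + 1)

    log_energy = np.log(power @ fbank.T + 1e-10)

    # DCT-II decorrelates the filterbank outputs; keep n_coeffs terms.
    k = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), (2 * k + 1) / (2 * n_filters)))
    return log_energy @ dct.T

# Example: one second of a 440 Hz tone sampled at 16 kHz.
t = np.linspace(0, 1, 16000, endpoint=False)
feats = lfcc(np.sin(2 * np.pi * 440 * t))
print(feats.shape)  # one 13-coefficient vector per frame
```

In a real detector such as LFCC-LCNN, a matrix of coefficients like this is fed to a convolutional classifier trained to separate genuine from synthetic speech.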

For the audio track being attributed to Sule all the detectors gave a high probability of it being A.I.-generated.

Screenshot of the analysis from DeepFake-O-Meter’s audio detectors for Sule’s supposed audio

For Patole’s supposed audio only three of the five detectors indicated a high probability of it being A.I.-generated.

Screenshot of the analysis from DeepFake-O-Meter’s audio detectors for Patole’s supposed audio

We also ran the other two audios of the supposed exchange between Mehta and Gupta through Hiya, Hive AI, and TrueMedia, and got the following results.

For the purported 16-second conversation between the two, the audio detectors of both Hiya and Hive AI returned results indicating that the audio track was A.I.-generated. TrueMedia’s audio detector gave a 100 percent confidence score to the subcategory of “voice anti-spoofing analysis”, 96 percent confidence score to “A.I.-generated audio detector”, and 86 percent confidence score to “audio authenticity detector”.

For the purported 42-second conversation, the audio detectors of Hiya and Hive AI returned results indicating that the audio is synthetic in nature. TrueMedia gave a 100 percent confidence score to subcategories of “voice anti-spoofing analysis” and “A.I.-generated audio detector”. The tool also gave a 96 percent confidence score to “audio authenticity detector”, and an 83 percent confidence score to “voice biometric and voiceprinting analysis”.

To further analyse the audios we put all four of them through the A.I. speech classifier of ElevenLabs, a company specialising in voice A.I. research and deployment. For each of the recordings, the classifier returned results indicating that it is “very unlikely” that these audio tracks were generated using the ElevenLabs software.

For expert analysis, we shared the four audio files with our detection partner ConTrailsAI, a Bangalore-based startup with its own A.I. tools for detection of audio and video spoofs.

They noted that their audio spoof detection analysis of the audio supposedly carrying Sule’s voice indicated A.I. generation or manipulation with high confidence. They told us that the pronunciation of “Gaurav” and “Gupta” is a clear giveaway of A.I. generation. They further added that the generation technique is most likely Retrieval-based Voice Conversion (RVC) A.I. cloning.

Screenshot of the analysis from ConTrailsAI for Sule’s supposed audio

For the audio being attributed to Patole, they used two audio spoof detection models, and both returned results indicating that the audio is real. They said that their “V2 model”, which is specifically fine-tuned on Hindi and Marathi, gave a 100 percent confidence score. They added that the words “maza masti” heard in the audio sound authentic, among other strong signals such as tone and intonation. Despite that, they underscored that because it is a very short audio, barely six seconds, they cannot say with conviction that it is real.

They assessed the two other audio clips, purportedly of Mehta and Gupta talking, as A.I.-generated.

To get another expert to weigh in on the audios, we reached out to our partner Validia, a San Francisco-based deepfake cybersecurity service. They use their proprietary software to check the authenticity of a voice by comparing a person’s real voice with the suspected generated voice.

We escalated to them the audios being associated with Sule and Patole. Since there has been much speculation about the identities of the two other men, we did not want a comparative voice analysis to be based on incorrect voice samples.

To analyse the purported Sule audio, they first retrieved a clean sample from the file we escalated to them by removing any background audio. They then generated a heat-map to compare the retrieved audio with a real voice sample of Sule’s.

Screenshot of the heat-map analysis for the supposed Sule audio from Validia

The team at Validia stated that their heat-map analysis revealed significant similarities between the two audio samples, indicating a definite attempt at deepfaking her voice. They added that the small areas of difference in the map are likely due to the language difference: the sample they used for comparison was taken from a speech of Sule’s in Marathi, while the escalated audio was in English.
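Validia’s software is proprietary, so the following is only a hypothetical sketch of the general idea behind such a comparison: computing per-frame log-spectra of two clips and correlating them into a matrix that can be rendered as a heat-map. The function name and all parameters here are our own assumptions, not Validia’s method.

```python
import numpy as np

def similarity_heatmap(a, b, n_fft=256, hop=128):
    """Frame-by-frame spectral correlation between two audio clips.

    Cell (i, j) correlates the log-magnitude spectrum of frame i of
    clip `a` with frame j of clip `b`; the resulting matrix can be
    rendered as a heat-map, with bright regions marking frames that
    sound alike.
    """
    def log_spectra(x):
        win = np.hanning(n_fft)
        starts = range(0, len(x) - n_fft + 1, hop)
        spec = np.array([np.abs(np.fft.rfft(x[s:s + n_fft] * win)) for s in starts])
        return np.log(spec + 1e-8)

    fa, fb = log_spectra(a), log_spectra(b)
    # Centre and normalise each frame so the dot product becomes a
    # Pearson-style correlation in [-1, 1].
    for f in (fa, fb):
        f -= f.mean(axis=1, keepdims=True)
        f /= np.linalg.norm(f, axis=1, keepdims=True)
    return fa @ fb.T

# Toy check: a tone compared against itself and against a different tone.
t = np.linspace(0, 0.5, 8000, endpoint=False)  # 0.5 s at 16 kHz
tone_a = np.sin(2 * np.pi * 300 * t)
tone_b = np.sin(2 * np.pi * 1200 * t)
same = similarity_heatmap(tone_a, tone_a)
diff = similarity_heatmap(tone_a, tone_b)
print(same.mean() > diff.mean())  # identical clips correlate more strongly
```

A production system would compare learned voice embeddings rather than raw spectra, but the output has the same shape: a similarity matrix whose bright and dark regions a heat-map makes visible.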

Based on the structure and flow of the fake audio, as well as the heat-map analysis, they deemed the audio escalated by the DAU to be a moderately well-put-together deepfake audio sample, and not a voice-over.

They concluded that the audio being attributed to Sule was likely created using a text-to-speech algorithm rather than a speech-to-speech one, based on the articulation patterns throughout.

After conducting the heat-map analysis for Patole’s supposed audio, the Validia team stated that the scores suggest only slight similarity between his real voice sample and the one extracted from the audio file escalated by the DAU. They added that the audio is indicative of a low-quality deepfake or A.I.-generated sample.

They noted that based on the structure of the voice, they believe that the audio being attributed to Patole is A.I.-generated in nature.

To get yet another expert view on the purported Sule and Patole audios, we escalated both files to our partner GetRealLabs, co-founded by Dr. Hany Farid; the team specialises in digital forensics and A.I. detection.

The team said that they used a number of techniques to inspect the audio tracks, including automated analysis using multiple models and visual analysis of the spectrogram. They also had the audios analysed by native speakers on their in-house team, who listened for signs of A.I. generation in cadence and intonation.

They noted that the analysis results for the alleged Sule audio suggest it to be likely A.I.-generated. For the supposed Patole audio, they added that the analysis results were inconclusive because the team felt that the track was too short for reliable results.

We would also like to highlight that neither the tool analyses nor the expert assessments so far have given conclusive results for the audio being attributed to Patole. The short duration of the clip has been a particular challenge.

Based on our review, the tool findings, and the expert analyses of the other three audios, we can conclude that they are not authentic.

(Written by Debraj Sarkar and Debopriya Bhattacharya, edited by Pamposh Raina.)

Kindly Note: The manipulated audio/video files that we receive on our tipline are not embedded in our assessment reports because we do not intend to contribute to their virality.

You can read below the fact-checks related to this piece published by our partners:

BJP Posts Fake AI Audio Clips of Sule, Patole, Alleges Poll Fraud

Maharashtra Polls: Here’s The Truth Behind Viral Audio Note ‘Exposing’ Supriya Sule’s ‘Involvement in Bitcoin Scam’

BJP Shares AI Audio Clips of Amitabh Gupta, Sule, Patole To Allege ‘Poll Fraud’

Supriya Sule and Nana Patole involved in alleged Bitcoin scam to fund Maharashtra polls? Viral audio clips suspected to be A.I.-generated.

As Maharashtra Witnesses Political Row Over ‘Bitcoin Exposé’, Here’s What We Found About The Viral Audio Notes

Fact Check: Deepfake audio of Supriya Sule used to claim that the Maharashtra election was funded through Bitcoin ‘misappropriation’ (Hindi)

The voices of Supriya Sule and Nana Patole heard in these audio clips were generated using AI (Telugu)

Audio clips of Sule & Patole shared by BJP likely AI-generated, evidence shows

Supriya Sule, Nana Patole viral audios: What various AI detection tools tell us