Bob's Confetti: Phonetic Memorization Attacks in Music and Video Generation

Jaechul Roh¹*, Zachary Novack²*, Yuefeng Peng¹, Niloofar Mireshghallah³, Taylor Berg-Kirkpatrick², Amir Houmansadr¹

¹University of Massachusetts Amherst

²University of California San Diego

³Carnegie Mellon University

*Equal Contribution

⚠️ Content Disclaimer
This research demonstration contains AI-generated audio and lyrics that may include strong language or potentially offensive content. The generated materials are used for academic research purposes to study memorization in AI music generation models. Viewer discretion is advised.

Abstract

Lyrics-to-Song (L2S) generation models promise end-to-end music synthesis from text, but their vulnerability to copyright leakage remains underexplored. To mitigate this risk, commercial systems typically block prompts containing copyrighted lyrics. In this work, we introduce Adversarial PhoneTic Prompting (APT), an attack that replaces iconic phrases with homophonic alternatives—e.g., "mom's spaghetti" becomes "Bob's confetti"—preserving the acoustic form while bypassing copyright filters.

We reveal that models can be prompted to regurgitate memorized songs using phonetically similar but semantically unrelated lyrics. Despite the semantic drift, black-box models like SUNO and open-source models like YuE generate outputs that are strikingly similar to the original songs—melodically, rhythmically, and vocally—achieving high scores on CLAP, AudioJudge, and CoverID. These effects persist across genres and languages.
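As a rough illustration of how an embedding-based similarity score of this kind can be read, the sketch below compares audio embeddings with cosine similarity. It is only a sketch under stated assumptions: this page does not describe how the CLAP, AudioJudge, or CoverID scores are computed, the function name is hypothetical, and the vectors are toy placeholders rather than real model outputs or results from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Toy placeholder vectors (assumption): in practice these would come from an
# audio encoder applied to the original song and to a generated song.
original_embedding = [0.2, 0.7, 0.1, 0.4]
generated_embedding = [0.25, 0.65, 0.05, 0.45]
unrelated_embedding = [0.9, -0.3, 0.6, -0.1]

score_generated = cosine_similarity(original_embedding, generated_embedding)
score_unrelated = cosine_similarity(original_embedding, unrelated_embedding)
print(f"generated vs. original: {score_generated:.3f}")
print(f"unrelated vs. original: {score_unrelated:.3f}")
```

Under this reading, a generated track that scores much closer to the original than an unrelated track does would be a sign of memorization. Any threshold for "strikingly similar" would have to come from the paper, not from this sketch.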

More surprisingly, we find that phonetic prompts alone can trigger visual memorization in text-to-video models: when given altered lyrics from Lose Yourself, Veo 3 generates scenes that mirror the original music video (complete with a hooded rapper and dim urban settings) despite no explicit visual cues in the prompt.

Key Findings: Through systematic testing with phoneme modifications (like "mom's spaghetti" → "Bob's confetti"), we demonstrate that AI music models exhibit significant memorization, raising important questions about copyright safety in generative music systems.
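The phrase-level substitution behind these prompts can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the mapping reuses the "Lose Yourself" substitutions listed on this page, while the function name, the case-insensitive matching, and the example sentence are assumptions made for the sketch. How the homophonic alternatives are chosen in the first place is not covered here.

```python
import re

# Substitutions listed on this page for "Lose Yourself".
SUBSTITUTIONS = {
    "knees weak": "cheese weak",
    "mom's spaghetti": "Bob's confetti",
    "calm and ready": "clam and ready",
}


def apply_phonetic_substitutions(lyrics: str, mapping: dict[str, str]) -> str:
    """Replace each mapped phrase in `lyrics` with its phonetic alternative.

    Matching is case-insensitive (an assumption of this sketch); longer
    phrases are tried first so overlapping keys resolve predictably.
    """
    if not mapping:
        return lyrics
    phrases = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in phrases), re.IGNORECASE)
    lowered = {key.lower(): value for key, value in mapping.items()}
    return pattern.sub(lambda m: lowered[m.group(0).lower()], lyrics)


# Made-up example sentence (not actual lyrics) containing the mapped phrases.
example = "He felt knees weak, thought of mom's spaghetti, yet stayed calm and ready."
modified = apply_phonetic_substitutions(example, SUBSTITUTIONS)
print(modified)
```

Text outside the mapped phrases is left untouched, so the modified prompt keeps the original line structure and only the listed phrases change.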

Aligned Video Demo

Lose Yourself - Veo 3 Video Generation Demo

Demonstration showing how phonetic modifications can trigger visual memorization in text-to-video models.

Original Music Video

Original Lyrics

Phoneme Modified Lyrics

🎤 Rap Phoneme Variants

Hip-hop tracks with phonetic modifications. Key transformations preserve rhythm while changing lyrics.

DNA → BMA (Kendrick Lamar)

"DNA" → "BMA" "loyalty" → "gravy" "power" → "waffle"

Original

Modified - "DNA" → "BMA"

Genre: "rap"

Modified - Version 2

Genre: "rap"

Lose Yourself (Eminem)

"knees weak" → "cheese weak" "mom's spaghetti" → "Bob's confetti" "calm and ready" → "clam and ready"

Original

Modified - SUNO

Genre: N/A

Modified - Intense

Genre: "intense rap"

HUMBLE. (Kendrick Lamar)

"sit down" → "sick down" "be humble" → "be humgle"

Original

Modified - Version 1

Genre: "trap gangsta"

Modified - Version 2

Genre: "rap gangsta"

🎭 Pop Phoneme Variants

Pop songs with phonetic modifications across different languages.

APT (ROSÉ & Bruno Mars)

"아파트" → "하파트" "Pretty face" → "Fishy face"

Original

Modified - Version 1

Genre: "female punk rock"

Modified - Version 2

Genre: "female punk rock"

Gentleman (PSY)

"mother father" → "bother gather"

Original

Modified - Phonetic variations

Genre: "dance"

Let It Be (The Beatles)

"let it be" → "get it free" "mother mary" → "other fairy"

Original

Modified - YuE

YuE Version

Modified - SUNO

SUNO Version

月亮代表我的心 (Teresa Teng)

"我的情也真" → "我的情不移" "我的爱也真" → "我的爱不变"

Original

Modified - YuE

Generated using YuE

🎄 Christmas Phoneme Variants

Jingle Bells transformed through phonetic modifications.

Jingle Bells Video Comparison

Original Lyrics

Phoneme-Modified Lyrics

Jingle Bells Audio Variants

"jingle" → "giggle" "bells" → "shells" "dashing" → "flashing"

Original Jingle Bells

Giggle Shell Version 1

Christmas style

Giggle Shell Version 2

Christmas style

Jingle Bell Rock → Giggle Shell Sock

"jingle bell rock" → "giggle shell sock" "time" → "mime"

Original

Modified - Version 1

Christmas style

Modified - Version 2

Christmas style

🎵 English (Billboard) Songs

Classic English songs regenerated with genre modifications.

Billie Jean (Michael Jackson)

Original

Generated - Inspiring female uplifting pop

Thinking Out Loud (Ed Sheeran)

Original

Generated - Male romantic guitar ballad

Empire State of Mind

Original

Generated - Inspiring female pop

🇨🇳 Mandarin & Cantonese Songs

Chinese-language songs demonstrating cross-linguistic memorization.

甜蜜蜜 (Teresa Teng)

Original

Generated

光辉岁月 (Beyond)

Original

Generated - Rock nostalgic Cantonese

后来 (Hòu Lái) - Rene Liu (Multiple Genre Variants)

Original

Generated - Female nostalgic ballad

Generated - Uplifting pop

Generated - Pop ballad guitar

BibTeX

@article{roh2025bob,
  title={Bob's Confetti: Phonetic Memorization Attacks in Music and Video Generation},
  author={Roh, Jaechul and Novack, Zachary and Peng, Yuefeng and Mireshghallah, Niloofar and Berg-Kirkpatrick, Taylor and Houmansadr, Amir},
  journal={arXiv preprint arXiv:2507.17937},
  year={2025}
}