Bob's Confetti: Phonetic Memorization Attacks in Music and Video Generation

1University of Massachusetts Amherst
2University of California San Diego
3Carnegie Mellon University
*Equal Contribution

Abstract

Lyrics-to-Song (L2S) generation models promise end-to-end music synthesis from text, but their vulnerability to copyright leakage remains underexplored. To mitigate this risk, commercial systems typically block prompts containing copyrighted lyrics. In this work, we introduce Adversarial PhoneTic Prompting (APT), an attack that replaces iconic phrases with homophonic alternatives—e.g., "mom's spaghetti" becomes "Bob's confetti"—preserving the acoustic form while bypassing copyright filters.
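As a rough illustration of the substitution step described above, the sketch below applies a phrase-level homophone map to a line of text and checks it against a toy exact-match blocklist. The function names, the blocklist, and the matching logic are hypothetical and chosen only for illustration; they are not the authors' implementation, and commercial filters are not known to work this way.

```python
import re

# Phrase pairs taken from the Lose Yourself example on this page.
SUBSTITUTIONS = {
    "knees weak": "cheese weak",
    "mom's spaghetti": "Bob's confetti",
    "calm and ready": "clam and ready",
}


def apply_substitutions(lyrics: str, substitutions: dict[str, str]) -> str:
    """Replace each listed phrase (case-insensitively) with its homophonic alternative."""
    out = lyrics
    for original, replacement in substitutions.items():
        # A lambda avoids any special handling of backslashes in the replacement.
        out = re.sub(re.escape(original), lambda _m, r=replacement: r,
                     out, flags=re.IGNORECASE)
    return out


def is_blocked(prompt: str, blocklist: list[str]) -> bool:
    """Toy filter (an assumption): block a prompt containing any listed phrase verbatim."""
    lowered = prompt.lower()
    return any(phrase.lower() in lowered for phrase in blocklist)


line = "knees weak / mom's spaghetti / calm and ready"
modified = apply_substitutions(line, SUBSTITUTIONS)
blocklist = list(SUBSTITUTIONS)  # hypothetical list of protected phrases

print(modified)
print(is_blocked(line, blocklist), is_blocked(modified, blocklist))
```

An exact-match check of this kind sees no overlap between the modified text and the protected phrases, even though the two versions sound nearly identical when sung.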

We reveal that models can be prompted to regurgitate memorized songs using phonetically similar but semantically unrelated lyrics. Despite the semantic drift, black-box models like SUNO and open-source models like YuE generate outputs that are strikingly similar to the original songs—melodically, rhythmically, and vocally—achieving high scores on CLAP, AudioJudge, and CoverID. These effects persist across genres and languages.
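Embedding-based metrics such as CLAP are commonly reported as a cosine similarity between two embedding vectors. The sketch below shows only that comparison step, assuming this is how the score is formed; the vectors are toy placeholders rather than real CLAP outputs, and AudioJudge and CoverID are not modeled here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


# Placeholder embeddings standing in for an original track and a generated one.
original_embedding = [0.9, 0.1, 0.4]
generated_embedding = [0.8, 0.2, 0.5]

score = cosine_similarity(original_embedding, generated_embedding)
print(f"similarity: {score:.3f}")
```

A score near 1 means the two embeddings point in almost the same direction; how that maps to perceived musical similarity depends on the embedding model.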

More surprisingly, we find that phonetic prompts alone can trigger visual memorization in text-to-video models: when given altered lyrics from Lose Yourself, Veo 3 generates scenes that mirror the original music video (complete with a hooded rapper and dim urban settings) despite no explicit visual cues in the prompt.

Key Findings: Through systematic testing with phoneme modifications (like "mom's spaghetti" → "Bob's confetti"), we demonstrate that AI music models exhibit significant memorization, raising important questions about copyright safety in generative music systems.

APT Attack Overview
Aligned Video Demo

Lose Yourself - Veo 3 Video Generation Demo

Demonstration showing how phonetic modifications can trigger visual memorization in text-to-video models.

Original Music Video
Original Lyrics
Phoneme Modified Lyrics

🎤 Rap Phoneme Variants

Hip-hop tracks with phonetic modifications. Key transformations preserve rhythm while changing lyrics.

DNA → BMA (Kendrick Lamar)
"DNA" → "BMA" "loyalty" → "gravy" "power" → "waffle"
Original
Modified - "DNA" → "BMA"
Genre: "rap"
Modified - Version 2
Genre: "rap"
Lose Yourself (Eminem)
"knees weak" → "cheese weak" "mom's spaghetti" → "Bob's confetti" "calm and ready" → "clam and ready"
Original
Modified - SUNO
Genre: N/A
Modified - Intense
Genre: "intense rap"
HUMBLE. (Kendrick Lamar)
"sit down" → "sick down" "be humble" → "be humgle"
Original
Modified - Version 1
Genre: "trap gangsta"
Modified - Version 2
Genre: "rap gangsta"
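One rough way to check that a substitution preserves rhythm is to compare syllable counts before and after. The sketch below uses a naive vowel-group heuristic, which is an assumption made for illustration and not the authors' method; it miscounts many English words (for example "loyalty") and is shown only on the Lose Yourself pairs listed above.

```python
import re


def count_syllables(word: str) -> int:
    """Naive estimate: count vowel groups, discounting a trailing silent 'e'."""
    letters = re.sub(r"[^a-z]", "", word.lower())
    if not letters:
        return 0
    count = len(re.findall(r"[aeiouy]+", letters))
    if letters.endswith("e") and count > 1:
        count -= 1  # e.g. "cheese" has one syllable, not two
    return max(count, 1)


def phrase_syllables(phrase: str) -> int:
    """Sum the per-word estimates over a whitespace-separated phrase."""
    return sum(count_syllables(word) for word in phrase.split())


PAIRS = [
    ("knees weak", "cheese weak"),
    ("mom's spaghetti", "Bob's confetti"),
    ("calm and ready", "clam and ready"),
]

for original, modified in PAIRS:
    print(original, phrase_syllables(original), "|",
          modified, phrase_syllables(modified))
```

A pronunciation dictionary would give more reliable counts and could also compare vowel sounds and stress; the heuristic here only tests whether the syllable total is unchanged.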

🎭 Pop Phoneme Variants

Pop songs with phonetic modifications across different languages.

APT (ROSÉ & Bruno Mars)
"아파트" → "하파트" "Pretty face" → "Fishy face"
Original
Modified - Version 1
Genre: "female punk rock"
Modified - Version 2
Genre: "female punk rock"
Gentleman (PSY)
"mother father" → "bother gather"
Original
Modified - Phonetic variations
Genre: "dance"
Let It Be (The Beatles)
"let it be" → "get it free" "mother mary" → "other fairy"
Original
Modified - YuE
YuE Version
Modified - SUNO
SUNO Version
月亮代表我的心 (Teresa Teng)
"我的情也真" → "我的情不移" "我的爱也真" → "我的爱不变"
Original
Modified - YuE
Generated using YuE

🎄 Christmas Phoneme Variants

Jingle Bells transformed through phonetic modifications.

Jingle Bells Video Comparison
Original Lyrics
Phoneme-Modified Lyrics
Jingle Bells Audio Variants
"jingle" → "giggle" "bells" → "shells" "dashing" → "flashing"
Original Jingle Bells
Giggle Shell Version 1
Christmas style
Giggle Shell Version 2
Christmas style
Jingle Bell Rock → Giggle Shell Sock
"jingle bell rock" → "giggle shell sock" "time" → "mime"
Original
Modified - Version 1
Christmas style
Modified - Version 2
Christmas style

🎵 English (Billboard) Songs

Classic English songs regenerated with genre modifications.

Billie Jean (Michael Jackson)
Original Generated - Inspiring female uplifting pop
Thinking Out Loud (Ed Sheeran)
Original Generated - Male romantic guitar ballad
Empire State of Mind
Original Generated - Inspiring female pop

🇨🇳 Mandarin & Cantonese Songs

Chinese language songs demonstrating cross-linguistic memorization.

甜蜜蜜 (Teresa Teng)
Original Generated
光辉岁月 (Beyond)
Original Generated - Rock nostalgic Cantonese
后来 (Hòu Lái) - Rene Liu (Multiple Genre Variants)
Original
Generated - Female nostalgic ballad
Generated - Uplifting pop
Generated - Pop ballad guitar

BibTeX

@article{roh2025bob,
  title={Bob's Confetti: Phonetic Memorization Attacks in Music and Video Generation},
  author={Roh, Jaechul and Novack, Zachary and Peng, Yuefeng and Mireshghallah, Niloofar and Berg-Kirkpatrick, Taylor and Houmansadr, Amir},
  journal={arXiv preprint arXiv:2507.17937},
  year={2025}
}