Programming A2Z - Week 10

LLMs

I'm glad this week's assignment on LLMs included a push to revisit the work of Timnit Gebru et al. I was working in tech journalism when Gebru was fired from Google, and... well, I think I'm still coming to grips with how it felt to start ITP last year right as models like Stable Diffusion and ChatGPT completely shifted the conversation around AI. Let's just say my feelings on this whole matter are complicated and trend toward the negative.

Five or so years ago, even before free or cheap access to LLM outputs poured lighter fluid on everything, companies like Google and Facebook were already reshaping, and effectively destroying, large parts of the media industry I worked in. One day I'd be writing a blog post about the newest neat GAN experiment from Google R&D; the next I'd be reading about how the entire staff at a place I had hoped to work at someday was unceremoniously laid off because of poor SEO performance, a fumbled pivot to video, or what have you. There was already plenty of junk writing online that may as well have been generated by a poorly prompted, poorly vetted LLM, and it all undermined the legitimacy and stability of my line of work, because the gatekeepers of the internet so often seemed biased toward directing people to exactly that stuff.

So, given that I already had not-so-great feelings about the tech giants now extolling LLMs as the new way of doing things, I've sort of resolved to minimize my use of them in my own work. Setting aside the economic, environmental, and political ramifications of just riding the AI wave, I do think LLMs (and image generation models) can be useful, but they're also just too easy to lean on in one's personal practice. As someone who is not the greatest programmer and a pretty limited visual artist, if I'm going to use this stuff, I want there to be a damn good reason besides just wanting a shortcut.

In that spirit, here's a look at what I got up to with Replicate this week:

Mark E-Mark

For two of my other classes this semester, Hedonomic VR Design and Multisensory Storytelling in Virtual and Original Flavor Reality (what a wild name, right?), I'm working on a narrative project that involves Meta and AI. One character I'm creating is an AI assistant, and I need voiceover for its dialogue. The problem: I don't think I'm the right actor for the job.

Now, I could beg someone else at ITP to do it... or I could pay someone to, but I don't have the money. As far as ethics go, I absolutely don't think it'd be right to use an AI clone of a professional voice actor's voice, and I'm just creeped out at the thought of turning it on myself.

On a tech executive, though? I think that's OK.

Mark Zuckerberg is a focus of this piece, he's all in on LLMs/AI, and there is lots and lots of audio of him out there, including a podcast interview I did with him in 2021. I don't see a good reason not to clone his voice for this project.

So, I clipped out a bunch of Mark's responses from that podcast recording and used them with James Betker's tortoise-tts model on Replicate. Here's the first sample of what I got:

Replicate prediction ho7mdljbc4nhiqtlyc7lfiz6vi

Not bad, right? Well, here's what happened with a longer line from the piece.

Replicate prediction l7teotzbe6yip6knwbxsdrzgne

I have no idea what's going on there. It sounds like Mark Zuckerberg doing a mid-Atlantic accent, like he went through the teleporter from The Fly alongside Cary Grant or something. At no point in the audio clips I provided did he sound, well, anything like that. I'm going to keep experimenting with Replicate's TTS models for this VR project, but I'm also going to check out ElevenLabs. Will report back if I have insights to share!