Last year, Jen Moss plotted out a little experiment. She challenged her team to produce two podcasts: one made entirely by humans with full creative freedom, and the other made by AI, with as little human intervention as possible. They would then test the two episodes on a group of unsuspecting listeners – without identifying which was which.
Jen Moss is the chief creative officer at podcast production and marketing agency JAR Audio. Jen acts as a ‘podcasting doula’ for her clients, and she previously worked as a producer for CBC.
Here she tells me about how the AI experiment went – including the surprisingly dodgy host that ChatGPT suggested – and which AI tools we should all be using in our audio storytelling.
Tell me how this experiment started.
As a company we make podcasts with brands and organizations, so we needed to create a scenario that was somewhat realistic for us. We’re often given a brand, and its values, and then we tell a story within that value set or sandbox.
So we set up a framework that this would be a podcast about slow travel for a [fictional] brand that was similar to Airbnb. I gave one team the assignment to use as much AI as they could in the process of making the episode. And I gave another producer the assignment to treat [the episode] in the normal way, using personal storytelling, clips, examples — to essentially make it good and listenable.
I was interested in finding out: is it going to be faster? Is it going to be more efficient? Is it going to be better or worse?
What AI tools did the first team use?
We used a combination of ChatGPT; Midjourney to play around with some visuals; and ElevenLabs to deal with voice cloning.
There actually was a lot of human involvement in the process with ChatGPT, because there needs to be in order to get anything sensible at all out of it. Once we had the topic of slow travel, we asked ChatGPT: who should be a guest on this show? What kinds of questions should we ask those people? Who would you recommend hosting the show?
I had a very weird experience. One of the people that ChatGPT suggested as a host looked really good when I read through ChatGPT’s profile of this person and did a perfunctory search on them. So I called them up, and had a long chat. I said, “listen, ChatGPT has recommended you as the host of this podcast. It’s not a real podcast; we’re doing an experiment. Would you mind loaning us your voice to clone?”
This person was a comedian, and, surprisingly, they said, “sure, that sounds like fun.” I was all excited for a minute that we were going to use this fairly well-known comedian. But then my producer spidey senses started to tingle. You know when you’re trying to book a guest and it’s like, that was too easy? A celebrity guest should be harder to book, like there should be some caveats.
So I broke my rule [that ChatGPT would make all the decisions] and did a little more digging. I found out that there had been a significant backlash online against the person ChatGPT had recommended. Like, you would never put this person on a show if you cared about brand safety! They had done some really, really dodgy things. They had a stalker problem. ChatGPT had dropped me on the shores of this really weird person. So we had to steer away from that! We ended up using fully cloned voices through ElevenLabs.
ChatGPT wrote the script, and that was very challenging, because it kept doing things like repeating the same information over and over. It was very difficult to get something cohesive and script-shaped out of it.
What was that process of script writing with ChatGPT?
With the prompt engineering, there's going to be a learning curve with any new technology. So we expected that, and we kind of anticipated that there would be a bit of time spent on back and forth. ChatGPT is still learning what makes a good script. It had all the elements; it knows that there should be an intro. It understands what a midpoint rebrand is. It understands what a sound effect is. It looks like a normal script on the surface, but nothing is really where it should be.
And the other challenge we encountered is that it has a word limit – it’ll spit out a certain number of words and then it’ll just stop. You have to keep prompting it to continue, like “no, keep going, finish the story.” In terms of structure, there’s no natural understanding of the arc of an episode. You have to spell all of that out in quite a bit of detail as you’re prompting it. Eventually we did end up with a cobbled-together script, and it sort of made sense.
Was the tone of the script really flat?
We had to play around with that. We had to prompt it to add a few conversational tics like ums and ahs. You have to really manipulate the prompting to try and get the tone you want.
ChatGPT created two fake guests; it gave us background bios on each of them. Funnily enough, it actually chose two female characters with the exact same name, Olivia. Really odd.
Why did it do that?!
We don’t know. In real audio producing, if you have two characters named the same thing – especially if they’re roughly the same age or have similar voices – that’s a real challenge, right? You've got to hold the audience's hand through that, or they will get very confused. So there’s an example of where a human producer would have to figure out how to handle that – but ChatGPT didn’t see any problem.
The next step was to take our patched-together AI script into the voice cloning space. For the guests we used cloned "off the shelf" voices from ElevenLabs. For the voice of our host we ended up using our producer Matt Stevens’ voice as a sample and cloning that. We found that to be more successful, though there were still some very odd moments … like when Matt suddenly sounded like a demon!
There are weird things that happen with voice cloning: you’ve got to ask it to be more hesitant, or ask it to say ums and ahs. And even when it does that, it’s like fake-relaxed; there’s something off about it.
Audio is such an intimate medium, right? If you're betraying that intimacy with fake voices, it's not going to work. I think that the tools have a lot of promise. Like if your host mispronounces a name or mistakenly uses a plural, [you can do] little one-word swaps here and there. To me that’s like the audio equivalent of Photoshop.
But if you tweak things too much, rely on it too heavily, or go full-AI, then I don’t think it makes a real human-to-human connection. It just falls short.
What surprised you about this experiment?
Well, what reassured me was that when we took the experiment out to the public and shared it with a fairly small sample set – we had about 100 responses – 99% of the people preferred the human-made podcast. It was very personal and warm. It had an anecdote about being chased by dogs. It started with an active scene and took you on a journey. It was able to do that immersive theater of the mind sound design. It felt comprehensive, it felt intentional, and it could create connection, because it was somebody's personal travel recollection, exploring what it means to go to a small town, live there, get to know people, and embody slow travel (which is what we used to call backpacking). It was just a much better listen.
My suspicion was that the results would turn out as they did, but seeing that the test group actually validated that… I was relieved; we felt like, okay, there is still a need for this authentic content.
One of the things I've realized over the past year is that now that we have a sea of AI content out there, there’s even more of a need for content that can rise above that and connect. To me, it’s an even stronger case for better storytelling.
That is very reassuring.
I do like [AI], though; I don't want to sound like a Luddite. We use it every day [at JAR Audio]. We use it as a brainstorming partner. We are careful to fact-check it because it's often inaccurate. We do find that because it can access a lot of information quickly, it can be a time saver in some areas.
So how should audio producers be using these tools in our work?
I think there are two main areas. One is as a collaborator, as a brainstorming partner, as an initial whiteboard, so you're not alone with a blank piece of paper. “Give me 50 title ideas for a show about frogs.” That could take me half a day to sit there noodling on that and ChatGPT might spit out 50 in 15 seconds. There might be one you could maybe work with. It is derivative; it’s not going to give you a lot of original content. You’re not going to break the mold with ChatGPT, but it can be a good sounding board, or a diving board into your ideas.
And then the other area that we really like it for is quick audio fixes in the fine edit stage. Rather than call your host back for tiny little pickups, you can just use the tools to tweak it. Descript has some really excellent AI tools built into it. We use those quite often, but we use them sparingly, and only with permission of the host or brand we’re working with.
One of the questions you guys posed at the start of your experiment was: will robots take our jobs? It sounds like you don’t think that’s going to happen anytime soon.
No, I don't think they will ever replace what's at the core of the work that we do, which is human-to-human connection. I think it's legitimately true. Do you want to listen to a robot for half an hour?
Because creativity is not just about product, right? It's actually about process as well. And so part of what you're getting when you listen to a real podcast by a real human is the understanding that they've labored over it. They've thought about it. From their heart to your heart is coming something funny or something informative or something they care about. And there's meaning in that. When you remove that part of the process and press a button to generate the content, it's neat, it's nifty… but it doesn't mean anything.
It’s that soulless… it’s the uncanny valley of audio, basically. It's a chasm you cannot cross, but you can approach it. You can play around with it. You could use it in your process in different ways. Prompt engineering is a fascinating area; you can get faster and better at that. You can tailor it.
So there’s a lot you can do with it, but it’s never going to replace what podcasting and storytelling essentially is, which is connecting.
The creative producer of the AI show was Reaon Ford, with help from former audience growth specialist Matt Stevens. The producer of the non-AI show was Tori Weldon. You can read about the experiment in full here. Also check out this episode of JAR’s (human-made!) show, How to Get Ahead in Podcasting: “Chad GPT – the AI Intern: Using AI in Podcasting.”
Inspired by Jen’s approach, I decided to ask AI … well, how AI could help me in my work. Here’s what Claude had to say:
As an audio journalist, I could assist you in several ways:
Script writing: I can help draft or refine scripts for your audio pieces, ensuring they're concise, engaging, and well-structured.
Research: I can gather and summarize information on various topics, helping you prepare for interviews or develop story ideas.
Interview preparation: I can suggest potential questions or angles for your interviews based on the subject matter and interviewee background.
Fact-checking: I can help verify information and sources to ensure accuracy in your reporting.
Transcription analysis: If you provide transcripts of interviews or recordings, I can help analyze them for key themes or interesting quotes.
Story structure: I can assist in organizing your audio stories, suggesting effective ways to present information and maintain listener engagement.
SEO optimization: For podcast descriptions or web content, I can help optimize text for search engines.
Content ideas: I can brainstorm potential story ideas or series concepts based on current events or your areas of interest.
Editing assistance: While I can’t edit audio directly, I can provide feedback on script edits or suggest where cuts or additions might improve the flow.
Audience engagement: I can help craft compelling teasers or social media posts to promote your audio content.
I’d love to hear about your own adventures into the uncanny valley. Keep me posted 💌 🤖
Jobs
🌷Reporter, Youth and Education ~ NHPR ~ Concord, NH
🌷KERA Arts Reporter ~ Dallas, TX
🌷Podcast Host, ICYMI ~ Slate ~ Brooklyn, NY
🌷Audio Producer, Sports ~ Mediahuis ~ Remote
🌷General Assignment Reporter ~ KUOW ~ Seattle, WA
Training and awards
🚶Sound Walk September Awards ~ deadline 30 September
🚶Managing Productions to Managing People: A How to Guide for Producers and Editors (Session V) with Mia Lobel ~ AIR Class ~ Sep 9, 16, 23
🚶How to Pitch Podcast Executives and Get Your Show Made with T.J. Raphael ~ AIR Class ~ Sep 17, 19, 24, & 26
🚶From Idea To Launch: How To Create A Sustainable Podcast with Laurel Morales ~ AIR Class ~ Sep 17, 19, 24, & 26
🚶AI in Audio: Legal insights to know your rights and protect your work with Alicia Pollard, Esq. ~ Free for AIR members / $25 non-members Webinar ~ Sep 19
🚶Navigating the New and Exciting World of Audiobook Originals with Michelle Altman and Hillary Frank ~ free AIR webinar ~ Sep 13
Thanks for reading and supporting The Audio Storyteller, folks!