AI Unbiased: Time Saved at What Cost? When AI Took Over Transcription in Nonfiction Filmmaking

AI is a hot topic in the creative film industry. But beyond the hype and the fear, what's really going on?

Alex Clark

Producer | June 17th, 2024

💥 Introducing VC's newest column: AI Unbiased. This monthly editorial is all about (you guessed it) AI. 👾 Whether you love it, feel sus about it, or are still forming your opinion on it, let longtime VC member and community organizer Alex Clark serve as your guide to the world of AI and its real-world effects on our industry so far.


It's 2015, and I'm a budding video journalist in New York City. Scouring media listservs, production groups on Facebook, and various online forums, I discovered a demand for quick manual transcription (usually typing verbatim, with timecode) of film and radio interviews. The going rate for this essential work: $50 per hour of audio. In an era when unpaid internships were still the norm, this felt like a solid opportunity to make some extra cash and meet others in media.

Such human transcription gigs are becoming increasingly rare, and on some forums, nearly extinct. The reason? AI transcription tools like Trint and Adobe Premiere's speech-to-text feature have quietly taken the stage, and they're here to stay as essential additions to the filmmaker's toolkit.

AI transcription is everywhere, y’all.

I spoke with VC members Allie Delury, Kristina Budelis, and Wes Block about their experiences with AI transcription, namely how it's being used in their workflows and what we can learn from its implementation.

Kristina Budelis, an LA-based documentary filmmaker, entrepreneur, and the creator of the AI & Film newsletter Film Robots, has been using AI transcription tools like Trint for years. Back when manual transcription was the only option, obtaining full transcripts wasn't always feasible. But that's changed a bit: "There were some projects that simply couldn't afford or allocate the time for transcription in the past. With numerous low-budget, quick-turnaround projects, [AI transcription] enables transcription to occur when it otherwise wouldn't," she explains.

Budelis, of course, still believes in the merits of manual transcription: "Depending on the project and your relationship to the project, transcribing yourself can be super valuable." She adds, "I've transcribed an interview or a part of an interview because I want to really, like, get to know it and internalize the material."

Allie Delury is a Brooklyn-based producer working on Local Yokels, a feature-length documentary about stand-up comedy. To Delury, the time-saving benefits of AI transcription are no joke: "Time is like the most valuable thing. We all know that you don't sleep when you're in the middle of production on a documentary. If you're burning a ton of time just transcribing, that's time that you're not shooting. That's time that you're not planning out the next day's events."

For insight into where this is leading, I called Wesley Block, VC member, filmmaker, and developer of an AI-powered app called Kino, which uses Whisper, an open-source AI transcription model from OpenAI, to generate transcripts. But those transcripts are just a first step. Kino also uses AI to tag visual characteristics (colors, environments, faces) in every frame. Combining the transcripts with these visual tags lets users search their footage for moments, words, emotions, or subjects. "We want to be the best way to retrieve anything from your media library," Block explains.
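
Kino's internals aren't described in this piece, so the following is only a hypothetical sketch of the general idea Block describes: once every moment of footage carries both transcript text and visual tags, a single search can filter on both at once. The `Moment` record, the `search` function, the clip names, and the exact-word matching are all illustrative assumptions, not Kino's actual design.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical schema: one indexed slice of footage pairing a transcript
# segment with visual tags (e.g. colors, environments, faces).
@dataclass
class Moment:
    clip: str
    start: float  # seconds from the start of the clip
    end: float
    text: str
    tags: set[str] = field(default_factory=set)


def _words(text: str) -> set[str]:
    """Lowercased words with common surrounding punctuation stripped."""
    return {tok.strip(".,!?;:\"'").lower() for tok in text.split()}


def search(moments: list[Moment], words=(), tags=()) -> list[Moment]:
    """Return moments whose transcript contains every query word
    and whose visual tags include every query tag (exact matching)."""
    wanted_words = {w.lower() for w in words}
    wanted_tags = {t.lower() for t in tags}
    return [
        m for m in moments
        if wanted_words <= _words(m.text)
        and wanted_tags <= {t.lower() for t in m.tags}
    ]
```

For example, `search(library, words=["crowd"], tags=["stage"])` would return only moments where someone says "crowd" in a shot tagged as a stage. A real tool would more likely use fuzzy or embedding-based matching so that a search for an emotion or a subject doesn't depend on an exact word appearing in the transcript.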


When Transcription Became Free

In 2022, OpenAI unveiled the aforementioned Whisper, a powerful automatic speech recognition model trained on a staggering 680,000 hours of multilingual audio. On capable hardware, it can transcribe dialogue from interviews and scenes quickly and with impressive accuracy. To put this advancement into perspective, Apple introduced Siri, its then-revolutionary voice assistant, in 2011. Just over a decade later, speech recognition has become so widespread that transcription is now essentially free and accessible to all.

As with most present-day AI, there are downsides in usability and practicality. Whisper is highly hardware-dependent and a pain to install. As demand for AI grows and processing capabilities increase, we can expect to see more user-friendly, standalone apps like Kino, as well as numerous new tools integrated into popular video editing suites and plugins.
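
For readers curious what running Whisper yourself involves, here is a minimal sketch that turns Whisper-style output into the kind of timecoded transcript that used to be typed by hand. It assumes the open-source `openai-whisper` Python package, whose `transcribe()` result includes a `segments` list of dicts with `start` and `end` (in seconds) and `text`. The helper names `to_timecode` and `format_transcript` are hypothetical. The formatter runs on its own; the commented lines show where a real transcription would come from, and they require installing the package plus ffmpeg.

```python
# Sketch: turn speech-recognition segments into a timecoded transcript.
# Assumes segments shaped like the "segments" list returned by the
# open-source openai-whisper package: dicts with "start", "end"
# (seconds) and "text".


def to_timecode(seconds: float) -> str:
    """Format seconds as HH:MM:SS, truncating to whole seconds."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_transcript(segments) -> str:
    """Return one '[start - end] text' line per segment."""
    lines = []
    for seg in segments:
        start = to_timecode(seg["start"])
        end = to_timecode(seg["end"])
        lines.append(f"[{start} - {end}] {seg['text'].strip()}")
    return "\n".join(lines)


# Example usage with openai-whisper installed (not run here; the model
# size and file name are placeholders):
#   import whisper
#   model = whisper.load_model("base")
#   result = model.transcribe("interview.wav")
#   print(format_transcript(result["segments"]))
```

Truncating to whole seconds keeps the output close to the HH:MM:SS stamps common in manual transcripts; an editor who needs frame-accurate timecode would have to convert using the project's frame rate.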

Note: A surprising New York Times article from April revealed that OpenAI built Whisper to transcribe audio from YouTube videos, turning it into text for training its next-generation AI models. As someone who has published thousands of hours of original content on YouTube, I don’t love it.

Time Saved at What Cost?

Advocates of AI (or efficiency) won’t shed tears over the loss of a few undesirable side gigs. And that $50 rate was less generous than it sounds: after factoring in formatting, timecode insertions, and invoicing, my transcription work for various outlets netted me around $12 per hour of my own time.

Allie Delury puts it bluntly: "At this point, in my opinion, it's like slave labor to assume that people are going to work for way below minimum wage just to put in their time on a set, to eventually become something else. If AI helps more people jump to the position that they want the most, then of course I'm going to be a huge proponent of that."

When it comes to transcribing media, nonfiction filmmakers have readily embraced the trade-off: assigning AI a tedious manual task means more time for other aspects of production. However, as AI tools become increasingly sophisticated, offering capabilities such as generative image and audio creation, nonfiction filmmakers face a new set of challenges. They must carefully weigh the benefits and potential drawbacks of these tools, often with limited information, while striving to maintain the human touch that is crucial to nonfiction storytelling.

Alex Clark is a video journalist, documentary filmmaker, and adjunct associate professor at Columbia University's Graduate School of Journalism. His work includes hosting the Emmy-nominated series "Glad You Asked" for Vox, producing the PBS NOVA film "Crypto Decoded," and editing for the Peabody-nominated film "The Picture Taker."

Thanks to Allie Delury, Kristina Budelis, and Wes Block for sharing their insights for this piece.

Check out Kristina’s newsletter: Film Robots.

Wes is changing the way search works over at Kino AI.

Allie is working on a feature-length documentary about stand-up comedy called Local Yokels. Look out for it!

