

AI Unbiased: Can Artificial Intelligence Save Bad Audio? Testing AI Speech Enhancement

How does AI audio editing stack up against Audacity?

Alex Clark

Producer | August 1st, 2024

💥 Introducing VC's newest column: AI Unbiased. This monthly editorial is all about—you guessed it—AI. 👾 Whether you love it, feel sus about it, or are still forming your opinion on it, let long-time VC member and community organizer Alex Clark serve as your guide into the world of AI and its real-time effects on our industry thus far.


Picture this: You're in the middle of a crucial interview. The subject is really opening up, sharing insights you've waited months to capture. Suddenly, your trusty lavalier mic dies. All you have left is the shotgun mic, yards away from your subject. Can AI save the audio?

As artificial intelligence revolutionizes editing, from generating images to streamlining video cuts, its impact on audio post-production remains an open question.

Today, we're testing Adobe's Enhance Speech feature, a new and popular audio enhancement tool included in Premiere, comparing it to the noise reduction tool included in Audacity, a free program that’s been around for more than two decades.

Enhance Speech promises a one-click, local solution for improving audio quality, with no uploads required. It was created by training a neural network on millions of pairs of before-and-after audio recordings, essentially teaching the AI to process an audio clip in the same way a sound editor would.

[Image: a screenshot of Audacity, a free audio editor. Its noise reduction tool lets users manually sample the noise profile, then apply frequency-based reduction to the entire audio track.]
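For the technically curious: that sample-then-reduce workflow is a form of spectral gating. Here's a minimal Python sketch of the idea using NumPy and SciPy. The threshold and attenuation values are illustrative assumptions, not Audacity's actual implementation, which is more sophisticated.

```python
import numpy as np
from scipy.signal import stft, istft

def reduce_noise(signal, noise_sample, fs=44100, reduction_db=12.0):
    """Simplified spectral gating, in the spirit of Audacity's tool:
    estimate a noise profile from a user-selected clip of room tone,
    then attenuate frequency bins that sit near that noise floor.
    (Illustrative sketch only, not Audacity's code.)"""
    nperseg = 1024
    # Noise profile: mean magnitude per frequency bin of the room-tone clip.
    _, _, noise_spec = stft(noise_sample, fs=fs, nperseg=nperseg)
    noise_profile = np.abs(noise_spec).mean(axis=1, keepdims=True)

    _, _, sig_spec = stft(signal, fs=fs, nperseg=nperseg)
    mag = np.abs(sig_spec)
    # Gate: bins well above the noise floor pass through untouched;
    # bins near the floor are attenuated by the requested amount in dB.
    gain = np.where(mag > 2.0 * noise_profile, 1.0, 10 ** (-reduction_db / 20))
    _, cleaned = istft(sig_spec * gain, fs=fs, nperseg=nperseg)
    return cleaned

# Usage: a synthetic "voice" (pure tone) buried in hiss, plus the
# hiss alone standing in for a sampled stretch of room tone.
fs = 44100
t = np.linspace(0, 1.0, fs, endpoint=False)
voice = np.sin(2 * np.pi * 440 * t)
noise = 0.05 * np.random.default_rng(0).standard_normal(fs)
cleaned = reduce_noise(voice + noise, noise, fs=fs, reduction_db=12.0)
```

This is also why Audacity asks you to select a stretch of "silence" first: without a noise profile, the gate has no reference for what to suppress.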

To evaluate our (highly unscientific) results, I've enlisted help from two Video Consortium experts: Zach Egan, a sound designer and re-recording mixer experienced in film and documentary work, and Hyacinth Empinado, a seasoned video and podcast editor. Our three tests will compare the Adobe AI “Enhance Speech” audio to both the original audio and audio processed by Audacity's noise reduction tool.

Before we dive in, it's worth noting some limitations of Adobe’s Enhance Speech effect:

1) It struggles with very low signal-to-noise ratios, where background noise overwhelms the main voice.

2) Making a cut, re-opening a project, or duplicating an audio clip often requires re-processing, perhaps making it more suitable for final tweaks rather than as a first step in editing.
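For readers who want to put a number on "signal-to-noise ratio": it's typically expressed in decibels, comparing the power of the voice to the power of the background. A quick sketch, using synthetic stand-ins rather than the actual test recordings:

```python
import numpy as np

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))

rng = np.random.default_rng(0)
voice = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 44100))   # stand-in for speech
quiet_room = 0.01 * rng.standard_normal(44100)               # faint hiss: the easy case
passing_train = 0.8 * rng.standard_normal(44100)             # loud broadband noise: the hard case

easy = snr_db(voice, quiet_room)     # well above 30 dB
hard = snr_db(voice, passing_train)  # near 0 dB: noise rivals the voice
```

When the SNR drops toward 0 dB, the noise carries roughly as much energy as the voice, and any enhancement tool, AI or otherwise, has very little clean signal to work with.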

Time for testing!

TEST 1: Cleaning up untreated vocals with a condenser mic

For this test, I recorded at my desk using a condenser mic with a pop filter. I asked Zach and Hyacinth to listen to the original version, Adobe’s AI-enhanced version, and a version with Audacity’s noise reduction, which requires users to manually select a bit of room tone or noise and define the reduction amount in decibels.

Zach's thoughts:

The AI completely removed any noise. It sounds ok – no distracting artifacts introduced – but you lose some nice high frequencies in the voice, which makes it sound like an AI-generated voice. Definitely improved it.

The Audacity noise reduction is extremely subtle. I would probably prefer to use Audacity, and dial it up a bit more. Having some room tone in a recording is not a bad thing, depending on the situation. When there’s music underneath you can get away with it, even for a documentary. For a more cinematic scene, you typically need to be more aggressive with the noise reduction. I would choose to use my own set of tools instead of AI, or use the AI more conservatively, dialing down the amount.

Hyacinth's thoughts:

To echo what Zach said, having some room tone is not a bad thing. I usually edit mini docs and explainer videos, and a bit of room tone gives scenes more character. There’s a certain uncanniness to the AI version. It’s too clean. It reminds me of someone reading an audiobook, which is a very specific mood to go for, especially in video.

For my purposes, I would probably opt for treating the vocals first, instead of completely stripping the background noise, especially since the noise is not especially egregious. If it were a loud fan, or a computer whirring, I could see myself trying Enhance Speech.

I usually edit mini docs and explainer videos, and a bit of room tone gives scenes more character. There’s a certain uncanniness to the AI version. It’s too clean. It reminds me of someone reading an audiobook, which is a very specific mood to go for, especially in video.
—Hyacinth Empinado, video and podcast editor

TEST 2: The Nightmare Scenario - Super Noisy Outdoor Audio

Perhaps overly optimistic about AI's capabilities, I recorded a poem as a train passed by. This was captured on a shotgun microphone, a full 10 feet from the camera.

Zach's thoughts:

This is clearly a terrible recording and I can’t see a situation where you would ever use this. I guess for a rare occasion where you have no other choice I would use the AI voiceover but it still sounds pretty bad, lots of artifacts and robotic elements – doesn’t really sound human. The Audacity noise reduction isn’t usable either. If you’re shooting in a noisy area like this the best option would be to use lavs or a dynamic microphone, which are typically less sensitive. Or get the boom as close to the subject as possible. Time for ADR!

Hyacinth's thoughts:

If I had to use this shot, I would use the unprocessed recording with subtitles, as both denoising results were unusable. I was surprised that the AI result sounded much worse than manual processing. Some words were garbled, and the voice sounded like a completely different person. Sometimes, I think AI goes so far with audio processing that people’s voices are completely changed, which I think definitely crosses a line as far as journalistic ethics are concerned.

Sometimes, I think AI goes so far with audio processing that people’s voices are completely changed, which I think definitely crosses a line as far as journalistic ethics are concerned.
—Hyacinth Empinado, video and podcast editor

TEST 3: Less Noisy, But Still Challenging Outdoor Audio

For our final test, we toned down the difficulty but kept it realistically noisy. Again, we used a shotgun mic 10 feet from the camera. For this one, I even combined Audacity's noise reduction with Enhance Speech, and threw in some jazzy background music to mask the remaining issues.

Zach's thoughts:

Another terrible recording. You can hear how robotic it sounds as if there’s a “T-Pain” autotune on the voice. The voice clips and distorts when he says “white chicken.” Like the previous test, the best way to improve this would be to approach the recording process differently. Hire a professional location sound mixer to record your audio. You’ll save yourself a lot of time, money and effort in the long run! Or hire a professional sound editor to fix up your bad audio.

Hyacinth's thoughts:

AI strikes out again. It completely distorted Alex’s voice. The Audacity version isn’t great either, but I would probably try this first on bad audio over the AI tool.

Editing audio is extremely detail-oriented, takes a lot of focus and problem solving skills. I rely on a number of different dialogue editing tools and techniques including de-click, de-hum, equalization, de-essing, clip gain automation, volume automation, and room tone normalization.
—Zach Egan, sound designer

The verdict:

Tools like Adobe's Enhance Speech and Audacity's noise reduction can't fix poor recordings. They're no substitute for proper recording technique or professional expertise.

As Zach explains: "Editing audio is extremely detail-oriented, takes a lot of focus and problem solving skills. I rely on a number of different dialogue editing tools and techniques including de-click, de-hum, equalization, de-essing, clip gain automation, volume automation, and room tone normalization."

While AI will surely advance, we found that Enhance Speech, despite its extensive development, is mainly suited to minor tasks like enhancing voiceovers recorded in relatively quiet environments, or perhaps making a Zoom interview sound a bit sweeter. Quality audio will continue to depend on good recording practices and skilled professionals.

Thanks to Zach Egan and Hyacinth Empinado for lending their expertise to this experiment.

Keep an eye out for "Tiger Tiger," a documentary that Zach recently worked on. The film, which recently debuted in New York City, will be featured at upcoming festivals. Zach's site is zeganaudio.com.

And you can find more of Hyacinth’s work over at STAT News!

Alex Clark is a video journalist, documentary filmmaker, and adjunct associate professor at Columbia University's Graduate School of Journalism. His work includes hosting the Emmy-nominated series "Glad You Asked" for Vox, producing the PBS NOVA film "Crypto Decoded," and editing for the Peabody-nominated film "The Picture Taker."

Rough Cut Magazine is edited by Monica Gokey.

Rough Cut Magazine is VC's digital mag for and by industry thought leaders, doc filmmakers, and video journalists across the world.

