
I used OpenAI’s new tech to transcribe audio right on my laptop

Illustration of a series of blue microphones on a teal background.
The benefits of AI without the drawbacks of the cloud. | Kristen Radtke / The Verge; Getty Images

OpenAI, the company behind image-generation and meme-spawning program DALL-E and the powerful text autocomplete engine GPT-3, has launched a new, open-source neural network meant to transcribe audio into written text (via TechCrunch). It’s called Whisper, and the company says it “approaches human level robustness and accuracy on English speech recognition” and that it can also automatically recognize, transcribe, and translate other languages like Spanish, Italian, and Japanese.

As someone who’s constantly recording and transcribing interviews, I was immediately hyped about this news — I thought I’d be able to write my own app to securely transcribe audio right from my computer. While cloud-based services like Otter.ai and Trint work for most things and are relatively secure, there are just some interviews where I, or my sources, would feel more comfortable if the audio file stayed off the internet.

Using it turned out to be even easier than I’d imagined; I already have Python and various developer tools set up on my computer, so installing Whisper was as easy as running a single Terminal command. Within 15 minutes, I was able to use Whisper to transcribe a test audio clip that I’d recorded. For someone relatively tech-savvy who didn’t already have Python, FFmpeg, Xcode, and Homebrew set up, it’d probably take closer to an hour or two. Someone is already working on making the process much simpler and more user-friendly, though, which we’ll talk about in just a second.
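For readers who would rather call Whisper from a script than from the command line, the open-source package also exposes a small Python interface. The sketch below is a minimal illustration, not a record of the setup described above: it assumes the `openai-whisper` package and FFmpeg are already installed, and the function name `transcribe_file` and the default model size `"base"` are choices made for this example.

```python
def transcribe_file(audio_path, model_name="base"):
    """Transcribe one audio file locally and return the plain-text transcript.

    Assumption: the `openai-whisper` package and FFmpeg are installed.
    The import is done lazily so this module loads even where they are not.
    """
    import whisper  # provided by the openai-whisper package

    # Load the chosen model; weights are fetched on first use, then cached.
    model = whisper.load_model(model_name)

    # Run speech recognition on the file; the audio never leaves the machine.
    result = model.transcribe(audio_path)

    # The result is a dict; "text" holds the full transcript as one string.
    return result["text"]
```

Calling `transcribe_file("interview.m4a")` (a hypothetical file name) would run the whole job on the local machine. Larger model names such as `"medium"` generally trade speed for accuracy.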

Command-line apps obviously aren’t for everyone, but for something that’s doing a relatively complex job, Whisper’s very easy to use.

While OpenAI definitely saw this use case as a possibility, it’s pretty clear the company is mainly targeting researchers and developers with this release. In the blog post announcing Whisper, the team said its code could “serve as a foundation for building useful applications and for further research on robust speech processing” and that it hopes “Whisper’s high accuracy and ease of use will allow developers to add voice interfaces to a much wider set of applications.” This approach is still notable, however — the company has limited access to its most popular machine-learning projects like DALL-E or GPT-3, citing a desire to “learn more about real-world use and continue to iterate on our safety systems.”

Image showing a text file with the transcribed lyrics for Yung Gravy’s song “Betty (Get Money).” The transcription contains many inaccuracies.
The text files Whisper produces aren’t exactly the easiest to read if you’re using them to write an article, either.

There’s also the fact that installing Whisper isn’t exactly a user-friendly process for most people. However, journalist Peter Sterne has teamed up with GitHub developer advocate Christina Warren to try to fix that, announcing that they’re creating a “free, secure, and easy-to-use transcription app for journalists” based on Whisper’s machine learning model. I spoke to Sterne, and he said he decided the program, dubbed Stage Whisper, should exist after he ran some interviews through Whisper and determined that it was “the best transcription I’d ever used, with the exception of human transcribers.”

I compared a transcription generated by Whisper to what Otter.ai and Trint put out for the same file, and I would say the results were roughly comparable. There were enough errors in all of them that I would never just copy and paste quotes from them into an article without double-checking the audio (which is, of course, best practice anyway, no matter what service you’re using). But Whisper’s version would absolutely do the job for me; I can search through it to find the sections I need and then just double-check those manually. In theory, Stage Whisper should perform exactly the same since it’ll be using the same model, just with a GUI wrapped around it.
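The search-then-verify workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Whisper's Python interface returns timed segments as dictionaries with `start`, `end`, and `text` keys, while the helper names `format_timestamp` and `find_mentions` and the sample segments are invented for this example.

```python
def format_timestamp(seconds):
    """Render a position in seconds as H:MM:SS for jumping to it in a player."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def find_mentions(segments, keyword):
    """Return (timestamp, text) pairs for segments that contain the keyword.

    Matching is case-insensitive and uses each segment's start time.
    """
    needle = keyword.lower()
    return [
        (format_timestamp(seg["start"]), seg["text"].strip())
        for seg in segments
        if needle in seg["text"].lower()
    ]


# Hypothetical segments, shaped like the "segments" list in a Whisper result.
sample_segments = [
    {"start": 0.0, "end": 4.2, "text": " Thanks for making the time today."},
    {"start": 754.5, "end": 760.1, "text": " The budget was cut in March."},
    {"start": 3725.0, "end": 3730.0, "text": " We never saw the Budget memo."},
]

hits = find_mentions(sample_segments, "budget")
for stamp, text in hits:
    print(stamp, text)
```

Each hit gives a timestamp to jump to in the recording, so only the passages that will actually be quoted need to be checked against the audio by ear.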

Sterne admitted that tech from Apple and Google could make Stage Whisper obsolete within a few years. The Pixel’s voice recorder app has been able to do offline transcriptions for years, a version of that feature is starting to roll out to some other Android devices, and Apple has offline dictation built into iOS (though there’s currently no good way to transcribe audio files with it). “But we can’t wait that long,” Sterne said. “Journalists like us need good auto-transcription apps today.” He hopes to have a bare-bones version of the Whisper-based app ready in two weeks.

To be clear, Whisper probably won’t totally obsolete cloud-based services like Otter.ai and Trint, no matter how easy it is to use. For one, OpenAI’s model is missing one of the biggest features of traditional transcription services: being able to label who said what. Sterne said Stage Whisper probably wouldn’t support this feature: “we’re not developing our own machine learning model.”

And while you’re getting the benefits of local processing, you’re also getting the drawbacks. The main one is that your laptop is almost certainly significantly less powerful than the computers a professional transcription service is using. For example, I fed the audio from a 24-minute-long interview into Whisper, running on my M1 MacBook Pro; it took around 52 minutes to transcribe the whole file. (Yes, I did make sure it was using the Apple Silicon version of Python instead of the Intel one.) Otter spat out a transcript in less than eight minutes.
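To put those timings in perspective, the arithmetic can be worked through in Python. The figures come from the test above (52 minutes for Whisper, under eight for Otter, on 24 minutes of audio); the function names are made up for this example, and the estimate assumes processing time scales linearly with audio length, which was not measured here.

```python
def realtime_factor(processing_minutes, audio_minutes):
    """Minutes of processing needed per minute of audio (below 1.0 is faster than real time)."""
    return processing_minutes / audio_minutes


def estimated_processing_minutes(audio_minutes, factor):
    """Rough estimate only: assumes time grows linearly with audio length."""
    return audio_minutes * factor


AUDIO_MINUTES = 24

# Whisper on the M1 MacBook Pro: about 52 minutes for the 24-minute file.
whisper_factor = realtime_factor(52, AUDIO_MINUTES)

# Otter took "less than eight minutes", so this is an upper bound.
otter_factor_upper_bound = realtime_factor(8, AUDIO_MINUTES)

print(f"Whisper: {whisper_factor:.2f}x real time")
print(f"Otter:   under {otter_factor_upper_bound:.2f}x real time")

# Under the linear assumption, an hour-long interview on the same laptop:
hour_estimate = estimated_processing_minutes(60, whisper_factor)
print(f"Estimated Whisper time for 60 minutes of audio: {hour_estimate:.0f} minutes")
```

In other words, Whisper needed a little over two minutes of processing for every minute of audio on this laptop, while Otter finished in well under the length of the recording.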

OpenAI’s tech does have one big advantage, though: price. The cloud-based subscription services will almost certainly cost you money if you’re using them professionally (Otter has a free tier, but upcoming changes are going to make it less useful for people who are transcribing things frequently), and the transcription features built into platforms like Microsoft Word or the Pixel require you to pay for separate software or hardware. Both Stage Whisper and Whisper itself are free and can run on the computer you already have.

Again, OpenAI has higher hopes for Whisper than it being the basis for a secure transcription app — and I’m very excited about what researchers end up doing with it or what they’ll learn by looking at the machine learning model, which was trained on “680,000 hours of multilingual and multitask supervised data collected from the web.” But the fact that it also happens to have a real, practical use today makes it all the more exciting.



Source: The Verge
