OpenAI Hears You Whisper

If you want to try high-quality voice recognition without buying anything, good luck. Sure, you can borrow the speech recognition on your phone, or have a virtual assistant on a Raspberry Pi do the processing for you, but those approaches aren’t great for bigger projects where you don’t want to be tied to a closed-source solution. OpenAI has introduced Whisper, an open source neural network that it claims “approaches human level robustness and accuracy on English speech recognition.” It seems to work in at least some other languages, too.

If you try the demonstrations, you’ll see that speaking quickly or with an accent doesn’t seem to affect the results much. The announcement mentions that the model was trained on 680,000 hours of supervised data. If you were to talk that much to an artificial intelligence yourself, it would take you over 77 years, without sleep!

Internally, speech is chopped into 30-second chunks that are converted into spectrograms. An encoder processes each spectrogram, and a decoder digests the result, predicting text along with special tokens for things like language and timestamps. About a third of the training data came from non-English sources, paired with translation. You can read the paper about how this generalized training underperforms some models trained specifically for standard benchmarks, but the authors think Whisper does better on random speech beyond those benchmarks.
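If you want to poke at that pipeline yourself, the Python package exposes the pieces directly. Here’s a sketch along the lines of the examples in the project’s README; the file name audio.mp3 is just a placeholder, and the model choice is up to you.

    import whisper

    # load a model checkpoint (downloads on first use)
    model = whisper.load_model("base")

    # load audio and pad/trim it to the 30-second window Whisper expects
    audio = whisper.load_audio("audio.mp3")  # placeholder file name
    audio = whisper.pad_or_trim(audio)

    # compute the log-Mel spectrogram that feeds the encoder
    mel = whisper.log_mel_spectrogram(audio).to(model.device)

    # let the model guess the spoken language
    _, probs = model.detect_language(mel)
    print("Detected language:", max(probs, key=probs.get))

    # run the decoder over the encoded spectrogram
    options = whisper.DecodingOptions()
    result = whisper.decode(model, mel, options)
    print(result.text)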

The models aren’t exactly lightweight: even the smallest variant is 39 megabytes, and the “large” variant is over a gig and a half. So this probably isn’t going to run on your Arduino anytime soon. If you want to run it yourself, though, it’s all on GitHub.
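Basic transcription takes even less code. Here’s a minimal sketch using the high-level API the repo demonstrates; again, audio.mp3 stands in for whatever file you want to transcribe.

    import whisper

    # pick a checkpoint sized for your hardware ("tiny" through "large")
    model = whisper.load_model("small")

    # transcribe a local audio file; the result includes the full text
    # plus per-segment timing information
    result = model.transcribe("audio.mp3")  # placeholder file name
    print(result["text"])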

There are other solutions out there, but none quite as robust. If you want to go the assistant-based route instead, here’s some inspiration.
