Possible to generate and display subtitles in real time using Whisper.cpp?
#1
I just came across this article on Mastodon; it is from The New Yorker:

Whispers of A.I.’s Modular Future

And also this video:

Auto-generating subtitles using Whisper



My question: is there any chance that someone who knows how to create Kodi addons could build one that uses this technology to provide closed captions on live or recorded TV streams and other videos?  My thinking is that you would buffer the incoming video, run the software on the audio to generate the captions, and delay playing the video until the captions are ready (assuming the captioning can be done in more or less real time).  I would not mind a 10 to 20 second delay before a stream starts playing if that is what it takes to add the captions.  Now, I know some will say that all TV shows are supposed to be closed captioned, but in the real world that is not always the case.  Also, ffmpeg (which is used by damn near everything that processes audio and video) is often terrible at preserving closed captions, depending on the source.  So this would be another way to caption programs that aren't supplied with closed captions, or where the captions have been lost in translation.
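
A rough sketch of the buffering idea described above, in Python. This is only an illustration under stated assumptions, not how Kodi or Whisper.cpp actually work: the `Chunk` and `DelayedCaptionBuffer` names are made up for this example, and `transcribe` is a placeholder for whatever speech-to-text call an addon would really use. Each chunk is captioned as it arrives and is only released for playback once it is at least `delay` seconds behind the newest data received.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Optional


@dataclass
class Chunk:
    start: float                   # stream time (seconds) where the chunk begins
    duration: float                # chunk length in seconds
    audio: bytes                   # raw audio for the chunk (format is up to the caller)
    caption: Optional[str] = None  # filled in once the chunk has been transcribed


class DelayedCaptionBuffer:
    """Hold chunks back by a fixed delay so captions can be attached first."""

    def __init__(self, transcribe: Callable[[bytes], str], delay: float = 15.0):
        # `transcribe` is a hypothetical stand-in for the real speech-to-text step.
        self.transcribe = transcribe
        self.delay = delay
        self.pending: Deque[Chunk] = deque()

    def push(self, chunk: Chunk) -> None:
        """Caption a newly received chunk and queue it for later playback."""
        chunk.caption = self.transcribe(chunk.audio)
        self.pending.append(chunk)

    def pop_ready(self, newest_received: float) -> List[Chunk]:
        """Return queued chunks that are at least `delay` seconds behind the live edge."""
        ready = []
        while self.pending and self.pending[0].start + self.delay <= newest_received:
            ready.append(self.pending.popleft())
        return ready
```

In this sketch the transcription happens inside `push`, so it only keeps up if the captioning really does run in roughly real time, which is the same assumption made in the post. A real addon would more likely transcribe on a separate thread.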

The alternative would be to wait until a program is fully recorded and then post-process it to add the captions, but maybe that would be more complicated than doing it in real time?  I don't know, I am not a programmer.  I just thought I'd throw this out there because it would be a great thing to have for hearing impaired people, especially in scenes that have poor microphone placement, or where people whisper or mumble or have a really thick accent (assuming the AI is smart enough to deal with those situations, which I realize it may not be - yet).
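
For the post-processing route, the last step would probably be writing a subtitle file that sits next to the recording. Here is a minimal Python sketch that turns timed text segments into SubRip (.srt) text. The `(start, end, text)` tuples are an assumed input shape for illustration, not the actual output format of any particular transcription tool, and the function names are made up for this example.

```python
from typing import Iterable, Tuple

Segment = Tuple[float, float, str]  # (start seconds, end seconds, caption text)


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: Iterable[Segment]) -> str:
    """Build SRT text: numbered cues separated by blank lines."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    return "\n".join(cues)
```

Saving the result with the same base name as the video and an .srt extension is the usual way players pick up an external subtitle file.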
#2
I see one problem with auto-generated subtitles, and that is actors who mumble and are therefore impossible to hear correctly, on top of all the other audio flying around.

Here is some clarification on the subject:
#3
Well, I did not say it would be perfect, but apparently this Whisper.cpp thing is supposed to be much better at decoding speech than anything that has come before, so unless the actor mumbles something totally unintelligible there is a chance it could work.  Also, on sources with multichannel audio, it might be worth having a setting to only monitor the center channel for speech, since on many sources all speech comes from the center channel and the other channels are reserved for background music and sound effects.
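
To show what "only monitor the center channel" could look like, here is a small Python sketch that pulls one channel out of interleaved 16-bit PCM audio. It assumes 6-channel (5.1) audio with the center channel at index 2, which matches the common front-left, front-right, center, LFE ordering but is not guaranteed for every source, and it assumes the samples are in the machine's native byte order. The function name is made up for this example.

```python
from array import array


def extract_channel(pcm: bytes, channels: int = 6, index: int = 2) -> bytes:
    """Return a single channel from interleaved signed 16-bit PCM samples.

    `channels` and `index` default to an assumed 5.1 layout with the
    center channel third; check the real layout of the source first.
    """
    if not 0 <= index < channels:
        raise ValueError("channel index out of range")
    samples = array("h")  # signed 16-bit integers, native byte order
    samples.frombytes(pcm)
    if len(samples) % channels != 0:
        raise ValueError("PCM data does not contain whole frames")
    # Every `channels`-th sample, starting at `index`, belongs to one channel.
    return samples[index::channels].tobytes()
```

The resulting mono stream would still need to be resampled to whatever rate the speech recognizer expects before it could be transcribed.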

Apparently there is already something like this for Linux but it is not a Kodi addon, and I don't know if it uses Whisper.cpp or something else:

Open source video captioning on Linux
#4
That particular implementation appears to be optimized for Apple.  The actual OpenAI source is here: https://github.com/openai/whisper. I suppose it is not surprising that GitHub (MS) is pushing it.

I see that the Whisper source is published under the MIT license, so I guess it is compatible with Kodi's GPLv2-or-later license.

scott s.
.