
aschmelyun

Hey everyone! I built [Subvert](https://github.com/aschmelyun/subvert) over the weekend and just released the first version of it. I wanted something to automate the process of adding and translating subtitles and summaries for a video course I'm working on. I didn't feel like paying for an existing option and wanted to try out the Whisper API, so I figured why not scratch my own itch? You can run the app with a single command via a self-contained Docker image. It's powered by OpenAI's Whisper and GPT-3.5 APIs, PHP (Laravel), JavaScript (Vue), SQLite, and FFmpeg. Would love any feedback, and hope you enjoy it! [github.com/aschmelyun/subvert](https://github.com/aschmelyun/subvert)


[deleted]

Is the OpenAI API access free?


saintshing

I haven't tried the OpenAI API as it is not available where I live (Hong Kong). I recently read an article (the author works at Hugging Face) comparing the performance and cost of their text embedding service against free open-source models. I was shocked that free models can achieve pretty much the same or better results at much lower cost: https://medium.com/@nils_reimers/openai-gpt-3-text-embeddings-really-a-new-state-of-the-art-in-dense-text-embeddings-6571fe3ec9d9

From the conclusion:

> The text similarity models are weaker than e.g. Universal Sentence Encoder from 2018 and much weaker than text embedding models from 2021. They are even weaker than the all-MiniLM-L6-v1 model, which is so small & efficient that it can run in your browser.
>
> The text-search models perform much stronger, achieving good results. But they are just on-par with open models like SPLADEv2 or multi-qa-mpnet-base-dot-v1.
>
> The biggest downside for the OpenAI embeddings endpoint is the high costs (about 8,000–600,000 times more expensive than open models on your infrastructure), the high dimensionality of up to 12288 dimensions (making downstream applications slow), and the extreme latency when computing embeddings. This hinders the actual usage of the embeddings for any search applications.

Disclaimer: I am just learning ML, I haven't personally verified their results, and I am not sure if the licenses of those open-source models may limit their commercial use.


Chreutz

You pay per token ($0.002 per 1,000 tokens). A token is on average 0.75 words (some words are multiple tokens).


madiele

That is for the chat API. Whisper costs 6 cents per 10 minutes.
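Putting the two quoted prices together, a back-of-the-envelope estimate could look like this (the word counts, audio length, token ratio, and prices are the figures from this thread, not official documentation):

```shell
# Rough cost estimate using the figures quoted above (assumed, not official):
#   chat:    $0.002 per 1,000 tokens, ~0.75 words per token
#   Whisper: $0.006 per minute of audio
words=5000    # e.g. a transcript of roughly 40 minutes of speech
minutes=30    # audio length sent to Whisper
awk -v w="$words" -v m="$minutes" 'BEGIN {
  tokens = w / 0.75
  printf "chat: $%.3f  whisper: $%.3f\n", tokens / 1000 * 0.002, m * 0.006
}'
# prints: chat: $0.013  whisper: $0.180
```

So for a typical video, the Whisper transcription dominates the cost, and both together stay well under a dollar.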


SnooMarzipans1345

Is the website down? I cannot connect to it. [https://subvert.dev/](https://subvert.dev/) "ERR_CONNECTION_TIMED_OUT"


hushrom

Hey there, I'm going to start creating my own PHP Laravel web application. Should I use its built-in authentication solution or create one from scratch? Also, did you use static analysis like PHPStan for your app?


leonguyen52

I cannot make it work with Cloudflare Zero Trust tunnels; it only works over HTTP with a port, not over SSL 🥹 Any ideas on how to solve it?


[deleted]

[deleted]


aschmelyun

Goal is to get it working with some of the llama/alpaca offline proof of concepts, fingers crossed!


[deleted]

[deleted]


cdemi

Whisper yes, but GPT 3.5 no


SnooMarzipans1345

following this thread.


sirrush7

I'll try this out shortly. It could be quite handy for my wife's work, where she waits for an ancient, terrible, low-powered laptop to generate chapters in videos, and she has to manually transcribe everything herself... which can be hard with specialized terminology, accents, dialects, etc. This seems like it could be a dream! Since it uses FFmpeg, can it utilize a GPU to speed things up, or do multiple concurrently?


aschmelyun

I will say, using OpenAI's Whisper API to do the transcriptions has been insane. My videos are programming tutorials and contain a lot of tech jargon; usually auto-generated subtitles like those on YouTube are pretty bad at picking that stuff up, but I've had no problem with this grabbing those specialized terms.

As for the GPU, I'm not 100% sure since FFmpeg is being utilized through a PHP library. To be fair though, the only thing it's doing is extracting the audio, so the gains made by running through the GPU might be limited...


sirrush7

Oh I see, so it doesn't really need to chew through the entire video file the way I was thinking... Very neat. Well, I think if you can get a version that uses a self-hosted AI library of some type, as well as the online version, this will be fantastic. Some of the video files I have a use case for are anywhere from 100 MB to 3 GB, though!


Chreutz

If you collapse the audio track to mono and use AAC with a low, variable bitrate, speech should still be plenty understandable (transcribable?), and you can cram quite a bit of time into the 25 MiB limit of OpenAI Whisper.
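As a sketch of that idea (filenames and the exact bitrate are placeholders, not something the app requires), the FFmpeg flags below drop the video stream, downmix to mono, and encode low-bitrate AAC:

```shell
# -vn    drop the video stream
# -ac 1  downmix the audio to mono
# -b:a   low AAC bitrate; at ~24 kbit/s, roughly two hours of speech
#        fits under the 25 MiB Whisper API upload limit
ffmpeg -i input.mp4 -vn -ac 1 -c:a aac -b:a 24k audio.m4a
```

Speech stays intelligible at far lower bitrates than music, which is why such an aggressive setting is usually fine for transcription.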


sirrush7

Oh now I get it... Thanks! So it's stripping the audio first... I really need to try this out, seems great then!


Chreutz

The tool OP made actually does the audio stripping already. But the Whisper API is limited by audio file size, not length (although you pay according to the length), so optimizing for audio file size reduces the number of times you have to run the app.


SnooMarzipans1345

Does your wife want a side job using this tech as proof that it works? Two birds, one stone. ;) I have hundreds of videos that need to be transcribed and translated into about 2 to 5 other languages each. I surely don't want to *sigh*... go through the thousands of videos in the wiki database video library I have been working on and transcribe them myself. Sorry if I sound like I'm being a bad guy; I'm not. I am new to using Reddit, so please don't downvote me, guys.


sirrush7

Sounds like you have pages already typed that need to be transcribed into the videos? If I am understanding this correctly? Thousands of videos sounds exactly like what this tool could be great for!


SnooMarzipans1345

Sorry for the confusion, sir, Miss, Mrs. I was thinking of pages in Microsoft OneNote, which I have been using to create databases of content. Videos in particular are richer and denser in content at times, and I need help extracting that content from the videos with its context intact, then inserting that output as input into a chain of other I/O later. What I'm concerned with is the data-scientist kind of work of getting the data formatted correctly; I need my data formatted a few different ways. I am not professionally trained as a data scientist, but I have been working on world problems (UN, WHO, homesteading, and more) of various kinds. So I need a professional ghostwriter, an editor, a project planner, a project manager, and a transcriber. I have been the researcher all these years. I need someone to organize the mess of my research and its out-of-hand organizational structure.


BelugaBilliam

Very cool project! I'll be checking this out!


rungdung

How is Whisper doing with other languages?


aschmelyun

From the small tests I've run with Spanish and Portuguese audio, pretty well actually


SunStarved_Cassandra

Is there a full list of languages it's capable of working in somewhere?


trashcluster

On the openai/whisper GitHub repo. But basically all of them are supported


Ephoras

Tested it with German physics education videos... worked like a charm :) Just don't pick a language while running the tool. There seems to be a bug at the moment, but Whisper will figure it out :)


kiliankoe

This is fantastic! I just recently had the need for something like this and threw together a few scripts with a very similar workflow: creating audio tracks with FFmpeg, subtitling those through Whisper, and then translating them against the DeepL API. But of course it was nowhere even close to as polished as this. Awesome work! If I could wish for a feature, it would be DeepL translation integration. Then it would check all of my boxes \o/


aschmelyun

I'm sure it wouldn't be too difficult to add in support and a conditional to use that API if you want DeepL instead of OpenAI. Feel free to open up an issue in the repo and I'll work on it when I can!


daYMAN007

Any chance of this having a config flag to use local whisper installation? (preferably whisper.cpp or faster whisper)


[deleted]

[deleted]


s-maerken

Have a look at [this repo](https://github.com/m1guelpf/auto-subtitle) , it generates subtitles with whisper locally


aschmelyun

That's definitely a big goal. Add in an issue with the request to the repo and I'll work on it as soon as I can! I'll add it to my to-do list just in case so it doesn't get lost.


BooleanTriplets

I just got this up and running in CasaOS in a few seconds (so good work there on user friendliness) and I am excited to try this out on a few old home videos. In the spirit of your project, I used ChatGPT to quickly generate a favicon for Subvert so that I could have an icon on my CasaOS dashboard.

First I had one made in black and white: ``

But I decided I wanted something a bit more colorful, so I asked for it to match the gradient from the UI and give that to the background: ``

That was a bit overstated for my tastes, so I decided to swap the color palettes between the text and the background (I could have done this myself, since we have the "gradient" id, but in the spirit of the project I had GPT do this revision as well): ``

Finally, I asked it why it had chosen Arial as its font family, and it had a very well thought out and helpful response to that:

> I chose Arial as the first example because it is a widely recognized and commonly used sans-serif font. Arial is available on most devices and operating systems, making it a safe choice for ensuring compatibility across different platforms. It's known for its legibility and neutrality, which is why it's often used as a default or fallback font in many designs.
>
> There are many font families to choose from, and the best choice depends on your personal preference and the style you are going for. Here are a few popular font families that you might consider: Roboto (sans-serif), Open Sans (sans-serif), Lato (sans-serif), Montserrat (sans-serif), Raleway (sans-serif), Merriweather (serif), Playfair Display (serif), Oswald (sans-serif), Nunito (sans-serif), Poppins (sans-serif). To use one of these fonts, you can either reference a local copy of the font file or link to an online resource like Google Fonts. For example, if you want to use the Roboto font, you can add a link to Google Fonts in your HTML file:
>
> ``
>
> Then, you can update the font-family attribute in the SVG code:
>
> ``
>
> Feel free to try different font families from the list above or explore other options to find the one that best suits your needs.

I think I will use the last version for my icon, for now. Anyone else that wants to use the SVG is obviously welcome to copy the text and save it as a .svg file.


aschmelyun

Good to hear you got it up and running pretty fast, that's awesome! Also, nice work on the favicon, I'll get that added in when I get a chance.


fishbarrel_2016

Thanks. I ran it using this:

`docker run -it -p 8001:8001 -e OPENAI_API_KEY=sk-q…..O aschmelyun/subvert`

It shows this:

`INFO: Server running on [http://0.0.0.0:80]`

But when I open a browser it says "The connection was reset". I've tried localhost:80, localhost:8001, 0.0.0.0:80, 0.0.0.0:8001, and other ports in the command.


nudelholz1

You need to run `docker run -it -p 8001:80 -e OPENAI_API_KEY=sk-q…..O aschmelyun/subvert`. Then you can access it on localhost:8001. You also need an OpenAI API key.


aschmelyun

Yep, this is correct. In the line, the `-p 8001:80` means that you're binding *your* port 8001, to the *container's* port 80. The only port that is available in that container is 80, so your second number always needs to be that. Hope that helps!


fishbarrel_2016

Many thanks, working now; I have an API key.


Kaziopu123

Can it generate subtitles from a movie?


aschmelyun

Default max upload size is 128M and the timeout for the processing is 60 minutes, but if you bypass those, there's no reason it shouldn't! Just be aware of the cost associated with the API calls lol
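If you'd rather stay under those defaults than raise them, one workaround (paths and the 50-minute chunk size are hypothetical, not a feature of the app) is to pre-split the movie with FFmpeg's segment muxer and feed the chunks through one at a time:

```shell
# -c copy            stream-copy, no re-encoding, so splitting is fast
# -segment_time 3000 cut into ~50-minute chunks (seconds)
# -reset_timestamps  each chunk starts at 0, so subtitle timestamps
#                    line up within each chunk
ffmpeg -i movie.mp4 -c copy -map 0 -f segment -segment_time 3000 \
  -reset_timestamps 1 part_%03d.mp4
```

With stream copy, cuts land on keyframes, so chunk lengths are approximate; you'd then stitch the resulting .vtt files back together with an offset per chunk.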


helium_uplands

How good is the quality of the text?


aschmelyun

For the transcriptions? Top-notch, miles better than what YouTube usually generates on my videos. The summaries have also been pretty great. The chapters are still hit and miss, and I've tweaked the prompt a few times to try and get things solid. Occasionally it'll focus on just one section of the video instead of the whole thing, or have some wonky timestamps.


Invisible_Walrus

This is awesome!! Can it utilize Nvidia GPUs for the subtitle processing?


Khyta

OP said it uses OpenAI's Whisper API


a-fried-pOtaTO

Seems kinda weird that this project is on r/selfhosted when it requires the OpenAI Whisper API. But I did read the dev would like to implement local processing if one chooses to do so.


Invisible_Walrus

Sure, but Whisper's torch package can use CUDA-based acceleration; I'm just not sure how to implement that myself


This_not-my_name

Cool project! Could be useful for optimizing the media library. If you need further ideas for features: I'd like an option to generate only the forced subtitles (passages in movies in foreign languages), since there are already many sources for downloading full subtitles for almost everything, but forced ones are very rare.


Chandlarr

Awesome. How much help did you get from ChatGPT? :D


aschmelyun

None! Copilot on the other hand... (;


TechieWasteLan

Is this inspired by ThioJoe's video to sync subtitles?


DelScipio

Does it translate one subtitle file into another language?


qknemess

Very cool, will try it out.


spiltlevel

This is pretty cool, awesome work! I actually started doing something similar just last week. Can I ask how you managed to generate chapter markers? That's the one thing I just don't fully understand.


WitnessDifferent2159

This looks awesome. Well done!


FIDST

I am stoked to try this out. I am not seeing a docker-compose file; is that possible?
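For anyone wanting the same thing, a minimal docker-compose.yml sketch could look like the following. This is not shipped with the repo; it just mirrors the `docker run` command from the README and assumes `OPENAI_API_KEY` is exported in your shell environment:

```yaml
services:
  subvert:
    image: aschmelyun/subvert
    ports:
      - "8001:80"   # host port 8001 -> container port 80 (the only exposed port)
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
```

Then `docker compose up -d` and browse to localhost:8001, same as with the plain `docker run` invocation.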


pauseframes

This could be absolutely clutch for technical documentation and how-to videos. Breakout sessions and the like. If you're a technical writer, tech blogger, or teacher, this is a must. Just recording a WebEx session or anything would help fix a lot of "where's the documentation" issues for many, many places!!


sasukefan01234

RemindMe! 6 months


illwon

This is pretty cool! Would this be able to pull in videos from youtube? Or other online sources? Perhaps a yt-dlp plugin of sorts?


professorhummingbird

Wow very nice


ECrispy

Thank you for this! How expensive is this, i.e. what's the average time taken? I'd imagine it depends on things like codec; the network time should be relatively constant as long as it's broadband, right? Also, this might be a good place to ask: is there a service that returns chapters for a movie rip based on the original chapters on its DVD etc.?


siphoneee

Does this work with movie files too such as .mp4 and .mkv?


[deleted]

[deleted]


aschmelyun

Yep, handled by GPT-3.5. Should auto-detect the language in your video, and there's a drop-down for selecting a translated language for whatever output(s) you choose!


fishbarrel_2016

Thanks. I deleted my comment (can it translate from German?) because when I looked at Whisper it said English.


aschmelyun

Whisper has the highest accuracy with English, but it should detect and transcribe a bunch of different languages (including German). I'll run a test on it in the morning and let you know!


[deleted]

[deleted]


RemindMeBot

I will be messaging you in 6 months on **2023-09-29 14:42:52 UTC** to remind you of [**this link**](https://www.reddit.com/r/selfhosted/comments/125c5oo/built_this_app_to_generate_subtitles_summaries/je58x32/?context=3)


Moehrenstein

I tried to bind it to a subdomain with Nginx Proxy Manager and a cert from there. Sadly it only shows "Subvert GitHub Thrown together in a weekend by Andrew Schmelyun". Some work colleagues are predestined to try this tool out, but they are not in my area :) (And I am not a pro, just a very motivated beginner ^^)


squirrelhoodie

Does it deal well with longer silences or music? Last time I tried OpenAI Whisper for creating video subtitles, often they started way earlier than the actual speech if there was no speech before that, so I always had to do lots of manual adjustments.


FatalVengeance

Hey there! Great app, I'm going to deploy it in the next day or so. Is it possible to have this added to Unraid as a native install app (via the built-in Docker)? Thanks :)


AWES_AF

Dope, man. Is VTT the only exportable format? Any seamless integrations available, or recent updates that'll allow the user to burn the subs onto the video as well?


tamenqt

Tried out your software and it's pretty cool, but I have a problem. Whisper AI wasn't happy with my file size, so I shrunk the MP3 down to under 25MB using FFmpeg. Then I got this weird server error message:

> The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if you keep seeing this error.

Is the file maybe just too long? Has anyone else run into this? I also saw [this](https://github.com/aschmelyun/subvert/issues/24) on GitHub. Are there any fixes for this issue?


aschmelyun

That's weird, I haven't run into that response back from OpenAI yet. Did re-running the same request have the error come up every time? I'm slowly working on a more optimized version of this app which I hope to release in a couple weeks. If you'd like to help me out, would you mind if I messaged you when this new version's out to see if it fixed your issue?


Raccount_1337

Looking forward to the new version. Hopes:

- .srt output possibility
- more supported formats like MKV, MP3, etc.
- complete directory processing instead of only one file


tamenqt

Sorry for getting back to you late; I somehow missed your reply. Yes, definitely! I'm really looking forward to your update since I'm working on generating subtitles for my lectures.


bquarks

Thank you so much for your solution. I am learning German and one of the ways I am building my vocabulary is by using your application to extract the .vtt files from German children's cartoons, which I then combine with languagereactor. Now my daughter and I have a lot of fun watching cartoons together.