
aschmelyun

Hey everyone! I built [Subvert](https://github.com/aschmelyun/subvert) over the weekend and just released the first version of it. I wanted something to automate the process of adding and translating subtitles and summaries for a video course I'm working on. I didn't feel like paying for an existing option and wanted to try out the Whisper API, so I figured why not scratch my own itch? You can run the app with a single command via a self-contained Docker image. It's powered by OpenAI's Whisper and GPT-3.5 APIs, PHP (Laravel), JavaScript (Vue), SQLite, and FFmpeg. Would love any feedback, and hope you enjoy it! [github.com/aschmelyun/subvert](https://github.com/aschmelyun/subvert)


[deleted]

Is the OpenAI API access free?


saintshing

I haven't tried the OpenAI API as it is not available where I live (Hong Kong). I recently read an article (the author works at Hugging Face) comparing the performance and cost of their text embedding service against free open-source models. I was shocked that free models can achieve pretty much the same or better results at much lower cost: https://medium.com/@nils_reimers/openai-gpt-3-text-embeddings-really-a-new-state-of-the-art-in-dense-text-embeddings-6571fe3ec9d9

From the conclusion:

> The text similarity models are weaker than e.g. Universal Sentence Encoder from 2018 and much weaker than text embedding models from 2021. They are even weaker than the all-MiniLM-L6-v1 model, which is so small & efficient that it can run in your browser.
>
> The text-search models perform much stronger, achieving good results. But they are just on-par with open models like SPLADEv2 or multi-qa-mpnet-base-dot-v1.
>
> The biggest downside for the OpenAI embeddings endpoint is the high costs (about 8,000–600,000 times more expensive than open models on your infrastructure), the high dimensionality of up to 12288 dimensions (making downstream applications slow), and the extreme latency when computing embeddings. This hinders the actual usage of the embeddings for any search applications.

Disclaimer: I am just learning ML, I haven't personally verified their results, and I am not sure if the licenses of those open-source models may limit their commercial use.


Chreutz

You pay per token ($0.002 per 1,000 tokens). A token is on average 0.75 words (some words are multiple tokens).


madiele

That is for the chat API. Whisper costs 6 cents per 10 minutes.
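Putting the two quoted prices together, a back-of-the-envelope estimate could look like this (the word counts, audio length, token ratio, and prices are the figures from this thread, not official documentation):

```shell
# Rough cost estimate using the figures quoted above (assumed, not official):
#   chat:    $0.002 per 1,000 tokens, ~0.75 words per token
#   Whisper: $0.006 per minute of audio
words=5000    # e.g. a transcript of roughly 40 minutes of speech
minutes=30    # audio length sent to Whisper
awk -v w="$words" -v m="$minutes" 'BEGIN {
  tokens = w / 0.75
  printf "chat: $%.3f  whisper: $%.3f\n", tokens / 1000 * 0.002, m * 0.006
}'
# prints: chat: $0.013  whisper: $0.180
```

So for a typical video, the Whisper transcription dominates the cost, and both together stay well under a dollar.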


SnooMarzipans1345

Is the website down? I cannot connect to it. [https://subvert.dev/](https://subvert.dev/) "ERR_CONNECTION_TIMED_OUT"


hushrom

Hey there, I'm going to start creating my own PHP Laravel web application. Should I use its built-in authentication solution or create one from scratch? Also, did you use static analysis like PHPStan for your app?


leonguyen52

I cannot make it work with Cloudflare Zero Trust tunnels; it only works over HTTP with a port, not over SSL 🥹 Any ideas on how to solve it?


[deleted]

[deleted]


aschmelyun

Goal is to get it working with some of the llama/alpaca offline proof of concepts, fingers crossed!


[deleted]

[deleted]


cdemi

Whisper yes, but GPT 3.5 no


SnooMarzipans1345

following this thread.


sirrush7

I'll try this out shortly. It could be quite handy for my wife's work, where she waits for an ancient, terrible, low-powered laptop to generate chapters in videos, and she has to manually transcribe everything herself... which can be hard with specialized terminology, accents, dialects, etc. This seems like it could be a dream! Since it uses FFmpeg, can it utilize a GPU to speed things up, or do multiple concurrently?


aschmelyun

I will say, using OpenAI's Whisper API to do the transcriptions has been insane. My videos are programming tutorials and contain a lot of tech jargon; usually auto-generated subtitles like those on YouTube are pretty bad at picking that stuff up, but I've had no problem with this grabbing those specialized terms.

As for the GPU, I'm not 100% sure since FFmpeg is being utilized through a PHP library. To be fair though, the only thing it's doing is extracting the audio, so the gains made by running through the GPU might be limited...


sirrush7

Oh I see, so it doesn't really need to chew through the entire video file the way I was thinking... Very neat. Well, I think if you can get a version that uses a self-hosted AI library of some type, as well as the online version, this will be fantastic. Some of the video files I have a use case for are anywhere from 100 MB to 3 GB, though!


Chreutz

If you collapse the audio track to mono and use AAC with a low, variable bitrate, speech should still be plenty understandable (transcribable?), and you can cram quite a bit of time into the 25 MiB limit of OpenAI Whisper.
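As a sketch of that idea (filenames and the exact bitrate are placeholders, not something the app requires), the FFmpeg flags below drop the video stream, downmix to mono, and encode low-bitrate AAC:

```shell
# -vn    drop the video stream
# -ac 1  downmix the audio to mono
# -b:a   low AAC bitrate; at ~24 kbit/s, roughly two hours of speech
#        fits under the 25 MiB Whisper API upload limit
ffmpeg -i input.mp4 -vn -ac 1 -c:a aac -b:a 24k audio.m4a
```

Speech stays intelligible at far lower bitrates than music, which is why such an aggressive setting is usually fine for transcription.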


sirrush7

Oh now I get it... Thanks! So it's stripping the audio first... I really need to try this out, seems great then!


Chreutz

The tool OP made actually does the audio stripping already. But the Whisper API is limited by audio file size, not length (although you pay according to the length), so optimizing for audio file size reduces the number of times you have to run the app.


SnooMarzipans1345

Does your wife want a side job using this tech as proof that it works? Two birds, one stone. ;) I have hundreds of videos that need to be transcribed and translated into about 2 to 5 other languages each. I surely don't want to *sigh*... go through the thousands of videos in the wiki database video library I have been working on and transcribe them myself. Sorry if I sound like I'm being a bad guy; I'm not. I am new to using Reddit, so please don't downvote me, guys.


sirrush7

Sounds like you have pages already typed that need to be transcribed into the videos? If I am understanding this correctly? Thousands of videos sounds exactly like what this tool could be great for!


SnooMarzipans1345

Sorry for the confusion, sir, Miss, Mrs. I was thinking of pages in Microsoft OneNote, which I have been using to create databases of content. Videos in particular are richer and denser in content at times, and I need help extracting that content from the videos with its context intact, then inserting that output as input into a chain of other I/O later. What I'm concerned with is the data-scientist kind of work of getting the data formatted correctly; I need my data formatted a few different ways. I am not professionally trained as a data scientist, but I have been working on world problems (UN, WHO, homesteading, and more) of various kinds. So I need a professional ghostwriter, an editor, a project planner, a project manager, and a transcriber. I have been the researcher all these years. I need someone to organize the mess of my research and its out-of-hand organizational structure.


BelugaBilliam

Very cool project! I'll be checking this out!


rungdung

How is Whisper doing with other languages?


aschmelyun

From the small tests I've run with Spanish and Portuguese audio, pretty well actually


SunStarved_Cassandra

Is there a full list of languages it's capable of working in somewhere?


trashcluster

On the openai/whisper GitHub repo. But basically all of them are supported


Ephoras

Tested it with German physics education videos... worked like a charm :) Just don't pick a language while running the tool. There seems to be a bug at the moment, but Whisper will figure it out :)


kiliankoe

This is fantastic! I just recently had the need for something like this and threw together a few scripts with a very similar workflow: creating audio tracks with FFmpeg, subtitling those through Whisper, and then translating them against the DeepL API. But of course it was nowhere even close to as polished as this. Awesome work! If I could wish for a feature, it would be DeepL translation integration. Then it would check all of my boxes \o/


aschmelyun

I'm sure it wouldn't be too difficult to add in support and a conditional to use that API if you want DeepL instead of OpenAI. Feel free to open up an issue in the repo and I'll work on it when I can!


daYMAN007

Any chance of this having a config flag to use local whisper installation? (preferably whisper.cpp or faster whisper)


[deleted]

[deleted]


s-maerken

Have a look at [this repo](https://github.com/m1guelpf/auto-subtitle) , it generates subtitles with whisper locally


aschmelyun

That's definitely a big goal. Add in an issue with the request to the repo and I'll work on it as soon as I can! I'll add it to my to-do list just in case so it doesn't get lost.


BooleanTriplets

I just got this up and running in CasaOS in a few seconds (so good work there on user friendliness) and I am excited to try this out on a few old home videos. In the spirit of your project, I used ChatGPT to quickly generate a favicon for Subvert so that I could have an icon on my CasaOS dashboard.

First I had one made in black and white: ``

But I decided I wanted something a bit more colorful, so I asked for it to match the gradient from the UI and give that to the background: ``

That was a bit overstated for my tastes, so I decided to swap the color palettes between the text and the background (I could have done this myself, since we have the "gradient" id, but in the spirit of the project I had GPT do this revision as well): ``

Finally, I asked it why it had chosen Arial as its font family, and it had a very well thought out and helpful response to that:

> I chose Arial as the first example because it is a widely recognized and commonly used sans-serif font. Arial is available on most devices and operating systems, making it a safe choice for ensuring compatibility across different platforms. It's known for its legibility and neutrality, which is why it's often used as a default or fallback font in many designs.
>
> There are many font families to choose from, and the best choice depends on your personal preference and the style you are going for. Here are a few popular font families that you might consider: Roboto (sans-serif), Open Sans (sans-serif), Lato (sans-serif), Montserrat (sans-serif), Raleway (sans-serif), Merriweather (serif), Playfair Display (serif), Oswald (sans-serif), Nunito (sans-serif), Poppins (sans-serif). To use one of these fonts, you can either reference a local copy of the font file or link to an online resource like Google Fonts. For example, if you want to use the Roboto font, you can add a link to Google Fonts in your HTML file:
>
> ``
>
> Then, you can update the font-family attribute in the SVG code:
>
> ``
>
> Feel free to try different font families from the list above or explore other options to find the one that best suits your needs.

I think I will use the last version for my icon, for now. Anyone else that wants to use the SVG is obviously welcome to copy the text and save it as a .svg file.


aschmelyun

Good to hear you got it up and running pretty fast, that's awesome! Also, nice work on the favicon, I'll get that added in when I get a chance.


fishbarrel_2016

Thanks. I ran it using this:

`docker run -it -p 8001:8001 -e OPENAI_API_KEY=sk-q…..O aschmelyun/subvert`

It shows this:

`INFO: Server running on [http://0.0.0.0:80]`

But when I open a browser it says "The connection was reset". I've tried localhost:80, localhost:8001, 0.0.0.0:80, 0.0.0.0:8001, and other ports in the command.


nudelholz1

You need to run `docker run -it -p 8001:80 -e OPENAI_API_KEY=sk-q…..O aschmelyun/subvert`. Then you can access it on localhost:8001. You also need an OpenAI API key.


aschmelyun

Yep, this is correct. In the line, the `-p 8001:80` means that you're binding *your* port 8001, to the *container's* port 80. The only port that is available in that container is 80, so your second number always needs to be that. Hope that helps!


fishbarrel_2016

Many thanks, working now; I have an API key.


Kaziopu123

Can it generate subtitles from a movie?


aschmelyun

Default max upload size is 128M and the timeout for the processing is 60 minutes, but if you bypass those, there's no reason it shouldn't! Just be aware of the cost associated with the API calls lol
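If you'd rather stay under those defaults than raise them, one workaround (paths and the 50-minute chunk size are hypothetical, not a feature of the app) is to pre-split the movie with FFmpeg's segment muxer and feed the chunks through one at a time:

```shell
# -c copy            stream-copy, no re-encoding, so splitting is fast
# -segment_time 3000 cut into ~50-minute chunks (seconds)
# -reset_timestamps  each chunk starts at 0, so subtitle timestamps
#                    line up within each chunk
ffmpeg -i movie.mp4 -c copy -map 0 -f segment -segment_time 3000 \
  -reset_timestamps 1 part_%03d.mp4
```

With stream copy, cuts land on keyframes, so chunk lengths are approximate; you'd then stitch the resulting .vtt files back together with an offset per chunk.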


helium_uplands

How good is the quality of the text?


aschmelyun

For the transcriptions? Top-notch, miles better than what YouTube usually generates on my videos. The summaries have also been pretty great. The chapters are still hit and miss, and I've tweaked the prompt a few times to try and get things solid. Occasionally it'll focus on just one section of the video instead of the whole thing, or have some wonky timestamps.


Invisible_Walrus

This is awesome!! Can it utilize Nvidia GPUs for the subtitle processing?


Khyta

OP said it uses OpenAI's Whisper API


a-fried-pOtaTO

Seems kinda weird that this project is on r/selfhosted when it requires the OpenAI Whisper API. But I did read the dev would like to implement local processing if one chooses to do so.


Invisible_Walrus

Sure, but Whisper's torch package can use CUDA-based acceleration; I'm just not sure how to implement that myself


This_not-my_name

Cool project! Could be useful for optimizing the media library. If you need further ideas for features: I'd like an option to generate only the forced subtitles (passages in movies in foreign languages), since there are already many sources for downloading full subtitles for almost everything, but forced ones are very rare.


Chandlarr

Awesome. How much help did you get from ChatGPT? :D


aschmelyun

None! Copilot on the other hand... (;


TechieWasteLan

Is this inspired by ThioJoe's video to sync subtitles?


DelScipio

Does it translate one subtitle file into another language?


qknemess

Very cool, will try it out.


spiltlevel

This is pretty cool, awesome work! I actually started doing something similar just last week. Can I ask how you managed to generate chapter markers? That's the one thing I just don't fully understand.


WitnessDifferent2159

This looks awesome. Well done!


FIDST

I am stoked to try this out. I am not seeing a docker-compose file; is that possible?
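For anyone wanting the same thing, a minimal docker-compose.yml sketch could look like the following. This is not shipped with the repo; it just mirrors the `docker run` command from the README and assumes `OPENAI_API_KEY` is exported in your shell environment:

```yaml
services:
  subvert:
    image: aschmelyun/subvert
    ports:
      - "8001:80"   # host port 8001 -> container port 80 (the only exposed port)
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
```

Then `docker compose up -d` and browse to localhost:8001, same as with the plain `docker run` invocation.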


pauseframes

This could be absolutely clutch for technical documentation and how-to videos. Breakout sessions and the like. If you're a technical writer, tech blogger, or teacher, this is a must. Just recording a WebEx session or anything would help fix a lot of "where's the documentation" issues for many, many places!!


sasukefan01234

RemindMe! 6 months


illwon

This is pretty cool! Would this be able to pull in videos from youtube? Or other online sources? Perhaps a yt-dlp plugin of sorts?


professorhummingbird

Wow very nice


ECrispy

Thank you for this! How expensive is this, i.e. what's the average time taken? I'd imagine it depends on things like codec; the network time should be relatively constant as long as it's broadband, right? Also, this might be a good place to ask: is there a service that returns chapters for a movie rip based on the original chapters on its DVD etc.?


siphoneee

Does this work with movie files too such as .mp4 and .mkv?


[deleted]

[deleted]


aschmelyun

Yep, handled by GPT-3.5. Should auto-detect the language in your video, and there's a drop-down for selecting a translated language for whatever output(s) you choose!


fishbarrel_2016

Thanks. I deleted my comment (can it translate from German?) because when I looked at Whisper it said English.


aschmelyun

Whisper has the highest accuracy with English, but it should detect and transcribe a bunch of different languages (including German). I'll run a test on it in the morning and let you know!


[deleted]

[deleted]


RemindMeBot

I will be messaging you in 6 months on **2023-09-29 14:42:52 UTC** to remind you of [**this link**](https://www.reddit.com/r/selfhosted/comments/125c5oo/built_this_app_to_generate_subtitles_summaries/je58x32/?context=3)


Moehrenstein

I tried to bind it to a subdomain with Nginx Proxy Manager and a cert from there. Sadly it only shows "Subvert GitHub Thrown together in a weekend by Andrew Schmelyun". Some work colleagues are predestined to try this tool out, but they are not in my area :) (And I am not a pro, just a very motivated beginner ^^)


squirrelhoodie

Does it deal well with longer silences or music? Last time I tried OpenAI Whisper for creating video subtitles, often they started way earlier than the actual speech if there was no speech before that, so I always had to do lots of manual adjustments.


FatalVengeance

Hey there! Great app, I'm going to deploy it in the next day or so. Is it possible to have this added to Unraid as a native install app (via the built-in Docker)? Thanks :)


AWES_AF

Dope, man. Is VTT the only exportable format? Any seamless integrations available, or recent updates that'll allow the user to burn the subs onto the video as well?


tamenqt

Tried out your software and it's pretty cool, but I have a problem. Whisper AI wasn't happy with my file size, so I shrunk the MP3 down to under 25MB using FFmpeg. Then I got this weird server error message:

> The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if you keep seeing this error.

Is the file maybe just too long? Has anyone else run into this? I also saw [this](https://github.com/aschmelyun/subvert/issues/24) on GitHub. Are there any fixes for this issue?


aschmelyun

That's weird, I haven't run into that response back from OpenAI yet. Did re-running the same request have the error come up every time? I'm slowly working on a more optimized version of this app which I hope to release in a couple weeks. If you'd like to help me out, would you mind if I messaged you when this new version's out to see if it fixed your issue?


Raccount_1337

Looking forward to the new version. Hopes:

- .srt output possibility
- more supported formats like MKV, MP3, etc.
- complete directory processing instead of only one file


tamenqt

Sorry for getting back to you late; I somehow missed your reply. Yes, definitely! I'm really looking forward to your update since I'm working on generating subtitles for my lectures.


bquarks

Thank you so much for your solution. I am learning German and one of the ways I am building my vocabulary is by using your application to extract the .vtt files from German children's cartoons, which I then combine with languagereactor. Now my daughter and I have a lot of fun watching cartoons together.