Comments (9)

jianfch commented on August 20, 2024

You can try to lower the --refine_ts_num (default: 100). Or just disable refinement with --refine_ts_num 0.

from stable-ts.

jianfch commented on August 20, 2024

If you still see a spike even with --suppress_silence false, then the spike is likely from whisper.log_mel_spectrogram, which is part of Whisper's default audio loading. Passing a 19-hour array into whisper.log_mel_spectrogram causes a 23 GB spike on my end. I suggest splitting that audio track into shorter tracks.

import whisper
mel = whisper.log_mel_spectrogram('audio.mp3')
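
For a rough sense of why a 19-hour track produces a spike of this size, here is a back-of-the-envelope estimate. It is only a sketch: it assumes Whisper's published audio constants (16 kHz sample rate, 400-point FFT, hop length 160) and counts just three large buffers (the float32 waveform, the complex64 STFT, and the float32 power spectrum). The function name is hypothetical, and the real peak depends on PyTorch internals and any extra copies, so treat the result as a lower-bound guess rather than a measurement.

```python
# Rough estimate of the large buffers created while computing a log-mel
# spectrogram for a whole track at once. Constants follow Whisper's audio
# settings; the set of buffers counted here is an assumption.
SAMPLE_RATE = 16_000   # samples per second
N_FFT = 400            # FFT window size
HOP_LENGTH = 160       # samples between STFT frames


def estimate_mel_peak_bytes(duration_s: int) -> dict:
    """Return approximate byte sizes of the main intermediate buffers."""
    n_samples = int(duration_s * SAMPLE_RATE)
    n_frames = n_samples // HOP_LENGTH + 1      # centered STFT frame count
    n_freqs = N_FFT // 2 + 1                    # one-sided spectrum bins
    sizes = {
        "audio_float32": n_samples * 4,
        "stft_complex64": n_frames * n_freqs * 8,
        "power_float32": n_frames * n_freqs * 4,
    }
    sizes["total"] = sum(sizes.values())
    return sizes


for hours in (6, 19):
    total = estimate_mel_peak_bytes(hours * 3600)["total"]
    print(f"{hours} h: about {total / 1e9:.1f} GB")
```

Under these assumptions a 19-hour track needs roughly 21 GB before inference even starts, which is in the same range as the 23 GB spike reported above, while a 6-hour track stays under 7 GB.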

kanjieater commented on August 20, 2024

You can try to lower the --refine_ts_num (default: 100). Or just disable refinement with --refine_ts_num 0.

Thanks - I'll give it a try. Could you explain more about how that parameter affects the model so I can tune it accurately? If I disable it with 0, what will be the impact?

jianfch commented on August 20, 2024

So it seems refine_ts_num doesn't have a significant effect on memory usage. But there does appear to be a surge in memory usage when loading the model with the default whisper function, and this surge raises the baseline memory usage. It should be fixed in 0b42339. I also added --sync_empty, which can reduce memory usage during inference.

kanjieater commented on August 20, 2024

Thank you for the quick response. I tried your suggestion with the latest version. Unfortunately, there was no change: the memory still filled up quickly.

8737 Killed stable-ts "$FOLDER/audio.mp3" --language Japanese --output_dir "$FOLDER/" --model large-v2 -o "$FOLDER/captions.ass" --sync_empty
[image: memory usage graph]

The memory starts lower for a time, then crashes around that peak. It's not an immediate crash, but it happens within 3 minutes.

jianfch commented on August 20, 2024

My apologies, I misread the issue. I was assuming we were talking about GPU memory. The previous solution only works for GPU memory.
It is expected that stable-ts has higher CPU memory usage than official whisper and other implementations because it stores significantly more data (in RAM) for stabilizing the timestamps. The spike and crash you're seeing might be due to stable-ts trying to generate a timestamp mask for the entire audio track at once. So this spike is likely before inference (--verbose should confirm this: if no text is output to the console before it crashes, inference never started). If this is the case, --suppress_silence False should drastically lower the RAM usage.

kanjieater commented on August 20, 2024

I didn't see any output when running with the --verbose flag.
19625 Killed stable-ts "$FOLDER/audio.mp3" --language Japanese --output_dir "$FOLDER/" --model large-v2 -o "$FOLDER/captions.ass" --sync_empty --verbose

I will try removing the --sync_empty flag (I accidentally left it in) and running again to see if verbose shows anything. I'll try running with --suppress_silence False as well.

Update:
Verbose didn't output anything unfortunately
20378 Killed stable-ts "$FOLDER/audio.mp3" --language Japanese --output_dir "$FOLDER/" --model large-v2 -o "$FOLDER/captions.ass" --verbose

I also ran it with --suppress_silence false, and got the same result:
22053 Killed stable-ts "$FOLDER/audio.mp3" --language Japanese --output_dir "$FOLDER/" --model large-v2 -o "$FOLDER/captions.ass" --suppress_silence false --overwrite

Memory usage and CPU usage spike at the same time when the Out of Memory error occurs.

Just to be clear, my specs are:
i9-13900ks
4070TI
32GB DDR5 ram

All of this is stable and working well. It runs inside WSL2 on Win11, which has access to the CPU, GPU and RAM (resources work fine for whisper and whisperx). I've allocated additional memory as well:
[image]

Would you like me to send you the 1GB file somewhere so you could see if you can reproduce as well? I can run it successfully for smaller files.

kanjieater commented on August 20, 2024

I just started a run on a 6 hour wav file that is 700mb. The progress bar started very quickly. The progress bar never showed for my 19hr 1GB file and always crashed.

Update: The 6 hour wav completed w/o issue.
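
Since the 6-hour file ran fine, one workaround is to cut the 19-hour track into parts of about that length before transcribing. The sketch below is only an illustration: it assumes ffmpeg is installed and uses its segment muxer with stream copy, and the function names and output pattern are hypothetical. Each part's captions will start at 0:00, so timestamps would need to be offset by the part's start time when merging the results.

```python
import subprocess


def build_split_command(src: str, out_pattern: str, segment_seconds: int) -> list:
    """Build an ffmpeg command that splits src into fixed-length parts."""
    return [
        "ffmpeg",
        "-i", src,
        "-f", "segment",                        # ffmpeg's segment muxer
        "-segment_time", str(segment_seconds),  # target length of each part
        "-c", "copy",                           # copy the stream, no re-encode
        out_pattern,                            # e.g. part_000.mp3, part_001.mp3
    ]


def split_audio(src: str, out_pattern: str, segment_seconds: int) -> None:
    """Run the split; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_split_command(src, out_pattern, segment_seconds), check=True)


# Show the command for 6-hour parts without running it.
print(" ".join(build_split_command("audio.mp3", "part_%03d.mp3", 6 * 3600)))
```

Calling split_audio("audio.mp3", "part_%03d.mp3", 6 * 3600) would write the parts next to the source, and each part can then be passed to stable-ts separately.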

kanjieater commented on August 20, 2024

You are correct. The input file is too large when Whisper starts, so I either need more RAM or for Whisper to fix it upstream. Thank you for your help with this.
