tsurumeso / vocal-remover

1.5K 40.0 215.0 160 KB

Vocal Remover using Deep Neural Networks

License: MIT License

Python 100.00%
deep-learning spectrogram audio pytorch segmentation vocal-separation

vocal-remover's Introduction

Hi there 👋

vocal-remover's People

Contributors

averwhy, dannyball710, dependanz, moehr1z, mohamadjaber1, mrzomka, rogermiranda1000, tsurumeso

vocal-remover's Issues

Unrealistic amounts of RAM required for training

Hello,
I'm trying to train on a dataset with thousands of .wav files that are about 4 minutes long, but I run out of RAM with anything larger than 75 files with 16GB of RAM:

Traceback (most recent call last):
  File "train.py", line 98, in <module>
    hop_length=args.hop_length)
  File "C:\vocal-remover\lib\dataset.py", line 33, in create_dataset
    (len_dataset, 2, hop_length, cropsize), dtype=np.float32)
MemoryError

So theoretically I would still run out of RAM even with 128GB if I tried to train on more than 600 files (128 / 16 = 8, 8 * 75 = 600). I'm trying to achieve results similar to http://krisp.ai, but that needs thousands of files, not just a few hundred.
Is it possible to increase the number of files or make the training scalable? (i.e. split the data into 75-file chunks and automatically resume training on the next chunk)
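
As a rough sketch of why this allocation fails and one way around it: the traceback shows a single float32 array of shape (len_dataset, 2, hop_length, cropsize) being allocated up front. The helper below estimates its size and, as an assumption rather than the project's own approach, backs an array of the same layout with a file via NumPy's `open_memmap` so that it does not have to fit in RAM. The example sizes and the file name are hypothetical.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap


def dataset_nbytes(len_dataset, hop_length, cropsize, dtype=np.float32):
    """Bytes needed for an array of shape (len_dataset, 2, hop_length, cropsize)."""
    return len_dataset * 2 * hop_length * cropsize * np.dtype(dtype).itemsize


def create_disk_backed_array(path, shape, dtype=np.float32):
    """Create a .npy file of the given shape and return a writable memmap onto it."""
    return open_memmap(path, mode="w+", dtype=dtype, shape=shape)


# Hypothetical sizes, chosen only to illustrate the arithmetic.
example_bytes = dataset_nbytes(len_dataset=1000, hop_length=1024, cropsize=512)
print(f"{example_bytes / 2**30:.2f} GiB")

# Small demonstration so the sketch runs quickly.
demo_path = os.path.join(tempfile.mkdtemp(), "X_train.npy")
X = create_disk_backed_array(demo_path, (4, 2, 8, 16))
X[0] = 1.0  # written through to the file instead of one large in-RAM block
X.flush()
```

With a layout like this, memory use during training depends on the batches actually read rather than on the total number of files, at the cost of disk space and some I/O.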

Discord!

I have an audio splitting Discord server. It's based around Spleeter, but it's really for any kind of audio splitting tool!
Here's the link: https://discord.gg/P7FhQFH

feature

Tsurumeso, can you add a feature to this program that separates a woman's voice from a man's voice?

Missing models/baseline.pth

Import of 'jit' requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
from numba.decorators import jit as optional_jit
loading model... Traceback (most recent call last):
  File "inference.py", line 104, in <module>
    main()
  File "inference.py", line 31, in main
    model.load_state_dict(torch.load(args.model, map_location=device))
  File "/media/luissiqueira/storage/workspace/roove/vocal-remover/venv/lib/python3.6/site-packages/torch/serialization.py", line 584, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/media/luissiqueira/storage/workspace/roove/vocal-remover/venv/lib/python3.6/site-packages/torch/serialization.py", line 234, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/media/luissiqueira/storage/workspace/roove/vocal-remover/venv/lib/python3.6/site-packages/torch/serialization.py", line 215, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'models/baseline.pth'

Question Regarding Training Process

Hello again!

I have a question about the training process. I'm currently training a model with 170 pairs and I'm currently on my 29th epoch after about 16 hours -

# epoch 29

  • inner epoch 0
    • training loss = 1.303589, validation loss = 1.348374
    • best validation loss
  • inner epoch 1
    • training loss = 1.296407, validation loss = 1.359492
  • inner epoch 2
    • training loss = 1.291624, validation loss = 1.358964
  • inner epoch 3
    • training loss = 1.288329, validation loss = 1.359871

My understanding is the training loss & validation loss should be close to each other, with the validation loss always needing to be slightly higher than the training loss.

Here are my questions -

  1. What is the ideal training loss & validation loss I should be looking out for?
  2. I noticed that new "model_iter.pth" files appear after most epochs finish training. Do I need to do anything with the last "model_iter.pth" file once training reaches the final epoch (100 epochs in my case)?
  3. I noticed all "model_iter.pth" files are the same size, 37,830 KB, but your newest model is 53,130 KB. Does a larger model size mean better training, or will the size increase with a larger dataset?
  4. How do I increase the size of the model?
  5. Sometimes I don't get a "model_iter.pth" model when an epoch finishes, is there a way I can set it so that I do after every epoch?

Thank you so much in advance!
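
On questions 3 and 4, a back-of-the-envelope check may help. A float32 checkpoint stores roughly 4 bytes per parameter, so its file size generally tracks the number of weights in the architecture rather than the amount of training data. The sketch below assumes 1 KB = 1024 bytes and ignores checkpoint metadata and non-float buffers; it converts the two file sizes quoted above into approximate parameter counts.

```python
BYTES_PER_FLOAT32 = 4


def approx_param_count(file_size_kb, bytes_per_param=BYTES_PER_FLOAT32):
    """Rough number of parameters in a checkpoint of the given size (KB = 1024 bytes)."""
    return file_size_kb * 1024 // bytes_per_param


own_model = approx_param_count(37_830)
newest_model = approx_param_count(53_130)
print(f"own model:    ~{own_model / 1e6:.1f}M parameters")
print(f"newest model: ~{newest_model / 1e6:.1f}M parameters")
```

If that assumption holds here, the larger file would point to a larger network definition, which would also explain why every "model_iter.pth" from one training run has the same size.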

question about pre-trained data

Could you tell me about the data used for the pre-trained model?
How long was it trained, and on what dataset?
I would also like to know which options were used for training.
Thank you.

remove vocal from song

I tried some songs, and the singer's voice was not removed from them.
Please make the vocal remover remove all vocals from all songs.

Jetson nano implementation

I tried to run inference on the Jetson Nano platform. For a 28-second piece of music, it took 35 seconds to separate the song into vocals and music. The network seems too big for real-time processing on a lightweight platform. Has there been any effort to reduce the network size?

Wanted to Let Everyone Know a GUI Was Created for This Vocal Remover!

This isn't an issue, but I wanted to let you and the community here know. This GUI is 100% based on your vocal remover, options included. This was a joint project between another coder and myself. Feel free to use, edit, and implement as you wish! Great job on this AI!

I also included 2 additional models that I trained myself. One trained on 700 pairs and another trained on trained data. Both of them came out GREAT!

You can access it via the following:

  • Main page here
  • Release page with models here

Issues Training on RTX 2080 Ti 11GB

I've upgraded my PC recently and I now have the Nvidia GeForce RTX 2080 Ti with 11 GB of VRAM. I'm able to run conversions just fine with all the correct CUDA drivers installed, but I'm having a lot of trouble training. Once the data loads and it starts on the first epoch, it just stops after a few seconds. No errors, nothing. It just closes. I tried debugging the script in Visual Studio Code to look for errors, but even then there was nothing. I verified my configuration and confirmed that my other PC with a GTX 1060 is able to run the exact same script with no issues.

I verified there was nothing wrong with the GPU as it's able to run high end games with absolutely no issues. Have you experienced this? Is there something I need to look out for?

System specs -

CPU: i9 9900K
RAM: 48 GB
Motherboard: Z390 ASAP Phantom Gaming 6

Make vocal removal more aggressive

Comparing this solution with Spleeter, I noticed that in most cases Spleeter leaves fewer vocals but greatly degrades the quality of the accompaniment. I like the sound quality of this vocal remover better, but unfortunately it doesn't remove all the vocals.

I would like to try to find a compromise between cleanliness and quality of accompaniment. Is it possible to adjust?

RuntimeError: CUDA error: invalid device ordinal

I just installed everything and tried the script. I have two GPUs: an internal one with 1GB and an NVIDIA one with 2GB. When I use --gpu 0 it tells me it ran out of memory, but when I use --gpu 1 it says the following:

c:\vocal-remover>inference.py --input 1.wav --gpu 1
loading model... Traceback (most recent call last):
  File "C:\vocal-remover\inference.py", line 104, in <module>
    main()
  File "C:\vocal-remover\inference.py", line 34, in main
    model.to(device)
  File "C:\Users\samsung\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 443, in to
    return self._apply(convert)
  File "C:\Users\samsung\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 203, in _apply
    module._apply(fn)
  File "C:\Users\samsung\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 203, in _apply
    module._apply(fn)
  File "C:\Users\samsung\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 203, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "C:\Users\samsung\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 225, in _apply
    param_applied = fn(param)
  File "C:\Users\samsung\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 441, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
RuntimeError: CUDA error: invalid device ordinal

*PS. I am a noob with 0 knowledge and would love a simple run down on how to fix this. Thank you.
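
CUDA device indices only count the GPUs that the CUDA runtime can see, so an integrated non-NVIDIA GPU typically does not get an index, and `--gpu 1` is invalid when just one CUDA-capable device is present. Below is a minimal sketch of validating the index before use. `resolve_device` is a hypothetical helper, not part of `inference.py`, and it takes the device count as an argument (in PyTorch that number would come from `torch.cuda.device_count()`).

```python
def resolve_device(requested_gpu, cuda_device_count):
    """Return a device string, falling back to CPU for an invalid GPU index.

    requested_gpu: index passed on the command line (negative means CPU).
    cuda_device_count: number of CUDA devices visible to the runtime.
    """
    if requested_gpu < 0:
        return "cpu"
    if requested_gpu >= cuda_device_count:
        print(f"GPU {requested_gpu} not found; "
              f"{cuda_device_count} CUDA device(s) visible. Using CPU.")
        return "cpu"
    return f"cuda:{requested_gpu}"


# With one visible CUDA device, only index 0 is valid.
print(resolve_device(0, 1))
print(resolve_device(1, 1))
```

In this report the only valid index is probably 0, and the out-of-memory message with `--gpu 0` would then be a separate problem: a 2GB card may simply be too small for the model.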

ZeroDivisionError: division by zero

(env) C:\Users\...\vocal-remover>python train.py -i "dataset_test/instrumentals" -m "dataset_test/mixtures" -M
03_test_mix.wav 03_test_inst.wav
01_test_mix.wav 01_test_inst.wav
04_test_mix.wav 04_test_inst.wav
05_test_mix.wav 05_test_inst.wav
02_test_mix.wav 02_test_inst.wav
100%|██████████| 5/5 [01:52<00:00, 22.41s/it]
0it [00:00, ?it/s]
# epoch 0
  * inner epoch 0
Traceback (most recent call last):
  File "train.py", line 133, in <module>
    train_loss = sum_loss / len(X_train)
ZeroDivisionError: division by zero

I get this error when I try to train on a dataset, can you please help me? (I have also tried with -g 0)
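
One possible cause, offered as a guess since the split logic is not shown here: with only five pairs, a fractional validation split can round one side down to zero files, and the `0it` progress bar above suggests that one of the two sets was empty. The sketch below uses a hypothetical `split_pairs` helper and an assumed 10% validation rate to show the rounding, and fails early with a clear message instead of a later division by zero.

```python
import random


def split_pairs(pairs, val_rate=0.1, seed=0):
    """Shuffle file pairs and split them into (train, validation) lists.

    Raises ValueError when either side would be empty, instead of letting
    a later division by the set length fail.
    """
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    val_size = int(len(pairs) * val_rate)
    train, valid = pairs[val_size:], pairs[:val_size]
    if not train or not valid:
        raise ValueError(
            f"{len(pairs)} pairs with val_rate={val_rate} gives "
            f"{len(train)} training and {len(valid)} validation pairs; "
            "add more data or raise val_rate."
        )
    return train, valid


pairs = [(f"{i:02d}_test_mix.wav", f"{i:02d}_test_inst.wav") for i in range(1, 6)]
print(int(len(pairs) * 0.1))  # number of validation pairs at a 10% rate
train, valid = split_pairs(pairs, val_rate=0.2)
print(len(train), len(valid))
```

If this is what happened, training on a larger dataset (or with a larger validation fraction) should avoid the empty set.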

Using fp16 instead of fp32

Is it possible to train with half precision, and if so, how? It turns out that 11 GB of VRAM is not enough for versions 3.0+.

Data augment

How do I perform data augmentation? Also, does it improve a model?
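
As a generic illustration, not this repository's own augmentation code: for paired mixture and instrumental magnitude spectrograms, a common approach is to apply the same random transform to both arrays so they stay aligned. The sketch below assumes arrays of shape (2, bins, frames) and shows a random stereo channel swap plus a random gain; the function name and the gain range are assumptions. Augmentation of this kind is generally used to help a model generalize when the dataset is small.

```python
import numpy as np


def augment_pair(mix, inst, rng, gain_range=(0.7, 1.0)):
    """Apply the same random channel swap and gain to a spectrogram pair.

    mix, inst: arrays of shape (2, bins, frames) holding magnitudes.
    rng: a numpy.random.Generator, passed in for reproducibility.
    """
    if rng.random() < 0.5:
        # Swap left and right channels in both arrays.
        mix, inst = mix[::-1], inst[::-1]
    gain = rng.uniform(*gain_range)
    return mix * gain, inst * gain


rng = np.random.default_rng(0)
mix = np.abs(rng.standard_normal((2, 8, 16))).astype(np.float32)
inst = mix * 0.5  # stand-in for the instrumental part
aug_mix, aug_inst = augment_pair(mix, inst, rng)
print(aug_mix.shape, aug_inst.shape)
```

Because both arrays receive the identical transform, the relationship between mixture and target that the network learns is preserved.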

How can I fix this? I can't install from the requirements.txt file.

ERROR: Could not find a version that satisfies the requirement torch>=1.3.0 (from -r requirements.txt (line 3)) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)
ERROR: No matching distribution found for torch>=1.3.0 (from -r requirements.txt (line 3))
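
This pip error often means that no PyTorch wheel matches the running interpreter, for example a 32-bit Python or a Python version for which PyTorch publishes no wheels. That is a common cause, not a confirmed diagnosis for this report. A small standard-library check of what pip is matching against:

```python
import platform
import struct
import sys


def interpreter_summary():
    """Collect the interpreter details that determine which wheels pip can install."""
    return {
        "python_version": platform.python_version(),
        "pointer_bits": struct.calcsize("P") * 8,  # 32 or 64
        "machine": platform.machine(),
        "platform": sys.platform,
    }


for key, value in interpreter_summary().items():
    print(f"{key}: {value}")
```

If `pointer_bits` is 32, installing a 64-bit Python and recreating the environment is usually the first thing to try.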

RAM issue

Current error:

RuntimeError: CUDA out of memory. Tried to allocate 384.00 MiB (GPU 0; 6.00 GiB total capacity; 4.29 GiB already allocated; 299.14 MiB free; 52.89 MiB cached)

I have the following PC specs -

Operating System: Windows 10
Processor: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz 2.80 GHz
Graphics card: GeForce GTX 1060 with 6 GB of VRAM
System RAM: 32GB

My issue is I can't even start training with my GPU due to always getting "out of RAM errors" even when I have enough allocated. I'm able to train just fine with my CPU, but I would really like to use my GPU.

My questions:

  1. Can I train a model using this A.I. with my current set-up?
  2. If not, what is the minimum GPU & V-RAM requirement to train with a dataset of at least 75-150 pairs?

Thank you so much in advance!

error

Traceback (most recent call last):
  File "inference.py", line 11, in <module>
    from lib import dataset
  File "C:\Users\DESKTOP\Desktop\vocal-remover\lib\dataset.py", line 10, in <module>
    class VocalRemoverValidationSet(torch.utils.data.Dataset):
AttributeError: module 'torch.utils' has no attribute 'data'

Add Layers to the Model for Longer Training Times

Hello!

Would it be possible for you to update the AI to add layers to the model? It seems to hit a limit after about 4 days of training on a 650-pair dataset: the training and validation losses stagnate at that point. I want to be able to train over a longer period to achieve better training and validation losses before a model reaches its limit. Would this be possible?

Thank you in advance for your help!

Can't use pre trained model

I tried another model, from Anjok07: https://github.com/Anjok07/ultimatevocalremovergui
When I try to use the model, I get this:

loading model... Traceback (most recent call last):
  File "inference.py", line 119, in <module>
    main()
  File "inference.py", line 65, in main
    model.load_state_dict(torch.load(args.pretrained_model, map_location=device))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1045, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for CascadedASPPNet:
Missing key(s) in state_dict: "stg1_low_band_net.enc1.conv1.conv.0.weight", "stg1_low_band_net.enc1.conv1.conv.1.weight", "stg1_low_band_net.enc1.conv1.conv.1.bias", "stg1_low_band_net.enc1.conv1.conv.1.running_mean", "stg1_low_band_net.enc1.conv1.conv.1.running_var", "stg1_low_band_net.enc1.conv2.conv.0.weight", "stg1_low_band_net.enc1.conv2.conv.1.weight", "stg1_low_band_net.enc1.conv2.conv.1.bias", "stg1_low_band_net.enc1.conv2.conv.1.running_mean", "stg1_low_band_net.enc1.conv2.conv.1.running_var", "stg1_low_band_net.enc2.conv1.conv.0.weight", "stg1_low_band_net.enc2.conv1.conv.1.weight", "stg1_low_band_net.enc2.conv1.conv.1.bias", "stg1_low_band_net.enc2.conv1.conv.1.running_mean", "stg1_low_band_net.enc2.conv1.conv.1.running_var", "stg1_low_band_net.enc2.conv2.conv.0.weight", "stg1_low_band_net.enc2.conv2.conv.1.weight", "stg1_low_band_net.enc2.conv2.conv.1.bias", "stg1_low_band_net.enc2.conv2.conv.1.running_mean", "stg1_low_band_net.enc2.conv2.conv.1.running_var", "stg1_low_band_net.enc3.conv1.conv.0.weight", "stg1_low_band_net.enc3.conv1.conv.1.weight", "stg1_low_band_net.enc3.conv1.conv.1.bias", "stg1_low_band_net.enc3.conv1.conv.1.running_mean", "stg1_low_band_net.enc3.conv1.conv.1.running_var", "stg1_low_band_net.enc3.conv2.conv.0.weight", "stg1_low_band_net.enc3.conv2.conv.1.weight", "stg1_low_band_net.enc3.conv2.conv.1.bias", "stg1_low_band_net.enc3.conv2.conv.1.running_mean", "stg1_low_band_net.enc3.conv2.conv.1.running_var", "stg1_low_band_net.enc4.conv1.conv.0.weight", "stg1_low_band_net.enc4.conv1.conv.1.weight", "stg1_low_band_net.enc4.conv1.conv.1.bias", "stg1_low_band_net.enc4.conv1.conv.1.running_mean", "stg1_low_band_net.enc4.conv1.conv.1.running_var", "stg1_low_band_net.enc4.conv2.conv.0.weight", "stg1_low_band_net.enc4.conv2.conv.1.weight", "stg1_low_band_net.enc4.conv2.conv.1.bias", "stg1_low_band_net.enc4.conv2.conv.1.running_mean", "stg1_low_band_net.enc4.conv2.conv.1.running_var", "stg1_low_band_net.aspp.conv1.1.conv.0.weight", 
"stg1_low_band_net.aspp.conv1.1.conv.1.weight", "stg1_low_band_net.aspp.conv1.1.conv.1.bias", "stg1_low_band_net.aspp.conv1.1.conv.1.running_mean", "stg1_low_band_net.aspp.conv1.1.conv.1.running_var", "stg1_low_band_net.aspp.conv2.conv.0.weight", "stg1_low_band_net.aspp.conv2.conv.1.weight", "stg1_low_band_net.aspp.conv2.conv.1.bias", "stg1_low_band_net.aspp.conv2.conv.1.running_mean", "stg1_low_band_net.aspp.conv2.conv.1.running_var", "stg1_low_band_net.aspp.conv3.conv.0.weight", "stg1_low_band_net.aspp.conv3.conv.1.weight", "stg1_low_band_net.aspp.conv3.conv.2.weight", "stg1_low_band_net.aspp.conv3.conv.2.bias", "stg1_low_band_net.aspp.conv3.conv.2.running_mean", "stg1_low_band_net.aspp.conv3.conv.2.running_var", "stg1_low_band_net.aspp.conv4.conv.0.weight", "stg1_low_band_net.aspp.conv4.conv.1.weight", "stg1_low_band_net.aspp.conv4.conv.2.weight", "stg1_low_band_net.aspp.conv4.conv.2.bias", "stg1_low_band_net.aspp.conv4.conv.2.running_mean", "stg1_low_band_net.aspp.conv4.conv.2.running_var", "stg1_low_band_net.aspp.conv5.conv.0.weight", "stg1_low_band_net.aspp.conv5.conv.1.weight", "stg1_low_band_net.aspp.conv5.conv.2.weight", "stg1_low_band_net.aspp.conv5.conv.2.bias", "stg1_low_band_net.aspp.conv5.conv.2.running_mean", "stg1_low_band_net.aspp.conv5.conv.2.running_var", "stg1_low_band_net.aspp.bottleneck.0.conv.0.weight", "stg1_low_band_net.aspp.bottleneck.0.conv.1.weight", "stg1_low_band_net.aspp.bottleneck.0.conv.1.bias", "stg1_low_band_net.aspp.bottleneck.0.conv.1.running_mean", "stg1_low_band_net.aspp.bottleneck.0.conv.1.running_var", "stg1_low_band_net.dec4.conv.conv.0.weight", "stg1_low_band_net.dec4.conv.conv.1.weight", "stg1_low_band_net.dec4.conv.conv.1.bias", "stg1_low_band_net.dec4.conv.conv.1.running_mean", "stg1_low_band_net.dec4.conv.conv.1.running_var", "stg1_low_band_net.dec3.conv.conv.0.weight", "stg1_low_band_net.dec3.conv.conv.1.weight", "stg1_low_band_net.dec3.conv.conv.1.bias", "stg1_low_band_net.dec3.conv.conv.1.running_mean", 
"stg1_low_band_net.dec3.conv.conv.1.running_var", "stg1_low_band_net.dec2.conv.conv.0.weight", "stg1_low_band_net.dec2.conv.conv.1.weight", "stg1_low_band_net.dec2.conv.conv.1.bias", "stg1_low_band_net.dec2.conv.conv.1.running_mean", "stg1_low_band_net.dec2.conv.conv.1.running_var", "stg1_low_band_net.dec1.conv.conv.0.weight", "stg1_low_band_net.dec1.conv.conv.1.weight", "stg1_low_band_net.dec1.conv.conv.1.bias", "stg1_low_band_net.dec1.conv.conv.1.running_mean", "stg1_low_band_net.dec1.conv.conv.1.running_var", "stg1_high_band_net.enc1.conv1.conv.0.weight", "stg1_high_band_net.enc1.conv1.conv.1.weight", "stg1_high_band_net.enc1.conv1.conv.1.bias", "stg1_high_band_net.enc1.conv1.conv.1.running_mean", "stg1_high_band_net.enc1.conv1.conv.1.running_var", "stg1_high_band_net.enc1.conv2.conv.0.weight", "stg1_high_band_net.enc1.conv2.conv.1.weight", "stg1_high_band_net.enc1.conv2.conv.1.bias", "stg1_high_band_net.enc1.conv2.conv.1.running_mean", "stg1_high_band_net.enc1.conv2.conv.1.running_var", "stg1_high_band_net.enc2.conv1.conv.0.weight", "stg1_high_band_net.enc2.conv1.conv.1.weight", "stg1_high_band_net.enc2.conv1.conv.1.bias", "stg1_high_band_net.enc2.conv1.conv.1.running_mean", "stg1_high_band_net.enc2.conv1.conv.1.running_var", "stg1_high_band_net.enc2.conv2.conv.0.weight", "stg1_high_band_net.enc2.conv2.conv.1.weight", "stg1_high_band_net.enc2.conv2.conv.1.bias", "stg1_high_band_net.enc2.conv2.conv.1.running_mean", "stg1_high_band_net.enc2.conv2.conv.1.running_var", "stg1_high_band_net.enc3.conv1.conv.0.weight", "stg1_high_band_net.enc3.conv1.conv.1.weight", "stg1_high_band_net.enc3.conv1.conv.1.bias", "stg1_high_band_net.enc3.conv1.conv.1.running_mean", "stg1_high_band_net.enc3.conv1.conv.1.running_var", "stg1_high_band_net.enc3.conv2.conv.0.weight", "stg1_high_band_net.enc3.conv2.conv.1.weight", "stg1_high_band_net.enc3.conv2.conv.1.bias", "stg1_high_band_net.enc3.conv2.conv.1.running_mean", "stg1_high_band_net.enc3.conv2.conv.1.running_var", 
"stg1_high_band_net.enc4.conv1.conv.0.weight", "stg1_high_band_net.enc4.conv1.conv.1.weight", "stg1_high_band_net.enc4.conv1.conv.1.bias", "stg1_high_band_net.enc4.conv1.conv.1.running_mean", "stg1_high_band_net.enc4.conv1.conv.1.running_var", "stg1_high_band_net.enc4.conv2.conv.0.weight", "stg1_high_band_net.enc4.conv2.conv.1.weight", "stg1_high_band_net.enc4.conv2.conv.1.bias", "stg1_high_band_net.enc4.conv2.conv.1.running_mean", "stg1_high_band_net.enc4.conv2.conv.1.running_var", "stg1_high_band_net.aspp.conv1.1.conv.0.weight", "stg1_high_band_net.aspp.conv1.1.conv.1.weight", "stg1_high_band_net.aspp.conv1.1.conv.1.bias", "stg1_high_band_net.aspp.conv1.1.conv.1.running_mean", "stg1_high_band_net.aspp.conv1.1.conv.1.running_var", "stg1_high_band_net.aspp.conv2.conv.0.weight", "stg1_high_band_net.aspp.conv2.conv.1.weight", "stg1_high_band_net.aspp.conv2.conv.1.bias", "stg1_high_band_net.aspp.conv2.conv.1.running_mean", "stg1_high_band_net.aspp.conv2.conv.1.running_var", "stg1_high_band_net.aspp.conv3.conv.0.weight", "stg1_high_band_net.aspp.conv3.conv.1.weight", "stg1_high_band_net.aspp.conv3.conv.2.weight", "stg1_high_band_net.aspp.conv3.conv.2.bias", "stg1_high_band_net.aspp.conv3.conv.2.running_mean", "stg1_high_band_net.aspp.conv3.conv.2.running_var", "stg1_high_band_net.aspp.conv4.conv.0.weight", "stg1_high_band_net.aspp.conv4.conv.1.weight", "stg1_high_band_net.aspp.conv4.conv.2.weight", "stg1_high_band_net.aspp.conv4.conv.2.bias", "stg1_high_band_net.aspp.conv4.conv.2.running_mean", "stg1_high_band_net.aspp.conv4.conv.2.running_var", "stg1_high_band_net.aspp.conv5.conv.0.weight", "stg1_high_band_net.aspp.conv5.conv.1.weight", "stg1_high_band_net.aspp.conv5.conv.2.weight", "stg1_high_band_net.aspp.conv5.conv.2.bias", "stg1_high_band_net.aspp.conv5.conv.2.running_mean", "stg1_high_band_net.aspp.conv5.conv.2.running_var", "stg1_high_band_net.aspp.bottleneck.0.conv.0.weight", "stg1_high_band_net.aspp.bottleneck.0.conv.1.weight", 
"stg1_high_band_net.aspp.bottleneck.0.conv.1.bias", "stg1_high_band_net.aspp.bottleneck.0.conv.1.running_mean", "stg1_high_band_net.aspp.bottleneck.0.conv.1.running_var", "stg1_high_band_net.dec4.conv.conv.0.weight", "stg1_high_band_net.dec4.conv.conv.1.weight", "stg1_high_band_net.dec4.conv.conv.1.bias", "stg1_high_band_net.dec4.conv.conv.1.running_mean", "stg1_high_band_net.dec4.conv.conv.1.running_var", "stg1_high_band_net.dec3.conv.conv.0.weight", "stg1_high_band_net.dec3.conv.conv.1.weight", "stg1_high_band_net.dec3.conv.conv.1.bias", "stg1_high_band_net.dec3.conv.conv.1.running_mean", "stg1_high_band_net.dec3.conv.conv.1.running_var", "stg1_high_band_net.dec2.conv.conv.0.weight", "stg1_high_band_net.dec2.conv.conv.1.weight", "stg1_high_band_net.dec2.conv.conv.1.bias", "stg1_high_band_net.dec2.conv.conv.1.running_mean", "stg1_high_band_net.dec2.conv.conv.1.running_var", "stg1_high_band_net.dec1.conv.conv.0.weight", "stg1_high_band_net.dec1.conv.conv.1.weight", "stg1_high_band_net.dec1.conv.conv.1.bias", "stg1_high_band_net.dec1.conv.conv.1.running_mean", "stg1_high_band_net.dec1.conv.conv.1.running_var", "stg1_full_band_net.enc1.conv1.conv.0.weight", "stg1_full_band_net.enc1.conv1.conv.1.weight", "stg1_full_band_net.enc1.conv1.conv.1.bias", "stg1_full_band_net.enc1.conv1.conv.1.running_mean", "stg1_full_band_net.enc1.conv1.conv.1.running_var", "stg1_full_band_net.enc1.conv2.conv.0.weight", "stg1_full_band_net.enc1.conv2.conv.1.weight", "stg1_full_band_net.enc1.conv2.conv.1.bias", "stg1_full_band_net.enc1.conv2.conv.1.running_mean", "stg1_full_band_net.enc1.conv2.conv.1.running_var", "stg1_full_band_net.enc2.conv1.conv.0.weight", "stg1_full_band_net.enc2.conv1.conv.1.weight", "stg1_full_band_net.enc2.conv1.conv.1.bias", "stg1_full_band_net.enc2.conv1.conv.1.running_mean", "stg1_full_band_net.enc2.conv1.conv.1.running_var", "stg1_full_band_net.enc2.conv2.conv.0.weight", "stg1_full_band_net.enc2.conv2.conv.1.weight", "stg1_full_band_net.enc2.conv2.conv.1.bias", 
"stg1_full_band_net.enc2.conv2.conv.1.running_mean", "stg1_full_band_net.enc2.conv2.conv.1.running_var", "stg1_full_band_net.enc3.conv1.conv.0.weight", "stg1_full_band_net.enc3.conv1.conv.1.weight", "stg1_full_band_net.enc3.conv1.conv.1.bias", "stg1_full_band_net.enc3.conv1.conv.1.running_mean", "stg1_full_band_net.enc3.conv1.conv.1.running_var", "stg1_full_band_net.enc3.conv2.conv.0.weight", "stg1_full_band_net.enc3.conv2.conv.1.weight", "stg1_full_band_net.enc3.conv2.conv.1.bias", "stg1_full_band_net.enc3.conv2.conv.1.running_mean", "stg1_full_band_net.enc3.conv2.conv.1.running_var", "stg1_full_band_net.enc4.conv1.conv.0.weight", "stg1_full_band_net.enc4.conv1.conv.1.weight", "stg1_full_band_net.enc4.conv1.conv.1.bias", "stg1_full_band_net.enc4.conv1.conv.1.running_mean", "stg1_full_band_net.enc4.conv1.conv.1.running_var", "stg1_full_band_net.enc4.conv2.conv.0.weight", "stg1_full_band_net.enc4.conv2.conv.1.weight", "stg1_full_band_net.enc4.conv2.conv.1.bias", "stg1_full_band_net.enc4.conv2.conv.1.running_mean", "stg1_full_band_net.enc4.conv2.conv.1.running_var", "stg1_full_band_net.aspp.conv1.1.conv.0.weight", "stg1_full_band_net.aspp.conv1.1.conv.1.weight", "stg1_full_band_net.aspp.conv1.1.conv.1.bias", "stg1_full_band_net.aspp.conv1.1.conv.1.running_mean", "stg1_full_band_net.aspp.conv1.1.conv.1.running_var", "stg1_full_band_net.aspp.conv2.conv.0.weight", "stg1_full_band_net.aspp.conv2.conv.1.weight", "stg1_full_band_net.aspp.conv2.conv.1.bias", "stg1_full_band_net.aspp.conv2.conv.1.running_mean", "stg1_full_band_net.aspp.conv2.conv.1.running_var", "stg1_full_band_net.aspp.conv3.conv.0.weight", "stg1_full_band_net.aspp.conv3.conv.1.weight", "stg1_full_band_net.aspp.conv3.conv.2.weight", "stg1_full_band_net.aspp.conv3.conv.2.bias", "stg1_full_band_net.aspp.conv3.conv.2.running_mean", "stg1_full_band_net.aspp.conv3.conv.2.running_var", "stg1_full_band_net.aspp.conv4.conv.0.weight", "stg1_full_band_net.aspp.conv4.conv.1.weight", 
"stg1_full_band_net.aspp.conv4.conv.2.weight", "stg1_full_band_net.aspp.conv4.conv.2.bias", "stg1_full_band_net.aspp.conv4.conv.2.running_mean", "stg1_full_band_net.aspp.conv4.conv.2.running_var", "stg1_full_band_net.aspp.conv5.conv.0.weight", "stg1_full_band_net.aspp.conv5.conv.1.weight", "stg1_full_band_net.aspp.conv5.conv.2.weight", "stg1_full_band_net.aspp.conv5.conv.2.bias", "stg1_full_band_net.aspp.conv5.conv.2.running_mean", "stg1_full_band_net.aspp.conv5.conv.2.running_var", "stg1_full_band_net.aspp.bottleneck.0.conv.0.weight", "stg1_full_band_net.aspp.bottleneck.0.conv.1.weight", "stg1_full_band_net.aspp.bottleneck.0.conv.1.bias", "stg1_full_band_net.aspp.bottleneck.0.conv.1.running_mean", "stg1_full_band_net.aspp.bottleneck.0.conv.1.running_var", "stg1_full_band_net.dec4.conv.conv.0.weight", "stg1_full_band_net.dec4.conv.conv.1.weight", "stg1_full_band_net.dec4.conv.conv.1.bias", "stg1_full_band_net.dec4.conv.conv.1.running_mean", "stg1_full_band_net.dec4.conv.conv.1.running_var", "stg1_full_band_net.dec3.conv.conv.0.weight", "stg1_full_band_net.dec3.conv.conv.1.weight", "stg1_full_band_net.dec3.conv.conv.1.bias", "stg1_full_band_net.dec3.conv.conv.1.running_mean", "stg1_full_band_net.dec3.conv.conv.1.running_var", "stg1_full_band_net.dec2.conv.conv.0.weight", "stg1_full_band_net.dec2.conv.conv.1.weight", "stg1_full_band_net.dec2.conv.conv.1.bias", "stg1_full_band_net.dec2.conv.conv.1.running_mean", "stg1_full_band_net.dec2.conv.conv.1.running_var", "stg1_full_band_net.dec1.conv.conv.0.weight", "stg1_full_band_net.dec1.conv.conv.1.weight", "stg1_full_band_net.dec1.conv.conv.1.bias", "stg1_full_band_net.dec1.conv.conv.1.running_mean", "stg1_full_band_net.dec1.conv.conv.1.running_var", "stg2_full_band_net.enc1.conv1.conv.0.weight", "stg2_full_band_net.enc1.conv1.conv.1.weight", "stg2_full_band_net.enc1.conv1.conv.1.bias", "stg2_full_band_net.enc1.conv1.conv.1.running_mean", "stg2_full_band_net.enc1.conv1.conv.1.running_var", 
"stg2_full_band_net.enc1.conv2.conv.0.weight", "stg2_full_band_net.enc1.conv2.conv.1.weight", "stg2_full_band_net.enc1.conv2.conv.1.bias", "stg2_full_band_net.enc1.conv2.conv.1.running_mean", "stg2_full_band_net.enc1.conv2.conv.1.running_var", "stg2_full_band_net.enc2.conv1.conv.0.weight", "stg2_full_band_net.enc2.conv1.conv.1.weight", "stg2_full_band_net.enc2.conv1.conv.1.bias", "stg2_full_band_net.enc2.conv1.conv.1.running_mean", "stg2_full_band_net.enc2.conv1.conv.1.running_var", "stg2_full_band_net.enc2.conv2.conv.0.weight", "stg2_full_band_net.enc2.conv2.conv.1.weight", "stg2_full_band_net.enc2.conv2.conv.1.bias", "stg2_full_band_net.enc2.conv2.conv.1.running_mean", "stg2_full_band_net.enc2.conv2.conv.1.running_var", "stg2_full_band_net.enc3.conv1.conv.0.weight", "stg2_full_band_net.enc3.conv1.conv.1.weight", "stg2_full_band_net.enc3.conv1.conv.1.bias", "stg2_full_band_net.enc3.conv1.conv.1.running_mean", "stg2_full_band_net.enc3.conv1.conv.1.running_var", "stg2_full_band_net.enc3.conv2.conv.0.weight", "stg2_full_band_net.enc3.conv2.conv.1.weight", "stg2_full_band_net.enc3.conv2.conv.1.bias", "stg2_full_band_net.enc3.conv2.conv.1.running_mean", "stg2_full_band_net.enc3.conv2.conv.1.running_var", "stg2_full_band_net.enc4.conv1.conv.0.weight", "stg2_full_band_net.enc4.conv1.conv.1.weight", "stg2_full_band_net.enc4.conv1.conv.1.bias", "stg2_full_band_net.enc4.conv1.conv.1.running_mean", "stg2_full_band_net.enc4.conv1.conv.1.running_var", "stg2_full_band_net.enc4.conv2.conv.0.weight", "stg2_full_band_net.enc4.conv2.conv.1.weight", "stg2_full_band_net.enc4.conv2.conv.1.bias", "stg2_full_band_net.enc4.conv2.conv.1.running_mean", "stg2_full_band_net.enc4.conv2.conv.1.running_var", "stg2_full_band_net.aspp.conv1.1.conv.0.weight", "stg2_full_band_net.aspp.conv1.1.conv.1.weight", "stg2_full_band_net.aspp.conv1.1.conv.1.bias", "stg2_full_band_net.aspp.conv1.1.conv.1.running_mean", "stg2_full_band_net.aspp.conv1.1.conv.1.running_var", 
"stg2_full_band_net.aspp.conv2.conv.0.weight", "stg2_full_band_net.aspp.conv2.conv.1.weight", "stg2_full_band_net.aspp.conv2.conv.1.bias", "stg2_full_band_net.aspp.conv2.conv.1.running_mean", "stg2_full_band_net.aspp.conv2.conv.1.running_var", "stg2_full_band_net.aspp.conv3.conv.0.weight", "stg2_full_band_net.aspp.conv3.conv.1.weight", "stg2_full_band_net.aspp.conv3.conv.2.weight", "stg2_full_band_net.aspp.conv3.conv.2.bias", "stg2_full_band_net.aspp.conv3.conv.2.running_mean", "stg2_full_band_net.aspp.conv3.conv.2.running_var", "stg2_full_band_net.aspp.conv4.conv.0.weight", "stg2_full_band_net.aspp.conv4.conv.1.weight", "stg2_full_band_net.aspp.conv4.conv.2.weight", "stg2_full_band_net.aspp.conv4.conv.2.bias", "stg2_full_band_net.aspp.conv4.conv.2.running_mean", "stg2_full_band_net.aspp.conv4.conv.2.running_var", "stg2_full_band_net.aspp.conv5.conv.0.weight", "stg2_full_band_net.aspp.conv5.conv.1.weight", "stg2_full_band_net.aspp.conv5.conv.2.weight", "stg2_full_band_net.aspp.conv5.conv.2.bias", "stg2_full_band_net.aspp.conv5.conv.2.running_mean", "stg2_full_band_net.aspp.conv5.conv.2.running_var", "stg2_full_band_net.aspp.bottleneck.0.conv.0.weight", "stg2_full_band_net.aspp.bottleneck.0.conv.1.weight", "stg2_full_band_net.aspp.bottleneck.0.conv.1.bias", "stg2_full_band_net.aspp.bottleneck.0.conv.1.running_mean", "stg2_full_band_net.aspp.bottleneck.0.conv.1.running_var", "stg2_full_band_net.dec4.conv.conv.0.weight", "stg2_full_band_net.dec4.conv.conv.1.weight", "stg2_full_band_net.dec4.conv.conv.1.bias", "stg2_full_band_net.dec4.conv.conv.1.running_mean", "stg2_full_band_net.dec4.conv.conv.1.running_var", "stg2_full_band_net.dec3.conv.conv.0.weight", "stg2_full_band_net.dec3.conv.conv.1.weight", "stg2_full_band_net.dec3.conv.conv.1.bias", "stg2_full_band_net.dec3.conv.conv.1.running_mean", "stg2_full_band_net.dec3.conv.conv.1.running_var", "stg2_full_band_net.dec2.conv.conv.0.weight", "stg2_full_band_net.dec2.conv.conv.1.weight", 
"stg2_full_band_net.dec2.conv.conv.1.bias", "stg2_full_band_net.dec2.conv.conv.1.running_mean", "stg2_full_band_net.dec2.conv.conv.1.running_var", "stg2_full_band_net.dec1.conv.conv.0.weight", "stg2_full_band_net.dec1.conv.conv.1.weight", "stg2_full_band_net.dec1.conv.conv.1.bias", "stg2_full_band_net.dec1.conv.conv.1.running_mean", "stg2_full_band_net.dec1.conv.conv.1.running_var", "out.weight".
Unexpected key(s) in state_dict: "low_band_net.enc1.conv1.conv.0.weight", "low_band_net.enc1.conv1.conv.1.weight", "low_band_net.enc1.conv1.conv.1.bias", "low_band_net.enc1.conv1.conv.1.running_mean", "low_band_net.enc1.conv1.conv.1.running_var", "low_band_net.enc1.conv1.conv.1.num_batches_tracked", "low_band_net.enc1.conv2.conv.0.weight", "low_band_net.enc1.conv2.conv.1.weight", "low_band_net.enc1.conv2.conv.1.bias", "low_band_net.enc1.conv2.conv.1.running_mean", "low_band_net.enc1.conv2.conv.1.running_var", "low_band_net.enc1.conv2.conv.1.num_batches_tracked", "low_band_net.enc2.conv1.conv.0.weight", "low_band_net.enc2.conv1.conv.1.weight", "low_band_net.enc2.conv1.conv.1.bias", "low_band_net.enc2.conv1.conv.1.running_mean", "low_band_net.enc2.conv1.conv.1.running_var", "low_band_net.enc2.conv1.conv.1.num_batches_tracked", "low_band_net.enc2.conv2.conv.0.weight", "low_band_net.enc2.conv2.conv.1.weight", "low_band_net.enc2.conv2.conv.1.bias", "low_band_net.enc2.conv2.conv.1.running_mean", "low_band_net.enc2.conv2.conv.1.running_var", "low_band_net.enc2.conv2.conv.1.num_batches_tracked", "low_band_net.enc3.conv1.conv.0.weight", "low_band_net.enc3.conv1.conv.1.weight", "low_band_net.enc3.conv1.conv.1.bias", "low_band_net.enc3.conv1.conv.1.running_mean", "low_band_net.enc3.conv1.conv.1.running_var", "low_band_net.enc3.conv1.conv.1.num_batches_tracked", "low_band_net.enc3.conv2.conv.0.weight", "low_band_net.enc3.conv2.conv.1.weight", "low_band_net.enc3.conv2.conv.1.bias", "low_band_net.enc3.conv2.conv.1.running_mean", "low_band_net.enc3.conv2.conv.1.running_var", "low_band_net.enc3.conv2.conv.1.num_batches_tracked", "low_band_net.enc4.conv1.conv.0.weight", "low_band_net.enc4.conv1.conv.1.weight", "low_band_net.enc4.conv1.conv.1.bias", "low_band_net.enc4.conv1.conv.1.running_mean", "low_band_net.enc4.conv1.conv.1.running_var", "low_band_net.enc4.conv1.conv.1.num_batches_tracked", "low_band_net.enc4.conv2.conv.0.weight", "low_band_net.enc4.conv2.conv.1.weight", 
"low_band_net.enc4.conv2.conv.1.bias", "low_band_net.enc4.conv2.conv.1.running_mean", "low_band_net.enc4.conv2.conv.1.running_var", "low_band_net.enc4.conv2.conv.1.num_batches_tracked", "low_band_net.aspp.conv1.1.conv.0.weight", "low_band_net.aspp.conv1.1.conv.1.weight", "low_band_net.aspp.conv1.1.conv.1.bias", "low_band_net.aspp.conv1.1.conv.1.running_mean", "low_band_net.aspp.conv1.1.conv.1.running_var", "low_band_net.aspp.conv1.1.conv.1.num_batches_tracked", "low_band_net.aspp.conv2.conv.0.weight", "low_band_net.aspp.conv2.conv.1.weight", "low_band_net.aspp.conv2.conv.1.bias", "low_band_net.aspp.conv2.conv.1.running_mean", "low_band_net.aspp.conv2.conv.1.running_var", "low_band_net.aspp.conv2.conv.1.num_batches_tracked", "low_band_net.aspp.conv3.conv.0.weight", "low_band_net.aspp.conv3.conv.1.weight", "low_band_net.aspp.conv3.conv.2.weight", "low_band_net.aspp.conv3.conv.2.bias", "low_band_net.aspp.conv3.conv.2.running_mean", "low_band_net.aspp.conv3.conv.2.running_var", "low_band_net.aspp.conv3.conv.2.num_batches_tracked", "low_band_net.aspp.conv4.conv.0.weight", "low_band_net.aspp.conv4.conv.1.weight", "low_band_net.aspp.conv4.conv.2.weight", "low_band_net.aspp.conv4.conv.2.bias", "low_band_net.aspp.conv4.conv.2.running_mean", "low_band_net.aspp.conv4.conv.2.running_var", "low_band_net.aspp.conv4.conv.2.num_batches_tracked", "low_band_net.aspp.conv5.conv.0.weight", "low_band_net.aspp.conv5.conv.1.weight", "low_band_net.aspp.conv5.conv.2.weight", "low_band_net.aspp.conv5.conv.2.bias", "low_band_net.aspp.conv5.conv.2.running_mean", "low_band_net.aspp.conv5.conv.2.running_var", "low_band_net.aspp.conv5.conv.2.num_batches_tracked", "low_band_net.aspp.bottleneck.0.conv.0.weight", "low_band_net.aspp.bottleneck.0.conv.1.weight", "low_band_net.aspp.bottleneck.0.conv.1.bias", "low_band_net.aspp.bottleneck.0.conv.1.running_mean", "low_band_net.aspp.bottleneck.0.conv.1.running_var", "low_band_net.aspp.bottleneck.0.conv.1.num_batches_tracked", 
"low_band_net.dec4.conv.conv.0.weight", "low_band_net.dec4.conv.conv.1.weight", "low_band_net.dec4.conv.conv.1.bias", "low_band_net.dec4.conv.conv.1.running_mean", "low_band_net.dec4.conv.conv.1.running_var", "low_band_net.dec4.conv.conv.1.num_batches_tracked", "low_band_net.dec3.conv.conv.0.weight", "low_band_net.dec3.conv.conv.1.weight", "low_band_net.dec3.conv.conv.1.bias", "low_band_net.dec3.conv.conv.1.running_mean", "low_band_net.dec3.conv.conv.1.running_var", "low_band_net.dec3.conv.conv.1.num_batches_tracked", "low_band_net.dec2.conv.conv.0.weight", "low_band_net.dec2.conv.conv.1.weight", "low_band_net.dec2.conv.conv.1.bias", "low_band_net.dec2.conv.conv.1.running_mean", "low_band_net.dec2.conv.conv.1.running_var", "low_band_net.dec2.conv.conv.1.num_batches_tracked", "low_band_net.dec1.conv.conv.0.weight", "low_band_net.dec1.conv.conv.1.weight", "low_band_net.dec1.conv.conv.1.bias", "low_band_net.dec1.conv.conv.1.running_mean", "low_band_net.dec1.conv.conv.1.running_var", "low_band_net.dec1.conv.conv.1.num_batches_tracked", "high_band_net.enc1.conv1.conv.0.weight", "high_band_net.enc1.conv1.conv.1.weight", "high_band_net.enc1.conv1.conv.1.bias", "high_band_net.enc1.conv1.conv.1.running_mean", "high_band_net.enc1.conv1.conv.1.running_var", "high_band_net.enc1.conv1.conv.1.num_batches_tracked", "high_band_net.enc1.conv2.conv.0.weight", "high_band_net.enc1.conv2.conv.1.weight", "high_band_net.enc1.conv2.conv.1.bias", "high_band_net.enc1.conv2.conv.1.running_mean", "high_band_net.enc1.conv2.conv.1.running_var", "high_band_net.enc1.conv2.conv.1.num_batches_tracked", "high_band_net.enc2.conv1.conv.0.weight", "high_band_net.enc2.conv1.conv.1.weight", "high_band_net.enc2.conv1.conv.1.bias", "high_band_net.enc2.conv1.conv.1.running_mean", "high_band_net.enc2.conv1.conv.1.running_var", "high_band_net.enc2.conv1.conv.1.num_batches_tracked", "high_band_net.enc2.conv2.conv.0.weight", "high_band_net.enc2.conv2.conv.1.weight", "high_band_net.enc2.conv2.conv.1.bias", 
"high_band_net.enc2.conv2.conv.1.running_mean", "high_band_net.enc2.conv2.conv.1.running_var", "high_band_net.enc2.conv2.conv.1.num_batches_tracked", "high_band_net.enc3.conv1.conv.0.weight", "high_band_net.enc3.conv1.conv.1.weight", "high_band_net.enc3.conv1.conv.1.bias", "high_band_net.enc3.conv1.conv.1.running_mean", "high_band_net.enc3.conv1.conv.1.running_var", "high_band_net.enc3.conv1.conv.1.num_batches_tracked", "high_band_net.enc3.conv2.conv.0.weight", "high_band_net.enc3.conv2.conv.1.weight", "high_band_net.enc3.conv2.conv.1.bias", "high_band_net.enc3.conv2.conv.1.running_mean", "high_band_net.enc3.conv2.conv.1.running_var", "high_band_net.enc3.conv2.conv.1.num_batches_tracked", "high_band_net.enc4.conv1.conv.0.weight", "high_band_net.enc4.conv1.conv.1.weight", "high_band_net.enc4.conv1.conv.1.bias", "high_band_net.enc4.conv1.conv.1.running_mean", "high_band_net.enc4.conv1.conv.1.running_var", "high_band_net.enc4.conv1.conv.1.num_batches_tracked", "high_band_net.enc4.conv2.conv.0.weight", "high_band_net.enc4.conv2.conv.1.weight", "high_band_net.enc4.conv2.conv.1.bias", "high_band_net.enc4.conv2.conv.1.running_mean", "high_band_net.enc4.conv2.conv.1.running_var", "high_band_net.enc4.conv2.conv.1.num_batches_tracked", "high_band_net.aspp.conv1.1.conv.0.weight", "high_band_net.aspp.conv1.1.conv.1.weight", "high_band_net.aspp.conv1.1.conv.1.bias", "high_band_net.aspp.conv1.1.conv.1.running_mean", "high_band_net.aspp.conv1.1.conv.1.running_var", "high_band_net.aspp.conv1.1.conv.1.num_batches_tracked", "high_band_net.aspp.conv2.conv.0.weight", "high_band_net.aspp.conv2.conv.1.weight", "high_band_net.aspp.conv2.conv.1.bias", "high_band_net.aspp.conv2.conv.1.running_mean", "high_band_net.aspp.conv2.conv.1.running_var", "high_band_net.aspp.conv2.conv.1.num_batches_tracked", "high_band_net.aspp.conv3.conv.0.weight", "high_band_net.aspp.conv3.conv.1.weight", "high_band_net.aspp.conv3.conv.2.weight", "high_band_net.aspp.conv3.conv.2.bias", 
"high_band_net.aspp.conv3.conv.2.running_mean", "high_band_net.aspp.conv3.conv.2.running_var", "high_band_net.aspp.conv3.conv.2.num_batches_tracked", "high_band_net.aspp.conv4.conv.0.weight", "high_band_net.aspp.conv4.conv.1.weight", "high_band_net.aspp.conv4.conv.2.weight", "high_band_net.aspp.conv4.conv.2.bias", "high_band_net.aspp.conv4.conv.2.running_mean", "high_band_net.aspp.conv4.conv.2.running_var", "high_band_net.aspp.conv4.conv.2.num_batches_tracked", "high_band_net.aspp.conv5.conv.0.weight", "high_band_net.aspp.conv5.conv.1.weight", "high_band_net.aspp.conv5.conv.2.weight", "high_band_net.aspp.conv5.conv.2.bias", "high_band_net.aspp.conv5.conv.2.running_mean", "high_band_net.aspp.conv5.conv.2.running_var", "high_band_net.aspp.conv5.conv.2.num_batches_tracked", "high_band_net.aspp.bottleneck.0.conv.0.weight", "high_band_net.aspp.bottleneck.0.conv.1.weight", "high_band_net.aspp.bottleneck.0.conv.1.bias", "high_band_net.aspp.bottleneck.0.conv.1.running_mean", "high_band_net.aspp.bottleneck.0.conv.1.running_var", "high_band_net.aspp.bottleneck.0.conv.1.num_batches_tracked", "high_band_net.dec4.conv.conv.0.weight", "high_band_net.dec4.conv.conv.1.weight", "high_band_net.dec4.conv.conv.1.bias", "high_band_net.dec4.conv.conv.1.running_mean", "high_band_net.dec4.conv.conv.1.running_var", "high_band_net.dec4.conv.conv.1.num_batches_tracked", "high_band_net.dec3.conv.conv.0.weight", "high_band_net.dec3.conv.conv.1.weight", "high_band_net.dec3.conv.conv.1.bias", "high_band_net.dec3.conv.conv.1.running_mean", "high_band_net.dec3.conv.conv.1.running_var", "high_band_net.dec3.conv.conv.1.num_batches_tracked", "high_band_net.dec2.conv.conv.0.weight", "high_band_net.dec2.conv.conv.1.weight", "high_band_net.dec2.conv.conv.1.bias", "high_band_net.dec2.conv.conv.1.running_mean", "high_band_net.dec2.conv.conv.1.running_var", "high_band_net.dec2.conv.conv.1.num_batches_tracked", "high_band_net.dec1.conv.conv.0.weight", "high_band_net.dec1.conv.conv.1.weight", 
"high_band_net.dec1.conv.conv.1.bias", "high_band_net.dec1.conv.conv.1.running_mean", "high_band_net.dec1.conv.conv.1.running_var", "high_band_net.dec1.conv.conv.1.num_batches_tracked", "full_band_net.enc1.conv1.conv.0.weight", "full_band_net.enc1.conv1.conv.1.weight", "full_band_net.enc1.conv1.conv.1.bias", "full_band_net.enc1.conv1.conv.1.running_mean", "full_band_net.enc1.conv1.conv.1.running_var", "full_band_net.enc1.conv1.conv.1.num_batches_tracked", "full_band_net.enc1.conv2.conv.0.weight", "full_band_net.enc1.conv2.conv.1.weight", "full_band_net.enc1.conv2.conv.1.bias", "full_band_net.enc1.conv2.conv.1.running_mean", "full_band_net.enc1.conv2.conv.1.running_var", "full_band_net.enc1.conv2.conv.1.num_batches_tracked", "full_band_net.enc2.conv1.conv.0.weight", "full_band_net.enc2.conv1.conv.1.weight", "full_band_net.enc2.conv1.conv.1.bias", "full_band_net.enc2.conv1.conv.1.running_mean", "full_band_net.enc2.conv1.conv.1.running_var", "full_band_net.enc2.conv1.conv.1.num_batches_tracked", "full_band_net.enc2.conv2.conv.0.weight", "full_band_net.enc2.conv2.conv.1.weight", "full_band_net.enc2.conv2.conv.1.bias", "full_band_net.enc2.conv2.conv.1.running_mean", "full_band_net.enc2.conv2.conv.1.running_var", "full_band_net.enc2.conv2.conv.1.num_batches_tracked", "full_band_net.enc3.conv1.conv.0.weight", "full_band_net.enc3.conv1.conv.1.weight", "full_band_net.enc3.conv1.conv.1.bias", "full_band_net.enc3.conv1.conv.1.running_mean", "full_band_net.enc3.conv1.conv.1.running_var", "full_band_net.enc3.conv1.conv.1.num_batches_tracked", "full_band_net.enc3.conv2.conv.0.weight", "full_band_net.enc3.conv2.conv.1.weight", "full_band_net.enc3.conv2.conv.1.bias", "full_band_net.enc3.conv2.conv.1.running_mean", "full_band_net.enc3.conv2.conv.1.running_var", "full_band_net.enc3.conv2.conv.1.num_batches_tracked", "full_band_net.enc4.conv1.conv.0.weight", "full_band_net.enc4.conv1.conv.1.weight", "full_band_net.enc4.conv1.conv.1.bias", 
"full_band_net.enc4.conv1.conv.1.running_mean", "full_band_net.enc4.conv1.conv.1.running_var", "full_band_net.enc4.conv1.conv.1.num_batches_tracked", "full_band_net.enc4.conv2.conv.0.weight", "full_band_net.enc4.conv2.conv.1.weight", "full_band_net.enc4.conv2.conv.1.bias", "full_band_net.enc4.conv2.conv.1.running_mean", "full_band_net.enc4.conv2.conv.1.running_var", "full_band_net.enc4.conv2.conv.1.num_batches_tracked", "full_band_net.aspp.conv1.1.conv.0.weight", "full_band_net.aspp.conv1.1.conv.1.weight", "full_band_net.aspp.conv1.1.conv.1.bias", "full_band_net.aspp.conv1.1.conv.1.running_mean", "full_band_net.aspp.conv1.1.conv.1.running_var", "full_band_net.aspp.conv1.1.conv.1.num_batches_tracked", "full_band_net.aspp.conv2.conv.0.weight", "full_band_net.aspp.conv2.conv.1.weight", "full_band_net.aspp.conv2.conv.1.bias", "full_band_net.aspp.conv2.conv.1.running_mean", "full_band_net.aspp.conv2.conv.1.running_var", "full_band_net.aspp.conv2.conv.1.num_batches_tracked", "full_band_net.aspp.conv3.conv.0.weight", "full_band_net.aspp.conv3.conv.1.weight", "full_band_net.aspp.conv3.conv.2.weight", "full_band_net.aspp.conv3.conv.2.bias", "full_band_net.aspp.conv3.conv.2.running_mean", "full_band_net.aspp.conv3.conv.2.running_var", "full_band_net.aspp.conv3.conv.2.num_batches_tracked", "full_band_net.aspp.conv4.conv.0.weight", "full_band_net.aspp.conv4.conv.1.weight", "full_band_net.aspp.conv4.conv.2.weight", "full_band_net.aspp.conv4.conv.2.bias", "full_band_net.aspp.conv4.conv.2.running_mean", "full_band_net.aspp.conv4.conv.2.running_var", "full_band_net.aspp.conv4.conv.2.num_batches_tracked", "full_band_net.aspp.conv5.conv.0.weight", "full_band_net.aspp.conv5.conv.1.weight", "full_band_net.aspp.conv5.conv.2.weight", "full_band_net.aspp.conv5.conv.2.bias", "full_band_net.aspp.conv5.conv.2.running_mean", "full_band_net.aspp.conv5.conv.2.running_var", "full_band_net.aspp.conv5.conv.2.num_batches_tracked", "full_band_net.aspp.bottleneck.0.conv.0.weight", 
"full_band_net.aspp.bottleneck.0.conv.1.weight", "full_band_net.aspp.bottleneck.0.conv.1.bias", "full_band_net.aspp.bottleneck.0.conv.1.running_mean", "full_band_net.aspp.bottleneck.0.conv.1.running_var", "full_band_net.aspp.bottleneck.0.conv.1.num_batches_tracked", "full_band_net.dec4.conv.conv.0.weight", "full_band_net.dec4.conv.conv.1.weight", "full_band_net.dec4.conv.conv.1.bias", "full_band_net.dec4.conv.conv.1.running_mean", "full_band_net.dec4.conv.conv.1.running_var", "full_band_net.dec4.conv.conv.1.num_batches_tracked", "full_band_net.dec3.conv.conv.0.weight", "full_band_net.dec3.conv.conv.1.weight", "full_band_net.dec3.conv.conv.1.bias", "full_band_net.dec3.conv.conv.1.running_mean", "full_band_net.dec3.conv.conv.1.running_var", "full_band_net.dec3.conv.conv.1.num_batches_tracked", "full_band_net.dec2.conv.conv.0.weight", "full_band_net.dec2.conv.conv.1.weight", "full_band_net.dec2.conv.conv.1.bias", "full_band_net.dec2.conv.conv.1.running_mean", "full_band_net.dec2.conv.conv.1.running_var", "full_band_net.dec2.conv.conv.1.num_batches_tracked", "full_band_net.dec1.conv.conv.0.weight", "full_band_net.dec1.conv.conv.1.weight", "full_band_net.dec1.conv.conv.1.bias", "full_band_net.dec1.conv.conv.1.running_mean", "full_band_net.dec1.conv.conv.1.running_var", "full_band_net.dec1.conv.conv.1.num_batches_tracked", "out.0.conv.0.weight", "out.0.conv.1.weight", "out.0.conv.1.bias", "out.0.conv.1.running_mean", "out.0.conv.1.running_var", "out.0.conv.1.num_batches_tracked", "out.1.weight".

Mixup_rate & Mixup_alpha Options

Quick question: can you explain these options a bit more? What do they mean, and how can a higher or lower mixup rate potentially affect the training process?
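For readers with the same question, here is a minimal sketch of generic mixup augmentation. It assumes that mixup_rate is the probability that a training example gets blended with another one, and that mixup_alpha is the shape parameter of the Beta(alpha, alpha) distribution the blend weight is drawn from. The function name mixup_pair and the list-based data are hypothetical, and this is not necessarily how train.py implements it.

```python
import random


def mixup_pair(x_a, y_a, x_b, y_b, mixup_rate, mixup_alpha, rng=random):
    """Blend two (input, target) examples with probability mixup_rate.

    Hypothetical helper: returns the (possibly blended) input, target,
    and the weight lam that was given to the first example.
    """
    # With probability 1 - mixup_rate the first example is returned unchanged.
    if rng.random() >= mixup_rate:
        return list(x_a), list(y_a), 1.0

    # Blend weight in [0, 1], drawn from Beta(alpha, alpha).
    lam = rng.betavariate(mixup_alpha, mixup_alpha)

    # Inputs and targets are blended with the same weight.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam
```

Under these assumptions, a higher mixup rate means more of each batch consists of blended examples. A small alpha concentrates the weight near 0 or 1, so blends stay close to one of the two originals, while alpha = 1 draws the weight uniformly.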

Mono2stereo model?

Hello, I'm thinking out loud here: would it be possible to make a mono-to-stereo model with this app?
Do you think the application's architecture allows it?
I haven't found a good way to produce natural-sounding stereo audio from a single-channel source yet. Even the best VST plugins do not reach the level of quality that ML achieves here.

Quality Reduction in v3

Hello,

Quick question, I noticed there seems to be a greater reduction in the quality of the accompaniment compared to v2. Was this to make the vocal removal more aggressive? If so, what parameters need to be tweaked to bring the accompaniment quality back up to v2?

Thanks!

Trouble running script

Hi,

I really like the project, but unfortunately I'm having trouble running your script. I have Python 3.7 installed, along with the required packages from 'requirements.txt'.

However, every time I try to run the script, I receive the following traceback:

C:\vocal-remover>python inference.py --input "C:\test.wav"
loading model... done
loading wave source... done
wave source stft... Traceback (most recent call last):
File "inference.py", line 42, in
X, phase = spec_utils.calc_spec(X, args.hop_length, phase=True)
File "C:\vocal-remover\lib\spec_utils.py", line 22, in calc_spec
spec_left = librosa.stft(X[0], n_fft, hop_length=hop_length)
File "C:\Python37\lib\site-packages\librosa\core\spectrum.py", line 215, in stft
util.valid_audio(y)
File "C:\Python37\lib\site-packages\librosa\util\utils.py", line 278, in valid_audio
raise ParameterError('Audio buffer is not Fortran-contiguous. '
librosa.util.exceptions.ParameterError: Audio buffer is not Fortran-contiguous. Use numpy.asfortranarray to ensure Fortran contiguity.

Any ideas? Any sort of support would be highly appreciated. I look forward to running your script in the future! :)

Needs improvement

I tried a song, and this vocal remover removed only part of the singer's voice and left the rest in place.

progress bars

tsurumeso, could you merge these progress bars into a single progress bar to shorten the processing time?
[Screenshot: Annotation 2020-02-29 181042]

PR for multiple files

Hello :)
I really love this project and it has helped me a lot. I extended it to work with multiple songs (from a folder). Would you be interested in a PR?
It's awesome, so I'd like to contribute in case it helps more people.
Have a great day!
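A minimal sketch of the folder idea, assuming the script is invoked once per file with the --input flag seen in other issues on this page. The helper name build_commands and the extension list are hypothetical and are not taken from the proposed PR.

```python
import sys
from pathlib import Path

# Assumed set of audio extensions to pick up; adjust as needed.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac"}


def build_commands(folder, script="inference.py"):
    """Return one command line (as an argv list) per audio file in folder."""
    commands = []
    for path in sorted(Path(folder).iterdir()):
        # Skip subdirectories and non-audio files.
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
            commands.append([sys.executable, script, "--input", str(path)])
    return commands
```

Each returned list can be passed to subprocess.run(cmd, check=True). Note that this reloads the model for every song; a real batch mode would more likely load the model once and loop inside the script.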

IndexError: too many indices for array

Hello, when I try to train with only one file in each folder (as a test), I get this error:

line 114, in align_wave_head_and_tail
    a_mono = a[:, :sr * 4].sum(axis=0)
IndexError: too many indices for array

Any solution?

EDIT:
Solved: you need to make sure the dataset is stereo (two channels).
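A quick way to check a dataset for this before training is to read each file's channel count. The sketch below uses only the standard-library wave module, so it covers uncompressed PCM .wav files only. The helper names are hypothetical, and the in-memory file exists only to make the example self-contained.

```python
import io
import wave


def channel_count(wav_file):
    """Return the number of channels of a PCM .wav file (path or file object)."""
    with wave.open(wav_file, "rb") as w:
        return w.getnchannels()


def make_silent_wav(channels, frames=100, rate=44100):
    """Build a short silent 16-bit .wav in memory, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * channels * frames)
    buf.seek(0)
    return buf


# A mono file reports 1 channel; such files are the ones to convert or exclude.
mono_channels = channel_count(make_silent_wav(1))
stereo_channels = channel_count(make_silent_wav(2))
```

The indexing a[:, :sr * 4] in the traceback expects a two-dimensional (channels, samples) array, which is presumably why a mono file, loaded as a one-dimensional array, fails with "too many indices".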

Train model command line

tsurumeso, I want to train a model from the command line on the CPU. When I tried to train my own model, it reported that an NVIDIA graphics card is required, but I don't own one; I only have Intel graphics.
(base) C:\Users\DESKTOP\Downloads\vocal-remover-v3.0.0\vocal-remover>python train.py -i dataset/instruments -m dataset/mixtures -M 0.5 -g 0
1 05_too_mix.wav 05_too_inst.wav
Traceback (most recent call last):
File "train.py", line 225, in
main()
File "train.py", line 136, in main
model.cuda()
File "C:\Users\DESKTOP\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 458, in cuda
return self._apply(lambda t: t.cuda(device))
File "C:\Users\DESKTOP\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 354, in _apply
module._apply(fn)
File "C:\Users\DESKTOP\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 354, in _apply
module._apply(fn)
File "C:\Users\DESKTOP\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 354, in _apply
module._apply(fn)
[Previous line repeated 2 more times]
File "C:\Users\DESKTOP\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 376, in _apply
param_applied = fn(param)
File "C:\Users\DESKTOP\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 458, in
return self.apply(lambda t: t.cuda(device))
File "C:\Users\DESKTOP\miniconda3\lib\site-packages\torch\cuda_init.py", line 186, in _lazy_init
check_driver()
File "C:\Users\DESKTOP\miniconda3\lib\site-packages\torch\cuda_init.py", line 68, in _check_driver
http://www.nvidia.com/Download/index.aspx""")
AssertionError:
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx

bugs in some songs

In some songs where I extracted the instrumental, I could still hear the original voice during certain parts of the song, and in other parts the vocal was made quieter rather than fully removed.

Note: I used the post processor to identify this error.

Support 32bit float output

For further audio editing it would be best to not reduce the bit depth. Is it possible to write the output with 32bit float?

FileNotFoundError: [WinError 3] The system cannot find the path specified: 'dataset/mixtures'

Hello, great job there.
I ran this command and got the error below. Did I miss something in the project setup?
python augment.py -i dataset/instrumentals -m dataset/mixtures -p 1

Traceback (most recent call last):
File "augment.py", line 25, in
for fname in os.listdir(args.mixture_dataset)
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'dataset/mixtures'
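The error says the directory passed with -m does not exist relative to the current working directory. A small pre-flight check like the following sketch reports which of the expected directories are missing before any script runs. The helper name missing_dirs is hypothetical.

```python
import os


def missing_dirs(*paths):
    """Return the subset of paths that are not existing directories."""
    return [p for p in paths if not os.path.isdir(p)]


# Directory names taken from the command above; they are resolved relative
# to the directory the script is started from.
not_found = missing_dirs("dataset/instrumentals", "dataset/mixtures")
for path in not_found:
    print("missing directory:", os.path.abspath(path))
```

If a directory is listed, create it and place the audio files there, or pass the actual location of the files instead.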

Training & validation subdirs

After reading this article (chapter 2.2), I learned a very important thing: the same artist should not appear in both the training dataset and the validation dataset. But the train.py script splits the dataset randomly, which prevents me from distributing the tracks manually.

Therefore, I suggest organizing the directories as follows:

dataset/
├── training/
│   ├── instruments/
│   │   ├── 01_foo_inst.wav
│   │   ├── 02_bar_inst.wav
│   │   └── ...
│   └── mixtures/
│       ├── 01_foo_mix.wav
│       ├── 02_bar_mix.wav
│       └── ...
└── validation/
    ├── instruments/
    │   ├── 03_foo_inst.wav
    │   ├── 04_bar_inst.wav
    │   └── ...
    └── mixtures/
        ├── 03_foo_mix.wav
        ├── 04_bar_mix.wav
        └── ...
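A sketch of how such a layout could be read, assuming that the sorted file names in instruments/ and mixtures/ line up one to one (as the 01_foo_inst.wav / 01_foo_mix.wav naming suggests). The function list_pairs is hypothetical and is not part of train.py.

```python
import os


def list_pairs(root, subset):
    """Return (mixture_path, instrumental_path) pairs for one subset.

    subset is "training" or "validation" in the proposed layout.
    """
    inst_dir = os.path.join(root, subset, "instruments")
    mix_dir = os.path.join(root, subset, "mixtures")
    inst_files = sorted(os.listdir(inst_dir))
    mix_files = sorted(os.listdir(mix_dir))

    # Pairing by sorted order only makes sense if the counts match.
    if len(inst_files) != len(mix_files):
        raise ValueError(
            "%s: %d instrumentals but %d mixtures"
            % (subset, len(inst_files), len(mix_files))
        )

    return [
        (os.path.join(mix_dir, m), os.path.join(inst_dir, i))
        for m, i in zip(mix_files, inst_files)
    ]
```

With this, list_pairs("dataset", "training") and list_pairs("dataset", "validation") give two fixed lists, so which artist lands in which set is decided by where the files are placed rather than by a random split.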

Custom sample rate

Can I use a custom sampling rate, for example 32000 Hz instead of 44100 Hz? I want to do this to fit more pairs into the same amount of memory.

I believe that many people do not need frequencies above 16 kHz and I want to try to improve the quality in the audible range. However, I am not sure that the custom sample rate will be compatible with the default window size.
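The arithmetic behind this question can be sketched as follows. The window size of 2048 and hop length of 1024 are assumed values for illustration only; substitute whatever the training script actually uses.

```python
def nyquist_hz(sample_rate):
    """Highest frequency representable at this sample rate."""
    return sample_rate / 2


def bin_width_hz(sample_rate, n_fft):
    """Frequency spacing between adjacent STFT bins."""
    return sample_rate / n_fft


def frames_per_second(sample_rate, hop_length):
    """Number of STFT frames produced per second of audio."""
    return sample_rate / hop_length


N_FFT = 2048       # assumed window size
HOP_LENGTH = 1024  # assumed hop length

for sr in (44100, 32000):
    print(
        sr,
        "Nyquist:", nyquist_hz(sr),
        "bin width:", round(bin_width_hz(sr, N_FFT), 3),
        "frames/s:", round(frames_per_second(sr, HOP_LENGTH), 3),
    )
```

At 32000 Hz the Nyquist frequency is 16 kHz, which matches the range mentioned above. With the window size unchanged, the number of frequency bins per frame stays the same, each bin covers a narrower band, and each second of audio produces fewer frames, which is where the memory saving would come from.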

in my opinion

tsurumeso, this vocal remover is a great project, but in my opinion a real vocal remover should remove the singer's voice using program code and the command line alone, not by training a model on a dataset (mix + instrumental) of the very song I want to remove vocals from. If I can already find the instrumental, I don't need to train a model to give me the same instrumental.
Also, imagine the instrumental is not available. What would you do then, and what would you build your model from?

CUDA Out of memory while trying to use the pre-trained model

If I try to use my gpu I get this error

RuntimeError: CUDA out of memory. Tried to allocate 384.00 MiB (GPU 0; 2.00 GiB total capacity; 948.49 MiB already allocated; 275.23 MiB free; 1.06 GiB reserved in total by PyTorch)

I have the following PC specs:

Operating System: Windows 10
Processor: Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz
Graphic card: NVIDIA GeForce GTX 750 Ti with 2 GB of VRAM
System RAM: 8 GB

Is there a specific amount of VRAM needed to use the model on the GPU?
Is there a way to lower the amount of VRAM required?
I tried:

  • python inference.py -i file.mp3 -g 0 -w 256, but I got this error:
    Traceback (most recent call last):
      File "inference.py", line 104, in <module>
        main()
      File "inference.py", line 64, in main
        pred = model.predict(X_window)
      File "C:\Users\AlbyTree\Vocal-Remover-DeepLearning\lib\nets.py", line 84, in predict
        assert h.size()[3] > 0
    AssertionError
  • python inference.py -i file.mp3 -g 0 -l 512, which worked, but the quality wasn't as good as with the default values on the CPU
