tsurumeso / vocal-remover
Vocal Remover using Deep Neural Networks
License: MIT License
Hello,
I'm trying to train on a dataset with thousands of .wav files that are about 4 minutes long, but with 16 GB of RAM I run out of memory with anything larger than 75 files:
Traceback (most recent call last):
File "train.py", line 98, in <module>
hop_length=args.hop_length)
File "C:\vocal-remover\lib\dataset.py", line 33, in create_dataset
(len_dataset, 2, hop_length, cropsize), dtype=np.float32)
MemoryError
So theoretically I would still run out of RAM even if I had 128 GB and tried to train on more than 600 files (128 / 16 = 8; 8 * 75 = 600). I'm trying to achieve results similar to http://krisp.ai, but that takes thousands of files, not just a few hundred.
Is it possible to increase the number of files or make the training scalable (i.e. split it into 75-file chunks and automatically resume training on the next chunk)?
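(Note: the allocation that fails is the big float32 patch array created in lib/dataset.py's create_dataset. A minimal sketch of one workaround, assuming the surrounding patch-extraction loop stays as it is: back that array with a disk file via numpy's memmap support, so the dataset is bounded by disk space rather than RAM. The patches_per_file argument and cache filename are placeholders, not options the script currently has.)

import numpy as np

def create_dataset_memmap(filelist, hop_length, cropsize, patches_per_file,
                          cache_path='train_patches.npy'):
    # Hypothetical variant of create_dataset: same shape as the in-memory
    # array, but stored on disk and paged in on demand.
    len_dataset = patches_per_file * len(filelist)
    X_dataset = np.lib.format.open_memmap(
        cache_path, mode='w+', dtype=np.float32,
        shape=(len_dataset, 2, hop_length, cropsize))
    # ...fill X_dataset patch by patch exactly as the original loop does...
    return X_dataset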
I have an audio-splitting Discord server. It's based around Spleeter, but it's really for any kind of audio-splitting tool!
Here's the link: https://discord.gg/P7FhQFH
Tsurumeso, can you add a feature to this program to separate a woman's voice from a man's voice?
Import of 'jit' requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
from numba.decorators import jit as optional_jit
loading model... Traceback (most recent call last):
File "inference.py", line 104, in
main()
File "inference.py", line 31, in main
model.load_state_dict(torch.load(args.model, map_location=device))
File "/media/luissiqueira/storage/workspace/roove/vocal-remover/venv/lib/python3.6/site-packages/torch/serialization.py", line 584, in load
with _open_file_like(f, 'rb') as opened_file:
File "/media/luissiqueira/storage/workspace/roove/vocal-remover/venv/lib/python3.6/site-packages/torch/serialization.py", line 234, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/media/luissiqueira/storage/workspace/roove/vocal-remover/venv/lib/python3.6/site-packages/torch/serialization.py", line 215, in init
super(_open_file, self).init(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'models/baseline.pth'
Hello again!
I have a question about the training process. I'm training a model with 170 pairs and I'm on my 29th epoch after about 16 hours -
# epoch 29
My understanding is that the training loss and validation loss should be close to each other, with the validation loss typically slightly higher than the training loss.
Here are my questions -
Thank you so much in advance!
Could you tell me about the pretrained model? How much training time and what dataset were used? I'd also like to know what options were used for training.
Thank you.
I tried some songs, and the singer's voice was not removed from them.
Please make your vocal remover remove all vocals from all songs.
Line 30 of inference.py:
model.load_state_dict(torch.load(args.model, map_location=torch.device('cpu')))
What is the best way to train a model with more layers?
How exactly do I increase its size?
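(In generic PyTorch terms, and only as a sketch rather than this repo's lib/nets.py, capacity usually grows by widening the channel counts fed to each encoder/decoder block or by stacking more blocks; for example:)

import torch.nn as nn

# Generic illustration; the real classes and constructor signatures live in
# lib/nets.py and may differ.
class EncoderBlock(nn.Module):
    def __init__(self, nin, nout):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(nin, nout, kernel_size=3, padding=1),
            nn.BatchNorm2d(nout),
            nn.LeakyReLU(0.01),
        )

    def forward(self, x):
        return self.conv(x)

# Doubling every stage's width roughly quadruples the parameter count,
# so VRAM use and training time grow quickly.
base_widths = (32, 64, 128, 256)
wide_widths = tuple(2 * w for w in base_widths)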
I tried to run the inference on the Jetson Nano platform; for a 28-second piece of music, it took 35 seconds to separate the song into vocals and music. The network seems too big for real-time processing on a lightweight platform. Is there any effort to reduce the network size?
This isn't an issue, but I wanted to let you and the community here know. This GUI is 100% based on your vocal remover, options included. This was a joint project between another coder and myself. Feel free to use, edit, and implement as you wish! Great job on this AI!
I also included 2 additional models that I trained myself. One trained on 700 pairs and another trained on trained data. Both of them came out GREAT!
You can access it via the following:
I've upgraded my PC recently and now have an NVIDIA GeForce RTX 2080 Ti with 11 GB of VRAM. I'm able to run conversions just fine with all the correct CUDA drivers installed, but I'm having a lot of trouble training. Once the data loads and it starts on the first epoch, it just stops after a few seconds. No errors, nothing; it just closes. I tried debugging the script in Visual Studio Code to look for errors, but even then, nothing. I verified my configuration and confirmed that my other PC with a GTX 1060 can run the exact same script with no issues.
I verified there was nothing wrong with the GPU as it's able to run high end games with absolutely no issues. Have you experienced this? Is there something I need to look out for?
System specs -
CPU: i9 9900K
RAM: 48 GB
Motherboard: ASRock Z390 Phantom Gaming 6
Comparing this solution with Spleeter, I noticed that in most cases Spleeter keeps less of the vocals but greatly degrades the quality of the accompaniment. I like the sound quality of this vocal remover better, but unfortunately it doesn't remove all the vocals.
I would like to find a compromise between the cleanliness and the quality of the accompaniment. Is it possible to adjust this?
I just installed everything and tried the script. I have 2 GPUs; the integrated one has 1 GB and the NVIDIA one 2 GB. When I use --gpu 0 it tells me I ran out of memory, but when I use --gpu 1 it says the following:
c:\vocal-remover>inference.py --input 1.wav --gpu 1
loading model... Traceback (most recent call last):
File "C:\vocal-remover\inference.py", line 104, in
main()
File "C:\vocal-remover\inference.py", line 34, in main
model.to(device)
File "C:\Users\samsung\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 443, in to
return self._apply(convert)
File "C:\Users\samsung\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 203, in _apply
module._apply(fn)
File "C:\Users\samsung\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 203, in _apply
module._apply(fn)
File "C:\Users\samsung\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 203, in _apply
module._apply(fn)
[Previous line repeated 2 more times]
File "C:\Users\samsung\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 225, in _apply
param_applied = fn(param)
File "C:\Users\samsung\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 441, in convert
return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
RuntimeError: CUDA error: invalid device ordinal
PS. I am a noob with zero knowledge and would love a simple rundown on how to fix this. Thank you.
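(For reference, PyTorch numbers only CUDA-capable devices, so an integrated non-NVIDIA GPU never gets an ordinal; with a single NVIDIA card the only valid value is 0. A quick check using standard PyTorch calls:)

import torch

# Lists the devices PyTorch can actually address; any ordinal outside
# this range raises "invalid device ordinal".
print(torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))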
(env) C:\Users\...\vocal-remover>python train.py -i "dataset_test/instrumentals" -m "dataset_test/mixtures" -M
03_test_mix.wav 03_test_inst.wav
01_test_mix.wav 01_test_inst.wav
04_test_mix.wav 04_test_inst.wav
05_test_mix.wav 05_test_inst.wav
02_test_mix.wav 02_test_inst.wav
100%|██████████████████████████████████████| 5/5 [01:52<00:00, 22.41s/it]
0it [00:00, ?it/s]
# epoch 0
* inner epoch 0
Traceback (most recent call last):
File "train.py", line 133, in <module>
train_loss = sum_loss / len(X_train)
ZeroDivisionError: division by zero
I get this error when I try to train on a dataset, can you please help me? (I have also tried with -g 0)
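(Judging from the 0it [00:00, ?it/s] line, the training split ended up with zero patches, which makes len(X_train) zero at train.py line 133. A hypothetical guard, not in the script, that would fail earlier with a clearer message:)

# Hypothetical addition just before the failing line in train.py; the exact
# name of the validation-split option is assumed, check `python train.py --help`.
if len(X_train) == 0:
    raise RuntimeError('Training set is empty: add more song pairs '
                       'or shrink the validation split.')
train_loss = sum_loss / len(X_train)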
Is it possible to train with half precision, and how do I do it? It turned out that for versions 3.0+, 11 GB of VRAM is not enough.
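(For reference, PyTorch's automatic mixed precision, torch.cuda.amp, is the usual way to roughly halve activation memory. This is a minimal sketch of the pattern only; it is not wired into this repo's train.py, and model, optimizer, criterion, and batches stand in for the script's own variables:)

import torch

scaler = torch.cuda.amp.GradScaler()
for X_batch, y_batch in batches:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        pred = model(X_batch)          # forward pass runs in float16 where safe
        loss = criterion(pred, y_batch)
    scaler.scale(loss).backward()      # loss scaling avoids float16 underflow
    scaler.step(optimizer)
    scaler.update()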
How do I perform data augmentation? Also, does it improve the model?
ERROR: Could not find a version that satisfies the requirement torch>=1.3.0 (from -r requirements.txt (line 3)) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)
ERROR: No matching distribution found for torch>=1.3.0 (from -r requirements.txt (line 3))
Current error:
RuntimeError: CUDA out of memory. Tried to allocate 384.00 MiB (GPU 0; 6.00 GiB total capacity; 4.29 GiB already allocated; 299.14 MiB free; 52.89 MiB cached)
I have the following PC specs -
Operating System: Windows 10
Processor: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80 GHz
Graphics card: GeForce GTX 1060 with 6 GB of VRAM
System RAM: 32GB
My issue is that I can't even start training with my GPU; I always get out-of-memory errors even though I should have enough. I'm able to train just fine with my CPU, but I would really like to use my GPU.
My questions:
Thank you so much in advance!
I will use vocal-remover in my capstone project, but I can't find the 'baseline.pth' model.
How can I get the model?
Tsurumeso, please answer #70. Thanks.
Traceback (most recent call last):
File "inference.py", line 11, in
from lib import dataset
File "C:\Users\DESKTOP\Desktop\vocal-remover\lib\dataset.py", line 10, in
class VocalRemoverValidationSet(torch.utils.data.Dataset):
AttributeError: module 'torch.utils' has no attribute 'data'
tsurumeso, please read #33.
Hello!
Would it be possible for you to update the AI to add layers to the model? Training seems to hit a limit after about 4 days (the training and validation losses stagnate) on a 650-pair dataset. I want to be able to train over a longer period and reach better training/validation losses before the model hits its limit. Would this be possible?
Thank you in advance for your help!
I tried another model, from Anjok07's https://github.com/Anjok07/ultimatevocalremovergui,
and when I try to use that model, this comes up:
loading model... Traceback (most recent call last):
File "inference.py", line 119, in
main()
File "inference.py", line 65, in main
model.load_state_dict(torch.load(args.pretrained_model, map_location=device))
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1045, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for CascadedASPPNet:
Missing key(s) in state_dict: "stg1_low_band_net.enc1.conv1.conv.0.weight", "stg1_low_band_net.enc1.conv1.conv.1.weight", "stg1_low_band_net.enc1.conv1.conv.1.bias", "stg1_low_band_net.enc1.conv1.conv.1.running_mean", "stg1_low_band_net.enc1.conv1.conv.1.running_var", "stg1_low_band_net.enc1.conv2.conv.0.weight", "stg1_low_band_net.enc1.conv2.conv.1.weight", "stg1_low_band_net.enc1.conv2.conv.1.bias", "stg1_low_band_net.enc1.conv2.conv.1.running_mean", "stg1_low_band_net.enc1.conv2.conv.1.running_var", "stg1_low_band_net.enc2.conv1.conv.0.weight", "stg1_low_band_net.enc2.conv1.conv.1.weight", "stg1_low_band_net.enc2.conv1.conv.1.bias", "stg1_low_band_net.enc2.conv1.conv.1.running_mean", "stg1_low_band_net.enc2.conv1.conv.1.running_var", "stg1_low_band_net.enc2.conv2.conv.0.weight", "stg1_low_band_net.enc2.conv2.conv.1.weight", "stg1_low_band_net.enc2.conv2.conv.1.bias", "stg1_low_band_net.enc2.conv2.conv.1.running_mean", "stg1_low_band_net.enc2.conv2.conv.1.running_var", "stg1_low_band_net.enc3.conv1.conv.0.weight", "stg1_low_band_net.enc3.conv1.conv.1.weight", "stg1_low_band_net.enc3.conv1.conv.1.bias", "stg1_low_band_net.enc3.conv1.conv.1.running_mean", "stg1_low_band_net.enc3.conv1.conv.1.running_var", "stg1_low_band_net.enc3.conv2.conv.0.weight", "stg1_low_band_net.enc3.conv2.conv.1.weight", "stg1_low_band_net.enc3.conv2.conv.1.bias", "stg1_low_band_net.enc3.conv2.conv.1.running_mean", "stg1_low_band_net.enc3.conv2.conv.1.running_var", "stg1_low_band_net.enc4.conv1.conv.0.weight", "stg1_low_band_net.enc4.conv1.conv.1.weight", "stg1_low_band_net.enc4.conv1.conv.1.bias", "stg1_low_band_net.enc4.conv1.conv.1.running_mean", "stg1_low_band_net.enc4.conv1.conv.1.running_var", "stg1_low_band_net.enc4.conv2.conv.0.weight", "stg1_low_band_net.enc4.conv2.conv.1.weight", "stg1_low_band_net.enc4.conv2.conv.1.bias", "stg1_low_band_net.enc4.conv2.conv.1.running_mean", "stg1_low_band_net.enc4.conv2.conv.1.running_var", "stg1_low_band_net.aspp.conv1.1.conv.0.weight", "stg1_low_band_net.aspp.conv1.1.conv.1.weight", "stg1_low_band_net.aspp.conv1.1.conv.1.bias", "stg1_low_band_net.aspp.conv1.1.conv.1.running_mean", "stg1_low_band_net.aspp.conv1.1.conv.1.running_var", "stg1_low_band_net.aspp.conv2.conv.0.weight", "stg1_low_band_net.aspp.conv2.conv.1.weight", "stg1_low_band_net.aspp.conv2.conv.1.bias", "stg1_low_band_net.aspp.conv2.conv.1.running_mean", "stg1_low_band_net.aspp.conv2.conv.1.running_var", "stg1_low_band_net.aspp.conv3.conv.0.weight", "stg1_low_band_net.aspp.conv3.conv.1.weight", "stg1_low_band_net.aspp.conv3.conv.2.weight", "stg1_low_band_net.aspp.conv3.conv.2.bias", "stg1_low_band_net.aspp.conv3.conv.2.running_mean", "stg1_low_band_net.aspp.conv3.conv.2.running_var", "stg1_low_band_net.aspp.conv4.conv.0.weight", "stg1_low_band_net.aspp.conv4.conv.1.weight", "stg1_low_band_net.aspp.conv4.conv.2.weight", "stg1_low_band_net.aspp.conv4.conv.2.bias", "stg1_low_band_net.aspp.conv4.conv.2.running_mean", "stg1_low_band_net.aspp.conv4.conv.2.running_var", "stg1_low_band_net.aspp.conv5.conv.0.weight", "stg1_low_band_net.aspp.conv5.conv.1.weight", "stg1_low_band_net.aspp.conv5.conv.2.weight", "stg1_low_band_net.aspp.conv5.conv.2.bias", "stg1_low_band_net.aspp.conv5.conv.2.running_mean", "stg1_low_band_net.aspp.conv5.conv.2.running_var", "stg1_low_band_net.aspp.bottleneck.0.conv.0.weight", "stg1_low_band_net.aspp.bottleneck.0.conv.1.weight", "stg1_low_band_net.aspp.bottleneck.0.conv.1.bias", "stg1_low_band_net.aspp.bottleneck.0.conv.1.running_mean", 
"stg1_low_band_net.aspp.bottleneck.0.conv.1.running_var", "stg1_low_band_net.dec4.conv.conv.0.weight", "stg1_low_band_net.dec4.conv.conv.1.weight", "stg1_low_band_net.dec4.conv.conv.1.bias", "stg1_low_band_net.dec4.conv.conv.1.running_mean", "stg1_low_band_net.dec4.conv.conv.1.running_var", "stg1_low_band_net.dec3.conv.conv.0.weight", "stg1_low_band_net.dec3.conv.conv.1.weight", "stg1_low_band_net.dec3.conv.conv.1.bias", "stg1_low_band_net.dec3.conv.conv.1.running_mean", "stg1_low_band_net.dec3.conv.conv.1.running_var", "stg1_low_band_net.dec2.conv.conv.0.weight", "stg1_low_band_net.dec2.conv.conv.1.weight", "stg1_low_band_net.dec2.conv.conv.1.bias", "stg1_low_band_net.dec2.conv.conv.1.running_mean", "stg1_low_band_net.dec2.conv.conv.1.running_var", "stg1_low_band_net.dec1.conv.conv.0.weight", "stg1_low_band_net.dec1.conv.conv.1.weight", "stg1_low_band_net.dec1.conv.conv.1.bias", "stg1_low_band_net.dec1.conv.conv.1.running_mean", "stg1_low_band_net.dec1.conv.conv.1.running_var", "stg1_high_band_net.enc1.conv1.conv.0.weight", "stg1_high_band_net.enc1.conv1.conv.1.weight", "stg1_high_band_net.enc1.conv1.conv.1.bias", "stg1_high_band_net.enc1.conv1.conv.1.running_mean", "stg1_high_band_net.enc1.conv1.conv.1.running_var", "stg1_high_band_net.enc1.conv2.conv.0.weight", "stg1_high_band_net.enc1.conv2.conv.1.weight", "stg1_high_band_net.enc1.conv2.conv.1.bias", "stg1_high_band_net.enc1.conv2.conv.1.running_mean", "stg1_high_band_net.enc1.conv2.conv.1.running_var", "stg1_high_band_net.enc2.conv1.conv.0.weight", "stg1_high_band_net.enc2.conv1.conv.1.weight", "stg1_high_band_net.enc2.conv1.conv.1.bias", "stg1_high_band_net.enc2.conv1.conv.1.running_mean", "stg1_high_band_net.enc2.conv1.conv.1.running_var", "stg1_high_band_net.enc2.conv2.conv.0.weight", "stg1_high_band_net.enc2.conv2.conv.1.weight", "stg1_high_band_net.enc2.conv2.conv.1.bias", "stg1_high_band_net.enc2.conv2.conv.1.running_mean", "stg1_high_band_net.enc2.conv2.conv.1.running_var", "stg1_high_band_net.enc3.conv1.conv.0.weight", "stg1_high_band_net.enc3.conv1.conv.1.weight", "stg1_high_band_net.enc3.conv1.conv.1.bias", "stg1_high_band_net.enc3.conv1.conv.1.running_mean", "stg1_high_band_net.enc3.conv1.conv.1.running_var", "stg1_high_band_net.enc3.conv2.conv.0.weight", "stg1_high_band_net.enc3.conv2.conv.1.weight", "stg1_high_band_net.enc3.conv2.conv.1.bias", "stg1_high_band_net.enc3.conv2.conv.1.running_mean", "stg1_high_band_net.enc3.conv2.conv.1.running_var", "stg1_high_band_net.enc4.conv1.conv.0.weight", "stg1_high_band_net.enc4.conv1.conv.1.weight", "stg1_high_band_net.enc4.conv1.conv.1.bias", "stg1_high_band_net.enc4.conv1.conv.1.running_mean", "stg1_high_band_net.enc4.conv1.conv.1.running_var", "stg1_high_band_net.enc4.conv2.conv.0.weight", "stg1_high_band_net.enc4.conv2.conv.1.weight", "stg1_high_band_net.enc4.conv2.conv.1.bias", "stg1_high_band_net.enc4.conv2.conv.1.running_mean", "stg1_high_band_net.enc4.conv2.conv.1.running_var", "stg1_high_band_net.aspp.conv1.1.conv.0.weight", "stg1_high_band_net.aspp.conv1.1.conv.1.weight", "stg1_high_band_net.aspp.conv1.1.conv.1.bias", "stg1_high_band_net.aspp.conv1.1.conv.1.running_mean", "stg1_high_band_net.aspp.conv1.1.conv.1.running_var", "stg1_high_band_net.aspp.conv2.conv.0.weight", "stg1_high_band_net.aspp.conv2.conv.1.weight", "stg1_high_band_net.aspp.conv2.conv.1.bias", "stg1_high_band_net.aspp.conv2.conv.1.running_mean", "stg1_high_band_net.aspp.conv2.conv.1.running_var", "stg1_high_band_net.aspp.conv3.conv.0.weight", "stg1_high_band_net.aspp.conv3.conv.1.weight", 
"stg1_high_band_net.aspp.conv3.conv.2.weight", "stg1_high_band_net.aspp.conv3.conv.2.bias", "stg1_high_band_net.aspp.conv3.conv.2.running_mean", "stg1_high_band_net.aspp.conv3.conv.2.running_var", "stg1_high_band_net.aspp.conv4.conv.0.weight", "stg1_high_band_net.aspp.conv4.conv.1.weight", "stg1_high_band_net.aspp.conv4.conv.2.weight", "stg1_high_band_net.aspp.conv4.conv.2.bias", "stg1_high_band_net.aspp.conv4.conv.2.running_mean", "stg1_high_band_net.aspp.conv4.conv.2.running_var", "stg1_high_band_net.aspp.conv5.conv.0.weight", "stg1_high_band_net.aspp.conv5.conv.1.weight", "stg1_high_band_net.aspp.conv5.conv.2.weight", "stg1_high_band_net.aspp.conv5.conv.2.bias", "stg1_high_band_net.aspp.conv5.conv.2.running_mean", "stg1_high_band_net.aspp.conv5.conv.2.running_var", "stg1_high_band_net.aspp.bottleneck.0.conv.0.weight", "stg1_high_band_net.aspp.bottleneck.0.conv.1.weight", "stg1_high_band_net.aspp.bottleneck.0.conv.1.bias", "stg1_high_band_net.aspp.bottleneck.0.conv.1.running_mean", "stg1_high_band_net.aspp.bottleneck.0.conv.1.running_var", "stg1_high_band_net.dec4.conv.conv.0.weight", "stg1_high_band_net.dec4.conv.conv.1.weight", "stg1_high_band_net.dec4.conv.conv.1.bias", "stg1_high_band_net.dec4.conv.conv.1.running_mean", "stg1_high_band_net.dec4.conv.conv.1.running_var", "stg1_high_band_net.dec3.conv.conv.0.weight", "stg1_high_band_net.dec3.conv.conv.1.weight", "stg1_high_band_net.dec3.conv.conv.1.bias", "stg1_high_band_net.dec3.conv.conv.1.running_mean", "stg1_high_band_net.dec3.conv.conv.1.running_var", "stg1_high_band_net.dec2.conv.conv.0.weight", "stg1_high_band_net.dec2.conv.conv.1.weight", "stg1_high_band_net.dec2.conv.conv.1.bias", "stg1_high_band_net.dec2.conv.conv.1.running_mean", "stg1_high_band_net.dec2.conv.conv.1.running_var", "stg1_high_band_net.dec1.conv.conv.0.weight", "stg1_high_band_net.dec1.conv.conv.1.weight", "stg1_high_band_net.dec1.conv.conv.1.bias", "stg1_high_band_net.dec1.conv.conv.1.running_mean", "stg1_high_band_net.dec1.conv.conv.1.running_var", "stg1_full_band_net.enc1.conv1.conv.0.weight", "stg1_full_band_net.enc1.conv1.conv.1.weight", "stg1_full_band_net.enc1.conv1.conv.1.bias", "stg1_full_band_net.enc1.conv1.conv.1.running_mean", "stg1_full_band_net.enc1.conv1.conv.1.running_var", "stg1_full_band_net.enc1.conv2.conv.0.weight", "stg1_full_band_net.enc1.conv2.conv.1.weight", "stg1_full_band_net.enc1.conv2.conv.1.bias", "stg1_full_band_net.enc1.conv2.conv.1.running_mean", "stg1_full_band_net.enc1.conv2.conv.1.running_var", "stg1_full_band_net.enc2.conv1.conv.0.weight", "stg1_full_band_net.enc2.conv1.conv.1.weight", "stg1_full_band_net.enc2.conv1.conv.1.bias", "stg1_full_band_net.enc2.conv1.conv.1.running_mean", "stg1_full_band_net.enc2.conv1.conv.1.running_var", "stg1_full_band_net.enc2.conv2.conv.0.weight", "stg1_full_band_net.enc2.conv2.conv.1.weight", "stg1_full_band_net.enc2.conv2.conv.1.bias", "stg1_full_band_net.enc2.conv2.conv.1.running_mean", "stg1_full_band_net.enc2.conv2.conv.1.running_var", "stg1_full_band_net.enc3.conv1.conv.0.weight", "stg1_full_band_net.enc3.conv1.conv.1.weight", "stg1_full_band_net.enc3.conv1.conv.1.bias", "stg1_full_band_net.enc3.conv1.conv.1.running_mean", "stg1_full_band_net.enc3.conv1.conv.1.running_var", "stg1_full_band_net.enc3.conv2.conv.0.weight", "stg1_full_band_net.enc3.conv2.conv.1.weight", "stg1_full_band_net.enc3.conv2.conv.1.bias", "stg1_full_band_net.enc3.conv2.conv.1.running_mean", "stg1_full_band_net.enc3.conv2.conv.1.running_var", "stg1_full_band_net.enc4.conv1.conv.0.weight", 
"stg1_full_band_net.enc4.conv1.conv.1.weight", "stg1_full_band_net.enc4.conv1.conv.1.bias", "stg1_full_band_net.enc4.conv1.conv.1.running_mean", "stg1_full_band_net.enc4.conv1.conv.1.running_var", "stg1_full_band_net.enc4.conv2.conv.0.weight", "stg1_full_band_net.enc4.conv2.conv.1.weight", "stg1_full_band_net.enc4.conv2.conv.1.bias", "stg1_full_band_net.enc4.conv2.conv.1.running_mean", "stg1_full_band_net.enc4.conv2.conv.1.running_var", "stg1_full_band_net.aspp.conv1.1.conv.0.weight", "stg1_full_band_net.aspp.conv1.1.conv.1.weight", "stg1_full_band_net.aspp.conv1.1.conv.1.bias", "stg1_full_band_net.aspp.conv1.1.conv.1.running_mean", "stg1_full_band_net.aspp.conv1.1.conv.1.running_var", "stg1_full_band_net.aspp.conv2.conv.0.weight", "stg1_full_band_net.aspp.conv2.conv.1.weight", "stg1_full_band_net.aspp.conv2.conv.1.bias", "stg1_full_band_net.aspp.conv2.conv.1.running_mean", "stg1_full_band_net.aspp.conv2.conv.1.running_var", "stg1_full_band_net.aspp.conv3.conv.0.weight", "stg1_full_band_net.aspp.conv3.conv.1.weight", "stg1_full_band_net.aspp.conv3.conv.2.weight", "stg1_full_band_net.aspp.conv3.conv.2.bias", "stg1_full_band_net.aspp.conv3.conv.2.running_mean", "stg1_full_band_net.aspp.conv3.conv.2.running_var", "stg1_full_band_net.aspp.conv4.conv.0.weight", "stg1_full_band_net.aspp.conv4.conv.1.weight", "stg1_full_band_net.aspp.conv4.conv.2.weight", "stg1_full_band_net.aspp.conv4.conv.2.bias", "stg1_full_band_net.aspp.conv4.conv.2.running_mean", "stg1_full_band_net.aspp.conv4.conv.2.running_var", "stg1_full_band_net.aspp.conv5.conv.0.weight", "stg1_full_band_net.aspp.conv5.conv.1.weight", "stg1_full_band_net.aspp.conv5.conv.2.weight", "stg1_full_band_net.aspp.conv5.conv.2.bias", "stg1_full_band_net.aspp.conv5.conv.2.running_mean", "stg1_full_band_net.aspp.conv5.conv.2.running_var", "stg1_full_band_net.aspp.bottleneck.0.conv.0.weight", "stg1_full_band_net.aspp.bottleneck.0.conv.1.weight", "stg1_full_band_net.aspp.bottleneck.0.conv.1.bias", "stg1_full_band_net.aspp.bottleneck.0.conv.1.running_mean", "stg1_full_band_net.aspp.bottleneck.0.conv.1.running_var", "stg1_full_band_net.dec4.conv.conv.0.weight", "stg1_full_band_net.dec4.conv.conv.1.weight", "stg1_full_band_net.dec4.conv.conv.1.bias", "stg1_full_band_net.dec4.conv.conv.1.running_mean", "stg1_full_band_net.dec4.conv.conv.1.running_var", "stg1_full_band_net.dec3.conv.conv.0.weight", "stg1_full_band_net.dec3.conv.conv.1.weight", "stg1_full_band_net.dec3.conv.conv.1.bias", "stg1_full_band_net.dec3.conv.conv.1.running_mean", "stg1_full_band_net.dec3.conv.conv.1.running_var", "stg1_full_band_net.dec2.conv.conv.0.weight", "stg1_full_band_net.dec2.conv.conv.1.weight", "stg1_full_band_net.dec2.conv.conv.1.bias", "stg1_full_band_net.dec2.conv.conv.1.running_mean", "stg1_full_band_net.dec2.conv.conv.1.running_var", "stg1_full_band_net.dec1.conv.conv.0.weight", "stg1_full_band_net.dec1.conv.conv.1.weight", "stg1_full_band_net.dec1.conv.conv.1.bias", "stg1_full_band_net.dec1.conv.conv.1.running_mean", "stg1_full_band_net.dec1.conv.conv.1.running_var", "stg2_full_band_net.enc1.conv1.conv.0.weight", "stg2_full_band_net.enc1.conv1.conv.1.weight", "stg2_full_band_net.enc1.conv1.conv.1.bias", "stg2_full_band_net.enc1.conv1.conv.1.running_mean", "stg2_full_band_net.enc1.conv1.conv.1.running_var", "stg2_full_band_net.enc1.conv2.conv.0.weight", "stg2_full_band_net.enc1.conv2.conv.1.weight", "stg2_full_band_net.enc1.conv2.conv.1.bias", "stg2_full_band_net.enc1.conv2.conv.1.running_mean", "stg2_full_band_net.enc1.conv2.conv.1.running_var", 
"stg2_full_band_net.enc2.conv1.conv.0.weight", "stg2_full_band_net.enc2.conv1.conv.1.weight", "stg2_full_band_net.enc2.conv1.conv.1.bias", "stg2_full_band_net.enc2.conv1.conv.1.running_mean", "stg2_full_band_net.enc2.conv1.conv.1.running_var", "stg2_full_band_net.enc2.conv2.conv.0.weight", "stg2_full_band_net.enc2.conv2.conv.1.weight", "stg2_full_band_net.enc2.conv2.conv.1.bias", "stg2_full_band_net.enc2.conv2.conv.1.running_mean", "stg2_full_band_net.enc2.conv2.conv.1.running_var", "stg2_full_band_net.enc3.conv1.conv.0.weight", "stg2_full_band_net.enc3.conv1.conv.1.weight", "stg2_full_band_net.enc3.conv1.conv.1.bias", "stg2_full_band_net.enc3.conv1.conv.1.running_mean", "stg2_full_band_net.enc3.conv1.conv.1.running_var", "stg2_full_band_net.enc3.conv2.conv.0.weight", "stg2_full_band_net.enc3.conv2.conv.1.weight", "stg2_full_band_net.enc3.conv2.conv.1.bias", "stg2_full_band_net.enc3.conv2.conv.1.running_mean", "stg2_full_band_net.enc3.conv2.conv.1.running_var", "stg2_full_band_net.enc4.conv1.conv.0.weight", "stg2_full_band_net.enc4.conv1.conv.1.weight", "stg2_full_band_net.enc4.conv1.conv.1.bias", "stg2_full_band_net.enc4.conv1.conv.1.running_mean", "stg2_full_band_net.enc4.conv1.conv.1.running_var", "stg2_full_band_net.enc4.conv2.conv.0.weight", "stg2_full_band_net.enc4.conv2.conv.1.weight", "stg2_full_band_net.enc4.conv2.conv.1.bias", "stg2_full_band_net.enc4.conv2.conv.1.running_mean", "stg2_full_band_net.enc4.conv2.conv.1.running_var", "stg2_full_band_net.aspp.conv1.1.conv.0.weight", "stg2_full_band_net.aspp.conv1.1.conv.1.weight", "stg2_full_band_net.aspp.conv1.1.conv.1.bias", "stg2_full_band_net.aspp.conv1.1.conv.1.running_mean", "stg2_full_band_net.aspp.conv1.1.conv.1.running_var", "stg2_full_band_net.aspp.conv2.conv.0.weight", "stg2_full_band_net.aspp.conv2.conv.1.weight", "stg2_full_band_net.aspp.conv2.conv.1.bias", "stg2_full_band_net.aspp.conv2.conv.1.running_mean", "stg2_full_band_net.aspp.conv2.conv.1.running_var", "stg2_full_band_net.aspp.conv3.conv.0.weight", "stg2_full_band_net.aspp.conv3.conv.1.weight", "stg2_full_band_net.aspp.conv3.conv.2.weight", "stg2_full_band_net.aspp.conv3.conv.2.bias", "stg2_full_band_net.aspp.conv3.conv.2.running_mean", "stg2_full_band_net.aspp.conv3.conv.2.running_var", "stg2_full_band_net.aspp.conv4.conv.0.weight", "stg2_full_band_net.aspp.conv4.conv.1.weight", "stg2_full_band_net.aspp.conv4.conv.2.weight", "stg2_full_band_net.aspp.conv4.conv.2.bias", "stg2_full_band_net.aspp.conv4.conv.2.running_mean", "stg2_full_band_net.aspp.conv4.conv.2.running_var", "stg2_full_band_net.aspp.conv5.conv.0.weight", "stg2_full_band_net.aspp.conv5.conv.1.weight", "stg2_full_band_net.aspp.conv5.conv.2.weight", "stg2_full_band_net.aspp.conv5.conv.2.bias", "stg2_full_band_net.aspp.conv5.conv.2.running_mean", "stg2_full_band_net.aspp.conv5.conv.2.running_var", "stg2_full_band_net.aspp.bottleneck.0.conv.0.weight", "stg2_full_band_net.aspp.bottleneck.0.conv.1.weight", "stg2_full_band_net.aspp.bottleneck.0.conv.1.bias", "stg2_full_band_net.aspp.bottleneck.0.conv.1.running_mean", "stg2_full_band_net.aspp.bottleneck.0.conv.1.running_var", "stg2_full_band_net.dec4.conv.conv.0.weight", "stg2_full_band_net.dec4.conv.conv.1.weight", "stg2_full_band_net.dec4.conv.conv.1.bias", "stg2_full_band_net.dec4.conv.conv.1.running_mean", "stg2_full_band_net.dec4.conv.conv.1.running_var", "stg2_full_band_net.dec3.conv.conv.0.weight", "stg2_full_band_net.dec3.conv.conv.1.weight", "stg2_full_band_net.dec3.conv.conv.1.bias", "stg2_full_band_net.dec3.conv.conv.1.running_mean", 
"stg2_full_band_net.dec3.conv.conv.1.running_var", "stg2_full_band_net.dec2.conv.conv.0.weight", "stg2_full_band_net.dec2.conv.conv.1.weight", "stg2_full_band_net.dec2.conv.conv.1.bias", "stg2_full_band_net.dec2.conv.conv.1.running_mean", "stg2_full_band_net.dec2.conv.conv.1.running_var", "stg2_full_band_net.dec1.conv.conv.0.weight", "stg2_full_band_net.dec1.conv.conv.1.weight", "stg2_full_band_net.dec1.conv.conv.1.bias", "stg2_full_band_net.dec1.conv.conv.1.running_mean", "stg2_full_band_net.dec1.conv.conv.1.running_var", "out.weight".
Unexpected key(s) in state_dict: "low_band_net.enc1.conv1.conv.0.weight", "low_band_net.enc1.conv1.conv.1.weight", "low_band_net.enc1.conv1.conv.1.bias", "low_band_net.enc1.conv1.conv.1.running_mean", "low_band_net.enc1.conv1.conv.1.running_var", "low_band_net.enc1.conv1.conv.1.num_batches_tracked", "low_band_net.enc1.conv2.conv.0.weight", "low_band_net.enc1.conv2.conv.1.weight", "low_band_net.enc1.conv2.conv.1.bias", "low_band_net.enc1.conv2.conv.1.running_mean", "low_band_net.enc1.conv2.conv.1.running_var", "low_band_net.enc1.conv2.conv.1.num_batches_tracked", "low_band_net.enc2.conv1.conv.0.weight", "low_band_net.enc2.conv1.conv.1.weight", "low_band_net.enc2.conv1.conv.1.bias", "low_band_net.enc2.conv1.conv.1.running_mean", "low_band_net.enc2.conv1.conv.1.running_var", "low_band_net.enc2.conv1.conv.1.num_batches_tracked", "low_band_net.enc2.conv2.conv.0.weight", "low_band_net.enc2.conv2.conv.1.weight", "low_band_net.enc2.conv2.conv.1.bias", "low_band_net.enc2.conv2.conv.1.running_mean", "low_band_net.enc2.conv2.conv.1.running_var", "low_band_net.enc2.conv2.conv.1.num_batches_tracked", "low_band_net.enc3.conv1.conv.0.weight", "low_band_net.enc3.conv1.conv.1.weight", "low_band_net.enc3.conv1.conv.1.bias", "low_band_net.enc3.conv1.conv.1.running_mean", "low_band_net.enc3.conv1.conv.1.running_var", "low_band_net.enc3.conv1.conv.1.num_batches_tracked", "low_band_net.enc3.conv2.conv.0.weight", "low_band_net.enc3.conv2.conv.1.weight", "low_band_net.enc3.conv2.conv.1.bias", "low_band_net.enc3.conv2.conv.1.running_mean", "low_band_net.enc3.conv2.conv.1.running_var", "low_band_net.enc3.conv2.conv.1.num_batches_tracked", "low_band_net.enc4.conv1.conv.0.weight", "low_band_net.enc4.conv1.conv.1.weight", "low_band_net.enc4.conv1.conv.1.bias", "low_band_net.enc4.conv1.conv.1.running_mean", "low_band_net.enc4.conv1.conv.1.running_var", "low_band_net.enc4.conv1.conv.1.num_batches_tracked", "low_band_net.enc4.conv2.conv.0.weight", "low_band_net.enc4.conv2.conv.1.weight", "low_band_net.enc4.conv2.conv.1.bias", "low_band_net.enc4.conv2.conv.1.running_mean", "low_band_net.enc4.conv2.conv.1.running_var", "low_band_net.enc4.conv2.conv.1.num_batches_tracked", "low_band_net.aspp.conv1.1.conv.0.weight", "low_band_net.aspp.conv1.1.conv.1.weight", "low_band_net.aspp.conv1.1.conv.1.bias", "low_band_net.aspp.conv1.1.conv.1.running_mean", "low_band_net.aspp.conv1.1.conv.1.running_var", "low_band_net.aspp.conv1.1.conv.1.num_batches_tracked", "low_band_net.aspp.conv2.conv.0.weight", "low_band_net.aspp.conv2.conv.1.weight", "low_band_net.aspp.conv2.conv.1.bias", "low_band_net.aspp.conv2.conv.1.running_mean", "low_band_net.aspp.conv2.conv.1.running_var", "low_band_net.aspp.conv2.conv.1.num_batches_tracked", "low_band_net.aspp.conv3.conv.0.weight", "low_band_net.aspp.conv3.conv.1.weight", "low_band_net.aspp.conv3.conv.2.weight", "low_band_net.aspp.conv3.conv.2.bias", "low_band_net.aspp.conv3.conv.2.running_mean", "low_band_net.aspp.conv3.conv.2.running_var", "low_band_net.aspp.conv3.conv.2.num_batches_tracked", "low_band_net.aspp.conv4.conv.0.weight", "low_band_net.aspp.conv4.conv.1.weight", "low_band_net.aspp.conv4.conv.2.weight", "low_band_net.aspp.conv4.conv.2.bias", "low_band_net.aspp.conv4.conv.2.running_mean", "low_band_net.aspp.conv4.conv.2.running_var", "low_band_net.aspp.conv4.conv.2.num_batches_tracked", "low_band_net.aspp.conv5.conv.0.weight", "low_band_net.aspp.conv5.conv.1.weight", "low_band_net.aspp.conv5.conv.2.weight", "low_band_net.aspp.conv5.conv.2.bias", "low_band_net.aspp.conv5.conv.2.running_mean", 
"low_band_net.aspp.conv5.conv.2.running_var", "low_band_net.aspp.conv5.conv.2.num_batches_tracked", "low_band_net.aspp.bottleneck.0.conv.0.weight", "low_band_net.aspp.bottleneck.0.conv.1.weight", "low_band_net.aspp.bottleneck.0.conv.1.bias", "low_band_net.aspp.bottleneck.0.conv.1.running_mean", "low_band_net.aspp.bottleneck.0.conv.1.running_var", "low_band_net.aspp.bottleneck.0.conv.1.num_batches_tracked", "low_band_net.dec4.conv.conv.0.weight", "low_band_net.dec4.conv.conv.1.weight", "low_band_net.dec4.conv.conv.1.bias", "low_band_net.dec4.conv.conv.1.running_mean", "low_band_net.dec4.conv.conv.1.running_var", "low_band_net.dec4.conv.conv.1.num_batches_tracked", "low_band_net.dec3.conv.conv.0.weight", "low_band_net.dec3.conv.conv.1.weight", "low_band_net.dec3.conv.conv.1.bias", "low_band_net.dec3.conv.conv.1.running_mean", "low_band_net.dec3.conv.conv.1.running_var", "low_band_net.dec3.conv.conv.1.num_batches_tracked", "low_band_net.dec2.conv.conv.0.weight", "low_band_net.dec2.conv.conv.1.weight", "low_band_net.dec2.conv.conv.1.bias", "low_band_net.dec2.conv.conv.1.running_mean", "low_band_net.dec2.conv.conv.1.running_var", "low_band_net.dec2.conv.conv.1.num_batches_tracked", "low_band_net.dec1.conv.conv.0.weight", "low_band_net.dec1.conv.conv.1.weight", "low_band_net.dec1.conv.conv.1.bias", "low_band_net.dec1.conv.conv.1.running_mean", "low_band_net.dec1.conv.conv.1.running_var", "low_band_net.dec1.conv.conv.1.num_batches_tracked", "high_band_net.enc1.conv1.conv.0.weight", "high_band_net.enc1.conv1.conv.1.weight", "high_band_net.enc1.conv1.conv.1.bias", "high_band_net.enc1.conv1.conv.1.running_mean", "high_band_net.enc1.conv1.conv.1.running_var", "high_band_net.enc1.conv1.conv.1.num_batches_tracked", "high_band_net.enc1.conv2.conv.0.weight", "high_band_net.enc1.conv2.conv.1.weight", "high_band_net.enc1.conv2.conv.1.bias", "high_band_net.enc1.conv2.conv.1.running_mean", "high_band_net.enc1.conv2.conv.1.running_var", "high_band_net.enc1.conv2.conv.1.num_batches_tracked", "high_band_net.enc2.conv1.conv.0.weight", "high_band_net.enc2.conv1.conv.1.weight", "high_band_net.enc2.conv1.conv.1.bias", "high_band_net.enc2.conv1.conv.1.running_mean", "high_band_net.enc2.conv1.conv.1.running_var", "high_band_net.enc2.conv1.conv.1.num_batches_tracked", "high_band_net.enc2.conv2.conv.0.weight", "high_band_net.enc2.conv2.conv.1.weight", "high_band_net.enc2.conv2.conv.1.bias", "high_band_net.enc2.conv2.conv.1.running_mean", "high_band_net.enc2.conv2.conv.1.running_var", "high_band_net.enc2.conv2.conv.1.num_batches_tracked", "high_band_net.enc3.conv1.conv.0.weight", "high_band_net.enc3.conv1.conv.1.weight", "high_band_net.enc3.conv1.conv.1.bias", "high_band_net.enc3.conv1.conv.1.running_mean", "high_band_net.enc3.conv1.conv.1.running_var", "high_band_net.enc3.conv1.conv.1.num_batches_tracked", "high_band_net.enc3.conv2.conv.0.weight", "high_band_net.enc3.conv2.conv.1.weight", "high_band_net.enc3.conv2.conv.1.bias", "high_band_net.enc3.conv2.conv.1.running_mean", "high_band_net.enc3.conv2.conv.1.running_var", "high_band_net.enc3.conv2.conv.1.num_batches_tracked", "high_band_net.enc4.conv1.conv.0.weight", "high_band_net.enc4.conv1.conv.1.weight", "high_band_net.enc4.conv1.conv.1.bias", "high_band_net.enc4.conv1.conv.1.running_mean", "high_band_net.enc4.conv1.conv.1.running_var", "high_band_net.enc4.conv1.conv.1.num_batches_tracked", "high_band_net.enc4.conv2.conv.0.weight", "high_band_net.enc4.conv2.conv.1.weight", "high_band_net.enc4.conv2.conv.1.bias", "high_band_net.enc4.conv2.conv.1.running_mean", 
"high_band_net.enc4.conv2.conv.1.running_var", "high_band_net.enc4.conv2.conv.1.num_batches_tracked", "high_band_net.aspp.conv1.1.conv.0.weight", "high_band_net.aspp.conv1.1.conv.1.weight", "high_band_net.aspp.conv1.1.conv.1.bias", "high_band_net.aspp.conv1.1.conv.1.running_mean", "high_band_net.aspp.conv1.1.conv.1.running_var", "high_band_net.aspp.conv1.1.conv.1.num_batches_tracked", "high_band_net.aspp.conv2.conv.0.weight", "high_band_net.aspp.conv2.conv.1.weight", "high_band_net.aspp.conv2.conv.1.bias", "high_band_net.aspp.conv2.conv.1.running_mean", "high_band_net.aspp.conv2.conv.1.running_var", "high_band_net.aspp.conv2.conv.1.num_batches_tracked", "high_band_net.aspp.conv3.conv.0.weight", "high_band_net.aspp.conv3.conv.1.weight", "high_band_net.aspp.conv3.conv.2.weight", "high_band_net.aspp.conv3.conv.2.bias", "high_band_net.aspp.conv3.conv.2.running_mean", "high_band_net.aspp.conv3.conv.2.running_var", "high_band_net.aspp.conv3.conv.2.num_batches_tracked", "high_band_net.aspp.conv4.conv.0.weight", "high_band_net.aspp.conv4.conv.1.weight", "high_band_net.aspp.conv4.conv.2.weight", "high_band_net.aspp.conv4.conv.2.bias", "high_band_net.aspp.conv4.conv.2.running_mean", "high_band_net.aspp.conv4.conv.2.running_var", "high_band_net.aspp.conv4.conv.2.num_batches_tracked", "high_band_net.aspp.conv5.conv.0.weight", "high_band_net.aspp.conv5.conv.1.weight", "high_band_net.aspp.conv5.conv.2.weight", "high_band_net.aspp.conv5.conv.2.bias", "high_band_net.aspp.conv5.conv.2.running_mean", "high_band_net.aspp.conv5.conv.2.running_var", "high_band_net.aspp.conv5.conv.2.num_batches_tracked", "high_band_net.aspp.bottleneck.0.conv.0.weight", "high_band_net.aspp.bottleneck.0.conv.1.weight", "high_band_net.aspp.bottleneck.0.conv.1.bias", "high_band_net.aspp.bottleneck.0.conv.1.running_mean", "high_band_net.aspp.bottleneck.0.conv.1.running_var", "high_band_net.aspp.bottleneck.0.conv.1.num_batches_tracked", "high_band_net.dec4.conv.conv.0.weight", "high_band_net.dec4.conv.conv.1.weight", "high_band_net.dec4.conv.conv.1.bias", "high_band_net.dec4.conv.conv.1.running_mean", "high_band_net.dec4.conv.conv.1.running_var", "high_band_net.dec4.conv.conv.1.num_batches_tracked", "high_band_net.dec3.conv.conv.0.weight", "high_band_net.dec3.conv.conv.1.weight", "high_band_net.dec3.conv.conv.1.bias", "high_band_net.dec3.conv.conv.1.running_mean", "high_band_net.dec3.conv.conv.1.running_var", "high_band_net.dec3.conv.conv.1.num_batches_tracked", "high_band_net.dec2.conv.conv.0.weight", "high_band_net.dec2.conv.conv.1.weight", "high_band_net.dec2.conv.conv.1.bias", "high_band_net.dec2.conv.conv.1.running_mean", "high_band_net.dec2.conv.conv.1.running_var", "high_band_net.dec2.conv.conv.1.num_batches_tracked", "high_band_net.dec1.conv.conv.0.weight", "high_band_net.dec1.conv.conv.1.weight", "high_band_net.dec1.conv.conv.1.bias", "high_band_net.dec1.conv.conv.1.running_mean", "high_band_net.dec1.conv.conv.1.running_var", "high_band_net.dec1.conv.conv.1.num_batches_tracked", "full_band_net.enc1.conv1.conv.0.weight", "full_band_net.enc1.conv1.conv.1.weight", "full_band_net.enc1.conv1.conv.1.bias", "full_band_net.enc1.conv1.conv.1.running_mean", "full_band_net.enc1.conv1.conv.1.running_var", "full_band_net.enc1.conv1.conv.1.num_batches_tracked", "full_band_net.enc1.conv2.conv.0.weight", "full_band_net.enc1.conv2.conv.1.weight", "full_band_net.enc1.conv2.conv.1.bias", "full_band_net.enc1.conv2.conv.1.running_mean", "full_band_net.enc1.conv2.conv.1.running_var", "full_band_net.enc1.conv2.conv.1.num_batches_tracked", 
"full_band_net.enc2.conv1.conv.0.weight", "full_band_net.enc2.conv1.conv.1.weight", "full_band_net.enc2.conv1.conv.1.bias", "full_band_net.enc2.conv1.conv.1.running_mean", "full_band_net.enc2.conv1.conv.1.running_var", "full_band_net.enc2.conv1.conv.1.num_batches_tracked", "full_band_net.enc2.conv2.conv.0.weight", "full_band_net.enc2.conv2.conv.1.weight", "full_band_net.enc2.conv2.conv.1.bias", "full_band_net.enc2.conv2.conv.1.running_mean", "full_band_net.enc2.conv2.conv.1.running_var", "full_band_net.enc2.conv2.conv.1.num_batches_tracked", "full_band_net.enc3.conv1.conv.0.weight", "full_band_net.enc3.conv1.conv.1.weight", "full_band_net.enc3.conv1.conv.1.bias", "full_band_net.enc3.conv1.conv.1.running_mean", "full_band_net.enc3.conv1.conv.1.running_var", "full_band_net.enc3.conv1.conv.1.num_batches_tracked", "full_band_net.enc3.conv2.conv.0.weight", "full_band_net.enc3.conv2.conv.1.weight", "full_band_net.enc3.conv2.conv.1.bias", "full_band_net.enc3.conv2.conv.1.running_mean", "full_band_net.enc3.conv2.conv.1.running_var", "full_band_net.enc3.conv2.conv.1.num_batches_tracked", "full_band_net.enc4.conv1.conv.0.weight", "full_band_net.enc4.conv1.conv.1.weight", "full_band_net.enc4.conv1.conv.1.bias", "full_band_net.enc4.conv1.conv.1.running_mean", "full_band_net.enc4.conv1.conv.1.running_var", "full_band_net.enc4.conv1.conv.1.num_batches_tracked", "full_band_net.enc4.conv2.conv.0.weight", "full_band_net.enc4.conv2.conv.1.weight", "full_band_net.enc4.conv2.conv.1.bias", "full_band_net.enc4.conv2.conv.1.running_mean", "full_band_net.enc4.conv2.conv.1.running_var", "full_band_net.enc4.conv2.conv.1.num_batches_tracked", "full_band_net.aspp.conv1.1.conv.0.weight", "full_band_net.aspp.conv1.1.conv.1.weight", "full_band_net.aspp.conv1.1.conv.1.bias", "full_band_net.aspp.conv1.1.conv.1.running_mean", "full_band_net.aspp.conv1.1.conv.1.running_var", "full_band_net.aspp.conv1.1.conv.1.num_batches_tracked", "full_band_net.aspp.conv2.conv.0.weight", "full_band_net.aspp.conv2.conv.1.weight", "full_band_net.aspp.conv2.conv.1.bias", "full_band_net.aspp.conv2.conv.1.running_mean", "full_band_net.aspp.conv2.conv.1.running_var", "full_band_net.aspp.conv2.conv.1.num_batches_tracked", "full_band_net.aspp.conv3.conv.0.weight", "full_band_net.aspp.conv3.conv.1.weight", "full_band_net.aspp.conv3.conv.2.weight", "full_band_net.aspp.conv3.conv.2.bias", "full_band_net.aspp.conv3.conv.2.running_mean", "full_band_net.aspp.conv3.conv.2.running_var", "full_band_net.aspp.conv3.conv.2.num_batches_tracked", "full_band_net.aspp.conv4.conv.0.weight", "full_band_net.aspp.conv4.conv.1.weight", "full_band_net.aspp.conv4.conv.2.weight", "full_band_net.aspp.conv4.conv.2.bias", "full_band_net.aspp.conv4.conv.2.running_mean", "full_band_net.aspp.conv4.conv.2.running_var", "full_band_net.aspp.conv4.conv.2.num_batches_tracked", "full_band_net.aspp.conv5.conv.0.weight", "full_band_net.aspp.conv5.conv.1.weight", "full_band_net.aspp.conv5.conv.2.weight", "full_band_net.aspp.conv5.conv.2.bias", "full_band_net.aspp.conv5.conv.2.running_mean", "full_band_net.aspp.conv5.conv.2.running_var", "full_band_net.aspp.conv5.conv.2.num_batches_tracked", "full_band_net.aspp.bottleneck.0.conv.0.weight", "full_band_net.aspp.bottleneck.0.conv.1.weight", "full_band_net.aspp.bottleneck.0.conv.1.bias", "full_band_net.aspp.bottleneck.0.conv.1.running_mean", "full_band_net.aspp.bottleneck.0.conv.1.running_var", "full_band_net.aspp.bottleneck.0.conv.1.num_batches_tracked", "full_band_net.dec4.conv.conv.0.weight", "full_band_net.dec4.conv.conv.1.weight", 
"full_band_net.dec4.conv.conv.1.bias", "full_band_net.dec4.conv.conv.1.running_mean", "full_band_net.dec4.conv.conv.1.running_var", "full_band_net.dec4.conv.conv.1.num_batches_tracked", "full_band_net.dec3.conv.conv.0.weight", "full_band_net.dec3.conv.conv.1.weight", "full_band_net.dec3.conv.conv.1.bias", "full_band_net.dec3.conv.conv.1.running_mean", "full_band_net.dec3.conv.conv.1.running_var", "full_band_net.dec3.conv.conv.1.num_batches_tracked", "full_band_net.dec2.conv.conv.0.weight", "full_band_net.dec2.conv.conv.1.weight", "full_band_net.dec2.conv.conv.1.bias", "full_band_net.dec2.conv.conv.1.running_mean", "full_band_net.dec2.conv.conv.1.running_var", "full_band_net.dec2.conv.conv.1.num_batches_tracked", "full_band_net.dec1.conv.conv.0.weight", "full_band_net.dec1.conv.conv.1.weight", "full_band_net.dec1.conv.conv.1.bias", "full_band_net.dec1.conv.conv.1.running_mean", "full_band_net.dec1.conv.conv.1.running_var", "full_band_net.dec1.conv.conv.1.num_batches_tracked", "out.0.conv.0.weight", "out.0.conv.1.weight", "out.0.conv.1.bias", "out.0.conv.1.running_mean", "out.0.conv.1.running_var", "out.0.conv.1.num_batches_tracked", "out.1.weight".
Quick question: can you explain these options a bit more? What do they mean, and how can a higher or lower mixup rate affect the training process?
Hello, I'm thinking out loud: would it be possible to build a mono-to-stereo model with this app?
Do you think the application's architecture allows it?
I haven't found any good way to make natural-sounding stereo audio out of a single-channel source yet. Even the best VST plugins don't reach the level of quality that ML could here.
Hello,
Quick question, I noticed there seems to be a greater reduction in the quality of the accompaniment compared to v2. Was this to make the vocal removal more aggressive? If so, what parameters need to be tweaked to bring the accompaniment quality back up to v2?
Thanks!
Hi,
I really like the project, but unfortunately I'm having trouble running your script. I have Python 3.7 installed and the required packages installed per requirements.txt.
However, every time I try and run the script, I receive the following traceback:
C:\vocal-remover>python inference.py --input "C:\test.wav"
loading model... done
loading wave source... done
wave source stft... Traceback (most recent call last):
File "inference.py", line 42, in
X, phase = spec_utils.calc_spec(X, args.hop_length, phase=True)
File "C:\vocal-remover\lib\spec_utils.py", line 22, in calc_spec
spec_left = librosa.stft(X[0], n_fft, hop_length=hop_length)
File "C:\Python37\lib\site-packages\librosa\core\spectrum.py", line 215, in stft
util.valid_audio(y)
File "C:\Python37\lib\site-packages\librosa\util\utils.py", line 278, in valid_audio
raise ParameterError('Audio buffer is not Fortran-contiguous. '
librosa.util.exceptions.ParameterError: Audio buffer is not Fortran-contiguous. Use numpy.asfortranarray to ensure Fortran contiguity.
Any ideas? Any sort of support would be highly appreciated. I look forward to running your script in the future! :)
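(The exception text itself names the remedy; a minimal sketch, assuming X is the stereo array the script loaded:)

import numpy as np

# librosa.stft wants a Fortran-contiguous buffer; converting the array
# before the STFT call avoids the ParameterError.
X = np.asfortranarray(X)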
I tried a song; this vocal remover removed part of the singer's voice but left the rest in.
How do I use a TPU?
I found a bug in version 2.2: the last few seconds of the accompaniment are left unprocessed. This is noticeable in all songs where vocals are present at the very end.
Examples:
https://www.youtube.com/watch?v=D7oSQ6wV5vU
https://www.youtube.com/watch?v=vXptzKluBLc
In version 2.1 there was no such problem.
self.conv1 = Conv2DBNActiv(nin, nout, ksize, 1, pad, activ=activ)       # stride fixed at 1
self.conv2 = Conv2DBNActiv(nout, nout, ksize, stride, pad, activ=activ) # stride passed through
What does the "1" mean?
Hello :)
I really loved this and it helped me a lot. I extended it to work with multiple songs (from a folder). Would you be interested in a PR?
It's awesome, so I want to contribute; maybe it could help more people.
Have a great day!
Hello, upon trying to train with only one file in each folder (to test), I get this error:
line 114, in align_wave_head_and_tail
a_mono = a[:, :sr * 4].sum(axis=0)
IndexError: too many indices for array
Any solution?
EDIT:
Solved: you need to make sure the dataset is stereo, with 2 channels per track.
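(A minimal sketch of that check, assuming librosa is used for loading: a[:, :sr * 4] presumes a (2, n) stereo array, which a mono file is not.)

import numpy as np
import librosa

y, sr = librosa.load('track.wav', sr=44100, mono=False)
if y.ndim == 1:
    # duplicate the single channel into left/right so the shape becomes (2, n)
    y = np.tile(y, (2, 1))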
tsurumeso, I want the train command line to work on a CPU, because when I tried to train my own model it said I need an NVIDIA graphics card, which I don't own; I have Intel.
(base) C:\Users\DESKTOP\Downloads\vocal-remover-v3.0.0\vocal-remover>python train.py -i dataset/instruments -m dataset/mixtures -M 0.5 -g 0
1 05_too_mix.wav 05_too_inst.wav
Traceback (most recent call last):
File "train.py", line 225, in
main()
File "train.py", line 136, in main
model.cuda()
File "C:\Users\DESKTOP\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 458, in cuda
return self._apply(lambda t: t.cuda(device))
File "C:\Users\DESKTOP\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 354, in _apply
module._apply(fn)
File "C:\Users\DESKTOP\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 354, in _apply
module._apply(fn)
File "C:\Users\DESKTOP\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 354, in _apply
module._apply(fn)
[Previous line repeated 2 more times]
File "C:\Users\DESKTOP\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 376, in _apply
param_applied = fn(param)
File "C:\Users\DESKTOP\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 458, in
return self.apply(lambda t: t.cuda(device))
File "C:\Users\DESKTOP\miniconda3\lib\site-packages\torch\cuda_init.py", line 186, in _lazy_init
check_driver()
File "C:\Users\DESKTOP\miniconda3\lib\site-packages\torch\cuda_init.py", line 68, in _check_driver
http://www.nvidia.com/Download/index.aspx""")
AssertionError:
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx
In some songs I extracted the instrumental from, I could hear the original voice playing during certain parts of the song, while in other parts the vocals were removed cleanly.
Note: I used the post-processing option to identify this error.
For further audio editing it would be best not to reduce the bit depth. Is it possible to write the output as 32-bit float?
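(A sketch of what that would take, assuming the output is written with the soundfile package; wave and sr stand in for inference.py's own variables:)

import soundfile as sf

# soundfile writes IEEE-float WAV when given the FLOAT subtype, so no
# bit-depth reduction happens on the way out.
sf.write('instruments.wav', wave.T, sr, subtype='FLOAT')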
Hello, great job there.
I ran this code and got this error; did I miss something in the project?
python augment.py -i dataset/instrumentals -m dataset/mixtures -p 1
Traceback (most recent call last):
File "augment.py", line 25, in
for fname in os.listdir(args.mixture_dataset)
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'dataset/mixtures'
After reading this article (chapter 2.2), I learned a very important thing: the same artist should not appear in both the training dataset and the validation dataset. But the train.py script splits the dataset randomly, preventing me from distributing the tracks manually.
Therefore, I suggest organizing the directories as follows:
dataset/
├── training/
│   ├── instruments/
│   │   ├── 01_foo_inst.wav
│   │   ├── 02_bar_inst.wav
│   │   └── ...
│   └── mixtures/
│       ├── 01_foo_mix.wav
│       ├── 02_bar_mix.wav
│       └── ...
└── validation/
    ├── instruments/
    │   ├── 03_foo_inst.wav
    │   ├── 04_bar_inst.wav
    │   └── ...
    └── mixtures/
        ├── 03_foo_mix.wav
        ├── 04_bar_mix.wav
        └── ...
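(A hypothetical command-line shape for this layout; nothing like it exists in train.py today, and the names are illustrative only:)

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--train_dir', '-t', required=True,
                    help='directory containing training instruments/ and mixtures/')
parser.add_argument('--val_dir', '-v', default=None,
                    help='if given, use this fixed set instead of a random split')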
Can I use a custom sampling rate, for example 32000 instead of 44100? I want to do this to increase the number of pairs that fit in the same amount of memory.
I believe many people don't need frequencies above 16 kHz, and I want to try to improve the quality in the audible range. However, I am not sure that a custom sample rate will be compatible with the default window size.
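(A sketch of resampling the pairs offline with librosa/soundfile: the STFT window is measured in bins rather than Hz, so it stays valid at any rate; only the frequency range each bin covers changes.)

import librosa
import soundfile as sf

# Downsample one pair to 32 kHz before training; repeat for every file.
y, sr = librosa.load('01_foo_mix.wav', sr=32000, mono=False)
sf.write('01_foo_mix_32k.wav', y.T, 32000)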
tsurumeso, this vocal remover is a great project, but in my opinion a real vocal remover must remove the singer's voice using only code and the command line, not by training a model on a dataset (mix + instrumental) that already contains the songs I want to remove vocals from. If I already have the instrumental, I don't need to train a model to give me the same instrumental.
Plus, imagine the instrumental isn't available: what will you do then, and from what will you create your model?
If I try to use my GPU I get this error:
RuntimeError: CUDA out of memory. Tried to allocate 384.00 MiB (GPU 0; 2.00 GiB total capacity; 948.49 MiB already allocated; 275.23 MiB free; 1.06 GiB reserved in total by PyTorch)
I have the following PC specs:
Operating System: Windows 10
Processor: Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz
Graphics card: NVIDIA GeForce GTX 750 Ti with 2 GB of VRAM
System RAM: 8 GB
Is there a specific amount of VRAM needed to run the model on the GPU?
Is there a way to lower the amount of VRAM used?
I tried:
python inference.py -i file.mp3 -g 0 -w 256
but I got this error:
Traceback (most recent call last):
File "inference.py", line 104, in <module>
main()
File "inference.py", line 64, in main
pred = model.predict(X_window)
File "C:\Users\AlbyTree\Vocal-Remover-DeepLearning\lib\nets.py", line 84, in predict
assert h.size()[3] > 0
AssertionError
python inference.py -i file.mp3 -g 0 -l 512
and it worked, but the quality wasn't as good as using the default values on the CPU.