Comments (19)
@f90 ok just realized that this is hardcoded here
# Batch size of 1
sep_input_shape[0] = 1
sep_output_shape[0] = 1
mix_context, sources = Input.get_multitrack_placeholders(sep_output_shape, model_config["num_sources"], sep_input_shape, "input")
so to change batch_size here I should change the sep_input_shape.
Be aware that changing the code there means that the internal code further down has to be adapted as well though, since it assumes that we insert one audio segment and get predictions for one back, not multiple.
I am talking about the predict_track method in particular, where I am basically iterating over the input audio, taking a chunk, predicting the sources, and appending the output chunk to the overall output audio. If we do that in batches, we need to collect a bunch of these segments in a loop to fill a batch, predict a batch, then append the outputs in the correct order, and continue iterating over the input audio until nothing is left. However, we might end up with only a few segments at the end that do not fill a complete batch; in that case we would have to pad the batch with zeros...
If I have some time for this, and enough of you indicate a need (leave a like/comment here to show that), then I will get around to implementing it. I would keep the default batch size at 1 though, to make sure prediction still works even on small systems.
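The batching idea described above can be sketched roughly as follows. This is only an illustration, not the repository's code: `iter_padded_batches` is a hypothetical helper, the segments are dummy arrays, and the network call is replaced by a stand-in multiplication.

```python
import numpy as np

def iter_padded_batches(segments, batch_size):
    """Yield (batch, n_valid) pairs; the last batch is zero-padded to full size."""
    for start in range(0, len(segments), batch_size):
        chunk = segments[start:start + batch_size]
        n_valid = len(chunk)
        if n_valid < batch_size:
            # Fill the incomplete final batch with all-zero segments
            chunk = chunk + [np.zeros_like(segments[0])] * (batch_size - n_valid)
        yield np.stack(chunk), n_valid

# Five dummy segments of 8 samples x 2 channels, grouped into batches of 2
segments = [np.full((8, 2), i, dtype=np.float32) for i in range(5)]
outputs = []
for batch, n_valid in iter_padded_batches(segments, batch_size=2):
    predictions = batch * 1.0  # stand-in for the actual network prediction
    outputs.extend(predictions[:n_valid])  # keep real segments, drop the padding
```

Keeping `n_valid` alongside each batch is one simple way to make sure the zero-padded entries never end up in the output audio.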
from wave-u-net.
The part where prediction for an input song is made is actually here:
https://github.com/f90/Wave-U-Net/blob/master/Evaluate.py#L109
What could be changed without a lot of effort would be to change batch size from 1 to the default (16), however that also means prediction requires more RAM/GPU memory. We would have to make sure though that prediction still works exactly the same way as before. Also I am not so sure how much it speeds up prediction especially on CPU.
Multi-GPU implementation is also possible, but requires a bit more effort to get right. Keep in mind this is also all supposed to work right out of the gate without people having to configure the GPU setup.
In case someone wants to provide such fast implementations, I am all ears.
from wave-u-net.
OK so I looked into this a bit more: I implemented a batched variant of prediction and compared running times for a 3 minute input piece. Results:
GPU (1x GTX1080)
- Current version: 5.71s
- Batched version: 4.57s
CPU
- Current version: 161.15s
- Batched version: 157.21s
These numbers give the time spent within the predict_track method.
The batched version also gave memory warnings for CPU, and was using all my CPU cores at once, so it is not surprising a speedup cannot be achieved this way.
So, to summarise:
- GPU is MUCH faster than CPU
- CPU implementation is already parallelised, so no performance gains possible by batched prediction
- GPU implementation doesn't really benefit from batching either
If prediction time is an issue for you, it can be reduced by
- switching to GPU from CPU
- using a model at a lower sampling rate (e.g. if using 22KHz model instead of 44KHz, it will also predict twice as fast)
- maybe some fancy neural network distillation/compression methods? This is definitely out of the scope of this project though...
Going to close this soon unless there are some good ideas how to improve this otherwise.
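For reference, the timings above translate into relative speed-ups as in the short calculation below. The numbers are the ones reported in this comment; the `speedup` helper is only for illustration.

```python
# (current version, batched version) in seconds, as reported above
timings = {
    "GPU (1x GTX1080)": (5.71, 4.57),
    "CPU": (161.15, 157.21),
}

def speedup(current_s, batched_s):
    """Factor by which the batched version is faster than the current one."""
    return current_s / batched_s

for device, (current_s, batched_s) in timings.items():
    factor = speedup(current_s, batched_s)
    saved_percent = 100.0 * (1.0 - batched_s / current_s)
    print(f"{device}: {factor:.2f}x, {saved_percent:.1f}% less time")
```

On these figures the batched version saves roughly a fifth of the time on GPU and under 3% on CPU.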
from wave-u-net.
I am curious why you expect any improvements with the batched version. But if you want to experiment with it, replace the predict_track function in the code with the version below. If it turns out better, just tell me and I can push it to the repository for everyone.
Also you have to comment out the following lines, found in the predict function:
# Batch size of 1
sep_input_shape[0] = 1
sep_output_shape[0] = 1
import numpy as np
import Utils  # helper module from the Wave-U-Net repository (provides resample)

def predict_track(model_config, sess, mix_audio, mix_sr, sep_input_shape, sep_output_shape, separator_sources, mix_context):
    '''
    Outputs source estimates for a given input mixture signal mix_audio [n_frames, n_channels] and a given Tensorflow session and placeholders belonging to the prediction network.
    It iterates through the track, collecting segment-wise predictions to form the output.
    :param model_config: Model configuration dictionary
    :param sess: Tensorflow session used to run the network inference
    :param mix_audio: [n_frames, n_channels] audio signal (numpy array). Can have higher sampling rate or channels than the model supports, will be downsampled correspondingly.
    :param mix_sr: Sampling rate of mix_audio
    :param sep_input_shape: Input shape of separator ([batch_size, num_samples, num_channels])
    :param sep_output_shape: Output shape of separator ([batch_size, num_samples, num_channels])
    :param separator_sources: List of Tensorflow tensors that represent the output of the separator network
    :param mix_context: Input tensor of the network
    :return: List of source estimates, each with the same shape as the (resampled) mixture
    '''
    # Convert mixture to mono or stereo as required by the model, then resample
    assert(len(mix_audio.shape) == 2)
    if model_config["mono_downmix"]:
        mix_audio = np.mean(mix_audio, axis=1, keepdims=True)
    else:
        if mix_audio.shape[1] == 1:  # Duplicate channels if input is mono but model is stereo
            mix_audio = np.tile(mix_audio, [1, 2])
    mix_audio = Utils.resample(mix_audio, mix_sr, model_config["expected_sr"])

    # Preallocate source predictions (same shape as input mixture)
    source_time_frames = mix_audio.shape[0]
    source_preds = [np.zeros(mix_audio.shape, np.float32) for _ in range(model_config["num_sources"])]
    input_time_frames = sep_input_shape[1]
    output_time_frames = sep_output_shape[1]

    # Pad mixture across time at beginning and end so that the network can predict the beginning and end of the signal
    # Integer division: np.pad needs integer pad widths (plain "/" yields a float in Python 3)
    pad_time_frames = (input_time_frames - output_time_frames) // 2
    mix_audio_padded = np.pad(mix_audio, [(pad_time_frames, pad_time_frames), (0, 0)], mode="constant", constant_values=0.0)

    # Iterate over the mixture and collect all input segments with their output positions
    mixes = list()
    start_end_times = list()
    for source_pos in range(0, source_time_frames, output_time_frames):
        # If this output segment would reach over the end of the signal, shift it back so we predict the very end of the output
        if source_pos + output_time_frames > source_time_frames:
            source_pos = source_time_frames - output_time_frames
        # Prepare mixture excerpt by selecting time interval
        mix_part = mix_audio_padded[source_pos:source_pos + input_time_frames, :]
        mixes.append(mix_part)
        start_end_times.append((source_pos, source_pos + output_time_frames))

    # Make predictions batch by batch
    batch_size = model_config["batch_size"]
    for mix_num in range(0, len(mixes), batch_size):
        batch_mixes = mixes[mix_num:mix_num + batch_size]
        num_valid = len(batch_mixes)
        # Pad the last, incomplete batch with all-zero segments
        batch_mixes = batch_mixes + [np.zeros_like(mixes[0]) for _ in range(batch_size - num_valid)]
        batch = np.stack(batch_mixes)
        source_parts = sess.run(separator_sources, feed_dict={mix_context: batch})

        # Save predictions of the real (non-padding) segments at their output positions
        for batch_num in range(num_valid):
            start, end = start_end_times[mix_num + batch_num]
            for i in range(model_config["num_sources"]):
                source_preds[i][start:end] = source_parts[i][batch_num, :, :]
    return source_preds
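To make the segment bookkeeping in the function above easier to follow, here is a small stand-alone sketch of how the input and output positions line up. `segment_positions` is a hypothetical helper written for this illustration, and the toy sizes are far smaller than the real model's.

```python
def segment_positions(source_time_frames, input_time_frames, output_time_frames):
    """Return the context padding and a list of
    (in_start, in_end, out_start, out_end) tuples, one per segment.
    Input indices refer to the padded mixture, output indices to the unpadded one."""
    pad = (input_time_frames - output_time_frames) // 2
    positions = []
    for pos in range(0, source_time_frames, output_time_frames):
        # Shift the last segment back so it ends exactly at the end of the signal
        if pos + output_time_frames > source_time_frames:
            pos = source_time_frames - output_time_frames
        positions.append((pos, pos + input_time_frames, pos, pos + output_time_frames))
    return pad, positions

# Toy example: 10 samples of audio, the network reads 8 samples and outputs the 4 central ones
pad, positions = segment_positions(10, 8, 4)
for in_start, in_end, out_start, out_end in positions:
    print(f"input [{in_start}:{in_end}] of padded mix -> output [{out_start}:{out_end}]")
```

In this toy case the last segment overlaps the previous one by two samples, so part of the output is simply written twice. The sketch assumes the signal is at least one output segment long.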
from wave-u-net.
I'm also interested in a speed-up, but I'm not sure it's possible since my CPU is already using all cores
from wave-u-net.
Hi @f90 ,
I am using GPU + the 44kHz model but I am only predicting 30s of audio at a time. So my times are around 2.66 seconds.
Any chances you could share the batched variant you mentioned above?
Thanks a lot for your help and advice!
from wave-u-net.
You were right, this does not bring improvements in terms of speed.
By 'batch' I thought you meant a batch of multiple signals, not a single audio signal split into a batch; that's why I thought the prediction could be sped up.
Not sure I'll have time soon to experiment more with this, I'll let you know if I have some improvements, feel free to close the issue.
from wave-u-net.
Multi-GPU is definitely an interesting option. I would like to establish this repository as a "go-to" resource for people learning about deep learning for source separation, so I would like to keep the source code simple, and I am not sure whether a multi-GPU implementation is straightforward enough for that? While training could be elegant to implement especially in newer TF versions, with the specific way we need to predict song outputs I am not sure it would turn out that elegant.
I'm open to feedback on this though!
As for MP3 export, see my post here: (#2 (comment))
from wave-u-net.
A couple of notes on this:
- CPU is much slower than GPU and this is to be expected, since the required operations run much faster on the GPU. So this has nothing to do with my project specifically
- If you use 22KHz models instead of the 44KHz one, your speed will double, since there is only half the number of samples in the audio to process, so that might help if you care about speed more
- I programmed the inference in a very "safe" way that is not particularly fast - using only one CPU/GPU with a batch size of 1, so essentially no parallelism. This ensures the model predicts exactly the way I need it to.
So I could speed this up by quite a lot, probably bringing it down to only 2-3secs on GPU per song, but I would risk introducing new errors in the process. So the main question would be - how important/sufficient is the prediction speed for people that use this repository? So far I did not get any complaints about speed, but if you show a common use case that requires more speed to be feasible, please present it and, if others also indicate that they would like to have this, I can consider putting in some speed-ups. But Multi-GPU training and prediction for example is not super straightforward to code, so I decided to avoid that in favour of keeping correct, readable code that can be adapted by people to their own needs easily.
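The point about 22KHz models can be illustrated with a quick count of how many segments the network has to process. The output segment length used here (16384 samples) is only an assumed value for the sake of the example, and it is assumed to be the same for both sampling rates, which may not hold for the actual models.

```python
import math

def num_segments(duration_s, sample_rate, output_time_frames):
    """Number of network forward passes needed to cover the whole signal."""
    n_samples = int(duration_s * sample_rate)
    return math.ceil(n_samples / output_time_frames)

OUTPUT_TIME_FRAMES = 16384  # assumed output segment length, for illustration only

segments_44k = num_segments(180, 44100, OUTPUT_TIME_FRAMES)
segments_22k = num_segments(180, 22050, OUTPUT_TIME_FRAMES)
print(segments_44k, segments_22k)
```

Under these assumptions a 3 minute song needs about half as many forward passes at 22.05 kHz as at 44.1 kHz.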
from wave-u-net.
@f90 thank you, currently I'm using the latest model cfg.full_44KHz, and the config was
{u'num_frames': 16384, u'num_sources': 2, u'musdb_path': u'/home/daniel/Datasets/MUSDB18', u'merge_filter_size': 5, u'num_layers': 12, u'duration': 2, u'estimates_path': u'/mnt/windaten/Source_Estimates', u'network': u'unet', u'log_dir': u'logs', u'expected_sr': 44100, u'init_sup_sep_lr': 0.0001, u'worse_epochs': 20, u'num_workers': 6, u'num_initial_filters': 24, u'raw_audio_loss': True, u'augmentation': True, u'batch_size': 16, u'mono_downmix': False, u'task': u'voice', u'filter_size': 15, u'epoch_it': 2000, u'upsampling': u'learned', u'num_channels': 2, u'context': True, u'cache_size': 16, u'output_type': u'difference', u'min_replacement_rate': 16, u'model_base_dir': u'checkpoints'}
So I have a batch_size of 16 already. I have tried to change num_workers to 12, but the processing time is the same (CPU):
Pre-trained model restored for song prediction
INFO - Waveunet Prediction - Completed after 0:03:31
from wave-u-net.
"So I have a batch_size of 16 already. I have tried to change num_workers to 12, but the processing time is the same (CPU):"
This is expected. num_workers is just for fetching input randomly from your music database while training, so it has no effect on prediction. batch_size is internally always set to 1 for prediction regardless of what you set, so it currently has no effect either. It would be possible to change this, and I would expect you to see some speed-up, but maybe mostly when using a GPU, since it parallelises well.
Out of interest, what's your CPU usage while predicting? Is it using only a single core, or multiple ones? If it is already using all available cores at 100%, then CPU prediction cannot be sped up further by changing the code. If not, then maybe using a larger batch size can improve things, but only if Tensorflow automatically parallelises across multiple CPU cores when processing a whole batch of samples, and I am not sure of that.
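One rough way to answer the CPU usage question from a script is to compare the system load average with the number of cores while prediction is running. This is only a coarse proxy (the load average also counts other processes and is not available on every platform), and `cores_busy_fraction` is a hypothetical helper; a system monitor such as top gives a more direct picture.

```python
import os

def cores_busy_fraction(load_1min, n_cores):
    """Approximate fraction of cores kept busy, capped at 1.0."""
    return min(load_1min / n_cores, 1.0)

n_cores = os.cpu_count() or 1
try:
    load_1min = os.getloadavg()[0]  # Unix only
except (AttributeError, OSError):
    load_1min = None  # load average not available on this platform

if load_1min is not None:
    fraction = cores_busy_fraction(load_1min, n_cores)
    print(f"{n_cores} cores, roughly {100 * fraction:.0f}% busy over the last minute")
```

A value close to 100% while predicting would suggest that all cores are already in use.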
from wave-u-net.
There would also be an issue with supporting an arbitrary batch_size for prediction: the best value is the largest one that still does not cause your particular GPU/RAM to run out of memory. So essentially the default would still have to be left at 1 to be sure it works for almost everyone right away, and people would then have to increase the value on their own to find out when it breaks. Given all these issues, I am not sure whether this is worth the potential speed improvement when it works fairly quickly on a single GPU already...
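The "increase it until it breaks" procedure could be automated along these lines. This is a sketch under stated assumptions: `find_largest_batch_size` and `fake_try_batch` are hypothetical names, a batch size of 1 is assumed to always work, and the out-of-memory condition is simulated with MemoryError. With TensorFlow 1.x on a GPU, one would probably pass tf.errors.ResourceExhaustedError as the error type instead.

```python
def find_largest_batch_size(try_batch, max_batch_size=64, oom_errors=(MemoryError,)):
    """Double the batch size until try_batch fails; return the last size that worked."""
    best = 1  # assumed to always fit
    candidate = 2
    while candidate <= max_batch_size:
        try:
            try_batch(candidate)
        except oom_errors:
            break
        best = candidate
        candidate *= 2
    return best

def fake_try_batch(batch_size):
    """Stand-in for one prediction run; pretends memory runs out above 8."""
    if batch_size > 8:
        raise MemoryError("simulated out-of-memory")

largest = find_largest_batch_size(fake_try_batch)
print("largest working batch size:", largest)
```

Doubling keeps the number of trial runs small, at the cost of possibly missing a slightly larger size that would still fit.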
from wave-u-net.
@f90 ok just realized that this is hardcoded here
# Batch size of 1
sep_input_shape[0] = 1
sep_output_shape[0] = 1
mix_context, sources = Input.get_multitrack_placeholders(sep_output_shape, model_config["num_sources"], sep_input_shape, "input")
so to change batch_size here I should change the sep_input_shape.
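As a small illustration of the change discussed here: both shapes carry the batch size in their first entry, so both would need the same new value. The shapes and the `with_batch_size` helper below are made up for the example and are not taken from the repository.

```python
def with_batch_size(shape, batch_size):
    """Return a copy of shape with its first (batch) dimension replaced."""
    new_shape = list(shape)
    new_shape[0] = batch_size
    return new_shape

# Hypothetical [batch_size, num_samples, num_channels] shapes
sep_input_shape = [1, 1000, 2]
sep_output_shape = [1, 600, 2]

batch_size = 16
sep_input_shape = with_batch_size(sep_input_shape, batch_size)
sep_output_shape = with_batch_size(sep_output_shape, batch_size)
print(sep_input_shape, sep_output_shape)
```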
from wave-u-net.
@f90 ok thank you very much it makes sense.
from wave-u-net.
Going to close this issue soon if I don't get any reports on the above code snippet bringing much benefit in terms of prediction speed...
from wave-u-net.
@f90 thanks a lot, we are going to try this asap!
from wave-u-net.
I'm interested in any kind of multiple GPU support or tricks that would speed up the process!
Currently running newest model on a GTX970 and a 3-4 minute song takes approx 1min40secs,
which is awesome! Looking forward to updates. Is it possible to include a mp3 conversion method?
Merry XMAS!
from wave-u-net.
I am trying this repo on Google Colab and get the following error when running the command below. I need suggestions on how to tackle this.
!python Predict.py with cfg.full_44KHz input_path="audio_examples/Cristina\ Vane\ -\ So\ Easy/mix.mp3" output_path="Myoutput"
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "Predict.py", line 3, in <module>
    import Evaluate
  File "/content/Wave-U-Net/Evaluate.py", line 2, in <module>
    import tensorflow as tf
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
from wave-u-net.
Hey, this looks like a typical error if the CUDA libraries are not included in your environment properly. Please refer to the CUDA installation manual and how to setup CUDA properly in your particular environment. I think with a simple test.py file that just does "import tensorflow" you will also get the same error, so I don't think it's related to my code in particular
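A minimal version of the test.py mentioned above could look like the following. The `check_import` helper is just an illustrative wrapper, not part of the repository; if the CUDA libraries are missing, the message should contain the same libcublas error as in the traceback.

```python
import importlib

def check_import(module_name):
    """Return (ok, message) describing whether module_name can be imported."""
    try:
        importlib.import_module(module_name)
    except ImportError as err:
        return False, str(err)
    return True, "import succeeded"

ok, message = check_import("tensorflow")
print("tensorflow:", "OK" if ok else "FAILED", "-", message)
```

If this fails outside the Wave-U-Net directory as well, the problem lies in the environment rather than in the repository code.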
from wave-u-net.