openai / jukebox

Code for the paper "Jukebox: A Generative Model for Music"

Home Page: https://openai.com/blog/jukebox/

License: Other

Python 85.81% C++ 1.64% Cuda 8.68% Makefile 0.13% CSS 0.20% HTML 0.09% Dockerfile 0.07% Shell 0.66% Jupyter Notebook 2.72%
paper audio music pytorch generative-model vq-vae transformer

jukebox's Introduction

Status: Archive (code is provided as-is, no updates expected)

Jukebox

Code for "Jukebox: A Generative Model for Music"

Paper Blog Explorer Colab

Install

Install the conda package manager from https://docs.conda.io/en/latest/miniconda.html

# Required: Sampling
conda create --name jukebox python=3.7.5
conda activate jukebox
conda install mpi4py=3.0.3 # if this fails, try: pip install mpi4py==3.0.3
conda install pytorch=1.4 torchvision=0.5 cudatoolkit=10.0 -c pytorch
git clone https://github.com/openai/jukebox.git
cd jukebox
pip install -r requirements.txt
pip install -e .

# Required: Training
conda install av=7.0.01 -c conda-forge 
pip install ./tensorboardX
 
# Optional: Apex for faster training with fused_adam
conda install pytorch=1.1 torchvision=0.3 cudatoolkit=10.0 -c pytorch
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./apex

Sampling

Sampling from scratch

To sample normally, run one of the following commands. The model can be 5b, 5b_lyrics, or 1b_lyrics.

python jukebox/sample.py --model=5b_lyrics --name=sample_5b --levels=3 --sample_length_in_seconds=20 \
--total_sample_length_in_seconds=180 --sr=44100 --n_samples=6 --hop_fraction=0.5,0.5,0.125
python jukebox/sample.py --model=1b_lyrics --name=sample_1b --levels=3 --sample_length_in_seconds=20 \
--total_sample_length_in_seconds=180 --sr=44100 --n_samples=16 --hop_fraction=0.5,0.5,0.125

The above generates the first sample_length_in_seconds seconds of audio from a song of total length total_sample_length_in_seconds. To use multiple GPUs, launch the above scripts as mpiexec -n {ngpus} python jukebox/sample.py ... so that sampling runs on {ngpus} GPUs in parallel.

The samples decoded from each level are stored in {name}/level_{level}. You can also view the samples as an HTML page with the aligned lyrics under {name}/level_{level}/index.html. Run python -m http.server and open the HTML through the server to see the lyrics animate as the song plays.
A summary of all sampling data, including zs, x, labels and sampling_kwargs, is stored in {name}/level_{level}/data.pth.tar.
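
For reference, a minimal sketch of inspecting that file (this assumes only the keys named above; the exact contents depend on your run):

import torch

# Load the sampling summary saved by sample.py (path from the run above).
data = torch.load("sample_5b/level_0/data.pth.tar", map_location="cpu")

# zs holds the discrete VQ-VAE codes per level, x the decoded audio, and
# labels / sampling_kwargs the conditioning and sampler settings used.
print(data.keys())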

The hps are for a V100 GPU with 16 GB of GPU memory. The 1b_lyrics, 5b, and 5b_lyrics top-level priors take up 3.8 GB, 10.3 GB, and 11.5 GB, respectively. The peak memory usage to store the transformer key/value cache is about 400 MB for 1b_lyrics and 1 GB for 5b_lyrics per sample. If you run into CUDA OOM issues, try 1b_lyrics, or decrease max_batch_size in sample.py and --n_samples in the script call.

On a V100, it takes about 3 hrs to fully sample 20 seconds of music. Since this is a long time, we recommend using n_samples > 1 so you can generate as many samples as possible in parallel. The 1B lyrics model and the upsamplers can process 16 samples at a time, while 5B can fit only up to 3. Since the vast majority of time is spent on upsampling, we recommend using a multiple of 3 less than 16, like --n_samples 15, for 5b_lyrics. This makes the top level generate samples in groups of three while upsampling is done in one pass.

To continue sampling from already generated codes for a longer duration, you can run

python jukebox/sample.py --model=5b_lyrics --name=sample_5b --levels=3 --mode=continue \
--codes_file=sample_5b/level_0/data.pth.tar --sample_length_in_seconds=40 --total_sample_length_in_seconds=180 \
--sr=44100 --n_samples=6 --hop_fraction=0.5,0.5,0.125

Here, we take the 20-second samples saved from the first sampling run at sample_5b/level_0/data.pth.tar and continue by adding 20 more seconds.

You could also continue directly from the level 2 saved outputs; just pass --codes_file=sample_5b/level_2/data.pth.tar. Note this will upsample the full 40-second song at the end.

If you stopped sampling at only the first level and want to upsample the saved codes, you can run

python jukebox/sample.py --model=5b_lyrics --name=sample_5b --levels=3 --mode=upsample \
--codes_file=sample_5b/level_2/data.pth.tar --sample_length_in_seconds=20 --total_sample_length_in_seconds=180 \
--sr=44100 --n_samples=6 --hop_fraction=0.5,0.5,0.125

Here, we take the 20-second samples saved from the first sampling run at sample_5b/level_2/data.pth.tar and upsample the lower two levels.

Prompt with your own music

If you want to prompt the model with your own creative pieces or any other music, first save them as .wav files and run

python jukebox/sample.py --model=5b_lyrics --name=sample_5b_prompted --levels=3 --mode=primed \
--audio_file=path/to/recording.wav,awesome-mix.wav,fav-song.wav,etc.wav --prompt_length_in_seconds=12 \
--sample_length_in_seconds=20 --total_sample_length_in_seconds=180 --sr=44100 --n_samples=6 --hop_fraction=0.5,0.5,0.125

This will load the four files, tile them to fill up the n_samples batch size, and prime the model with the first prompt_length_in_seconds seconds.

Training

VQVAE

To train a small vqvae, run

mpiexec -n {ngpus} python jukebox/train.py --hps=small_vqvae --name=small_vqvae --sample_length=262144 --bs=4 \
--audio_files_dir={audio_files_dir} --labels=False --train --aug_shift --aug_blend

Here, {audio_files_dir} is the directory containing the audio files for your dataset, and {ngpus} is the number of GPUs you want to train on. The above trains a two-level VQ-VAE with downs_t = (5,3) and strides_t = (2, 2), meaning we downsample the audio by 2**5 = 32 to get the first level of codes, and by 2**8 = 256 to get the second-level codes.
Checkpoints are stored in the logs folder. You can monitor the training by running TensorBoard

tensorboard --logdir logs

Prior

Train prior or upsamplers

Once the VQ-VAE is trained, we can restore it from its saved checkpoint and train priors on the learnt codes. To train the top-level prior, we can run

mpiexec -n {ngpus} python jukebox/train.py --hps=small_vqvae,small_prior,all_fp16,cpu_ema --name=small_prior \
--sample_length=2097152 --bs=4 --audio_files_dir={audio_files_dir} --labels=False --train --test --aug_shift --aug_blend \
--restore_vqvae=logs/small_vqvae/checkpoint_latest.pth.tar --prior --levels=2 --level=1 --weight_decay=0.01 --save_iters=1000

To train the upsampler, we can run

mpiexec -n {ngpus} python jukebox/train.py --hps=small_vqvae,small_upsampler,all_fp16,cpu_ema --name=small_upsampler \
--sample_length=262144 --bs=4 --audio_files_dir={audio_files_dir} --labels=False --train --test --aug_shift --aug_blend \
--restore_vqvae=logs/small_vqvae/checkpoint_latest.pth.tar --prior --levels=2 --level=0 --weight_decay=0.01 --save_iters=1000

We pass sample_length = n_ctx * downsample_of_level so that after downsampling the tokens match the n_ctx of the prior hps. Here, n_ctx = 8192 and downsamples = (32, 256), giving sample_lengths = (8192 * 32, 8192 * 256) = (262144, 2097152) respectively for the bottom and top level.
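
As a quick sanity check of that arithmetic:

# Sanity check of the sample_length arithmetic above.
n_ctx = 8192
downsamples = (32, 256)  # bottom level, top level
sample_lengths = [n_ctx * d for d in downsamples]
print(sample_lengths)  # [262144, 2097152]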

Learning rate annealing

To get the best sample quality, anneal the learning rate to 0 near the end of training. To do so, continue training from the latest checkpoint and run with

--restore_prior="path/to/checkpoint" --lr_use_linear_decay --lr_start_linear_decay={already_trained_steps} --lr_decay={decay_steps_as_needed}
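
For intuition, a minimal sketch of the linear decay these flags describe (a hypothetical helper, not the actual training code):

# Hypothetical illustration of --lr_use_linear_decay together with
# --lr_start_linear_decay and --lr_decay.
def linear_decay_lr(step, base_lr, start_step, decay_steps):
    if step < start_step:
        return base_lr
    progress = min(1.0, (step - start_step) / decay_steps)
    return base_lr * (1.0 - progress)

# e.g. resume after 200k already-trained steps and decay over the next 50k
print(linear_decay_lr(225_000, 3e-4, 200_000, 50_000))  # 0.00015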

Reuse pre-trained VQ-VAE and train top-level prior on new dataset from scratch

Train without labels

Our pre-trained VQ-VAE can produce compressed codes for a wide variety of genres of music, and the pre-trained upsamplers can upsample them back to audio that sounds very similar to the original. To re-use these for a new dataset of your choice, you can retrain just the top-level prior.

To train top-level on a new dataset, run

mpiexec -n {ngpus} python jukebox/train.py --hps=vqvae,small_prior,all_fp16,cpu_ema --name=pretrained_vqvae_small_prior \
--sample_length=1048576 --bs=4 --aug_shift --aug_blend --audio_files_dir={audio_files_dir} \
--labels=False --train --test --prior --levels=3 --level=2 --weight_decay=0.01 --save_iters=1000

Training the small_prior with a batch size of 2, 4, and 8 requires 6.7 GB, 9.3 GB, and 15.8 GB of GPU memory, respectively. A few days to a week of training typically yields reasonable samples when the dataset is homogeneous (e.g. all piano pieces, songs of the same style, etc).

Near the end of training, follow the learning rate annealing instructions above to anneal the learning rate to 0.

Sample from new model

You can then run sample.py with the top-level of our models replaced by your new model. To do so,

  • Add an entry my_model=("vqvae", "upsampler_level_0", "upsampler_level_1", "small_prior") in MODELS in make_models.py.
  • Update the small_prior dictionary in hparams.py to include restore_prior='path/to/checkpoint'. If you changed any hps directly on the command line (e.g. heads), make sure to update them in the dictionary too, so that make_models restores your checkpoint correctly.
  • Run sample.py as outlined in the sampling section, but now with --model=my_model

For example, let's say we trained small_vqvae, small_prior, and small_upsampler under /path/to/jukebox/logs. In make_models.py, we are going to declare a tuple of the new models as my_model.

MODELS = {
    '5b': ("vqvae", "upsampler_level_0", "upsampler_level_1", "prior_5b"),
    '5b_lyrics': ("vqvae", "upsampler_level_0", "upsampler_level_1", "prior_5b_lyrics"),
    '1b_lyrics': ("vqvae", "upsampler_level_0", "upsampler_level_1", "prior_1b_lyrics"),
    'my_model': ("my_small_vqvae", "my_small_upsampler", "my_small_prior"),
}

Next, in hparams.py, we add them to the registry with the corresponding restore_paths and any other command line options used during training. One important note: for top-level priors with lyric conditioning, we have to locate an attention layer that shows alignment between the lyric and music tokens. Look for layers where prior.prior.transformer._attn_mods[layer].attn_func is either 6 or 7. If your model is starting to sing along to the lyrics, it means some (layer, head) pair has learned alignment. Congrats!

my_small_vqvae = Hyperparams(
    restore_vqvae='/path/to/jukebox/logs/small_vqvae/checkpoint_some_step.pth.tar',
)
my_small_vqvae.update(small_vqvae)
HPARAMS_REGISTRY["my_small_vqvae"] = my_small_vqvae

my_small_prior = Hyperparams(
    restore_prior='/path/to/jukebox/logs/small_prior/checkpoint_latest.pth.tar',
    level=1,
    labels=False,
    # TODO For the two lines below, if `--labels` was used and the model is
    # trained with lyrics, find and enter the layer, head pair that has learned
    # alignment.
    alignment_layer=47,
    alignment_head=0,
)
my_small_prior.update(small_prior)
HPARAMS_REGISTRY["my_small_prior"] = my_small_prior

my_small_upsampler = Hyperparams(
    restore_prior='/path/to/jukebox/logs/small_upsampler/checkpoint_latest.pth.tar',
    level=0,
    labels=False,
)
my_small_upsampler.update(small_upsampler)
HPARAMS_REGISTRY["my_small_upsampler"] = my_small_upsampler

Train with labels

To train with your own metadata for your audio files, implement get_metadata in data/files_dataset.py to return the artist, genre, and lyrics for a given audio file. For now, you can pass '' for lyrics to not use any lyrics.
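
As a starting point, a minimal sketch of such an implementation (the metadata lookup is hypothetical; adapt it to wherever your labels actually live, and check the exact signature in files_dataset.py):

import os

# Sketch for data/files_dataset.py; self.my_metadata is a hypothetical
# dict mapping file names to their labels.
def get_metadata(self, filename, test):
    meta = self.my_metadata.get(os.path.basename(filename), {})
    artist = meta.get("artist", "unknown")
    genre = meta.get("genre", "unknown")
    lyrics = ""  # pass '' to not use any lyrics
    return artist, genre, lyrics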

For training with labels, we'll use small_labelled_prior in hparams.py, and we set labels=True,labels_v3=True. We use two kinds of label information:

  • Artist/Genre:
    • For each file, we return an artist_id and a list of genre_ids. The reason we have a list and not a single genre_id is that in v2, we split genres like blues_rock into a bag of words [blues, rock] and pass at most max_bow_genre_size of those; in v3, we consider the genre a single word and just set max_bow_genre_size=1.
    • Update the v3_artist_ids and v3_genre_ids to use ids from your new dataset.
    • In small_labelled_prior, set the hps y_bins = (number_of_genres, number_of_artists) and max_bow_genre_size=1.
  • Timing:
    • For each chunk of audio, we return the total_length of the song, the offset the current audio chunk is at, and the sample_length of the audio chunk. We have three timing embeddings: the total_length, our current position, and our current position as a fraction of the total length, and we divide the range of these values into t_bins discrete bins (a sketch of this binning follows the list).
    • In small_labelled_prior, set the hps min_duration and max_duration to the shortest/longest duration of audio files you want for your dataset, and t_bins to how many bins you want to discretize the timing information into. Note that min_duration * sr needs to be at least sample_length to fit an audio chunk in it.
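
For intuition, a sketch of the kind of discretization described above (illustrative only, not the actual label code):

# Map a timing value in [0, max_value] to one of t_bins discrete bins.
def time_to_bin(value, max_value, t_bins):
    return min(int(value / max_value * t_bins), t_bins - 1)

# e.g. a chunk starting 60 s into a 180 s song, with t_bins=64:
sr, t_bins = 44100, 64
total_length, offset = 180 * sr, 60 * sr
print(time_to_bin(offset, total_length, t_bins))  # 21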

After these modifications, to train a top-level with labels, run

mpiexec -n {ngpus} python jukebox/train.py --hps=vqvae,small_labelled_prior,all_fp16,cpu_ema --name=pretrained_vqvae_small_prior_labels \
--sample_length=1048576 --bs=4 --aug_shift --aug_blend --audio_files_dir={audio_files_dir} \
--labels=True --train --test --prior --levels=3 --level=2 --weight_decay=0.01 --save_iters=1000

For sampling, follow the same instructions as above, but use small_labelled_prior instead of small_prior.

Train with lyrics

To additionally train with lyrics, update get_metadata in data/files_dataset.py to return lyrics too. For training with lyrics, we'll use small_single_enc_dec_prior in hparams.py.

  • Lyrics:
    • For each file, we linearly align the lyric characters to the audio, find the position in the lyrics that corresponds to the midpoint of our audio chunk, and pass a window of n_tokens lyric characters centred around that (see the sketch after this list).
    • In small_single_enc_dec_prior, set the hps use_tokens=True and n_tokens to the number of lyric characters to use for an audio chunk. Set it according to the sample_length you're training on, so that it's large enough that the lyrics for an audio chunk are almost always found inside a window of that size.
    • If you use a non-English vocabulary, update text_processor.py with your new vocab and set n_vocab to the number of characters in the vocabulary accordingly in small_single_enc_dec_prior. In v2 we had n_vocab=80; in v3 we missed + and so have n_vocab=79 characters.
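
A sketch of the windowing described above (illustrative; the function and its names are hypothetical):

# Take a window of n_tokens lyric characters centred on the midpoint of
# the current audio chunk, after linearly aligning characters to audio.
def lyric_window(lyrics, offset, sample_length, total_length, n_tokens):
    midpoint = offset + sample_length // 2
    centre = int(len(lyrics) * midpoint / total_length)
    start = max(0, min(centre - n_tokens // 2, len(lyrics) - n_tokens))
    return lyrics[start:start + n_tokens]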

After these modifications, to train a top-level with labels and lyrics, run

mpiexec -n {ngpus} python jukebox/train.py --hps=vqvae,small_single_enc_dec_prior,all_fp16,cpu_ema --name=pretrained_vqvae_small_single_enc_dec_prior_labels \
--sample_length=786432 --bs=4 --aug_shift --aug_blend --audio_files_dir={audio_files_dir} \
--labels=True --train --test --prior --levels=3 --level=2 --weight_decay=0.01 --save_iters=1000

To simplify hps choices, here we used a single_enc_dec model like the 1b_lyrics model that combines both encoder and decoder of the transformer into a single model. We do so by merging the lyric vocab and vq-vae vocab into a single larger vocab, and flattening the lyric tokens and the vq-vae codes into a single sequence of length n_ctx + n_tokens. This uses attn_order=12 which includes prime_attention layers with keys/values from lyrics and queries from audio. If you instead want to use a model with the usual encoder-decoder style transformer, use small_sep_enc_dec_prior.
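
To make the flattening concrete, a small sketch of the merged-vocabulary idea (sizes follow the v3 lyric vocab and a 2048-entry VQ-VAE codebook; n_tokens=512 is just an example):

import torch

n_vocab = 79                                      # lyric characters (v3)
lyric_tokens = torch.randint(0, n_vocab, (512,))  # n_tokens = 512 (example)
vq_codes = torch.randint(0, 2048, (8192,))        # n_ctx = 8192 codes

# Offset the codes by n_vocab so both token types share one vocabulary,
# then flatten lyrics and codes into a single sequence.
merged = torch.cat([lyric_tokens, vq_codes + n_vocab])
print(merged.shape)  # torch.Size([8704])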

For sampling, follow the same instructions as above, but use small_single_enc_dec_prior instead of small_prior. To also get the alignment between lyrics and samples in the saved HTML, you'll need to set alignment_layer and alignment_head in small_single_enc_dec_prior. To find which layer/head is best to use, run a forward pass on a training example, save the attention weight tensors for all prime_attention layers, and pick the (layer, head) that has the best linear alignment pattern between the lyric keys and music queries.
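
One possible scoring heuristic is sketched below (illustrative; assumes you collected a dict of attention maps yourself, and uses torch.corrcoef, which requires a recent PyTorch):

import torch

def alignment_score(attn):
    # attn: (n_queries, n_keys) attention weights. For a linear alignment,
    # the expected key position grows steadily with the query index, so we
    # score by the correlation between the two.
    n_q, n_k = attn.shape
    expected_key = attn @ torch.arange(n_k, dtype=torch.float)
    queries = torch.arange(n_q, dtype=torch.float)
    return torch.corrcoef(torch.stack([queries, expected_key]))[0, 1]

# A perfectly diagonal (i.e. linear) attention map scores 1:
print(alignment_score(torch.eye(8)))  # tensor(1.)
# Scan your saved {(layer, head): attn} maps and keep the argmax.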

Fine-tune pre-trained top-level prior to new style(s)

Previously, we showed how to train a small top-level prior from scratch. Assuming you have a GPU with at least 15 GB of memory and support for fp16, you could fine-tune from our pre-trained 1B top-level prior. Here are the steps:

  • Support --labels=True by implementing get_metadata in jukebox/data/files_dataset.py for your dataset.
  • Add new entries in jukebox/data/ids. We recommend replacing existing mappings (e.g. renaming "unknown", etc. with styles of your choice). This uses the pre-trained style vectors as initialization and could potentially save some compute.

After these modifications, run

mpiexec -n {ngpus} python jukebox/train.py --hps=vqvae,prior_1b_lyrics,all_fp16,cpu_ema --name=finetuned \
--sample_length=1048576 --bs=1 --aug_shift --aug_blend --audio_files_dir={audio_files_dir} \
--labels=True --train --test --prior --levels=3 --level=2 --weight_decay=0.01 --save_iters=1000

To get the best sample quality, it is recommended to anneal the learning rate at the end of training. Training the 5B top-level requires GPipe, which is not supported in this release.

Citation

Please cite using the following BibTeX entry:

@article{dhariwal2020jukebox,
  title={Jukebox: A Generative Model for Music},
  author={Dhariwal, Prafulla and Jun, Heewoo and Payne, Christine and Kim, Jong Wook and Radford, Alec and Sutskever, Ilya},
  journal={arXiv preprint arXiv:2005.00341},
  year={2020}
}

License

Noncommercial Use License

The license covers both the released code and the model weights.

jukebox's People

Contributors

gnhdnb, heewooj, jdlozanom, johndpope, jongwook, mcleavey, prafullasd


jukebox's Issues

'ReduceOp'?

When running, I get this error:
(jukebox) C:\Users\Profile\jukebox>python jukebox/sample.py --model=5b_lyrics --name=sample_5b --levels=3 --sample_length_in_seconds=20 --total_sample_length_in_seconds=180 --sr=44100 --n_samples=6 --hop_fraction=0.5,0.5,0.125
Traceback (most recent call last):
  File "jukebox/sample.py", line 7, in <module>
    from jukebox.utils.audio_utils import save_wav, load_audio
  File "c:\users\thebeast\jukebox\jukebox\utils\audio_utils.py", line 6, in <module>
    from jukebox.utils.dist_utils import print_once
  File "c:\users\thebeast\jukebox\jukebox\utils\dist_utils.py", line 22, in <module>
    def allreduce(x, op=dist.ReduceOp.SUM):
AttributeError: module 'torch.distributed' has no attribute 'ReduceOp'

Any ideas?

Code is hard to read

It would be nice to use nbdev notebooks from fastai.
Also, consider using named tensors. I heard they were added in PyTorch.

Loading Previous Model after timeout

Is there some way to upload the model back into Colab instead of restarting? I believe it is saved as data.pth.tar but I am not sure how to resume with the samples I created when Google Colab disconnects. I have samples at level 1 but it does not complete upsampling to level 0.

Thank you for the help!

Unexpected EOF when creating Upsamplers

I am running into this error when creating the upsamplers. I am not sure which file it is referring to that could potentially be corrupted. All the code before worked well, and I have my music samples.

I am using the colab notebook that SMarioMan provided for primed audio.

(EOFError screenshot attached)

Error installing tensorboardX

Hey, I'm sorry if this is a stupid question, as I'm new to programming/GitHub generally, and I don't even know if it's correct to ask it here. I just hope I'll be able to get help here, since I really want to use Jukebox.

I was following the installation instructions precisely, but I don't seem to be able to install tensorboardX. This is the output I get when trying to install it. (Linked in case the file upload doesn't work.)

Cheers
tensorboardX_installation_error.txt

'CUDA out of memory' on p2.xlarge when running sampling example

I'm currently trying to get the first sampling examples in the README running on a p2.xlarge AWS instance running Deep Learning AMI (Ubuntu 18.04) Version 28.0.

The error returned is: RuntimeError: CUDA out of memory. Tried to allocate 450.00 MiB (GPU 0; 11.17 GiB total capacity; 10.69 GiB already allocated; 31.31 MiB free; 172.81 MiB cached) (full log available at https://gist.github.com/tomekr/e7968d373683ebea79f18881070fa9a1)

nvidia-smi right before the crash was at about this point

Every 2.0s: nvidia-smi                                                                                                                                                                                                                         ip-172-31-31-64: Fri May  1 00:00:34 2020

Fri May  1 00:00:34 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   46C    P0    72W / 149W |    299MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     23731      C   python                                       288MiB |
+-----------------------------------------------------------------------------+

It looks like even changing --n_samples to 1 doesn't help here.

Questions about the final composition of music

First of all, it's a great project, and it's amazing.
I first ran
python jukebox/sample.py --model=5b_lyrics --name=sample_5b --levels=3 --sample_length_in_seconds=20 --total_sample_length_in_seconds=180 --sr=44100 --n_samples=6 --hop_fraction=0.5,0.5,0.125
for about 2 hours. In the sample_5b directory there are the folders level_0, level_1, and level_2, each with six samples. The HTML only shows the first one, but the effect is really amazing. I wanted to hear the complete piece, so I then ran
python jukebox/sample.py --model=5b_lyrics --name=sample_5b --levels=3 --mode=continue --codes_file=sample_5b/level_0/data.pth.tar --sample_length_in_seconds=40 --total_sample_length_in_seconds=180 --sr=44100 --n_samples=6 --hop_fraction=0.5,0.5,0.125
The directory layout is the same, but the generated wav is 39 s long. The result at level_0 sounds decent, so I continued, this time with --sample_length_in_seconds=100, and the wav duration became 1:39. After about 10 hours, the result in the level_0 directory is really amazing, but it always sounds a little noisy. I'm not sure whether the wav under level_0 is the final generation.

I want to ask: in your blog you say this eventually generates new "novel audio". Is the final generation reached once --sample_length_in_seconds=180 equals --total_sample_length_in_seconds=180?

I also want to ask how to combine all the data in the level_0/1/2 directories into a new audio file. Is that the "top level upsample, middle level upsample, bottom level decode" pipeline from your blog that generates the new novel audio?

Also, I haven't seen a command for generating the final result after sampling completes. I did find librosa.output.write_wav(f'noisy_top_level_generation_{i}.wav', x[i], sr=44100) in interacting_with_jukebox.ipynb; I don't know if this is the code that generates the final-level audio.

I'm sorry to disturb you, but I still hope you can resolve my doubts. I will continue sampling to 180 s, and I hope the expected result is finally generated.

Sorry for being so long-winded. Finally, I hope this project becomes more and more popular; enjoy it!

sample.py just stops running without any clear error

Tried running in Google Colab.

Notebook link: https://colab.research.google.com/drive/1qvJ2YCaB2LYbERgqe_I9gHaLyFB-E7o6

Tried 5b-lyrics, 1b-lyrics, and 1b-lyrics with fewer samples or shorter lengths, but it just stops.

5b-lyrics
python jukebox/sample.py --model=5b_lyrics --name=sample_5b --levels=3 --sample_length_in_seconds=20 --total_sample_length_in_seconds=180 --sr=44100 --n_samples=6 --hop_fraction=0.5,0.5,0.125

output:
Using cuda True
{'name': 'sample_5b', 'levels': 3, 'sample_length_in_seconds': 20, 'total_sample_length_in_seconds': 180, 'sr': 44100, 'n_samples': 6, 'hop_fraction': (0.5, 0.5, 0.125)}
Setting sample length to 881920 (i.e. 19.998185941043083 seconds) to be multiple of 128
Downloading from gce
Restored from /root/.cache/jukebox-assets/models/5b/vqvae.pth.tar
0: Loading vqvae in eval mode
Conditioning on 1 above level(s)
Checkpointing convs
Checkpointing convs
Loading artist IDs from /content/jukebox/jukebox/data/ids/v2_artist_ids.txt
Loading artist IDs from /content/jukebox/jukebox/data/ids/v2_genre_ids.txt
Level:0, Cond downsample:4, Raw to tokens:8, Sample length:65536
Downloading from gce
Restored from /root/.cache/jukebox-assets/models/5b/prior_level_0.pth.tar
0: Loading prior in eval mode
Conditioning on 1 above level(s)
Checkpointing convs
Checkpointing convs
Loading artist IDs from /content/jukebox/jukebox/data/ids/v2_artist_ids.txt
Loading artist IDs from /content/jukebox/jukebox/data/ids/v2_genre_ids.txt
Level:1, Cond downsample:4, Raw to tokens:32, Sample length:262144
Downloading from gce
Restored from /root/.cache/jukebox-assets/models/5b/prior_level_1.pth.tar
0: Loading prior in eval mode
^C

1b-lyrics:

python jukebox/sample.py --model=1b_lyrics --name=sample_1b --levels=3 --sample_length_in_seconds=20 --total_sample_length_in_seconds=180 --sr=44100 --n_samples=16 --hop_fraction=0.5,0.5,0.125

output:
Using cuda True
{'name': 'sample_1b', 'levels': 3, 'sample_length_in_seconds': 20, 'total_sample_length_in_seconds': 180, 'sr': 44100, 'n_samples': 16, 'hop_fraction': (0.5, 0.5, 0.125)}
Setting sample length to 881920 (i.e. 19.998185941043083 seconds) to be multiple of 128
Downloading from gce
Restored from /root/.cache/jukebox-assets/models/5b/vqvae.pth.tar
0: Loading vqvae in eval mode
Conditioning on 1 above level(s)
Checkpointing convs
Checkpointing convs
Loading artist IDs from /content/jukebox/jukebox/data/ids/v2_artist_ids.txt
Loading artist IDs from /content/jukebox/jukebox/data/ids/v2_genre_ids.txt
Level:0, Cond downsample:4, Raw to tokens:8, Sample length:65536
Downloading from gce
Restored from /root/.cache/jukebox-assets/models/5b/prior_level_0.pth.tar
0: Loading prior in eval mode
Conditioning on 1 above level(s)
Checkpointing convs
Checkpointing convs
Loading artist IDs from /content/jukebox/jukebox/data/ids/v2_artist_ids.txt
Loading artist IDs from /content/jukebox/jukebox/data/ids/v2_genre_ids.txt
Level:1, Cond downsample:4, Raw to tokens:32, Sample length:262144
Downloading from gce
Restored from /root/.cache/jukebox-assets/models/5b/prior_level_1.pth.tar
0: Loading prior in eval mode
Creating cond. autoregress with prior bins [79, 2048], dims [384, 6144], shift [ 0 79]
input shape 6528
input bins 2127
Self copy is False
^C

Mind you, I never tried to interrupt the script, so I don't know where the ^C is coming from.

Error when running sampling example

I followed the Install instructions and then ran the sampling command and got:

$ python jukebox/sample.py --model=5b_lyrics --name=sample_5b --levels=3 --sample_length_in_seconds=20 --total_sample_length_in_seconds=180 --sr=44100 --n_samples=6 --hop_fraction=0.5,0.5,0.125
Caught error during NCCL init (attempt 0 of 5): Distributed package doesn't have NCCL built in
Caught error during NCCL init (attempt 1 of 5): Distributed package doesn't have NCCL built in
Caught error during NCCL init (attempt 2 of 5): Distributed package doesn't have NCCL built in
Caught error during NCCL init (attempt 3 of 5): Distributed package doesn't have NCCL built in
Caught error during NCCL init (attempt 4 of 5): Distributed package doesn't have NCCL built in
Traceback (most recent call last):
  File "jukebox/sample.py", line 237, in <module>
    fire.Fire(run)
  File "/Users/manu/opt/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/Users/manu/opt/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/Users/manu/opt/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "jukebox/sample.py", line 229, in run
    rank, local_rank, device = setup_dist_from_mpi(port=port)
  File "/Users/manu/git/jukebox/jukebox/utils/dist_utils.py", line 86, in setup_dist_from_mpi
    raise RuntimeError("Failed to initialize NCCL")
RuntimeError: Failed to initialize NCCL

I tried googling around about this NCCL error (I have no idea what NCCL is), but couldn't find any solutions. Any idea on how to fix this? Thanks!

"Intermediate steps" on CoLab

Hi, I was wondering which files I need to download and re-upload due to the 12 hour limit when upsampling using CoLab (I use Co-Composer). I know there is a co_composer folder that shows up when level 1 upsampling is completed, but when loading it into the content folder again in a new session, the upsampling process still starts completely anew. And the level 0 folder doesn't show up at all.
Please help, the instructions in the CoLab are very unclear on this part.

factored_attention.py throws AssertionError

After changing the artist and lyrics,
the Jupyter notebook throws the following error after the line:
zs = _sample(zs, labels, sampling_kwargs, [None, None, top_prior], [2], hps)

Sampling level 2
Sampling 8192 tokens for [0,8192]. Conditioning on 0 tokens
Ancestral sampling 3 samples with temp=0.98, top_k=0, top_p=0.0
0/8192 [00:00<?, ?it/s]

---------------------------------------------------------------------------

AssertionError                            Traceback (most recent call last)

<ipython-input-44-c3213f092598> in <module>()
      1 zs = [t.zeros(hps.n_samples,0,dtype=t.long, device='cuda') for _ in range(len(priors))]
----> 2 zs = _sample(zs, labels, sampling_kwargs, [None, None, top_prior], [2], hps)

6 frames

/usr/local/lib/python3.6/dist-packages/jukebox/transformer/factored_attention.py in check_cache(self, n_samples, sample_t, fp16)
    411 
    412     def check_cache(self, n_samples, sample_t, fp16):
--> 413         assert self.sample_t == sample_t, f"{self.sample_t} != {sample_t}"
    414         if sample_t == 0:
    415             assert self.cache == {}

AssertionError: 3344 != 0

Slow

Will the code ever be improved to be faster? I understand there are limits to AI, but I'm sure there's a solution to make upsampling faster.

How to properly sample from 1B on colab

Do I only need to change the model to "1b_lyrics" in that first line, or do I need to also change every reference to 5b_lyrics in the code?

model = "1b_lyrics"
hps = Hyperparams()
hps.sr = 44100
hps.n_samples = 3 if model=='5b_lyrics' else 8
hps.name = 'samples'
chunk_size = 16 if model=="5b_lyrics" else 32
max_batch_size = 3 if model=="5b_lyrics" else 16
hps.levels = 3
hps.hop_fraction = [.5,.5,.125]

sampling_temperature = .98
lower_batch_size = 16
max_batch_size = 3 if model == "5b_lyrics" else 16
lower_level_chunk_size = 32
chunk_size = 16 if model == "5b_lyrics" else 32
sampling_kwargs = [dict(temp=.99, fp16=True, max_batch_size=lower_batch_size,
                        chunk_size=lower_level_chunk_size),
                   dict(temp=0.99, fp16=True, max_batch_size=lower_batch_size,
                        chunk_size=lower_level_chunk_size),
                   dict(temp=sampling_temperature, fp16=True,
                        max_batch_size=max_batch_size, chunk_size=chunk_size)]

Install encountered issue reported as Collecting numba==0.48.0 (from jukebox==1.0)

Running the pip install -e . command has led to this error:
Collecting numba==0.48.0 (from jukebox==1.0)
Could not find a version that satisfies the requirement numba==0.48.0 (from jukebox==1.0) (from versions: 0.1, 0.2, 0.3, 0.5.0, 0.6.0, 0.7.0, 0.7.1, 0.7.2, 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.12.1, 0.12.2, 0.13.0, 0.13.2, 0.13.3, 0.13.4, 0.14.0, 0.15.1, 0.16.0, 0.17.0, 0.18.1, 0.18.2, 0.19.1, 0.19.2, 0.20.0, 0.21.0, 0.22.0, 0.22.1, 0.23.0, 0.23.1, 0.24.0, 0.25.0, 0.26.0, 0.27.0, 0.28.1, 0.29.0, 0.30.0, 0.30.1, 0.31.0, 0.32.0, 0.33.0, 0.34.0, 0.35.0, 0.36.1, 0.36.2, 0.37.0, 0.38.0, 0.38.1, 0.39.0, 0.40.0, 0.40.1, 0.41.0, 0.42.0, 0.42.1, 0.43.0, 0.43.1, 0.44.0, 0.44.1, 0.45.0, 0.45.1, 0.46.0, 0.47.0)
No matching distribution found for numba==0.48.0 (from jukebox==1.0)

v3 Genre ID "jazz fusion" is non-functional

Also, this genre is duplicated in v3_genre_ids.txt at id 107 and id 295.

Traceback (most recent call last):
  File "jukebox/sample.py", line 307, in <module>
    fire.Fire(run)
  File "~/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "~/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "~/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "jukebox/sample.py", line 304, in run
    save_samples(model, device, hps, sample_hps)
  File "jukebox/sample.py", line 268, in save_samples
    labels = [prior.labeller.get_batch_labels(metas, 'cuda') for prior in priors]
  File "jukebox/sample.py", line 268, in <listcomp>
    labels = [prior.labeller.get_batch_labels(metas, 'cuda') for prior in priors]
  File "~/jukebox/jukebox/data/labels.py", line 60, in get_batch_labels
    label = self.get_label(**meta)
  File "~/jukebox/jukebox/data/labels.py", line 33, in get_label
    genre_ids = self.ag_processor.get_genre_ids(genre)
  File "~/jukebox/jukebox/data/artist_genre_processor.py", line 53, in get_genre_ids
    return [self.genre_ids[word] for word in genres]
  File "~/jukebox/jukebox/data/artist_genre_processor.py", line 53, in <listcomp>
    return [self.genre_ids[word] for word in genres]
KeyError: 'fusion'

Primed audio samples from notebook

From the notebook, how would you change the mode from an ancestral_sample starting point to a primed audio file list? I tried adding these to the Hyperparams() hps instance, but it wasn't effective.

Problem with installation of mpi4py and pytorch

Following the README, I try to install the required packages but stumble upon problems along the way.

mpi4py:
When the command is entered as the README says, the following error shows:

(jukebox) C:\Users\Tomasz>conda install mpi4py=3.0.3
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  - mpi4py=3.0.3

Current channels:

  - https://repo.anaconda.com/pkgs/main/win-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/win-64
  - https://repo.anaconda.com/pkgs/r/noarch
  - https://repo.anaconda.com/pkgs/msys2/win-64
  - https://repo.anaconda.com/pkgs/msys2/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.

I seemed to work around this by just installing it with pip; nevertheless, I would like to know why the error occurs and how to avoid it in the future if the pip installation doesn't work.

One solution I found online was to update the channels in Anaconda, but it did not solve the issue.

pytorch:
When the command is entered as the README says, the following error shows:

(jukebox) C:\Users\Tomasz>conda install pytorch=1.4 torchvision=0.5 cudatoolkit=10.0 -c pytorch
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: -
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
Examining wheel:  47%|██████████████████████████████▊                                   | 7/15 [00:00<00:01,  6.15it/s]
Examining conflict for pytorch torchvision:  33%|█████████████                          | 5/15 [00:01<00:02,  4.86it/s]
Examining conflict for pip python wheel:  87%|██████████████████████████████████▌     | 13/15 [00:02<00:00,  4.56it/s]
failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

Package vs2015_runtime conflicts for:
vc -> vs2015_runtime[version='>=14.0.25123,<15.0a0|>=14.0.25420|>=14.15.26706|>=14.16.27012']
pytorch=1.4 -> vs2015_runtime[version='>=14.16.27012,<15.0a0']
setuptools -> python[version='>=3.8,<3.9.0a0'] -> vs2015_runtime[version='>=14.16.27012,<15.0a0']
zlib -> vc[version='>=14.1,<15.0a0'] -> vs2015_runtime[version='>=14.0.25123,<15.0a0|>=14.0.25420|>=14.15.26706|>=14.16.27012']
wheel -> python[version='>=3.6,<3.7.0a0'] -> vs2015_runtime[version='>=14.16.27012,<15.0a0']
torchvision=0.5 -> numpy[version='>=1.11'] -> vs2015_runtime[version='>=14.16.27012,<15.0a0']
zlib -> vs2015_runtime[version='>=14.16.27012,<15.0a0']
vs2015_runtime
pip -> python[version='>=3.7,<3.8.0a0'] -> vs2015_runtime[version='>=14.16.27012,<15.0a0']
python=3.7.5 -> vc[version='>=14.1,<15.0a0'] -> vs2015_runtime[version='>=14.15.26706|>=14.16.27012']
certifi -> python[version='>=3.8,<3.9.0a0'] -> vs2015_runtime[version='>=14.16.27012,<15.0a0']
pytorch=1.4 -> python[version='>=3.6,<3.7.0a0'] -> vs2015_runtime[version='>=14.15.26706|>=14.16.27012']
openssl -> vs2015_runtime[version='>=14.16.27012,<15.0a0']
openssl -> vc[version='>=14.1,<15.0a0'] -> vs2015_runtime[version='>=14.0.25123,<15.0a0|>=14.0.25420|>=14.15.26706|>=14.16.27012']
python=3.7.5 -> vs2015_runtime[version='>=14.16.27012,<15.0a0']
wincertstore -> python[version='>=3.8,<3.9.0a0'] -> vs2015_runtime[version='>=14.16.27012,<15.0a0']

Package six conflicts for:
pip -> html5lib -> six[version='>=1.9']
torchvision=0.5 -> six
pytorch=1.4 -> mkl-service[version='>=2,<3.0a0'] -> six

Package sqlite conflicts for:
pip -> python[version='>=3.7,<3.8.0a0'] -> sqlite[version='>=3.25.3,<4.0a0|>=3.26.0,<4.0a0|>=3.27.2,<4.0a0|>=3.28.0,<4.0a0|>=3.29.0,<4.0a0|>=3.30.1,<4.0a0|>=3.31.1,<4.0a0|>=3.30.0,<4.0a0']
certifi -> python[version='>=3.8,<3.9.0a0'] -> sqlite[version='>=3.25.3,<4.0a0|>=3.26.0,<4.0a0|>=3.27.2,<4.0a0|>=3.28.0,<4.0a0|>=3.29.0,<4.0a0|>=3.30.0,<4.0a0|>=3.30.1,<4.0a0|>=3.31.1,<4.0a0']
pytorch=1.4 -> python[version='>=3.8,<3.9.0a0'] -> sqlite[version='>=3.25.3,<4.0a0|>=3.26.0,<4.0a0|>=3.27.2,<4.0a0|>=3.28.0,<4.0a0|>=3.29.0,<4.0a0|>=3.30.0,<4.0a0|>=3.30.1,<4.0a0|>=3.31.1,<4.0a0']
setuptools -> python[version='>=3.8,<3.9.0a0'] -> sqlite[version='>=3.25.3,<4.0a0|>=3.26.0,<4.0a0|>=3.29.0,<4.0a0|>=3.30.0,<4.0a0|>=3.30.1,<4.0a0|>=3.31.1,<4.0a0|>=3.28.0,<4.0a0|>=3.27.2,<4.0a0']
torchvision=0.5 -> python[version='>=3.8,<3.9.0a0'] -> sqlite[version='>=3.25.3,<4.0a0|>=3.26.0,<4.0a0|>=3.27.2,<4.0a0|>=3.28.0,<4.0a0|>=3.29.0,<4.0a0|>=3.30.0,<4.0a0|>=3.30.1,<4.0a0|>=3.31.1,<4.0a0']
sqlite
wheel -> python[version='>=3.6,<3.7.0a0'] -> sqlite[version='>=3.25.3,<4.0a0|>=3.26.0,<4.0a0|>=3.29.0,<4.0a0|>=3.30.1,<4.0a0|>=3.31.1,<4.0a0|>=3.30.0,<4.0a0|>=3.28.0,<4.0a0|>=3.27.2,<4.0a0']
python=3.7.5 -> sqlite[version='>=3.30.1,<4.0a0']
wincertstore -> python[version='>=3.8,<3.9.0a0'] -> sqlite[version='>=3.25.3,<4.0a0|>=3.26.0,<4.0a0|>=3.27.2,<4.0a0|>=3.28.0,<4.0a0|>=3.29.0,<4.0a0|>=3.30.0,<4.0a0|>=3.30.1,<4.0a0|>=3.31.1,<4.0a0']

Package openssl conflicts for:
pytorch=1.4 -> python[version='>=3.8,<3.9.0a0'] -> openssl[version='>=1.1.1a,<1.1.2a|>=1.1.1b,<1.1.2a|>=1.1.1c,<1.1.2a|>=1.1.1d,<1.1.2a|>=1.1.1e,<1.1.2a|>=1.1.1f,<1.1.2a|>=1.1.1g,<1.1.2a']
python=3.7.5 -> openssl[version='>=1.1.1d,<1.1.2a']
wheel -> python[version='>=3.8,<3.9.0a0'] -> openssl[version='>=1.1.1a,<1.1.2a|>=1.1.1b,<1.1.2a|>=1.1.1c,<1.1.2a|>=1.1.1d,<1.1.2a|>=1.1.1e,<1.1.2a|>=1.1.1f,<1.1.2a|>=1.1.1g,<1.1.2a']
wincertstore -> python[version='>=3.8,<3.9.0a0'] -> openssl[version='>=1.1.1a,<1.1.2a|>=1.1.1b,<1.1.2a|>=1.1.1c,<1.1.2a|>=1.1.1d,<1.1.2a|>=1.1.1e,<1.1.2a|>=1.1.1f,<1.1.2a|>=1.1.1g,<1.1.2a']
setuptools -> python[version='>=3.8,<3.9.0a0'] -> openssl[version='>=1.1.1a,<1.1.2a|>=1.1.1b,<1.1.2a|>=1.1.1c,<1.1.2a|>=1.1.1d,<1.1.2a|>=1.1.1e,<1.1.2a|>=1.1.1f,<1.1.2a|>=1.1.1g,<1.1.2a']
torchvision=0.5 -> python[version='>=3.8,<3.9.0a0'] -> openssl[version='>=1.1.1a,<1.1.2a|>=1.1.1b,<1.1.2a|>=1.1.1c,<1.1.2a|>=1.1.1d,<1.1.2a|>=1.1.1e,<1.1.2a|>=1.1.1f,<1.1.2a|>=1.1.1g,<1.1.2a']
pip -> python[version='>=3.7,<3.8.0a0'] -> openssl[version='>=1.1.1a,<1.1.2a|>=1.1.1b,<1.1.2a|>=1.1.1c,<1.1.2a|>=1.1.1d,<1.1.2a|>=1.1.1e,<1.1.2a|>=1.1.1f,<1.1.2a|>=1.1.1g,<1.1.2a']
openssl
certifi -> python[version='>=3.8,<3.9.0a0'] -> openssl[version='>=1.1.1a,<1.1.2a|>=1.1.1b,<1.1.2a|>=1.1.1c,<1.1.2a|>=1.1.1d,<1.1.2a|>=1.1.1e,<1.1.2a|>=1.1.1f,<1.1.2a|>=1.1.1g,<1.1.2a']

Package vc conflicts for:
pip -> python[version='>=3.7,<3.8.0a0'] -> vc[version='14.*|>=14.1,<15.0a0|9.*']
python=3.7.5 -> openssl[version='>=1.1.1d,<1.1.2a'] -> vc=9
wincertstore -> python[version='>=3.8,<3.9.0a0'] -> vc[version='14.*|>=14.1,<15.0a0|9.*']
torchvision=0.5 -> numpy[version='>=1.11'] -> vc[version='14.*|9.*|>=14.1,<15.0a0']
openssl -> vc[version='14.*|9.*|>=14.1,<15.0a0']
certifi -> python[version='>=3.8,<3.9.0a0'] -> vc[version='14.*|>=14.1,<15.0a0|9.*']
setuptools -> python[version='>=3.8,<3.9.0a0'] -> vc[version='14.*|>=14.1,<15.0a0|9.*']
pytorch=1.4 -> vc[version='>=14.1,<15.0a0']
wheel -> python[version='>=3.6,<3.7.0a0'] -> vc[version='14.*|>=14.1,<15.0a0|9.*']
pytorch=1.4 -> ninja -> vc[version='14.*|9.*']
python=3.7.5 -> vc[version='>=14.1,<15.0a0']
vc
zlib -> vc[version='14.*|9.*|>=14.1,<15.0a0']

Package wincertstore conflicts for:
wincertstore
wheel -> setuptools -> wincertstore[version='>=0.2']
pip -> setuptools -> wincertstore[version='>=0.2']
setuptools -> wincertstore[version='>=0.2']

Package cudatoolkit conflicts for:
pytorch=1.4 -> cudatoolkit[version='>=10.1,<10.2|>=9.2,<9.3']
cudatoolkit=10.0
torchvision=0.5 -> cudatoolkit[version='>=10.1,<10.2|>=9.2,<9.3']

Package wheel conflicts for:
pip -> wheel
python=3.7.5 -> pip -> wheel
wheel

Package pytorch conflicts for:
torchvision=0.5 -> pytorch==1.4.0
pytorch=1.4

Package ca-certificates conflicts for:
python=3.7.5 -> openssl[version='>=1.1.1d,<1.1.2a'] -> ca-certificates
wincertstore -> python[version='>=2.7,<2.8.0a0'] -> ca-certificates
setuptools -> python[version='>=2.7,<2.8.0a0'] -> ca-certificates
ca-certificates
openssl -> ca-certificates
pip -> python[version='>=2.7,<2.8.0a0'] -> ca-certificates
wheel -> python[version='>=2.7,<2.8.0a0'] -> ca-certificates
certifi -> python[version='>=2.7,<2.8.0a0'] -> ca-certificates

Package setuptools conflicts for:
setuptools
wheel -> setuptools
python=3.7.5 -> pip -> setuptools
pip -> setuptools

Package pip conflicts for:
wheel -> python[version='>=3.6,<3.7.0a0'] -> pip
pip
pytorch=1.4 -> python[version='>=3.8,<3.9.0a0'] -> pip
wincertstore -> python[version='>=3.8,<3.9.0a0'] -> pip
certifi -> python[version='>=3.8,<3.9.0a0'] -> pip
setuptools -> python[version='>=3.8,<3.9.0a0'] -> pip
python=3.7.5 -> pip
torchvision=0.5 -> python[version='>=3.8,<3.9.0a0'] -> pip

Package zlib conflicts for:
python=3.7.5 -> sqlite[version='>=3.30.1,<4.0a0'] -> zlib[version='>=1.2.11,<1.3.0a0']
zlib
torchvision=0.5 -> pillow[version='>=4.1.1'] -> zlib[version='>=1.2.11,<1.3.0a0']

Package cachecontrol conflicts for:
python=3.7.5 -> pip -> cachecontrol
pip -> cachecontrol

Package vs2008_runtime conflicts for:
setuptools -> python[version='>=2.7,<2.8.0a0'] -> vs2008_runtime
vc -> vs2008_runtime[version='>=9.0.30729.1,<10.0a0']
wheel -> python[version='>=2.7,<2.8.0a0'] -> vs2008_runtime
pip -> python[version='>=2.7,<2.8.0a0'] -> vs2008_runtime
wincertstore -> python[version='>=2.7,<2.8.0a0'] -> vs2008_runtime
sqlite -> vc=9 -> vs2008_runtime[version='>=9.0.30729.1,<10.0a0']
certifi -> python[version='>=2.7,<2.8.0a0'] -> vs2008_runtime
zlib -> vc=9 -> vs2008_runtime[version='>=9.0.30729.1,<10.0a0']
openssl -> vc=9 -> vs2008_runtime[version='>=9.0.30729.1,<10.0a0']

Package certifi conflicts for:
certifi
pip -> setuptools -> certifi[version='>=2016.09|>=2016.9.26|>=2017.4.17']
setuptools -> certifi[version='>=2016.09|>=2016.9.26']
wheel -> setuptools -> certifi[version='>=2016.09|>=2016.9.26']

I've read online that if packages conflict, I should just delete the ones that don't support the newer/older version, but given the number of conflicting packages here, that can't be the intended fix, since nothing about it is mentioned in the README. This is my first time using Anaconda, so I don't feel comfortable deleting things by hand, as I don't know whether they will be needed in the future.

RuntimeError during Sampling and thus error during training

I ran sample.py for 8h before CUDA ran out of memory. Here are the logs:
Input:
~/jukebox$ python jukebox/sample.py --model=5b_lyrics --name=sample_5b --levels=3 --sample_length_in_seconds=20 --total_sample_length_in_seconds=180 --sr=44100 --n_samples=6 --hop_fraction=0.5,0.5,0.125
Output:

Using cuda True
{'name': 'sample_5b', 'levels': 3, 'sample_length_in_seconds': 20, 'total_sample_length_in_seconds': 180, 'sr': 44100, 'n_samples': 6, 'hop_fraction': (0.5, 0.5, 0.125)}
Setting sample length to 881920 (i.e. 19.998185941043083 seconds) to be multiple of 128
Downloading from gce
Restored from /home/correlation4/.cache/jukebox-assets/models/5b/vqvae.pth.tar
0: Loading vqvae in eval mode
Conditioning on 1 above level(s)
Checkpointing convs
Checkpointing convs
Loading artist IDs from /home/correlation4/jukebox/jukebox/data/ids/v2_artist_ids.txt
Loading artist IDs from /home/correlation4/jukebox/jukebox/data/ids/v2_genre_ids.txt
Level:0, Cond downsample:4, Raw to tokens:8, Sample length:65536
Downloading from gce
Restored from /home/correlation4/.cache/jukebox-assets/models/5b/prior_level_0.pth.tar
0: Loading prior in eval mode
Conditioning on 1 above level(s)
Checkpointing convs
Checkpointing convs
Loading artist IDs from /home/correlation4/jukebox/jukebox/data/ids/v2_artist_ids.txt
Loading artist IDs from /home/correlation4/jukebox/jukebox/data/ids/v2_genre_ids.txt
Level:1, Cond downsample:4, Raw to tokens:32, Sample length:262144
Downloading from gce
Restored from /home/correlation4/.cache/jukebox-assets/models/5b/prior_level_1.pth.tar
0: Loading prior in eval mode
Loading artist IDs from /home/correlation4/jukebox/jukebox/data/ids/v2_artist_ids.txt
Loading artist IDs from /home/correlation4/jukebox/jukebox/data/ids/v2_genre_ids.txt
Level:2, Cond downsample:None, Raw to tokens:128, Sample length:1048576
0: Converting to fp16 params
Downloading from gce
Restored from /home/correlation4/.cache/jukebox-assets/models/5b_lyrics/prior_level_2.pth.tar
0: Loading prior in eval mode
Traceback (most recent call last):
  File "jukebox/sample.py", line 275, in <module>
    fire.Fire(run)
  File "/home/correlation4/anaconda3/envs/jukebox2/lib/python3.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/home/correlation4/anaconda3/envs/jukebox2/lib/python3.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/home/correlation4/anaconda3/envs/jukebox2/lib/python3.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "jukebox/sample.py", line 272, in run
    save_samples(model, device, hps, sample_hps)
  File "jukebox/sample.py", line 240, in save_samples
    ancestral_sample(labels, sampling_kwargs, priors, hps)
  File "jukebox/sample.py", line 123, in ancestral_sample
    zs = _sample(zs, labels, sampling_kwargs, priors, sample_levels, hps)
  File "jukebox/sample.py", line 94, in _sample
    prior.cuda()
  File "/home/correlation4/anaconda3/envs/jukebox2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 304, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/correlation4/anaconda3/envs/jukebox2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 201, in _apply
    module._apply(fn)
  File "/home/correlation4/anaconda3/envs/jukebox2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 201, in _apply
    module._apply(fn)
  File "/home/correlation4/anaconda3/envs/jukebox2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 201, in _apply
    module._apply(fn)
  [Previous line repeated 3 more times]
  File "/home/correlation4/anaconda3/envs/jukebox2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 223, in _apply
    param_applied = fn(param)
  File "/home/correlation4/anaconda3/envs/jukebox2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 304, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA out of memory. Tried to allocate 44.00 MiB (GPU 0; 1.96 GiB total capacity; 1.83 GiB already allocated; 2.75 MiB free; 1.87 GiB reserved in total by PyTorch)

I then decided to run a small vqvae in order to check whether the sampling was effective (because I have no real idea of what this first part should output if it succeeds).
Input:
mpiexec -n 1 python jukebox/train.py --hps=small_vqvae --name=small_vqvae --sample_length=262144 --bs=4 --nworkers=4 --audio_files_dir=/home/correlation4/Downloads --labels=False --train --aug_shift --aug_blend
Output:

Using cuda True
0: Found 15 files. Getting durations
0: self.sr=22050, min: 12, max: inf
0: Keeping 15 of 15 files
Traceback (most recent call last):
  File "jukebox/train.py", line 336, in <module>
    fire.Fire(run)
  File "/home/correlation4/anaconda3/envs/jukebox2/lib/python3.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/home/correlation4/anaconda3/envs/jukebox2/lib/python3.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/home/correlation4/anaconda3/envs/jukebox2/lib/python3.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "jukebox/train.py", line 294, in run
    data_processor = DataProcessor(hps)
  File "/home/correlation4/jukebox/jukebox/data/data_processor.py", line 28, in __init__
    hps.bandwidth = calculate_bandwidth(self.dataset, hps, duration=duration)
  File "/home/correlation4/jukebox/jukebox/utils/audio_utils.py", line 28, in calculate_bandwidth
    x = dataset[idx]
  File "/home/correlation4/jukebox/jukebox/data/files_dataset.py", line 96, in __getitem__
    return self.get_item(item)
  File "/home/correlation4/jukebox/jukebox/data/files_dataset.py", line 89, in get_item
    index, offset = self.get_index_offset(item)
  File "/home/correlation4/jukebox/jukebox/data/files_dataset.py", line 60, in get_index_offset
    assert 0 <= midpoint < self.cumsum[-1], f'Midpoint {midpoint} of item beyond total length {self.cumsum[-1]}'
AssertionError: Midpoint 12720168 of item beyond total length 11970543

Is there any link between the fact that CUDA ran out of memory and this error? It seems like this error comes from somewhere else.

OSError: [Errno 12] Cannot allocate memory

Hi, this looks awesome btw!

I tried running the sample.py command with my own song, but received the following error:

Level:2, Cond downsample:None, Raw to tokens:128, Sample length:1048576
0: Converting to fp16 params
Downloading from gce
Traceback (most recent call last):
  File "jukebox/sample.py", line 237, in <module>
    fire.Fire(run)
  File "/home/axiezai/miniconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/home/axiezai/miniconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/home/axiezai/miniconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "jukebox/sample.py", line 234, in run
    save_samples(model, device, hps, sample_hps)
  File "jukebox/sample.py", line 157, in save_samples
    vqvae, priors = make_model(model, device, hps)
  File "/media/rajlab/sachin_data_2/userdata/xihe/jukebox/jukebox/make_models.py", line 185, in make_model
    priors = [make_prior(setup_hparams(priors[level], dict()), vqvae, 'cpu') for level in levels]
  File "/media/rajlab/sachin_data_2/userdata/xihe/jukebox/jukebox/make_models.py", line 185, in <listcomp>
    priors = [make_prior(setup_hparams(priors[level], dict()), vqvae, 'cpu') for level in levels]
  File "/media/rajlab/sachin_data_2/userdata/xihe/jukebox/jukebox/make_models.py", line 169, in make_prior
    restore(hps, prior, hps.restore_prior)
  File "/media/rajlab/sachin_data_2/userdata/xihe/jukebox/jukebox/make_models.py", line 54, in restore
    checkpoint = load_checkpoint(checkpoint_path)
  File "/media/rajlab/sachin_data_2/userdata/xihe/jukebox/jukebox/make_models.py", line 34, in load_checkpoint
    download(gs_path, local_path)
  File "/media/rajlab/sachin_data_2/userdata/xihe/jukebox/jukebox/utils/gcs_utils.py", line 36, in download
    subprocess.call(args)
  File "/home/axiezai/miniconda3/envs/jukebox/lib/python3.7/subprocess.py", line 339, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/home/axiezai/miniconda3/envs/jukebox/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/home/axiezai/miniconda3/envs/jukebox/lib/python3.7/subprocess.py", line 1482, in _execute_child
    restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory

I did some googling, and this seems like a swap space issue? I checked and confirmed I had free swap space:

# free -h
              total        used        free      shared  buff/cache   available
Mem:            31G        615M         29G         16M        780M         29G
Swap:          236M         42M        194M
Thu Apr 30 14:50:26 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.78       Driver Version: 410.78       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:42:00.0  On |                  N/A |
| 23%   34C    P8    18W / 250W |     76MiB / 12194MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1451      G   /usr/lib/xorg/Xorg                            73MiB |
+-----------------------------------------------------------------------------+

Is 194M not enough? Is there a minimum swap-space requirement that I'm not meeting, or is this memory error caused by something else?
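For what it's worth, the traceback fails while forking the download subprocess in gcs_utils.py, which is where Errno 12 typically surfaces when the kernel refuses to duplicate the parent's address space, so more swap usually does help. A hedged workaround sketch, assuming load_checkpoint reuses a file that is already in the cache (as the "Downloading from gce" guard suggests) and using a placeholder URL, is to pre-fetch the checkpoint so the subprocess is never spawned:

import os
import urllib.request

# Pre-fetch the checkpoint into the cache so the download subprocess (and its
# fork) is skipped entirely. The cache path mirrors the logs in this thread;
# the URL below is a placeholder, not the real asset location.
local_path = os.path.expanduser("~/.cache/jukebox-assets/models/5b/prior_level_2.pth.tar")
os.makedirs(os.path.dirname(local_path), exist_ok=True)
urllib.request.urlretrieve("https://example.com/prior_level_2.pth.tar", local_path)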

How to use newly trained model

To add a new model, the readme says to add the small_vqvae, small_prior, etc. entries into MODELS in make_models.py.
But how exactly? Should these be relative paths, absolute paths, paths to the checkpoint files, or just names like "small_prior"? Or is there something left to do with the checkpoint files?
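For reference, a hedged sketch of what the entry appears to expect, judging by the existing entries in make_models.py: the tuple members are hparams names registered in hparams.py, not filesystem paths, and each hparams set points at its checkpoint through a restore field. All names below are illustrative assumptions:

# In make_models.py -- the tuple holds hparams *names*, not paths (assumption
# based on the shape of the existing 5b/1b entries):
MODELS = {
    # ...existing entries...
    "my_model": ("small_vqvae", "small_upsampler", "small_prior"),
}

# In hparams.py, each named hparams set would then carry its own checkpoint
# path, e.g. (field name assumed): restore_prior="path/to/checkpoint.pth.tar"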

RuntimeError: Failed to initialize NCCL

Running into

Caught error during NCCL init (attempt 0 of 5): Distributed package doesn't have NCCL built in
Caught error during NCCL init (attempt 1 of 5): Distributed package doesn't have NCCL built in
Caught error during NCCL init (attempt 2 of 5): Distributed package doesn't have NCCL built in
Caught error during NCCL init (attempt 3 of 5): Distributed package doesn't have NCCL built in
Caught error during NCCL init (attempt 4 of 5): Distributed package doesn't have NCCL built in
Traceback (most recent call last):
  File "jukebox/sample.py", line 237, in <module>
    fire.Fire(run)
  File "/Users/user/anaconda/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/Users/user/anaconda/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/Users/user/anaconda/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "jukebox/sample.py", line 229, in run
    rank, local_rank, device = setup_dist_from_mpi(port=port)
  File "/Users/user/Documents/projects/jukebox/jukebox/utils/dist_utils.py", line 86, in setup_dist_from_mpi
    raise RuntimeError("Failed to initialize NCCL")
RuntimeError: Failed to initialize NCCL

Specifically it is stating:

Distributed package doesn't have NCCL built in

From running

python jukebox/sample.py --model=1b_lyrics --name=sample_5b --levels=3 --sample_length_in_seconds=20 --total_sample_length_in_seconds=180 --sr=44100 --n_samples=6 --hop_fraction=0.5,0.5,0.125

Seems to be coming from

def setup_dist_from_mpi(

Here is where it is raised:

for attempt_idx in range(n_attempts):
    try:
        dist.init_process_group(backend=backend, init_method=f"env://")
        assert dist.get_rank() == mpi_rank
        use_cuda = torch.cuda.is_available()
        print(f'Using cuda {use_cuda}')
        local_rank = mpi_rank % 8
        device = torch.device("cuda", local_rank) if use_cuda else torch.device("cpu")
        torch.cuda.set_device(local_rank)
        return mpi_rank, local_rank, device
    except RuntimeError as e:
        print(f"Caught error during NCCL init (attempt {attempt_idx} of {n_attempts}): {e}")
        sleep(1 + (0.01 * mpi_rank))  # Sleep to avoid thundering herd
        pass
raise RuntimeError("Failed to initialize NCCL")

Is the problem with my Python distribution?
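For context: the paths in the traceback point to a macOS Anaconda install, and macOS (and Windows) builds of PyTorch ship without NCCL, which is exactly what "Distributed package doesn't have NCCL built in" means. A hedged sketch of choosing a backend the local build actually has, assuming the repo's env:// rendezvous can be reproduced single-process:

import os
import torch.distributed as dist

# Fall back to gloo when the local PyTorch build lacks NCCL (typical for
# macOS/Windows builds of PyTorch).
backend = "nccl" if dist.is_nccl_available() else "gloo"

# Single-process illustration; setup_dist_from_mpi derives rank/size from MPI.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend=backend, init_method="env://", rank=0, world_size=1)
print(f"Initialized process group with backend={backend}")

Note that sampling would still need a CUDA device for the models themselves; this only gets past the process-group init.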

Does this work on Windows?

I wanted to install this but I can't figure out how to do it at all, and a lot of the commands seem to be Linux libraries, so I don't know. If anyone could tell me how to install it like I'm a 5-year-old child, that would help, lmao.

setup.py not working

when I open setup.py, it opens for less than a second and then immediately closes.
Can someone help?

edit: can someone also give a full tutorial on how to install and use it (once it's installed)?
discord: Librastien#4197

Abandon (core dumped) while sampling

I have been running this on Ubuntu 18.04.4 with an NVIDIA GT740M, which is not optimal. Regardless of the model used, it always stops with the same error.

Input:
python jukebox/sample.py --model=1b_lyrics --name=sample_5b --levels=3 --sample_length_in_seconds=20 --total_sample_length_in_seconds=180 --sr=44100 --n_samples=2 --hop_fraction=0.5,0.5,0.125

Output:
Using cuda True
{'name': 'sample_5b', 'levels': 3, 'sample_length_in_seconds': 20, 'total_sample_length_in_seconds': 180, 'sr': 44100, 'n_samples': 2, 'hop_fraction': (0.5, 0.5, 0.125)}
Setting sample length to 881920 (i.e. 19.998185941043083 seconds) to be multiple of 128
Downloading from gce
Restored from /home/XXXX/.cache/jukebox-assets/models/5b/vqvae.pth.tar
0: Loading vqvae in eval mode
Conditioning on 1 above level(s)
Checkpointing convs
Checkpointing convs
Loading artist IDs from /home/XXXX/XXXX/jukebox/jukebox/data/ids/v2_artist_ids.txt
Loading artist IDs from /home/XXXX/XXXX/jukebox/jukebox/data/ids/v2_genre_ids.txt
Level:0, Cond downsample:4, Raw to tokens:8, Sample length:65536
Downloading from gce
Traceback (most recent call last):
  File "jukebox/sample.py", line 237, in <module>
    fire.Fire(run)
  File "/home/XXXX/.conda/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/home/XXXX/.conda/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/home/XXXX/.conda/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "jukebox/sample.py", line 234, in run
    save_samples(model, device, hps, sample_hps)
  File "jukebox/sample.py", line 157, in save_samples
    vqvae, priors = make_model(model, device, hps)
  File "/home/XXXX/XXXX/jukebox/jukebox/make_models.py", line 185, in make_model
    priors = [make_prior(setup_hparams(priors[level], dict()), vqvae, 'cpu') for level in levels]
  File "/home/XXXX/XXXX/jukebox/jukebox/make_models.py", line 185, in <listcomp>
    priors = [make_prior(setup_hparams(priors[level], dict()), vqvae, 'cpu') for level in levels]
  File "/home/XXXX/XXXX/jukebox/jukebox/make_models.py", line 169, in make_prior
    restore(hps, prior, hps.restore_prior)
  File "/home/XXXX/XXXX/jukebox/jukebox/make_models.py", line 54, in restore
    checkpoint = load_checkpoint(checkpoint_path)
  File "/home/XXXX/XXXX/jukebox/jukebox/make_models.py", line 37, in load_checkpoint
    checkpoint = t.load(restore, map_location=t.device('cpu'))
  File "/home/XXXX/.conda/envs/jukebox/lib/python3.7/site-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/XXXX/.conda/envs/jukebox/lib/python3.7/site-packages/torch/serialization.py", line 709, in _legacy_load
    deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 43488 more bytes. The file might be corrupted.
terminate called after throwing an instance of 'c10::Error'
  what(): owning_ptr == NullType::singleton() || owning_ptr->refcount_.load() > 0 INTERNAL ASSERT FAILED at /opt/conda/conda-bld/pytorch_1579040055865/work/c10/util/intrusive_ptr.h:348, please report a bug to PyTorch. intrusive_ptr: Can only intrusive_ptr::reclaim() owning pointers that were created using intrusive_ptr::release(). (reclaim at /opt/conda/conda-bld/pytorch_1579040055865/work/c10/util/intrusive_ptr.h:348)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7ff60d6aa627 in /home/XXXX/.conda/envs/jukebox/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x14879df (0x7ff61085c9df in /home/XXXX/.conda/envs/jukebox/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #2: THStorage_free + 0x17 (0x7ff611024fe7 in /home/XXXX/.conda/envs/jukebox/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x5639bd (0x7ff63e9f29bd in /home/XXXX/.conda/envs/jukebox/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #27: __libc_start_main + 0xe7 (0x7ff64dd90b97 in /lib/x86_64-linux-gnu/libc.so.6)

Abandon (core dumped)

I can't tell whether this comes from my GPU not being compatible, and being completely new to this, I don't have enough knowledge to check every error in the output above. I might be wrong, but if the GPU were the issue, I don't think I would get this particular error.

Co-composition uses more memory than normal sampling

Currently, I'm attempting to use the colab with my local graphics card (an RTX 2080 Super). Previously, I've successfully used a local copy of this repo (without colab) to generate batches of 3 two-minute samples via 1b_lyrics (the total processing time is approximately 12 hours).

Using the co-composition mode in the colab, I've run into some trouble with CUDA running out of memory. With the default settings, this happens as soon as I attempt to continue creation (expanding upon the first 4 second sample).

I managed to get things moving by adjusting the settings as follows:

model = "1b_lyrics" # or "1b_lyrics"
hps = Hyperparams()
hps.sr = 44100
hps.n_samples = 3 if model=='5b_lyrics' else 3
hps.name = 'co_composer'
hps.sample_length = 1048576 if model=="5b_lyrics" else 1048576 
chunk_size = 16 if model=="5b_lyrics" else 3
max_batch_size = 3 if model=="5b_lyrics" else 3
hps.hop_fraction = [.5, .5, .125] 
hps.levels = 3
vqvae, *priors = MODELS[model]
vqvae = make_vqvae(setup_hparams(vqvae, dict(sample_length = hps.sample_length)), device)
top_prior = make_prior(setup_hparams(priors[-1], dict()), vqvae, device)

This worked for a while, but the continuation failed after I had reached 64 seconds of audio (CUDA out of memory on the sixteenth continuation). I decided to go ahead and try upsampling from there, and CUDA then failed to allocate enough memory unless I deleted top_prior as suggested for a hosted runtime in this block:

# Set this False if you are on a local machine that has enough memory (this allows you to do the
# lyrics alignment visualization). For a hosted runtime, we'll need to go ahead and delete the top_prior
# if you are using the 5b_lyrics model.
if True:
  del top_prior
  empty_cache()
  top_prior=None

Given that I've previously generated batches of 3 two-minute samples locally without trouble (including the lyric-alignment visualization), this memory issue seems strange. Is there some settings optimization that would let me continue co-composition past 64 seconds, and maybe even upsample with the lyric visualization included?

AttributeError while running sample.py: 'torch.distributed' has no attribute 'ReduceOp'

Following up the readme.md instructions and when trying to run sample.py I get the following error:

Traceback (most recent call last):
  File "jukebox/sample.py", line 7, in <module>
    from jukebox.utils.audio_utils import save_wav, load_audio
  File "c:\windows\system32\jukebox\jukebox\utils\audio_utils.py", line 6, in <module>
    from jukebox.utils.dist_utils import print_once
  File "c:\windows\system32\jukebox\jukebox\utils\dist_utils.py", line 22, in <module>
    def allreduce(x, op=dist.ReduceOp.SUM):
AttributeError: module 'torch.distributed' has no attribute 'ReduceOp'

OS: Windows 10

Prompt with your own music in colab

Could you please update your Colab with code to prompt generation with an existing audio file? I tried to do it myself, but it turned out to be way over my head.

Abstracting from recording quality?

Full disclosure: I have no qualifications in the field of AI, I'm just following the evolution of it, out of interest, as a programmer and former musician.

Just wanted to share something I've been noticing about the generated samples. Might be useful, might not. 🙂

If you listen to the generated samples of Aretha Franklin and Frank Sinatra in particular, it's especially obvious: the model seems to have learned "recording quality", especially of the vocals, and it applies this sporadically.

Within moments of the same sample, you can hear the recording quality of the vocals change drastically - likely reflecting the fact that Aretha and Frank both recorded over long periods of time, so the media, microphones, mastering, etc. changed and produced quite radically different recordings, which the model seems to switch between at random, every word or so.

This is much less evident in, for example, some of the modern pop recordings, as these are generally much more streamlined according to current trends. (To put it pointedly, a lot of pop music sounds "the same".)

I'm wondering if you could add this to the model? At its simplest, maybe start by incorporating the recording year - data that should be easy to obtain - whereas something like the original recording medium or microphone type is probably almost impossible to pin down. But equipment trends roughly follow the years, so the year alone might be enough to make a difference.

Imagine being able to ask for a recording of modern rap or pop artists in 40s quality, or Aretha Franklin in 2020. 🙂

Anyhow, interesting project! Cheers.

PackagesNotFoundError: The following packages are not available from current channels

When running conda install mpi4py=3.0.3 I am getting this error:


Collecting package metadata: done
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - mpi4py=3.0.3

Current channels:

  - https://repo.anaconda.com/pkgs/main/win-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/free/win-64
  - https://repo.anaconda.com/pkgs/free/noarch
  - https://repo.anaconda.com/pkgs/r/win-64
  - https://repo.anaconda.com/pkgs/r/noarch
  - https://repo.anaconda.com/pkgs/msys2/win-64
  - https://repo.anaconda.com/pkgs/msys2/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.

I am on Windows 10, with Python 3.8.2 and conda 4.6.14.

Hello need advice please!

Hey guys... musicians and programmers. This is a big breakthrough, as I'm sure most here recognize. I was wondering if I can download and run this on my computer? Also, is there anything to watch out for virus- or malware-wise in the links? I'm very interested in this... I apologize if I sound strange, as my English isn't so good. I hope to find out and hear from you soon.

Missing hyperparams example for newly trained top-level prior

The documentation reads:

You can then run sample.py with the top-level of our models replaced by your new model. To do so, add an entry my_model in MODELS (in make_models.py) with the (vqvae hps, upsampler hps, top-level prior hps) of your new model, and run sample.py with --model=my_model.

Following these directions leads to an hparams.py KeyError for the missing top-level prior definition. Using small_prior's hps leads to an error: Expecting (genre, artist) bins, got {y_bins}. Adding a y_bins: (0,0) param leads to further issues with the labeling.

It isn't quite clear how to get from training a model, to specifying the path to that model, to getting sound out.
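For reference, a purely hypothetical sketch of the kind of entry that seems to be missing, with field names inferred from the errors above (y_bins as the (genre, artist) bin counts used at training time, restore_prior as the checkpoint path); the exact fields must be checked against hparams.py:

from jukebox.hparams import HPARAMS_REGISTRY, Hyperparams

# Clone the small_prior defaults, then switch labelling on with the bin counts
# the model was actually trained with. All values here are placeholders.
my_top_prior = Hyperparams(**HPARAMS_REGISTRY["small_prior"])
my_top_prior.update(
    labels=True,
    y_bins=(120, 4111),   # (n_genres, n_artists) -- must match training
    restore_prior="path/to/my_prior/checkpoint.pth.tar",
)
HPARAMS_REGISTRY["my_top_prior"] = my_top_prior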

GPU out-of-memory on second iteration of co-composing 1B

The GPU keeps hitting out-of-memory errors on the 2nd iteration of co-composing with the 1B model.
Are there memory-management steps (clearing caches, deleting unwanted samples) between co-composing iterations that can free up memory without breaking the chain of desired conditioning?
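A hedged sketch of the generic PyTorch-level steps between iterations, assuming the zs codes are what carries the conditioning chain (so they must be kept) while decoded audio and rejected branches can be dropped:

import gc
import torch as t

# Illustrative cleanup between co-composition rounds: drop everything except
# the selected zs, force Python to release the references, then hand cached
# CUDA blocks back so the next continuation can allocate them.
x = t.zeros(3, 1048576)   # stand-in for decoded audio from the previous round
del x
gc.collect()
if t.cuda.is_available():
    t.cuda.empty_cache()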

Training new artists / genres on colab

Is it possible to train on new artists or genres on colab? For example, I'd like to create samples in the style of an artist that wasn't in the original dataset and therefore has no artist ID tag.

Sampling with multiple GPUs?

I am running on a machine at home with 2x 8 GB GPUs, and the 5b_lyrics model runs out of GPU memory, but it appears to only use device 0. Is there a way to distribute the sampling across the 2 physical GPUs?

Regarding the copyrights of the model weights

Hi, thank you for the code/models; the work is very interesting!
I'd like to ask about the copyright status of the model weights. If you trained the models on copyrighted music, are we allowed to use the weights for commercial purposes (since the MIT license allows commercial usage)?
Thank you!

Training Issue - AssertionError: Midpoint 42164118 of item beyond total length 38873664

I'm trying to train a new model using the provided instructions, but no matter how many wav files I put in or what length they are, I always get the error below:

Traceback (most recent call last):
  File "jukebox/train.py", line 336, in <module>
    fire.Fire(run)
  File "/home/anton/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/home/anton/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/home/anton/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "jukebox/train.py", line 294, in run
    data_processor = DataProcessor(hps)
  File "/home/anton/Documents/deep/jukebox/jukebox/data/data_processor.py", line 28, in __init__
    hps.bandwidth = calculate_bandwidth(self.dataset, hps, duration=duration)
  File "/home/anton/Documents/deep/jukebox/jukebox/utils/audio_utils.py", line 28, in calculate_bandwidth
    x = dataset[idx]
  File "/home/anton/Documents/deep/jukebox/jukebox/data/files_dataset.py", line 96, in __getitem__
    return self.get_item(item)
  File "/home/anton/Documents/deep/jukebox/jukebox/data/files_dataset.py", line 89, in get_item
    index, offset = self.get_index_offset(item)
  File "/home/anton/Documents/deep/jukebox/jukebox/data/files_dataset.py", line 60, in get_index_offset
    assert 0 <= midpoint < self.cumsum[-1], f'Midpoint {midpoint} of item beyond total length {self.cumsum[-1]}'
AssertionError: Midpoint 42164118 of item beyond total length 38873664

I'm running this on an 11 GB 1080 Ti and had a few CUDA errors before, but sampling with the pretrained models eventually worked. Here, though, the midpoint seems to be calculated wrong, and despite reading through the code, I can't figure out what's off.
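For what it's worth, a rough reconstruction of the arithmetic behind the assertion (names inferred from files_dataset.py; the details may differ). cumsum[-1] is the running total of audio samples across all files at hps.sr, so anything that makes the indexer assume more audio than the files actually decode to - a sample-rate mismatch is the usual suspect - pushes the requested midpoint past the end:

import numpy as np

# Hypothetical per-file sample counts; their total matches the 38873664 from
# the error above.
durations = np.array([20_000_000, 18_873_664])
cumsum = np.cumsum(durations)
sample_length = 1_048_576   # top-level window in raw audio samples

item = 40                   # an index near the end of the assumed dataset
midpoint = item * sample_length + sample_length // 2
# Reproduces the failure shape: 42467328 is beyond 38873664.
assert 0 <= midpoint < cumsum[-1], f"Midpoint {midpoint} of item beyond total length {cumsum[-1]}"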

Songs listed as "unseen lyrics" have lyrics from real life songs

Many songs listed in the "unseen lyrics" collection have almost exact copies of the lyrics of real-life songs.

Some examples:

[screenshot of generated lyrics]

That's most of Family Table, by Zac Brown Band.

[screenshot of generated lyrics]

Same again.

[screenshot of generated lyrics]

That's from Spaceman, by 4 Non Blondes.

And so on.

So there are some serious overfitting issues in the lyrics generation, and they went unnoticed.

hey help when running the first sample test

β–Ά python jukebox/sample.py --model=5b_lyrics --name=sample_5b --levels=3 --sample_length_in_seconds=20 --total_sample_length_in_seconds=180 --sr=44100 --n_samples=6 --hop_fraction=0.5,0.5,0.125

I get this error:

Using cuda True
{'name': 'sample_5b', 'levels': 3, 'sample_length_in_seconds': 20, 'total_sample_length_in_seconds': 180, 'sr': 44100, 'n_samples': 6, 'hop_fraction': (0.5, 0.5, 0.125)}
Setting sample length to 881920 (i.e. 19.998185941043083 seconds) to be multiple of 128
Downloading from gce
Restored from /home/jacos/.cache/jukebox-assets/models/5b/vqvae.pth.tar
0: Loading vqvae in eval mode
Conditioning on 1 above level(s)
Checkpointing convs
Checkpointing convs
Loading artist IDs from /home/jacos/jukebox/jukebox/data/ids/v2_artist_ids.txt
Loading artist IDs from /home/jacos/jukebox/jukebox/data/ids/v2_genre_ids.txt
Level:0, Cond downsample:4, Raw to tokens:8, Sample length:65536
Downloading from gce
Traceback (most recent call last):
  File "jukebox/sample.py", line 237, in <module>
    fire.Fire(run)
  File "/home/jacos/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/home/jacos/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/home/jacos/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "jukebox/sample.py", line 234, in run
    save_samples(model, device, hps, sample_hps)
  File "jukebox/sample.py", line 157, in save_samples
    vqvae, priors = make_model(model, device, hps)
  File "/home/jacos/jukebox/jukebox/make_models.py", line 185, in make_model
    priors = [make_prior(setup_hparams(priors[level], dict()), vqvae, 'cpu') for level in levels]
  File "/home/jacos/jukebox/jukebox/make_models.py", line 185, in <listcomp>
    priors = [make_prior(setup_hparams(priors[level], dict()), vqvae, 'cpu') for level in levels]
  File "/home/jacos/jukebox/jukebox/make_models.py", line 169, in make_prior
    restore(hps, prior, hps.restore_prior)
  File "/home/jacos/jukebox/jukebox/make_models.py", line 54, in restore
    checkpoint = load_checkpoint(checkpoint_path)
  File "/home/jacos/jukebox/jukebox/make_models.py", line 37, in load_checkpoint
    checkpoint = t.load(restore, map_location=t.device('cpu'))
  File "/home/jacos/anaconda3/envs/jukebox/lib/python3.7/site-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/jacos/anaconda3/envs/jukebox/lib/python3.7/site-packages/torch/serialization.py", line 709, in _legacy_load
    deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 113540 more bytes. The file might be corrupted.
terminate called after throwing an instance of 'c10::Error'
  what(): owning_ptr == NullType::singleton() || owning_ptr->refcount_.load() > 0 INTERNAL ASSERT FAILED at /opt/conda/conda-bld/pytorch_1579040055865/work/c10/util/intrusive_ptr.h:348, please report a bug to PyTorch. intrusive_ptr: Can only intrusive_ptr::reclaim() owning pointers that were created using intrusive_ptr::release(). (reclaim at /opt/conda/conda-bld/pytorch_1579040055865/work/c10/util/intrusive_ptr.h:348)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7fbd602ab627 in /home/jacos/anaconda3/envs/jukebox/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x14879df (0x7fbd6345d9df in /home/jacos/anaconda3/envs/jukebox/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #2: THStorage_free + 0x17 (0x7fbd63c25fe7 in /home/jacos/anaconda3/envs/jukebox/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x563a9d (0x7fbd915f3a9d in /home/jacos/anaconda3/envs/jukebox/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #27: __libc_start_main + 0xf3 (0x7fbd9fb34153 in /usr/lib/libc.so.6)

[1] 30984 abort (core dumped) python jukebox/sample.py --model=5b_lyrics --name=sample_5b --levels=3

Corrupted 1b_lyrics checkpoint?

I have the same issue on a local machine (Ubuntu 20.04, 1080 Ti, Anaconda, Python 3.7, everything installed as in the readme) and on Google Colab.

It happens after fetching the checkpoint for the 1b_lyrics model and trying to start:

(jukebox) desm0nt@desm0nt-linux:~/jukebox$ python jukebox/sample.py --model=1b_lyrics --name=sample_1b --levels=3 --sample_length_in_seconds=20 --total_sample_length_in_seconds=180 --sr=44100 --n_samples=4 --hop_fraction=0.5,0.5,0.125
Using cuda True
{'name': 'sample_1b', 'levels': 3, 'sample_length_in_seconds': 20, 'total_sample_length_in_seconds': 180, 'sr': 44100, 'n_samples': 4, 'hop_fraction': (0.5, 0.5, 0.125)}
Setting sample length to 881920 (i.e. 19.998185941043083 seconds) to be multiple of 128
Downloading from gce
Restored from /home/desm0nt/.cache/jukebox-assets/models/5b/vqvae.pth.tar
0: Loading vqvae in eval mode
Conditioning on 1 above level(s)
Checkpointing convs
Checkpointing convs
Loading artist IDs from /home/desm0nt/jukebox/jukebox/data/ids/v2_artist_ids.txt
Loading artist IDs from /home/desm0nt/jukebox/jukebox/data/ids/v2_genre_ids.txt
Level:0, Cond downsample:4, Raw to tokens:8, Sample length:65536
Downloading from gce
Restored from /home/desm0nt/.cache/jukebox-assets/models/5b/prior_level_0.pth.tar
0: Loading prior in eval mode
Conditioning on 1 above level(s)
Checkpointing convs
Checkpointing convs
Loading artist IDs from /home/desm0nt/jukebox/jukebox/data/ids/v2_artist_ids.txt
Loading artist IDs from /home/desm0nt/jukebox/jukebox/data/ids/v2_genre_ids.txt
Level:1, Cond downsample:4, Raw to tokens:32, Sample length:262144
Downloading from gce
Restored from /home/desm0nt/.cache/jukebox-assets/models/5b/prior_level_1.pth.tar
0: Loading prior in eval mode
Creating cond. autoregress with prior bins [79, 2048], 
dims [384, 6144], 
shift [ 0 79]
input shape 6528
input bins 2127
Self copy is False
Loading artist IDs from /home/desm0nt/jukebox/jukebox/data/ids/v3_artist_ids.txt
Loading artist IDs from /home/desm0nt/jukebox/jukebox/data/ids/v3_genre_ids.txt
Level:2, Cond downsample:None, Raw to tokens:128, Sample length:786432
Downloading from gce
Traceback (most recent call last):
  File "jukebox/sample.py", line 237, in <module>
    fire.Fire(run)
  File "/home/desm0nt/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/home/desm0nt/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/home/desm0nt/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "jukebox/sample.py", line 234, in run
    save_samples(model, device, hps, sample_hps)
  File "jukebox/sample.py", line 157, in save_samples
    vqvae, priors = make_model(model, device, hps)
  File "/home/desm0nt/jukebox/jukebox/make_models.py", line 185, in make_model
    priors = [make_prior(setup_hparams(priors[level], dict()), vqvae, 'cpu') for level in levels]
  File "/home/desm0nt/jukebox/jukebox/make_models.py", line 185, in <listcomp>
    priors = [make_prior(setup_hparams(priors[level], dict()), vqvae, 'cpu') for level in levels]
  File "/home/desm0nt/jukebox/jukebox/make_models.py", line 169, in make_prior
    restore(hps, prior, hps.restore_prior)
  File "/home/desm0nt/jukebox/jukebox/make_models.py", line 54, in restore
    checkpoint = load_checkpoint(checkpoint_path)
  File "/home/desm0nt/jukebox/jukebox/make_models.py", line 37, in load_checkpoint
    checkpoint = t.load(restore, map_location=t.device('cpu'))
  File "/home/desm0nt/anaconda3/envs/jukebox/lib/python3.7/site-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/desm0nt/anaconda3/envs/jukebox/lib/python3.7/site-packages/torch/serialization.py", line 709, in _legacy_load
    deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 61312207 more bytes. The file might be corrupted.
corrupted double-linked list
Aborted (core dumped)
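Both this report and the one above fail while restoring a checkpoint right after "Downloading from gce", and "unexpected EOF ... The file might be corrupted" is the classic signature of a partial download. A hedged recovery sketch; the exact filename is an assumption based on which prior failed in the log:

import os

# Remove the possibly-truncated checkpoint so the next run re-downloads it.
cached = os.path.expanduser("~/.cache/jukebox-assets/models/1b_lyrics/prior_level_2.pth.tar")
if os.path.exists(cached):
    os.remove(cached)
    print(f"Removed {cached}; rerun sample.py to fetch it again.")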

sample.py throws ModuleNotFoundError

Hi,
this might be easy to fix; I am probably just missing a detail in the configuration.
After an installation without errors, the example code for sampling doesn't run.

python jukebox/sample.py --model=5b_lyrics --name=sample_5b --levels=3 --sample_length_in_seconds=20 --total_sample_length_in_seconds=180 --sr=44100 --n_samples=6 --hop_fraction=0.5,0.5,0.125

it throws this error:

Traceback (most recent call last):
  File "./sample.py", line 1, in <module>
    import jukebox
ModuleNotFoundError: No module named 'jukebox'

I understand that sample.py is in the jukebox folder, so I need to execute python jukebox/sample.py one level above - but then, of course, jukebox cannot be imported.
But how is jukebox supposed to be found as a module?
Doesn't sample.py have to be outside of the jukebox folder?

Problem running installation instructions

When I run this:

conda install pytorch=1.4 torchvision=0.5 cudatoolkit=10.0 -c pytorch

I get:

PackagesNotFoundError: The following packages are not available from current channels:

  - cudatoolkit=10.0

Current channels:

  - https://conda.anaconda.org/pytorch/osx-64
  - https://conda.anaconda.org/pytorch/noarch
  - https://repo.anaconda.com/pkgs/main/osx-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/osx-64
  - https://repo.anaconda.com/pkgs/r/noarch
  - https://conda.anaconda.org/conda-forge/osx-64
  - https://conda.anaconda.org/conda-forge/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.

I did as the error suggested, but I couldn't find the cudatoolkit package at v10+ for OS X (https://anaconda.org/anaconda/cudatoolkit).

Any ideas on how to solve this?

Not an issue - avenue of research: separate render heads (vocals / piano / strings / drums) via deezer/spleeter

It must have taken a lot of GPU power to render the existing audio tracks - some of them are remarkable.
Was any thought given, as a next step, to leveraging the breakthrough work from @deezer / spleeter in an AI pipeline that breaks tracks down into stems, e.g. bass / drums / vocals / other? I currently have spleeter installed - and I can give it ANY mp3 file and it will spit out the corresponding segmented tracks.

It seems like, without retraining, you could just take their dataset (all the songs they use) and trained models and concatenate this into jukebox somehow... I guess my question is how.

I'm not asking anyone from OpenAI to address this - but surely this would help propel the rendering of higher-quality tracks. This reminds me of the symbolic-AI conversation that keeps creeping up in AI around Hinton: do we add rules to the engine, or just make it a black box that spits out the answer?

I'm imagining some kind of logic that would orchestrate audio rendering and say: we need a drum here, on this track, at this point - which jukebox does, but not necessarily cleanly, and the audio quality is a bit inferior.

In training, perhaps it needs an understanding of: oh, this song that I'm learning uses drums / vocals / piano / strings here, and less so here - then we know to spit out these audio tracks here...

Thinking out loud - given the Katy Perry track here (which I was blown away by):
https://soundcloud.com/openai_audio/jukebox-novel_lyrics-78968609
You take the Katy Perry catalogue
-> you build 5 separate models specific to Katy Perry: drums / vocals / strings / piano. Somehow you condition each model on the song or the artist... (I don't know.)

At this step, you'd have no problem spitting out Katy Perry-style drums. But then, tying that back to "give me an entire Katy Perry track": that track needs to understand what's going on in the other layers - you would need to condition the parallel output of the orchestrator (you need drums, but also vocals and strings) on an artist.

I did see the midi conditioning - perhaps it's related to that in how the data is trained.
If there is any light you could shed on the midi conditioning that may be relevant, that would be awesome.

Perhaps another line of thinking: during backward propagation, when testing for errors, it could use Spleeter to separate the tracks and then have a say in the quality of the output at each level. Probably too slow, though.

Lyrics Conditioning

Is the lyrics conditioning only applied at training time, or is it possible to seed the model with your own lyrics, in the same way as is possible with wav input here:

python jukebox/sample.py --model=5b_lyrics --name=sample_5b_prompted --levels=3 --mode=primed --audio_file=path/to/recording.wav,awesome-mix.wav,fav-song.wav,etc.wav --prompt_length_in_seconds=12 --sample_length_in_seconds=20 --total_sample_length_in_seconds=180 --sr=44100 --n_samples=6 --hop_fraction=0.5,0.5,0.125
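For context, lyrics are not passed on the command line at all: in sample.py the conditioning metadata lives in a metas structure in the code, so custom lyrics mean editing that rather than adding a flag. A hedged sketch of its shape, with field names assumed from the repo's sampling setup:

# Hypothetical shape of the conditioning metadata in sample.py; field names
# are assumptions, and artist/genre must map to IDs the model knows.
metas = [dict(
    artist="Alan Jackson",
    genre="Country",
    total_length=180 * 44100,   # total_sample_length_in_seconds * sr
    offset=0,
    lyrics="Your own lyrics here...",
)]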
