minzwon / sota-music-tagging-models

License: MIT License

Python 100.00%

sota-music-tagging-models's Issues

Is there a [Short-chunk CNN + Res] pretrained weight for the MTAT dataset?

Thanks for your great work! Is there a pretrained weight for [Short-chunk CNN + Res]?

I want to run it because it is one of the best-performing models on the MTAT dataset.

But I cannot find a pretrained weight for the MTAT dataset.

preprocessing\mtat_read.py has a mix of tabs and spaces for indentation

Problems training on jamendo - RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0

Hi, thanks for this awesome project! I'm trying to train it with jamendo-moodtheme tags but I'm getting an error.

I'm trying it on a Google Colab VM with a CUDA-enabled GPU.

I downloaded the mel-spectrograms from the jamendo repository, specifying the melspecs data type and the autotagging_moodtheme dataset. Then in this project I just replaced the TAGS variables in the code with this and the tsv files with the moodtheme ones from here.

Everything looked fine but for some reason I'm receiving the attached error after running the training code.

The mel spectrograms have 92 bands and different lengths. Could that be causing the problem?

Let me know if anyone knows what might be the problem :)

Thanks in advance!

# My code
%tensorflow_version 1.x
%cd /content/sota-music-tagging-models/src/
!python -u main.py --data_path /content/data --dataset jamendo-mood

My error message

Namespace(batch_size=16, data_path='/content/data', dataset='jamendo-mood', log_step=20, lr=0.0001, model_load_path='.', model_save_path='./../models', model_type='hcnn', n_epochs=200, num_workers=0, use_tensorboard=1)
Traceback (most recent call last):
  File "main.py", line 61, in <module>
    main(config)
  File "main.py", line 39, in main
    solver.train()
  File "/content/sota-music-tagging-models/src/solver.py", line 172, in train
    for x, y in self.data_loader:
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 346, in __next__
    data = self.dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/collate.py", line 80, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/collate.py", line 80, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/collate.py", line 65, in default_collate
    return default_collate([torch.as_tensor(b) for b in batch])
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/collate.py", line 56, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 20546 and 9168 in dimension 2 at /pytorch/aten/src/TH/generic/THTensor.cpp:689
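
The last line of the traceback shows two items with 20546 and 9168 time frames, and the default collate function cannot stack arrays of different lengths. One common workaround is to cut every spectrogram to the same number of frames before batching. The sketch below uses NumPy only; `random_crop`, `N_FRAMES = 256`, and the zero-filled test arrays are hypothetical and are not taken from this repository.

```python
import numpy as np

N_BANDS = 92    # number of mel bands as reported in the issue
N_FRAMES = 256  # hypothetical fixed chunk length, chosen only for illustration


def random_crop(spec, n_frames, rng):
    """Return a slice of `spec` with exactly `n_frames` columns on the last axis."""
    total = spec.shape[-1]
    if total < n_frames:
        # Zero-pad short clips on the right so every item has the same length.
        pad = [(0, 0)] * (spec.ndim - 1) + [(0, n_frames - total)]
        return np.pad(spec, pad, mode="constant")
    start = int(rng.integers(0, total - n_frames + 1))
    return spec[..., start:start + n_frames]


rng = np.random.default_rng(0)
# Two placeholder spectrograms with the time lengths from the error message.
specs = [
    np.zeros((N_BANDS, 20546), dtype=np.float32),
    np.zeros((N_BANDS, 9168), dtype=np.float32),
]
batch = np.stack([random_crop(s, N_FRAMES, rng) for s in specs])
print(batch.shape)
```

Applying the same crop inside the dataset's `__getitem__` (or in a custom `collate_fn` passed to the `DataLoader`) would give the default stacking step equally sized items.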

Bug running fcn with mtat

result = self.forward(*input, **kwargs)
RuntimeError: builtins: link error: Invalid value
The above operation failed in interpreter, with the following stack trace:

The above operation failed in interpreter, with the following stack trace:

Any idea what the problem is?

would love to run this

Replicate says it is using an outdated version of Cog.

The difference between musicnn and music600

Thanks for your work, it's very cool. I want to know the difference between musicnn and music600. Thanks in advance.

I have a question

Your model includes the mel spectrogram transform, so the transform runs as part of the model's forward pass.

I have one question. Consider these two situations:

1. The mel spectrograms are computed in advance and saved as npy files, and the model takes those npy files as input.
2. The mel spectrogram is computed when the model runs (as in your code).

Is there any difference between the two? And what about the harmonic transform?

Old module versions in requirements.txt

torch 1.2.0? Please. Just remove all "==" and it installs better.
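
The suggestion above can be sketched as a small script that strips `==` pins from requirement lines. This is only an illustration: `unpin` is a hypothetical helper, and apart from `torch==1.2.0` (mentioned in the issue) the sample lines are invented. Unpinned installs may pull versions the code was never tested with.

```python
import re


def unpin(requirement_lines):
    """Drop '==<version>' pins, keeping package names, comments, and blank lines."""
    return [re.sub(r"\s*==\s*[^\s;#]+", "", line) for line in requirement_lines]


# Only the torch pin comes from the issue; the other lines are made up.
pinned = ["torch==1.2.0", "somepackage == 0.0.1", "# a comment", "unpinned-package"]
print(unpin(pinned))
```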

My roc_auc is higher than reported

I trained harmonicnn on MTAT, where MTAT was downloaded from https://github.com/Spijkervet/CLMR/blob/master/clmr/datasets/magnatagatune.py
The evaluation outputs from the five runs listed below are:
loss: 0.1400
roc_auc: 0.9151
pr_auc: 0.4636

loss: 0.1399
roc_auc: 0.9157
pr_auc: 0.4666

loss: 0.1405
roc_auc: 0.9155
pr_auc: 0.4617

loss: 0.1403
roc_auc: 0.9153
pr_auc: 0.4643

loss: 0.1402
roc_auc: 0.9148
pr_auc: 0.4641

The roc_auc is much higher than the reported 0.9126.
Did I miss something?
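
To put the gap in numbers, the roc_auc values listed above can be summarized with the standard library. This is a minimal sketch: it only averages the figures quoted in this issue and compares them with the reported 0.9126. It says nothing about why the numbers differ.

```python
from statistics import mean, stdev

# roc_auc values copied from the runs listed in this issue.
roc_auc_runs = [0.9151, 0.9157, 0.9155, 0.9153, 0.9148]
reported = 0.9126

avg = mean(roc_auc_runs)
spread = stdev(roc_auc_runs)
print(f"mean={avg:.4f} stdev={spread:.4f} gap_to_reported={avg - reported:+.4f}")
```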

Data splitting of MTAT dataset

Hi, thank you for your great work and for building benchmark results for all the representative auto-tagging models. I have a further question on the data splitting of the MTAT dataset.

In the SMC paper, you mentioned that you did not discard the tracks with no associated labels (which might lower performance). However, both the split npy files in this repo and the split files in the repo you referred to in the SMC paper discard those tracks. Could I kindly ask whether the results are based on the cleaned version of the dataset, which discards those tracks?

For your reference, the original version should have 18706 tracks for training, 1825 for validation, and 5329 for testing (25860 in total). The clean version should have 15247 for training, 1529 for validation, and 4332 for testing (21108 in total).
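
The split sizes quoted above can be checked with a few lines of arithmetic. The sketch uses only the numbers from this issue; the dictionary names are hypothetical.

```python
# Track counts per split, as quoted in this issue.
original = {"train": 18706, "valid": 1825, "test": 5329}
clean = {"train": 15247, "valid": 1529, "test": 4332}

# Tracks dropped from each split when going from the original to the clean version.
removed = {split: original[split] - clean[split] for split in original}

print("original total:", sum(original.values()))
print("clean total:", sum(clean.values()))
print("removed per split:", removed)
print("removed total:", sum(removed.values()))
```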

How to get the audio files of MSD?

I tried to find a way to download audio files of MSD on millionsongdataset.com but failed. How do I get the audio files of MSD, if I may ask?

Do you have the best_model.pth of the HarmonicCNN model for the MTG data?

The best_model.pth of HarmonicCNN does not exist. Can you send one?

Dataset paths

You should either mention that this project requires a very specific structure of dataset directories, and what it is, or treat YOUR_DATA_PATH as a top level dir for a given dataset.

Only one class present in y_true. ROC AUC score is not defined in that case.

Hi,
The model runs, but an error occurs when calculating roc-auc. It says that there is only one class present in y_true, and I don't know what the problem is.
I ran the fcn and musicnn models with the jamendo dataset (autotagging-moodtheme), and both gave the same error.
The error message is as below.
(I configured the same environment according to the requirements.)
Could you help me to solve this problem?

Traceback (most recent call last):
  File "main.py", line 59, in <module>
    main(config)
  File "main.py", line 37, in main
    solver.train()
  File "/nas2/epark/sota-music-tagging-models/training/solver.py", line 182, in train
    best_metric = self.validation(best_metric, epoch)
  File "/nas2/epark/sota-music-tagging-models/training/solver.py", line 258, in validation
    roc_auc, pr_auc, loss = self.get_validation_score(epoch)
  File "/nas2/epark/sota-music-tagging-models/training/solver.py", line 316, in get_validation_score
    roc_auc, pr_auc = self.get_auc(est_array, gt_array)
  File "/nas2/epark/sota-music-tagging-models/training/solver.py", line 244, in get_auc
    roc_aucs = metrics.roc_auc_score(gt_array, est_array, average=None)
  File "/home/epark/anaconda3/envs/music/lib/python3.7/site-packages/sklearn/metrics/_ranking.py", line 375, in roc_auc_score
    sample_weight=sample_weight)
  File "/home/epark/anaconda3/envs/music/lib/python3.7/site-packages/sklearn/metrics/_base.py", line 120, in _average_binary_score
    sample_weight=score_weight)
  File "/home/epark/anaconda3/envs/music/lib/python3.7/site-packages/sklearn/metrics/_ranking.py", line 221, in _binary_roc_auc_score
    raise ValueError("Only one class present in y_true. ROC AUC score "
ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.
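
scikit-learn raises this error when some column of `gt_array` contains only zeros or only ones, because ROC AUC needs both classes. A possible diagnostic, assuming `gt_array` is a binary (n_items, n_tags) array, is to list which tag columns contain both classes. `tags_with_both_classes` and the toy array are hypothetical, and the sketch does not identify why a tag ended up with a single class in this setup.

```python
import numpy as np


def tags_with_both_classes(gt_array):
    """Indices of tag columns that contain at least one 0 and at least one 1."""
    gt = np.asarray(gt_array)
    positives = gt.sum(axis=0)
    return np.flatnonzero((positives > 0) & (positives < gt.shape[0]))


# Toy ground truth: column 1 has no positives, column 3 has no negatives.
gt = np.array([
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 0, 1, 1],
])
keep = tags_with_both_classes(gt)
print(keep)
```

Columns missing from `keep` point to tags that never (or always) occur in the evaluated split, which can happen when the tag list in the code does not match the labels in the tsv files. Scoring only `gt_array[:, keep]` against `est_array[:, keep]` avoids the exception but also changes which tags the average covers.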

input length of short-chunk CNN

hi,
why is the input length of the short-chunk CNN exactly 59049 samples or 3.69 seconds?
thanks in advance,
hannes
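
Two facts about the numbers in this question can be checked directly: 59049 is exactly 3 to the power of 10, and 59049 samples lasting 3.69 seconds implies a sample rate of about 16 kHz. The 16 kHz figure is inferred from the two quoted numbers and is an assumption here. The sketch does not establish the author's reason for choosing this length.

```python
n_samples = 59049
duration_s = 3.69

# 59049 is a power of three.
print(3 ** 10 == n_samples)

# Sample rate implied by the two numbers quoted in the question.
print(round(n_samples / duration_s))

# Duration at an assumed 16 kHz sample rate.
assumed_sample_rate = 16000
print(round(n_samples / assumed_sample_rate, 2))
```

A length of the form 3^k divides evenly through repeated downsampling by a factor of 3.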

Clarification on dataset format(s)

I have two goals:

run inference from some of the pre-trained models on my own dataset
train a new model on my own dataset(s)

In both cases I am having trouble due to the dataset formats; it seems that the scripts require a very specific format of dataset which is not really detailed in the readme. If you could provide any clarification on how our datasets should be formatted, this would be greatly appreciated! Thanks in advance.

Where can we find the pretrained models ?

Are the pretrained models available for use?

Error in loading the model during the training process

Hi, I'm trying to retrain the ShortChunkCNN model on the MagnaTagATune dataset. The training process errors out at the 80th epoch, when self.load is called from opt_schedule:

RuntimeError: Error(s) in loading state_dict for ShortChunkCNN:
        size mismatch for spec.mel_scale.fb: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([257, 128]).

Can you tell me why this is happening? Also, can you tell me why the following snippet of code is required in self.load?

if 'spec.mel_scale.fb' in S.keys():
    S['spec.mel_scale.fb'] = torch.tensor([])

Thanks for the nice paper, and neat code!

Training with a small dataset

Thanks for your work. I trained musicnn as you suggested, because I only have 1K training examples. I fine-tuned the last ten layers of musicnn, starting from the pretrained model you provided, which was trained on the MSD dataset. The result shows the following:

Does that look right? Thanks in advance.
