
sound-separation's Issues

Missing train.py file?

Hey,

Thanks for the resources!

In the MixIT training recipes, a train file should be added with the implementations of mixit, signal_transformer, consistency, etc., so that those imports are resolvable and the results can be reproduced.

Could you please share it?

Thanks!
Manu

Separate tarball for evaluation?

If I'm not mistaken, if we run the pipeline with data augmentation, we currently end up without evaluation data.
This means that, to evaluate the system trained with augmented data, we need to additionally download the dry or reverberated mixtures, depending on the task.

While downloading the tarball with train/valid/eval doesn't take up that much space, it might make sense to have separate tarballs for the dry and reverberated evaluation datasets.

DataLossError

I am not able to load the pretrained model using the inference.py file; execution stops with a DataLossError. How can I solve this issue?

Problem with reverb

Hi there,

There is an alignment problem in reverberate_and_mix: applying the reverb shifts the audio in time, which breaks the alignment with the labels.

I'm putting this issue here since multiple people have asked to use this work: https://hal.inria.fr/hal-02891700, and that work should be updated once the reverb is fixed.
The best way to update https://github.com/turpaultn/dcase20_task4 is to update the sound-separation repo first, and then I will pull the latest version.

I want to use your model on the iPhone

When converting it with tf_coreml, I get errors like: NotImplementedError: Unsupported Ops of type: PadV2, GatherV2, RFFT, ComplexAbs, IRFFT.

Do you have a way to convert the model to Core ML format, or can you give me some guidelines?

I'd appreciate it.

Help needed from linpoly

Hi linpoly, I need your help. Can you give me your contact ID or some other way to contact you?

No Checkpoint found in Model

After downloading the birdsong separation model using the gsutil command, I only get a meta, index, and data file. There is no checkpoint file, which causes checkpoint_path = tf.train.latest_checkpoint('bird_mixit_model_checkpoints/output_sources4/') to return None.

gsutil command:

gsutil -m cp -r \
  "gs://gresearch/sound_separation/bird_mixit_model_checkpoints" .

files downloaded:

bird_mixit_model_checkpoints
    LICENSE
    README
    output_sources8
        model.ckpt-2178900.index
        inference.meta
        model.ckpt-2178900.data-00000-of-00001
    output_sources4
        model.ckpt-3223090.index
        model.ckpt-3223090.data-00000-of-00001
        inference.meta
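
A possible workaround (a sketch on my part, assuming the checkpoint files are exactly the ones listed above; the step numbers come from that listing): tf.train.latest_checkpoint() relies on a "checkpoint" state file that isn't shipped, so you can either pass the model.ckpt-* prefix directly or write that small file yourself:

import tensorflow as tf

ckpt_dir = 'bird_mixit_model_checkpoints/output_sources4'
ckpt_prefix = ckpt_dir + '/model.ckpt-3223090'

# Option 1: skip latest_checkpoint() and use the prefix directly.
reader = tf.train.load_checkpoint(ckpt_prefix)  # works without a "checkpoint" file
print(len(reader.get_variable_to_shape_map()), 'variables found')

# Option 2: write the missing "checkpoint" state file so latest_checkpoint() works.
with open(ckpt_dir + '/checkpoint', 'w') as f:
    f.write('model_checkpoint_path: "model.ckpt-3223090"\n')
print(tf.train.latest_checkpoint(ckpt_dir))  # should now return the prefix above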

Question about the number of masks

I have already studied the sound-separation papers, like "Universal Sound Separation" and "Conv-TasNet", but I have a question about the masks.
How do you decide on the number of masks?
Is it always four in this competition?
If a mixture wav only has two sources, do only two of the final masks have values while the other two are zero?
I don't have any idea about it.
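
As far as I understand, the FUSS baseline separates into a fixed number of output channels (four), and for mixtures with fewer sources the extra outputs are expected to come out near-silent rather than be removed. A minimal illustration (the shapes, threshold, and function name below are my own and purely illustrative, not part of this repo) of flagging which outputs are active:

import numpy as np

def active_sources(separated, threshold_db=-40.0):
    """separated: array of shape [num_outputs, num_samples]."""
    mix_power = np.mean(np.sum(separated, axis=0) ** 2) + 1e-12
    out_power = np.mean(separated ** 2, axis=1) + 1e-12
    rel_db = 10.0 * np.log10(out_power / mix_power)
    return rel_db > threshold_db  # True for outputs that carry real energy

# Example: a 4-output model where the last two outputs are essentially silent.
est = np.random.randn(4, 16000) * np.array([1.0, 0.5, 1e-4, 1e-4])[:, None]
print(active_sources(est))  # -> [ True  True False False]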

train_sed+ss_baseline

In dcase2020_desed_fuss_baseline, when I try to run ./make_baseline_file_lists.sh, I don't know the correct value of "DESED_ROOT_DIR" in setup.sh. Could you clarify what DESED_ROOT_DIR should point to? Thank you.


jams annotation order vs filenames?

Hey, I'm confused about whether the order of the FUSS jams annotations relates to the filenames of the separated sources (e.g., background0, foreground0, foreground1, etc.).

I downloaded the dry ssdata from Zenodo. For some examples, it seems like the order of the annotations in the JAMS file (ordered by time, I believe) is different from the ordering of the foreground sounds in the filenames. For example:

>>> m = jams.load("./ssdata/train/example13537.jams")
>>> m["annotations"][0].data
SortedKeyList([Observation(time=0.0, duration=10.0, value={'label': 'sound', 'source_file': '/data/DCASE2020/fsd_data/train/sound/155571.wav', 'source_time': 1.3854725120886693, 'event_time': 0, 'event_duration': 10.0, 'snr': 0, 'role': 'background', 'pitch_shift': None, 'time_stretch': None}, confidence=1.0), 
Observation(time=0.6711880000000008, duration=9.328812, value={'label': 'sound', 'source_file': '/data/DCASE2020/fsd_data/train/sound/372821.wav', 'source_time': 0.0, 'event_time': 0.6711880000000008, 'event_duration': 9.328812, 'snr': -0.3880649923842725, 'role': 'foreground', 'pitch_shift': None, 'time_stretch': None}, confidence=1.0), 
Observation(time=1.6686094265599583, duration=4.748812, value={'label': 'sound', 'source_file': '/data/DCASE2020/fsd_data/train/sound/375026.wav', 'source_time': 0.0, 'event_time': 1.6686094265599583, 'event_duration': 4.748812, 'snr': 23.017906447229286, 'role': 'foreground', 'pitch_shift': None, 'time_stretch': None}, confidence=1.0), 
Observation(time=3.5576066287429065, duration=1.59025, value={'label': 'sound', 'source_file': '/data/DCASE2020/fsd_data/train/sound/349792.wav', 'source_time': 0.0, 'event_time': 3.5576066287429065, 'event_duration': 1.59025, 'snr': 12.441491562340214, 'role': 'foreground', 'pitch_shift': None, 'time_stretch': None}, confidence=1.0)], key=<bound method Annotation._key of <class 'jams.core.Annotation'>>)

But looking at the separated sources, it looks like foreground1 is the sound that begins at 3.5 s, whereas foreground2 begins at 1.66 s (I expected the opposite).

I'm wondering if I'm missing how to order the jams annotations so that they consistently match up with the indices in the filenames. Thanks so much!
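
In case it helps, here is a hedged workaround sketch (not the official mapping; the exampleN_sources/foregroundN_sound.wav layout and the silence threshold are assumptions on my part): instead of relying on annotation order, match each separated-source wav to a JAMS observation by comparing where the wav becomes non-silent against each observation's event_time:

import jams
import numpy as np
import soundfile as sf

def onset_seconds(path, threshold=1e-3):
    """Return the time (in seconds) of the first sample above the threshold."""
    audio, sr = sf.read(path)
    idx = np.flatnonzero(np.abs(audio) > threshold)
    return idx[0] / sr if idx.size else None

m = jams.load("./ssdata/train/example13537.jams")
fg_obs = [o for o in m["annotations"][0].data if o.value["role"] == "foreground"]

for n in range(len(fg_obs)):
    # Assumed file layout; adjust to wherever the per-example sources live.
    wav = f"./ssdata/train/example13537_sources/foreground{n}_sound.wav"
    onset = onset_seconds(wav)
    # Pick the observation whose event_time is closest to this wav's onset.
    best = min(fg_obs, key=lambda o: abs(o.time - onset))
    print(wav, "->", best.value["source_file"], "event_time:", best.time)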
