hangzhaomit / sound-of-pixels
Codebase for ECCV18 "The Sound of Pixels"
Home Page: http://sound-of-pixels.csail.mit.edu
License: MIT License
Hi, I saw the function forward_pixelwise in the synthesizer code; it is the version of the forward function that produces pixel-wise masks. However, throughout the code I found that only forward is invoked, not the pixel-wise version. Is there a demo that can produce pixel-wise sound?
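For anyone unsure what the pixel-wise path computes: the paper's idea is that each spatial visual feature vector is combined with the audio feature of every time-frequency bin to predict one spectrogram mask per pixel. This sketch illustrates that idea with plain numpy; it is not the repo's actual forward_pixelwise API, and the simple inner-product-plus-sigmoid combination here stands in for whatever learned synthesizer the code uses.

```python
import numpy as np

def pixelwise_masks(feat_img, feat_sound):
    """Illustrative pixel-wise mask computation (not the repo's API).

    feat_img:   (C, H, W) visual feature map.
    feat_sound: (C, T, F) audio features per time-frequency bin.
    Returns:    (H, W, T, F) masks, one spectrogram mask per pixel.
    """
    C, H, W = feat_img.shape
    _, T, F = feat_sound.shape
    v = feat_img.reshape(C, H * W)           # C x HW visual vectors
    s = feat_sound.reshape(C, T * F)         # C x TF audio vectors
    logits = v.T @ s                         # HW x TF inner products
    masks = 1.0 / (1.0 + np.exp(-logits))    # sigmoid -> ratio masks in [0, 1]
    return masks.reshape(H, W, T, F)
```

Applying each (T, F) mask to the mixture spectrogram and inverting the STFT would then give one audio track per pixel, which is what a pixel-wise demo would need to do.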
The JSON file mentioned contains a number of YouTube IDs. Do we need to download them manually from that file, or is there a better way? Also, how do we extract frames and the audio signal at the desired rates? I have never worked with JSON before, so please excuse my ignorance; some guidance would be helpful.
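There is no need to download the videos by hand: a short script can read the IDs out of the JSON file and fetch each one with a downloader such as yt-dlp. The JSON layout assumed below (instrument name mapping to a list of IDs) is a guess; inspect the released file and adjust the parsing if its structure differs.

```python
import json
import pathlib
import subprocess

def yt_url(video_id):
    """Build a full YouTube watch URL from a video ID."""
    return "https://www.youtube.com/watch?v=" + video_id

def download_all(json_path, out_root="data/raw"):
    """Download every video listed in the JSON index with yt-dlp.

    Assumed JSON layout: {"acoustic_guitar": ["M3dekVSwNjY", ...], ...}.
    Each video is saved as data/raw/<instrument>/<id>.mp4.
    """
    with open(json_path) as f:
        index = json.load(f)
    for instrument, ids in index.items():
        out_dir = pathlib.Path(out_root) / instrument
        out_dir.mkdir(parents=True, exist_ok=True)
        for vid in ids:
            # check=False so one removed video does not abort the whole run
            subprocess.run(
                ["yt-dlp", "-f", "mp4",
                 "-o", str(out_dir / "%(id)s.%(ext)s"),
                 yt_url(vid)],
                check=False)
```

Note that some of the listed videos may have been removed from YouTube since the dataset was published, so expect a few failures.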
Hello. I tried to download the trained model by running 'download_trained_model.sh', but it failed. I also tried to access the model URL "http://sound-of-pixels.csail.mit.edu/release/" directly, but got the reply "You don't have permission to access /release/ on this server." So I cannot get the trained model. How can I solve this problem?
Thanks a lot.
Sir, I first created the .csv files; they list the inputs and their paths. But during training it reports that it failed to load frames/audio.
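A "failed to load frames/audio" error during training usually means some path in the csv does not exist on disk. A quick sanity check like the one below can list the broken entries before training; the assumed row layout (audio path, frames directory, frame count) is a guess, so adjust the column indices to match your csv.

```python
import csv
import pathlib

def check_csv(csv_path):
    """Return every path in the csv that is missing on disk.

    Assumed row layout: audio_path,frames_dir,num_frames (adjust if
    your csv files use a different column order).
    """
    missing = []
    with open(csv_path) as f:
        for row in csv.reader(f):
            if len(row) < 2:
                continue  # skip blank or malformed rows
            audio, frames = row[0], row[1]
            if not pathlib.Path(audio).is_file():
                missing.append(audio)
            if not pathlib.Path(frames).is_dir():
                missing.append(frames)
    return missing
```

Running this on train.csv and val.csv and fixing (or removing) the reported rows is usually enough to get past the loading error.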
Judging from the issues, this is a common request... How do we download the video files using the JSON file, and how can we pre-process the downloaded videos into the following format?
data
├── audio
│   ├── acoustic_guitar
│   │   ├── M3dekVSwNjY.mp3
│   │   ├── ...
│   ├── trumpet
│   │   ├── STKXyBGSGyE.mp3
│   │   ├── ...
│   ├── ...
│
└── frames
    ├── acoustic_guitar
    │   ├── M3dekVSwNjY.mp4
    │   │   ├── 000001.jpg
    │   │   ├── ...
    │   ├── ...
    ├── trumpet
    │   ├── STKXyBGSGyE.mp4
    │   │   ├── 000001.jpg
    │   │   ├── ...
    │   ├── ...
    ├── ...
Are there any scripts provided for these? Thanks.
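The tree above can be produced from downloaded videos with two ffmpeg calls per file, as in this sketch. The rates used here (8 fps frames, 11025 Hz mono audio) are assumptions based on settings commonly used with this codebase; check the dataloader's arguments before committing to them.

```python
import pathlib
import subprocess

def target_paths(video_path, frames_root="data/frames", audio_root="data/audio"):
    """Map data/raw/<instrument>/<id>.mp4 to its frames dir and mp3 path."""
    video = pathlib.Path(video_path)
    instrument = video.parent.name
    # the frames directory keeps the ".mp4" name, matching the tree above
    frame_dir = pathlib.Path(frames_root) / instrument / video.name
    audio_path = pathlib.Path(audio_root) / instrument / (video.stem + ".mp3")
    return frame_dir, audio_path

def extract(video_path, fps=8, sr=11025):
    """Extract frames and audio from one video with ffmpeg."""
    frame_dir, audio_path = target_paths(video_path)
    frame_dir.mkdir(parents=True, exist_ok=True)
    audio_path.parent.mkdir(parents=True, exist_ok=True)
    # frames named 000001.jpg, 000002.jpg, ... at the chosen rate
    subprocess.run(["ffmpeg", "-i", str(video_path), "-vf", f"fps={fps}",
                    str(frame_dir / "%06d.jpg")], check=True)
    # mono audio resampled to the target sample rate
    subprocess.run(["ffmpeg", "-i", str(video_path), "-ar", str(sr),
                    "-ac", "1", str(audio_path)], check=True)
```

Looping extract over every file under data/raw then yields exactly the audio/ and frames/ layout the loaders expect.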
While downloading the duet videos, do we need to make a single combined folder like "xylophone flute", or do we put the same video into two separate folders, "xylophone" and "flute"?
I was trying to evaluate 16 videos using the downloaded trained model, but I am unable to see the results in the visualization. video1 and video2 have only 3 frames each with no audio, and the predicted audio is also silent.
I'm getting the following output after evaluation:
Loading weights for net_frame
Loading weights for net_synthesizer
samples: 6300
samples: 16
1 Epoch = 196 iters
Evaluating at 0 epochs...
[Eval] iter 0, loss: 0.0115
[Eval Summary] Epoch: 0, Loss: 0.0115, SDR_mixture: 0.0000, SDR: 0.0000, SIR: 0.0000, SAR: 0.0000
Plotting html for visualization...
Evaluation Done!
I hope I can get some help.
Thanks
Hello, I am a student from China.
I downloaded two solo videos (2P83WJXifEs and 3d1b4UH43-E) from 'val.csv' to evaluate the performance of the model. The final loss is 0.5479, and the quality of each separated sound is very unsatisfactory. Why is that? I hope to get your reply.
P.S. I downloaded the trained model weights for evaluation with:
> ./scripts/download_trained_model.sh
and evaluated the trained model's performance with:
> ./scripts/eval_MUSIC.sh
Hello, I am a student from China.
I have pre-processed the dataset and used train_MUSIC.sh to train the default model.
But the result is not what I expected: the metrics are all 0.
Even when I run eval_MUSIC.sh directly with the downloaded trained model, I also get 0 for all metrics (SDR, SIR, etc.).
I did not change the code you published on GitHub.
How can I find out what the problem is?
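When SDR/SIR/SAR all come out as exactly 0, a useful first step is to compare a separated waveform against its ground truth outside the evaluation pipeline, to see whether the audio itself is reasonable or whether the metric computation is the problem. The function below is a deliberately simplified SDR (a single projection, without the filter-invariance and permutation handling of mir_eval's bss_eval), intended only as a sanity check, not as a replacement for the codebase's metrics.

```python
import numpy as np

def sdr(reference, estimate, eps=1e-8):
    """Simplified signal-to-distortion ratio in dB.

    Projects the estimate onto the reference to split it into a target
    component and a residual, then returns their energy ratio in dB.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    # scale of the reference contained in the estimate
    alpha = np.dot(reference, estimate) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return 10.0 * np.log10((np.sum(target ** 2) + eps)
                           / (np.sum(noise ** 2) + eps))
```

A perfect estimate scores very high, a noisy one scores lower; if your separated outputs score reasonably here while the pipeline still reports 0, the issue is likely in how the evaluation code receives or aligns the waveforms.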
I evaluated the trained model using the weights you provided.
I see that the trained model uses the Mix-and-Separate process and reconstructs the two audios from two input solo videos; this is the validation part.
What about the test part on duet videos?
I am interested in research on sound source localization and separation for natural duet videos.
Should I train the model from scratch, or can I still use the trained model you provided?
Could you give me some suggestions, please?
Thank you! I'm looking forward to your reply.
Can I use this for audio separation only?