hangzhaomit / sound-of-pixels
Codebase for ECCV18 "The Sound of Pixels"
Home Page: http://sound-of-pixels.csail.mit.edu
License: MIT License
Hi, I saw the function forward_pixelwise in the synthesizer code; it is the version of the forward function that produces pixel-wise masks. However, throughout the code I found that only forward is invoked, not the pixel-wise version. Is there a demo that can produce pixel-wise sound?
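For anyone unsure what the pixel-wise path computes: the paper's idea is that each spatial visual feature vector is combined with the audio feature of every time-frequency bin to predict one spectrogram mask per pixel. This sketch illustrates that idea with plain numpy; it is not the repo's actual forward_pixelwise API, and the simple inner-product-plus-sigmoid combination here stands in for whatever learned synthesizer the code uses.

```python
import numpy as np

def pixelwise_masks(feat_img, feat_sound):
    """Illustrative pixel-wise mask computation (not the repo's API).

    feat_img:   (C, H, W) visual feature map.
    feat_sound: (C, T, F) audio features per time-frequency bin.
    Returns:    (H, W, T, F) masks, one spectrogram mask per pixel.
    """
    C, H, W = feat_img.shape
    _, T, F = feat_sound.shape
    v = feat_img.reshape(C, H * W)           # C x HW visual vectors
    s = feat_sound.reshape(C, T * F)         # C x TF audio vectors
    logits = v.T @ s                         # HW x TF inner products
    masks = 1.0 / (1.0 + np.exp(-logits))    # sigmoid -> ratio masks in [0, 1]
    return masks.reshape(H, W, T, F)
```

Applying each (T, F) mask to the mixture spectrogram and inverting the STFT would then give one audio track per pixel, which is what a pixel-wise demo would need to do.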
The JSON file mentioned contains a number of YouTube IDs. Do we need to download them manually from that file, or is there a better way? Also, how do we extract frames and the audio signal at the desired rates? I have never worked with JSON before, so please excuse my ignorance; some guidance would be helpful.
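There is no need to download the videos by hand: a short script can read the IDs out of the JSON file and fetch each one with a downloader such as yt-dlp. The JSON layout assumed below (instrument name mapping to a list of IDs) is a guess; inspect the released file and adjust the parsing if its structure differs.

```python
import json
import pathlib
import subprocess

def yt_url(video_id):
    """Build a full YouTube watch URL from a video ID."""
    return "https://www.youtube.com/watch?v=" + video_id

def download_all(json_path, out_root="data/raw"):
    """Download every video listed in the JSON index with yt-dlp.

    Assumed JSON layout: {"acoustic_guitar": ["M3dekVSwNjY", ...], ...}.
    Each video is saved as data/raw/<instrument>/<id>.mp4.
    """
    with open(json_path) as f:
        index = json.load(f)
    for instrument, ids in index.items():
        out_dir = pathlib.Path(out_root) / instrument
        out_dir.mkdir(parents=True, exist_ok=True)
        for vid in ids:
            # check=False so one removed video does not abort the whole run
            subprocess.run(
                ["yt-dlp", "-f", "mp4",
                 "-o", str(out_dir / "%(id)s.%(ext)s"),
                 yt_url(vid)],
                check=False)
```

Note that some of the listed videos may have been removed from YouTube since the dataset was published, so expect a few failures.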
Hello. I tried to download the trained model by running 'download_trained_model.sh', but it failed. I also tried to access the model URL "http://sound-of-pixels.csail.mit.edu/release/" directly, but got the reply "You don't have permission to access /release/ on this server." So I cannot get the trained model. How can I solve this problem?
Thanks a lot.
Sir, I first created the .csv files; they list the inputs and their paths. But during training it reports that it failed to load frames/audio.
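A "failed to load frames/audio" error during training usually means some path in the csv does not exist on disk. A quick sanity check like the one below can list the broken entries before training; the assumed row layout (audio path, frames directory, frame count) is a guess, so adjust the column indices to match your csv.

```python
import csv
import pathlib

def check_csv(csv_path):
    """Return every path in the csv that is missing on disk.

    Assumed row layout: audio_path,frames_dir,num_frames (adjust if
    your csv files use a different column order).
    """
    missing = []
    with open(csv_path) as f:
        for row in csv.reader(f):
            if len(row) < 2:
                continue  # skip blank or malformed rows
            audio, frames = row[0], row[1]
            if not pathlib.Path(audio).is_file():
                missing.append(audio)
            if not pathlib.Path(frames).is_dir():
                missing.append(frames)
    return missing
```

Running this on train.csv and val.csv and fixing (or removing) the reported rows is usually enough to get past the loading error.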
Judging from the issues, this is a common request... How do we download the video files using the JSON file, and how can we pre-process the downloaded videos into the following format?
data
├── audio
│   ├── acoustic_guitar
│   │   ├── M3dekVSwNjY.mp3
│   │   ├── ...
│   ├── trumpet
│   │   ├── STKXyBGSGyE.mp3
│   │   ├── ...
│   ├── ...
│
└── frames
    ├── acoustic_guitar
    │   ├── M3dekVSwNjY.mp4
    │   │   ├── 000001.jpg
    │   │   ├── ...
    │   ├── ...
    ├── trumpet
    │   ├── STKXyBGSGyE.mp4
    │   │   ├── 000001.jpg
    │   │   ├── ...
    │   ├── ...
    ├── ...
Are there any scripts provided for these? Thanks.
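The tree above can be produced from downloaded videos with two ffmpeg calls per file, as in this sketch. The rates used here (8 fps frames, 11025 Hz mono audio) are assumptions based on settings commonly used with this codebase; check the dataloader's arguments before committing to them.

```python
import pathlib
import subprocess

def target_paths(video_path, frames_root="data/frames", audio_root="data/audio"):
    """Map data/raw/<instrument>/<id>.mp4 to its frames dir and mp3 path."""
    video = pathlib.Path(video_path)
    instrument = video.parent.name
    # the frames directory keeps the ".mp4" name, matching the tree above
    frame_dir = pathlib.Path(frames_root) / instrument / video.name
    audio_path = pathlib.Path(audio_root) / instrument / (video.stem + ".mp3")
    return frame_dir, audio_path

def extract(video_path, fps=8, sr=11025):
    """Extract frames and audio from one video with ffmpeg."""
    frame_dir, audio_path = target_paths(video_path)
    frame_dir.mkdir(parents=True, exist_ok=True)
    audio_path.parent.mkdir(parents=True, exist_ok=True)
    # frames named 000001.jpg, 000002.jpg, ... at the chosen rate
    subprocess.run(["ffmpeg", "-i", str(video_path), "-vf", f"fps={fps}",
                    str(frame_dir / "%06d.jpg")], check=True)
    # mono audio resampled to the target sample rate
    subprocess.run(["ffmpeg", "-i", str(video_path), "-ar", str(sr),
                    "-ac", "1", str(audio_path)], check=True)
```

Looping extract over every file under data/raw then yields exactly the audio/ and frames/ layout the loaders expect.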
While downloading the duet videos, do we need to make a single combined folder like "xylophone flute", or do we put the same video into two separate folders, "xylophone" and "flute"?
I was trying to evaluate 16 videos using the downloaded trained model, but I am unable to see the results in the visualization. video1 and video2 have only 3 frames each with no audio, and the predicted audio is also silent.
I'm getting the following output after evaluation:
Loading weights for net_frame
Loading weights for net_synthesizer
samples: 6300
samples: 16
1 Epoch = 196 iters
Evaluating at 0 epochs...
[Eval] iter 0, loss: 0.0115
[Eval Summary] Epoch: 0, Loss: 0.0115, SDR_mixture: 0.0000, SDR: 0.0000, SIR: 0.0000, SAR: 0.0000
Plotting html for visualization...
Evaluation Done!
I hope I can get some help.
Thanks
Hello, I am a student from China.
I downloaded two solo videos (2P83WJXifEs and 3d1b4UH43-E) from 'val.csv' to evaluate the performance of the model. The final loss is 0.5479, and the quality of each separated sound is very unsatisfactory. Why is that? I hope to get your reply.
P.S. I downloaded the trained model weights for evaluation with:
> ./scripts/download_trained_model.sh
and evaluated the trained model's performance with:
> ./scripts/eval_MUSIC.sh
Hello, I am a student from China.
I have pre-processed the dataset and used train_MUSIC.sh to train the default model.
But the result is not what I expected: the metrics are all 0.
Even when I run eval_MUSIC.sh directly with the downloaded trained model, I also get 0 for all metrics (SDR, SIR, etc.).
I did not change the code you published on GitHub.
How can I find out what the problem is?
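When SDR/SIR/SAR all come out as exactly 0, a useful first step is to compare a separated waveform against its ground truth outside the evaluation pipeline, to see whether the audio itself is reasonable or whether the metric computation is the problem. The function below is a deliberately simplified SDR (a single projection, without the filter-invariance and permutation handling of mir_eval's bss_eval), intended only as a sanity check, not as a replacement for the codebase's metrics.

```python
import numpy as np

def sdr(reference, estimate, eps=1e-8):
    """Simplified signal-to-distortion ratio in dB.

    Projects the estimate onto the reference to split it into a target
    component and a residual, then returns their energy ratio in dB.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    # scale of the reference contained in the estimate
    alpha = np.dot(reference, estimate) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return 10.0 * np.log10((np.sum(target ** 2) + eps)
                           / (np.sum(noise ** 2) + eps))
```

A perfect estimate scores very high, a noisy one scores lower; if your separated outputs score reasonably here while the pipeline still reports 0, the issue is likely in how the evaluation code receives or aligns the waveforms.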
I evaluated the trained model using the weights you provided.
I see that the trained model uses the Mix-and-Separate process and reconstructs the two audios from two input solo videos; this is the validation part.
What about the test part on duet videos?
I am interested in research on sound source localization and separation for natural duet videos.
Should I train the model from scratch, or can I still use the trained model you provided?
Could you give me some suggestions, please?
Thank you! I'm looking forward to your reply.
Can I use this for audio separation only?