hideandspeak's Issues
Pretrained model
I'm interested in your work and would like to follow it. If you could provide pretrained models and audio samples (with a valid link), I could test your samples directly. Thanks.
STFT and exploding training
Hi Felix,
First of all, thanks for sharing this repo; it's amazingly interesting work. Since the datasets you used are, to my knowledge, paid and only available at 16 kHz and 8 kHz, respectively, I wanted to train your network on the VCTK dataset. For training, I chose only short files with high activity from the trimmed version of the dataset, to avoid long sections of silence. However, when I do so, training goes well for a few epochs and the message becomes intelligible very quickly, but then the loss (I am using L1) explodes to a value orders of magnitude higher, as shown here. Have you observed this pattern during your training runs?
As a second question, I wanted to know why you used your own STFT implementation rather than the torchaudio ones, which allow backpropagation.
Thanks!
About the YOHO dataset
Hello, I'm a graduate student from China. I'm interested in your paper and would like to try to reproduce it, but after searching online for a long time I still can't find the YOHO dataset. Could you provide it to me? I promise to use it for private purposes only. Thank you very much!
Inquiry about Message Retrieval in Time Domain and SNR Calculation
Hi Felix,
I hope this message finds you well. I came across your repository and found your paper implementation to be very interesting. I'm quite intrigued by your research and would like to follow your work more closely. In this regard, I have a question regarding the calculation of the Signal-to-Noise Ratio (SNR) mentioned in the paper.
While examining your code, I noticed that it provides two methods for waveform recovery: one using the original phase and another using the Griffin-Lim algorithm. However, the paper does not explicitly state how the message is retrieved from the spectrogram and converted back to a time-domain waveform. I ran some experiments on the VCTK dataset with the default settings and found that recovering the message waveform with the original phase gave a time-domain SNR better than the one reported in the paper (14.34 vs 8.76). When I used the Griffin-Lim algorithm for waveform recovery instead, the time-domain SNR was -2.53. Although the message remains intelligible in both cases, both results differ significantly from the value reported in the paper (8.76).
I would greatly appreciate it if you could provide some clarification regarding how the message is returned to the time domain waveform and how the SNR is calculated in the context of your paper. Any additional insights or guidance you can provide would be highly valuable to me.
Thank you for your time and consideration. I look forward to your response.
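For reference, here is a minimal sketch of the time-domain SNR I have in mind, assuming the usual definition of 10·log10(signal power / error power). The function name `snr_db` and the synthetic sine signals are my own illustration, not taken from the repository or the paper. The example also shows that a waveform with an identical magnitude spectrum but a different phase can score a negative time-domain SNR, which may be part of what happens when the phase is estimated (as with Griffin-Lim) rather than reused.

```python
import numpy as np


def snr_db(reference, estimate):
    """Time-domain SNR in dB: 10 * log10(sum(ref^2) / sum((ref - est)^2)).

    Assumed definition for illustration; the paper may compute SNR differently.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    error_power = np.sum((reference - estimate) ** 2)
    if error_power == 0.0:
        return float("inf")  # perfect reconstruction
    return 10.0 * np.log10(np.sum(reference ** 2) / error_power)


# Synthetic stand-in for a message waveform: one second of a 440 Hz tone.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
message = np.sin(2 * np.pi * 440 * t)

# Case 1: same phase, small additive distortion.
distorted = message + 0.1 * np.sin(2 * np.pi * 1000 * t)

# Case 2: identical magnitude spectrum, but the phase is off by 90 degrees.
phase_shifted = np.sin(2 * np.pi * 440 * t + np.pi / 2)

print("additive distortion SNR (dB):", snr_db(message, distorted))
print("phase-shifted SNR (dB):", snr_db(message, phase_shifted))
```

With this definition, the phase-shifted tone scores below 0 dB even though it sounds the same as the reference, so a time-domain SNR computed on a Griffin-Lim reconstruction is not directly comparable to one computed with the original phase.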