Comments (4)
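@duguiming111 The author uses mfcc20, which means one bin corresponds to 5 video frames. (The window step used when extracting the MFCCs is 10 ms; you can look up the meaning of the opt.Ts parameter in savemfcc.m. So 4 MFCC vectors cover 40 ms, which is exactly one frame, because the author fixes the video frame rate at 25 frames/s and each frame therefore lasts 40 ms. Hence 20 MFCC vectors correspond to 5 frames.) The author then wants each bin to correspond to the middle one of its five frames, so the first bin corresponds to frame 2 of the video (counting from 0). The extraction code of course saves every frame image, but during actual training it is the first bin, i.e. 2.bin, that is paired with frame 2, i.e. 2.jpg, as input to the network. To make forming these matched pairs convenient, the author wrote 2:end, so the generated bin labels start from 2. The 2:26 is presumably because the author takes only the first 1 second (i.e. 25 frames) of each sample as training data.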
from talking-face-generation-davs.
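The arithmetic in the comment above can be sketched in a few lines of Python. This is only an illustration, not the repository's code: the constant and function names are hypothetical, and it assumes that consecutive bins are shifted by exactly one video frame (4 MFCC vectors), which the comment implies but does not state outright.

```python
# Values taken from the comment above.
MFCC_HOP_MS = 10      # step between MFCC vectors (opt.Ts in savemfcc.m, per the comment)
FPS = 25              # fixed video frame rate
MFCCS_PER_BIN = 20    # "mfcc20": 20 MFCC vectors per audio bin

FRAME_MS = 1000 // FPS                              # 40 ms per video frame
MFCCS_PER_FRAME = FRAME_MS // MFCC_HOP_MS           # 4 MFCC vectors per frame
FRAMES_PER_BIN = MFCCS_PER_BIN // MFCCS_PER_FRAME   # 5 frames per bin
CENTER_OFFSET = FRAMES_PER_BIN // 2                 # middle frame offset: 2


def mfcc_range_for_label(label):
    """Return (start, stop) MFCC row indices for the bin labelled `label`.

    The label equals the 0-based index of the bin's middle frame, so
    label 2 is the first bin. Assumes a stride of one frame between bins.
    """
    if label < CENTER_OFFSET:
        raise ValueError("the first valid label is %d" % CENTER_OFFSET)
    start = (label - CENTER_OFFSET) * MFCCS_PER_FRAME
    return start, start + MFCCS_PER_BIN


def labels_for_first_second():
    """Labels of the 25 bin/frame pairs covering the first second (the 2:26 range)."""
    return list(range(CENTER_OFFSET, CENTER_OFFSET + FPS))


print(mfcc_range_for_label(2))          # MFCC rows covered by 2.bin, paired with 2.jpg
print(labels_for_first_second()[-1])    # last label in the first second
```

Under these assumptions, 2.bin covers MFCC rows 0 to 19 (frames 0 to 4) and is paired with 2.jpg, and the labels for the first second run from 2 to 26 inclusive.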
Hi @duguiming111, the audio window is longer than the span of one video frame. Specifically, one audio bin corresponds to 5 frames, so the first audio bin matches the frame with index 2. The -600 is there because the processed audio is slightly longer than the video; the value was chosen empirically.
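@Hangz-nju-cuhk Hello. For example, if I take a 5-minute audio clip and generate 5000 bin files, then images 2 to 5001 will also be generated in the end. If I combine these images directly with the audio, will the mouth shapes line up with the speech?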
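@ZhengMengbin Hello, I have been reading this code recently and have a question. In process256_224.py, do the _mouth.txt_filenames.txt files need to be generated by ourselves? The downloaded LRW dataset does not contain these folders.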