Some questions about audio feature extraction (talking-face-generation-davs, 4 comments, open)

hangz-nju-cuhk commented on July 19, 2024

Some questions about audio feature extraction

from talking-face-generation-davs.

Comments (4)

ZhengMengbin commented on July 19, 2024

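@duguiming111 The author uses mfcc20, meaning one bin corresponds to 5 video frames. When the MFCCs are extracted, the window step is 10 ms (you can look up what the opt.Ts parameter in savemfcc.m means), so 4 MFCC vectors cover 40 ms, which is exactly one frame: the author fixes the video frame rate at 25 frames/s, so each frame lasts 40 ms. Hence 20 MFCC vectors correspond to 5 frames.

The author then wants each bin to correspond to the middle frame of those five, so the first bin corresponds to frame 2 of the video (counting from 0). The extraction code of course saves every frame image, but in actual training the first bin, i.e. 2.bin, is paired with frame 2, i.e. 2.jpg, and the pair is fed to the network. To make forming these matched pairs convenient, the author wrote 2:end, so the generated bin indices start at 2. As for 2:26, that is probably because the author takes only the first 1 second (i.e. 25 frames) of each sample as training data.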
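The arithmetic above can be checked with a small sketch. It is not the repository's code: the constant and function names are hypothetical, and it assumes, as the 2.bin / 2.jpg pairing suggests, that consecutive bins are shifted by one video frame and that the bin labelled i is centred on frame i.

```python
# Values quoted in the comment above; the names are hypothetical.
MFCC_STEP_MS = 10    # step between MFCC vectors (opt.Ts in savemfcc.m, per the comment)
VIDEO_FPS = 25       # fixed video frame rate
MFCCS_PER_BIN = 20   # "mfcc20": MFCC vectors stored in one bin


def frame_duration_ms(fps: int = VIDEO_FPS) -> float:
    """Duration of one video frame in milliseconds."""
    return 1000 / fps


def frames_per_bin() -> float:
    """Number of video frames spanned by the audio in one bin."""
    return MFCCS_PER_BIN * MFCC_STEP_MS / frame_duration_ms()


def frames_covered_by_bin(label: int) -> list[int]:
    """0-based frames covered by the bin with this label.

    Assumption: the bin labelled i is centred on frame i.
    """
    half = int(frames_per_bin()) // 2
    return list(range(label - half, label + half + 1))


def labels_for_first_second() -> list[int]:
    """Python equivalent of the inclusive MATLAB range 2:26."""
    return list(range(2, 27))


if __name__ == "__main__":
    print(frame_duration_ms())            # 40.0 ms per frame
    print(frames_per_bin())               # 5.0 frames per bin
    print(frames_covered_by_bin(2))       # [0, 1, 2, 3, 4]
    print(len(labels_for_first_second())) # 25
```

Under these assumptions the first usable label is 2 because a bin centred on frame 0 or 1 would need audio from before the start of the clip.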

Hangz-nju-cuhk commented on July 19, 2024

Hi @duguiming111, the audio window is longer than the span of one frame. Specifically, one audio bin corresponds to 5 frames, so the first audio bin matches the frame with index 2. The -600 is there because the processed audio is slightly longer than the video. The value was chosen empirically.

duguiming111 commented on July 19, 2024

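@Hangz-nju-cuhk Hello. As an example: if I take a 5-minute audio clip and generate 5000 bin files, then images 2-5001 will be generated in the end. If I combine those images directly with the audio, will the mouth shapes line up with the speech?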

lzkzls commented on July 19, 2024

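@ZhengMengbin Hello, I have been reading this code recently and have a question. In process256_224.py, do the _mouth.txt_filenames.txt files need to be generated by ourselves? The downloaded LRW dataset does not contain these folders.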
