
Comments (4)

ZhengMengbin commented on July 19, 2024

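@duguiming111 The author uses mfcc20, which means one bin corresponds to 5 video frames. When the MFCCs are extracted, the window step is 10 ms (look up what the opt.Ts parameter in savemfcc.m means), so 4 MFCCs span 40 ms, which is exactly one frame: the author fixes the video frame rate at 25 frames/s, so each frame lasts 40 ms. Hence 20 MFCCs correspond to 5 frames.

The author then wants each bin to be matched with the middle frame of its five, so the first bin corresponds to frame 2 of the video (counting from 0). The extraction code still saves every frame image, of course, but during training it is the first bin, i.e. 2.bin, that is paired with frame 2, i.e. 2.jpg, as input to the network. To make building these pairs convenient, the author wrote 2:end, so the generated bin indices start at 2. As for 2:26, the author is presumably taking only the first second (25 frames) of each sample as training data.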
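The arithmetic in this comment can be checked with a minimal sketch. It is not the repository's code: the constant and function names are hypothetical, and the values (25 frames/s, 10 ms MFCC step, 20 MFCC steps per bin, the MATLAB range 2:26) are taken from the explanation above.

```python
# Hypothetical names; values come from the comment above, not from the repo.
FPS = 25            # video frame rate fixed by the author
MFCC_STEP_MS = 10   # step between MFCC windows (see opt.Ts in savemfcc.m)
MFCC_PER_BIN = 20   # "mfcc20": MFCC steps stored in one audio bin


def frames_per_bin(fps=FPS, step_ms=MFCC_STEP_MS, mfcc_per_bin=MFCC_PER_BIN):
    """Number of video frames covered by one audio bin."""
    frame_ms = 1000 / fps                # 40 ms per frame at 25 frames/s
    mfcc_per_frame = frame_ms / step_ms  # 4 MFCC steps per frame
    return int(mfcc_per_bin / mfcc_per_frame)


def center_frame_offset(n_frames):
    """0-based offset of the middle frame within a bin's span."""
    return n_frames // 2


def training_pairs(first, last):
    """(bin, frame) file names for an inclusive index range, like MATLAB first:last."""
    return [(f"{i}.bin", f"{i}.jpg") for i in range(first, last + 1)]


n = frames_per_bin()
offset = center_frame_offset(n)
pairs = training_pairs(offset, 26)  # mirrors the 2:26 range discussed above
print(n, offset, len(pairs), pairs[0])
```

Because the bin index equals the index of its center frame, a pair is just two file names that share a number, which is why starting the range at 2 keeps the bookkeeping simple.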

from talking-face-generation-davs.

Hangz-nju-cuhk commented on July 19, 2024

Hi @duguiming111, the audio window is longer than the span of one frame. Specifically, one audio bin corresponds to 5 frames, so the first audio bin matches the frame with index 2. The -600 is there because the processed audio is slightly longer than the video; the value was chosen empirically.


duguiming111 commented on July 19, 2024

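@Hangz-nju-cuhk Hello. Suppose, for example, that I take a 5-minute audio clip and generate 5000 bin files, so that images 2 through 5001 are generated in the end. If I combine those images directly with the audio, will the mouth shapes line up with the speech?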


lzkzls commented on July 19, 2024

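@ZhengMengbin Hello, I have been reading this code recently and have a question. In process256_224.py, do files such as _mouth.txt_filenames.txt need to be generated by the user? The LRW dataset I downloaded does not contain these folders.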


