Comments (28)
You can refer to collect_avspeech.py, but you have to do more:
- download the dataset using collect_avspeech.py; you will need to modify some code
- data cleaning, because AVSpeech has a lot of noise and broken portions; use face detection
- convert to 25 fps to train more effectively
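The 25 fps conversion step above can be sketched by driving ffmpeg from Python. This is a minimal sketch of what a script like convert2fps.py might do; `build_fps_cmd` and `convert_dir` are hypothetical names, and the codec/CRF settings are illustrative choices, not necessarily the repo's:

```python
import subprocess
from pathlib import Path

def build_fps_cmd(src, dst, fps=25):
    """Build an ffmpeg command that re-encodes a clip at a fixed frame rate."""
    return ["ffmpeg", "-y", "-i", str(src),
            "-r", str(fps),                 # force a constant output frame rate
            "-c:v", "libx264", "-crf", "18",
            "-c:a", "aac",                  # re-encode audio so it stays muxable
            str(dst)]

def convert_dir(src_dir, dst_dir, fps=25):
    """Convert every .mp4 under src_dir to `fps` frames per second."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for clip in Path(src_dir).glob("*.mp4"):
        subprocess.run(build_fps_cmd(clip, dst / clip.name, fps), check=True)
```

Re-encoding (rather than `-c copy`) is needed here because changing the frame rate requires new video frames.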
from wav2lip_288x288.
Thanks @primepake. I am new to computer vision. For the 2nd step you mentioned, data cleaning, is there a script in the repo for it?
And @primepake, when I run collect_avspeech.py, I get an error.
I also added clear_data.py and convert2fps.py.
You will need to modify some code, because I changed a lot in my code to adapt it to my dataset, so you should understand it and change it accordingly.
Thanks @primepake, I need your help understanding it.
Below is my directory structure:
Now, when I apply clear_data.py with `parser.add_argument('--data_root', help='dataset', default='/content/wav2lip_288x288/data_root/main/5535496873950688380/', type=str)`
![image](https://user-images.githubusercontent.com/81796368/153756473-8fe68848-f25f-4db6-ad04-4900dabf5f3c.png)
How many hours of AVSpeech did you use? How many GBs?
Did you sync-correct it, or did you just download and preprocess it?
Thanks!
> How many hours of AVSpeech did you use? How many GBs? Did you sync-correct it, or did you just download and preprocess it?
> Thanks!
Sync correction is important for training the SyncNet discriminator.
And, of course, sync correction is part of preprocessing.
@NikitaKononov, thanks for your answer.
Which tool do you use to check whether a downloaded video is out of sync?
If a video is out of sync, how do you fix it?
Thanks!
> @NikitaKononov, thanks for your answer. Which tool do you use to check whether a downloaded video is out of sync? If a video is out of sync, how do you fix it?
> Thanks!
You are welcome.
I know only one tool for sync checking: syncnet_python. It can provide the offset between the audio and video streams.
Then you can correct the offset via ffmpeg using the calculated value.
I recommend correcting only videos with an offset of 3 frames or less; videos beyond that should be deleted from the data.
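The correction step described above (measure the offset with syncnet_python, then shift the audio with ffmpeg) could be sketched as follows. `build_shift_cmd` is a hypothetical helper, and the sign convention of SyncNet's reported offset should be verified on a clip with a known shift before trusting it in bulk:

```python
def offset_to_seconds(offset_frames, fps=25.0):
    """Convert a SyncNet frame offset to seconds at the clip's frame rate."""
    return offset_frames / fps

def build_shift_cmd(video, out, offset_frames, fps=25.0):
    """Build an ffmpeg command that shifts the audio stream by the offset.

    Note: whether a positive SyncNet offset means the audio leads or lags
    the video is an assumption here; check it against a known-shifted clip.
    """
    shift = offset_to_seconds(offset_frames, fps)
    return ["ffmpeg", "-y",
            "-i", video,                   # input 0: source of the video stream
            "-itsoffset", f"{shift:.3f}",  # delay input 1's timestamps
            "-i", video,                   # input 1: source of the audio stream
            "-map", "0:v", "-map", "1:a",
            "-c", "copy", out]
```

Stream copy (`-c copy`) works here because only the timestamps move; nothing is re-encoded.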
@NikitaKononov Do you have any script for it?
Can anybody please share the scripts for downloading the dataset and preprocessing it?
I have been asking for detailed training instructions, but I am not getting any response.
> Can anybody please share the scripts for downloading the dataset and preprocessing it? I have been asking for detailed training instructions, but I am not getting any response.
You can write them yourself.
Use youtube-dl, the S3FD face detector, and syncnet_python.
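The download half of that pipeline could be sketched like this, assuming the AVSpeech CSV uses the `youtube_id,start,end,x,y` column layout of the public release. Function names are hypothetical, and yt-dlp (the maintained successor of youtube-dl) is used because it supports clip-level downloads:

```python
import csv

def build_download_cmd(youtube_id, start_s, end_s, out_path):
    """Build a yt-dlp command that downloads one AVSpeech segment."""
    url = f"https://www.youtube.com/watch?v={youtube_id}"
    return ["yt-dlp", url,
            "--download-sections", f"*{start_s}-{end_s}",  # clip, not full video
            "-f", "mp4",
            "-o", out_path]

def iter_avspeech(csv_path):
    """Yield (youtube_id, start, end) rows from an AVSpeech-style CSV.

    Assumes the youtube_id,start,end,x,y column layout; adjust indices
    if your CSV differs.
    """
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            yield row[0], float(row[1]), float(row[2])
```

Face detection and sync checking would then run on each downloaded segment, as described elsewhere in this thread.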
@NikitaKononov If you already have those scripts, it would be a great help.
> @NikitaKononov If you already have those scripts, it would be a great help.
Sorry, I can't share them because of an NDA.
Writing those scripts doesn't seem very hard.
Getting SyncNet to converge is much harder.
You need to process your dataset carefully; this is a big problem. GANs are hard to train.
> You need to process your dataset carefully; this is a big problem. GANs are hard to train.
I have successfully trained Wav2Lip without the GAN.
My dataset is fine: 100k video fragments of 5-10 s, with a face in every frame, and the videos are in sync.
While training the GAN, the fake and real losses are dropping, but the percep loss is going up.
The sync eval loss is 6-9 during evaluation. I set syncnet_wt to 0.03 at 190k steps manually, but it does not take effect: the percep loss keeps increasing and the sync loss stays high, ~7.5.
In the samples, it seems that the model is COPYING the face part.
Maybe you have some tips to make progress in GAN training? Thanks
The reason is that the discriminator doesn't recognize your predicted image as fake.
> The reason is that the discriminator doesn't recognize your predicted image as fake.

So should I include SyncNet in the training process (set syncnet_wt to 0.03) earlier?
> The reason is that the discriminator doesn't recognize your predicted image as fake.

`L1: 0.0477145505386447, Sync: 7.50645101161843, Percep: 2.509135925100947 | Fake: 0.2972304174135471, Real: 0.3283990882806723, Global Step: 235224`
I guess @primepake is going to release all the code with instructions soon.
@primepake, when can we expect the complete code release?
> I guess @primepake is going to release all the code with instructions soon. @primepake, when can we expect the complete code release?
As I understand it, he doesn't plan to release any additional code or instructions.
Well, he mentioned he will: #21 (comment)
> Well, he mentioned he will: #21 (comment)
Great news. I hadn't seen those comments, sorry for the wrong information.
@NikitaKononov Can you please share how exactly you used syncnet_python?
@NikitaKononov When I run syncnet_python I get the error below:

```
WARNING: Audio (3.6720s) and video (3.7200s) lengths are different.
Traceback (most recent call last):
  File "run_syncnet.py", line 40, in <module>
    offset, conf, dist = s.evaluate(opt, videofile=fname)
  File "/home/ubuntu/wav2lip_288x288/syncnet_python/SyncNetInstance.py", line 112, in evaluate
    im_out = self.__S__.forward_lip(im_in.cuda());
  File "/home/ubuntu/wav2lip_288x288/syncnet_python/SyncNetModel.py", line 108, in forward_lip
    out = self.netfclip(mid);
  File "/home/ubuntu/anaconda3/envs/wav2lip/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/wav2lip/lib/python3.7/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/ubuntu/anaconda3/envs/wav2lip/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/wav2lip/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (4x278528 and 512x512)
```

Do you know how to resolve this?
> @NikitaKononov, thanks for your answer. Which tool do you use to check whether a downloaded video is out of sync? If a video is out of sync, how do you fix it? Thanks!

> You are welcome. I know only one tool for sync checking: syncnet_python. It can provide the offset between the audio and video streams. Then you can correct the offset via ffmpeg using the calculated value. I recommend correcting only videos with an offset of 3 frames or less; videos beyond that should be deleted from the data.
Did you use this ffmpeg command?
`ffmpeg -i input.mp4 -itsoffset 00:00:03.0 -i input.mp4 -vcodec copy -acodec copy -map 1:0 -map 0:1 output_shift3s.mp4`
And is the 00:00:03.0 the offset?
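One point worth noting about the command above: `-itsoffset` takes a time value, while syncnet_python reports the offset in frames, so 3 frames at 25 fps is 0.12 s, not 3 s. A small hypothetical helper for that conversion:

```python
def frames_to_timestamp(offset_frames, fps=25.0):
    """Convert a SyncNet frame offset to the HH:MM:SS.mmm form
    that ffmpeg's -itsoffset option accepts."""
    secs = offset_frames / fps
    sign = "-" if secs < 0 else ""
    secs = abs(secs)
    hours, rem = divmod(secs, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{sign}{int(hours):02d}:{int(minutes):02d}:{seconds:06.3f}"
```

A negative result delays the other stream; ffmpeg accepts negative `-itsoffset` values directly.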
> @NikitaKononov, thanks for your answer. Which tool do you use to check whether a downloaded video is out of sync? If a video is out of sync, how do you fix it? Thanks!

> You are welcome. I know only one tool for sync checking: syncnet_python. It can provide the offset between the audio and video streams. Then you can correct the offset via ffmpeg using the calculated value. I recommend correcting only videos with an offset of 3 frames or less; videos beyond that should be deleted from the data.
`offset, confidence, min_distance = s.evaluate(opt, videofile=video)`
syncnet_python outputs offset, confidence, and min_distance. Do you mean that we only need to pay attention to the offset?
Then use ffmpeg to bring videos with offset ∈ [-3, 3] into [-1, 1], and discard the remaining videos directly?
Is this correct?
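The keep/correct/discard decision described above might be sketched as a tiny helper. The thresholds follow the 3-frame rule suggested earlier in the thread; the function name and exact cutoffs are hypothetical:

```python
def triage(offset_frames, max_correctable=3):
    """Decide what to do with a clip given its SyncNet offset in frames."""
    if abs(offset_frames) <= 1:
        return "keep"      # within one frame: effectively in sync already
    if abs(offset_frames) <= max_correctable:
        return "correct"   # shift the audio with ffmpeg, then re-check
    return "drop"          # too far off: likely dubbed or edited, discard
```

After correcting a clip, re-running syncnet_python on the output is a cheap way to confirm the shift went in the right direction.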
@ayush714 Hello, everyone. Where is the collect_avspeech.py file? Please help me, thank you very much.