😀👋 Has anyone figured this out?
from wav2lip_288x288.
I finally found a better alternative to wav2lip 😄
Could you share which one it is?
Absolutely, I already posted it here: #35 (comment)
Thanks. Actually, I have already tried these methods, but DINet uses DeepSpeech for audio encoding, which does not work well on Chinese audio. As for NeRF-based methods, they suffer from the same audio-encoding problem as well as poor generation quality.
DINet can actually work with any voice, but you need to train it from scratch and modify a lot of things in the SyncNet training stage.
There seems to be no SyncNet training script in DINet. I also saw your issue DINet/issues/28; will you share the SyncNet training script? Also, do you have any ideas on how to modify SyncNet?
It's essentially the same as Wav2Lip's SyncNet training: https://github.com/Rudrabha/Wav2Lip/blob/master/color_syncnet_train.py
The SyncNet architecture is here: https://github.com/MRzzm/DINet/blob/master/models/Syncnet.py
The embedding size should be 128.
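For reference, Wav2Lip's `color_syncnet_train.py` scores an audio/video pair by the cosine similarity of their embeddings and trains that score with binary cross-entropy against an in-sync (1) or off-sync (0) label. The sketch below restates that loss in plain Python for 128-dimensional embeddings. The function names are hypothetical, it assumes non-negative embeddings (e.g. from a ReLU output layer) so the cosine already lies in [0, 1], and it is not DINet's code.

```python
import math

EMBED_DIM = 128  # embedding size suggested above


def cosine_similarity(a, v, eps=1e-8):
    """Cosine similarity between an audio embedding and a video embedding."""
    dot = sum(x * y for x, y in zip(a, v))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in v))
    return dot / max(norm, eps)


def binary_cross_entropy(p, y, eps=1e-7):
    """BCE for one prediction p in [0, 1] and label y in {0, 1}; p is clamped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def sync_loss(audio_embs, video_embs, labels):
    """Mean BCE over a batch; label 1 marks an in-sync pair, 0 an off-sync pair.

    Assumes non-negative embeddings, so the cosine similarity is a valid probability.
    """
    losses = [
        binary_cross_entropy(cosine_similarity(a, v), y)
        for a, v, y in zip(audio_embs, video_embs, labels)
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    same = [1.0] * EMBED_DIM
    half_a = [1.0] * 64 + [0.0] * 64
    half_b = [0.0] * 64 + [1.0] * 64
    # identical pair labeled in-sync, orthogonal pair labeled off-sync: loss close to 0
    print(sync_loss([same, half_a], [same, half_b], [1, 0]))
```

Because the loss needs both labels, the data pipeline also has to supply off-sync pairs, not just aligned ones.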
Much appreciated, I will give it a try!
@primepake @zhixiongzuo Great! Could you share the SyncNet training script for DINet? Thanks.
@primepake Is this correct?
```python
from models.Syncnet import SyncNet
from config.config import DINetTrainingOptions
from torch.utils.data import DataLoader
from dataset.dataset_DINet_clip import DINetDataset
from utils.training_utils import get_scheduler, update_learning_rate
import os
import random

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

if __name__ == "__main__":
    '''
    training code of SyncNet
    in the resolution you want, using clip training code after frame training
    '''
    # load config
    opt = DINetTrainingOptions().parse_args()
    random.seed(opt.seed)
    np.random.seed(opt.seed)
    torch.manual_seed(opt.seed)
    torch.cuda.manual_seed(opt.seed)
    # init network
    net_lipsync = SyncNet(15, 29, 128).cuda()
    criterionMSE = nn.MSELoss().cuda()
    # set label of syncnet perception loss (every clip is treated as in-sync)
    real_tensor = torch.tensor(1.0).cuda()
    # setup optimizer
    optimizer_s = optim.Adam(net_lipsync.parameters(), lr=opt.lr_g)
    # set scheduler
    net_s_scheduler = get_scheduler(optimizer_s, opt.non_decay, opt.decay)
    # load training data
    train_data = DINetDataset(opt.train_data, opt.augment_num, opt.mouth_region_size)
    training_data_loader = DataLoader(dataset=train_data, batch_size=opt.batch_size,
                                      shuffle=True, drop_last=True, num_workers=4)
    train_data_length = len(training_data_loader)
    # start train
    for epoch in range(opt.start_epoch, opt.non_decay + opt.decay + 1):
        net_lipsync.train()
        for iteration, data in enumerate(training_data_loader):
            # forward
            optimizer_s.zero_grad()
            source_clip, source_clip_mask, reference_clip, deep_speech_clip, deep_speech_full = data
            # assuming source_clip is (B, 5, 3, H, W): stack the 5 frames along channels -> (B, 15, H, W)
            source_clip = torch.cat(torch.split(source_clip, 1, dim=1), 0).squeeze(1).float().cuda()
            source_clip = torch.cat(torch.split(source_clip, opt.batch_size, dim=0), 1).cuda()
            deep_speech_full = deep_speech_full.float().cuda()
            ## sync perception loss
            # crop the mouth region from the stacked frames
            source_clip_mouth = source_clip[:, :, train_data.radius:train_data.radius + train_data.mouth_region_size,
                                            train_data.radius_1_4:train_data.radius_1_4 + train_data.mouth_region_size]
            sync_score = net_lipsync(source_clip_mouth, deep_speech_full)
            loss_sync = criterionMSE(sync_score, real_tensor.expand_as(sync_score))
            loss_sync.backward()
            optimizer_s.step()
            print(
                "===> Epoch[{}]({}/{}): Loss_Sync: {:.4f} lr_g = {:.7f} ".format(
                    epoch, iteration, train_data_length, float(loss_sync),
                    optimizer_s.param_groups[0]['lr']))
        update_learning_rate(net_s_scheduler, optimizer_s)
        # checkpoint
        if epoch % opt.checkpoint == 0:
            os.makedirs(opt.result_path, exist_ok=True)
            model_out_path = os.path.join(opt.result_path, 'netS_model_epoch_{}.pth'.format(epoch))
            states = {
                'epoch': epoch + 1,
                'state_dict': {'net_s': net_lipsync.state_dict()},
                'optimizer': {'net_s': optimizer_s.state_dict()}
            }
            torch.save(states, model_out_path)
            print("Checkpoint saved to {}".format(model_out_path))
```
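As posted, every clip gets the target 1.0, so nothing stops the network from scoring everything as in-sync. Wav2Lip's SyncNet training instead draws an off-sync example about half of the time: the audio stays tied to one window while the video comes from a different, randomly chosen window, and that pair is labeled 0. A minimal sketch of that sampling step in plain Python, with hypothetical names and frame indices standing in for the actual mouth crops and audio features:

```python
import random


def sample_sync_pair(num_frames, window=5, rng=random):
    """Pick (video_start, audio_start, label) for one SyncNet training example.

    With probability 0.5 the video window matches the audio window (label 1);
    otherwise the video window starts at a different frame (label 0).
    Requires num_frames > window so that at least two distinct windows exist.
    """
    if num_frames <= window:
        raise ValueError("need more frames than the window length to draw negatives")
    last_start = num_frames - window
    audio_start = rng.randint(0, last_start)
    if rng.random() < 0.5:
        return audio_start, audio_start, 1  # in-sync pair
    video_start = audio_start
    while video_start == audio_start:  # redraw until the video window differs
        video_start = rng.randint(0, last_start)
    return video_start, audio_start, 0  # off-sync pair


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(4):
        print(sample_sync_pair(num_frames=100, rng=rng))
```

The labels produced this way are what a BCE-style sync loss expects; with an MSE target they would replace the constant `real_tensor`.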
That's good, but in my implementation I used BCE loss instead of MSE, because MSE can be inefficient.
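One common explanation for why MSE trains less efficiently than BCE here: if the sync score is a sigmoid probability p = sigmoid(z), the MSE gradient with respect to the logit z carries an extra p(1 - p) factor that vanishes when the prediction is confidently wrong, while the BCE gradient stays close to 1 in magnitude. A small plain-Python check, assuming a sigmoid output (the thread does not say how the score is produced):

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad_bce(z, y):
    """d/dz of BCE(sigmoid(z), y), which simplifies to sigmoid(z) - y."""
    return sigmoid(z) - y


def grad_mse(z, y):
    """d/dz of (sigmoid(z) - y)**2, which is 2 * (p - y) * p * (1 - p)."""
    p = sigmoid(z)
    return 2.0 * (p - y) * p * (1.0 - p)


if __name__ == "__main__":
    # target y = 1 (in-sync); negative z means the model confidently says "off-sync"
    for z in (-6.0, -2.0, 0.0, 2.0):
        print(f"z={z:+.1f}  p={sigmoid(z):.4f}  "
              f"dBCE/dz={grad_bce(z, 1):+.4f}  dMSE/dz={grad_mse(z, 1):+.4f}")
```

At z = -6 the BCE gradient is close to -1 while the MSE gradient is roughly 200 times smaller, so badly misclassified pairs barely move the weights under MSE.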
Can you share it, please? I have a group of people focused on DINet, and it could be useful there.