Hello, what are the steps to modify SyncNet so that it trains successfully?

Steps to modify and train syncnet successfully? about wav2lip_288x288 HOT 13 CLOSED

davidmartinrius commented on June 8, 2024

Steps to modify and train syncnet successfully?

from wav2lip_288x288.

Comments (13)

davidmartinrius commented on June 8, 2024

😀👋 Has anyone figured this out?

davidmartinrius commented on June 8, 2024

I finally found a better alternative than wav2lip 😄

zhixiongzuo commented on June 8, 2024

I finally found a better alternative than wav2lip 😄

Could you share which one is better?

davidmartinrius commented on June 8, 2024

Absolutely, I already posted it here: #35 (comment)

zhixiongzuo commented on June 8, 2024

Thanks. Actually, I have tried these methods, but DINet uses DeepSpeech for audio encoding, which does not work well on Chinese audio. As for NeRF-based methods, they suffer from the same audio-encoding problem as well as poor generation quality.

commented on June 8, 2024

DINet can actually work on any voice, but you need to train it from scratch and modify a lot of things in the SyncNet training stage.

zhixiongzuo commented on June 8, 2024

There seems to be no SyncNet training script in DINet. I also saw your issue DINet/issues/28. Will you share the SyncNet training script? Also, any ideas about how to modify SyncNet?

commented on June 8, 2024

It's basically based on the SyncNet training of Wav2Lip: https://github.com/Rudrabha/Wav2Lip/blob/master/color_syncnet_train.py
The SyncNet architecture is here: https://github.com/MRzzm/DINet/blob/master/models/Syncnet.py
The embedding size should be 128.
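
For readers who have not opened the Wav2Lip script: as far as I can tell, its SyncNet is trained on a mix of in-sync and off-sync audio/video pairs, and the loss is binary cross-entropy applied to the cosine similarity between the two embeddings. The sketch below illustrates that idea only. The function name `cosine_bce_loss`, the random tensors, and the explicit ReLU and clamp are assumptions made for illustration, not code from either repository; the 128 matches the embedding size mentioned above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

bce = nn.BCELoss()

def cosine_bce_loss(audio_emb, video_emb, labels):
    """BCE on cosine similarity.

    labels has shape (B, 1): 1.0 for in-sync pairs, 0.0 for off-sync pairs.
    """
    # ReLU keeps the embeddings non-negative, so the cosine similarity
    # lies in [0, 1], which is the input range nn.BCELoss requires.
    a = F.normalize(F.relu(audio_emb), p=2, dim=1)
    v = F.normalize(F.relu(video_emb), p=2, dim=1)
    # Clamp guards against tiny floating-point overshoot above 1.0.
    sim = F.cosine_similarity(a, v, dim=1).clamp(0.0, 1.0)
    return bce(sim.unsqueeze(1), labels)

torch.manual_seed(0)
emb = torch.rand(4, 128) + 0.1  # stand-in for 128-d embeddings

# Identical embeddings labelled "in sync" give a loss near zero ...
loss_same = cosine_bce_loss(emb, emb, torch.ones(4, 1))
# ... while the same embeddings labelled "off sync" are heavily penalised.
loss_mismatch = cosine_bce_loss(emb, emb, torch.zeros(4, 1))
```

In a real training loop the two embeddings would come from the audio and face encoders, and the labels from however the dataset samples matching and non-matching windows.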

zhixiongzuo commented on June 8, 2024

Much appreciated, I will give it a try!

lcc157 commented on June 8, 2024

@primepake @zhixiongzuo Great! Could you share the SyncNet training script for DINet? Thanks

lcc157 commented on June 8, 2024

@primepake Is this correct?

from models.Syncnet import SyncNet
from config.config import DINetTrainingOptions

from torch.utils.data import DataLoader
from dataset.dataset_DINet_clip import DINetDataset

from utils.training_utils import get_scheduler, update_learning_rate

import random
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import os

if __name__ == "__main__":
    '''
    training code of SyncNet
    in the resolution you want, using clip training code after frame training
    '''
    # load config and seed every random number generator in use
    opt = DINetTrainingOptions().parse_args()
    random.seed(opt.seed)
    np.random.seed(opt.seed)
    torch.manual_seed(opt.seed)
    torch.cuda.manual_seed(opt.seed)

    # init network: 15 image channels (5 RGB frames), 29 audio channels, embedding size 128
    net_lipsync = SyncNet(15, 29, 128).cuda()

    criterionMSE = nn.MSELoss().cuda()
    # target label: every pair in a batch is treated as in-sync (1.0)
    real_tensor = torch.tensor(1.0).cuda()

    # setup optimizer
    optimizer_s = optim.Adam(net_lipsync.parameters(), lr=opt.lr_g)

    # set scheduler
    net_s_scheduler = get_scheduler(optimizer_s, opt.non_decay, opt.decay)

    # load training data
    train_data = DINetDataset(opt.train_data, opt.augment_num, opt.mouth_region_size)
    training_data_loader = DataLoader(dataset=train_data, batch_size=opt.batch_size,
                                      shuffle=True, drop_last=True, num_workers=4)
    train_data_length = len(training_data_loader)

    # start train
    for epoch in range(opt.start_epoch, opt.non_decay + opt.decay + 1):
        net_lipsync.train()
        for iteration, data in enumerate(training_data_loader):
            # forward
            optimizer_s.zero_grad()
            source_clip, source_clip_mask, reference_clip, deep_speech_clip, deep_speech_full = data
            # stack the frames of each clip along the channel dimension
            source_clip = torch.cat(torch.split(source_clip, 1, dim=1), 0).squeeze(1).float().cuda()
            source_clip = torch.cat(torch.split(source_clip, opt.batch_size, dim=0), 1)
            deep_speech_full = deep_speech_full.float().cuda()

            # sync loss on the cropped mouth region
            source_clip_mouth = source_clip[:, :,
                                train_data.radius:train_data.radius + train_data.mouth_region_size,
                                train_data.radius_1_4:train_data.radius_1_4 + train_data.mouth_region_size]
            sync_score = net_lipsync(source_clip_mouth, deep_speech_full)
            loss_sync = criterionMSE(sync_score, real_tensor.expand_as(sync_score))

            loss_sync.backward()
            optimizer_s.step()

            print(
                "===> Epoch[{}]({}/{}):  Loss_Sync: {:.4f} lr_g = {:.7f} ".format(
                    epoch, iteration, train_data_length, float(loss_sync),
                    optimizer_s.param_groups[0]['lr']))

        update_learning_rate(net_s_scheduler, optimizer_s)

        # checkpoint
        if epoch % opt.checkpoint == 0:
            os.makedirs(opt.result_path, exist_ok=True)
            model_out_path = os.path.join(opt.result_path, 'netS_model_epoch_{}.pth'.format(epoch))
            states = {
                'epoch': epoch + 1,
                'state_dict': {'net_s': net_lipsync.state_dict()},
                'optimizer': {'net_s': optimizer_s.state_dict()}
            }
            torch.save(states, model_out_path)
            print("Checkpoint saved to {}".format(model_out_path))

commented on June 8, 2024

That's good, but in my implementation I used BCE loss rather than MSE, because MSE can be inefficient.
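
One way to read this suggestion, as a hedged sketch: the script above only ever shows the network in-sync pairs with a target of 1.0, so a BCE setup usually also needs off-sync pairs labelled 0. The example below assumes the network returns raw (unbounded) scores and therefore uses `nn.BCEWithLogitsLoss`. `TinyScoreNet` and `sync_bce_step` are hypothetical stand-ins, and negatives are built by pairing each clip with another sample's audio. This is not the commenter's actual implementation, which was not shared.

```python
import torch
import torch.nn as nn

class TinyScoreNet(nn.Module):
    """Hypothetical stand-in for a SyncNet that returns raw sync scores."""
    def __init__(self, img_dim=8, aud_dim=4):
        super().__init__()
        self.fc = nn.Linear(img_dim + aud_dim, 1)

    def forward(self, image, audio):
        return self.fc(torch.cat([image, audio], dim=1))

def sync_bce_step(net, optimizer, image, audio):
    """One optimisation step on in-sync pairs plus shuffled (off-sync) pairs."""
    criterion = nn.BCEWithLogitsLoss()
    # Negatives: pair each clip with the audio of the previous sample in the batch.
    wrong_audio = torch.roll(audio, shifts=1, dims=0)
    scores_pos = net(image, audio)
    scores_neg = net(image, wrong_audio)
    scores = torch.cat([scores_pos, scores_neg], dim=0)
    targets = torch.cat([torch.ones_like(scores_pos),
                         torch.zeros_like(scores_neg)], dim=0)
    loss = criterion(scores, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

torch.manual_seed(0)
net = TinyScoreNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
image = torch.randn(6, 8)   # stand-in for mouth-region features
audio = torch.randn(6, 4)   # stand-in for audio features
losses = [sync_bce_step(net, opt, image, audio) for _ in range(50)]
```

With a real SyncNet, `image` and `audio` would be the mouth crops and audio features from the data loader. Whether shuffling within the batch or shifting the audio window in time makes better negatives is a design choice this sketch does not settle.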

Inferencer commented on June 8, 2024

That's good, but in my implementation I used BCE loss rather than MSE, because MSE can be inefficient.

Can you share it, please? I have a group of people focused on DINet, and it could be useful.
