Comments (13)

davidmartinrius commented on June 8, 2024

😀👋 Has anyone figured this out?

from wav2lip_288x288.

davidmartinrius commented on June 8, 2024

I finally found a better alternative to wav2lip 😄

zhixiongzuo commented on June 8, 2024

> I finally found a better alternative to wav2lip 😄

Could you share which one is better?

davidmartinrius commented on June 8, 2024

Absolutely, I already posted it here: #35 (comment)

zhixiongzuo commented on June 8, 2024

Thanks. I have actually tried these methods, but DINet uses DeepSpeech for audio encoding, which does not handle Chinese audio well. NeRF-based methods suffer from the same audio-encoding problem, as well as poor generation quality.

commented on June 8, 2024

DINet can actually run on any voice, but you need to train it from scratch and modify a lot of things in the SyncNet training stage.

zhixiongzuo commented on June 8, 2024

There seems to be no SyncNet training script in DINet. I also saw your issue DINet/issues/28. Will you share the SyncNet training script? Also, do you have any ideas about how to modify SyncNet?

commented on June 8, 2024

It's basically based on the SyncNet training in wav2lip: https://github.com/Rudrabha/Wav2Lip/blob/master/color_syncnet_train.py
The SyncNet architecture is here: https://github.com/MRzzm/DINet/blob/master/models/Syncnet.py
The embedding size should be 128.
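For context, the Wav2Lip script linked above scores each audio/face pair by the cosine similarity of the two embeddings and trains that score with binary cross-entropy against an in-sync (1) or off-sync (0) label. Below is a minimal sketch of that loss with 128-dimensional embeddings. The function name `cosine_bce_loss` and the random tensors standing in for encoder outputs are illustrative assumptions, not code from either repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

bce = nn.BCELoss()

def cosine_bce_loss(audio_emb, video_emb, labels):
    """Sketch of a cosine-similarity sync loss.

    audio_emb, video_emb: (B, D) embeddings, assumed non-negative
    labels: (B, 1) tensor with 1.0 for in-sync pairs and 0.0 for off-sync pairs
    """
    sim = F.cosine_similarity(audio_emb, video_emb, dim=1)  # shape (B,)
    # BCELoss requires inputs in [0, 1]; clamp guards against negative
    # similarities and floating-point overshoot slightly above 1.
    sim = sim.clamp(0.0, 1.0)
    return bce(sim.unsqueeze(1), labels)

torch.manual_seed(0)
# Hypothetical stand-ins for encoder outputs: batch of 4, embedding size 128.
audio = torch.rand(4, 128)
video = torch.rand(4, 128)
labels = torch.tensor([[1.0], [0.0], [1.0], [0.0]])

loss = cosine_bce_loss(audio, video, labels)
# Identical embeddings labeled in-sync should give a loss close to zero.
perfect = cosine_bce_loss(audio, audio, torch.ones(4, 1))
```

Note that the loss only makes sense when the batch contains both in-sync and off-sync pairs; with positive labels alone, the network can minimize it by mapping everything to the same embedding.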

zhixiongzuo commented on June 8, 2024

Much appreciated, I will give it a try!

lcc157 commented on June 8, 2024

@primepake @zhixiongzuo Great! Could you share the SyncNet training script for DINet? Thanks.

lcc157 commented on June 8, 2024

@primepake Is this correct?
```python
from models.Syncnet import SyncNetPerception, SyncNet
from config.config import DINetTrainingOptions
from sync_batchnorm import convert_model

from torch.utils.data import DataLoader
from dataset.dataset_DINet_clip import DINetDataset

from utils.training_utils import get_scheduler, update_learning_rate, GANLoss

import random
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import os
import torch.nn.functional as F

if __name__ == "__main__":
    '''
    training code of SyncNet
    in the resolution you want, using clip training code after frame training
    '''
    # load config
    opt = DINetTrainingOptions().parse_args()
    random.seed(opt.seed)
    np.random.seed(opt.seed)
    torch.manual_seed(opt.seed)
    torch.cuda.manual_seed(opt.seed)

    # init network (the last argument, 128, is the embedding size)
    net_lipsync = SyncNet(15, 29, 128).cuda()

    criterionMSE = nn.MSELoss().cuda()
    # set label of syncnet perception loss
    # NOTE: every pair is labeled in-sync (1.0); this loop samples no
    # off-sync (negative) pairs
    real_tensor = torch.tensor(1.0).cuda()

    # setup optimizer
    optimizer_s = optim.Adam(net_lipsync.parameters(), lr=opt.lr_g)

    # set scheduler
    net_s_scheduler = get_scheduler(optimizer_s, opt.non_decay, opt.decay)

    # load training data
    train_data = DINetDataset(opt.train_data, opt.augment_num, opt.mouth_region_size)
    training_data_loader = DataLoader(dataset=train_data, batch_size=opt.batch_size,
                                      shuffle=True, drop_last=True, num_workers=4)
    train_data_length = len(training_data_loader)

    # start train
    for epoch in range(opt.start_epoch, opt.non_decay + opt.decay + 1):
        net_lipsync.train()
        for iteration, data in enumerate(training_data_loader):
            # forward
            optimizer_s.zero_grad()
            source_clip, source_clip_mask, reference_clip, deep_speech_clip, deep_speech_full = data
            # stack the frames of each clip along the channel dimension
            source_clip = torch.cat(torch.split(source_clip, 1, dim=1), 0).squeeze(1).float().cuda()
            source_clip = torch.cat(torch.split(source_clip, opt.batch_size, dim=0), 1).cuda()
            deep_speech_full = deep_speech_full.float().cuda()

            ## sync perception loss
            # crop the mouth region from the stacked frames
            source_clip_mouth = source_clip[
                :, :,
                train_data.radius:train_data.radius + train_data.mouth_region_size,
                train_data.radius_1_4:train_data.radius_1_4 + train_data.mouth_region_size]
            sync_score = net_lipsync(source_clip_mouth, deep_speech_full)
            loss_sync = criterionMSE(sync_score, real_tensor.expand_as(sync_score))

            loss_sync.backward()
            optimizer_s.step()

            print(
                "===> Epoch[{}]({}/{}):  Loss_Sync: {:.4f} lr_g = {:.7f} ".format(
                    epoch, iteration, train_data_length, float(loss_sync),
                    optimizer_s.param_groups[0]['lr']))

        update_learning_rate(net_s_scheduler, optimizer_s)

        # checkpoint
        if epoch % opt.checkpoint == 0:
            os.makedirs(opt.result_path, exist_ok=True)
            model_out_path = os.path.join(opt.result_path, 'netS_model_epoch_{}.pth'.format(epoch))
            states = {
                'epoch': epoch + 1,
                'state_dict': {'net_s': net_lipsync.state_dict()},
                'optimizer': {'net_s': optimizer_s.state_dict()}
            }
            torch.save(states, model_out_path)
            print("Checkpoint saved to {}".format(model_out_path))
```

commented on June 8, 2024

That's good, but in my implementation I used BCE loss rather than MSE, because MSE can be inefficient.
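As a rough illustration of that swap, the sketch below replaces the MSE criterion with binary cross-entropy on the network's score. It assumes the sync network returns raw, unbounded scores (logits), which is why it uses `nn.BCEWithLogitsLoss` rather than `nn.BCELoss`. It also assumes each sample carries an in-sync/off-sync label, as in the Wav2Lip script linked earlier in the thread. The helper `sync_bce_loss` and the hand-made score tensors are hypothetical, not taken from any implementation mentioned here.

```python
import torch
import torch.nn as nn

# Assumption: scores are logits, so sigmoid + BCE are fused in one criterion.
criterion_bce = nn.BCEWithLogitsLoss()

def sync_bce_loss(sync_score, is_synced):
    """sync_score: raw scores with the batch dimension first, e.g. (B, 1, H, W).
    is_synced: (B,) tensor, 1.0 for matching audio/video pairs, 0.0 otherwise.
    """
    # Broadcast each sample's label over every element of its score map.
    shape = (-1,) + (1,) * (sync_score.dim() - 1)
    target = is_synced.view(shape).expand_as(sync_score)
    return criterion_bce(sync_score, target)

# Hand-made stand-in for the network output: 4 score maps of size 2x2.
scores = torch.tensor([8.0, -8.0, 8.0, -8.0]).view(4, 1, 1, 1).repeat(1, 1, 2, 2)
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])

good = sync_bce_loss(scores, labels)        # confident and correct: near zero
bad = sync_bce_loss(scores, 1.0 - labels)   # confident and wrong: large loss
```

Unlike the squared error, the cross-entropy penalty keeps growing roughly linearly in the logit for confident wrong predictions, which is one common reason to prefer it for a binary in-sync/off-sync target.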

Inferencer commented on June 8, 2024

> That's good, but in my implementation I used BCE loss rather than MSE, because MSE can be inefficient.

Can you share it, please? I have a group of people focused on DINet, and it could be useful.
