prashanthatp / wav2mov
Speech to Facial Animation using GANs
Home Page: https://wav2mov.vercel.app/
Why check the length of FROM_DIR when the value could be assigned in the first place?
wav2mov/wav2mov/datasets/create_file_list.py
Lines 9 to 17 in 3061ed6
After the adversarial backward passes for the generator with the identity discriminator (or with the sync discriminator),
the function that updates the generator weights is never called.
Lines 193 to 216 in a8651bf
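The referenced lines are not reproduced here, so the following is only a minimal sketch of the pattern the issue describes, assuming PyTorch. The names `gen`, `disc_id`, `disc_sync`, `optim_gen` and `generator_step` are hypothetical stand-ins, not the repository's own identifiers. The point is that the backward passes only accumulate gradients; the generator weights change only when `optim_gen.step()` is called.

```python
import torch
from torch import nn

# Hypothetical toy modules standing in for the generator and the
# identity / sync discriminators (not the repository's real models).
gen = nn.Linear(4, 4)
disc_id = nn.Linear(4, 1)
disc_sync = nn.Linear(4, 1)
optim_gen = torch.optim.Adam(gen.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()


def generator_step(noise):
    """One adversarial update of the generator against both discriminators."""
    optim_gen.zero_grad()
    fake = gen(noise)
    # The generator wants both discriminators to label its output as real (1).
    real_target = torch.ones(noise.shape[0], 1)
    loss = criterion(disc_id(fake), real_target) + criterion(disc_sync(fake), real_target)
    # backward() only fills .grad; it does not change any weights.
    # (It also fills .grad on the discriminators, which a real loop
    # would zero before the discriminator update.)
    loss.backward()
    # Without this call the generator never learns.
    optim_gen.step()
    return loss.item()
```

Calling `generator_step(torch.randn(8, 4))` and comparing `gen.weight` before and after is a quick way to confirm that the update is actually applied.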
Should the mean and variance be calculated over the entire sample or only across frames?
For example, if the MFCCs of an audio clip have shape (1, 7, 13), the mean and variance have shape:
(1, 1) if calculated over the entire sample
(1, 7) if calculated across time frames
Currently they are calculated over the entire sample.
wav2mov/wav2mov/core/data/utils.py
Line 135 in 3a4b2bd
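The two options can be compared with a small NumPy sketch. It assumes the layout (batch, frames, coefficients) from the example above; the function names `normalize_per_sample` and `normalize_per_frame` are hypothetical and are not taken from `utils.py`.

```python
import numpy as np


def normalize_per_sample(mfccs, eps=1e-8):
    """One mean/std per sample: statistics have shape (batch, 1)."""
    flat = mfccs.reshape(mfccs.shape[0], -1)
    mean = flat.mean(axis=1, keepdims=True)
    std = flat.std(axis=1, keepdims=True)
    # Add a trailing axis so (batch, 1) broadcasts over (batch, frames, coeffs).
    return (mfccs - mean[..., None]) / (std[..., None] + eps), mean


def normalize_per_frame(mfccs, eps=1e-8):
    """One mean/std per time frame: statistics have shape (batch, frames)."""
    mean = mfccs.mean(axis=2)
    std = mfccs.std(axis=2)
    return (mfccs - mean[..., None]) / (std[..., None] + eps), mean


# Random data with the shape used in the example above.
mfccs = np.random.default_rng(0).normal(size=(1, 7, 13))
per_sample, sample_mean = normalize_per_sample(mfccs)
per_frame, frame_mean = normalize_per_frame(mfccs)
print(sample_mean.shape, frame_mean.shape)
```

Per-sample statistics keep the relative loudness of frames within a clip, while per-frame statistics remove it; which one suits the model is the open question of this issue.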
The sync loss initially used swapped labels (0 for real and 1 for fake).
After switching back to the normal convention, the fake labels still use the value 1.
Lines 6 to 16 in 0fdef6b
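A minimal sketch of the normal convention (1 for real, 0 for fake), written in NumPy so it stands alone. `REAL_LABEL`, `FAKE_LABEL`, `bce` and `sync_loss` are hypothetical names used for illustration, not the repository's code.

```python
import numpy as np

REAL_LABEL = 1.0
FAKE_LABEL = 0.0  # the issue reports this is still 1 in the current code


def bce(pred, target, eps=1e-7):
    """Binary cross-entropy on probabilities in [0, 1]."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))))


def sync_loss(pred, is_real):
    """Compare discriminator outputs against the label for real or fake pairs."""
    label = REAL_LABEL if is_real else FAKE_LABEL
    target = np.full_like(pred, label)
    return bce(pred, target)
```

With this convention, low discriminator outputs on fake pairs give a small loss, for example `sync_loss(np.array([0.1, 0.2]), is_real=False)` is smaller than the same call with `is_real=True`.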
Currently, while computing MFCCs, the audio has to be moved to the CPU, because librosa needs it as a NumPy array, which in turn must live on the CPU.
Lines 100 to 106 in 30ccbd4
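So why not make use of torchaudio and its ability to compute on the GPU? I think it also supports batch-wise computation.
import torch
from torch import nn
from torchaudio.transforms import MFCC

n_fft = 2048
win_length = None  # None falls back to n_fft
hop_length = 512
n_mels = 128
n_mfcc = 14
sample_rate = 16000
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# MFCC is an nn.Module, so it can be moved to the GPU like any other layer
model = nn.Sequential(MFCC(sample_rate=sample_rate,
                           n_mfcc=n_mfcc,
                           melkwargs={
                               'n_fft': n_fft,
                               'win_length': win_length,
                               'n_mels': n_mels,
                               'hop_length': hop_length,
                               'mel_scale': 'htk',
                           }))
model.to(device)
audio = ...  # waveform tensor of shape (..., time) on the same device
mfccs = model(audio)  # shape (..., n_mfcc, frames)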
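The sync loss is computed from cosine similarity, which is treated here as already lying between 0 and 1 (in general cosine similarity ranges over [-1, 1]; it stays in [0, 1] only when both embeddings are non-negative).
It would be best to use BCELoss.
But gradient scaling with mixed precision then has to be avoided, since the docs suggest using BCEWithLogitsLoss when scaling is required.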
Lines 6 to 17 in 0fdef6b
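BCELoss expects inputs in [0, 1], while cosine similarity can in general be negative. One possible way to bridge the two, shown here as an assumption rather than as what the repository does, is to rescale the similarity linearly. The helper names `cosine_similarity` and `to_unit_interval` are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity along the last axis; the result lies in [-1, 1]."""
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return num / np.maximum(den, eps)


def to_unit_interval(sim):
    """Map [-1, 1] linearly onto [0, 1] so the value is a valid BCELoss input."""
    return (sim + 1.0) / 2.0


audio_emb = np.array([[1.0, 0.0], [1.0, 0.0]])
video_emb = np.array([[1.0, 0.0], [-1.0, 0.0]])
probs = to_unit_interval(cosine_similarity(audio_emb, video_emb))
```

Here the first pair is perfectly aligned and the second points in opposite directions, so `probs` covers both ends of the unit interval. If the embeddings come out of a ReLU and are non-negative, the rescaling is not needed for validity.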
The sync and sequence discriminators do not seem to learn.
Adding batching of videos results in a memory error in both the local and the Colab environments.
This is probably because the reference (still) images are copied to match the frame dimension before being fed into the generator and the identity discriminator.
Lines 112 to 126 in aea7f56
Lines 57 to 101 in 0afbf1a
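The referenced lines are not shown, so it is only an assumption that the copy is made with something like `repeat`. Under that assumption, a sketch of the difference between `repeat` (allocates a full copy) and `expand` (a broadcast view over the same memory) in PyTorch looks like this; the tensor sizes are made up for illustration.

```python
import torch

batch, frames = 2, 5
# Hypothetical reference images: (batch, channels, height, width).
ref = torch.randn(batch, 3, 8, 8)

# repeat() materialises frames-many copies of every reference image.
repeated = ref.unsqueeze(1).repeat(1, frames, 1, 1, 1)

# expand() returns a view with stride 0 along the frame axis: no new memory.
expanded = ref.unsqueeze(1).expand(-1, frames, -1, -1, -1)

print(repeated.shape, expanded.shape)
print(expanded.data_ptr() == ref.data_ptr())
```

Both tensors hold the same values, but only `repeated` owns new storage. Note that the saving can be lost later: operations that need contiguous memory (for example `reshape` to fold frames into the batch axis, or `torch.cat` with the frames) will copy the expanded tensor anyway, so this alone may not resolve the memory error.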