@primepake Hello, thanks for your great work. I have recently run into some difficulties training on my own dataset (prepared following your data-preparation suggestions) with your shared code.
When I run `python hq_wav2lip_train.py`, the training log is:
use_cuda: True
Load 2687 audio feats.
use_cuda: True, MULTI_GPU: True
total trainable params 48520755
total DISC trainable params 18210561
Load checkpoint from: checkpoint_syn/checkpoint_step000171000.pth
Starting Epoch: 0
Saved checkpoint: checkpoint_step000000001.pth
Saved checkpoint: disc_checkpoint_step000000001.pth
L1: 0.2313518226146698, Sync: 0.0, Percep: 0.711134135723114 | Fake: 0.6754781603813171, Real: 0.711134135723114
L1: 0.21765484660863876, Sync: 0.0, Percep: 0.709110289812088 | Fake: 0.677438884973526, Real: 0.709110289812088
L1: 0.2188651313384374, Sync: 0.0, Percep: 0.7070923844973246 | Fake: 0.6794042587280273, Real: 0.7070923844973246
L1: 0.2144552432000637, Sync: 0.0, Percep: 0.7049891352653503 | Fake: 0.6814645230770111, Real: 0.7049891352653503
L1: 0.21138261258602142, Sync: 0.0, Percep: 0.7029658198356629 | Fake: 0.6834565877914429, Real: 0.702965784072876
L1: 0.20817621052265167, Sync: 0.0, Percep: 0.7010621925195059 | Fake: 0.6853393216927847, Real: 0.7010620832443237
L1: 0.20210737415722438, Sync: 0.0, Percep: 0.6996434501239231 | Fake: 0.6867433360644749, Real: 0.6996431180409023
L1: 0.19812600128352642, Sync: 0.0, Percep: 0.6987411752343178 | Fake: 0.6876341179013252, Real: 0.6987397372722626
L1: 0.19437309437327915, Sync: 0.0, Percep: 0.6981025603082445 | Fake: 0.6882637408044603, Real: 0.6980936461024814
...
L1: 0.12470868316135908, Sync: 0.0, Percep: 0.703860961136065 | Fake: 0.7219482924593122, Real: 0.69152611988952
L1: 0.12432418142755826, Sync: 0.0, Percep: 0.7042167019098997 | Fake: 0.7212010175765803, Real: 0.6919726772157446
L1: 0.1240154696033173, Sync: 0.0, Percep: 0.7046547470633516 | Fake: 0.7203877433827243, Real: 0.6924839418497868
L1: 0.12360538116523198, Sync: 0.0, Percep: 0.7050436321569948 | Fake: 0.7196274266711303, Real: 0.6929315188026521
L1: 0.12324049579675751, Sync: 0.0, Percep: 0.7055533739051434 | Fake: 0.7187670722904832, Real: 0.6934557074776173
Evaluating for 300 steps
L1: 0.08894559927284718, Sync: 7.633765455087026, Percep: 0.8102987110614777 | Fake: 0.5884036968151728, Real: 0.7851572235425314
L1: 0.12352411426603795, Sync: 0.0, Percep: 0.7059701490402222 | Fake: 0.7179982627183199, Real: 0.6938216637895676
L1: 0.12316158214713087, Sync: 0.0, Percep: 0.7069740667201505 | Fake: 0.7167386501879975, Real: 0.6947587119988258
L1: 0.12274648358716685, Sync: 0.0, Percep: 0.7086389707584008 | Fake: 0.7149891035960001, Real: 0.6961143705702852
L1: 0.122441600682666, Sync: 0.0, Percep: 0.7101659479650478 | Fake: 0.7133496456499239, Real: 0.6971959297446922
L1: 0.12237166498716061, Sync: 0.0, Percep: 0.7136020838068082 | Fake: 0.7105454275957667, Real: 0.6988249019290939
L1: 0.12226668624650865, Sync: 0.0, Percep: 0.7165913383165995 | Fake: 0.7080283074861481, Real: 0.6995955426850179
...
L1: 0.10978432702055822, Sync: 0.0, Percep: 0.7658024462613563 | Fake: 0.8152413980393286, Real: 0.6154363644432882
L1: 0.10972586760557995, Sync: 0.0, Percep: 0.7692448822380323 | Fake: 0.8124342786812339, Real: 0.6124296340577328
L1: 0.10953026241862897, Sync: 0.0, Percep: 0.7743397405519289 | Fake: 0.8092221762695191, Real: 0.610396875484025
L1: 0.10939421248741639, Sync: 0.0, Percep: 0.7824273446941963 | Fake: 0.8055855450166116, Real: 0.6093728619629893
L1: 0.1091919630309757, Sync: 0.0, Percep: 0.7908333182386574 | Fake: 0.8019456527194126, Real: 0.6076706493660488
L1: 0.10912049508790679, Sync: 0.0, Percep: 0.7942769879668473 | Fake: 0.7993013052296446, Real: 0.6045908347347031
L1: 0.10908047136182737, Sync: 0.0, Percep: 0.7925851992034527 | Fake: 0.8665490227044159, Real: 0.6015381057315442
L1: 0.10930284824053846, Sync: 0.0, Percep: 0.7917964988368816 | Fake: 0.8661491764380034, Real: 0.6038172583345233
Evaluating for 300 steps
L1: 0.5731244529287021, Sync: 9.090377567211787, Percep: 1.654488068819046 | Fake: 0.21219109917680423, Real: 1.4065884272257487
L1: 0.10955245170742273, Sync: 0.0, Percep: 0.9697534098973847 | Fake: 0.8618184333870491, Real: 0.707599309967045
L1: 0.10959959211782437, Sync: 0.0, Percep: 0.9730155654561438 | Fake: 0.8586212793076295, Real: 0.7107085656626347
L1: 0.1096739404936238, Sync: 0.0, Percep: 0.9733813980183366 | Fake: 0.8565126432325659, Real: 0.7121740724053752
L1: 0.10970133425566951, Sync: 0.0, Percep: 0.9722086560893506 | Fake: 0.8555087234860062, Real: 0.7122941125653687
L1: 0.10975395110161866, Sync: 0.0, Percep: 0.9709689952921027 | Fake: 0.8545878388406093, Real: 0.7123311641033997
L1: 0.10966527403854742, Sync: 0.0, Percep: 0.9697017977636012 | Fake: 0.8537138932196765, Real: 0.7123305465982057
...
L1: 0.10242784435932453, Sync: 0.0, Percep: 0.8304814692491738 | Fake: 10.244699917241833, Real: 0.6199986526972722
L1: 0.10239110164847111, Sync: 0.0, Percep: 0.8279339800796978 | Fake: 10.520022923630663, Real: 0.618096816339305
L1: 0.10235720289591985, Sync: 0.0, Percep: 0.8254020718837354 | Fake: 10.793661997258704, Real: 0.6162066120079922
L1: 0.10230096286480747, Sync: 0.0, Percep: 0.8228856021523825 | Fake: 11.065632539949988, Real: 0.6143279333128459
L1: 0.10232478230738712, Sync: 0.0, Percep: 0.8203844301093661 | Fake: 11.335949766272329, Real: 0.6124606751568799
Starting Epoch: 1
L1: 0.11442705243825912, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
L1: 0.09390852972865105, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
L1: 0.09028899172941844, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
L1: 0.08729504607617855, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
L1: 0.0875200405716896, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
L1: 0.08466161414980888, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
L1: 0.0851562459553991, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
...
L1: 0.08385955898174599, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
L1: 0.08394778782830518, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
L1: 0.08393304471088492, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
L1: 0.0836903992508139, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
Evaluating for 300 steps
L1: 0.062434629226724304, Sync: 7.282701448599497, Percep: 0.0 | Fake: 100.0, Real: 0.0
L1: 0.08369095140779523, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
L1: 0.0834334861073229, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
L1: 0.08353378695167907, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
L1: 0.08348599619962074, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
...
In particular, the Sync loss stays at 0.0 throughout training (it is only nonzero at evaluation), and from Epoch 1 onward Percep and Real drop to 0.0 while Fake is pinned at exactly 100.0. Is this training behavior normal? If not, could you please give me some suggestions? I would sincerely appreciate your help.
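One side observation, in case it helps with the diagnosis: if these losses are binary cross-entropy (as in the original Wav2Lip code), a value pinned at exactly 100.0 looks like the log-term clamp PyTorch applies inside BCE (log outputs are clamped at -100 to avoid -inf). A discriminator that outputs 1.0 for every input would then reproduce exactly the numbers in my log. A minimal pure-Python sketch of that clamping behavior (the clamp constant is my assumption, taken from PyTorch's documented BCE behavior):

```python
import math

LOG_CLAMP = -100.0  # PyTorch clamps log terms in BCE at -100


def bce(pred: float, target: float) -> float:
    """Binary cross-entropy for a single prediction, with the
    log-term clamp PyTorch uses to avoid -inf at pred in {0, 1}."""
    log_p = max(math.log(pred), LOG_CLAMP) if pred > 0 else LOG_CLAMP
    log_1mp = max(math.log(1 - pred), LOG_CLAMP) if pred < 1 else LOG_CLAMP
    return -(target * log_p + (1 - target) * log_1mp)


# A discriminator stuck at outputting 1.0 for every input reproduces the log:
assert bce(1.0, 0.0) == 100.0  # "Fake" loss: D(fake) vs. target 0 -> clamped
assert bce(1.0, 1.0) == 0.0    # "Real" and "Percep" losses: D(x) vs. target 1
```

If that reading is right, the discriminator has collapsed to a constant output after the first epoch, which would also explain why Percep no longer provides any gradient to the generator.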