Hi, thank you for your code, very nice work!
I am trying to reproduce the results for the joint stream on Kinetics 400, but my results are much lower than those of the pretrained model. I've only trained for 32 epochs so far, but the test loss and accuracy have not improved over the last 15 epochs. I only have access to 2 GPUs, but the parameters should be the defaults you recommend for that setup.
Can you see if there is something wrong with my setup?
Here is the output from log.txt:
[ Tue Nov 10 15:18:07 2020 ] Model total number of params: 3144328
[ Tue Nov 10 15:18:07 2020 ] *************************************
[ Tue Nov 10 15:18:07 2020 ] *** Using Half Precision Training ***
[ Tue Nov 10 15:18:07 2020 ] *************************************
[ Tue Nov 10 15:18:07 2020 ] 2 GPUs available, using DataParallel
[ Tue Nov 10 15:18:07 2020 ] Parameters:
{'amp_opt_level': 1,
'assume_yes': False,
'base_lr': 0.05,
'batch_size': 32,
'checkpoint': None,
'config': './config/kinetics-skeleton/train_joint.yaml',
'debug': False,
'device': [0, 1],
'eval_interval': 1,
'eval_start': 1,
'feeder': 'feeders.feeder.Feeder',
'forward_batch_size': 32,
'half': True,
'ignore_weights': [],
'log_interval': 100,
'model': 'model.msg3d.Model',
'model_args': {'graph': 'graph.kinetics.AdjMatrixGraph',
'num_class': 400,
'num_g3d_scales': 8,
'num_gcn_scales': 8,
'num_person': 2,
'num_point': 18},
'model_saved_name': '',
'nesterov': True,
'num_epoch': 65,
'num_worker': 48,
'optimizer': 'SGD',
'optimizer_states': None,
'phase': 'train',
'print_log': True,
'save_interval': 1,
'save_score': False,
'seed': 89,
'show_topk': [1, 5],
'start_epoch': 0,
'step': [45, 55],
'test_batch_size': 32,
'test_feeder_args': {'data_path': './data/kinetics/val_data_joint.npy',
'label_path': './data/kinetics/val_label.pkl'},
'train_feeder_args': {'data_path': './data/kinetics/train_data_joint.npy',
'debug': False,
'label_path': './data/kinetics/train_label.pkl'},
'weight_decay': 0.0005,
'weights': None,
'work_dir': 'work_dir/kinetics/msg3d-joint'}
[ Tue Nov 10 15:18:07 2020 ] Model total number of params: 3144328
[ Tue Nov 10 15:18:07 2020 ] Training epoch: 1, LR: 0.0500
[ Tue Nov 10 20:24:06 2020 ] Mean training loss: 4.9719 (BS 32: 4.9719).
[ Tue Nov 10 20:24:06 2020 ] Time consumption: [Data]00%, [Network]98%
[ Tue Nov 10 20:24:06 2020 ] Eval epoch: 1
[ Tue Nov 10 20:27:22 2020 ] Mean test loss of 619 batches: 4.918049363211784.
[ Tue Nov 10 20:27:22 2020 ] Top 1: 6.34%
[ Tue Nov 10 20:27:23 2020 ] Top 5: 19.42%
[ Tue Nov 10 20:27:23 2020 ] Training epoch: 2, LR: 0.0500
[ Wed Nov 11 01:33:01 2020 ] Mean training loss: 4.5430 (BS 32: 4.5430).
[ Wed Nov 11 01:33:01 2020 ] Time consumption: [Data]00%, [Network]98%
[ Wed Nov 11 01:33:01 2020 ] Eval epoch: 2
[ Wed Nov 11 01:36:19 2020 ] Mean test loss of 619 batches: 4.900795953146668.
[ Wed Nov 11 01:36:19 2020 ] Top 1: 8.28%
[ Wed Nov 11 01:36:19 2020 ] Top 5: 22.56%
[ Wed Nov 11 01:36:20 2020 ] Training epoch: 3, LR: 0.0500
[ Wed Nov 11 06:09:39 2020 ] Mean training loss: 4.3545 (BS 32: 4.3545).
[ Wed Nov 11 06:09:39 2020 ] Time consumption: [Data]00%, [Network]98%
[ Wed Nov 11 06:09:39 2020 ] Eval epoch: 3
[ Wed Nov 11 06:12:58 2020 ] Mean test loss of 619 batches: 4.609359575203817.
[ Wed Nov 11 06:12:58 2020 ] Top 1: 10.86%
[ Wed Nov 11 06:12:59 2020 ] Top 5: 27.58%
[ Wed Nov 11 06:12:59 2020 ] Training epoch: 4, LR: 0.0500
[ Wed Nov 11 11:54:41 2020 ] Mean training loss: 4.2222 (BS 32: 4.2222).
[ Wed Nov 11 11:54:41 2020 ] Time consumption: [Data]00%, [Network]99%
[ Wed Nov 11 11:54:41 2020 ] Eval epoch: 4
[ Wed Nov 11 11:59:37 2020 ] Mean test loss of 619 batches: 4.774576034607525.
[ Wed Nov 11 11:59:38 2020 ] Top 1: 11.37%
[ Wed Nov 11 11:59:38 2020 ] Top 5: 28.21%
[ Wed Nov 11 11:59:39 2020 ] Training epoch: 5, LR: 0.0500
[ Wed Nov 11 18:11:53 2020 ] Mean training loss: 4.1349 (BS 32: 4.1349).
[ Wed Nov 11 18:11:53 2020 ] Time consumption: [Data]00%, [Network]99%
[ Wed Nov 11 18:11:53 2020 ] Eval epoch: 5
[ Wed Nov 11 18:15:15 2020 ] Mean test loss of 619 batches: 4.5376818095347415.
[ Wed Nov 11 18:15:15 2020 ] Top 1: 12.43%
[ Wed Nov 11 18:15:15 2020 ] Top 5: 31.07%
[ Wed Nov 11 18:15:16 2020 ] Training epoch: 6, LR: 0.0500
[ Wed Nov 11 23:00:44 2020 ] Mean training loss: 4.0797 (BS 32: 4.0797).
[ Wed Nov 11 23:00:44 2020 ] Time consumption: [Data]00%, [Network]98%
[ Wed Nov 11 23:00:44 2020 ] Eval epoch: 6
[ Wed Nov 11 23:04:07 2020 ] Mean test loss of 619 batches: 4.315563132574177.
[ Wed Nov 11 23:04:08 2020 ] Top 1: 14.59%
[ Wed Nov 11 23:04:08 2020 ] Top 5: 33.09%
[ Wed Nov 11 23:04:09 2020 ] Training epoch: 7, LR: 0.0500
[ Thu Nov 12 03:37:52 2020 ] Mean training loss: 4.0471 (BS 32: 4.0471).
[ Thu Nov 12 03:37:52 2020 ] Time consumption: [Data]00%, [Network]98%
[ Thu Nov 12 03:37:52 2020 ] Eval epoch: 7
[ Thu Nov 12 03:41:16 2020 ] Mean test loss of 619 batches: 4.347155949218208.
[ Thu Nov 12 03:41:17 2020 ] Top 1: 14.47%
[ Thu Nov 12 03:41:17 2020 ] Top 5: 33.24%
[ Thu Nov 12 03:41:17 2020 ] Training epoch: 8, LR: 0.0500
[ Thu Nov 12 08:15:11 2020 ] Mean training loss: 4.0206 (BS 32: 4.0206).
[ Thu Nov 12 08:15:11 2020 ] Time consumption: [Data]00%, [Network]98%
[ Thu Nov 12 08:15:12 2020 ] Eval epoch: 8
[ Thu Nov 12 08:18:37 2020 ] Mean test loss of 619 batches: 4.342459396322247.
[ Thu Nov 12 08:18:38 2020 ] Top 1: 14.43%
[ Thu Nov 12 08:18:38 2020 ] Top 5: 33.43%
[ Thu Nov 12 08:18:38 2020 ] Training epoch: 9, LR: 0.0500
[ Thu Nov 12 14:47:29 2020 ] Mean training loss: 3.9975 (BS 32: 3.9975).
[ Thu Nov 12 14:47:29 2020 ] Time consumption: [Data]00%, [Network]99%
[ Thu Nov 12 14:47:29 2020 ] Eval epoch: 9
[ Thu Nov 12 14:50:56 2020 ] Mean test loss of 619 batches: 4.374219649057588.
[ Thu Nov 12 14:50:57 2020 ] Top 1: 14.58%
[ Thu Nov 12 14:50:57 2020 ] Top 5: 33.36%
[ Thu Nov 12 14:50:57 2020 ] Training epoch: 10, LR: 0.0500
[ Thu Nov 12 19:25:03 2020 ] Mean training loss: 3.9830 (BS 32: 3.9830).
[ Thu Nov 12 19:25:03 2020 ] Time consumption: [Data]00%, [Network]98%
[ Thu Nov 12 19:25:03 2020 ] Eval epoch: 10
[ Thu Nov 12 19:28:31 2020 ] Mean test loss of 619 batches: 4.325896201110618.
[ Thu Nov 12 19:28:31 2020 ] Top 1: 15.21%
[ Thu Nov 12 19:28:32 2020 ] Top 5: 34.09%
[ Thu Nov 12 19:28:32 2020 ] Training epoch: 11, LR: 0.0500
[ Fri Nov 13 00:02:47 2020 ] Mean training loss: 3.9669 (BS 32: 3.9669).
[ Fri Nov 13 00:02:47 2020 ] Time consumption: [Data]00%, [Network]98%
[ Fri Nov 13 00:02:47 2020 ] Eval epoch: 11
[ Fri Nov 13 00:06:16 2020 ] Mean test loss of 619 batches: 4.51993297028426.
[ Fri Nov 13 00:06:17 2020 ] Top 1: 15.21%
[ Fri Nov 13 00:06:17 2020 ] Top 5: 34.06%
[ Fri Nov 13 00:06:18 2020 ] Training epoch: 12, LR: 0.0500
[ Fri Nov 13 04:40:40 2020 ] Mean training loss: 3.9534 (BS 32: 3.9534).
[ Fri Nov 13 04:40:40 2020 ] Time consumption: [Data]00%, [Network]98%
[ Fri Nov 13 04:40:41 2020 ] Eval epoch: 12
[ Fri Nov 13 04:44:12 2020 ] Mean test loss of 619 batches: 4.18243257872315.
[ Fri Nov 13 04:44:12 2020 ] Top 1: 16.76%
[ Fri Nov 13 04:44:13 2020 ] Top 5: 35.87%
[ Fri Nov 13 04:44:13 2020 ] Training epoch: 13, LR: 0.0500
[ Fri Nov 13 09:18:32 2020 ] Mean training loss: 3.9444 (BS 32: 3.9444).
[ Fri Nov 13 09:18:32 2020 ] Time consumption: [Data]00%, [Network]98%
[ Fri Nov 13 09:18:32 2020 ] Eval epoch: 13
[ Fri Nov 13 09:22:04 2020 ] Mean test loss of 619 batches: 4.246960545972784.
[ Fri Nov 13 09:22:05 2020 ] Top 1: 16.79%
[ Fri Nov 13 09:22:05 2020 ] Top 5: 35.90%
[ Fri Nov 13 09:22:05 2020 ] Training epoch: 14, LR: 0.0500
[ Fri Nov 13 14:22:37 2020 ] Mean training loss: 3.9370 (BS 32: 3.9370).
[ Fri Nov 13 14:22:37 2020 ] Time consumption: [Data]00%, [Network]98%
[ Fri Nov 13 14:22:37 2020 ] Eval epoch: 14
[ Fri Nov 13 14:26:56 2020 ] Mean test loss of 619 batches: 4.39393074909204.
[ Fri Nov 13 14:26:57 2020 ] Top 1: 14.21%
[ Fri Nov 13 14:26:57 2020 ] Top 5: 32.71%
[ Fri Nov 13 14:26:58 2020 ] Training epoch: 15, LR: 0.0500
[ Fri Nov 13 20:57:22 2020 ] Mean training loss: 3.9268 (BS 32: 3.9268).
[ Fri Nov 13 20:57:22 2020 ] Time consumption: [Data]00%, [Network]99%
[ Fri Nov 13 20:57:22 2020 ] Eval epoch: 15
[ Fri Nov 13 21:00:54 2020 ] Mean test loss of 619 batches: 4.3702820344964985.
[ Fri Nov 13 21:00:55 2020 ] Top 1: 16.00%
[ Fri Nov 13 21:00:55 2020 ] Top 5: 35.35%
[ Fri Nov 13 21:00:56 2020 ] Training epoch: 16, LR: 0.0500
[ Sat Nov 14 01:36:31 2020 ] Mean training loss: 3.9230 (BS 32: 3.9230).
[ Sat Nov 14 01:36:31 2020 ] Time consumption: [Data]00%, [Network]98%
[ Sat Nov 14 01:36:31 2020 ] Eval epoch: 16
[ Sat Nov 14 01:40:06 2020 ] Mean test loss of 619 batches: 4.2670101137269105.
[ Sat Nov 14 01:40:06 2020 ] Top 1: 16.45%
[ Sat Nov 14 01:40:07 2020 ] Top 5: 36.25%
[ Sat Nov 14 01:40:07 2020 ] Training epoch: 17, LR: 0.0500
[ Sat Nov 14 08:58:03 2020 ] Mean training loss: 3.9162 (BS 32: 3.9162).
[ Sat Nov 14 08:58:03 2020 ] Time consumption: [Data]00%, [Network]99%
[ Sat Nov 14 08:58:03 2020 ] Eval epoch: 17
[ Sat Nov 14 09:04:05 2020 ] Mean test loss of 619 batches: 4.154163993040464.
[ Sat Nov 14 09:04:06 2020 ] Top 1: 17.26%
[ Sat Nov 14 09:04:06 2020 ] Top 5: 36.72%
[ Sat Nov 14 09:04:06 2020 ] Training epoch: 18, LR: 0.0500
[ Sat Nov 14 14:08:22 2020 ] Mean training loss: 3.9094 (BS 32: 3.9094).
[ Sat Nov 14 14:08:22 2020 ] Time consumption: [Data]00%, [Network]98%
[ Sat Nov 14 14:08:22 2020 ] Eval epoch: 18
[ Sat Nov 14 14:11:58 2020 ] Mean test loss of 619 batches: 4.4448124141415795.
[ Sat Nov 14 14:11:59 2020 ] Top 1: 14.89%
[ Sat Nov 14 14:11:59 2020 ] Top 5: 33.62%
[ Sat Nov 14 14:12:00 2020 ] Training epoch: 19, LR: 0.0500
[ Sat Nov 14 18:47:13 2020 ] Mean training loss: 3.9050 (BS 32: 3.9050).
[ Sat Nov 14 18:47:13 2020 ] Time consumption: [Data]00%, [Network]98%
[ Sat Nov 14 18:47:13 2020 ] Eval epoch: 19
[ Sat Nov 14 18:50:51 2020 ] Mean test loss of 619 batches: 4.1810635703831.
[ Sat Nov 14 18:50:52 2020 ] Top 1: 16.64%
[ Sat Nov 14 18:50:52 2020 ] Top 5: 35.91%
[ Sat Nov 14 18:50:52 2020 ] Training epoch: 20, LR: 0.0500
[ Sat Nov 14 23:26:05 2020 ] Mean training loss: 3.9015 (BS 32: 3.9015).
[ Sat Nov 14 23:26:05 2020 ] Time consumption: [Data]00%, [Network]98%
[ Sat Nov 14 23:26:05 2020 ] Eval epoch: 20
[ Sat Nov 14 23:29:44 2020 ] Mean test loss of 619 batches: 4.292042032389726.
[ Sat Nov 14 23:29:44 2020 ] Top 1: 16.31%
[ Sat Nov 14 23:29:45 2020 ] Top 5: 36.05%
[ Sat Nov 14 23:29:45 2020 ] Training epoch: 21, LR: 0.0500
[ Sun Nov 15 04:04:46 2020 ] Mean training loss: 3.9022 (BS 32: 3.9022).
[ Sun Nov 15 04:04:46 2020 ] Time consumption: [Data]00%, [Network]98%
[ Sun Nov 15 04:04:46 2020 ] Eval epoch: 21
[ Sun Nov 15 04:08:26 2020 ] Mean test loss of 619 batches: 4.401749669446313.
[ Sun Nov 15 04:08:26 2020 ] Top 1: 14.86%
[ Sun Nov 15 04:08:26 2020 ] Top 5: 34.54%
[ Sun Nov 15 04:08:27 2020 ] Training epoch: 22, LR: 0.0500
[ Sun Nov 15 08:43:29 2020 ] Mean training loss: 3.8973 (BS 32: 3.8973).
[ Sun Nov 15 08:43:29 2020 ] Time consumption: [Data]00%, [Network]98%
[ Sun Nov 15 08:43:29 2020 ] Eval epoch: 22
[ Sun Nov 15 08:47:09 2020 ] Mean test loss of 619 batches: 4.267861750283804.
[ Sun Nov 15 08:47:10 2020 ] Top 1: 16.54%
[ Sun Nov 15 08:47:10 2020 ] Top 5: 35.90%
[ Sun Nov 15 08:47:10 2020 ] Training epoch: 23, LR: 0.0500
[ Sun Nov 15 13:22:30 2020 ] Mean training loss: 3.8934 (BS 32: 3.8934).
[ Sun Nov 15 13:22:30 2020 ] Time consumption: [Data]00%, [Network]98%
[ Sun Nov 15 13:22:30 2020 ] Eval epoch: 23
[ Sun Nov 15 13:26:12 2020 ] Mean test loss of 619 batches: 4.327233063770226.
[ Sun Nov 15 13:26:13 2020 ] Top 1: 15.61%
[ Sun Nov 15 13:26:13 2020 ] Top 5: 34.82%
[ Sun Nov 15 13:26:13 2020 ] Training epoch: 24, LR: 0.0500
[ Sun Nov 15 18:32:26 2020 ] Mean training loss: 3.8921 (BS 32: 3.8921).
[ Sun Nov 15 18:32:26 2020 ] Time consumption: [Data]00%, [Network]98%
[ Sun Nov 15 18:32:27 2020 ] Eval epoch: 24
[ Sun Nov 15 18:36:08 2020 ] Mean test loss of 619 batches: 4.113329430196512.
[ Sun Nov 15 18:36:09 2020 ] Top 1: 17.63%
[ Sun Nov 15 18:36:09 2020 ] Top 5: 36.88%
[ Sun Nov 15 18:36:10 2020 ] Training epoch: 25, LR: 0.0500
[ Sun Nov 15 23:11:30 2020 ] Mean training loss: 3.8915 (BS 32: 3.8915).
[ Sun Nov 15 23:11:30 2020 ] Time consumption: [Data]00%, [Network]98%
[ Sun Nov 15 23:11:31 2020 ] Eval epoch: 25
[ Sun Nov 15 23:15:14 2020 ] Mean test loss of 619 batches: 4.205234567645293.
[ Sun Nov 15 23:15:15 2020 ] Top 1: 16.82%
[ Sun Nov 15 23:15:15 2020 ] Top 5: 36.28%
[ Sun Nov 15 23:15:16 2020 ] Training epoch: 26, LR: 0.0500
[ Mon Nov 16 03:50:32 2020 ] Mean training loss: 3.8870 (BS 32: 3.8870).
[ Mon Nov 16 03:50:32 2020 ] Time consumption: [Data]00%, [Network]98%
[ Mon Nov 16 03:50:32 2020 ] Eval epoch: 26
[ Mon Nov 16 03:54:16 2020 ] Mean test loss of 619 batches: 4.226733705338831.
[ Mon Nov 16 03:54:17 2020 ] Top 1: 16.58%
[ Mon Nov 16 03:54:17 2020 ] Top 5: 35.97%
[ Mon Nov 16 03:54:18 2020 ] Training epoch: 27, LR: 0.0500
[ Mon Nov 16 08:29:39 2020 ] Mean training loss: 3.8883 (BS 32: 3.8883).
[ Mon Nov 16 08:29:39 2020 ] Time consumption: [Data]00%, [Network]98%
[ Mon Nov 16 08:29:39 2020 ] Eval epoch: 27
[ Mon Nov 16 08:33:25 2020 ] Mean test loss of 619 batches: 4.199135843887083.
[ Mon Nov 16 08:33:26 2020 ] Top 1: 16.34%
[ Mon Nov 16 08:33:26 2020 ] Top 5: 35.95%
[ Mon Nov 16 08:33:27 2020 ] Training epoch: 28, LR: 0.0500
[ Mon Nov 16 13:24:34 2020 ] Mean training loss: 3.8853 (BS 32: 3.8853).
[ Mon Nov 16 13:24:34 2020 ] Time consumption: [Data]00%, [Network]98%
[ Mon Nov 16 13:24:34 2020 ] Eval epoch: 28
[ Mon Nov 16 13:31:11 2020 ] Mean test loss of 619 batches: 4.273007657493259.
[ Mon Nov 16 13:31:12 2020 ] Top 1: 16.53%
[ Mon Nov 16 13:31:12 2020 ] Top 5: 36.67%
[ Mon Nov 16 13:31:13 2020 ] Training epoch: 29, LR: 0.0500
[ Mon Nov 16 19:23:28 2020 ] Mean training loss: 3.8837 (BS 32: 3.8837).
[ Mon Nov 16 19:23:28 2020 ] Time consumption: [Data]00%, [Network]98%
[ Mon Nov 16 19:23:28 2020 ] Eval epoch: 29
[ Mon Nov 16 19:27:16 2020 ] Mean test loss of 619 batches: 4.165285659337083.
[ Mon Nov 16 19:27:16 2020 ] Top 1: 17.00%
[ Mon Nov 16 19:27:17 2020 ] Top 5: 36.51%
[ Mon Nov 16 19:27:17 2020 ] Training epoch: 30, LR: 0.0500
[ Tue Nov 17 00:02:35 2020 ] Mean training loss: 3.8809 (BS 32: 3.8809).
[ Tue Nov 17 00:02:35 2020 ] Time consumption: [Data]00%, [Network]98%
[ Tue Nov 17 00:02:35 2020 ] Eval epoch: 30
[ Tue Nov 17 00:06:24 2020 ] Mean test loss of 619 batches: 4.481445889095498.
[ Tue Nov 17 00:06:25 2020 ] Top 1: 15.19%
[ Tue Nov 17 00:06:25 2020 ] Top 5: 35.08%
[ Tue Nov 17 00:06:25 2020 ] Training epoch: 31, LR: 0.0500
[ Tue Nov 17 04:41:32 2020 ] Mean training loss: 3.8807 (BS 32: 3.8807).
[ Tue Nov 17 04:41:32 2020 ] Time consumption: [Data]00%, [Network]98%
[ Tue Nov 17 04:41:32 2020 ] Eval epoch: 31
[ Tue Nov 17 04:45:23 2020 ] Mean test loss of 619 batches: 4.213557656246549.
[ Tue Nov 17 04:45:24 2020 ] Top 1: 17.22%
[ Tue Nov 17 04:45:24 2020 ] Top 5: 36.56%
[ Tue Nov 17 04:45:24 2020 ] Training epoch: 32, LR: 0.0500
[ Tue Nov 17 09:20:29 2020 ] Mean training loss: 3.8820 (BS 32: 3.8820).
[ Tue Nov 17 09:20:29 2020 ] Time consumption: [Data]00%, [Network]98%
[ Tue Nov 17 09:20:29 2020 ] Eval epoch: 32
[ Tue Nov 17 09:24:21 2020 ] Mean test loss of 619 batches: 4.307116228852403.
[ Tue Nov 17 09:24:22 2020 ] Top 1: 16.39%
[ Tue Nov 17 09:24:22 2020 ] Top 5: 35.38%
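
To make the plateau easier to see, the per-epoch Top 1 values can be pulled out of log.txt with a short script. This is only a minimal sketch, assuming the log keeps the line format shown above; `parse_top1` and `best_epoch` are hypothetical helper names, not part of the repository.

```python
import re

# Patterns assume the line format in the log above, e.g.
# "[ <timestamp> ] Eval epoch: 3" and "[ <timestamp> ] Top 1: 10.86%".
EPOCH_RE = re.compile(r"Eval epoch: (\d+)")
TOP1_RE = re.compile(r"Top 1: ([\d.]+)%")


def parse_top1(log_text):
    """Return a dict mapping eval epoch -> Top 1 accuracy in percent."""
    results = {}
    epoch = None
    for line in log_text.splitlines():
        match = EPOCH_RE.search(line)
        if match:
            # Remember which epoch the following accuracy lines belong to.
            epoch = int(match.group(1))
            continue
        match = TOP1_RE.search(line)
        if match and epoch is not None:
            results[epoch] = float(match.group(1))
    return results


def best_epoch(results):
    """Return the epoch with the highest Top 1 accuracy."""
    return max(results, key=results.get)


# Two epochs copied from the log above, used as a small self-check.
sample = """\
[ Tue Nov 10 20:24:06 2020 ] Eval epoch: 1
[ Tue Nov 10 20:27:22 2020 ] Top 1: 6.34%
[ Tue Nov 10 20:27:23 2020 ] Top 5: 19.42%
[ Wed Nov 11 01:33:01 2020 ] Eval epoch: 2
[ Wed Nov 11 01:36:19 2020 ] Top 1: 8.28%
[ Wed Nov 11 01:36:19 2020 ] Top 5: 22.56%
"""

if __name__ == "__main__":
    for ep, acc in sorted(parse_top1(sample).items()):
        print(f"epoch {ep}: Top 1 = {acc:.2f}%")
```

Running the same function on the full log.txt (for example `parse_top1(open("log.txt").read())`) would list every epoch, which makes it easier to compare runs with different settings.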