Hi, I encountered an error during training.
I am trying to train the model on the DIV2K dataset (DIV2K_train_HR and DIV2K_train_LR_bicubic/X4).
After running python create_dataset.py, the data was generated successfully, and I placed it where LRHR_dataset.py expects it. When I started training, it downloaded the pretrained model, and then I got the following error:
LogHandlers setup!
21-06-15 20:41:57.700 : ===================== Selected training parameters =====================
21-06-15 20:41:57.701 : Namespace(D_init_iters=0, D_update_ratio=1, alpha=1.2, amsgrad=False, beta1_D=0.9, beta1_G=0.9, beta2_D=0.999, beta2_G=0.999, cuda=True, eps_D=1e-08, eps_G=1e-08, feature_criterion='l1', feature_weight=1.0, gan_type='ragan', gan_weight=1.0, imdbTestPath='./datasets/', imdbTrainPath='./datasets/', in_nc=3, is_mixup=True, is_train=True, lr_D=0.0001, lr_G=0.0001, lr_gamma=0.5, lr_milestones=[5000, 10000, 20000, 30000], lr_restart=None, lr_restart_weights=None, nf=64, niter=51000, numWorkers=4, patch_size=40, pixel_criterion='l1', pixel_weight=10.0, pretrain=True, pretrainedModelPath='pretrained_nets/SRResDNet/G_perceptual.pth', resdnet_depth=5, resume=True, resume_start_epoch=0, rgb_range=255, saveBest=True, saveImgsPath='results', saveLogsPath='logs', saveTrainedModelsPath='trained_nets', save_checkpoint_freq=20, save_path_best_lpips='/best_lpips/', save_path_best_psnr='/best_psnr/', save_path_netD='/netD/', save_path_netG='/netG/', save_path_training_states='/training_states/', seed=123, testBatchSize=1, test_stdn=[0.0], trainBatchSize=16, train_stdn=[0.0], tv_criterion='l1', tv_weight=1.0, upscale_factor=4, use_bn=False, use_chop=False, use_filters=True, warmup_iter=-1, weightdecay_D=0, weightdecay_G=0).
21-06-15 20:41:57.701 : ===================== Loading dataset =====================
21-06-15 20:41:57.706 : training dataset: 2400
21-06-15 20:41:57.706 : training loaders: 150
21-06-15 20:41:57.707 : testing dataset: 100
21-06-15 20:41:57.707 : testing loaders: 100
21-06-15 20:41:57.707 : ===================== Building model =====================
21-06-15 20:41:57.803 : Initialized model with pretrained net from pretrained_nets/SRResDNet/G_perceptual.pth.
Setting up Perceptual loss...
Loading model from: /home/xuwh/RJPcode/SRResCGAN-master/training_codes/modules/weights/v0.1/alex.pth
...[net-lin [alex]] initialized
...Done
21-06-15 20:42:01.452 : Network G structure: SRResDNet, with parameters: 380,356
21-06-15 20:42:01.452 : SRResDNet(
(model): ResDNet(
(conv1): Conv2d(3, 64, kernel_size=(5, 5), stride=(1, 1))
(layer1): Sequential(
(0): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
(relu1): PReLU(num_parameters=64)
(relu2): PReLU(num_parameters=64)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
)
(1): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
(relu1): PReLU(num_parameters=64)
(relu2): PReLU(num_parameters=64)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
)
(2): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
(relu1): PReLU(num_parameters=64)
(relu2): PReLU(num_parameters=64)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
)
(3): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
(relu1): PReLU(num_parameters=64)
(relu2): PReLU(num_parameters=64)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
)
(4): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
(relu1): PReLU(num_parameters=64)
(relu2): PReLU(num_parameters=64)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
)
)
(conv_out): ConvTranspose2d(64, 3, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
(l2proj): L2Proj()
)
(noise_estimator): Wmad_estimator()
(bbproj): Hardtanh(min_val=0.0, max_val=255.0)
)
21-06-15 20:42:01.453 : Network D structure: Discriminator_VGG_128, with parameters: 14,499,401
21-06-15 20:42:01.453 : Discriminator_VGG_128(
(conv0_0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(conv0_1): Conv2d(64, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(bn0_1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv1_0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1_0): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv1_1): Conv2d(128, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(bn1_1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2_0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2_0): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2_1): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(bn2_1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3_0): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn3_0): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3_1): Conv2d(512, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(bn3_1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv4_0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn4_0): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv4_1): Conv2d(512, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(bn4_1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(linear1): Linear(in_features=8192, out_features=100, bias=True)
(linear2): Linear(in_features=100, out_features=1, bias=True)
(lrelu): LeakyReLU(negative_slope=0.2, inplace=True)
)
21-06-15 20:42:01.453 : Network F structure: VGGFeatureExtractor, with parameters: 20,024,384
21-06-15 20:42:01.453 : VGGFeatureExtractor(
(features): Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace=True)
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(6): ReLU(inplace=True)
(7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(8): ReLU(inplace=True)
(9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): ReLU(inplace=True)
(14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(15): ReLU(inplace=True)
(16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(17): ReLU(inplace=True)
(18): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(19): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(20): ReLU(inplace=True)
(21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(22): ReLU(inplace=True)
(23): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(24): ReLU(inplace=True)
(25): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(26): ReLU(inplace=True)
(27): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(29): ReLU(inplace=True)
(30): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(31): ReLU(inplace=True)
(32): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(33): ReLU(inplace=True)
(34): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
)
21-06-15 20:42:01.454 : ===================== start training =====================
21-06-15 20:42:01.454 : ===================== resume training =====================
21-06-15 20:42:01.454 : ===> No saved training states to resume.
21-06-15 20:42:01.454 : ===> start training from epoch: 0, iter: 0.
21-06-15 20:42:01.454 : Total # of epochs for training: 340.
21-06-15 20:42:01.454 : ===> train:: Epoch[1]
21-06-15 20:42:03.040 : ===> train:: Epoch[1] Iter-step[1]
Traceback (most recent call last):
File "main_sr_color.py", line 1057, in <module>
main()
File "main_sr_color.py", line 964, in main
current_step)
File "main_sr_color.py", line 418, in train
pred_g_fake = netD(filter_high(fake_H))
File "/home/xuwh/anaconda3/envs/srrescgan/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/xuwh/RJPcode/SRResCGAN-master/training_codes/models/discriminator_vgg_arch.py", line 57, in forward
fea = self.lrelu(self.linear1(fea))
File "/home/xuwh/anaconda3/envs/srrescgan/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/xuwh/anaconda3/envs/srrescgan/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 87, in forward
return F.linear(input, self.weight, self.bias)
File "/home/xuwh/anaconda3/envs/srrescgan/lib/python3.6/site-packages/torch/nn/functional.py", line 1370, in linear
ret = torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [16 x 12800], m2: [8192 x 100] at /opt/conda/conda-bld/pytorch_1579027003190/work/aten/src/THC/generic/THCTensorMathBlas.cu:290
I am new to deep learning, so I am confused about why this happens. How can I fix it?
Thanks in advance.
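
For reference, the two numbers in the RuntimeError can be reproduced from the printed discriminator structure alone. The sketch below is only a shape calculation, not code from the repository. It assumes that patch_size=40 is the low-resolution crop size, that the high-resolution crop fed to netD is therefore patch_size * upscale_factor = 160 pixels per side, and that filter_high keeps the spatial size unchanged. The helper names conv_out and flattened_features are made up for illustration.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a Conv2d along one spatial dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(hr_size, channels=512, num_stages=5):
    """Size of the flattened tensor reaching linear1 for a square input.

    Each stage mirrors the printed Discriminator_VGG_128 layers: a 3x3
    stride-1 conv (padding 1) followed by a 4x4 stride-2 conv (padding 1).
    """
    size = hr_size
    for _ in range(num_stages):
        size = conv_out(size, kernel=3, stride=1, padding=1)  # keeps the size
        size = conv_out(size, kernel=4, stride=2, padding=1)  # halves the size
    return channels * size * size


# Values taken from the logged Namespace.
patch_size, upscale_factor = 40, 4
# Assumption: the HR crop side is patch_size * upscale_factor.
hr_size = patch_size * upscale_factor

print(flattened_features(hr_size))  # 12800, as in m1: [16 x 12800]
print(flattened_features(128))      # 8192, the in_features of linear1
```

If this reading is right, linear1 as printed expects 128x128 inputs (512 * 4 * 4 = 8192), while a 160x160 input produces 512 * 5 * 5 = 12800 features. A 128x128 high-resolution crop, for example patch_size=32 with upscale_factor=4, would then match, but I have not verified this against the repository.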