Hello,
I am trying to train the model from scratch on a custom dataset.
When I run this command:

```
python train.py --gpus 2 --model enhance --name scratch --g_lr 0.0001 --d_lr 0.0004 --beta1 0.5 --gan_mode 'hinge' --lambda_pix 10 --lambda_fm 10 --lambda_ss 1000 --Dinput_nc 22 --D_num 3 --n_layers_D 4 --batch_size 1 --dataset ffhq --dataroot original_test/ --visual_freq 100 --print_freq 10
```
I get this error:

```
----------------- Options ---------------
D_num: 3
Dinput_nc: 22 [default: 3]
Dnorm: in
Gin_size: 512 [default: 512]
Gnorm: spade
Gout_size: 512 [default: 512]
Pimg_size: 512 [default: 512]
Pnorm: bn
batch_size: 1 [default: 16]
beta1: 0.5
checkpoints_dir: ./check_points
continue_train: False
crop_size: 256
d_lr: 0.0004
data_device: cuda:1 [default: None]
dataroot: original_test/ [default: None]
dataset_name: ffhq [default: single]
debug: False
device: cuda:0 [default: None]
epoch: latest
epoch_count: 1
g_lr: 0.0001
gan_mode: hinge
gpu_ids: [0, 1] [default: None]
gpus: 2 [default: 1]
init_gain: 0.02
init_type: normal
input_nc: 3
isTrain: True [default: None]
lambda_fm: 10.0
lambda_g: 1.0
lambda_pcp: 0.0
lambda_pix: 10.0
lambda_ss: 1000.0
load_iter: 0 [default: 0]
load_size: 512
lr: 0.0002
lr_decay_gamma: 1
lr_decay_iters: 50
lr_policy: step
max_dataset_size: inf
model: enhance
n_epochs: 100
n_epochs_decay: 100
n_layers_D: 4
name: scratch [default: experiment_name]
ndf: 64
ngf: 64
niter_decay: 100
no_flip: False
no_strict_load: False
num_threads: 8
output_nc: 3
parse_net_weight: ./pretrain_models/parse_multi_iter_90000.pth
phase: train
preprocess: none
print_freq: 10 [default: 100]
resume_epoch: 0
resume_iter: 0
save_by_iter: False
save_epoch_freq: 5
save_iter_freq: 5000
save_latest_freq: 500
seed: 123
serial_batches: False
suffix:
total_epochs: 50
verbose: False
visual_freq: 100 [default: 400]
----------------- End -------------------
dataset [FFHQDataset] was created
The number of training images = 2513
initialize network with normal
model [EnhanceModel] was created
---------- Networks initialized -------------
[Network G] Total number of parameters : 45.957 M
[Network D] Total number of parameters : 18.872 M
Start training from epoch: 00000; iter: 0000000
/usr/bin/nvidia-modprobe: unrecognized option: "-s"
ERROR: Invalid commandline, please run /usr/bin/nvidia-modprobe --help
for usage information.
[the three lines above are repeated four times in total]
Traceback (most recent call last):
  File "train.py", line 78, in <module>
    train(opt)
  File "train.py", line 39, in train
    model.forward(), timer.update_time('Forward')
  File "/homes/placeholder/PSFR-GAN/models/enhance_model.py", line 93, in forward
    self.real_D_results = self.netD(torch.cat((self.img_HR, self.hr_mask), dim=1), return_feat=True)
  File "/homes/placeholder/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/homes/placeholder/miniconda3/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/homes/placeholder/miniconda3/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/homes/placeholder/miniconda3/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/homes/placeholder/miniconda3/lib/python3.8/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
TypeError: Caught TypeError in replica 1 on device 1.
Original Traceback (most recent call last):
  File "/homes/placeholder/miniconda3/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/homes/placeholder/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'input'
```
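Reading the traceback, my guess (I may be wrong) is that the problem is using `--batch_size 1` together with `--gpus 2`: as far as I understand, `DataParallel` splits the batch along dimension 0, and a batch of size 1 only yields one chunk, so replica 1 on device 1 receives the `return_feat=True` keyword argument but no positional tensor, which would explain the missing `'input'`. Here is a minimal pure-Python sketch of how I imagine the split works (`split_batch` is my own illustrative helper, not a PyTorch function):

```python
import math

def split_batch(batch, num_devices):
    # Mimic the batch scattering: chunk along dim 0 into pieces of
    # size ceil(len(batch) / num_devices); a replica whose chunk
    # would be empty simply gets no positional input at all.
    size = math.ceil(len(batch) / num_devices)
    return [batch[i:i + size] for i in range(0, len(batch), size)]

print(split_batch([0], 2))     # [[0]]        -> only one chunk for two replicas
print(split_batch([0, 1], 2))  # [[0], [1]]   -> one sample per replica
```

If that is indeed the cause, raising `--batch_size` to at least the number of GPUs, or running with `--gpus 1`, should avoid it, but I would appreciate confirmation.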
I am using torch==1.5.1 and torchvision==0.6.1. Could you please help me?