lornatang / srgan-pytorch
A simple and complete implementation of the super-resolution paper.
License: Apache License 2.0
I was using an old version of your repo, SRGAN-PyTorch (v0.2.2), and the model URLs for the generator are:
model_urls = {
"srgan_2x2": "https://github.com/Lornatang/SRGAN-PyTorch/releases/download/v0.2.2/SRGAN_2x2_ImageNet2012-3f1d605edcbfb83dc836668731cd6135b00ff62ea6f8633559fbb5dffe8413ba.pth",
"srgan": "https://github.com/Lornatang/SRGAN-PyTorch/releases/download/v0.2.2/SRGAN_ImageNet2012-158a3f9e70f45aef607e4146e29cde745e8d9a35972cb067f1ee00cb92254e02.pth",
"srgan_8x8": "https://github.com/Lornatang/SRGAN-PyTorch/releases/download/v0.2.2/SRGAN_8x8_ImageNet2012-c8207fead3ec73cdf6772fb60fef759833bae4a535eb8d3287aba470696219c1.pth"
}
These links are down since you have updated the repo; however, the generator network I am using fails to load the newest weights from the Google Drive (https://drive.google.com/drive/folders/1jS4psAFj8WrnTS9U470RhGdp2OAlvILW).
The v0.2.2 releases of SRGAN-PyTorch were erased.
I am wondering if you still have the SRGAN_2x2_ImageNet2012-3f1d605edcbfb83dc836668731cd6135b00ff62ea6f8633559fbb5dffe8413ba.pth, SRGAN_ImageNet2012-158a3f9e70f45aef607e4146e29cde745e8d9a35972cb067f1ee00cb92254e02.pth or SRGAN_8x8_ImageNet2012-c8207fead3ec73cdf6772fb60fef759833bae4a535eb8d3287aba470696219c1.pth checkpoints saved somewhere.
Hi Lornatang,
How can I train on 2 GPUs on my local machine, for example devices 3 and 4 only? When I use os.environ["CUDA_VISIBLE_DEVICES"] = "3,4" it gives me an error:
"RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!"
Please help.
Thanks!
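A minimal sketch of the usual fix (generic PyTorch, not this repo's exact training script): CUDA_VISIBLE_DEVICES must be set before torch initializes CUDA, and every tensor as well as the model must be moved to the same device explicitly, which is the typical cause of the "cuda:0 and cpu" mismatch.

```python
# Hedged sketch: masking GPUs and keeping everything on one device.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "3,4"  # physical GPUs 3 and 4 appear as cuda:0, cuda:1

import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(4, 4).to(device)    # placeholder for the generator
x = torch.rand(2, 4).to(device)             # inputs AND labels both need .to(device)
y = model(x)                                # no cross-device error now
```

If any tensor (often the label tensor) is created without `.to(device)`, the runtime error above appears.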
Hi,
Not much, but there is a stray comma at the end of config.py, on line 113.
The weights link doesn't contain weights; it contains the Set5 image dataset! Please upload the trained weights and provide the correct link. :)
Hi there, thank you for your dedicated work.
However, I found that while training SRGAN, in the discriminator you wrote: # The real sample label is 1, and the generated sample label is 0. You then use d_loss = (d_loss_real + d_loss_fake) / 2. Could you kindly explain this loss and the motivation behind labeling the real and generated samples this way? I may be blind, but to be honest I did not see it in the paper.
Thank you, best regards
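The comment describes the standard GAN discriminator objective (which SRGAN adopts as its adversarial term): real patches are labeled 1, generated ones 0, and the two BCE terms are averaged so each half contributes equally. A minimal sketch with made-up scores:

```python
# Sketch of the discriminator loss; d_real/d_fake are illustrative scores,
# not outputs of this repo's model.
import torch
import torch.nn as nn

bce = nn.BCELoss()
d_real = torch.tensor([0.9, 0.8])          # D's scores on real (HR) samples
d_fake = torch.tensor([0.2, 0.1])          # D's scores on generated (SR) samples

real_label = torch.ones_like(d_real)       # real sample label is 1
fake_label = torch.zeros_like(d_fake)      # generated sample label is 0

d_loss_real = bce(d_real, real_label)      # push real scores toward 1
d_loss_fake = bce(d_fake, fake_label)      # push fake scores toward 0
d_loss = (d_loss_real + d_loss_fake) / 2   # average of the two terms
```

Dividing by 2 simply keeps the discriminator loss on the same scale as each individual term.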
Hi, I'm wondering if you experimented with patch sizes other than 96px. That one works fine, but anything else throws errors.
If the patch size is smaller than 96 (28 in this example), I usually get:
** On entry to SGEMM parameter number 10 had an illegal value
Traceback (most recent call last):
  File "train.py", line 283, in <module>
    main()
  File "train.py", line 175, in main
    train_gan(epoch)
  File "train.py", line 246, in train_gan
    d_loss_real = adv_criterion(netD(target), real_label)
  File "/home/calexand/env/srpy/lib/python3.8/site-packages/torch/nn/modules/module.py", li$
    return forward_call(*input, **kwargs)
  File "/home/calexand/defTest/srpy/srgan_pytorch/model.py", line 124, in forward
    out = self.classifier(out)
  File "/home/calexand/env/srpy/lib/python3.8/site-packages/torch/nn/modules/module.py", li$
    return forward_call(*input, **kwargs)
  File "/home/calexand/env/srpy/lib/python3.8/site-packages/torch/nn/modules/container.py",$
    input = module(input)
  File "/home/calexand/env/srpy/lib/python3.8/site-packages/torch/nn/modules/module.py", li$
    return forward_call(*input, **kwargs)
  File "/home/calexand/env/srpy/lib/python3.8/site-packages/torch/nn/modules/linear.py", li$
    return F.linear(input, self.weight, self.bias)
  File "/home/calexand/env/srpy/lib/python3.8/site-packages/torch/nn/functional.py", line 1$
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasSgemm( handle, ople, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
If the patch is bigger, for example 192:
Traceback (most recent call last):
  File "train.py", line 418, in <module>
    main()
  File "train.py", line 215, in main
    allLossD,allLossG = train_gan(epoch)
  File "train.py", line 315, in train_gan
    d_loss.backward()
  File "/home/calexand/env/srpy/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/calexand/env/srpy/lib/python3.8/site-packages/torch/autograd/__init__.py", line 147, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Function AddmmBackward returned an invalid gradient at index 1 - got [32, 18432] but expected shape compatible with [32, 73728]
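Both errors point at the same cause: the SRGAN discriminator ends in fixed-size Linear layers sized for 96x96 patches, so any other patch size changes the flattened feature length and breaks the matrix multiply. A hedged sketch (illustrative layer names, not the repo's exact code) of one way to make the head patch-size agnostic by pooling to a fixed grid:

```python
# Sketch: AdaptiveAvgPool2d fixes the classifier input size regardless of patch size.
import torch
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(3, 64, 3, stride=2, padding=1),  # stand-in for the conv trunk
    nn.LeakyReLU(0.2),
)
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(6),                   # always yields a 6x6 feature map
    nn.Flatten(),
    nn.Linear(64 * 6 * 6, 1024),
    nn.LeakyReLU(0.2),
    nn.Linear(1024, 1),
    nn.Sigmoid(),
)

for size in (28, 96, 192):                     # any patch size now works
    out = head(features(torch.rand(2, 3, size, size)))
    assert out.shape == (2, 1)
```

The alternative is to recompute the first Linear layer's input size by hand for each patch size.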
I am trying to train using one of the specified datasets (DIV2K_valid_HR).
I unzipped the DIV2K dataset into the data folder as described in the README.md, including the train and val folders mentioned in the training example. There were a number of errors, including one about a non-existent test folder inside the data folder, so I created one.
I got this error.
Thanks for creating this.
I was trying this for the first time today.
I ran
cd data/
bash download_dataset.sh
then
cd ..
python3 test_benchmark.py -a srgan --pretrained --gpu 0 data
I get
FileNotFoundError: [Errno 2] No such file or directory: 'data/test'
I'm unsure what should go into the test folder.
Hi, sorry to bother you, but I find that the RRDBNet weights are not in the weights set.
I don't understand why the optimizer is defined the way it is on line 51 of train_srgan.py. Can you help me? Thank you!
The pre-trained discriminator model cannot be loaded when I try to continue training with my data.
Error:
Load dataset successfully.
Build SRGAN model successfully.
Downloading: "https://download.pytorch.org/models/vgg19-dcbb9e9d.pth" to /root/.cache/torch/hub/checkpoints/vgg19-dcbb9e9d.pth
100% 548M/548M [00:03<00:00, 156MB/s]
Define all loss functions successfully.
Define all optimizer functions successfully.
Define all optimizer scheduler functions successfully.
Loading SRResNet model weights
Loaded SRResNet model weights.
Check whether the pretrained discriminator model is restored...
Traceback (most recent call last):
File "train_srgan.py", line 512, in
main()
File "train_srgan.py", line 76, in main
d_optimizer.load_state_dict(checkpoint["optimizer"])
File "/usr/local/lib/python3.7/dist-packages/torch/optim/optimizer.py", line 146, in load_state_dict
raise ValueError("loaded state dict contains a parameter group "
ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group
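This ValueError means the optimizer's parameter groups no longer match the checkpoint's, typically because the discriminator's architecture or the optimizer's group layout changed between runs. A hedged sketch of a defensive restore (the checkpoint layout here is illustrative):

```python
# Sketch: fall back to fresh optimizer state when the checkpointed state
# is incompatible with the current parameter groups.
import torch

model = torch.nn.Linear(4, 4)                        # stand-in for the discriminator
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

checkpoint = {"optimizer": optimizer.state_dict()}   # placeholder checkpoint dict

try:
    optimizer.load_state_dict(checkpoint["optimizer"])
except ValueError:
    # Parameter groups don't match; keep the restored weights but start
    # with fresh optimizer state instead of crashing.
    print("optimizer state incompatible; using fresh optimizer state")
```

Losing the Adam moment estimates costs a few warm-up iterations but lets training continue.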
Hi,
would you provide code for visualizing the network results in order to compare the images?
Thanks in advance.
I got these outputs, why? Please help me! My GPU: 2080 Ti, driver 11.2
[ WARNING ] You have chosen a specific GPU. This will completely disable data parallelism.
[ INFO ] Use GPU: 0 for training.
[ INFO ] Creating model `srgan_2x2`.
[ INFO ] Losses function information:
Pixel: MSELoss
Content: VGG19_36th
Adversarial: BCELoss
[ INFO ] Optimizer information:
PSNR learning rate: 0.0001
Discriminator learning rate: 0.0001
Generator learning rate: 0.0001
PSNR optimizer: Adam, [betas=(0.9,0.999)]
Discriminator optimizer: Adam, [betas=(0.9,0.999)]
Generator optimizer: Adam, [betas=(0.9,0.999)]
PSNR scheduler: None
Discriminator scheduler: StepLR, [step_size=self.gan_epochs // 2, gamma=0.1]
Generator scheduler: StepLR, [step_size=self.gan_epochs // 2, gamma=0.1]
[ INFO ] Load training dataset
[ INFO ] Dataset information:
Train Path: /home/zhang/code-space/data/train
Test Path: /home/zhang/code-space/data/test
Number of train samples: 800
Number of test samples: 100
Number of train batches: 50
Number of test batches: 7
Shuffle of train: True
Shuffle of test: False
Sampler of train: False
Sampler of test: None
Workers of train: 4
Workers of test: 4
[ INFO ] Turn on mixed precision training.
[ INFO ] Train information:
PSNR-oral epochs: 500
GAN-oral epochs: 500
Epoch: [0][ 0/50] Time 4.5959 (4.5959) Loss 0.405921 (0.405921)
Epoch: [0][ 5/50] Time 0.0693 (1.0562) Loss 0.087571 (0.207982)
Epoch: [0][10/50] Time 0.0969 (0.8460) Loss 0.068509 (0.141885)
Epoch: [0][15/50] Time 0.0597 (0.7817) Loss 0.040221 (0.114032)
Epoch: [0][20/50] Time 2.5262 (0.8705) Loss 0.027806 (0.094019)
Epoch: [0][25/50] Time 0.0585 (0.7959) Loss 0.032152 (0.081912)
Epoch: [0][30/50] Time 0.0532 (0.7511) Loss 0.016601 (0.072890)
Epoch: [0][35/50] Time 0.0593 (0.7287) Loss 0.029243 (0.066070)
Epoch: [0][40/50] Time 3.0444 (0.7835) Loss 0.024154 (0.060616)
Epoch: [0][45/50] Time 0.0654 (0.7643) Loss 0.021094 (0.056146)
PSNR: nan SSIM: nan LPIPS: 0.5387 GMSD: nan: 100%|███████████████████████████████████████████| 7/7 [00:05<00:00, 1.33it/s]
[ WARNING ] NaN or Inf found in input tensor.
[ WARNING ] NaN or Inf found in input tensor.
[ WARNING ] NaN or Inf found in input tensor.
[ WARNING ] NaN or Inf found in input tensor.
Epoch: [1][ 0/50] Time 3.3172 (3.3172) Loss nan (nan)
[ WARNING ] NaN or Inf found in input tensor.
[ WARNING ] NaN or Inf found in input tensor.
Epoch: [1][ 5/50] Time 0.0581 (1.0160) Loss 0.020317 (nan)
[ WARNING ] NaN or Inf found in input tensor.
[ WARNING ] NaN or Inf found in input tensor.
[ WARNING ] NaN or Inf found in input tensor.
Epoch: [1][10/50] Time 0.5341 (0.9001) Loss nan (nan)
During the execution of GAN epochs:
Traceback (most recent call last):
File "/content/drive/MyDrive/SRGAN-PyTorch-master/train.py", line 590, in
main()
File "/content/drive/MyDrive/SRGAN-PyTorch-master/train.py", line 154, in main
main_worker(args.gpu, ngpus_per_node, args)
File "/content/drive/MyDrive/SRGAN-PyTorch-master/train.py", line 392, in main_worker
args=args)
File "/content/drive/MyDrive/SRGAN-PyTorch-master/train.py", line 544, in train_gan
content_loss = content_criterion(sr, hr.detach())
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/drive/MyDrive/SRGAN-PyTorch-master/srgan_pytorch/loss.py", line 156, in forward
source = (source - self.mean) / self.std
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Any advice?
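A likely cause (a hedged guess based on the traceback, not a confirmed reading of this repo's loss.py): the VGG normalization mean/std tensors in the content loss are created on the CPU while the inputs live on the GPU. Registering them as buffers makes them follow the module when it is moved to a device:

```python
# Sketch of the normalization step from the traceback, with mean/std as buffers.
import torch
import torch.nn as nn

class ContentLoss(nn.Module):
    def __init__(self):
        super().__init__()
        # Buffers move with .to(device)/.cuda(), unlike plain tensor attributes.
        self.register_buffer("mean", torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1))
        self.register_buffer("std", torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1))

    def forward(self, source):
        return (source - self.mean) / self.std   # all tensors on the same device now

loss = ContentLoss()
out = loss(torch.rand(1, 3, 8, 8))
```

Alternatively, calling `.to(sr.device)` on the mean/std tensors inside `forward` achieves the same thing.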
May I ask whether this demo's DIV2K training set only needs the HR images? The download link you provided contains only HR.
The link on the overview page for downloading weights is broken.
When the code runs at scaler.scale(d_loss).backward(), I get the error: "one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512]] is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True)."
It has been bothering me for many days; can someone help me?
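One common cause of this error in GAN training loops (a hedged guess, not necessarily this repo's bug): the discriminator loss backpropagates through the generator's graph, which a later update has modified in place. Detaching the generated batch before feeding it to the discriminator isolates the two graphs:

```python
# Sketch with stand-in networks: .detach() cuts the graph so d_loss.backward()
# never touches the generator's (possibly already-updated) parameters.
import torch
import torch.nn as nn

netG = nn.Linear(8, 8)                               # stand-in for the generator
netD = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())  # stand-in for the discriminator
bce = nn.BCELoss()

lr = torch.rand(4, 8)
sr = netG(lr)
d_loss = bce(netD(sr.detach()), torch.zeros(4, 1))   # note the .detach()
d_loss.backward()
```

If detaching doesn't help, `torch.autograd.set_detect_anomaly(True)` will name the offending in-place op.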
Hi,
Would you please provide code for SSIM calculation during training?
Thanks in advance.
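Not the repo's metric implementation, but a simplified sketch of a window-averaged SSIM in pure PyTorch (uniform windows via average pooling instead of the usual Gaussian window; border windows are approximate because padding includes zeros):

```python
# Simplified SSIM sketch: uniform 11x11 windows, constants from the SSIM paper.
import torch
import torch.nn.functional as F

def ssim(x, y, window=11, c1=0.01 ** 2, c2=0.03 ** 2):
    """Window-averaged SSIM over (N, C, H, W) tensors scaled to [0, 1]."""
    pad = window // 2
    mu_x = F.avg_pool2d(x, window, stride=1, padding=pad)
    mu_y = F.avg_pool2d(y, window, stride=1, padding=pad)
    sigma_x = F.avg_pool2d(x * x, window, stride=1, padding=pad) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, window, stride=1, padding=pad) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, window, stride=1, padding=pad) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return (num / den).mean()

score = ssim(torch.rand(1, 3, 48, 48), torch.rand(1, 3, 48, 48))
```

For reference-quality numbers, a Gaussian-windowed implementation (e.g. from torchmetrics or scikit-image) should be preferred.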
Facing this issue while running the following command: python3 test_benchmark.py -a srgan --pretrained --gpu 0 DIR
FileNotFoundError: [Errno 2] No such file or directory: 'DIR/test'
I changed upscale_factor = 2 in config.py.
I run inference.py and the results are still upscaled by 4.
How can I pass an upscale_factor of 2 to inference.py?
Hey!
I was playing around with your solution and was positively surprised that a pre-trained model is included. I set everything up and tried to verify your results for Set5 using test.py. Sadly, the results are rather mediocre:
0.png:
MSE 0.0209
RMSE 0.1445
PSNR 16.80
SSIM 0.6989
LPIPS 0.3019
GMSD 0.1458
2.png:
MSE 0.0020
RMSE 0.0444
PSNR 27.05
SSIM 0.7365
LPIPS 0.1433
GMSD 0.0778
3.png
MSE 0.0058
RMSE 0.0765
PSNR 22.33
SSIM 0.8100
LPIPS 0.1888
GMSD 0.0901
4.png:
MSE 0.0058
RMSE 0.0763
PSNR 22.35
SSIM 0.4969
LPIPS 0.2875
GMSD 0.0887
Image 1 gave an error.
What dataset is the pre-trained model trained on? Do you have any idea why I got such bad results?
Anyway, cool model and thanks for your work:)
I am trying to do inference on one of the default dataset images suggested in the Readme.md
`user#:/home/ubuntu/SRGAN-PyTorch# python test_image.py --arch srgan --lr /home/ubuntu/SRGAN-PyTorch/data/train/0001.png --hr /home/ubuntu/SRGAN-PyTorch/data/0001_HR.png --model-path /home/ubuntu/SRGAN-PyTorch/weights/lpips_vgg.pth --pretrained --gpu 0
[ WARNING ] Directory `/home/ubuntu/SRGAN-PyTorch/tests` already exists!
[ INFO ] TestEngine:
[ INFO ] API version .......... 0.3.0
[ INFO ] Build ................ 2021.06.13
[ INFO ] Using pre-trained model `srgan`.
Traceback (most recent call last):
File "test_image.py", line 150, in <module>
main(args)
File "test_image.py", line 59, in main
model = configure(args)
File "/home/ubuntu/SRGAN-PyTorch/srgan_pytorch/utils/common.py", line 46, in configure
model = models.__dict__[args.arch](pretrained=True)
File "/home/ubuntu/SRGAN-PyTorch/srgan_pytorch/models/generator.py", line 104, in srgan
return _gan("srgan", pretrained, progress)
File "/home/ubuntu/SRGAN-PyTorch/srgan_pytorch/models/generator.py", line 92, in _gan
state_dict = load_state_dict_from_url(model_urls[arch], progress=progress, map_location=torch.device("cpu"))
File "/opt/conda/lib/python3.8/site-packages/torch/hub.py", line 517, in load_state_dict_from_url
cached_file = os.path.join(model_dir, filename)
File "/opt/conda/lib/python3.8/posixpath.py", line 90, in join
genericpath._check_arg_types('join', a, *p)
File "/opt/conda/lib/python3.8/genericpath.py", line 155, in _check_arg_types
raise TypeError("Can't mix strings and bytes in path components") from None
TypeError: Can't mix strings and bytes in path components
`
Facing a similar problem with test_benchmark as well.
When I tried to run test_image.py with --upscale-factor 2, I got an error pointing to
images = torch.cat([bicubic, sr], dim=-1)
RuntimeError: Sizes of tensors must match except in dimension 3. Got 256 and 512 in dimension 2 (The offending index is 1)
Could you please suggest a solution?
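The error says the bicubic comparison image and the SR output have different spatial sizes (256 vs 512), which happens when the comparison image is still sized for a x4 model. A hedged sketch of a fix, resizing the bicubic baseline to the SR output's size before concatenation (tensor shapes here are illustrative):

```python
# Sketch: make the bicubic baseline match the SR output size before torch.cat.
import torch
import torch.nn.functional as F

sr = torch.rand(1, 3, 512, 512)        # SR model output
bicubic = torch.rand(1, 3, 256, 256)   # bicubic upscale sized for the wrong factor

bicubic = F.interpolate(bicubic, size=sr.shape[-2:], mode="bicubic", align_corners=False)
images = torch.cat([bicubic, sr], dim=-1)  # side-by-side comparison strip
```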
Hello!
I noticed that you recently removed the gradient clipping from the automatic mixed precision path. I would really like to ask why you made this change; is it because gradient clipping is not suitable for this case?
Thank you very much!
Hi!
How should I structure my training dataset in order for this implementation to work?
Also, what does IMAGE_SIZE refer to? Why is the default 96?
Thank you
Hi,
I have a question regarding the training of the GAN: I wonder why you don't call zero_grad() on the generator's and discriminator's optimizers at the beginning of the for loop, and instead only call discriminator.zero_grad() and generator.zero_grad() respectively. Unfortunately I'm still quite new to PyTorch, having always used TensorFlow before, and I could imagine that it amounts to the same thing?
Further, I also wonder why you don't do something like G.train(False) and D.train(True) when updating the discriminator, and vice versa.
I hope you can answer my questions. Kind regards,
Niklas
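On the first question: it does amount to the same thing in the common case. `module.zero_grad()` and `optimizer.zero_grad()` clear the same `.grad` fields whenever the optimizer was built over exactly that module's parameters. A small sketch:

```python
# Sketch: model.zero_grad() clears the same grads optimizer.zero_grad() would,
# because the optimizer holds exactly this model's parameters.
import torch

model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

model(torch.rand(2, 4)).sum().backward()   # populate gradients
model.zero_grad()                          # equivalent to opt.zero_grad() here

# Depending on the PyTorch version, grads are now None or zero tensors.
assert all(p.grad is None or p.grad.abs().sum() == 0 for p in model.parameters())
```

They differ only when an optimizer covers a subset of the module's parameters, or parameters from several modules.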
Hi,
Correct me if I am wrong, but I needed to install tensorboard 2.8.0 and setuptools 59.5.0.
Not a big deal, but they were not in requirements.txt.
I might be wrong with that, just wanted to ask
Hi, thanks for your work! When I train the SRGAN generator model in my environment (PyTorch 1.7, torchvision 0.8), it works, but I get a large model weights file of about 18.7 MB. However, the SRGAN generator weights you provide at https://drive.google.com/drive/folders/1A6lzGeQrFMxPqJehK9s37ce-tPDj20mD, named "SRGAN_x4-ImageNet-c71a4860.pth.tar", are only 5.98 MB. I found that both weights use the same backbone. How can I get the same size of model weights?
Another thing: on line 51 of train_srgan.py,
"d_scheduler, g_scheduler = define_scheduler(discriminator, generator)"
should probably be "d_scheduler, g_scheduler = define_scheduler(d_optimizer, g_optimizer)"?
Looking forward to your answer!
Hi, when I try to train on grayscale TIFF images I get: RuntimeError: Given groups=1, weight of size [64, 1, 9, 9], expected input[16, 3, 48, 48] to have 1 channels, but got 3 channels instead.
I changed the first Conv2d's input channels from 3 to 1, but the error is still the same. Can you help?
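A hedged reading of the error: the conv weight already expects 1 channel ([64, 1, 9, 9]), but the input batch still has 3, so it is the dataloader, not the model, that needs the change. Converting images to single-channel at load time would match the model:

```python
# Sketch: force single-channel input at image-loading time (placeholder image,
# not the repo's dataset class).
import numpy as np
import torch
from PIL import Image

img = Image.fromarray(np.zeros((48, 48, 3), dtype=np.uint8))  # stand-in for a TIFF
gray = img.convert("L")                                       # collapse to 1 channel
tensor = torch.from_numpy(np.asarray(gray)).float().unsqueeze(0) / 255.0  # (1, H, W)
```

The opposite direction (keeping the 3-channel model and repeating the grayscale channel three times) also works and avoids editing the network.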
Hello everyone :)
thanks a lot for the repo! Training the PSNR part worked well. However, when training the GAN I had problems because of a missing lpips_vgg.pth file. Maybe I overlooked something, but I can't find it.
Would be nice if you could help me out here!
Best,
Jenny
While training the SRGAN model, a learning rate scheduler is used in train.py with parameters step_size=epochs // 2 and gamma=0.1.
From this I suppose schedulerD.step() and schedulerG.step() should be called once per epoch. However, they are called at each training step, which means the learning rate quickly becomes zero after a few training steps. The SRGAN does not seem to train.
But when I moved schedulerD.step() and schedulerG.step() to after the epoch's end, it started showing nice results.
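A sketch of the placement described above: StepLR with step_size=epochs // 2 counts scheduler calls, so it must be advanced once per epoch, not once per batch (variable names are illustrative):

```python
# Sketch: scheduler.step() belongs after the inner batch loop, once per epoch.
import torch

model = torch.nn.Linear(4, 4)                  # stand-in for the discriminator
optD = torch.optim.Adam(model.parameters(), lr=1e-4)
schedulerD = torch.optim.lr_scheduler.StepLR(optD, step_size=5, gamma=0.1)

for epoch in range(10):
    for batch in range(3):                     # inner training loop: no scheduler here
        optD.step()
    schedulerD.step()                          # advance the LR schedule once per epoch
```

Stepping per batch multiplies the decay rate by the number of batches per epoch, which is why the LR collapsed to zero almost immediately.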
When following the documentation at https://github.com/Lornatang/SRGAN-PyTorch/tree/master/data, running "create_dataset_for_kernelGAN.py" throws an error:
File "create_dataset_for_kernelGAN.py", line 46
lr_dir = f"./{args.upscale_factor}x/input"
^
The PSNR improves during generator training, but drops drastically during adversarial training.
Train Epoch[0045/0046](00010/00015) Loss: 0.007902.
Train Epoch[0045/0046](00015/00015) Loss: 0.006159.
Valid stage: generator Epoch[0045] avg PSNR: 19.95.
Train Epoch[0046/0046](00010/00015) Loss: 0.006377.
Train Epoch[0046/0046](00015/00015) Loss: 0.008251.
Valid stage: generator Epoch[0046] avg PSNR: 19.99.
Train stage: adversarial Epoch[0001/0010](00010/00015) D Loss: 0.139652 G Loss: 0.598175 D(HR): 0.990013 D(SR1)/D(SR2): 0.112813/0.022210.
Train stage: adversarial Epoch[0001/0010](00015/00015) D Loss: 0.002624 G Loss: 0.810450 D(HR): 0.998733 D(SR1)/D(SR2): 0.001354/0.000455.
Valid stage: adversarial Epoch[0001] avg PSNR: 9.15.
Train stage: adversarial Epoch[0002/0010](00010/00015) D Loss: 0.002039 G Loss: 0.589604 D(HR): 0.998040 D(SR1)/D(SR2): 0.000008/0.000007.
Train stage: adversarial Epoch[0002/0010](00015/00015) D Loss: 0.001770 G Loss: 0.579492 D(HR): 0.998254 D(SR1)/D(SR2): 0.000018/0.000017.
Valid stage: adversarial Epoch[0002] avg PSNR: 8.84.
Train stage: adversarial Epoch[0003/0010](00010/00015) D Loss: 0.001410 G Loss: 0.456838 D(HR): 0.999054 D(SR1)/D(SR2): 0.000449/0.000344.
Train stage: adversarial Epoch[0003/0010](00015/00015) D Loss: 0.000123 G Loss: 0.389203 D(HR): 0.999966 D(SR1)/D(SR2): 0.000089/0.000067.
Valid stage: adversarial Epoch[0003] avg PSNR: 8.22.
Train stage: adversarial Epoch[0004/0010](00010/00015) D Loss: 0.023198 G Loss: 0.501722 D(HR): 0.999708 D(SR1)/D(SR2): 0.016052/0.000103.
Train stage: adversarial Epoch[0004/0010](00015/00015) D Loss: 0.006275 G Loss: 0.574956 D(HR): 0.993783 D(SR1)/D(SR2): 0.000000/0.000000.
Valid stage: adversarial Epoch[0004] avg PSNR: 8.21.
Hi. I got this error message while validating on Colab:
RuntimeError: Error(s) in loading state_dict for Generator:
Missing key(s) in state_dict: "conv_block1.0.weight", "conv_block1.0.bias", "conv_block1.1.weight", "trunk.0.rcb.0.weight", "trunk.0.rcb.1.weight", "trunk.0.rcb.1.bias",
...
"upsampling.1.upsample_block.0.bias", "upsampling.1.upsample_block.2.weight", "conv_block3.weight", "conv_block3.bias".
Unexpected key(s) in state_dict: "epoch", "best_psnr", "state_dict", "optimizer", "scheduler".
I tried to use this weight file, SRGAN_x4-ImageNet-2204c839.pth.tar, as provided in the link.
To overcome the error, I updated the code from
model.load_state_dict(checkpoint["state_dict"])
to
model.load_state_dict(checkpoint["state_dict"], False)
in validate.py on line 34.
After that change, validation works, but the PSNR is only 8.89 dB. I also tried all the SRGAN weight files in the link, but all three of them give the same PSNR result: 8.89.
What have I missed?
Hello,
I've tried to run inference using the test_image.py script:
$SRGAN-PyTorch$ python test_image.py --lr lr.png --hr hr.png -a srgan_4x4_16 --upscale-factor 4 --pretrained --device 0
I received the following error:
usage: test_image.py [-h] --lr LR --hr HR [-a ARCH] [--upscale-factor {4}] [--model-path PATH] [--pretrained] [--detail] [--outf PATH] [--device DEVICE] test_image.py: error: argument -a/--arch: invalid choice: 'srgan_4x4_16' (choose from 'discriminator', 'load_state_dict_from_url', 'srgan', 'srresnet')
I then attempted:
$SRGAN-PyTorch$ python test_image.py --lr lr.png --hr hr.png -a srgan --upscale-factor 4 --pretrained --device 0
I received the following error:
Traceback (most recent call last):
  File "test_image.py", line 65, in <module>
    estimate = Estimate(args)
  File "/home/user/SRGAN-PyTorch/tester.py", line 138, in __init__
    self.model, self.device = configure(args)
  File "/home/user/SRGAN-PyTorch/srgan_pytorch/utils/common.py", line 56, in configure
    model = models.__dict__[args.arch](pretrained=True, upscale_factor=args.upscale_factor).to(device)
TypeError: srgan() got an unexpected keyword argument 'upscale_factor'
Steps to reproduce on debian box:
$ git clone https://github.com/Lornatang/SRGAN-PyTorch.git
$ cd SRGAN-PyTorch/
$ pip3 install -r requirements.txt
Hello! While running test.py I am not getting the results; instead, the process is getting killed. Can you tell me if I am doing something wrong?
pankhuri@pankhuri-G5-5500:~/academics/sem2/hlcv/project/baseline-setup/SRGAN-PyTorch$ python3 test.py --pretrained
[ WARNING ] Directory `/home/pankhuri/academics/sem2/hlcv/project/baseline-setup/SRGAN-PyTorch/tests` already exists!
[ WARNING ] Directory `/home/pankhuri/academics/sem2/hlcv/project/baseline-setup/SRGAN-PyTorch/tests/Set5` already exists!
[ INFO ] TrainEngine:
[ INFO ] API version .......... 0.4.0
[ INFO ] Build ................ 2021.07.09
Killed
Why, when I want to calculate PSNR during training, do I get this error even though the batch_size in the config is very small?
Load all datasets successfully.
Build SRResNet model successfully.
Define all loss functions successfully.
Define all optimizer functions successfully.
Check whether the pretrained model is restored...
Epoch: [1][ 0/1989] Time 15.878 (15.878) Data 0.000 ( 0.000) Loss 0.267222 (0.267222)
Traceback (most recent call last):
File "train_srresnet.py", line 463, in
main()
File "train_srresnet.py", line 98, in main
train_loss = train(model, train_prefetcher, pixel_criterion, optimizer, epoch, scaler, writer, psnr_model, ssim_model)
File "train_srresnet.py", line 249, in train
scaler.scale(loss).backward()
File "/usr/local/lib/python3.7/dist-packages/torch/_tensor.py", line 363, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 175, in backward
allow_unreachable=True, accumulate_grad=True) # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA out of memory. Tried to allocate 288.00 MiB (GPU 0; 14.76 GiB total capacity; 11.70 GiB already allocated; 123.75 MiB free; 13.22 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
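A hedged workaround for an OOM like the one above: splitting each batch into micro-batches with gradient accumulation keeps the effective batch size while lowering peak activation memory. Scaling each partial loss by the number of accumulation steps keeps the gradient magnitude unchanged (networks and shapes below are stand-ins):

```python
# Sketch: gradient accumulation over micro-batches to cut peak memory.
import torch
import torch.nn as nn

model = nn.Linear(16, 16)                    # stand-in for SRResNet
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
accum_steps = 4

batch_lr = torch.rand(8, 16)
batch_hr = torch.rand(8, 16)

optimizer.zero_grad()
for lr_chunk, hr_chunk in zip(batch_lr.chunk(accum_steps), batch_hr.chunk(accum_steps)):
    loss = criterion(model(lr_chunk), hr_chunk) / accum_steps  # scale each part
    loss.backward()                                            # grads accumulate
optimizer.step()                                               # one update per batch
```

If validation itself OOMs, wrapping the PSNR forward pass in `torch.no_grad()` also avoids holding the autograd graph.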
python3 test_video.py --file /Users/admin/Work/Python/SRGAN-PyTorch/C.mp4 --pretrained --view
[ WARNING ] Directory /Users/admin/Work/Python/SRGAN-PyTorch/videos already exists!
[ INFO ] TestEngine:
[ INFO ] API version .......... 0.4.0
[ INFO ] Build ................ 2021.07.09
[ INFO ] show fps:15.0
[processing video and saving/view result videos]: 0%| | 0/4486 [00:28<?, ?it/s]
Traceback (most recent call last):
File "test_video.py", line 144, in
main()
File "test_video.py", line 106, in main
compare_image = Resize(compare_image_size, Mode.BICUBIC)(raw_frame)
File "/Users/admin/.conda/envs/P38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/admin/.conda/envs/P38/lib/python3.8/site-packages/torchvision/transforms/transforms.py", line 297, in forward
return F.resize(img, self.size, self.interpolation, self.max_size, self.antialias)
File "/Users/admin/.conda/envs/P38/lib/python3.8/site-packages/torchvision/transforms/functional.py", line 401, in resize
return F_pil.resize(img, size=size, interpolation=pil_interpolation, max_size=max_size)
File "/Users/admin/.conda/envs/P38/lib/python3.8/site-packages/torchvision/transforms/functional_pil.py", line 209, in resize
raise TypeError('img should be PIL Image. Got {}'.format(type(img)))
TypeError: img should be PIL Image. Got <class 'numpy.ndarray'>
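The traceback says this torchvision Resize path expects a PIL Image, while OpenCV video frames are BGR numpy arrays. A hedged sketch of the conversion (the BGR-to-RGB flip assumes a cv2-decoded frame; the array here is a placeholder):

```python
# Sketch: convert a cv2-style BGR ndarray frame to a PIL Image before Resize.
import numpy as np
from PIL import Image

raw_frame = np.zeros((64, 64, 3), dtype=np.uint8)         # placeholder cv2 frame
rgb = raw_frame[:, :, ::-1].copy()                        # BGR -> RGB, contiguous copy
pil_frame = Image.fromarray(rgb)                          # now accepted by Resize
```

Newer torchvision transforms also accept tensors directly, which is another way around the TypeError.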
Thanks for the code.
When I use the CPU, inference time is lower than on the GPU while testing the model.
Could you help me understand why this is happening?
Code for reference; I just changed line 109:
with torch.no_grad():
c_time = time.time()
sr = model(lr)
print('Total time is %f sec.'%(time.time()- c_time))
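One likely explanation: CUDA kernels launch asynchronously, so the snippet above mostly measures launch overhead rather than the forward pass, and first-call CUDA initialization also inflates GPU timings. A hedged sketch of fairer timing with explicit synchronization:

```python
# Sketch: synchronize before and after the forward pass so time.time() brackets
# the actual GPU work (timed_forward is an illustrative helper, not repo code).
import time
import torch

def timed_forward(model, lr):
    if torch.cuda.is_available():
        torch.cuda.synchronize()          # drain pending GPU work first
    start = time.time()
    with torch.no_grad():
        sr = model(lr)
    if torch.cuda.is_available():
        torch.cuda.synchronize()          # wait until the forward pass finishes
    return sr, time.time() - start

sr, elapsed = timed_forward(torch.nn.Identity(), torch.rand(2, 3))
```

Running a few warm-up iterations before timing, and timing over many iterations, also helps; for very small inputs the CPU can legitimately win.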
Hi,
I have successfully run train.py, but no .pth weights file is generated at all. How may we generate the weights file?
Mike
[ WARNING ] Directory /glb/hou/pt.sgs/data/ml_ai_us/4d/usadh7/github_repos/SRGAN-PyTorch/benchmarks already exists!
[ INFO ] TestingEngine:
[ INFO ] Use GPU: 0 for testing.
[ INFO ] Using pre-trained model `srgan`.
[ INFO ] Load testing dataset.
/glb/hou/pt.sgs/data/ml_ai_us/4d/csoftware/miniconda3/envs/py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py:477: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 1, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
[ INFO ] Dataset information:
Path: /glb/hou/pt.sgs/data/ml_ai_us/4d/usadh7/github_repos/SRGAN-PyTorch/./data/test
Number of samples: 0
Number of batches: 0
Shuffle: False
Sampler: None
Workers: 8
0it [00:00, ?it/s]
Traceback (most recent call last):
File "test_benchmark.py", line 167, in
main()
File "test_benchmark.py", line 82, in main
main_worker(args.gpu, args)
File "test_benchmark.py", line 149, in main_worker
print(f"MSE {total_mse_value / len(dataloader):6.4f}\n"
ZeroDivisionError: float division by zero
Hello,
I've applied your model to images containing white and yellow sections. The SR results show green instead of yellow, but the strangest thing is that the white color is replaced by a pattern which I suspect is caused by stored NaN values or something like that. I attach two examples.
With the provided weights:
After fine tuning with images from the same dataset:
Any ideas how to fix it?
Hi, I have some incompatibility problems; would you mind sharing which Python version and tensorboard version you were using when you made the latest release (a few days ago)?
Thank you
Hello, I'm interested in the field of super-resolution, so I'm looking at your code.
My GPU is not good, so I want to get pre-trained weights.
Can I download the weight file?
Hi,
I have been training the SRGAN model from the repo and I have got the following losses :
I do not understand why D(SR) = 0 over the epochs while D(HR) = 1.
The dataset that I used is the ImageNet dataset provided in the readme. Is the generator unable to trick the discriminator?
Thank you very much
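A hedged suggestion (not this repo's code): D(HR) saturating at 1 while D(SR) sits at 0 means the discriminator has overpowered the generator, so the adversarial gradient vanishes. One-sided label smoothing, training the discriminator against 0.9 instead of 1.0 for real samples, is a common way to soften this:

```python
# Sketch of one-sided label smoothing for the discriminator's real-sample term
# (scores here are illustrative).
import torch
import torch.nn as nn

bce = nn.BCELoss()
d_real = torch.tensor([0.99, 0.98])            # D's scores on HR images
smoothed = torch.full_like(d_real, 0.9)        # real label 0.9 instead of 1.0
d_loss_real = bce(d_real, smoothed)            # keeps D from growing overconfident
```

Lowering the discriminator's learning rate relative to the generator's is another common remedy.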
ssh://[email protected]:22/data/tianhao.lu/software/anaconda3/envs/GAN/bin/python -u /home/tianhao.lu/.pycharm_helpers/pydev/pydevd.py --multiproc --qt-support=auto --client 0.0.0.0 --port 40739 --file /data/tianhao.lu/code/ProjectArchitecture/complete/SRGAN_demo.py
Connected to pydev debugger (build 211.6693.115) # generator parameters: 734219
0%| | 0/5 [00:06<?, ?it/s]
Traceback (most recent call last):
File "/home/tianhao.lu/.pycharm_helpers/pydev/pydevd.py", line 1483, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/tianhao.lu/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/data/tianhao.lu/code/ProjectArchitecture/complete/SRGAN_demo.py", line 98, in
g_loss.backward()
File "/data/tianhao.lu/software/anaconda3/envs/GAN/lib/python3.7/site-packages/torch/tensor.py", line 198, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/data/tianhao.lu/software/anaconda3/envs/GAN/lib/python3.7/site-packages/torch/autograd/__init__.py", line 100, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 1024, 1, 1]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
python-BaseException