pkuxmq / invertible-image-rescaling

[ECCV 2020, IJCV 2022] Invertible Image Rescaling

License: Apache License 2.0

Python 95.73% Shell 0.12% MATLAB 4.14%

invertible-image-rescaling's People

Contributors

cugtyt, henrymai, pkuxmq, xinntao, zestloveheart, zhengsx


invertible-image-rescaling's Issues

The formula in the paper doesn't seem to match the code?

It is great work!
The invertible operation is quite novel to me. I notice that in the paper, the formula is

h1^{l+1} = h1^l ⊙ exp(ψ(h2^l)) + φ(h2^l)
h2^{l+1} = h2^l ⊙ exp(ρ(h1^{l+1})) + η(h1^{l+1})

However, the code is

if not rev:
    y1 = x1 + self.F(x2)
    self.s = self.clamp * (torch.sigmoid(self.H(y1)) * 2 - 1)
    y2 = x2.mul(torch.exp(self.s)) + self.G(y1)

The exp operation in the h1 equation seems not to be used in the code?

Besides, could you please explain what the Jacobian value means? I couldn't get it :)
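
Not an official answer, but a hedged reading based only on the code quoted above: the first transform is implemented as the purely additive special case of the first equation (the exp(ψ(·)) factor is fixed to 1), while the second transform keeps the affine form:

y1 = x1 + F(x2)                          # additive: h1^{l+1} = h1^l + φ(h2^l)
s  = clamp * (2 * sigmoid(H(y1)) - 1)    # bounded log-scale, plays the role of ρ(h1^{l+1})
y2 = x2 * exp(s) + G(y1)                 # h2^{l+1} = h2^l ⊙ exp(ρ(h1^{l+1})) + η(h1^{l+1})

The Jacobian value of such a block is the log-determinant of the transform, which for this affine form reduces to the sum of the log-scale terms s.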

Why does the PSNR value decrease by several dBs when I introduce H.264 encoding losses?

In my application scenario, the RGB data captured by the camera is downscaled (x2) at the transmit phase, the downscaled (x2) image is encoded with H.264 and transmitted over the network, the H.264 stream is received by the receiver (such as a phone or PC), and then I upscale (x2) with a super-resolution algorithm on the receiver side. My goal is to save bandwidth for network transmission.

I want to use IRN as my downscaling and upscaling algorithm. After using IRN to obtain the LR image, I encode it with H.264, which introduces some encoding losses; as you know, H.264 is not a lossless encoder, so y (in the image below) changes slightly. However, after the H.264 stream is decoded at the receiver and fed into the IRN network for upscaling, the PSNR value decreases by several dBs.

Does this mean that during reconstruction of the SR image, IRN is sensitive to the LR input (y in the image below), and any minor change will cause it to fail? Is there a way to solve this problem?

[image]
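
Not part of the original report, but one way to quantify this sensitivity offline is to perturb the LR with a lossy round trip before the inverse pass. The sketch below uses a JPEG round trip via Pillow as a rough stand-in for H.264 (the quality value and helper name are illustrative assumptions, not part of the repo):

import io
import numpy as np
from PIL import Image

def lossy_roundtrip(lr_uint8, quality=35):
    # Encode/decode an LR image (H x W x 3, uint8) with JPEG as a rough
    # stand-in for a lossy video codec, returning the degraded copy.
    buf = io.BytesIO()
    Image.fromarray(lr_uint8).save(buf, format='JPEG', quality=quality)
    buf.seek(0)
    return np.array(Image.open(buf))

Comparing the PSNR of the reconstruction from the clean LR against the reconstruction from the degraded LR shows how much the inverse pass amplifies small LR perturbations.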

train IRN_x4??

Hello, when training IRN_x4, the low-resolution image looks normal, but why do small colored dots appear in the reconstructed large image? After adjusting lambda_fit_forw to 160, the colored dots still appear. Looking forward to an answer.

Questions about the models

Thanks for your great work.
I read your paper too, and I got several questions.
May I know whether z is only used in the loss function computation?
Do we need to provide an input LR y together with z during inference?
Does the downsampling process from x to y also embed some information in y, such that y contains features that favor the inverse process for upsampling, or is the process just similar to bicubic downsampling?
Thank you.

Training time?

How long did it take to train the model in your paper? Thank you.

Questions about network with jpeg

Thanks for your great work.
I read your paper with great interest.
And I have a question.
I wonder how to train 'IRN_model_CRM'. I guess you first train IRN and afterwards train CRM ('only_jpeg_reconstruction')... is that right?
I want to know exactly how to train this network.
Thank you :)

Training Details

Hi, thanks for your amazing work.
I want to reproduce your work using the config train_IRN_x2.yml, but the training is slow and will take 4 days with the default config. Is there any detail I should pay attention to? BTW, is the default config in the yml file the correct one to reproduce the paper results?

Question about image file size

The original image may be about 200 KB, but the GT image saved by the network reaches 400 KB, and the LR image is even larger than the original 200 KB. What causes this?

GPU memory requirement

Hi,

I tried to train the net again, but it failed with a memory issue.

RuntimeError: CUDA out of memory. Tried to allocate 42.00 MiB (GPU 0; 7.79 GiB total capacity; 6.11 GiB already allocated; 29.31 MiB free; 6.33 GiB reserved in total by PyTorch)

So I was wondering what GPU you used for training, or if there are some tips to optimize the GPU memory usage.

Thanks!

About Distribution Loss

Hi,
I have a doubt that has troubled me for a while. Since the network is invertible, why do you use JS divergence instead of a series of Jacobians to constrain the distribution? (P.S. I also found that you calculate the Jacobian in your code, but do not use it.)

About loss_ce?

Hi, I'm confused about why torch.sum(z**2) is used to calculate loss_ce. Besides, how can calculating loss_distr on Y x Z be transformed into calculating loss_ce only on the Z space?
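
Not an official answer, but one standard reading of that term: the cross-entropy between the distribution of z and a standard normal is, up to an additive constant, the expected squared norm of z, so minimizing torch.sum(z**2) (averaged over the batch) pushes z towards N(0, I):

CE(p(z), N(0, I)) = -E_{z~p}[ log N(z; 0, I) ] = (1/2) * E[ ||z||^2 ] + (d/2) * log(2*pi)

where d is the dimensionality of z; the constant does not affect optimization and the factor 1/2 can be absorbed into lambda_ce_forw. The y part of the joint distribution is presumably already handled by the separate guidance loss on the LR output, which would be why only the z term remains.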

Separate downscaling from upscaling

Hi,
Thanks for your amazing work.
I'm trying to separate the downscaling and upscaling processes in the test step.
For upscaling I've followed #4, renamed test.py to upscaling.py, and it works.
I'm trying to do the same thing for downscaling, but without success so far.
I've tried:

img_path = data['GT_path'][0]
img_name = osp.splitext(osp.basename(img_path))[0]

output_img = model.downscale(data['GT'].cuda()).detach()[0].float().cpu()
sr_img = util.tensor2img(output_img)

However, I obtained the following error:
Traceback (most recent call last):
File "codes/downscale.py", line 59, in
output_img = model.downscale(data['GT'].cuda()).detach()[0].float().cpu()
File "/content/m/codes/models/IRN_model.py", line 163, in downscale
LR_img = self.Quantization(self.forw_L)
AttributeError: 'IRNModel' object has no attribute 'forw_L'

Could you help me?
I want to generate the LR image with downscale.py and use it as input in upscaling.py.
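
Not from the repo, but one possible reading of the error: downscale() appears to expect self.forw_L to have been set by a prior forward pass. A standalone script can run the forward pass of the generator itself and quantize the first three channels; the helper below is a sketch based only on the code quoted in this thread (its name and the assumption that model.netG and model.Quantization can be called this way are guesses, not the maintainers' intended API):

import torch

def downscale_only(model, hr_tensor):
    # Hypothetical standalone downscaling: forward pass of the invertible
    # net, keep the first 3 channels as the LR image, then quantize.
    model.netG.eval()
    with torch.no_grad():
        lr = model.netG(x=hr_tensor)[:, :3, :, :]
        lr = model.Quantization(lr)
    return lr

# output_img = downscale_only(model, data['GT'].cuda()).detach()[0].float().cpu()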

SSIM calculation issue

def calculate_ssim(img1, img2):
    '''calculate SSIM
    the same outputs as MATLAB's
    img1, img2: [0, 255]
    '''
    if not img1.shape == img2.shape:
        raise ValueError('Input images must have the same dimensions.')
    if img1.ndim == 2:
        return ssim(img1, img2)
    elif img1.ndim == 3:
        if img1.shape[2] == 3:
            ssims = []
            for i in range(3):
                ssims.append(ssim(img1, img2))
            return np.array(ssims).mean()
        elif img1.shape[2] == 1:
            return ssim(np.squeeze(img1), np.squeeze(img2))
    else:
        raise ValueError('Wrong input image dimensions.')

Is line 180 here in the SSIM calculation written incorrectly, for the case of 3-channel images?
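
For reference, the per-channel loop presumably intends to index the i-th channel; a minimal sketch of that reading (assuming ssim here is the single-channel helper from the repo's metrics code) would be:

if img1.shape[2] == 3:
    ssims = []
    for i in range(3):
        # index the i-th channel instead of passing the full 3-channel arrays
        ssims.append(ssim(img1[..., i], img2[..., i]))
    return np.array(ssims).mean()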

Weird artifacts when testing on Kodak dataset

Thanks for sharing the great work!

I had a test of IRN_x2 on the Kodak dataset (http://r0k.us/graphics/kodak/), which is often adopted in many image-related tasks.
I see some weird green shadow-like artifacts in many of the images, both in the LR and the reconstructed ones (like the images shown below). Since Kodak is also a natural-image dataset, I don't expect it to have special characteristics or distributions compared to Set5, Set14, or DIV2K that would lead to these artifacts. Did I overlook something or make a mistake? Do you observe similar artifacts in your experiments?

kodim05_LR
kodim05
kodim23_LR
kodim23

8 times downscale

Hi, thanks for your work.
The results of IRN x4 and IRN x2 are amazing. I really want to know whether you have ever trained an IRN x8 network. Did it work well too? Do you have any suggestions for training an IRN x8 network?

question about image rescaling problem

Hi, for convenience I'll just use Chinese~ I'm a beginner. I'd like to ask: for the image rescaling task, if only a CNN is used for the encoding and decoding, how much can the PSNR improve compared with image SR? Is it possible to improve on this without using a flow-based method?

How to calculate the jacobian

Hi, thanks for your novel work, and I'm confused about how to calculate the last Jacobian. For example, in the class HaarDownsampling, the Jacobian is calculated as follows,

if not rev:
    self.elements = x.shape[1] * x.shape[2] * x.shape[3]
    self.last_jac = self.elements / 4 * np.log(1/16.)
    ...
else:
    self.elements = x.shape[1] * x.shape[2] * x.shape[3]
    self.last_jac = self.elements / 4 * np.log(16.)
And in the class InvBlockExp,
def jacobian(self, x, rev=False):
    if not rev:
        jac = torch.sum(self.s)
    else:
        jac = -torch.sum(self.s)
    return jac / x.shape[0]

I want to replace HaarDownsampling with another neural network; should the kernel follow some special design? And could you give me some tips on how to calculate the Jacobian of this new network?
Thanks in advance!
Sincerely
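
Not an authoritative derivation, but the constant is consistent with reading HaarDownsampling as a fixed block-wise linear map: each 2x2 spatial block of a channel (4 values) is transformed by a 4x4 Haar-type matrix whose determinant, with the normalization used in the code, has magnitude 1/16, and there are elements/4 such blocks in total, so

log|det J| = (elements / 4) * log(1/16)    (forward)
log|det J| = (elements / 4) * log(16)      (reverse)

For a replacement layer that applies some invertible matrix W to each block, the analogous quantity would be (number of blocks) * log|det W|; for a general learned layer the log-determinant is no longer a constant and usually has to be made tractable by construction, as in the coupling layers where it reduces to sum(self.s).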

How to input an LQ image and get an HQ image

Hello author, first of all thank you very much for uploading the code. I would like to ask how I should modify the code so that the input image and the output image have the same size, and how I can feed a low-quality image into the trained model to obtain a high-quality image. Thank you very much for your answer!

Hello, how can I make the fake_H image produced by test.py the same size as the original GT image?

Hello author, thank you for your work; my current work also uses your compress-then-restore approach. But when using your method I ran into a problem: when running test.py with the pretrained model IRN_x8.pth, the reconstructed fake_H image is not the same size as the original ground-truth image. How should I modify things to make them consistent? Thank you for your reply!

Why do we need to crop the border?

Hello, in test.py you use the scale to determine the crop border. My question is: why use the scale to determine the border? If I have an H*W image and its super-resolved counterpart, the image becomes (H-2)*(W-2) after cropping with scale = 1. Why do you calculate PSNR on the (H-2)*(W-2) image instead of the H*W one? I think you are right, but I just want to know the reason.

Performance on Bicubic-based LR images to HR images

Hi,

I am wondering how your model performs when going from bicubic-downsampled LR images to HR images on DIV2K during testing.

I noticed that for both training and testing, the input to the model is always an HR image. So does the code support LR images as input for testing?

Thank you in advance!

About Haar wavelet downsampling

The usual Haar wavelet transform is implemented with the pytorch_wavelets library, but this work instead sets up some haar_weights and implements it as a convolution. I don't quite understand why this works; could you explain it? Thanks.
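
Not the repo's exact code, but a minimal sketch of the idea (the weight values and names below are illustrative): the 2x2 Haar analysis filters are fixed kernels, so one level of the Haar transform is exactly a stride-2 convolution with those four kernels applied per input channel.

import torch
import torch.nn.functional as F

def haar_downsample(x):
    # One level of the 2x2 Haar transform as a fixed-weight, stride-2 conv.
    # x: (N, C, H, W) -> (N, 4*C, H/2, W/2) with LL, HL, LH, HH sub-bands per channel.
    c = x.shape[1]
    ll = torch.tensor([[1., 1.], [1., 1.]])
    hl = torch.tensor([[1., -1.], [1., -1.]])
    lh = torch.tensor([[1., 1.], [-1., -1.]])
    hh = torch.tensor([[1., -1.], [-1., 1.]])
    w = torch.stack([ll, hl, lh, hh]).unsqueeze(1) / 2.0  # (4, 1, 2, 2), orthonormal scaling
    w = w.repeat(c, 1, 1, 1)                              # one set of 4 filters per channel
    return F.conv2d(x, w.to(x), stride=2, groups=c)

Because the weights are fixed rather than learned, the layer is just an invertible change of basis, which is also what makes its Jacobian a constant.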

About Training Details

Hi, I ran the training code with the config 'train_IRN_x2.yml' on a single V100 GPU. I thought I should get the same PSNR results as listed in your paper, but I got lower PSNR results, about 0.2-0.3 dB less than yours. I wonder if I missed some training details or techniques. Could you give me some advice?

Why is it so slow? On one V100 GPU, 10 seconds per frame?

Hi @all, this is really good work. I tested it on DIV2K images, doing 4x SR from about 400x500 --> 1600x2000, but it seems really slow: on one V100 GPU it costs about 10 seconds per frame.

I wonder why it is so slow; which part of the method is the bottleneck? The INN?

Signal high-frequency information recovery

Thanks for your amazing work.
I tried to use the IRN model to restore the lost high-frequency information in a signal. The training results show that SR matches GT very well, but unfortunately the difference between the LR and LR_Ref is very large. My goal is to obtain the SR from LR_Ref; how should I adjust the model?

Link of pretrained model is broken

Hi,
I'm trying to perform inference with the pre-trained model.
However, I notice that the links are broken.
Can you share the pretrained model again?

The training is unstable

Thank you for your impressive work. But when I try to reproduce this network (I rewrote the code myself), sometimes the loss suddenly increases by a factor of 10. The structure of the network is correct because I can load the pretrained weights, so I think there may be some details I didn't notice. Could you tell me what measures you took during training to ensure stability?

Questions regarding the test function in IRN_model.py

Hi,
I would like to thank the authors for their great work in the SR field!
However, I do have a question regarding the following test() function

def test(self):
    Lshape = self.ref_L.shape
    input_dim = Lshape[1]
    self.input = self.real_H
    zshape = [Lshape[0], input_dim * (self.opt['scale']**2) - Lshape[1], Lshape[2], Lshape[3]]
    gaussian_scale = 1
    if self.test_opt and self.test_opt['gaussian_scale'] != None:
        gaussian_scale = self.test_opt['gaussian_scale']
    self.netG.eval()
    with torch.no_grad():
        self.forw_L = self.netG(x=self.input)[:, :3, :, :]
        self.forw_L = self.Quantization(self.forw_L)
        y_forw = torch.cat((self.forw_L, gaussian_scale * self.gaussian_batch(zshape)), dim=1)
        self.fake_H = self.netG(x=y_forw, rev=True)[:, :3, :, :]
    self.netG.train()

I'm not sure why self.real_H is used as the input at test time.
In reality, such a high-resolution ground truth is not available during testing.
For the more general case, i.e., when only the low-resolution input self.ref_L is provided, I replaced self.forw_L = self.netG(x=self.input)[:, :3, :, :] with self.forw_L = self.ref_L and ran the test.
Unfortunately, the result is very poor. :( Could you please help me with this?

Testing with a desktop-captured image, I got a blurry output.

Firstly, many thanks to the authors for their great job on image rescaling! I've trained the model in 'IRN 4x' mode and got a really good result on the 'DIV2K' validation dataset. But when I use a downscaled screenshot as the HR image, the output SR image is not good enough. I captured my desktop into an image of size 3840x2160, downscaled it with 'ffmpeg' to 1920x1080, and then used this 1920x1080 image as the test image; the resulting SR image was blurry. The characters (text) in this SR image are all blurred and much harder to recognize than those in the ground-truth 1920x1080 image. However, the animals in the image are not blurred. On the contrary, if I use the original 3840x2160 image for the test, the characters are not blurred.

The original 3840x2160 image(ground truth):
3840x2160-gt
The SR of Raw 3840x2160 image:
3840x2160-sr
The donwscaled 1920x1080 image(ground truth):
1920x1080-gt
The SR of donwscaled 1920x1080 image:
1920x1080-sr

I trained the model with the 'DIV2K' dataset and found that all images in this dataset are very sharp. So, should the test images also be very sharp? Is the 1080p (1920x1080) image not sharp enough?

About hyper-parameters lambda

Hi
In Equation 10 of the paper, lambda_1 and lambda_3 are 1, but lambda_2 is set to 16. Why is lambda_2 set so large?

what is train_IRN_x2_finetune.yml used for?

I didn't find the GAN-training config corresponding to train_IRN_x2; was it not uploaded? Also, what is train_IRN_x2_finetune.yml used for? Are the 2x results in the paper obtained with GAN training?

Is this training process correct?

@pkuxmq Hi, when I run "python train.py -opt options/train/train_IRN_x4.yml" (nothing changed in train_IRN_x4.yml), I get the validation log below:

20-07-28 20:02:43.941 - INFO: <epoch: 99, iter: 5,000> psnr: 2.9862e+01.
20-07-28 21:41:00.651 - INFO: <epoch:199, iter: 10,000> psnr: 3.0981e+01.
20-07-28 23:24:36.609 - INFO: <epoch:299, iter: 15,000> psnr: 3.1322e+01.
20-07-29 01:05:20.143 - INFO: <epoch:399, iter: 20,000> psnr: 3.1410e+01.
20-07-29 02:43:22.314 - INFO: <epoch:499, iter: 25,000> psnr: 3.1553e+01.
20-07-29 04:24:01.338 - INFO: <epoch:599, iter: 30,000> psnr: 3.1646e+01.
20-07-29 06:05:05.640 - INFO: <epoch:699, iter: 35,000> psnr: 3.1639e+01.
20-07-29 07:43:24.878 - INFO: <epoch:799, iter: 40,000> psnr: 3.1862e+01.
20-07-29 09:25:29.029 - INFO: <epoch:899, iter: 45,000> psnr: 3.1765e+01.
20-07-29 11:04:49.718 - INFO: <epoch:999, iter: 50,000> psnr: 3.1857e+01.
20-07-29 12:43:52.355 - INFO: <epoch:1099, iter: 55,000> psnr: 3.1875e+01.
20-07-29 14:26:15.819 - INFO: <epoch:1199, iter: 60,000> psnr: 3.2005e+01.
20-07-29 16:06:06.029 - INFO: <epoch:1299, iter: 65,000> psnr: 3.1971e+01.
20-07-29 17:43:06.634 - INFO: <epoch:1399, iter: 70,000> psnr: 3.1794e+01.
20-07-29 19:25:28.171 - INFO: <epoch:1499, iter: 75,000> psnr: 3.1746e+01.
20-07-29 21:08:17.830 - INFO: <epoch:1599, iter: 80,000> psnr: 3.1740e+01.
20-07-29 22:46:15.390 - INFO: <epoch:1699, iter: 85,000> psnr: 3.1913e+01.
20-07-30 00:26:51.227 - INFO: <epoch:1799, iter: 90,000> psnr: 3.1833e+01.
20-07-30 02:07:26.182 - INFO: <epoch:1899, iter: 95,000> psnr: 3.1923e+01.
20-07-30 03:45:40.531 - INFO: <epoch:1999, iter: 100,000> psnr: 3.1870e+01.
20-07-30 05:26:20.864 - INFO: <epoch:2099, iter: 105,000> psnr: 3.2184e+01.
20-07-30 07:09:07.773 - INFO: <epoch:2199, iter: 110,000> psnr: 3.2158e+01.
20-07-30 08:51:11.258 - INFO: <epoch:2299, iter: 115,000> psnr: 3.2238e+01.
20-07-30 10:34:42.669 - INFO: <epoch:2399, iter: 120,000> psnr: 3.2157e+01.
20-07-30 12:16:32.068 - INFO: <epoch:2499, iter: 125,000> psnr: 3.2186e+01.
20-07-30 14:05:44.933 - INFO: <epoch:2599, iter: 130,000> psnr: 3.2131e+01.
20-07-30 16:04:55.918 - INFO: <epoch:2699, iter: 135,000> psnr: 3.2209e+01.
20-07-30 18:02:46.877 - INFO: <epoch:2799, iter: 140,000> psnr: 3.2193e+01.
20-07-30 20:01:30.929 - INFO: <epoch:2899, iter: 145,000> psnr: 3.2184e+01.
20-07-30 22:00:33.235 - INFO: <epoch:2999, iter: 150,000> psnr: 3.2234e+01.
20-07-30 23:58:48.030 - INFO: <epoch:3099, iter: 155,000> psnr: 3.2176e+01.
20-07-31 01:58:16.605 - INFO: <epoch:3199, iter: 160,000> psnr: 3.2134e+01.
20-07-31 03:57:55.036 - INFO: <epoch:3299, iter: 165,000> psnr: 3.2190e+01.
20-07-31 05:57:29.178 - INFO: <epoch:3399, iter: 170,000> psnr: 3.2055e+01.
20-07-31 08:34:08.977 - INFO: <epoch:3499, iter: 175,000> psnr: 3.2192e+01.
20-07-31 10:34:28.891 - INFO: <epoch:3599, iter: 180,000> psnr: 3.2137e+01.
20-07-31 12:32:34.701 - INFO: <epoch:3699, iter: 185,000> psnr: 3.2155e+01.
20-07-31 14:30:21.658 - INFO: <epoch:3799, iter: 190,000> psnr: 3.2176e+01.
20-07-31 16:30:04.046 - INFO: <epoch:3899, iter: 195,000> psnr: 3.2132e+01.
20-07-31 18:27:40.744 - INFO: <epoch:3999, iter: 200,000> psnr: 3.2186e+01.
20-07-31 21:18:03.124 - INFO: <epoch:4099, iter: 205,000> psnr: 3.2356e+01.
20-07-31 23:18:25.399 - INFO: <epoch:4199, iter: 210,000> psnr: 3.2415e+01.
20-08-01 01:30:43.122 - INFO: <epoch:4299, iter: 215,000> psnr: 3.2399e+01.
20-08-01 03:29:23.416 - INFO: <epoch:4399, iter: 220,000> psnr: 3.2401e+01.
20-08-01 05:42:45.864 - INFO: <epoch:4499, iter: 225,000> psnr: 3.2354e+01.
20-08-01 07:41:49.838 - INFO: <epoch:4599, iter: 230,000> psnr: 3.2396e+01.
20-08-01 09:40:05.609 - INFO: <epoch:4699, iter: 235,000> psnr: 3.2428e+01.
20-08-01 11:35:44.124 - INFO: <epoch:4799, iter: 240,000> psnr: 3.2334e+01.
20-08-01 13:25:11.782 - INFO: <epoch:4899, iter: 245,000> psnr: 3.2112e+01.
20-08-01 14:40:20.978 - INFO: <epoch:4999, iter: 250,000> psnr: 3.2427e+01.
20-08-01 15:34:56.752 - INFO: <epoch:5099, iter: 255,000> psnr: 3.2367e+01.
20-08-01 16:31:03.010 - INFO: <epoch:5199, iter: 260,000> psnr: 3.2382e+01.
20-08-01 17:26:51.330 - INFO: <epoch:5299, iter: 265,000> psnr: 3.2443e+01.
20-08-01 18:30:16.764 - INFO: <epoch:5399, iter: 270,000> psnr: 3.2142e+01.
20-08-01 19:40:53.347 - INFO: <epoch:5499, iter: 275,000> psnr: 3.2391e+01.
20-08-01 20:35:14.469 - INFO: <epoch:5599, iter: 280,000> psnr: 3.2404e+01.
20-08-01 21:54:01.785 - INFO: <epoch:5699, iter: 285,000> psnr: 3.2371e+01.
20-08-01 23:15:16.187 - INFO: <epoch:5799, iter: 290,000> psnr: 3.2415e+01.
20-08-02 00:37:40.440 - INFO: <epoch:5899, iter: 295,000> psnr: 3.2385e+01.
20-08-02 01:35:00.178 - INFO: <epoch:5999, iter: 300,000> psnr: 3.2393e+01.
20-08-02 02:42:40.792 - INFO: <epoch:6099, iter: 305,000> psnr: 3.2570e+01.
20-08-02 04:02:45.011 - INFO: <epoch:6199, iter: 310,000> psnr: 3.2594e+01.
20-08-02 05:21:02.174 - INFO: <epoch:6299, iter: 315,000> psnr: 3.2575e+01.
20-08-02 06:41:36.365 - INFO: <epoch:6399, iter: 320,000> psnr: 3.2598e+01.
20-08-02 08:01:57.685 - INFO: <epoch:6499, iter: 325,000> psnr: 3.2608e+01.
20-08-02 09:21:45.858 - INFO: <epoch:6599, iter: 330,000> psnr: 3.2603e+01.
20-08-02 10:43:11.029 - INFO: <epoch:6699, iter: 335,000> psnr: 3.2582e+01.
20-08-02 12:04:38.429 - INFO: <epoch:6799, iter: 340,000> psnr: 3.2589e+01.
20-08-02 13:27:13.231 - INFO: <epoch:6899, iter: 345,000> psnr: 3.2610e+01.
20-08-02 14:48:32.692 - INFO: <epoch:6999, iter: 350,000> psnr: 3.2611e+01.
20-08-02 16:08:36.940 - INFO: <epoch:7099, iter: 355,000> psnr: 3.2601e+01.
20-08-02 17:29:17.291 - INFO: <epoch:7199, iter: 360,000> psnr: 3.2577e+01.
20-08-02 18:49:37.341 - INFO: <epoch:7299, iter: 365,000> psnr: 3.2599e+01.
20-08-02 20:10:23.640 - INFO: <epoch:7399, iter: 370,000> psnr: 3.2640e+01.
20-08-02 21:36:42.104 - INFO: <epoch:7499, iter: 375,000> psnr: 3.2617e+01.
20-08-02 23:10:36.419 - INFO: <epoch:7599, iter: 380,000> psnr: 3.2585e+01.
20-08-03 00:50:44.942 - INFO: <epoch:7699, iter: 385,000> psnr: 3.2614e+01.
20-08-03 02:30:23.124 - INFO: <epoch:7799, iter: 390,000> psnr: 3.2633e+01.
20-08-03 04:08:43.651 - INFO: <epoch:7899, iter: 395,000> psnr: 3.2592e+01.
20-08-03 05:48:41.351 - INFO: <epoch:7999, iter: 400,000> psnr: 3.2608e+01.
20-08-03 07:28:49.655 - INFO: <epoch:8099, iter: 405,000> psnr: 3.2696e+01.
20-08-03 09:08:23.989 - INFO: <epoch:8199, iter: 410,000> psnr: 3.2598e+01.
20-08-03 10:50:37.364 - INFO: <epoch:8299, iter: 415,000> psnr: 3.2601e+01.
20-08-03 12:32:00.880 - INFO: <epoch:8399, iter: 420,000> psnr: 3.2716e+01.
20-08-03 14:11:19.063 - INFO: <epoch:8499, iter: 425,000> psnr: 3.2717e+01.

I see that in your paper (Table 1), the PSNR on the DIV2K validation set is 35.07 (500k iterations), but the PSNR in the validation log above is around 32.7 (425k iterations). Could something be wrong?

Environment:
pytorch 1.2.0
cuda 10.2
4x Tesla v100 (32GB)

About Gaussian Meaning

Hi,
Thanks for your good work.
But I noticed the following code:

def gaussian_batch(self, dims):
    return torch.randn(tuple(dims)).to(self.device)

def test(self):
    Lshape = self.ref_L.shape
    input_dim = Lshape[1]
    self.input = self.real_H
    zshape = [Lshape[0], input_dim * (self.opt['scale']**2) - Lshape[1], Lshape[2], Lshape[3]]
    gaussian_scale = 1
    if self.test_opt and self.test_opt['gaussian_scale'] != None:
        gaussian_scale = self.test_opt['gaussian_scale']
    self.netG.eval()
    with torch.no_grad():
        self.forw_L = self.netG(x=self.input)[:, :3, :, :]
        self.forw_L = self.Quantization(self.forw_L)
        y_forw = torch.cat((self.forw_L, gaussian_scale * self.gaussian_batch(zshape)), dim=1)
        self.fake_H = self.netG(x=y_forw, rev=True)[:, :3, :, :]
    self.netG.train()

My question is:
What is the meaning of using LR + Gaussian noise to generate the HR image? Is it better than using zeros, or is it just for testing?
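
Not an answer from the authors, but the comparison the question suggests is easy to run with the code above; a minimal sketch (reusing the names from test(), with the zero-latent variant being the only change) might look like:

with torch.no_grad():
    # reconstruction with a freshly sampled latent z ~ N(0, I)
    z_sampled = gaussian_scale * self.gaussian_batch(zshape)
    fake_H_sampled = self.netG(x=torch.cat((self.forw_L, z_sampled), dim=1), rev=True)[:, :3, :, :]

    # reconstruction with z = 0 (the mean of the assumed Gaussian)
    z_zero = torch.zeros(zshape, device=self.device)
    fake_H_zero = self.netG(x=torch.cat((self.forw_L, z_zero), dim=1), rev=True)[:, :3, :, :]

Comparing the PSNR of the two against the ground truth shows how much the choice of z matters in practice.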

How does IRN deal with the loss caused by video compression artifacts?

@pkuxmq Hi, thanks for your nice work!
I have a question. I think the whole pipeline of your model does not consider the loss from video compression artifacts. In a real-world application, the pipeline may look like this: HR images -> downsample with IRN -> video encode -> video decode -> upsample with IRN. But the pipeline in your model is: HR images -> downsample with IRN -> upsample with IRN. How does this model deal with the loss caused by the missing "video encode + video decode" part (i.e. video compression artifacts)? Or do you have any ideas on dealing with such loss?
Thanks in advance!

Problem about loss_ce

Thanks for your novel work! But I'm a little confused about loss_ce: l_forw_ce = self.train_opt['lambda_ce_forw'] * torch.sum(z**2) / z.shape[0]. I want to know why this loss function can constrain z to follow a Gaussian distribution. Looking forward to your reply!

Why use GT size 144?

Hello, thank you for your research!

I wonder why you use GT size 144?

The LQ image size is 36 and the HQ image size is 144; that loses a lot of information...

Running on Windows 10 getting RuntimeError

Hi, I got an error when trying to run this on Windows 10. I already found the solution, but I don't know how to implement it in your code.

https://pytorch.org/docs/stable/notes/windows.html#multiprocessing-error-without-if-clause-protection

I get the error:

...AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
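
Not from the repo, but the fix linked above usually amounts to guarding the training entry point so that Windows' spawn-based multiprocessing can safely re-import the module; a minimal sketch of what that might look like in train.py (assuming its top-level code can be wrapped in a main() function):

def main():
    # ... existing argument parsing, dataloader setup and training loop ...
    pass

if __name__ == '__main__':
    # Required on Windows: DataLoader worker processes are started with
    # "spawn", which re-imports this module, so the entry point must be
    # protected to avoid recursively launching new processes.
    main()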

Some questions about the experiment?

Have you tried feeding a bicubic-downsampled LR image into the model to generate the HR image and comparing it with the original image during the test phase, instead of using the LR image generated by the model and feeding it back into the model to generate the HR result?

model output is the same shape as input...

Hi, thanks for releasing the repo.
I want to use it to train an image enhancement network. My inputs are low-quality images and the outputs are high-quality ones, but they are the same size.
How do I modify the network to fit my task? I see your net always downscales, and when I change the scale factor in the opt yml to 1, it returns an empty net...

Why do we need the HR reconstruction loss?

Dear authors,

I appreciate your novel work, that's really impressive.

I was wondering why you need the reconstruction loss given that the invertible net is able to recover the HR images. I am also curious whether you have an ablation study on this loss, which would better show its contribution.

Thanks for your time.

Doesn't work with LQ mode

Hi @pkuxmq
Thanks for your amazing work.
I am trying to run the test code in LQ mode, i.e. only low-resolution images are provided. It seems that this mode hasn't been tested; for example, LQ_dataset is not defined. Could you please update your code?

Thanks,
Lei

about training convergence speed

I have used your code for replication, and the losses and validation results for the first tens of thousands of iterations are as follows. Is this convergence normal? (training on DIV2K, validation on Set5)

validation:
23-09-11 12:14:06.207 - INFO: <epoch: 99, iter:   5,000> psnr: 3.3753e+01.
23-09-11 13:09:13.902 - INFO: <epoch:199, iter:  10,000> psnr: 3.4853e+01.
23-09-11 14:15:48.923 - INFO: <epoch:299, iter:  15,000> psnr: 3.6143e+01.
23-09-11 15:26:26.413 - INFO: <epoch:399, iter:  20,000> psnr: 3.6398e+01.
23-09-11 16:31:23.065 - INFO: <epoch:499, iter:  25,000> psnr: 3.6537e+01.

training loss:
........
........
........
23-09-11 15:47:40.054 - INFO: <epoch:429, iter:  21,500, lr:2.000e-04> l_forw_fit: 9.0363e+00 l_forw_ce: 7.0932e-01 l_back_rec: 5.3173e+02 
23-09-11 15:49:06.322 - INFO: <epoch:431, iter:  21,600, lr:2.000e-04> l_forw_fit: 1.1463e+01 l_forw_ce: 2.1733e+00 l_back_rec: 4.9017e+02 
23-09-11 15:50:32.917 - INFO: <epoch:433, iter:  21,700, lr:2.000e-04> l_forw_fit: 2.0284e+01 l_forw_ce: 9.2368e-01 l_back_rec: 6.0587e+02 
23-09-11 15:51:58.845 - INFO: <epoch:435, iter:  21,800, lr:2.000e-04> l_forw_fit: 1.0459e+01 l_forw_ce: 1.3879e+00 l_back_rec: 4.6550e+02 
23-09-11 15:53:25.101 - INFO: <epoch:437, iter:  21,900, lr:2.000e-04> l_forw_fit: 2.2528e+01 l_forw_ce: 1.8308e+00 l_back_rec: 7.1213e+02 
23-09-11 15:54:51.292 - INFO: <epoch:439, iter:  22,000, lr:2.000e-04> l_forw_fit: 1.9186e+01 l_forw_ce: 1.1946e+00 l_back_rec: 7.1015e+02 
23-09-11 15:56:17.214 - INFO: <epoch:441, iter:  22,100, lr:2.000e-04> l_forw_fit: 1.4027e+01 l_forw_ce: 1.2276e+00 l_back_rec: 5.7817e+02 
23-09-11 15:57:39.386 - INFO: <epoch:443, iter:  22,200, lr:2.000e-04> l_forw_fit: 1.2232e+01 l_forw_ce: 2.5495e+00 l_back_rec: 5.3580e+02 
23-09-11 15:58:53.436 - INFO: <epoch:445, iter:  22,300, lr:2.000e-04> l_forw_fit: 2.6561e+01 l_forw_ce: 5.2295e+00 l_back_rec: 5.9502e+02 
23-09-11 16:00:08.231 - INFO: <epoch:447, iter:  22,400, lr:2.000e-04> l_forw_fit: 1.1626e+01 l_forw_ce: 6.5276e-01 l_back_rec: 5.2617e+02 
23-09-11 16:01:20.911 - INFO: <epoch:449, iter:  22,500, lr:2.000e-04> l_forw_fit: 3.6728e+01 l_forw_ce: 5.2806e+00 l_back_rec: 7.5190e+02 
23-09-11 16:02:35.347 - INFO: <epoch:451, iter:  22,600, lr:2.000e-04> l_forw_fit: 9.0049e+00 l_forw_ce: 2.3042e+00 l_back_rec: 4.5918e+02 
23-09-11 16:03:49.039 - INFO: <epoch:453, iter:  22,700, lr:2.000e-04> l_forw_fit: 7.4976e+00 l_forw_ce: 1.0088e+00 l_back_rec: 4.1533e+02 
23-09-11 16:05:00.367 - INFO: <epoch:455, iter:  22,800, lr:2.000e-04> l_forw_fit: 1.1508e+01 l_forw_ce: 1.4149e+00 l_back_rec: 5.1731e+02 
23-09-11 16:06:09.921 - INFO: <epoch:457, iter:  22,900, lr:2.000e-04> l_forw_fit: 1.1961e+01 l_forw_ce: 4.2065e+00 l_back_rec: 5.1242e+02 
23-09-11 16:07:24.249 - INFO: <epoch:459, iter:  23,000, lr:2.000e-04> l_forw_fit: 1.2395e+01 l_forw_ce: 4.6911e+00 l_back_rec: 5.2116e+02 
23-09-11 16:08:38.850 - INFO: <epoch:461, iter:  23,100, lr:2.000e-04> l_forw_fit: 2.2654e+01 l_forw_ce: 6.0414e+00 l_back_rec: 6.2512e+02 
23-09-11 16:09:54.473 - INFO: <epoch:463, iter:  23,200, lr:2.000e-04> l_forw_fit: 1.6905e+01 l_forw_ce: 2.5700e+00 l_back_rec: 5.8382e+02 
23-09-11 16:11:12.018 - INFO: <epoch:465, iter:  23,300, lr:2.000e-04> l_forw_fit: 9.7896e+00 l_forw_ce: 2.4648e+00 l_back_rec: 4.5673e+02 
23-09-11 16:12:26.675 - INFO: <epoch:467, iter:  23,400, lr:2.000e-04> l_forw_fit: 2.8638e+01 l_forw_ce: 8.3577e-01 l_back_rec: 7.3611e+02 
23-09-11 16:13:43.065 - INFO: <epoch:469, iter:  23,500, lr:2.000e-04> l_forw_fit: 1.9655e+01 l_forw_ce: 5.6923e+00 l_back_rec: 6.4559e+02 
23-09-11 16:14:59.031 - INFO: <epoch:471, iter:  23,600, lr:2.000e-04> l_forw_fit: 1.7584e+01 l_forw_ce: 2.0713e+00 l_back_rec: 5.8603e+02 
23-09-11 16:16:12.780 - INFO: <epoch:473, iter:  23,700, lr:2.000e-04> l_forw_fit: 2.1075e+01 l_forw_ce: 4.7084e+00 l_back_rec: 6.8721e+02 
23-09-11 16:17:26.316 - INFO: <epoch:475, iter:  23,800, lr:2.000e-04> l_forw_fit: 8.9918e+00 l_forw_ce: 1.4385e+00 l_back_rec: 4.2553e+02 
23-09-11 16:18:39.170 - INFO: <epoch:477, iter:  23,900, lr:2.000e-04> l_forw_fit: 1.8673e+01 l_forw_ce: 2.5744e+00 l_back_rec: 6.7018e+02 
23-09-11 16:19:49.879 - INFO: <epoch:479, iter:  24,000, lr:2.000e-04> l_forw_fit: 8.5514e+00 l_forw_ce: 9.6314e-01 l_back_rec: 5.1129e+02 
23-09-11 16:21:02.620 - INFO: <epoch:481, iter:  24,100, lr:2.000e-04> l_forw_fit: 1.8538e+01 l_forw_ce: 6.1894e+00 l_back_rec: 7.0081e+02 
23-09-11 16:22:12.007 - INFO: <epoch:483, iter:  24,200, lr:2.000e-04> l_forw_fit: 1.4954e+01 l_forw_ce: 4.5996e+00 l_back_rec: 6.5795e+02 
23-09-11 16:23:19.003 - INFO: <epoch:485, iter:  24,300, lr:2.000e-04> l_forw_fit: 3.1993e+01 l_forw_ce: 2.5733e+00 l_back_rec: 6.6076e+02 
23-09-11 16:24:29.014 - INFO: <epoch:487, iter:  24,400, lr:2.000e-04> l_forw_fit: 1.6874e+01 l_forw_ce: 1.7691e+00 l_back_rec: 6.2266e+02 
23-09-11 16:25:37.750 - INFO: <epoch:489, iter:  24,500, lr:2.000e-04> l_forw_fit: 2.6306e+01 l_forw_ce: 3.2614e+00 l_back_rec: 8.1468e+02 
23-09-11 16:26:47.252 - INFO: <epoch:491, iter:  24,600, lr:2.000e-04> l_forw_fit: 2.2743e+01 l_forw_ce: 2.8730e+00 l_back_rec: 6.6198e+02 
23-09-11 16:27:55.281 - INFO: <epoch:493, iter:  24,700, lr:2.000e-04> l_forw_fit: 2.2022e+01 l_forw_ce: 5.1522e+03 l_back_rec: 5.8593e+02 
23-09-11 16:29:04.216 - INFO: <epoch:495, iter:  24,800, lr:2.000e-04> l_forw_fit: 2.1125e+01 l_forw_ce: 3.3240e+00 l_back_rec: 6.7275e+02 
23-09-11 16:30:12.514 - INFO: <epoch:497, iter:  24,900, lr:2.000e-04> l_forw_fit: 1.5764e+01 l_forw_ce: 3.6478e+00 l_back_rec: 6.5580e+02 
23-09-11 16:31:22.468 - INFO: <epoch:499, iter:  25,000, lr:2.000e-04> l_forw_fit: 2.1957e+01 l_forw_ce: 1.2598e+00 l_back_rec: 7.8794e+02 
23-09-11 16:31:23.065 - INFO: # Validation # PSNR: 3.6537e+01.
23-09-11 16:31:23.066 - INFO: Saving models and training states.
23-09-11 16:32:37.636 - INFO: <epoch:501, iter:  25,100, lr:2.000e-04> l_forw_fit: 1.5851e+01 l_forw_ce: 1.7870e+01 l_back_rec: 5.9740e+02 
23-09-11 16:33:47.202 - INFO: <epoch:503, iter:  25,200, lr:2.000e-04> l_forw_fit: 1.5613e+01 l_forw_ce: 1.1140e+00 l_back_rec: 5.6695e+02 
23-09-11 16:34:56.355 - INFO: <epoch:505, iter:  25,300, lr:2.000e-04> l_forw_fit: 1.6678e+01 l_forw_ce: 6.1646e+00 l_back_rec: 6.0877e+02 
23-09-11 16:36:08.002 - INFO: <epoch:507, iter:  25,400, lr:2.000e-04> l_forw_fit: 1.0693e+01 l_forw_ce: 5.5600e+00 l_back_rec: 5.3143e+02 
23-09-11 16:37:17.161 - INFO: <epoch:509, iter:  25,500, lr:2.000e-04> l_forw_fit: 1.8000e+01 l_forw_ce: 1.3387e+01 l_back_rec: 7.1700e+02 
23-09-11 16:38:27.538 - INFO: <epoch:511, iter:  25,600, lr:2.000e-04> l_forw_fit: 5.8641e+01 l_forw_ce: 7.0964e+00 l_back_rec: 9.0976e+02 
23-09-11 16:39:35.403 - INFO: <epoch:513, iter:  25,700, lr:2.000e-04> l_forw_fit: 1.5490e+01 l_forw_ce: 9.3917e-01 l_back_rec: 5.7312e+02 
23-09-11 16:40:45.120 - INFO: <epoch:515, iter:  25,800, lr:2.000e-04> l_forw_fit: 8.2399e+00 l_forw_ce: 6.9752e-01 l_back_rec: 4.5810e+02 
23-09-11 16:41:53.473 - INFO: <epoch:517, iter:  25,900, lr:2.000e-04> l_forw_fit: 1.1270e+01 l_forw_ce: 1.1326e+00 l_back_rec: 5.1678e+02 
23-09-11 16:43:02.744 - INFO: <epoch:519, iter:  26,000, lr:2.000e-04> l_forw_fit: 9.7699e+00 l_forw_ce: 7.1436e-01 l_back_rec: 4.6954e+02 
23-09-11 16:44:11.669 - INFO: <epoch:521, iter:  26,100, lr:2.000e-04> l_forw_fit: 2.2217e+01 l_forw_ce: 2.3241e+00 l_back_rec: 6.2240e+02 
23-09-11 16:45:20.023 - INFO: <epoch:523, iter:  26,200, lr:2.000e-04> l_forw_fit: 1.6940e+01 l_forw_ce: 1.0100e+00 l_back_rec: 6.1535e+02 
23-09-11 16:46:27.886 - INFO: <epoch:525, iter:  26,300, lr:2.000e-04> l_forw_fit: 2.4195e+01 l_forw_ce: 2.7161e+00 l_back_rec: 6.4196e+02 

The final PSNR is expected to reach 39.7; why does the convergence seem not very good at the beginning?

about test execute time

Hi, I have some questions about the average execution time of test.py:

#in x2
Set5:0.3974s
Set14:0.5433s
BSDS100:0.3029s
DIV2K:5.8029s
#in x4
Set5:0.1534s
Set14:0.4018s
BSDS100:0.2725s
DIV2K:4.8987s

If the x4 execution time on Set5 is 0.1534 s, should the x2 execution time on Set5 be double that of x4?
Thank you for replying!
