xinntao / real-esrgan

Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

esrgan pytorch real-esrgan super-resolution image-restoration denoise jpeg-compression anime

real-esrgan's Issues

Can you add the 2x model to the portable version with bin and param?

As the title says, can you add it, or do you have any documentation on how to convert it?
I would love to try it out because, for some reason, I cannot install BasicSR on my system.

Compatibility with old ESRGAN PTH models

Hello, is it possible to convert the old 64 MB ESRGAN PTH models to use with this fast inference solution?

Training time

Hello and thank you for your great work!

You trained ESRNET for 1,000K and ESRGAN for 400K iterations. I was wondering how long training took in your case with 4 V100 GPUs.
I am training with 2 RTX 3090 GPUs, and training ESRNET alone shows an estimated 10 days 😕. My training dataset also includes FFHQ (i.e. DIV2K+Flickr2K+FFHQ). Maybe training on FFHQ improves results on human faces.
Thank you.

license

great work, can you please add a license file

ImportError: cannot import name 'circular_lowpass_kernel' from 'basicsr.data.degradations'

I am trying to train on my own dataset.
encounter this error:
ImportError: cannot import name 'circular_lowpass_kernel' from 'basicsr.data.degradations'

I am using basicsr 1.3.3.3
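A quick way to check whether the installed BasicSR is new enough. Another report in this list shows pip resolving `basicsr>=1.3.3.10`, so 1.3.3.3 may simply predate `circular_lowpass_kernel`. The sketch compares dotted versions numerically, because a plain string comparison would rank 1.3.3.3 above 1.3.3.10. `parse_version`, `meets_minimum` and `installed_basicsr_version` are hypothetical helpers, the 1.3.3.10 floor is taken from that pip log rather than from a changelog, and purely numeric version strings are assumed.

```python
from importlib import metadata


def parse_version(text):
    """Turn '1.3.3.10' into (1, 3, 3, 10) so the parts compare as numbers."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum="1.3.3.10"):
    """Return True when the installed dotted version is >= the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_basicsr_version():
    """Return the installed basicsr version string, or None if it is absent."""
    try:
        return metadata.version("basicsr")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_basicsr_version()
    if version is None:
        print("basicsr is not installed")
    else:
        print(version, "OK" if meets_minimum(version) else "too old, upgrade")
```

If the check reports an old version, upgrading the package is the first thing to try.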

The usage of _dequeue_and_enqueue function in RealESRNetModel

Hi,
I read the code several times, but cannot figure out what the role of the _dequeue_and_enqueue function in RealESRNetModel is. This function is only used in feed_data(), which just puts self.lq and self.gt into self.queue_lq and self.queue_gt. But I cannot find any other code that uses self.queue_lq and self.queue_gt. I would appreciate it if someone could explain this.
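One plausible reading, which this thread does not confirm, is that the queue acts as a pool of training pairs: once it is full, each call shuffles the pool, hands back a previously stored batch, and stores the fresh batch in its place, so the pairs used in one training step come from many different degradation draws. That would also explain why nothing else reads the queue, since the function overwrites the current batch itself. The sketch below illustrates that idea with plain Python lists; `TrainingPairPool` and its method are hypothetical names, not the repository's code.

```python
import random


class TrainingPairPool:
    """Toy pool that swaps each incoming batch for a shuffled older one."""

    def __init__(self, queue_size, batch_size, seed=0):
        assert queue_size % batch_size == 0, "queue size must be a multiple of batch size"
        self.queue_size = queue_size
        self.batch_size = batch_size
        self.items = []
        self.rng = random.Random(seed)

    def dequeue_and_enqueue(self, batch):
        """Return the batch to train on for this step."""
        if len(self.items) < self.queue_size:
            # Pool still filling up: store the batch and train on it unchanged.
            self.items.extend(batch)
            return batch
        # Pool full: shuffle, take an old batch out, put the new one in its slot.
        self.rng.shuffle(self.items)
        out = self.items[:self.batch_size]
        self.items[:self.batch_size] = batch
        return out


if __name__ == "__main__":
    pool = TrainingPairPool(queue_size=4, batch_size=2)
    for step, batch in enumerate([["a", "b"], ["c", "d"], ["e", "f"]]):
        print(step, pool.dequeue_and_enqueue(batch))
```

In this toy version the first two calls return their input unchanged, and the third returns two of the four stored items.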

'VCOMP140D.DLL' is required

A straightforward execution of the 'Windows executable files' fails because 'VCOMP140D.DLL' is required.

For me it was necessary to install 'Visual Studio 2019' and load the 'MSVC v142' package to solve this problem.
Putting the 'VCOMP140D.DLL' into the 'Windows executable files' would help other users.

The results look good, thanks for releasing.

Why not use pixel shuffle instead of interpolate(mode='nearest') for upsampling?

Any special reason?

Error related to File/Video Import Widget

MessageError Traceback (most recent call last)
in ()
14
15 # upload images
---> 16 uploaded = files.upload()
17 for filename in uploaded.keys():
18 dst_path = os.path.join(upload_folder, filename)

2 frames
/usr/local/lib/python3.7/dist-packages/google/colab/_message.py in read_reply_from_input(message_id, timeout_sec)
104 reply.get('colab_msg_id') == message_id):
105 if 'error' in reply:
--> 106 raise MessageError(reply['error'])
107 return reply.get('data', None)
108

MessageError: TypeError: Cannot read property '_uploadFiles' of undefined

Question: how did you build the Windows binary from Python?

I have also been trying to ship a single Windows binary that includes all dependencies so people don't have to deal with Windows install issues. Any tips on how you build the binaries for different environments? Much appreciated.

train on one gpu, windows machine

I am trying to train on one gpu windows machine:

# general settings

name: train_RealESRNetx4plus_1000k_B12G4_fromESRGAN
model_type: RealESRNetModel
scale: 4
num_gpu: 1 #4
manual_seed: 0

but when I run:
python -m torch.distributed.launch --nproc_per_node=4 --master_port=4321 realesrgan/train.py -opt options/train_realesrnet_x4plus.yml --launcher pytorch --auto_resume

I get the following error:
line 625, in _new_process_group_helper
raise RuntimeError("Distributed package doesn't have NCCL "
RuntimeError: Distributed package doesn't have NCCL built in
.
.
...
line 245, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

D:\SR_v01\Real-ESRGAN\train.py FAILED

Root Cause:
[0]:
time: 2021-08-18_10:28:49
rank: 0 (local_rank: 0)
exitcode: 1 (pid: 13540)
error_file: <N/A>
msg: "Process failed with exitcode 1"

Other Failures:
[1]:
time: 2021-08-18_10:28:49
rank: 1 (local_rank: 1)
exitcode: 1 (pid: 13676)
error_file: <N/A>
msg: "Process failed with exitcode 1"
[2]:
time: 2021-08-18_10:28:49
rank: 2 (local_rank: 2)
exitcode: 1 (pid: 14228)
error_file: <N/A>
msg: "Process failed with exitcode 1"
[3]:
time: 2021-08-18_10:28:49
rank: 3 (local_rank: 3)
exitcode: 1 (pid: 13656)
error_file: <N/A>
msg: "Process failed with exitcode 1"

I tried it in WSL too. same error raised.
could you give us a guide?
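The log above shows four ranks being launched (`--nproc_per_node=4`) even though the config sets `num_gpu: 1`, and the failure is the NCCL backend, which PyTorch builds for Windows do not include. A minimal sketch of choosing the command line from the GPU count follows. `build_train_command` is a hypothetical helper, and it assumes that train.py runs as a single non-distributed process when no `--launcher` is given.

```python
def build_train_command(num_gpu, opt_path, master_port=4321):
    """Return the argv list for training on the given number of GPUs."""
    script = ["realesrgan/train.py", "-opt", opt_path]
    if num_gpu <= 1:
        # Single GPU: plain Python process, no distributed launcher, no NCCL.
        return ["python"] + script + ["--auto_resume"]
    # Multi GPU: one process per GPU through the PyTorch launcher.
    launcher = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpu}",
        f"--master_port={master_port}",
    ]
    return launcher + script + ["--launcher", "pytorch", "--auto_resume"]


if __name__ == "__main__":
    print(" ".join(build_train_command(1, "options/train_realesrnet_x4plus.yml")))
    print(" ".join(build_train_command(4, "options/train_realesrnet_x4plus.yml")))
```

With one GPU this yields `python realesrgan/train.py -opt options/train_realesrnet_x4plus.yml --auto_resume`.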

realesrgan-ncnn-vulkan: Argument -m does not work unless path contains the string "models"

There's a strange unnecessary limitation in the portable executable:

The -m parameter will not work if you specify a model path that does not contain the word "models". Could this be changed?

Idea: flexible upsampling at various scales (would this work?)

Would it make sense to train one model to upsample at various scaling factors (2/4/8) instead of training a separate model for each scale?

My idea is to split the network into 3 parts: (1) an image encoder/mapping network, (2) a 2x feature upscaler, and (3) a network that converts the features to an RGB image of shape (n, n, 3). During training, the LR image goes through the image encoder, then through the feature upscaler nScale/2 times, and finally through the toRGB network to produce the y_pred SR image.

Would this idea work?
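A small arithmetic sketch of the proposal. If every pass through the feature upscaler doubles the resolution, reaching a scale of 2, 4 or 8 takes log2(scale) passes (1, 2 and 3). That matches the "nScale/2" count in the post for scales 2 and 4 but not for 8. The function names are hypothetical, and power-of-two scales are assumed.

```python
def upscaler_passes(scale):
    """Number of 2x passes needed to reach a power-of-two scale."""
    if not isinstance(scale, int) or scale < 1 or scale & (scale - 1):
        raise ValueError("scale must be a positive power of two")
    return scale.bit_length() - 1


def output_size(height, width, scale):
    """Spatial size after applying the 2x upscaler the required number of times."""
    for _ in range(upscaler_passes(scale)):
        height, width = height * 2, width * 2
    return height, width


if __name__ == "__main__":
    for s in (2, 4, 8):
        print(s, upscaler_passes(s), output_size(32, 48, s))
```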

huggingface spaces port

Hi I am looking to port this to huggingface spaces and getting this error

Collecting basicsr>=1.3.3.10
Downloading basicsr-1.3.3.10.tar.gz (131 kB)
ERROR: Command errored out with exit status 1:
command: /usr/local/bin/python -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-17oc2oxy/basicsr_ed5642abb2de4ee78654eca90f718aa2/setup.py'"'"'; file='"'"'/tmp/pip-install-17oc2oxy/basicsr_ed5642abb2de4ee78654eca90f718aa2/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-i7jyt39j
cwd: /tmp/pip-install-17oc2oxy/basicsr_ed5642abb2de4ee78654eca90f718aa2/
Complete output (5 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-17oc2oxy/basicsr_ed5642abb2de4ee78654eca90f718aa2/setup.py", line 8, in <module>
import torch
ModuleNotFoundError: No module named 'torch'

here is the requirements.txt file

https://huggingface.co/spaces/akhaliq/Real-ESRGAN/blob/main/requirements.txt

args.scale for more than 4x in one pass?

I'm using the compiled windows exe, since it plugged into my workflow nicely. Thanks for providing that option. If the python version is better I can migrate to that.

Is there a technical limitation to scaling things more than 4 in a single pass? Right now I'm running images 4x and then 4x again, to get a 16x size increase. It would be nice to do other non-power of two sizes as well, like a 3x or 5x, etc.

-s or --scale doesn't seem to work with anything but 4. I receive "invalid scale argument" on other numbers like 1,2,8, etc.
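Until other factors are supported, an arbitrary target can be planned as repeated 4x passes followed by one ordinary resize. The sketch below works out the number of passes and the final resize factor. `plan_upscale` is a hypothetical helper, the final resize is assumed to be done with any regular image resizer, and overshooting and then shrinking is only one possible design choice.

```python
def plan_upscale(target_scale, native_scale=4):
    """Return (passes, resize_factor) to reach target_scale with a fixed-scale model.

    The model is applied `passes` times, then the result is resized by
    `resize_factor` (<= 1.0) to land exactly on the target.
    """
    if target_scale < 1 or native_scale < 2:
        raise ValueError("target_scale must be >= 1 and native_scale >= 2")
    passes = 0
    reached = 1
    while reached < target_scale:
        reached *= native_scale
        passes += 1
    return passes, target_scale / reached


if __name__ == "__main__":
    for target in (3, 4, 5, 16):
        print(target, plan_upscale(target))
```

For 16x this reproduces the two-pass workflow described above, and for 3x it gives one 4x pass followed by a resize to 75%.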

I made a GUI with Video/Photo/GIF support

Hello again, I am the developer of Waifu2x-Extension-GUI, and I just added RealESRGAN-NCNN-Vulkan support to my GUI in v3.80.01-beta update

So now we can use RealESRGAN-NCNN-Vulkan to process Video/Photo/GIF on a Windows PC with a user friendly GUI

Could you consider adding my project page to the README? Thanks.

https://github.com/AaronFeng753/Waifu2x-Extension-GUI

[Training Issue] ModuleNotFoundError: No module named 'realesrgan'

Hi,
When I follow the training guide to train the model, an error occurred as follows:

Traceback (most recent call last):
  File "realesrgan/train.py", line 5, in <module>
    import realesrgan.archs
ModuleNotFoundError: No module named 'realesrgan'

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/SDD/zzm/env/torch1.7/lib/python3.6/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/SDD/zzm/env/torch1.7/lib/python3.6/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/SDD/zzm/env/torch1.7/bin/python', '-u', 'realesrgan/train.py', '--local_rank=3', '-opt', 'options/train_realesrnet_x2plus.yml', '--launcher', 'pytorch', '--debug']' returned non-zero exit status 1.

Command executed:

python -m torch.distributed.launch --nproc_per_node=4 --master_port=4321 realesrgan/train.py -opt options/train_realesrnet_x2plus.yml --launcher pytorch --debug

What should I do to avoid this error?
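A likely cause, offered as an assumption: when a script is run by path, Python puts the script's own directory (here `realesrgan/`) on `sys.path`, not the repository root, so `import realesrgan` only works if the package is installed (for example in development mode) or the root is added by hand. The sketch shows the path arithmetic. `repo_root_for` and `ensure_importable` are hypothetical helpers.

```python
import os
import sys


def repo_root_for(script_path):
    """Return the parent of the directory that contains the script."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.dirname(script_dir)


def ensure_importable(script_path):
    """Put the repository root on sys.path so the package next to it can be imported."""
    root = repo_root_for(script_path)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


if __name__ == "__main__":
    # For realesrgan/train.py this prints the directory that contains realesrgan/.
    print(ensure_importable(os.path.join("realesrgan", "train.py")))
```

Setting `PYTHONPATH` to the repository root before launching has the same effect without touching the code.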

Multi-gpu inference

How can it be done? I tried using the torch.distributed.launch module as described for training, but the inference script doesn't seem to support local_rank. Is it a trivial fix?

Results in MSU Video Super Resolution Benchmark

Hello,
MSU Video Group has recently launched Video Super Resolution Benchmark and evaluated this algorithm.

Real-ESRGAN takes 8th place by subjective score, 17th place by PSNR, and 10th by our metric ERQAv1.0. Real-ESRNet takes 18th place by subjective score, 9th place by PSNR, and 18th by our metric ERQAv1.0. ESRGAN takes 7th place by subjective score, 8th place by PSNR, and 3rd by our metric ERQAv1.0. You can see the results here.

If you have any other VSR method you want to see in our benchmark, we kindly invite you to participate.
You can submit it for the benchmark, following the submission steps.

weird error message, if the output directory doesn't exist

Hi, thanks for this cool tool.
When running the Windows executable against an existing input directory (but without an existing output directory), it fails with the message: invalid outputpath extension type. It's not a big deal to create the output directory manually beforehand, but the first time it took a while to understand what the actual problem was.
It would be very helpful either to change that message to something more meaningful or (better) to create the output directory automatically.

Inconsistent default value of pre_pad argument

The default value for pre_pad argument is inconsistent in

Real-ESRGAN/inference_realesrgan.py

Line 23 in fb3ff05

 parser.add_argument('--pre_pad', type=int, default=0, help='Pre padding size at each border') 

and

Real-ESRGAN/realesrgan/utils.py

Line 16 in fb3ff05

 def __init__(self, scale, model_path, tile=0, tile_pad=10, pre_pad=10, half=False): 

Which one is preferred over the other? Thank you.
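Which default wins depends on the call path. The sketch below reuses the two signatures quoted above with a stand-in class and assumes the script forwards `args.pre_pad` to the constructor, in which case the command-line default of 0 applies and the constructor's 10 only matters when the class is used directly. `Upsampler` is a hypothetical stand-in, not the class from realesrgan/utils.py.

```python
import argparse


class Upsampler:
    """Stand-in that only records pre_pad; the signature mirrors the quoted one."""

    def __init__(self, scale, model_path, tile=0, tile_pad=10, pre_pad=10, half=False):
        self.pre_pad = pre_pad


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('--pre_pad', type=int, default=0, help='Pre padding size at each border')
    return parser


# No --pre_pad on the command line, so argparse supplies its own default (0).
args = build_parser().parse_args([])
via_script = Upsampler(scale=4, model_path='model.pth', pre_pad=args.pre_pad)

# Direct use of the class falls back to the constructor default (10).
direct = Upsampler(scale=4, model_path='model.pth')

print(via_script.pre_pad, direct.pre_pad)
```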

Error regarding circular_lowpass_kernel

Hi, I just updated my git repo to the latest codebase. When I run the inference code, I get the following errors:
python inference_realesrgan.py --scale 4 --model_path experiments/pretrained_models/RealESRGAN_x4plus.pth --input inputs
"
Traceback (most recent call last):
File "C:\msys64\home\manju\tools\esrgan\Real-ESRGAN\inference_realesrgan.py", line 6, in <module>
from realesrgan import RealESRGANer
File "C:\msys64\home\manju\tools\esrgan\Real-ESRGAN\realesrgan\__init__.py", line 3, in <module>
from .data import *
File "C:\msys64\home\manju\tools\esrgan\Real-ESRGAN\realesrgan\data\__init__.py", line 10, in <module>
dataset_modules = [importlib.import_module(f'realesrgan.data.{file_name}') for file_name in dataset_filenames]
File "C:\msys64\home\manju\tools\esrgan\Real-ESRGAN\realesrgan\data\__init__.py", line 10, in <listcomp>
dataset_modules = [importlib.import_module(f'realesrgan.data.{file_name}') for file_name in dataset_filenames]
File "C:\Python39\lib\importlib\__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "C:\msys64\home\manju\tools\esrgan\Real-ESRGAN\realesrgan\data\realesrgan_dataset.py", line 9, in <module>
from basicsr.data.degradations import circular_lowpass_kernel, random_mixed_kernels
ImportError: cannot import name 'circular_lowpass_kernel' from 'basicsr.data.degradations' (C:\Python39\lib\site-packages\basicsr\data\degradations.py)
"

Thanks,
Manju

Use of args.scale

Hello! At the inference_realesrgan.py there are the following lines of code:

        if args.scale == 2:
            mod_scale = 2
        elif args.scale == 1:
            mod_scale = 4

To me, this looks like a misprint: it should be elif args.scale == 4, since the default value of args.scale is 4 and the image is zero-padded up to a multiple of mod_scale. Is it actually a misprint, or did I misunderstand the code?
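A sketch of what such a mod_scale padding step computes, which may help in judging the question. One possible explanation, stated as an assumption rather than the authors' answer, is that the x2 and x1 networks first rearrange the input by a factor of 2 and 4 respectively, so the input height and width must be divisible by that factor, while the x4 network needs no padding. `mod_scale_for` and `mod_pad_amounts` are hypothetical helpers.

```python
def mod_scale_for(scale):
    """Mirror the quoted branch: 2 -> 2, 1 -> 4, anything else -> no padding."""
    if scale == 2:
        return 2
    if scale == 1:
        return 4
    return None


def mod_pad_amounts(height, width, mod_scale):
    """Pixels to add at the bottom and right so both sides divide by mod_scale."""
    pad_h = (mod_scale - height % mod_scale) % mod_scale
    pad_w = (mod_scale - width % mod_scale) % mod_scale
    return pad_h, pad_w


if __name__ == "__main__":
    print(mod_pad_amounts(5, 7, mod_scale_for(1)))  # pads 5x7 up to 8x8
    print(mod_scale_for(4))                         # None: default scale is not padded
```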

how to use RealESRGAN_x2plus.pth for inference

great job, thx

Will GFPGAN be supported in the portable NCNN builds?

Or does it rely on Pytorch?

Training Problem about GAN

Hi,
Great work! I am reproducing it. When training the Real-ESRGAN model, the generated pictures are even blurrier than those generated by Real-ESRNet. Have you ever encountered this problem?

Can I develop a GUI for this?

Hi, I am the developer of Waifu2x-Extension-GUI

I wonder if I can add Real-Esrgan support to my GUI, since your project does not have a license, I think I should ask for your permission first

Waifu2x-Extension-GUI:
https://github.com/AaronFeng753/Waifu2x-Extension-GUI

possibility for a x2 model?

Thank you for your amazing work!

Would you be able to provide us with an x2 model? Or is it currently not possible?

About Dataset Preparation

In Training.md

‘For the DF2K dataset, we use a multi-scale strategy, i.e., we downsample HR images to obtain several Ground-Truth images with different scales.

We then crop DF2K images into sub-images for faster IO and processing.

You need to prepare a txt file containing the image paths. The following are some examples in meta_info_DF2Kmultiscale+OST_sub.txt (As different users may have different sub-images partitions, this file is not suitable for your purpose and you need to prepare your own txt file):’

Are 'the downsample HR images' DIV2K_train_LR_bicubic/X2, DIV2K_train_LR_bicubic/X3, DIV2K_train_LR_bicubic/X4?
And how to prepare the sub-images and the txt file?

Thanks!

A question of sinc filter

Hello author, I have a question. SR is a process of denoising and deblurring, so it makes sense to add blur and noise to the HR image to synthesize the LR input. But ringing and overshoot artifacts usually appear in the SR result image (HR, not LR). Is applying the sinc filter to the LR image really effective?

result looks strange on text area

The result looks strange in text areas (as if a new font were created). I know text is hard to restore. Do you know of any project that uses a reference image (such as the middle image) to get a good result?

Is it suitable for faces? (using high-order degradation modeling process in GFPGAN )

error while using --face_enhance in cpu mode

Error The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 1
If you encounter CUDA out of memory, try to set --tile with a smaller number.

Inference with OpenVino

Hi! Is it possible to convert model to the OpenVino format?

denoise level

Is there a way to control the denoise level in the inference, or in the pre-built binaries?

colab

Hi can you please add a google colab for inference

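Question about lightweight models

Hello, it is exciting to see another achievement in the super-resolution field.
Have there been any attempts to run Real-ESRGAN on mobile devices, including model compression or lightweight variants? Thanks.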

Improvement idea

I think it would be very useful to add more discriminators. From the tests I have done with conditional GANs, it seems that having several discriminators with different receptive field sizes increases the support of the distributions as well as the stability and quality of the images (maybe it can remove artifacts like the ones in Figure 11 of the paper). It would also be interesting to try a discriminator with the MLP Mixer architecture (https://github.com/jaketae/mlp-mixer, https://github.com/sradc/patchless_mlp_mixer), since the paper shows that the "way" this type of architecture selects features is different from what a CNN does, so maybe it helps the generator avoid creating certain types of artifacts.

Also, I'm not sure, but does the ESRGAN architecture have multiple noise inputs? If not, I also think it would be useful to add noise to each res-block, since more noise usually helps.

Bug when training RealESRNet

Hi, Xintao:
When I train RealESRNet with the recommended script, the debug phase runs correctly. But when I start training with --auto_resume, it hangs at l_total.backward() without any error log. Have you seen this before?

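Some questions about training

I read your train.md file
and am confused about why Real-ESRNet is trained first and Real-ESRGAN afterwards.
For Real-ESRGAN, the discriminator is changed to a U-Net with spectral normalization (SN), the optimizer is changed to Adam, and the original L1 loss is extended to L1 loss + a texture loss on VGG19 layers 1 to 5 with weights {0.1, 0.1, 1, 1, 1} + a GAN loss.
Why not train Real-ESRGAN directly?
Is it because training Real-ESRGAN directly oscillates heavily and is hard to converge, so Real-ESRNet is trained first to obtain a converged model, which is then trained a second time to adjust the parameters?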

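Using the sinc module to produce ringing and overshoot artifacts (blind SR)

I want to see what the images produced by the sinc module finally look like, so I extracted the sinc module as follows.
I pulled the sinc module out and removed the author's comments; the current code is below. However, the result image shown by cv2.imshow is blank white. Could you help me see which step I got wrong?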

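import random

import cv2
import numpy as np
import torch
import torch.nn.functional as F
from scipy import special


def circular_lowpass_kernel(cutoff, kernel_size, pad_to=0):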
    assert kernel_size % 2 == 1, 'Kernel size must be an odd number.'
    kernel = np.fromfunction(
        lambda x, y: cutoff * special.j1(cutoff * np.sqrt(
            (x - (kernel_size - 1) / 2) ** 2 + (y - (kernel_size - 1) / 2) ** 2)) / (2 * np.pi * np.sqrt(
            (x - (kernel_size - 1) / 2) ** 2 + (y - (kernel_size - 1) / 2) ** 2)), [kernel_size, kernel_size])
    kernel[(kernel_size - 1) // 2, (kernel_size - 1) // 2] = cutoff ** 2 / (4 * np.pi)
    kernel = kernel / np.sum(kernel)
    if pad_to > kernel_size:
        pad_size = (pad_to - kernel_size) // 2
        kernel = np.pad(kernel, ((pad_size, pad_size), (pad_size, pad_size)))
    return kernel


def filter2D(img, kernel):
    img = torch.FloatTensor(img)
    kernel = torch.FloatTensor(kernel)
    k = kernel.shape[-1]
    assert k % 2 == 1, 'Kernel size must be an odd number.'
    h, w, c = img.shape
    img = img.view(1, c, h, w)
    if k % 2 == 1:
        img = F.pad(img, (k // 2, k // 2, k // 2, k // 2), mode='reflect')  # padding
    else:
        raise ValueError('Wrong kernel size')
    ph, pw = img.size()[-2:]
    device = torch.device('cuda')
    if kernel.size(0) == 1:
        # apply the same kernel to all batch images
        img = img.view(1 * c, 1, ph, pw).to(device)
        kernel = kernel.view(1, 1, k, k).to(device)
        return F.conv2d(img, kernel, padding=0).view(h, w, c).cpu().numpy().clip(0, 255)
    else:
        img = img.view(1, c, ph, pw).to(device)
        kernel = kernel.view(1, 1, k, k).repeat(1, c, 1, 1).view(c, 1, k, k).to(device)
        return F.conv2d(img, kernel, groups=c).view(h, w, c).cpu().numpy().clip(0, 255)
img = cv2.imread("./1111.jpg")
sinc_kernel_size = random.choice([7, 9, 11, 13, 15, 17, 19, 21])
omega_c = np.random.uniform(np.pi / 3, np.pi)
sinc_kernel = circular_lowpass_kernel(omega_c, sinc_kernel_size, pad_to=21)
cv2.imshow("sinc_kernel", sinc_kernel)
cv2.imshow("img0", img)
res_img = filter2D(img, sinc_kernel)
cv2.imshow("res_img", res_img)
cv2.waitKey(0)
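Two things in the script above could plausibly produce the white window, offered here as assumptions since the thread has no answer. First, `cv2.imshow` treats floating-point arrays as being in the range [0, 1], so a float image holding values from 0 to 255 renders almost entirely white. Second, `view(1, c, h, w)` on an (H, W, C) array reinterprets the memory instead of moving the channel axis. A minimal NumPy-only sketch of the conversions follows; all function names are hypothetical.

```python
import numpy as np


def hwc_to_chw(img):
    """Move the channel axis of an OpenCV-style (H, W, C) array to the front."""
    return np.transpose(img, (2, 0, 1))


def chw_to_hwc(img):
    """Inverse of hwc_to_chw: (C, H, W) back to (H, W, C)."""
    return np.transpose(img, (1, 2, 0))


def to_uint8(img):
    """Round and clip a float image in 0..255 so imshow displays it as intended."""
    return np.clip(np.rint(img), 0, 255).astype(np.uint8)


def kernel_for_display(kernel):
    """Stretch a small kernel (possibly with negative lobes) to [0, 1] for viewing."""
    shifted = kernel - kernel.min()
    peak = shifted.max()
    return shifted / peak if peak > 0 else shifted


if __name__ == "__main__":
    filtered = np.full((2, 3, 3), 200.7, dtype=np.float32)  # float image in 0..255
    print(to_uint8(filtered).dtype, to_uint8(filtered).max())
```

Passing `to_uint8(res_img)` to `cv2.imshow` instead of the raw float array would be the first thing to try.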

Will this repo work for satellite images as well?

Hi , thankyou for the repo,
I wanted to ask if this repo will super-resolve satellite images too? I mean the ones that are downloaded directly from like say a sentinel2 satellite and then converted to png? let's say I got a cloud free image 10m resolution RGB image

Inference on cpu

Hi! Really great project, but inference on cpu is very slow :(
Any ideas how to speed it up? (probably deploy on openvino or something else)

Question about the code

  # the HR (ground-truth) data is passed to the GAN discriminator here
  # real
  real_d_pred = self.net_d(gan_gt)
  l_d_real = self.cri_gan(real_d_pred, True, is_disc=True)
  loss_dict['l_d_real'] = l_d_real
  loss_dict['out_d_real'] = torch.mean(real_d_pred.detach())
  l_d_real.backward()
  # fake
  fake_d_pred = self.net_d(self.output.detach().clone())  # clone for pt1.9
  l_d_fake = self.cri_gan(fake_d_pred, False, is_disc=True)
  loss_dict['l_d_fake'] = l_d_fake
  loss_dict['out_d_fake'] = torch.mean(fake_d_pred.detach())
  l_d_fake.backward()
  self.optimizer_d.step()

Tao, I don't understand the loss code in the second stage of training RealESRGAN. Could you find time to explain it? What do these two parts do?
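For orientation, the two halves are the usual discriminator update: the "real" half pushes the discriminator output up on ground-truth images, and the "fake" half pushes it down on generator outputs, with `detach()` keeping those gradients out of the generator. Both backward calls accumulate gradients before the single optimizer step. The sketch below shows the per-sample loss values, assuming a vanilla GAN loss (binary cross-entropy on logits). The configuration in use may select a different loss type, and the function names are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def d_loss_real(logit):
    """BCE-with-logits against target 1: small when the logit is large."""
    return softplus(-logit)


def d_loss_fake(logit):
    """BCE-with-logits against target 0: small when the logit is very negative."""
    return softplus(logit)


if __name__ == "__main__":
    for logit in (-5.0, 0.0, 5.0):
        print(logit, round(d_loss_real(logit), 4), round(d_loss_fake(logit), 4))
```

An undecided discriminator (logit 0) gives log 2 for both terms, which is a handy reference value when reading the `l_d_real` and `l_d_fake` logs.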

The possibility of using small models

Thank you for sharing this great work! Is it ok to train with a smaller model? The model parameters are 16697987, is it ok to use half of the parameters?
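The quoted figure can be reproduced by counting convolution weights, which also shows where a smaller model could save parameters. The sketch assumes the usual RRDBNet layout for the x4 model (one input convolution, `num_block` RRDBs of three dense blocks with five 3x3 convolutions each, then four 64-to-64 convolutions and one output convolution) together with the `network_g` settings printed in the training log elsewhere in this list. The layer layout is my reading of the architecture, not something stated in this thread.

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus biases of a k x k convolution."""
    return c_in * c_out * k * k + c_out


def dense_block_params(num_feat, num_grow_ch):
    """Five convs; each of the first four sees all previously grown channels."""
    total = 0
    for i in range(4):
        total += conv_params(num_feat + i * num_grow_ch, num_grow_ch)
    total += conv_params(num_feat + 4 * num_grow_ch, num_feat)
    return total


def rrdbnet_params(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32):
    """Parameter count of the assumed x4 RRDBNet layout."""
    head = conv_params(num_in_ch, num_feat)
    body = num_block * 3 * dense_block_params(num_feat, num_grow_ch)
    # Assumed: trunk conv, two upsampling convs and one high-resolution conv.
    middle = 4 * conv_params(num_feat, num_feat)
    tail = conv_params(num_feat, num_out_ch)
    return head + body + middle + tail


if __name__ == "__main__":
    print(rrdbnet_params())              # 16697987
    print(rrdbnet_params(num_block=11))  # 8064899
```

Nearly all parameters sit in the RRDB trunk, so under these assumptions dropping from 23 to 11 blocks gives roughly half the size. Whether quality holds up at that size is a separate question that only training can answer.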

x2 doesn't work at all

But anyway, this is the best upscaling tool I have ever seen.

Training ESRNet

Hi! I'm trying to train the ESRNET in order to train ESRGAN on my own data. I followed the instructions in your Training.md file, but I'm not seeing anything being printed when training. Is it possible that I have an underlying error? Or is there no output shown during training? Thanks!

---EDIT---:
This is what my terminal looks like now:
/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/distributed/launch.py:163: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn(
The module torch.distributed.launch is deprecated and going to be removed in future.Migrate to torch.distributed.run

Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.

WARNING:torch.distributed.run:--use_env is deprecated and will be removed in future releases.
Please read local_rank from os.environ('LOCAL_RANK') instead.
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
entrypoint : train.py
min_nodes : 1
max_nodes : 1
nproc_per_node : 4
run_id : none
rdzv_backend : static
rdzv_endpoint : 127.0.0.1:4321
rdzv_configs : {'rank': 0, 'timeout': 900}
max_restarts : 3
monitor_interval : 5
log_dir : None
metrics_cfg : {}

INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_unoq8ytd/none_dtwpfqb8
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:52: FutureWarning: This is an experimental API and will be changed in future.
warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=0
master_addr=127.0.0.1
master_port=4321
group_rank=0
group_world_size=1
local_ranks=[0, 1, 2, 3]
role_ranks=[0, 1, 2, 3]
global_ranks=[0, 1, 2, 3]
role_world_sizes=[4, 4, 4, 4]
global_world_sizes=[4, 4, 4, 4]

INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_unoq8ytd/none_dtwpfqb8/attempt_0/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_unoq8ytd/none_dtwpfqb8/attempt_0/1/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_unoq8ytd/none_dtwpfqb8/attempt_0/2/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_unoq8ytd/none_dtwpfqb8/attempt_0/3/error.json
Path already exists. Rename it to /home/rkurinch/Real-ESRGAN/experiments/train_RealESRNetx4plus_1000k_B12G4_fromESRGAN_archived_20210806_162441
Path already exists. Rename it to /home/rkurinch/Real-ESRGAN/tb_logger/train_RealESRNetx4plus_1000k_B12G4_fromESRGAN_archived_20210806_162441
2021-08-06 16:24:41,364 INFO:
[BasicSR ASCII-art banner]

Version Information:
BasicSR: 1.3.3.10
PyTorch: 1.9.0
TorchVision: 0.10.0
2021-08-06 16:24:41,364 INFO:
name: train_RealESRNetx4plus_1000k_B12G4_fromESRGAN
model_type: RealESRNetModel
scale: 4
num_gpu: 4
manual_seed: 0
gt_usm: True
resize_prob: [0.2, 0.7, 0.1]
resize_range: [0.15, 1.5]
gaussian_noise_prob: 0.5
noise_range: [1, 30]
poisson_scale_range: [0.05, 3]
gray_noise_prob: 0.4
jpeg_range: [30, 95]
second_blur_prob: 0.8
resize_prob2: [0.3, 0.4, 0.3]
resize_range2: [0.3, 1.2]
gaussian_noise_prob2: 0.5
noise_range2: [1, 25]
poisson_scale_range2: [0.05, 2.5]
gray_noise_prob2: 0.4
jpeg_range2: [30, 95]
gt_size: 256
queue_size: 180
datasets:[
train:[
name: SOLAR+OST
type: RealESRGANDataset
dataroot_gt: data/solar/HR
meta_info: data_names.txt
io_backend:[
type: disk
]
blur_kernel_size: 21
kernel_list: ['iso', 'aniso', 'generalized_iso', 'generalized_aniso', 'plateau_iso', 'plateau_aniso']
kernel_prob: [0.45, 0.25, 0.12, 0.03, 0.12, 0.03]
sinc_prob: 0.1
blur_sigma: [0.2, 3]
betag_range: [0.5, 4]
betap_range: [1, 2]
blur_kernel_size2: 21
kernel_list2: ['iso', 'aniso', 'generalized_iso', 'generalized_aniso', 'plateau_iso', 'plateau_aniso']
kernel_prob2: [0.45, 0.25, 0.12, 0.03, 0.12, 0.03]
sinc_prob2: 0.1
blur_sigma2: [0.2, 1.5]
betag_range2: [0.5, 4]
betap_range2: [1, 2]
final_sinc_prob: 0.8
gt_size: 256
use_hflip: True
use_rot: False
use_shuffle: True
num_worker_per_gpu: 5
batch_size_per_gpu: 12
dataset_enlarge_ratio: 1
prefetch_mode: None
phase: train
scale: 4
]
]
network_g:[
type: RRDBNet
num_in_ch: 3
num_out_ch: 3
num_feat: 64
num_block: 23
num_grow_ch: 32
]
path:[
pretrain_network_g: experiments/pretrained_models/ESRGAN_SRx4_DF2KOST_official-ff704c30.pth
param_key_g: params_ema
strict_load_g: True
resume_state: None
experiments_root: /home/rkurinch/Real-ESRGAN/experiments/train_RealESRNetx4plus_1000k_B12G4_fromESRGAN
models: /home/rkurinch/Real-ESRGAN/experiments/train_RealESRNetx4plus_1000k_B12G4_fromESRGAN/models
training_states: /home/rkurinch/Real-ESRGAN/experiments/train_RealESRNetx4plus_1000k_B12G4_fromESRGAN/training_states
log: /home/rkurinch/Real-ESRGAN/experiments/train_RealESRNetx4plus_1000k_B12G4_fromESRGAN
visualization: /home/rkurinch/Real-ESRGAN/experiments/train_RealESRNetx4plus_1000k_B12G4_fromESRGAN/visualization
]
train:[
ema_decay: 0.999
optim_g:[
type: Adam
lr: 0.0002
weight_decay: 0
betas: [0.9, 0.99]
]
scheduler:[
type: MultiStepLR
milestones: [1000000]
gamma: 0.5
]
total_iter: 1000000
warmup_iter: -1
pixel_opt:[
type: L1Loss
loss_weight: 1.0
reduction: mean
]
]
logger:[
print_freq: 100
save_checkpoint_freq: 5000.0
use_tb_logger: True
wandb:[
project: None
resume_id: None
]
]
dist_params:[
backend: nccl
port: 29500
]
is_train: True
auto_resume: True
dist: True
rank: 0
world_size: 4
root_path: /home/rkurinch/Real-ESRGAN

[libprotobuf FATAL google/protobuf/stubs/common.cc:87] This program was compiled against version 3.9.2 of the Protocol Buffer runtime library, which is not compatible with the installed version (3.17.3). Contact the program author for an update. If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library. (Version verification failed in "bazel-out/k8-opt/bin/tensorflow/core/framework/tensor_shape.pb.cc".)
terminate called after throwing an instance of 'google::protobuf::FatalException'
what(): This program was compiled against version 3.9.2 of the Protocol Buffer runtime library, which is not compatible with the installed version (3.17.3). Contact the program author for an update. If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library. (Version verification failed in "bazel-out/k8-opt/bin/tensorflow/core/framework/tensor_shape.pb.cc".)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 0 (pid: 10289) of binary: /anaconda/envs/py38_default/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=1
master_addr=127.0.0.1
master_port=4321
group_rank=0
group_world_size=1
local_ranks=[0, 1, 2, 3]
role_ranks=[0, 1, 2, 3]
global_ranks=[0, 1, 2, 3]
role_world_sizes=[4, 4, 4, 4]
global_world_sizes=[4, 4, 4, 4]

Segmentation fault

Ubuntu throws 'Segmentation fault' during formal training.
There is no such bug when --debug is set.

After searching for a solution, I found that
moving 'from torch.utils.tensorboard import SummaryWriter' in basicsr/utils/logger.py from line 82 to the top of the file
works.

System information:

ubuntu18.04
py 3.7
others for project default

Doesn't use the CPU on M1 Mac

I've been using the program for a while, and it works well. However, I found that it barely uses the CPU on the M1: the GPU is fully loaded while the CPU sits nearly idle.
Does it use the CPU on other platforms, or only the GPU? I've heard that Apple has built-in machine learning units in the M1 chip; maybe a future update could make use of them.

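About evaluation metrics

I have not read the paper closely.

The end of the paper shows some images with good super-resolution results. How does this method perform on public datasets, for example on metrics such as PSNR, SSIM and LPIPS?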
