xinntao / real-esrgan
Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
License: BSD 3-Clause "New" or "Revised" License
As the title says, can you add it, or do you have any documentation on how to convert it?
I would love to try it out, because for some reason I cannot install BasicSR on my system.
Hello, is it possible to convert the old 64 MB ESRGAN PTH models to use with this fast inference solution?
Hello and thank you for your great work!
You trained ESRNet for 1,000K and ESRGAN for 400K iterations. I was wondering how long training took in your case with 4 V100 GPUs?
I am training with 2 RTX 3090 GPUs, and training only ESRNet shows an estimated 10 days 😕. My training dataset also includes the FFHQ dataset (i.e., DIV2K+Flickr2K+FFHQ). Maybe training on FFHQ improves results on human faces.
Thank you.
Great work! Can you please add a license file?
I am trying to train on my own dataset and encounter this error:
ImportError: cannot import name 'circular_lowpass_kernel' from 'basicsr.data.degradations'
I am using basicsr 1.3.3.3
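A likely cause is simply an old basicsr: the pip log on this page shows the package requiring basicsr>=1.3.3.10, and 1.3.3.3 is older than that. Comparing the version strings directly gets this wrong, because '1.3.3.3' sorts after '1.3.3.10' as text. The sketch below compares the components numerically; the helper names are hypothetical, and the minimum version is taken from that log rather than from an authoritative source.

```python
import importlib.metadata


def parse_version(v: str) -> tuple:
    """Turn '1.3.3.10' into (1, 3, 3, 10) so components compare as integers.

    Assumes a purely numeric version string (no suffixes such as '.dev0').
    """
    return tuple(int(part) for part in v.split('.'))


def meets_minimum(installed: str, minimum: str = '1.3.3.10') -> bool:
    # Assumption: the minimum comes from the basicsr>=1.3.3.10 requirement seen in a pip log;
    # check the project's requirements.txt for the authoritative value.
    return parse_version(installed) >= parse_version(minimum)


if __name__ == '__main__':
    try:
        version = importlib.metadata.version('basicsr')
        print('basicsr', version, 'ok' if meets_minimum(version) else 'too old, please upgrade')
    except importlib.metadata.PackageNotFoundError:
        print('basicsr is not installed')
```

If it reports "too old", upgrading (for example with pip install --upgrade basicsr) is the first thing to try.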
Hi,
I read the code several times but cannot figure out the role of the _dequeue_and_enqueue function in RealESRNetModel. This function is only used in feed_data(), where it just puts self.lq and self.gt into self.queue_lq and self.queue_gt. But I cannot find any other code that uses self.queue_lq and self.queue_gt. I would appreciate it if someone could explain this.
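As far as I understand the training code, the queue is not read anywhere else because _dequeue_and_enqueue itself swaps self.lq and self.gt. Once the pool is full, it shuffles the pool, hands back a random batch of earlier pairs in place of the current batch, and stores the current batch instead. Because the degradations are synthesized batch-wise, samples in one batch would otherwise share some degradation settings (for example the resize factor), so this pool mixes pairs from different iterations and increases degradation diversity within a batch. The following is a simplified NumPy sketch of that idea; the class and method names are hypothetical and it is not the repository's implementation.

```python
import numpy as np


class PairQueue:
    """Simplified stand-in for the training-pair pool behind _dequeue_and_enqueue."""

    def __init__(self, queue_size, rng=None):
        self.queue_size = queue_size
        self.rng = rng if rng is not None else np.random.default_rng(0)
        self.queue_lq = None
        self.queue_gt = None
        self.ptr = 0

    def dequeue_and_enqueue(self, lq, gt):
        """Return the (lq, gt) batch to train on; may differ from the batch passed in."""
        b = lq.shape[0]
        if self.queue_lq is None:
            assert self.queue_size % b == 0, 'queue_size must be divisible by the batch size'
            self.queue_lq = np.zeros((self.queue_size,) + lq.shape[1:], dtype=lq.dtype)
            self.queue_gt = np.zeros((self.queue_size,) + gt.shape[1:], dtype=gt.dtype)
        if self.ptr == self.queue_size:
            # Pool is full: shuffle it (keeping lq/gt pairs aligned), take out the first b pairs
            # for training, and put the incoming batch in their place.
            idx = self.rng.permutation(self.queue_size)
            self.queue_lq, self.queue_gt = self.queue_lq[idx], self.queue_gt[idx]
            lq_out, gt_out = self.queue_lq[:b].copy(), self.queue_gt[:b].copy()
            self.queue_lq[:b], self.queue_gt[:b] = lq, gt
            return lq_out, gt_out
        # Pool still filling: store the batch and train on it unchanged.
        self.queue_lq[self.ptr:self.ptr + b] = lq
        self.queue_gt[self.ptr:self.ptr + b] = gt
        self.ptr += b
        return lq, gt
```

While the pool is filling, batches pass through untouched; afterwards every returned batch is a random draw from earlier iterations.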
Running the 'Windows executable files' directly fails because 'VCOMP140D.DLL' is required.
To solve this, I had to install 'Visual Studio 2019' with the 'MSVC v142' package.
Including 'VCOMP140D.DLL' in the 'Windows executable files' would help other users.
The results look good, thanks for releasing.
Any special reason?
MessageError Traceback (most recent call last)
in ()
14
15 # upload images
---> 16 uploaded = files.upload()
17 for filename in uploaded.keys():
18 dst_path = os.path.join(upload_folder, filename)
2 frames
/usr/local/lib/python3.7/dist-packages/google/colab/_message.py in read_reply_from_input(message_id, timeout_sec)
104 reply.get('colab_msg_id') == message_id):
105 if 'error' in reply:
--> 106 raise MessageError(reply['error'])
107 return reply.get('data', None)
108
MessageError: TypeError: Cannot read property '_uploadFiles' of undefined
I have also been trying to ship a single Windows binary that includes all dependencies, so people don't have to deal with Windows installation issues. Any tips on how you build the binaries for different environments? Much appreciated.
I am trying to train on a single-GPU Windows machine:
name: train_RealESRNetx4plus_1000k_B12G4_fromESRGAN
model_type: RealESRNetModel
scale: 4
num_gpu: 1 #4
manual_seed: 0
But when I run:
python -m torch.distributed.launch --nproc_per_node=4 --master_port=4321 realesrgan/train.py -opt options/train_realesrnet_x4plus.yml --launcher pytorch --auto_resume
I get the following error:
line 625, in _new_process_group_helper
raise RuntimeError("Distributed package doesn't have NCCL "
RuntimeError: Distributed package doesn't have NCCL built in
...
line 245, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
Other Failures:
[1]:
time: 2021-08-18_10:28:49
rank: 1 (local_rank: 1)
exitcode: 1 (pid: 13676)
error_file: <N/A>
msg: "Process failed with exitcode 1"
[2]:
time: 2021-08-18_10:28:49
rank: 2 (local_rank: 2)
exitcode: 1 (pid: 14228)
error_file: <N/A>
msg: "Process failed with exitcode 1"
[3]:
time: 2021-08-18_10:28:49
rank: 3 (local_rank: 3)
exitcode: 1 (pid: 13656)
error_file: <N/A>
msg: "Process failed with exitcode 1"
I tried it in WSL too, and the same error was raised.
Could you give us a guide?
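As far as I can tell, two things go wrong here: PyTorch's NCCL backend is not available on Windows, and the command starts 4 processes (--nproc_per_node=4) while the config sets num_gpu: 1. With a single GPU the distributed launcher is not needed at all: running train.py directly keeps BasicSR on its default non-distributed launcher ('none'), so no NCCL process group is created. The hypothetical helper below only assembles the two command forms; treat the single-GPU form as an assumption to check against the training guide.

```python
def build_train_command(num_gpu, opt_path='options/train_realesrnet_x4plus.yml', port=4321):
    """Return the training command as an argument list (hypothetical helper).

    One GPU: run the script directly (no --launcher flag, so no process group / NCCL).
    Several GPUs: use the distributed launcher; nproc_per_node should equal num_gpu in the .yml.
    """
    script = ['realesrgan/train.py', '-opt', opt_path]
    if num_gpu <= 1:
        return ['python'] + script + ['--auto_resume']
    launcher = ['python', '-m', 'torch.distributed.launch',
                f'--nproc_per_node={num_gpu}', f'--master_port={port}']
    return launcher + script + ['--launcher', 'pytorch', '--auto_resume']


print(' '.join(build_train_command(1)))
```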
There's a strange, unnecessary limitation in the portable executable: the -m parameter will not work if you specify a model path that does not contain the word "models". Could this be changed?
Would it make sense to train the model to upsample at various scaling factors (2/4/8) instead of training multiple models, each for a specific scale?
My idea is to split the network into 3 parts: 1) an image encoder/mapping network, 2) a 2x feature upscaler, and 3) a toRGB network that converts the features to an RGB (n, n, 3) image. During training, the LR image goes through the image encoder network, then through the feature upscaler log2(scale) times, and finally through the toRGB network to produce the SR prediction y_pred.
Will this idea work?
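As a sanity check on the pass count: with a shared 2x upscaler, reaching scale s takes log2(s) passes (1 for 2x, 2 for 4x, 3 for 8x). The NumPy sketch below only illustrates the data flow and output shapes. encode, the nearest-neighbour upscale_2x and to_rgb are placeholders rather than trained networks, and sharing one upscaler across all passes is an assumption of the sketch.

```python
import math

import numpy as np


def upscale_2x(feat):
    """Placeholder for a learned 2x feature upscaler: nearest-neighbour repeat on H and W."""
    return feat.repeat(2, axis=0).repeat(2, axis=1)


def multi_scale_forward(lr, scale, encode=lambda x: x, to_rgb=lambda x: x):
    """Encode once, apply the shared 2x upscaler log2(scale) times, then map to RGB.

    encode / to_rgb default to identity placeholders; a real model would use trained networks.
    """
    passes = int(math.log2(scale))
    if 2 ** passes != scale:
        raise ValueError('scale must be a power of two (2, 4, 8, ...)')
    feat = encode(lr)
    for _ in range(passes):
        feat = upscale_2x(feat)
    return to_rgb(feat)


print(multi_scale_forward(np.zeros((10, 12, 3)), 8).shape)
```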
Hi, I am looking to port this to Hugging Face Spaces and am getting this error:
Collecting basicsr>=1.3.3.10
Downloading basicsr-1.3.3.10.tar.gz (131 kB)
ERROR: Command errored out with exit status 1:
command: /usr/local/bin/python -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-17oc2oxy/basicsr_ed5642abb2de4ee78654eca90f718aa2/setup.py'"'"'; file='"'"'/tmp/pip-install-17oc2oxy/basicsr_ed5642abb2de4ee78654eca90f718aa2/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-i7jyt39j
cwd: /tmp/pip-install-17oc2oxy/basicsr_ed5642abb2de4ee78654eca90f718aa2/
Complete output (5 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-17oc2oxy/basicsr_ed5642abb2de4ee78654eca90f718aa2/setup.py", line 8, in <module>
import torch
ModuleNotFoundError: No module named 'torch'
Here is the requirements.txt file:
https://huggingface.co/spaces/akhaliq/Real-ESRGAN/blob/main/requirements.txt
I'm using the compiled Windows exe, since it plugs into my workflow nicely. Thanks for providing that option. If the Python version is better, I can migrate to that.
Is there a technical limitation to scaling by more than 4x in a single pass? Right now I'm running images through 4x and then 4x again to get a 16x size increase. It would be nice to support other, non-power-of-two sizes as well, like 3x or 5x.
-s or --scale doesn't seem to work with anything but 4. I receive "invalid scale argument" for other numbers like 1, 2, 8, etc.
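One way to reach arbitrary factors with a fixed 4x model is to run it until the image is at least as large as the target and then resize the result by the leftover factor with ordinary interpolation. The hypothetical helper below only plans that arithmetic; finishing with a downscale rather than interpolating upward is a design choice, not the only option. As far as I know, the Python inference script offers an --outscale option that resizes the network output to the requested scale; check its -h output for your version.

```python
def plan_passes(target_scale, model_scale=4):
    """Return (number of model passes, final resize factor) to reach target_scale.

    Runs the fixed-scale model until the image is at least target_scale times larger, then
    resizes by the leftover factor (<= 1, i.e. a downscale) to land exactly on the target.
    """
    if target_scale <= 0:
        raise ValueError('target_scale must be positive')
    passes, reached = 0, 1
    while reached < target_scale:
        reached *= model_scale
        passes += 1
    return passes, target_scale / reached


for s in (3, 5, 16):
    print(s, plan_passes(s))
```

For 5x this means two 4x passes (16x) followed by a resize to 0.3125 of that size, which costs more compute than one pass plus a 1.25x upscale but never asks interpolation to invent detail.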
Hello again, I am the developer of Waifu2x-Extension-GUI, and I just added RealESRGAN-NCNN-Vulkan support to my GUI in the v3.80.01-beta update.
So now we can use RealESRGAN-NCNN-Vulkan to process videos/photos/GIFs on a Windows PC with a user-friendly GUI.
Could you consider adding my project page to the README? Thanks.
Hi,
When I follow the training guide to train the model, the following error occurs:
Traceback (most recent call last):
File "realesrgan/train.py", line 5, in <module>
import realesrgan.archs
ModuleNotFoundError: No module named 'realesrgan'
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/SDD/zzm/env/torch1.7/lib/python3.6/site-packages/torch/distributed/launch.py", line 260, in <module>
main()
File "/SDD/zzm/env/torch1.7/lib/python3.6/site-packages/torch/distributed/launch.py", line 256, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/SDD/zzm/env/torch1.7/bin/python', '-u', 'realesrgan/train.py', '--local_rank=3', '-opt', 'options/train_realesrnet_x2plus.yml', '--launcher', 'pytorch', '--debug']' returned non-zero exit status 1.
Execute command:
python -m torch.distributed.launch --nproc_per_node=4 --master_port=4321 realesrgan/train.py -opt options/train_realesrnet_x2plus.yml --launcher pytorch --debug
What should I do to avoid this error?
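The first traceback says Python cannot find the realesrgan package. When realesrgan/train.py is run as a script, Python puts the script's own folder (realesrgan/) on sys.path, not the repository root, so import realesrgan.archs fails unless the package is installed. The usual fix is to run python setup.py develop from the repository root, as the installation instructions describe, or to set PYTHONPATH to the repository root. A hypothetical in-script workaround is to put the root on sys.path before the import:

```python
import os
import sys


def repo_root_for(script_path):
    """Return the directory two levels above a script such as <root>/realesrgan/train.py."""
    return os.path.dirname(os.path.dirname(os.path.abspath(script_path)))


def ensure_importable(script_path):
    # Hypothetical workaround: put the repository root first on sys.path so that
    # 'import realesrgan' resolves to the local package. Call it before importing realesrgan.
    root = repo_root_for(script_path)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# e.g. at the top of realesrgan/train.py: ensure_importable(__file__)
```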
How can it be done? I tried using the torch.distributed.launch module as described in the training guide, but the inference script doesn't seem to support local_rank. Is it a trivial fix?
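Since inference needs no gradient synchronization, one simple approach (an assumption, not something the script supports today) is to start one independent process per GPU and let each process handle a disjoint slice of the input files. torch.distributed.run exports LOCAL_RANK and WORLD_SIZE as environment variables (the training log elsewhere on this page asks to read local_rank from os.environ('LOCAL_RANK')), so a patched script could choose its files like this; shard and my_files are hypothetical helpers.

```python
import os


def shard(paths, rank, world_size):
    """Give each process every world_size-th file, starting at its own rank.

    Paths are sorted first so that every process agrees on the same order.
    """
    return sorted(paths)[rank::world_size]


def my_files(paths):
    # Assumption: launched via torch.distributed.run (or launch with --use_env), which sets
    # LOCAL_RANK and WORLD_SIZE; the defaults fall back to a single process handling everything.
    rank = int(os.environ.get('LOCAL_RANK', 0))
    world_size = int(os.environ.get('WORLD_SIZE', 1))
    return shard(paths, rank, world_size)
```

Each process would then also select its GPU from the same rank, for example cuda:{LOCAL_RANK}.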
Hello,
MSU Video Group has recently launched Video Super Resolution Benchmark and evaluated this algorithm.
Real-ESRGAN takes 8th place by subjective score, 17th place by PSNR, and 10th by our metric ERQAv1.0. Real-ESRNet takes 18th place by subjective score, 9th place by PSNR, and 18th by our metric ERQAv1.0. ESRGAN takes 7th place by subjective score, 8th place by PSNR, and 3rd by our metric ERQAv1.0. You can see the results here.
If you have any other VSR method you want to see in our benchmark, we kindly invite you to participate.
You can submit it for the benchmark, following the submission steps.
Hi, thanks for this cool tool.
When running the Windows executable against an existing input directory (but without an existing output directory), it fails with the message: invalid outputpath extension type. It's not a big deal to create the output directory manually beforehand, but the first time it really took a while to understand what the actual problem was.
It would be very helpful either to change that message to something more meaningful or (better) to create the output directory automatically.
The default value for the pre_pad argument is inconsistent between Real-ESRGAN/inference_realesrgan.py (line 23 in fb3ff05) and Real-ESRGAN/realesrgan/utils.py (line 16 in fb3ff05).
Which one is preferred over the other? Thank you.
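For context on what pre_pad does, as I understand RealESRGANer: the input is reflect-padded on the bottom and right before the network runs, so that border pixels see realistic image context, and the padded margin (pre_pad times scale pixels) is cropped from the output afterwards. Here is a NumPy sketch of that bookkeeping with a stand-in for the network; the function name and the stand-in are hypothetical.

```python
import numpy as np


def upscale_with_pre_pad(img, upscale, pre_pad=10, scale=4):
    """Reflect-pad the bottom/right edges, upscale, then crop the padding back off.

    `upscale` is any function that enlarges an HxWxC array by `scale`; the extra rows and
    columns give it real-looking context at the border so edge pixels are not degraded.
    """
    h, w = img.shape[:2]
    if pre_pad > 0:
        img = np.pad(img, ((0, pre_pad), (0, pre_pad), (0, 0)), mode='reflect')
    out = upscale(img)
    return out[:h * scale, :w * scale]


def nearest_4x(x):
    """Stand-in for the network: nearest-neighbour 4x enlargement."""
    return x.repeat(4, axis=0).repeat(4, axis=1)
```

With a purely local stand-in like nearest_4x the padding changes nothing; with a convolutional network it changes the pixels near the bottom and right borders, which is why the default value matters.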
Hi, I just updated my git repo to the latest codebase. When I run the inference code, I get the following errors:
python inference_realesrgan.py --scale 4 --model_path experiments/pretrained_models/RealESRGAN_x4plus.pth --input inputs
Traceback (most recent call last):
File "C:\msys64\home\manju\tools\esrgan\Real-ESRGAN\inference_realesrgan.py", line 6, in <module>
from realesrgan import RealESRGANer
File "C:\msys64\home\manju\tools\esrgan\Real-ESRGAN\realesrgan\__init__.py", line 3, in <module>
from .data import *
File "C:\msys64\home\manju\tools\esrgan\Real-ESRGAN\realesrgan\data\__init__.py", line 10, in <module>
dataset_modules = [importlib.import_module(f'realesrgan.data.{file_name}') for file_name in dataset_filenames]
File "C:\msys64\home\manju\tools\esrgan\Real-ESRGAN\realesrgan\data\__init__.py", line 10, in <listcomp>
dataset_modules = [importlib.import_module(f'realesrgan.data.{file_name}') for file_name in dataset_filenames]
File "C:\Python39\lib\importlib\__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "C:\msys64\home\manju\tools\esrgan\Real-ESRGAN\realesrgan\data\realesrgan_dataset.py", line 9, in <module>
from basicsr.data.degradations import circular_lowpass_kernel, random_mixed_kernels
ImportError: cannot import name 'circular_lowpass_kernel' from 'basicsr.data.degradations' (C:\Python39\lib\site-packages\basicsr\data\degradations.py)
Thanks,
Manju
Hello! In inference_realesrgan.py there are the following lines of code:
if args.scale == 2:
    mod_scale = 2
elif args.scale == 1:
    mod_scale = 4
To me, it seems like there is a misprint and it should be elif args.scale == 4, as the default args.scale value equals 4 and there is zero-padding up to multiples of mod_scale. Is it actually a misprint, or did I just not understand your code?
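As far as I can tell it is not a misprint. In BasicSR's RRDBNet, the x2 and x1 variants start with a pixel_unshuffle (by 2 and by 4 respectively), so the network body runs on the same reduced grid as the x4 model. pixel_unshuffle needs H and W divisible by its factor, which is what mod_scale pads for; the x4 model has no such step and needs no padding. Please verify this against basicsr/archs/rrdbnet_arch.py. The helpers below are hypothetical NumPy stand-ins that only show the shape arithmetic, and the reflect padding mode is an assumption (any mode satisfies the shape requirement).

```python
import numpy as np


def mod_scale_for(scale):
    """Padding multiple required per the snippet above (None: no constraint, e.g. for x4)."""
    return {2: 2, 1: 4}.get(scale)


def pad_to_multiple(img, mod):
    """Reflect-pad H and W on the bottom/right up to the next multiple of `mod`."""
    h, w = img.shape[:2]
    pad_h, pad_w = (-h) % mod, (-w) % mod
    return np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)), mode='reflect')


def pixel_unshuffle(img, r):
    """HxWxC -> (H/r)x(W/r)x(C*r*r); fails unless H and W are divisible by r."""
    h, w, c = img.shape
    x = img.reshape(h // r, r, w // r, r, c)
    return x.transpose(0, 2, 4, 1, 3).reshape(h // r, w // r, c * r * r)


img = np.zeros((7, 10, 3))
padded = pad_to_multiple(img, mod_scale_for(1))
print(padded.shape, pixel_unshuffle(padded, 4).shape)
```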
Great job, thanks!
Or does it rely on PyTorch?
Hi,
Great work! I am reproducing your work. When training the Real-ESRGAN model, the generated images are even blurrier than those generated by Real-ESRNet. Have you ever encountered this problem?
Hi, I am the developer of Waifu2x-Extension-GUI
I wonder if I can add Real-ESRGAN support to my GUI. Since your project does not have a license, I think I should ask for your permission first.
Waifu2x-Extension-GUI:
https://github.com/AaronFeng753/Waifu2x-Extension-GUI
Thank you for your amazing work!
Would you be able to provide us with an x2 model? Or is it currently not possible?
In Training.md
‘For the DF2K dataset, we use a multi-scale strategy, i.e., we downsample HR images to obtain several Ground-Truth images with different scales.
We then crop DF2K images into sub-images for faster IO and processing.
You need to prepare a txt file containing the image paths. The following are some examples in meta_info_DF2Kmultiscale+OST_sub.txt (As different users may have different sub-images partitions, this file is not suitable for your purpose and you need to prepare your own txt file):’
Are 'the downsampled HR images' DIV2K_train_LR_bicubic/X2, DIV2K_train_LR_bicubic/X3 and DIV2K_train_LR_bicubic/X4?
And how should the sub-images and the txt file be prepared?
Thanks!
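For the sub-images and the txt file, the sketch below covers the two mechanical pieces under stated assumptions: sliding-window crop offsets (with the last window aligned to the image border so no pixels are dropped), and a meta-info file listing one image path per line relative to dataroot_gt (the training config pairs dataroot_gt with meta_info). Both function names are hypothetical, and the crop size, step and path convention should be checked against the repository's own scripts.

```python
import os


def crop_starts(length, crop_size, step):
    """Start offsets of a sliding window along one axis; the last window touches the border."""
    if length <= crop_size:
        return [0]
    starts = list(range(0, length - crop_size + 1, step))
    if starts[-1] + crop_size < length:
        starts.append(length - crop_size)
    return starts


def write_meta_info(gt_root, out_txt, exts=('.png', '.jpg', '.jpeg')):
    """Write one image path per line, relative to gt_root (the dataroot_gt folder)."""
    paths = []
    for folder, _, files in os.walk(gt_root):
        for name in files:
            if name.lower().endswith(exts):
                paths.append(os.path.relpath(os.path.join(folder, name), gt_root))
    paths.sort()
    with open(out_txt, 'w') as f:
        f.write('\n'.join(paths) + '\n')
    return paths


# Sub-image (y, x) origins for a 1020x2040 image with 400x400 crops and a step of 200:
origins = [(y, x) for y in crop_starts(1020, 400, 200) for x in crop_starts(2040, 400, 200)]
```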
Hello author, I have a question. SR is a process of denoising and deblurring, so we can synthesize training inputs by adding blur and noise to HR images. But ringing and overshoot artifacts usually appear in the SR result image (HR, not LR). Is adding a sinc filter to the LR image really effective?
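A one-dimensional illustration may help: an ideal (sinc) low-pass filter applied to a sharp edge produces exactly the undershoot/overshoot pattern the sinc degradation is meant to imitate. The idea, as I read the paper, is that many real LR inputs already contain such ringing from earlier processing (for example sharpening or compression), so synthesizing it in the training LR images teaches the network to remove it. The sketch only demonstrates the artifact; the cutoff and radius values are arbitrary.

```python
import numpy as np


def ideal_lowpass_taps(cutoff, radius):
    """Truncated ideal low-pass (1D sinc) kernel, cutoff in radians, normalized to sum to 1."""
    n = np.arange(-radius, radius + 1)
    # np.sinc(x) = sin(pi*x) / (pi*x), so this equals sin(cutoff*n) / (pi*n), and cutoff/pi at n=0
    taps = cutoff / np.pi * np.sinc(cutoff / np.pi * n)
    return taps / taps.sum()


step = np.concatenate([np.zeros(50), np.ones(50)])  # a sharp edge at index 50
filtered = np.convolve(step, ideal_lowpass_taps(np.pi / 2, radius=10), mode='same')
# dips below 0 just before the edge and overshoots 1 just after it
print(filtered[44:58].round(3))
```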
Error: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 1
If you encounter CUDA out of memory, try to set --tile with a smaller number.
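In NCHW tensors, dimension 1 is the channel axis, so "4 vs 3" suggests a 4-channel (RGBA) image, such as a PNG with transparency, reached a model built for 3-channel input. One option, sketched below with a hypothetical helper, is to split off the alpha channel before inference, upscale alpha separately (for example with ordinary interpolation) and reattach it afterwards. For reference, cv2.imread with its default flag already drops alpha; only IMREAD_UNCHANGED keeps it.

```python
import numpy as np


def split_for_rgb_model(img):
    """Return (rgb, alpha) so a 3-channel model never sees a 4-channel input.

    Grayscale HxW or HxWx1 input is repeated to 3 channels; alpha is None when absent.
    """
    if img.ndim == 2:
        img = img[:, :, None]
    channels = img.shape[2]
    if channels == 1:
        return np.repeat(img, 3, axis=2), None
    if channels == 4:
        return img[:, :, :3], img[:, :, 3]
    if channels == 3:
        return img, None
    raise ValueError(f'unsupported channel count: {channels}')
```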
Hi! Is it possible to convert the model to the OpenVINO format?
Is there a way to control the denoise level in the inference, or in the pre-built binaries?
Hi, can you please add a Google Colab notebook for inference?
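Hello, I'm very excited to see another achievement in the super-resolution field.
I'd like to know whether there have been similar attempts to run Real-ESRGAN on mobile devices, including model compression and lightweight models. Thanks!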
I think it would be very useful to add more discriminators. From the tests I have done with conditional GANs, it seems that having several discriminators with different receptive field sizes increases the support of the distributions as well as the stability and quality of the images (maybe this could remove artifacts like the ones in Figure 11 of the paper). It would also be interesting to try a discriminator with an MLP-Mixer architecture (https://github.com/jaketae/mlp-mixer, https://github.com/sradc/patchless_mlp_mixer), since the MLP-Mixer paper shows that the way this type of architecture selects features differs from what a CNN does, so maybe it would help the generator avoid certain types of artifacts.
Also, I'm not sure, but does the ESRGAN architecture have multiple noise inputs? If not, I think it would also be useful to add noise to each residual block, since more noise usually helps.
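One common way to realize the several-discriminators idea is to feed the same image to one discriminator per scale, halving the resolution each time: with identical architectures, the coarser copies cover larger regions of the original image, which gives different effective receptive fields. Below is a minimal NumPy sketch of the input pyramid; the discriminators are arbitrary callables standing in for real networks.

```python
import numpy as np


def avg_pool_2x(img):
    """Average 2x2 blocks of an HxWxC array (H and W are trimmed to even sizes first)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    x = img[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))


def multi_scale_scores(img, discriminators):
    """Score the image at full, 1/2, 1/4 ... resolution, one discriminator per scale."""
    scores, x = [], img
    for d in discriminators:
        scores.append(d(x))
        x = avg_pool_2x(x)
    return scores


print(multi_scale_scores(np.ones((8, 8, 3)), [lambda x: x.shape] * 3))
```

The per-scale losses would then simply be summed (or weighted) in the discriminator and generator objectives.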
Hi, Xintao:
When I train RealESRNet with the recommended script, it runs correctly in the debug phase. But when I start training with --auto_resume, it hangs in l_total.backward() without any error log. Have you encountered this before?
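I read your train.md file
and have some questions: why do you first train Real-ESRNet and then train Real-ESRGAN?
In Real-ESRGAN, the discriminator is changed to a U-Net with SN (spectral normalization); the optimizer is changed to Adam; and the original L1 loss is extended to L1 loss + a texture (perceptual) loss on VGG19 layers 1 to 5 with weights {0.1, 0.1, 1, 1, 1} + a GAN loss.
Why not train Real-ESRGAN directly?
Is it because training Real-ESRGAN directly oscillates heavily and is hard to converge, so you first train Real-ESRNet to obtain an already converged model and then train a second time to fine-tune the parameters?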
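To make the loss composition described above concrete, here is a sketch of the weighted sum on precomputed loss values. The layer weights follow the question; perceptual_weight=1.0 and gan_weight=0.1 are assumptions to verify against the Real-ESRGAN training .yml, and the function itself is hypothetical.

```python
def generator_loss(l1, vgg_layer_losses, gan, layer_weights=(0.1, 0.1, 1.0, 1.0, 1.0),
                   perceptual_weight=1.0, gan_weight=0.1):
    """Weighted sum of pixel, perceptual and adversarial terms for one batch.

    vgg_layer_losses: per-layer feature distances for VGG19 layers 1-5 (already computed).
    """
    if len(vgg_layer_losses) != len(layer_weights):
        raise ValueError('need exactly one loss value per VGG layer')
    perceptual = sum(w * l for w, l in zip(layer_weights, vgg_layer_losses))
    return l1 + perceptual_weight * perceptual + gan_weight * gan


print(generator_loss(1.0, [1, 1, 1, 1, 1], 1.0))
```

Written this way, it is easy to see that the Real-ESRNet stage corresponds to keeping only the l1 term, which is one way to read the two-stage setup: the second stage starts from a network that already reconstructs well and adds the perceptual and adversarial terms on top.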
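Because I want to see what the image produced by the sinc module finally looks like, I extracted the sinc module as follows and removed the author's comments.
But the result I display with cv2.imshow is a blank white image. Could you help me check what I did wrong in my processing?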
import random

import cv2
import numpy as np
import torch
import torch.nn.functional as F
from scipy import special


def circular_lowpass_kernel(cutoff, kernel_size, pad_to=0):
    """2D sinc (circular low-pass) kernel, same formula as basicsr.data.degradations."""
    assert kernel_size % 2 == 1, 'Kernel size must be an odd number.'
    center = (kernel_size - 1) / 2
    with np.errstate(divide='ignore', invalid='ignore'):  # the center pixel is 0/0, fixed below
        kernel = np.fromfunction(
            lambda x, y: cutoff * special.j1(cutoff * np.sqrt((x - center)**2 + (y - center)**2)) /
            (2 * np.pi * np.sqrt((x - center)**2 + (y - center)**2)), [kernel_size, kernel_size])
    kernel[(kernel_size - 1) // 2, (kernel_size - 1) // 2] = cutoff**2 / (4 * np.pi)
    kernel = kernel / np.sum(kernel)
    if pad_to > kernel_size:
        pad_size = (pad_to - kernel_size) // 2
        kernel = np.pad(kernel, ((pad_size, pad_size), (pad_size, pad_size)))
    return kernel


def filter2D(img, kernel):
    """Convolve an HxWxC uint8 image with a kxk kernel, each channel separately."""
    k = kernel.shape[-1]
    assert k % 2 == 1, 'Kernel size must be an odd number.'
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    # HWC -> NCHW with permute; view() only reinterprets memory and scrambles the pixels
    x = torch.from_numpy(img).float().permute(2, 0, 1).unsqueeze(0).to(device)
    x = F.pad(x, (k // 2, k // 2, k // 2, k // 2), mode='reflect')
    c = x.shape[1]
    # one copy of the kernel per channel, applied as a depthwise convolution (groups=c)
    weight = torch.from_numpy(kernel).float().view(1, 1, k, k).repeat(c, 1, 1, 1).to(device)
    out = F.conv2d(x, weight, groups=c)  # shape (1, c, h, w)
    # NCHW -> HWC, then back to uint8: cv2.imshow treats float images as 0-1,
    # so float values up to 255 are all displayed as white
    out = out.squeeze(0).permute(1, 2, 0).cpu().numpy()
    return np.clip(out, 0, 255).round().astype(np.uint8)


img = cv2.imread('./1111.jpg')
sinc_kernel_size = random.choice([7, 9, 11, 13, 15, 17, 19, 21])
omega_c = np.random.uniform(np.pi / 3, np.pi)
sinc_kernel = circular_lowpass_kernel(omega_c, sinc_kernel_size, pad_to=21)
# the kernel values are tiny and partly negative; rescale to 0-1 only for display
cv2.imshow('sinc_kernel', (sinc_kernel - sinc_kernel.min()) / (sinc_kernel.max() - sinc_kernel.min()))
cv2.imshow('img0', img)
res_img = filter2D(img, sinc_kernel)
cv2.imshow('res_img', res_img)
cv2.waitKey(0)
Hi, thank you for the repo.
I wanted to ask whether this repo can super-resolve satellite images too. I mean images downloaded directly from, say, a Sentinel-2 satellite and then converted to PNG. Let's say I have a cloud-free RGB image at 10 m resolution.
Hi! Really great project, but inference on CPU is very slow :(
Any ideas on how to speed it up? (Perhaps deploying with OpenVINO or something else?)
# feed the HR (GT) data into the GAN discriminator here
# real
real_d_pred = self.net_d(gan_gt)
l_d_real = self.cri_gan(real_d_pred, True, is_disc=True)
loss_dict['l_d_real'] = l_d_real
loss_dict['out_d_real'] = torch.mean(real_d_pred.detach())
l_d_real.backward()
# fake
fake_d_pred = self.net_d(self.output.detach().clone()) # clone for pt1.9
l_d_fake = self.cri_gan(fake_d_pred, False, is_disc=True)
loss_dict['l_d_fake'] = l_d_fake
loss_dict['out_d_fake'] = torch.mean(fake_d_pred.detach())
l_d_fake.backward()
self.optimizer_d.step()
Xintao, I don't understand the loss code in the second part of training RealESRGAN. Could you find some time to guide me? What do these two parts do?
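Assuming the "vanilla" GAN loss (binary cross-entropy on raw discriminator outputs), the two blocks train the discriminator from both sides: the "real" block pushes net_d's score on ground-truth HR images toward "real", and the "fake" block pushes its score on the generator output toward "fake". The output is detached, so the fake term updates only net_d, not the generator. Both backward() calls accumulate gradients before the single optimizer_d.step(), and the torch.mean(... .detach()) lines only log the average discriminator output. A NumPy sketch of the two loss values (function names hypothetical):

```python
import numpy as np


def bce_with_logits(logits, target):
    """Numerically stable binary cross-entropy on raw discriminator outputs, averaged."""
    x = np.asarray(logits, dtype=float)
    return float(np.mean(np.maximum(x, 0) - x * target + np.log1p(np.exp(-np.abs(x)))))


def discriminator_losses(real_logits, fake_logits):
    # real: the discriminator should score ground-truth HR patches as real (target 1)
    l_d_real = bce_with_logits(real_logits, 1.0)
    # fake: it should score generator output as fake (target 0)
    l_d_fake = bce_with_logits(fake_logits, 0.0)
    return l_d_real, l_d_fake


print(discriminator_losses([5.0], [-5.0]))  # confident and correct: both losses near 0
```

If the discriminator outputs one score per pixel (as a U-Net discriminator does), the mean simply runs over all of those scores.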
Thank you for sharing this great work! Is it OK to train with a smaller model? The model has 16,697,987 parameters; is it OK to use half as many?
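The quoted count matches BasicSR's RRDBNet at its default size, which makes it easy to see where the parameters live: almost all of them are in the 23 RRDB blocks, so halving num_block roughly halves the model. The sketch below recomputes the count layer by layer, assuming the x4 layout conv_first, num_block RRDBs (3 dense blocks of 5 3x3 convolutions each), conv_body, two upsampling convolutions, conv_hr and conv_last, all with biases. Keep in mind that a smaller network can no longer load the 23-block ESRGAN weights with strict_load_g: True, so it would need relaxed loading or training from scratch.

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus biases of one k x k convolution."""
    return c_in * c_out * k * k + c_out


def rdb_params(num_feat, num_grow_ch):
    """Residual dense block: 4 growing convs (dense connections) plus a fusion conv."""
    total = sum(conv_params(num_feat + i * num_grow_ch, num_grow_ch) for i in range(4))
    return total + conv_params(num_feat + 4 * num_grow_ch, num_feat)


def rrdbnet_x4_params(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32):
    head = conv_params(num_in_ch, num_feat)                            # conv_first
    body = num_block * 3 * rdb_params(num_feat, num_grow_ch)           # RRDB trunk
    tail = 4 * conv_params(num_feat, num_feat)  # conv_body, conv_up1, conv_up2, conv_hr
    return head + body + tail + conv_params(num_feat, num_out_ch)      # + conv_last


print(rrdbnet_x4_params(), rrdbnet_x4_params(num_block=11))
```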
Hi! I'm trying to train ESRNet in order to train ESRGAN on my own data. I followed the instructions in your Training.md file, but I'm not seeing anything being printed during training. Is it possible that I have an underlying error? Or is no output shown during training? Thanks!
---EDIT---:
This is what my terminal looks like now:
/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/distributed/launch.py:163: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn(
The module torch.distributed.launch is deprecated and going to be removed in future.Migrate to torch.distributed.run
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
WARNING:torch.distributed.run:--use_env is deprecated and will be removed in future releases.
Please read local_rank from os.environ('LOCAL_RANK')
instead.
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
entrypoint : train.py
min_nodes : 1
max_nodes : 1
nproc_per_node : 4
run_id : none
rdzv_backend : static
rdzv_endpoint : 127.0.0.1:4321
rdzv_configs : {'rank': 0, 'timeout': 900}
max_restarts : 3
monitor_interval : 5
log_dir : None
metrics_cfg : {}
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_unoq8ytd/none_dtwpfqb8
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/anaconda/envs/py38_default/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:52: FutureWarning: This is an experimental API and will be changed in future.
warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=0
master_addr=127.0.0.1
master_port=4321
group_rank=0
group_world_size=1
local_ranks=[0, 1, 2, 3]
role_ranks=[0, 1, 2, 3]
global_ranks=[0, 1, 2, 3]
role_world_sizes=[4, 4, 4, 4]
global_world_sizes=[4, 4, 4, 4]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_unoq8ytd/none_dtwpfqb8/attempt_0/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_unoq8ytd/none_dtwpfqb8/attempt_0/1/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_unoq8ytd/none_dtwpfqb8/attempt_0/2/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_unoq8ytd/none_dtwpfqb8/attempt_0/3/error.json
Path already exists. Rename it to /home/rkurinch/Real-ESRGAN/experiments/train_RealESRNetx4plus_1000k_B12G4_fromESRGAN_archived_20210806_162441
Path already exists. Rename it to /home/rkurinch/Real-ESRGAN/tb_logger/train_RealESRNetx4plus_1000k_B12G4_fromESRGAN_archived_20210806_162441
2021-08-06 16:24:41,364 INFO:
Version Information:
BasicSR: 1.3.3.10
PyTorch: 1.9.0
TorchVision: 0.10.0
2021-08-06 16:24:41,364 INFO:
name: train_RealESRNetx4plus_1000k_B12G4_fromESRGAN
model_type: RealESRNetModel
scale: 4
num_gpu: 4
manual_seed: 0
gt_usm: True
resize_prob: [0.2, 0.7, 0.1]
resize_range: [0.15, 1.5]
gaussian_noise_prob: 0.5
noise_range: [1, 30]
poisson_scale_range: [0.05, 3]
gray_noise_prob: 0.4
jpeg_range: [30, 95]
second_blur_prob: 0.8
resize_prob2: [0.3, 0.4, 0.3]
resize_range2: [0.3, 1.2]
gaussian_noise_prob2: 0.5
noise_range2: [1, 25]
poisson_scale_range2: [0.05, 2.5]
gray_noise_prob2: 0.4
jpeg_range2: [30, 95]
gt_size: 256
queue_size: 180
datasets:[
  train:[
    name: SOLAR+OST
    type: RealESRGANDataset
    dataroot_gt: data/solar/HR
    meta_info: data_names.txt
    io_backend:[
      type: disk
    ]
    blur_kernel_size: 21
    kernel_list: ['iso', 'aniso', 'generalized_iso', 'generalized_aniso', 'plateau_iso', 'plateau_aniso']
    kernel_prob: [0.45, 0.25, 0.12, 0.03, 0.12, 0.03]
    sinc_prob: 0.1
    blur_sigma: [0.2, 3]
    betag_range: [0.5, 4]
    betap_range: [1, 2]
    blur_kernel_size2: 21
    kernel_list2: ['iso', 'aniso', 'generalized_iso', 'generalized_aniso', 'plateau_iso', 'plateau_aniso']
    kernel_prob2: [0.45, 0.25, 0.12, 0.03, 0.12, 0.03]
    sinc_prob2: 0.1
    blur_sigma2: [0.2, 1.5]
    betag_range2: [0.5, 4]
    betap_range2: [1, 2]
    final_sinc_prob: 0.8
    gt_size: 256
    use_hflip: True
    use_rot: False
    use_shuffle: True
    num_worker_per_gpu: 5
    batch_size_per_gpu: 12
    dataset_enlarge_ratio: 1
    prefetch_mode: None
    phase: train
    scale: 4
  ]
]
network_g:[
  type: RRDBNet
  num_in_ch: 3
  num_out_ch: 3
  num_feat: 64
  num_block: 23
  num_grow_ch: 32
]
path:[
  pretrain_network_g: experiments/pretrained_models/ESRGAN_SRx4_DF2KOST_official-ff704c30.pth
  param_key_g: params_ema
  strict_load_g: True
  resume_state: None
  experiments_root: /home/rkurinch/Real-ESRGAN/experiments/train_RealESRNetx4plus_1000k_B12G4_fromESRGAN
  models: /home/rkurinch/Real-ESRGAN/experiments/train_RealESRNetx4plus_1000k_B12G4_fromESRGAN/models
  training_states: /home/rkurinch/Real-ESRGAN/experiments/train_RealESRNetx4plus_1000k_B12G4_fromESRGAN/training_states
  log: /home/rkurinch/Real-ESRGAN/experiments/train_RealESRNetx4plus_1000k_B12G4_fromESRGAN
  visualization: /home/rkurinch/Real-ESRGAN/experiments/train_RealESRNetx4plus_1000k_B12G4_fromESRGAN/visualization
]
train:[
  ema_decay: 0.999
  optim_g:[
    type: Adam
    lr: 0.0002
    weight_decay: 0
    betas: [0.9, 0.99]
  ]
  scheduler:[
    type: MultiStepLR
    milestones: [1000000]
    gamma: 0.5
  ]
  total_iter: 1000000
  warmup_iter: -1
  pixel_opt:[
    type: L1Loss
    loss_weight: 1.0
    reduction: mean
  ]
]
logger:[
  print_freq: 100
  save_checkpoint_freq: 5000.0
  use_tb_logger: True
  wandb:[
    project: None
    resume_id: None
  ]
]
dist_params:[
  backend: nccl
  port: 29500
]
is_train: True
auto_resume: True
dist: True
rank: 0
world_size: 4
root_path: /home/rkurinch/Real-ESRGAN
[libprotobuf FATAL google/protobuf/stubs/common.cc:87] This program was compiled against version 3.9.2 of the Protocol Buffer runtime library, which is not compatible with the installed version (3.17.3). Contact the program author for an update. If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library. (Version verification failed in "bazel-out/k8-opt/bin/tensorflow/core/framework/tensor_shape.pb.cc".)
terminate called after throwing an instance of 'google::protobuf::FatalException'
what(): This program was compiled against version 3.9.2 of the Protocol Buffer runtime library, which is not compatible with the installed version (3.17.3). Contact the program author for an update. If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library. (Version verification failed in "bazel-out/k8-opt/bin/tensorflow/core/framework/tensor_shape.pb.cc".)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 0 (pid: 10289) of binary: /anaconda/envs/py38_default/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=1
master_addr=127.0.0.1
master_port=4321
group_rank=0
group_world_size=1
local_ranks=[0, 1, 2, 3]
role_ranks=[0, 1, 2, 3]
global_ranks=[0, 1, 2, 3]
role_world_sizes=[4, 4, 4, 4]
global_world_sizes=[4, 4, 4, 4]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_unoq8ytd/none_dtwpfqb8/attempt_1/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_unoq8ytd/none_dtwpfqb8/attempt_1/1/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_unoq8ytd/none_dtwpfqb8/attempt_1/2/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_unoq8ytd/none_dtwpfqb8/attempt_1/3/error.json
Ubuntu throws 'Segmentation fault' during formal training...
There is no such bug when --debug is set.
After searching for a solution, I found that moving 'from torch.utils.tensorboard import SummaryWriter' in basicsr/utils/logger.py from line 82 to the top of the file (line 0) works.
System information:
I've been using the program for a while, and it works well. However, I found that on M1 it barely uses the CPU: the GPU is fully loaded while the CPU is barely used.
Does it use the CPU on other platforms, or only the GPU? I've heard that Apple has built-in machine learning units in the M1 chip; maybe we can make use of them in a future update.
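I have not read the paper closely.
The end of the paper shows some images with good super-resolution results. How does this approach perform on open-source benchmark datasets, e.g., in terms of PSNR, SSIM, and LPIPS?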
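For reference, PSNR is 10 * log10(MAX^2 / MSE) between the output and the ground truth. A minimal NumPy version is below; published numbers are often computed on the Y channel with a few border pixels cropped, so this plain version will not match them exactly. The MSU benchmark figures quoted earlier on this page already hint at the usual trade-off: Real-ESRGAN ranks lower by PSNR than Real-ESRNet.

```python
import numpy as np


def psnr(img1, img2, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    diff = np.asarray(img1, dtype=np.float64) - np.asarray(img2, dtype=np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float('inf')  # identical images
    return float(10 * np.log10(max_val ** 2 / mse))


print(round(psnr(np.zeros((4, 4)), np.ones((4, 4))), 2))  # an error of 1 everywhere
```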