vztu / maxim-pytorch

[CVPR 2022 Oral] PyTorch re-implementation for "MAXIM: Multi-Axis MLP for Image Processing", with *training code*. Official Jax repo: https://github.com/google-research/maxim

License: Apache License 2.0

Python 98.27% MATLAB 1.71% Shell 0.03%

architecture computer-vision deblurring dehazing denoising deraining enhancement image image-enhancement image-processing

maxim-pytorch's Introduction

MAXIM: Multi-Axis MLP for Image Processing (CVPR 2022 Oral)

This repo is a PyTorch re-implementation of the CVPR 2022 Oral paper "MAXIM: Multi-Axis MLP for Image Processing" by Zhengzhong Tu, Hossein Talebi, Han Zhang, Feng Yang, Peyman Milanfar, Alan Bovik, and Yinxiao Li.

Google Research, University of Texas at Austin

Disclaimer: This repo is currently a work in progress. No timelines are guaranteed.

News

April 12, 2022: Initialized the PyTorch repo for MAXIM.
March 29, 2022: The official JAX code and models have been released at google-research/maxim.
March 29, 2022: MAXIM is selected for an ORAL presentation at CVPR 2022 🎉
March 3, 2022: Paper accepted at CVPR 2022.

Abstract: Recent progress on Transformers and multi-layer perceptron (MLP) models provide new network architectural designs for computer vision tasks. Although these models proved to be effective in many vision tasks such as image recognition, there remain challenges in adapting them for low-level vision. The inflexibility to support high-resolution images and limitations of local attention are perhaps the main bottlenecks. In this work, we present a multi-axis MLP based architecture called MAXIM, that can serve as an efficient and flexible general-purpose vision backbone for image processing tasks. MAXIM uses a UNet-shaped hierarchical structure and supports long-range interactions enabled by spatially-gated MLPs. Specifically, MAXIM contains two MLP-based building blocks: a multi-axis gated MLP that allows for efficient and scalable spatial mixing of local and global visual cues, and a cross-gating block, an alternative to cross-attention, which accounts for cross-feature conditioning. Both these modules are exclusively based on MLPs, but also benefit from being both global and 'fully-convolutional', two properties that are desirable for image processing. Our extensive experimental results show that the proposed MAXIM model achieves state-of-the-art performance on more than ten benchmarks across a range of image processing tasks, including denoising, deblurring, deraining, dehazing, and enhancement while requiring fewer or comparable numbers of parameters and FLOPs than competitive models.
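
The local and global mixing mentioned in the abstract can be pictured as two ways of grouping pixels before an MLP mixes within each group. The sketch below assumes the block and grid partitioning described in the MAXIM paper. The function names block_groups and grid_groups are hypothetical, the code only enumerates pixel coordinates, and it is not this repository's implementation.

```python
def block_groups(height, width, block):
    """Group pixel coordinates into non-overlapping block x block windows.

    Mixing within one group only touches neighbouring pixels (local axis).
    """
    if height % block or width % block:
        raise ValueError("height and width must be divisible by block")
    groups = {}
    for i in range(height):
        for j in range(width):
            # Pixels that share a window index belong to the same group.
            groups.setdefault((i // block, j // block), []).append((i, j))
    return groups


def grid_groups(height, width, grid):
    """Group pixel coordinates into sets of grid x grid evenly strided pixels.

    Mixing within one group touches pixels spread over the whole image
    (global axis).
    """
    if height % grid or width % grid:
        raise ValueError("height and width must be divisible by grid")
    stride_h, stride_w = height // grid, width // grid
    groups = {}
    for i in range(height):
        for j in range(width):
            # Pixels that share an offset inside their cell belong together.
            groups.setdefault((i % stride_h, j % stride_w), []).append((i, j))
    return groups
```

For a 4x4 image, block_groups(4, 4, 2) puts the four neighbouring pixels of each 2x2 window together, while grid_groups(4, 4, 2) puts pixels two steps apart together, so each group spans the whole image.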

Architecture

Installation

TBD

Results and Pre-trained models

TBD

Demo

Results

Image Denoising

Image Deblurring (synthetic blur, realistic blur)

Image Deraining (rain streak, rain drop)

Image Dehazing

Image Enhancement

Citation

Should you find this repository useful, please consider citing:

@article{tu2022maxim,
  title={MAXIM: Multi-Axis MLP for Image Processing},
  author={Tu, Zhengzhong and Talebi, Hossein and Zhang, Han and Yang, Feng and Milanfar, Peyman and Bovik, Alan and Li, Yinxiao},
  journal={CVPR},
  year={2022},
}

Acknowledgement

This repository is built on Restormer. Our work is also inspired by HiT, MPRNet, and HINet.

maxim-pytorch's Issues

Multiscale MAXIM

Hello,
First of all, thank you so much for sharing the MAXIM architecture in PyTorch. Unfortunately, the provided code does not work with num_supervision_scales greater than 1. I have changed the code slightly, but I have not yet managed to resolve the issues. For example, line 1115 uses bottleneck blocks, whereas according to the paper we need backbone blocks (depth 4 in Table 9 of the paper). Where are these blocks?
Following issue #3, I downloaded the weights, but the layer names differ from those in your code, so the PyTorch model cannot load the weights properly.
Finally, according to the paper, MAXIM has around 14 million parameters for image enhancement, but the PyTorch model has around 400 million. What is the problem? I used the following code to count the parameters:
print(sum(p.numel() for p in maxim.parameters() if p.requires_grad))
Can you please let me know when the final version of the model will be available?
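
One way to narrow down a mismatch like 14 million versus 400 million is to break the count down by top-level submodule. The helper below is a minimal sketch, not part of this repository: the name count_parameters is hypothetical, and it only assumes that each item is a (name, parameter) pair whose parameter exposes numel() and requires_grad, as the pairs yielded by a PyTorch model's named_parameters() do.

```python
from collections import Counter


def count_parameters(named_params, trainable_only=True):
    """Return (total, per_module) parameter counts.

    named_params: iterable of (name, param) pairs, for example
        model.named_parameters() on a PyTorch module.
    per_module: Counter keyed by the first component of each dotted name.
    """
    per_module = Counter()
    for name, param in named_params:
        if trainable_only and not param.requires_grad:
            continue  # skip frozen parameters when only trainable ones count
        top_level = name.split(".", 1)[0]
        per_module[top_level] += param.numel()
    return sum(per_module.values()), per_module
```

With the model from this issue, total, per_module = count_parameters(maxim.named_parameters()) followed by per_module.most_common(5) would list the submodules that hold most of the parameters.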

The code is not finished yet; please stay tuned. I will close this issue until it is mostly done.

from basicsr.models import create_model in train.py

Hello,
is there a create_model in basicsr?
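
A quick way to answer this kind of question locally is to try the import and look for the name. This is a minimal sketch using only the standard library; module_has_attr is a hypothetical helper name, and the result depends on which basicsr package (the copy bundled in this repo or an installed one) is first on the import path.

```python
import importlib


def module_has_attr(module_name, attr_name):
    """Return True if module_name can be imported and defines attr_name."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module itself is missing or fails to import.
        return False
    return hasattr(module, attr_name)
```

For the import in train.py, the check would be module_has_attr("basicsr.models", "create_model").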

When will the PyTorch version be released?

Where is the MAXIM model? /maxim-pytorch/basicsr/models/archs/restormer?

Hello? Is there really a MAXIM model here? What I found is a Restormer model. If you can, please explain this part.

Excuse me, are the GridGatingUnit and other modules not completed?

Will the pretrained model be open-sourced?

How to load pretrained Enhancement model in maxim_torch.py

I downloaded the weights from the maxim repo.
Then, in the maxim-pytorch repo, I ran the jax2torch.py file with

python maxim_pytorch/jax2torch.py -c maxim_ckpt_Enhancement_FiveK_checkpoint.npz
It works, and I get a torch_weight.pth file.

I then try to load it, but I can't tell whether I'm passing the wrong arguments or your code is wrong.

from maxim_pytorch.maxim_torch import MAXIM_dns_3s
import torch
import cv2
import numpy as np
from torchvision import transforms

from pathlib import Path


# These params are from https://github.com/google-research/maxim/blob/3c8265171ffccc80c3c9124844aef0d381609956/maxim/models/maxim.py#L910
s2 = {
    "features": 32,
    "depth": 3,
    "num_stages": 2, #
    "num_groups": 2, # 
    "num_bottleneck_blocks": 2, #
    "block_gmlp_factor": 2,
    "grid_gmlp_factor": 2,
    "input_proj_factor": 2,
    "channels_reduction": 4,
}

model = MAXIM_dns_3s(features=32, depth=3, block_gmlp_factor=2, grid_gmlp_factor=2, input_proj_factor=2, channels_reduction=4, num_supervision_scales=2)
state = torch.load("torch_weight.pth")

model.load_state_dict(state)
model.eval()

I get error:

RuntimeError: Error(s) in loading state_dict for MAXIM_dns_3s:
	Unexpected key(s) in state_dict: "stage_1_output_conv_0.bias", "stage_1_output_conv_0.weight", "stage_1_output_conv_1.bias", "stage_1_output_conv_1.weight", "stage_1_output_conv_2.bias", "stage_1_output_conv_2.weight".
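
To see the full mismatch rather than only the first error, the key sets of the model and the converted checkpoint can be compared directly. This is a diagnostic sketch only. The helper name diff_state_dict_keys is hypothetical, and it works on any two collections of key names.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Compare parameter names expected by a model with those in a checkpoint.

    Returns (missing, unexpected):
      missing    - names the model expects but the checkpoint lacks
      unexpected - names the checkpoint holds but the model does not define
    """
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected
```

With the objects from the snippet above, the call would be diff_state_dict_keys(model.state_dict().keys(), state.keys()). PyTorch's load_state_dict(state, strict=False) reports the same two lists, but it silently skips the unmatched weights, so it is better suited to diagnosis than to a fix.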

training.yml for LOL enhancement

First of all, thank you for your excellent work in the image processing field. In this repo, only Deblurring, Denoising, and Deraining are available. Could you provide the training.py and .yml files for image enhancement? Looking forward to your reply.