ip-superresolution / art_restoration_dm

This project forked from naagar/art_restoration_dm


Art restoration using a diffusion model.

License: MIT License

Python 100.00%

art_restoration_dm's Introduction

Paper: arxiv.org/abs/2309.13655

Image restoration and super-resolution using a fine-tuned diffusion model.

AI Art Restoration - Reviving Cultural Heritage

Update

Demo on real-world SR

For more evaluations, please refer to our paper for details.

Demo on 4K Results

  • StableSR can in theory achieve arbitrary upscaling; below is an 8x example with a result beyond 4K (5120x3680). The example image is taken from here.

  • We further test StableSR directly on AIGC images and compare it with several diffusion-based upscalers, following the suggestions. A 4K demo is here, which is a 4x SR on the image from here. More comparisons can be found here.

Dependencies and Installation

  • PyTorch == 1.12.1
  • CUDA == 11.7
  • pytorch-lightning==1.4.2
  • xformers == 0.0.16 (Optional)
  • Other required packages in environment.yaml
# git clone this repository
git clone https://github.com/IceClear/StableSR.git
cd StableSR

# Create a conda environment and activate it
conda env create --file environment.yaml
conda activate stablesr

# Install xformers
conda install xformers -c xformers/label/dev

# Install taming & clip
pip install -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
pip install -e git+https://github.com/openai/CLIP.git@main#egg=clip
pip install -e .

Running Examples

Train

Download the pretrained Stable Diffusion models from [HuggingFace]

  • Train Time-aware encoder with SFT: set the ckpt_path in config files (Line 22 and Line 55)
python main.py --train --base configs/stableSRNew/v2-finetune_text_T_512.yaml --gpus GPU_ID, --name NAME --scale_lr False
  • Train CFW: set the ckpt_path in config files (Line 6).

You first need to generate training data using the diffusion model fine-tuned in the first stage. The data folder should be organized as follows:

CFW_trainingdata/
    ├── inputs/
    │     ├── 00000001.png  # LQ images, (512, 512, 3) (resized to 512x512)
    │     └── ...
    ├── gts/
    │     ├── 00000001.png  # GT images, (512, 512, 3) (512x512)
    │     └── ...
    ├── latents/
    │     ├── 00000001.npy  # Latent codes (N, 4, 64, 64) of HR images generated by the diffusion U-Net, saved in .npy format
    │     └── ...
    └── samples/
          ├── 00000001.png  # The HR images decoded from the latent codes, kept to verify that the generated latents are correct
          └── ...

Then you can train CFW:

python main.py --train --base configs/autoencoder/autoencoder_kl_64x64x4_resi.yaml --gpus GPU_ID, --name NAME --scale_lr False

Resume

python main.py --train --base configs/stableSRNew/v2-finetune_text_T_512.yaml --gpus GPU_ID, --resume RESUME_PATH --scale_lr False

Test directly

Download the diffusion and autoencoder pretrained models from [HuggingFace | Google Drive | OneDrive | OpenXLab]. By default we use the same color-correction scheme introduced in the paper. You may set --colorfix_type wavelet for better color correction, or disable color correction with --colorfix_type nofix.
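To see what the adain option does conceptually: AdaIN-style color correction shifts and scales each channel of the restored image so that its mean and standard deviation match those of the reference (low-quality) input. A pure-Python sketch of that idea (illustrative only; the actual script operates on image tensors):

```python
import math

def adain_colorfix(restored, reference):
    """Match per-channel mean/std of `restored` to `reference`.

    Both arguments are dicts mapping a channel name to a flat list of
    pixel values; this stands in for real image tensors.
    """
    fixed = {}
    for ch, vals in restored.items():
        ref = reference[ch]
        mu_v = sum(vals) / len(vals)
        mu_r = sum(ref) / len(ref)
        sd_v = math.sqrt(sum((v - mu_v) ** 2 for v in vals) / len(vals)) or 1.0
        sd_r = math.sqrt(sum((r - mu_r) ** 2 for r in ref) / len(ref)) or 1.0
        # normalize to zero mean / unit std, then re-apply reference stats
        fixed[ch] = [(v - mu_v) / sd_v * sd_r + mu_r for v in vals]
    return fixed
```

The wavelet variant instead transfers only the low-frequency (color) band from the reference, which tends to preserve restored detail better.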

  • Test on 128 --> 512: you need at least 10 GB of GPU memory to run this script (batch size 2 by default).
python scripts/sr_val_ddpm_text_T_vqganfin_old.py --config configs/stableSRNew/v2-finetune_text_T_512.yaml --ckpt CKPT_PATH --vqgan_ckpt VQGANCKPT_PATH --init-img INPUT_PATH --outdir OUT_DIR --ddpm_steps 200 --dec_w 0.5 --colorfix_type adain
  • Test on arbitrary sizes w/o chopping for the autoencoder (for results beyond 512): the memory cost depends on your image size, but is usually above 10 GB.
python scripts/sr_val_ddpm_text_T_vqganfin_oldcanvas.py --config configs/stableSRNew/v2-finetune_text_T_512.yaml --ckpt CKPT_PATH --vqgan_ckpt VQGANCKPT_PATH --init-img INPUT_PATH --outdir OUT_DIR --ddpm_steps 200 --dec_w 0.5 --colorfix_type adain
  • Test on arbitrary sizes w/ chopping for the autoencoder: the current default setting needs at least 18 GB; you may reduce the autoencoder tile size by setting --vqgantile_size and --vqgantile_stride. Note that the minimum tile size is 512 and the stride should be smaller than the tile size. A smaller tile size may introduce more border artifacts.
python scripts/sr_val_ddpm_text_T_vqganfin_oldcanvas_tile.py --config configs/stableSRNew/v2-finetune_text_T_512.yaml --ckpt CKPT_PATH --vqgan_ckpt VQGANCKPT_PATH --init-img INPUT_PATH --outdir OUT_DIR --ddpm_steps 200 --dec_w 0.5 --colorfix_type adain
  • For testing with the 768 model, you need to set --config configs/stableSRNew/v2-finetune_text_T_768v.yaml, --input_size 768 and --ckpt accordingly. You can also adjust --tile_overlap, --vqgantile_size and --vqgantile_stride. We did not fine-tune CFW.
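As a rough illustration of how a tile size and stride interact (helper names and the numbers below are hypothetical, not the scripts' actual defaults): tiles of `size` pixels are placed every `stride` pixels along each axis, the last tile is shifted back so it ends flush with the image border, and the `size - stride` overlap is blended to hide seams.

```python
def tile_starts(length, size, stride):
    """Start offsets of tiles of `size` covering an axis of `length`."""
    if length <= size:
        return [0]
    starts = list(range(0, length - size, stride))
    starts.append(length - size)  # final tile flush with the border
    return starts

def tile_grid(height, width, size, stride):
    """All (y, x) tile origins covering a height x width image."""
    return [(y, x) for y in tile_starts(height, size, stride)
                   for x in tile_starts(width, size, stride)]
```

For example, with size=512 and stride=400, a 1024x1024 image is covered by 3x3 = 9 overlapping tiles; a smaller stride means more tiles (more compute) but larger blended overlaps (fewer border artifacts).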

Test using Replicate API

import replicate
model = replicate.models.get("<model_name>")
model.predict(input_image=...)

See here for more information.

Citation

If our work is useful for your research, please consider citing:

@misc{nagar2023adaptation,
  title={Adaptation of the super resolution SOTA for Art Restoration in camera capture images},
  author={Sandeep Nagar},
  year={2023},
  eprint={2309.13655},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

Acknowledgement

This project is based on StableSR, stablediffusion, latent-diffusion, SPADE, mixture-of-diffusers and BasicSR. Thanks for their awesome work.

Contact

If you have any questions, please feel free to reach out to me at [email protected].

art_restoration_dm's People

Contributors: naagar