macuper / marconet

This project forked from csxmli2016/marconet


Learning Generative Structure Prior for Blind Text Image Super-resolution [CVPR 2023]

License: Other

Python 100.00%

marconet's Introduction

Xiaoming Li, Wangmeng Zuo, Chen Change Loy

S-Lab, Nanyang Technological University

Blind text image super-resolution (SR) is challenging as one needs to cope with diverse font styles and unknown degradation. To address the problem, existing methods perform character recognition in parallel to regularize the SR task, either through a loss constraint or intermediate feature condition. Nonetheless, the high-level prior could still fail when encountering severe degradation. The problem is further compounded given characters of complex structures, e.g., Chinese characters that combine multiple pictographic or ideographic symbols into a single character. In this work, we present a novel prior that focuses more on the character structure. In particular, we learn to encapsulate rich and diverse structures in a StyleGAN and exploit such generative structure priors for restoration. To restrict the generative space of StyleGAN so that it obeys the structure of characters yet remains flexible in handling different font styles, we store the discrete features for each character in a codebook. The code subsequently drives the StyleGAN to generate high-resolution structural details to aid text SR. Compared to priors based on character recognition, the proposed structure prior exerts stronger character-specific guidance to restore faithful and precise strokes of a designated character. Extensive experiments on synthetic and real datasets demonstrate the compelling performance of the proposed generative structure prior in facilitating robust text SR.

TODO

  • Release the inference code and model in April.
  • Release all the code before June.

Getting Started

git clone https://github.com/csxmli2016/MARCONet
cd MARCONet
conda create -n marconet python=3.8 -y
conda activate marconet
pip install -r requirements.txt
BASICSR_EXT=True pip install basicsr

Pre-trained Models

Download the pre-trained models with one of the following scripts and put them into ./checkpoints/

python checkpoints/download_google.py
or
python checkpoints/download_github.py (Preferred)

Inference for SR

CUDA_VISIBLE_DEVICES=0 python test_sr.py 
# Parameters:
-i: LR path, default: ./Testsets/LQs
-o: save path, default: None (a saving directory is created automatically with the format '[LR path]_TIME_MARCONet')
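
The default naming described for -o can be sketched as follows. This is only an illustration: the helper name default_save_dir and the timestamp layout are assumptions, not taken from test_sr.py, which may format TIME differently.

```python
from datetime import datetime


def default_save_dir(lr_path, now=None):
    """Build '[LR path]_TIME_MARCONet' (hypothetical helper, not from test_sr.py)."""
    now = now or datetime.now()
    # Assumed timestamp layout; the real script may format TIME differently.
    stamp = now.strftime("%Y%m%d_%H%M%S")
    # Drop trailing slashes so the suffix attaches to the directory name.
    base = lr_path.rstrip("/\\")
    return f"{base}_{stamp}_MARCONet"


if __name__ == "__main__":
    print(default_save_dir("./Testsets/LQs"))
```

Passing -o explicitly avoids relying on this naming at all.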

Some restoration results on real-world LR text segments (From top to bottom: LR input, bounding box, SR result, and structure prior image)

  

  

More real-world LR Chinese Text Image Super-resolution

Manually correct the text recognition results

Since some characters are easily mispredicted when the degradation is severe, the text labels can be provided manually.

For example, the following LR input with the text label from the transformer encoder:

By manually providing the text labels in the image name (format: '*_开发区雨虹电子有限公司.png'):

CUDA_VISIBLE_DEVICES=0 python test_sr.py -i ./Testsets/LQsWithText -m
# Parameters:
-i: LR path, default: ./Testsets/LQsWithText
-o: save path, default: None (a saving directory is created automatically with the format '[LR path]_TIME_MARCONet')
-m: flag (store_true); when set, the text label is taken from the LR image name
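
The file-name convention '*_开发区雨虹电子有限公司.png' can be read as "everything after the last underscore is the label". A minimal sketch under that assumption follows; label_from_filename and the example file name are hypothetical, and test_sr.py may parse the name differently.

```python
from pathlib import Path


def label_from_filename(image_path):
    """Return the text after the last underscore of the file stem.

    Assumed convention: '<anything>_<label>.<ext>'. Hypothetical helper,
    not part of the repository.
    """
    stem = Path(image_path).stem
    _prefix, sep, label = stem.rpartition("_")
    if not sep or not label:
        raise ValueError(f"no text label found in {image_path!r}")
    return label


if __name__ == "__main__":
    # 'in_01' is a made-up prefix used only for illustration.
    print(label_from_filename("./Testsets/LQsWithText/in_01_开发区雨虹电子有限公司.png"))
```

Splitting on the last underscore keeps prefixes that themselves contain underscores from leaking into the label.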

Then the SR results will be:

The W space controls the font style

CUDA_VISIBLE_DEVICES=0 python test_w.py
# Parameters:
-w1: image path for extracting the font style w1. Default: './Testsets/TestW/w1.png'
-w2: image path for extracting the font style w2. Default: './Testsets/TestW/w2.png'
-o: save path for the interpolation results. Default: './Testsets/TestW'

GIF for interpolating w predicted from two text images with different styles

GIF for interpolating w from two text images with different characters

GIF for interpolating w from two text images with different locations
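
Interpolating between two style codes can be pictured as blending the two vectors frame by frame. The sketch below uses plain linear interpolation on Python lists; this is an assumption for illustration, since the interpolation scheme and the shape of w used by test_w.py are not stated here, and lerp_w is a hypothetical helper.

```python
def lerp_w(w1, w2, steps):
    """Linearly blend two equal-length style vectors into `steps` frames.

    Hypothetical helper: the first frame equals w1, the last equals w2.
    """
    if len(w1) != len(w2):
        raise ValueError("w1 and w2 must have the same length")
    if steps < 2:
        raise ValueError("steps must be at least 2")
    frames = []
    for i in range(steps):
        t = i / (steps - 1)  # blend weight moving from 0.0 to 1.0
        frames.append([(1 - t) * a + t * b for a, b in zip(w1, w2)])
    return frames


if __name__ == "__main__":
    for frame in lerp_w([0.0, 2.0], [1.0, 4.0], 3):
        print(frame)
```

Each intermediate frame would then be passed to the generator in place of a single predicted w to render one step of the transition.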

License

This project is licensed under NTU S-Lab License 1.0. Redistribution and use should follow this license.

Acknowledgement

This project is built on the excellent BasicSR and KAIR.

Citation

@InProceedings{li2023marconet,
author = {Li, Xiaoming and Zuo, Wangmeng and Loy, Chen Change},
title = {Learning Generative Structure Prior for Blind Text Image Super-resolution},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year = {2023}
}
