saicoco / gluon-psenet

mxnet-Gluon implementation of PSENet text detector (Shape Robust Text Detection with Progressive Scale Expansion Network)

License: GNU General Public License v3.0


Shape Robust Text Detection with Progressive Scale Expansion Network

A reimplementation of PSENet in mxnet-gluon. So far it has only been trained on ICPR.

  • Supports TensorboardX
  • Supports hybridize for deployment
  • Fast: 45 ms per image when max_side is resized to 784
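
As a rough illustration of what "resize max_side to 784" means, here is a minimal sketch that scales an image size so its longer side becomes 784 while keeping the aspect ratio. The function name `resize_to_max_side` and the rounding behaviour are assumptions for illustration; the resize code in this repository may differ (for example, it may also pad or round to a multiple of the network stride).

```python
def resize_to_max_side(height, width, max_side=784):
    """Return (new_height, new_width) with the longer side equal to max_side.

    Hypothetical helper: keeps the aspect ratio and rounds to the nearest pixel.
    """
    scale = max_side / float(max(height, width))
    new_height = max(1, int(round(height * scale)))
    new_width = max(1, int(round(width * scale)))
    return new_height, new_width


# A 1080x1920 (height x width) image is scaled by 784 / 1920.
print(resize_to_max_side(1080, 1920))  # (441, 784)
```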

Thanks to the author (@whai362) for the great work!

Requirements

  • Python 2.7

  • mxnet 1.4.0

  • pyclipper

  • Polygon2

  • OpenCV 4+ (for the C++ version of pse)

  • TensorboardX

Introduction

While reimplementing PSENet in Gluon, I ran into the following problem.

The dice loss on the kernels does not converge.

  • First, I suspected that the kernel labels were incorrect. However, I checked them again and they are correct.
  • Second, I suspected that gradients could not be backpropagated through mx.nd.split. However, the dice loss on the score map, which is also obtained by split, converges well, so split cannot be the cause.
  • The network is based on resnet50 and the output of the FPN is input_size/4, so min_kernel_map may contain no text instance at all. I therefore set the number of kernels to 3.
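
To make the last point concrete, the sketch below applies the shrink rule from the PSENet paper, r_i = 1 - (1 - m)(n - i)/(n - 1) and d_i = Area × (1 - r_i²) / Perimeter, to an axis-aligned text box. It is only an illustration under assumptions: m = 0.5 is the paper's minimum scale ratio and may not be the value used in this repository, the helper names are hypothetical, and the box is treated as a plain rectangle rather than being clipped with pyclipper.

```python
def scale_ratio(i, n, m=0.5):
    """Scale ratio r_i of the i-th kernel (i = 1 is the smallest, i = n is the full text)."""
    if n == 1:
        return 1.0
    return 1.0 - (1.0 - m) * (n - i) / float(n - 1)


def shrunk_rect(width, height, r):
    """Size of a width x height rectangle after shrinking every side inward by d."""
    area = width * height
    perimeter = 2.0 * (width + height)
    d = area * (1.0 - r * r) / perimeter  # shrink offset d_i
    return max(0.0, width - 2.0 * d), max(0.0, height - 2.0 * d)


r_min = scale_ratio(1, 3)  # smallest of 3 kernels, r = 0.5

# A 60x12 px text line at input resolution keeps a usable core ...
print(shrunk_rect(60, 12, r_min))  # (52.5, 4.5)
# ... but at input_size/4 (15x3 px) the smallest kernel is about one pixel tall.
print(shrunk_rect(15, 3, r_min))   # (13.125, 1.125)
```

With more kernels or a smaller m, the smallest kernel of a short text line can fall below one pixel at 1/4 resolution and disappear when rasterized.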

Upsampling the output to input_size may be a good choice. I will try it in my spare time.
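
For reference, the dice loss discussed in this section can be sketched as follows, using the dice coefficient from the PSENet paper, D = 2 Σ(S·G) / (Σ S² + Σ G²). This is a minimal plain-Python sketch on flattened maps, not the loss used in this repository: the `eps` smoothing term is an assumption, and masking or hard-example mining is left out.

```python
def dice_loss(pred, target, eps=1e-5):
    """1 - dice coefficient for two flattened maps with values in [0, 1].

    eps is an assumed smoothing term that avoids division by zero.
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(p * p for p in pred) + sum(t * t for t in target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


# Perfect prediction: the loss is 0.
print(dice_loss([1, 0, 1, 0], [1, 0, 1, 0]))

# Empty ground truth (no text in the kernel map): any non-zero prediction
# gives a loss close to 1, so the only way to lower it is to predict all zeros.
print(dice_loss([0.1, 0.1, 0.1, 0.1], [0, 0, 0, 0]))
```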

Evaluation

Dataset | Recall | Precision | F1-score | Speed
ICPR (max_side=784) | 0.56 | 0.67 | 0.61 | 45 ms/image
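
The F1-score in the table is the harmonic mean of recall and precision. A short check of the reported numbers (the function name `f1_score` is just for illustration):

```python
def f1_score(recall, precision):
    """Harmonic mean of recall and precision; 0 when both are 0."""
    if recall + precision == 0:
        return 0.0
    return 2.0 * recall * precision / (recall + precision)


print(round(f1_score(0.56, 0.67), 2))  # 0.61
```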

Usage

Pretrained-models

  • gluoncv_model_zoo: resnet50_v1b. You can replace it with another backbone. The default path for pretrained models is ~/.mxnet/

You can also download maskrcnn_coco from gluoncv_model_zoo to get a warm start.

Make

cd pse
make

Here I add -Wl,-undefined,dynamic_lookup to avoid some compile errors; this differs from the original PSENet.

Train

python scripts/train.py $data_path $ckpt
  • data_path: path of the dataset; each image and its annotation file must share the same filename prefix, for example a.jpg and a.txt
  • ckpt: the filename of the pretrained model
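
The naming rule for data_path can be checked before training with a small script like the one below. It is a sketch under assumptions: images and annotations are assumed to sit in the same directory, the list of image extensions is a guess, and `list_pairs` is a hypothetical helper that is not part of this repository. The content of the .txt files is not inspected.

```python
import os

# Assumed image extensions; adjust to match your dataset.
IMAGE_EXTS = ('.jpg', '.jpeg', '.png')


def list_pairs(data_path):
    """Return (image_path, annotation_path) pairs that share a filename prefix."""
    names = sorted(os.listdir(data_path))
    present = set(names)
    pairs = []
    for name in names:
        stem, ext = os.path.splitext(name)
        if ext.lower() not in IMAGE_EXTS:
            continue
        label = stem + '.txt'
        if label in present:  # e.g. a.jpg is kept only if a.txt exists
            pairs.append((os.path.join(data_path, name),
                          os.path.join(data_path, label)))
    return pairs
```

Images without a matching .txt file are skipped, so comparing the number of pairs with the number of images is a quick way to spot missing annotations.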

Loss curves (four plots): Text loss, Kernel loss, All_loss, Pixel_accuracy

Some Results

[Figure: fusion_TB1vcxDLXXXXXb1XFXXunYpLFXX]

Inference

python eval.py $data_path $ckpt $output_dir $gpu_or_cpu

TODO:

  • Upsample the output to input_size
  • Train on ICDAR and evaluate

gluon-psenet's Issues

pretrained model

@saicoco Hi, thanks for sharing your work.
Do you have a pre-trained model?
Where can I download it?

Question on train data sample

Hi,

Could you provide a sample of the training data format?
If I want to fine-tune on my own dataset, should I label the text content as well, or is a bounding box with a single class 'text' enough?

Thanks!
