jdc08161063 / r2cnn

This project forked from beacandler/r2cnn

caffe re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

Dockerfile 0.20% Shell 0.13% Python 26.97% Makefile 0.05% C++ 72.09% Cuda 0.57%

r2cnn's Introduction

R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

Abstract

This is a caffe re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection.

This project is modified from py-R-FCN; the inclined NMS and rotated-box generation components are imported from the EAST project. Thanks to the authors (@zxytim, @argman) for their help. Please cite this paper if you find this project useful.

Contents

  1. Abstract
  2. Structure
  3. Installation
  4. Demo
  5. Test
  6. Train
  7. Experiments
  8. Furthermore

Structure

Code structure

.
├── docker-compose.yml
├── docker // docker deps file
├── Dockerfile // docker build file
├── model // model directory
│   ├── caffemodel // trained caffe model
│   ├── icdar15_gt // ICDAR2015 groundtruth
│   ├── prototxt // caffe prototxt file
│   └── imagenet_models // pretrained on imagenet
├── nvidia-docker-compose.yml
├── logs
│   ├── submit // original submit file
│   ├── submit_zip // zip submit file
│   ├── snapshots
│   └── train
│       ├── VGG16.txt.*
│       └── snapshots
├── README.md
├── requirements.txt // python package
├── src
│   ├── cfgs // train config yml
│   ├── data // cache file
│   ├── lib
│   ├── _init_path.py
│   ├── demo.py
│   ├── eval_icdar15.py // evaluate F-measure on the ICDAR2015 dataset
│   ├── test_net.py
│   └── train_net.py
├── demo.sh
├── train.sh
├── images // test images
│   ├── img_1.jpg
│   ├── img_2.jpg
│   ├── img_3.jpg
│   ├── img_4.jpg
│   └── img_5.jpg
└── test.sh // test script

Data structure

The dataset directory should have this basic structure:

ICDARdevkit_Root
.
├── ICDAR2013
├── merge_train.txt  // image list covering the ICDAR2013 + ICDAR2015 training sets, after the same raw data augmentation as in the paper
├── ICDAR2015
│   ├── augmentation // contains all augmented images
│   └── ImageSets/Main/test.txt // ICDAR2015 test images list
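
Before training or testing, it can help to confirm that the dataset root matches the tree above. The following is a minimal sketch, not part of this project: the helper name `missing_entries` and the root path `ICDARdevkit_Root` passed to it are placeholders chosen for illustration, and only the entries listed in the tree are checked.

```python
import os

# Relative paths taken from the directory tree above.
EXPECTED_ENTRIES = [
    "ICDAR2013",
    "merge_train.txt",
    "ICDAR2015/augmentation",
    "ICDAR2015/ImageSets/Main/test.txt",
]


def missing_entries(devkit_root):
    """Return the expected entries that do not exist under devkit_root."""
    missing = []
    for rel_path in EXPECTED_ENTRIES:
        full_path = os.path.join(devkit_root, *rel_path.split("/"))
        if not os.path.exists(full_path):
            missing.append(rel_path)
    return missing


if __name__ == "__main__":
    # "ICDARdevkit_Root" is a placeholder; point it at your own dataset root.
    for entry in missing_entries("ICDARdevkit_Root"):
        print("missing:", entry)
```

An empty result means every entry from the tree is present; it does not verify the contents of the image lists.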

Installation

Install caffe

It is highly recommended to use docker to build the environment. For more on how to configure docker, see Running with Docker. If you are familiar with docker, please run:

    1. nvidia-docker-compose run --rm --service-ports rrcnn bash
    2. bash ./demo.sh

If you are not familiar with docker, please follow py-R-FCN to install caffe.

Build

    cd src/lib && make
    

Download Model

  1. Download the VGG16 model pre-trained on ImageNet and place it at model/imagenet_models/VGG16.v2.caffemodel.
  2. Download the VGG16 model trained by this project and place it at model/caffemodel/TextBoxes-v2_iter_12w.caffemodel.

Demo

It is recommended to use a UNIX socket to support a GUI for docker. Please open another terminal and type:

    xhost + # may be needed each time you open a new terminal
    # docker-compose.yml mounts the host volume /tmp/.X11-unix to the container volume /tmp/.X11-unix
    # pass the DISPLAY variable to the container so the host X server can display images from inside docker
    docker exec -it -e DISPLAY=$DISPLAY ${CURRENT_CONTAINER_ID} bash
    bash ./demo.sh

Test

Single Test

    bash ./test.sh

Multi-scale Test

    # uncomment these two lines in src/cfgs/faster_rcnn_end2end.yml
    SCALES: [720, 1200]
    MULTI_SCALES_NOC: True
    # modify src/lib/datasets/icdar.py so that it can find the ICDAR2015 test data (see commit @bbac1cf)
    # then run
    bash ./test.sh
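
The test scales refer to the length of the image's shorter side. As a rough sketch of what that implies for the resize factor (an illustration, not this project's code): the function name `short_side_scale` and the optional `max_long` cap on the longer side are assumptions made for the example, and the 1280x720 input is only a sample image size.

```python
def short_side_scale(height, width, target_short, max_long=None):
    """Return the resize factor that brings the shorter side to target_short.

    max_long is an optional, hypothetical cap on the longer side.
    """
    scale = float(target_short) / min(height, width)
    if max_long is not None and scale * max(height, width) > max_long:
        # The longer side would exceed the cap, so scale by the cap instead.
        scale = float(max_long) / max(height, width)
    return scale


# Example: a 1280x720 image evaluated at both test scales.
for target in (720, 1200):
    factor = short_side_scale(720, 1280, target)
    print(target, round(720 * factor), round(1280 * factor))
```

With multi-scale testing, detections from each scale would then be mapped back to the original image coordinates by dividing by the same factor before they are merged.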

Train

Train data

  • Mine: ICDAR2013 + ICDAR2015 training sets with raw data augmentation, 15977 images in total.
  • Paper: ICDAR2015 plus 2000 focused scene text images that the authors collected.

Train commands

  1. In ./src/lib/datasets/icdar.py, modify the image paths so that train.py can find the merge_train.txt image list.
  2. Remove the cache files in src/data/*.pkl, or use the cached roidb data of this project and place it in src/data/.
    # Train for RRCNN4-TextBoxes-v2-OHEM
    bash ./train.sh

Note: if you set USE_FLIPPED=True and USE_FLIPPED_QUAD=True, you will get almost 31200 roidb entries.
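
Flipping adds a mirrored copy of each training image, which is why the roidb grows. For quadrilateral annotations the vertices have to be mirrored as well. The sketch below shows one way to do that; it is an assumption about what a flip such as USE_FLIPPED_QUAD involves, not this project's code. The function name and the vertex order (top-left, top-right, bottom-right, bottom-left) are chosen for illustration.

```python
def flip_quad_horizontally(quad, image_width):
    """Mirror a quadrilateral about the vertical centre line of the image.

    quad is [(x0, y0), (x1, y1), (x2, y2), (x3, y3)] in pixel coordinates,
    assumed to be ordered top-left, top-right, bottom-right, bottom-left.
    """
    mirrored = [(image_width - 1 - x, y) for x, y in quad]
    # Mirroring swaps left and right, so swap the pairs back to restore
    # the top-left, top-right, bottom-right, bottom-left order.
    return [mirrored[1], mirrored[0], mirrored[3], mirrored[2]]


# Example with a hypothetical box in a 100-pixel-wide image.
print(flip_quad_horizontally([(10, 20), (50, 20), (50, 40), (10, 40)], 100))
```

Restoring the vertex order matters when later steps rely on it, for example when deriving the rotated box from the first two points.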

Experiments

Mine VS Paper

| Approaches | Anchor scales | Pooled sizes | Inclined NMS | Test scales (short side) | F-measure (mine vs. paper) |
|---|---|---|---|---|---|
| R2CNN-2 | (4, 8, 16) | (7, 7) | Y | (720) | 71.12% vs. 68.49% |
| R2CNN-3 | (4, 8, 16) | (7, 7) | Y | (720) | 73.10% vs. 74.29% |
| R2CNN-4 | (4, 8, 16, 32) | (7, 7) | Y | (720) | 74.14% vs. 74.36% |
| R2CNN-4 | (4, 8, 16, 32) | (7, 7) | Y | (720, 1200) | 79.05% vs. 81.80% |
| R2CNN-5 | (4, 8, 16, 32) | (7, 7), (11, 3), (3, 11) | Y | (720) | 74.34% vs. 75.34% |
| R2CNN-5 | (4, 8, 16, 32) | (7, 7), (11, 3), (3, 11) | Y | (720, 1200) | 78.70% vs. 82.54% |
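
The F-measure reported here is the harmonic mean of precision and recall. As a small sketch of the arithmetic only (it does not reproduce the matching rules of eval_icdar15.py, and the function names and counts are hypothetical):

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def detection_scores(num_matched, num_detections, num_ground_truth):
    """Return (precision, recall, F-measure) from match counts."""
    precision = num_matched / float(num_detections) if num_detections else 0.0
    recall = num_matched / float(num_ground_truth) if num_ground_truth else 0.0
    return precision, recall, f_measure(precision, recall)


# Hypothetical counts: 3 correct detections out of 4, with 6 ground-truth boxes.
print(detection_scores(3, 4, 6))
```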

Appendixes

| Approaches | Anchor scales | Aspect ratios | Pooled sizes | Inclined NMS | Test scales (short side) | F-measure |
|---|---|---|---|---|---|---|
| R2CNN-4 | (4, 8, 16, 32) | (0.5, 1, 2) | (7, 7) | Y | (720) | 74.36% |
| R2CNN-4 | (4, 8, 16, 32) | (0.5, 1, 2) | (7, 7) | Y | (720, 1200) | 81.80% |
| R2CNN-4-TextBoxes-OHEM | (4, 8, 16, 32) | (0.5, 1, 2, 3, 5, 7, 10) | (7, 7) | Y | (720) | 76.53% |
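
Inclined NMS suppresses overlapping detections using the IoU of the rotated boxes rather than of their axis-aligned bounding boxes. This project imports its implementation from EAST; the pure-Python sketch below is only an illustration of the idea under stated assumptions. It treats each box as a convex quadrilateral given as four (x, y) vertices, computes the intersection with Sutherland-Hodgman clipping, and uses function names invented for this example.

```python
def _signed_area(poly):
    """Shoelace formula; the sign depends on the vertex order."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def _oriented(poly):
    """Return the vertices ordered so that the signed area is positive."""
    return list(poly) if _signed_area(poly) >= 0 else list(reversed(poly))


def _clip(subject, clipper):
    """Sutherland-Hodgman clipping of subject against a convex clipper."""
    output = _oriented(subject)
    clipper = _oriented(clipper)
    for i in range(len(clipper)):
        if not output:
            break
        (ax, ay), (bx, by) = clipper[i], clipper[(i + 1) % len(clipper)]
        points, output = output, []
        # side >= 0: on the inner side of this clip edge; side < 0: outside.
        sides = [(bx - ax) * (y - ay) - (by - ay) * (x - ax) for x, y in points]
        for j in range(len(points)):
            (px, py), (cx, cy) = points[j - 1], points[j]
            sp, sc = sides[j - 1], sides[j]
            if (sp < 0) != (sc < 0):
                # The segment crosses the clip edge: add the crossing point.
                t = sp / (sp - sc)
                output.append((px + t * (cx - px), py + t * (cy - py)))
            if sc >= 0:
                output.append((cx, cy))
    return output


def polygon_iou(a, b):
    """IoU of two convex polygons given as lists of (x, y) vertices."""
    inter_poly = _clip(a, b)
    inter = abs(_signed_area(inter_poly)) if len(inter_poly) >= 3 else 0.0
    union = abs(_signed_area(a)) + abs(_signed_area(b)) - inter
    return inter / union if union > 0 else 0.0


def inclined_nms(quads, scores, iou_threshold):
    """Return indices of the boxes kept, highest score first."""
    order = sorted(range(len(quads)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(polygon_iou(quads[i], quads[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

A call such as `inclined_nms(quads, scores, 0.3)` returns the indices of the surviving boxes; the threshold value here is arbitrary. The quadratic loop is fine for a handful of boxes, while a production implementation would typically be written in C++ or CUDA.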

Furthermore

You can also try other networks such as ResNet-50, ResNet-101 and so on.
