This project forked from statsu1990/kuzushiji-recognition

License: MIT License

kuzushiji-recognition

The 15th place solution in the Kaggle Kuzushiji Recognition competition
https://www.kaggle.com/c/kuzushiji-recognition

The outline of my solution is as follows.

  • Detect with Centernet (HourglassNet backbone)
  • Classify character classes with Resnet base model

The final private leaderboard score was 0.900.

Image preprocessing

  • Conversion to grayscale
  • Gaussian filter
  • Gamma correction
  • Ben's preprocessing
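
The four steps above can be sketched in plain NumPy. This is only an illustration under assumptions: the order of the steps, every parameter value (`smooth_sigma`, `gamma`, `ben_sigma`) and all function names are hypothetical and not taken from the repository. "Ben's preprocessing" is written here in its commonly used form, which subtracts a heavily blurred copy of the image to remove the local average.

```python
import numpy as np

def to_gray(rgb):
    """Convert an HxWx3 RGB array to grayscale with the usual luminance weights."""
    return rgb[..., :3].astype(np.float64) @ np.array([0.299, 0.587, 0.114])

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflected borders."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="reflect")
    # Filter along rows first, then along columns.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def gamma_correct(img, gamma):
    """Apply out = 255 * (in / 255) ** gamma."""
    return 255.0 * (np.clip(img, 0, 255) / 255.0) ** gamma

def bens_preprocess(img, sigma):
    """Subtract the local average: 4 * img - 4 * blur(img) + 128."""
    return np.clip(4.0 * img - 4.0 * gaussian_blur(img, sigma) + 128.0, 0, 255)

def preprocess(rgb, smooth_sigma=1.0, gamma=0.8, ben_sigma=5.0):
    """Run the four steps in the listed order (parameter values are assumptions)."""
    gray = to_gray(rgb)
    gray = gaussian_blur(gray, smooth_sigma)
    gray = gamma_correct(gray, gamma)
    return np.rint(bens_preprocess(gray, ben_sigma)).astype(np.uint8)
```

On a perfectly flat image the blurred copy equals the image itself, so the last step maps every pixel to the mid-gray value 128. Only local contrast, such as ink strokes, moves the output away from that value.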

Detection

Inference

A two-stage Centernet detects the character bounding boxes by the following procedure.

  • Step 1: Resize the image to 512x512 and estimate bounding boxes 1 with Centernet1.
  • Step 2: Use bounding boxes 1 to crop away everything outside the outermost bounding box in the image.
  • Step 3: Resize the cropped image to 512x512 and estimate bounding boxes 2 with Centernet2.
  • Step 4: Ensemble bounding boxes 1 and 2 to create the final bounding boxes.
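
The coordinate bookkeeping behind Steps 2 and 3 can be sketched as below. The box layout `(x0, y0, x1, y1)` and both function names are assumptions made for illustration. How the two sets of boxes are ensembled in Step 4 is not described here, so that step is left out.

```python
def outermost_box(boxes):
    """Smallest rectangle enclosing all boxes, as (x0, y0, x1, y1)."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )

def crop_to_original(boxes, crop, net_size=512):
    """Map boxes predicted on the resized crop back to original pixel coordinates."""
    cx0, cy0, cx1, cy1 = crop
    sx = (cx1 - cx0) / net_size  # original pixels per network pixel, x axis
    sy = (cy1 - cy0) / net_size  # original pixels per network pixel, y axis
    return [
        (cx0 + x0 * sx, cy0 + y0 * sy, cx0 + x1 * sx, cy0 + y1 * sy)
        for x0, y0, x1, y1 in boxes
    ]

# Stage 1 boxes, already expressed in original image coordinates.
stage1 = [(100, 200, 150, 260), (300, 400, 380, 500)]
crop = outermost_box(stage1)

# A stage 2 box predicted on the 512x512 resized crop.
stage2 = crop_to_original([(256, 256, 512, 512)], crop)
```

Both sets of boxes must be in the same coordinate system before they can be combined, which is why the second-stage predictions are mapped back through the crop.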

Model Architecture

  • Centernet1 is an ensemble of two Centernets (each based on a single-stack HourglassNet).
  • Centernet2 is an ensemble of two Centernets (each based on a single-stack HourglassNet).

Training

Centernet1 was trained as follows.

  • Training data: 80% of all data. (Two models were created by changing the random data split.)
  • Data augmentation: horizontal shift, brightness adjustment

Data augmentation was essential to prevent overfitting.
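
Drawing two different 80% training subsets from the same data can be sketched with the standard library. The function name, the seeds and the use of integer sample IDs are assumptions for illustration, not the repository's code.

```python
import random

def split_80_20(sample_ids, seed):
    """Shuffle the IDs with a fixed seed and return (80% train, 20% held out)."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * 0.8)
    return ids[:cut], ids[cut:]

# Two models see two different random divisions of the same data.
train_a, held_a = split_80_20(range(100), seed=0)
train_b, held_b = split_80_20(range(100), seed=1)
```

Because each model holds out a different 20%, the two models make partly independent errors, which is what an ensemble benefits from.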

Centernet2 was trained as follows.

  • Training data: 80% of all data. (Two models were created by changing the random data split.)
  • Data augmentation: Random erasing, horizontal shift, brightness adjustment

Horizontal shifting had little effect here because the input image is already cropped to the outermost bounding box. Random erasing was therefore indispensable.
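
Random erasing overwrites a randomly placed rectangle with noise, so the network cannot rely on any single region of the image. A minimal NumPy sketch follows. The area and aspect ranges are assumed values, and the function name is hypothetical.

```python
import numpy as np

def random_erase(img, rng, area_range=(0.02, 0.2), aspect_range=(0.3, 3.3)):
    """Return a copy of img with one random rectangle filled with random values."""
    h, w = img.shape[:2]
    out = img.copy()
    # Sample the rectangle's area (as a fraction of the image) and aspect ratio.
    area = rng.uniform(*area_range) * h * w
    aspect = np.exp(rng.uniform(np.log(aspect_range[0]), np.log(aspect_range[1])))
    eh = int(min(h, max(1, round(np.sqrt(area * aspect)))))
    ew = int(min(w, max(1, round(np.sqrt(area / aspect)))))
    # Sample the top-left corner so the rectangle stays inside the image.
    top = int(rng.integers(0, h - eh + 1))
    left = int(rng.integers(0, w - ew + 1))
    noise = rng.integers(0, 256, size=(eh, ew) + img.shape[2:])
    out[top:top + eh, left:left + ew] = noise
    return out

rng = np.random.default_rng(0)
image = np.zeros((64, 64), dtype=np.uint8)
erased = random_erase(image, rng)
```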

Classification

Inference

Character labels are classified with an ensemble of three Resnet base models by the following procedure.

  • Step 1: Crop each character image from the original image using the estimated bounding box and resize it to 64x64.
  • Step 2: Classify the character labels with the 3 Resnet base models using test time augmentation (9 horizontal shifts).
  • Step 3: Ensemble the classification results of the three models to obtain the final classification.
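
Steps 2 and 3 can be sketched as below. The shift offsets (-4 to 4 pixels), the zero padding and the plain averaging of probabilities are all assumptions, since the exact scheme is not stated here. The models are stand-in callables that return a probability vector.

```python
import numpy as np

def shift_horizontal(img, dx):
    """Shift an HxW image by dx pixels along x, padding the exposed edge with zeros."""
    out = np.zeros_like(img)
    if dx > 0:
        out[:, dx:] = img[:, :-dx]
    elif dx < 0:
        out[:, :dx] = img[:, -dx:]
    else:
        out[:] = img
    return out

def predict_with_tta(models, img, shifts=range(-4, 5)):
    """Average class probabilities over all models and all horizontal shifts."""
    probs = [model(shift_horizontal(img, dx)) for model in models for dx in shifts]
    return np.mean(probs, axis=0)

# Stand-in "models": each maps an image to a vector of class probabilities.
models = [
    lambda x: np.array([0.2, 0.8]),
    lambda x: np.array([0.6, 0.4]),
]
crop = np.arange(12).reshape(3, 4)
probs = predict_with_tta(models, crop)
label = int(np.argmax(probs))
```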

Model Architecture

  • Resnet base1: log(bounding box aspect ratio) is concatenated to the features at the FC layer.
  • Resnet base2: Same as Resnet base1, but with different training data.
  • Resnet base3: The architecture is the same as Resnet base1. Pseudo-labeled data, produced by the detection model above and the ensemble of Resnet base1 and base2, was added to the training data.
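
Appending the log aspect ratio to the features that feed the FC layer can be illustrated with NumPy arrays standing in for network activations. The definition of aspect ratio as width over height, the box layout `(x0, y0, x1, y1)` and the function name are assumptions.

```python
import numpy as np

def concat_log_aspect(features, boxes):
    """Append log(width / height) of each box as one extra feature column."""
    boxes = np.asarray(boxes, dtype=np.float64)
    width = boxes[:, 2] - boxes[:, 0]
    height = boxes[:, 3] - boxes[:, 1]
    log_aspect = np.log(width / height)
    return np.concatenate([features, log_aspect[:, None]], axis=1)

# Two characters, each with an 8-dimensional (dummy) feature vector.
features = np.zeros((2, 8))
boxes = [(0, 0, 10, 10), (0, 0, 20, 10)]
fc_input = concat_log_aspect(features, boxes)
```

One plausible reason to use the logarithm is symmetry: a box twice as wide as it is tall and a box twice as tall as it is wide get values of equal magnitude and opposite sign. The extra feature also restores shape information that is lost when every crop is resized to a square.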

Training

The models are trained identically, apart from the different training data and the use of pseudo-labeling described above.

  • Training data: 80% of all data.
  • Data augmentation: horizontal shift, rotation, zoom, Random erasing

Hardware

All models were trained using one GTX 1080 on my home server.
