kuzushiji-recognition

The 15th place solution in the Kaggle Kuzushiji Recognition competition
https://www.kaggle.com/c/kuzushiji-recognition

The outline of my solution is as follows.

Detect with Centernet (HourglassNet backbone)
Classify character classes with Resnet base model The final private leaderboard score was 0.900.

Image preprocessing

to gray scale
gaussian filter
gamma correction
ben's preprocessing

Detection

Inference

Use the two-stage Centernet to detect the bounding box of the character by the following procedure.

Step 1: Resize the image to 512x512 and estimate bounding box 1 with the Centernet1.
Step 2: Use the bounding box 1 to remove the outside of the outermost bounding box in the image.
Step 3: Resize the image to 512x512 and estimate bounding box 2 with the Centernet2.
Step 4: Ensemble bounding boxes 1 and 2 to create the final bounding box.

Model Architecture

Centernet1 is an ensemble of two Centernets (based on one stack hourglassnet).
Centernet2 is an ensemble of two Centernets (based on one stack hourglassnet).

Training

About centernet1, it is as follows.

Training data: Use 80% of all data. (Create two models by changing the data division with random numbers.)
Data augmentation: horizontal movement, brightness adjustment Data expansion was essential to prevent overlearning.

About centernet2 is as follows.

Training data: Use 80% of all data. (Create two models by changing the data division with random numbers)
Data augmentation: Random erasing, horizontal movement, brightness adjustment The effect of horizontal data augmentation was weak because the input image was removed outside of the bounding box. Therefore, Random erasing was indispensable.

Classification

Inference

Use the following procedure to classify character labels using three ensemble models of Resnet base.

Step 1: Crop text image from original image using estimated bounding box and resize to 64x64.
Step 2: Classify text labels with 3 Resnet base models using test time augmentation (9 types of horizontal movement).
Step 3: Ensemble the classification results of the three models and estimate the final classification results.

Model Architecture

Resnet base1: Log(bounding box aspect ratio) is concatenated at FC layer.
Resnet base2: Changed Training data from Resnet base1.
Resnet base3: The architecture is the same as Resnet base1. A pseudo-labeled input from the above-mentioned Detection model, Resnet base1 and 2 ensemble models was added to training data.

Training

Each model is the same except that learning data is changed as described above and pseudo-labeling is used.

Learning data: Use 80% of all data.
Data expansion: horizontal movement, rotation, zoom, Random erasing

Hardware

All models were trained using one GTX 1080 on my home server.

hell-to-heaven / kuzushiji-recognition-3 Goto Github PK

kuzushiji-recognition-3's Introduction

kuzushiji-recognition

Image preprocessing

Detection

Inference

Model Architecture

Training

Classification

Inference

Model Architecture

Training

Hardware

kuzushiji-recognition-3's People

Contributors

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs