
This project forked from mimicheng/card-segmentation


The repository contains a U-Net model for semantic segmentation of documents, built with PyTorch Lightning.


card_edge_segmentation

The repository contains a model for semantic segmentation of documents.

Dataset

The U-Net segmentation model is trained on MIDV-500: A Dataset for Identity Documents Analysis and Recognition on Mobile Devices in Video Stream. The training data covers 50 subtypes of IDs: 17 types of ID cards, 14 types of passports, 13 types of driving licenses, and 6 other identity documents from various countries. Each subtype contains 10 unique photos, each captured under 30 variations of angle, blurriness, closeness, and focus. A total of 15,000 images are used for our experiment.
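The counts above can be checked with a few lines of arithmetic. This is only a sanity check on the numbers quoted in the paragraph; the variable names are illustrative and do not come from the repository.

```python
# Document-type counts as stated in the Dataset paragraph.
subtypes = {"id_cards": 17, "passports": 14, "driving_licenses": 13, "other": 6}
photos_per_subtype = 10      # unique photos per subtype
variations_per_photo = 30    # angle / blur / closeness / focus variations

num_subtypes = sum(subtypes.values())
total_images = num_subtypes * photos_per_subtype * variations_per_photo
print(num_subtypes, total_images)  # 50 15000
```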

Training

python midv500models/train.py -c midv500models/configs/2021-05-14.yaml \
    -i <path to train>

Inference

python midv500models/inference.py -c midv500models/configs/2021-05-14.yaml \
    -i <path to images> \
    -o <path to save predictions> \
    -w <path to weights>

Example of training images

loss.png

Experiment Setup


Model: U-Net with a ResNet34 backbone; the encoder weights were pretrained on ImageNet.

Optimizer: AdamW

Initial learning rate: 0.0001

Learning rate scheduler: PolyLR, with a maximum of 40 iterations
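A polynomial learning-rate schedule is usually written as `base_lr * (1 - step / max_steps) ** power`. The sketch below uses the initial learning rate of 0.0001 and the maximum of 40 from this setup. The exponent `power=0.9` is an assumption (a common default, not stated in this README), and `poly_lr` is a hypothetical helper rather than the scheduler class the repository uses. Whether a "step" here is an epoch or an optimizer iteration is also not stated.

```python
def poly_lr(base_lr: float, step: int, max_steps: int, power: float = 0.9) -> float:
    """Polynomial decay: base_lr * (1 - step / max_steps) ** power.

    power=0.9 is an assumed value; the README does not state it.
    """
    step = min(step, max_steps)  # hold the rate at zero past the last step
    return base_lr * (1.0 - step / max_steps) ** power


# Learning rate at every step from 0 to 40 inclusive.
schedule = [poly_lr(1e-4, s, 40) for s in range(41)]
print(schedule[0], schedule[20], schedule[40])
```

Under these assumptions the rate starts at the initial value and decays smoothly to zero at step 40.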

Gradient clipping is applied
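The README does not say which kind of clipping is used or what the threshold is. As an illustration only, the sketch below shows clipping by global L2 norm on a flat list of gradient values; the helper name and the threshold of 1.0 are assumptions. In PyTorch this role is commonly filled by `torch.nn.utils.clip_grad_norm_`.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale gradients so their global L2 norm does not exceed max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


# Toy gradient with norm 5.0, clipped to an assumed threshold of 1.0.
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
norm_after = math.sqrt(sum(g * g for g in clipped))
print(norm_before, norm_after)
```

The direction of the gradient is preserved; only its magnitude is reduced.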

The model was trained for a total of 23 epochs; it can be trained further, since it has not yet reached the overfitting phase.

Training and validation batch_size: 32

We save the weights that achieve the best validation IoU.
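A minimal sketch of this selection rule, assuming binary masks flattened to lists of 0/1. The masks below are toy values made up for illustration, not results from this project, and `binary_iou` and `select_best_epoch` are hypothetical helpers; in the actual training code this is more likely handled by a checkpoint callback.

```python
def binary_iou(pred, target):
    """Intersection over union of two flat binary masks (sequences of 0/1)."""
    intersection = sum(p & t for p, t in zip(pred, target))
    union = sum(p | t for p, t in zip(pred, target))
    return intersection / union if union else 1.0


def select_best_epoch(val_ious):
    """Return (epoch, iou) of the highest validation IoU seen so far."""
    best_epoch, best_iou = -1, float("-inf")
    for epoch, iou in enumerate(val_ious):
        if iou > best_iou:
            # This is the point where the weights would be written to disk.
            best_epoch, best_iou = epoch, iou
    return best_epoch, best_iou


# Toy ground truth and per-epoch predictions (illustrative only).
target = [1, 1, 1, 0, 0, 0]
preds_per_epoch = [
    [1, 0, 0, 0, 0, 0],  # IoU 1/3
    [1, 1, 1, 1, 0, 0],  # IoU 3/4
    [1, 1, 0, 0, 0, 0],  # IoU 2/3
]
val_ious = [binary_iou(p, target) for p in preds_per_epoch]
best_epoch, best_iou = select_best_epoch(val_ious)
print(best_epoch, best_iou)  # 1 0.75
```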

Losses: Jaccard loss with binary mode and focal loss.
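To make the two loss terms concrete, here is a plain-Python sketch that operates on per-pixel probabilities. Several details are assumptions, since the README does not state them: the focal `gamma=2.0`, the smoothing `eps`, and the unweighted sum used to combine the two terms. Library implementations typically take logits rather than probabilities.

```python
import math


def soft_jaccard_loss(probs, targets, eps=1e-7):
    """1 - soft IoU between predicted probabilities and binary targets."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    union = sum(probs) + sum(targets) - intersection
    return 1.0 - (intersection + eps) / (union + eps)


def binary_focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Mean of -(1 - p_t)^gamma * log(p_t); gamma=2.0 is an assumed value."""
    total = 0.0
    for p, t in zip(probs, targets):
        p_t = p if t == 1 else 1.0 - p       # probability of the true class
        p_t = min(max(p_t, eps), 1.0)        # avoid log(0)
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)


def combined_loss(probs, targets):
    """Unweighted sum of both terms (the weighting is an assumption)."""
    return soft_jaccard_loss(probs, targets) + binary_focal_loss(probs, targets)


targets = [1, 0, 1]
confident = combined_loss([1.0, 0.0, 1.0], targets)   # perfect prediction
uncertain = combined_loss([0.6, 0.4, 0.6], targets)   # correct but unsure
print(confident, uncertain)
```

The Jaccard term rewards overlap on the whole mask, while the focal term down-weights pixels that are already classified confidently.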

The model was trained on a p3.2xlarge machine with 1 GPU.

Augmentation at training time:

  • Normalized with mean (0.485, 0.456, 0.406) and std (0.229, 0.224, 0.225)
  • HorizontalFlip
  • RandomBrightnessContrast
  • RandomGamma
  • HueSaturationValue
  • Blur
  • JpegCompression
  • RandomRotate90

Augmentation at validation time:

  • LongestMaxSize
  • PadIfNeeded
  • Normalized with mean (0.485, 0.456, 0.406) and std (0.229, 0.224, 0.225)

At training time, each augmentation is applied with a probability of 0.5; at validation time, each transformation is applied with a probability of 1.
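The two geometric validation steps, LongestMaxSize followed by PadIfNeeded, can be illustrated by computing the resulting sizes. The target size of 512 and the 1920×1080 input are assumptions chosen for the example (the actual values live in the YAML config, which is not reproduced here). The helpers below only mimic the size arithmetic with centered padding and are not the library implementation.

```python
def longest_max_size(height, width, max_size):
    """Scale so the longer side equals max_size, keeping the aspect ratio."""
    scale = max_size / max(height, width)
    return round(height * scale), round(width * scale)


def pad_if_needed(height, width, min_height, min_width):
    """Return (top, bottom, left, right) padding needed to reach the minimum size."""
    pad_h = max(min_height - height, 0)
    pad_w = max(min_width - width, 0)
    top, left = pad_h // 2, pad_w // 2
    return top, pad_h - top, left, pad_w - left


# Assumed example: a portrait 1920x1080 frame resized to fit 512, then padded to 512x512.
resized = longest_max_size(1920, 1080, max_size=512)
pads = pad_if_needed(*resized, min_height=512, min_width=512)
print(resized, pads)  # (512, 288) (0, 0, 112, 112)
```

Resizing before padding keeps the document's aspect ratio intact while still giving the network a fixed-size input.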

Augmentation at inference time:

  • Normalized with mean (0.485, 0.456, 0.406) and std (0.229, 0.224, 0.225)
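The normalization step can be sketched per pixel as follows. The mean and std are the values listed above; scaling 8-bit inputs by 255 before subtracting the mean is an assumption about the input format, and `normalize_pixel` is a hypothetical helper.

```python
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, max_pixel_value=255.0):
    """Map an RGB pixel to (value / max_pixel_value - mean) / std per channel."""
    return tuple(
        (c / max_pixel_value - m) / s for c, m, s in zip(rgb, MEAN, STD)
    )


black = normalize_pixel((0, 0, 0))
white = normalize_pixel((255, 255, 255))
print(black, white)
```

The same statistics must be used at training, validation, and inference time, which is why this step appears in all three lists.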

Training Curve

loss.png

Inference results

The model generalizes well to test images that are completely different from the training images.

img.png

Reference

https://github.com/ternaus/midv-500-models
