captcha-player

Play with captchas: labeling, training, and evaluating.

Prerequisites

Install Tesseract together with its training tools.

This guide assumes that tesstrain is located in $TESSTRAIN_HOME.

Installation

pip install -r requirements.txt -c constraints.txt

Play

General usage:

Usage: play_captcha.py [OPTIONS] COMMAND [ARGS]...

  Play with CAPTCHA.

Options:
  --class TEXT       which captcha class to play with  [default: base]
  --lang TEXT        use a non-default tesseract language, e.g. eng, eng_best,
                     eng_fast
  --label-root TEXT  labeling data root folder path  [default: data/labeling]
  --help             Show this message and exit.

Commands:
  evaluate  Evaluate trained model through all labeling images.
  label     Crawl and label training data.
  test      Try recognize given image.
  truth     Build ground truth data for tesseract training.
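
The usage text above lists the global options before COMMAND, so they go ahead of the subcommand on the command line. As a minimal sketch of driving the tool from another script, the helper below only assembles the argument list. The name build_play_command and the sample file name are hypothetical and not part of the project.

```python
import sys


def build_play_command(command, *args, captcha_class=None, lang=None, label_root=None):
    """Assemble an argv list for play_captcha.py (hypothetical helper).

    Global options are placed before the subcommand, matching the
    "play_captcha.py [OPTIONS] COMMAND [ARGS]..." usage line.
    """
    argv = [sys.executable, "play_captcha.py"]
    if captcha_class is not None:
        argv += ["--class", captcha_class]
    if lang is not None:
        argv += ["--lang", lang]
    if label_root is not None:
        argv += ["--label-root", label_root]
    argv.append(command)
    argv += list(args)
    return argv
```

The resulting list can be passed to subprocess.run(argv, check=True), for example build_play_command("test", "sample.png", lang="eng") to recognize one image with the eng language.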

Test / Recognize Single Image

Usage: play_captcha.py test [OPTIONS] IMAGE

  Try recognize given image.

Options:
  --preview / --no-preview  whether preview the image  [default: no-preview]
  --help                    Show this message and exit.

Labeling

Usage: play_captcha.py label [OPTIONS]

  Crawl and label training data.

Options:
  -n, --total INTEGER           number of new images to fetch and label (0 for
                                unlimited)  [default: 10]
  --overwrite / --no-overwrite  whether overwrite existing image for the same
                                captcha  [default: no-overwrite]
  --preview / --no-preview      whether show image automatically  [default:
                                preview]
  --help                        Show this message and exit.

Prepare Training Data

Usage: play_captcha.py truth [OPTIONS]

  Build ground truth data for tesseract training.

  It generates cleaned images and labeled transcripts.

  To list all possible characters appear in captcha, run: $ cat
  $TRAIN_ROOT/*.gt.txt | grep -o . | sort | uniq

Options:
  --train-root TEXT  training data root folder path  [default: data/training]
  --help             Show this message and exit.

Once the ground truth data is generated, check the command output for instructions on how to train the model.
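
The character-listing pipeline from the help text (cat, grep -o ., sort, uniq) can also be sketched in Python, which may be convenient on systems without those shell tools. The function name list_charset is hypothetical, and the sketch assumes the transcripts are UTF-8 encoded.

```python
from pathlib import Path


def list_charset(train_root):
    """Return the sorted unique characters found in all *.gt.txt files."""
    chars = set()
    for gt_file in sorted(Path(train_root).glob("*.gt.txt")):
        text = gt_file.read_text(encoding="utf-8")  # assumption: UTF-8 transcripts
        # Skip line breaks; this sketch also drops carriage returns.
        chars.update(ch for ch in text if ch not in "\r\n")
    return sorted(chars)
```

Calling "".join(list_charset("data/training")) gives the character set as one string, using the default --train-root shown above.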

Evaluating (with all labeled images)

Usage: play_captcha.py evaluate [OPTIONS]

  Evaluate trained model through all labeling images.

Options:
  --help  Show this message and exit.
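
This README does not say which metric the evaluate command reports. As an illustration only, one common way to score captcha recognition is exact-match accuracy over (label, prediction) pairs. The function below is a hypothetical sketch of that metric and is not taken from the project.

```python
def exact_match_accuracy(pairs):
    """Fraction of pairs whose prediction equals the label exactly.

    pairs: iterable of (expected, predicted) strings. Surrounding
    whitespace on the prediction is ignored, since OCR output often
    ends with a newline (an assumption of this sketch).
    """
    pairs = list(pairs)
    if not pairs:
        return 0.0
    correct = sum(1 for expected, predicted in pairs if expected == predicted.strip())
    return correct / len(pairs)
```

A whole captcha counts as correct only when every character matches, so a single wrong character makes that image a miss.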

Contributors

calfzhou
