mukeshmk / image-audio-captcha


CNN Based Audio and Image Captcha Breaker Project

TODO - to update the readme.md file!

Requirements

Required dependencies: python-captcha, opencv, python-tensorflow (CPU or GPU)

Generating captchas

python generate-audio-captcha.py --length 8 --symbols symbols.txt --count 3200 --output-dir training-images

This generates 3200 audio captchas of 8 characters each, drawing the characters from the set of symbols in symbols.txt and synthesizing the audio with the gTTS service. The captchas are stored in the folder training-images, which is created if it doesn't exist. If the --scramble option is passed, the captcha file names are scrambled.

Without the --scramble option, each file is named after its captcha text.
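The naming rule can be illustrated with a minimal sketch. The function names below are hypothetical, and hashing with SHA-1 is only an assumption about how a scrambled name might be produced; the repository's script may scramble names differently.

```python
import hashlib
import random


def random_captcha_text(symbols, length, rng=random):
    """Pick `length` symbols uniformly at random, with replacement."""
    return "".join(rng.choice(symbols) for _ in range(length))


def captcha_file_stem(text, scramble=False):
    """Return the file name (without extension) for a captcha.

    Assumption for this sketch: a scrambled name is the SHA-1 hex digest
    of the captcha text. The real script may use another scheme.
    """
    if scramble:
        return hashlib.sha1(text.encode("utf-8")).hexdigest()
    return text
```

For example, `captcha_file_stem("AB12CD34")` returns the text unchanged, so the label can later be read straight from the file name, while `captcha_file_stem("AB12CD34", scramble=True)` returns a 40-character digest from which the text cannot be read.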

To train and validate a neural network, we need two sets of data: a large training set and a smaller validation set. The network is trained on the training set and tested on the validation set, so it is very important that no audio clip appears in both sets.
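One way to check for overlap, assuming unscrambled names (so that an identical file name means an identical captcha text), is to compare the file names in the two folders. This is a small sketch; the function name is hypothetical and not part of the repository.

```python
import os


def shared_files(train_dir, validation_dir):
    """Return the sorted file names that occur in both directories."""
    train_names = set(os.listdir(train_dir))
    validation_names = set(os.listdir(validation_dir))
    return sorted(train_names & validation_names)
```

An empty list from `shared_files("training_data", "validation_data")` means the two sets do not share any file name; any names returned should be removed from one of the sets before training.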

To generate the training data, the "ground truth" classification for each training example must be known. For training, the captcha file names therefore cannot be scrambled; otherwise the training process has no way to check whether the CNN's answer for a given captcha is right or wrong. Do not use the --scramble option when generating the training or validation datasets.
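As an illustration of why the unscrambled name matters, the ground truth can be recovered from the file name and turned into one class index per character. The helper names are hypothetical, and the file extension in the example is an assumption; the actual training script may encode labels differently.

```python
import os


def label_from_filename(path):
    """Return the captcha text, i.e. the file name without directory or extension."""
    stem, _extension = os.path.splitext(os.path.basename(path))
    return stem


def encode_label(text, symbols):
    """Map each character to its index in `symbols` (one class per position).

    Raises ValueError if a character is not in the symbol set.
    """
    return [symbols.index(character) for character in text]
```

With a symbol set of `"ABC"`, `encode_label(label_from_filename("training_data/CAB.mp3"), "ABC")` gives `[2, 0, 1]`. A scrambled name would yield characters unrelated to the captcha content, so no valid label could be built from it.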

Training the neural network

python train.py --width 128 --height 64 --length 8 --symbols symbols.txt --batch-size 4 --epochs 2 --output-model char8e6bs4 --train-dataset training_data --validate-dataset validation_data

Train the neural network for 2 epochs on the data specified. One epoch is one pass through the full dataset.
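The relationship between dataset size, batch size, and epochs can be worked out as below. The figure of 3200 examples is an assumption borrowed from the generation command above, and the sketch assumes that a final, partially filled batch still counts as a step.

```python
import math


def steps_per_epoch(num_examples, batch_size):
    """Number of batches needed for one full pass over the dataset."""
    return math.ceil(num_examples / batch_size)


def total_steps(num_examples, batch_size, epochs):
    """Number of batches processed over the whole training run."""
    return steps_per_epoch(num_examples, batch_size) * epochs
```

Under these assumptions, 3200 examples with --batch-size 4 give 800 steps per epoch, and --epochs 2 gives 1600 steps in total.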

For captchas of 4 symbols, a suggested starting point for initial training is a training dataset of 20000 images and a validation dataset of 4000 images.

Running the classifier

python classify.py --model-name char8e6bs4 --captcha-dir test_data/ --output output.txt --symbols symbols.txt

With --model-name test, the classifier script looks for a model called test.json with weights test.h5 in the current directory and loads it.
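The file lookup described above can be sketched as follows. The function is hypothetical and only resolves and checks the two paths. With Keras, such a pair is typically loaded with `model_from_json` followed by `load_weights`, but whether classify.py does exactly that is an assumption.

```python
import os


def model_files(model_name, directory="."):
    """Return (architecture_path, weights_path) for a given model name.

    Raises FileNotFoundError if either file is missing.
    """
    json_path = os.path.join(directory, model_name + ".json")
    weights_path = os.path.join(directory, model_name + ".h5")
    missing = [p for p in (json_path, weights_path) if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError("missing model file(s): " + ", ".join(missing))
    return json_path, weights_path
```

Checking for both files up front gives a clear error message when only one half of the model was copied next to the script.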

The classifier runs all the images in --captcha-dir through the model and saves each file name, together with the model's guess at the captcha contained in the image, to the --output file.
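Turning a model output into a guess can be sketched as below, assuming the model produces one list of scores per character position, with one score per symbol. Both function names and the "file name, guess" line format are assumptions for illustration; the real output layout may differ.

```python
def decode_prediction(per_position_scores, symbols):
    """Pick the highest-scoring symbol at each character position."""
    characters = []
    for scores in per_position_scores:
        best_index = max(range(len(scores)), key=scores.__getitem__)
        characters.append(symbols[best_index])
    return "".join(characters)


def format_output_line(file_name, guess):
    """Format one line of the output file (assumed layout: name, guess)."""
    return f"{file_name}, {guess}"
```

For a two-character captcha over the symbols `"ABC"`, the scores `[[0.1, 0.9, 0.0], [0.8, 0.1, 0.1]]` decode to `"BA"`.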

Credits:

Base code taken and modified from: https://gitlab.com/andrewwja/captcha-demo

Contributors: mukeshmk
