aaronshan / 12306-captcha

Deep-learning-based recognition of 12306 captchas

License: Apache License 2.0

Python 84.33% Shell 6.36% JavaScript 4.81% CSS 0.38% HTML 4.11%
12306 captcha recognizer deep-learning cnn-model

12306-captcha's Introduction

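12306 Captcha Recognition

1. Training

1.1 Preparation

  • Download and build Caffe. See the official documentation for details, which are not repeated here.
  • Set the Caffe root directory and the project root directory in src/config.py.
  • Install easydict, skimage, and the other dependencies with pip (skimage is distributed on PyPI as scikit-image).

1.2 Data

  • Run src/tools/download_image.py to download 12306 captchas into the data/download/all directory.
  • When the download finishes, run src/tools/cut_image.py to crop each captcha into its image part and its text part, which are saved to data/download/image and data/download/words respectively.
  • Edit the argument passed to cut in the main method of src/image/scripts/words.py (the argument is the number in the name of a words_* subdirectory of data/download/words). This script processes all the files under data/download/words, splitting prompts that contain several words and resizing each piece to a fixed size.
  • Then sort the results into classes by hand and place them under data/imagedata/words. Each class can be split into two parts, placed in the corresponding train and test directories. An example directory layout (蜡烛 means "candle" and 沙漠 means "desert"):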
    -image
    --test
    ---蜡烛
    ----1-1.jpg
    ---沙漠
    ----2-1.jpg
    --train
    ---蜡烛
    ----1-2.jpg
    ---沙漠
    ----2-2.jpg
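
Once the images have been sorted into class folders by hand, the split into train and test directories could be scripted. The following is a minimal sketch, not part of the project: the function name `split_classes`, the 80/20 ratio, the fixed seed, and the assumption of one source subfolder per class containing .jpg files are all illustrative choices.

```python
import random
import shutil
from pathlib import Path


def split_classes(src_dir, dst_dir, test_ratio=0.2, seed=0):
    """Copy src_dir/<class>/*.jpg into dst_dir/train/<class> and dst_dir/test/<class>.

    Hypothetical helper: the layout and ratio are assumptions for illustration.
    Returns a dict mapping class name -> (n_train, n_test).
    """
    rng = random.Random(seed)
    counts = {}
    for class_dir in sorted(p for p in Path(src_dir).iterdir() if p.is_dir()):
        files = sorted(class_dir.glob("*.jpg"))
        rng.shuffle(files)
        # Keep at least one test image per class when the class has 2+ images.
        n_test = max(1, int(len(files) * test_ratio)) if len(files) > 1 else 0
        parts = {"test": files[:n_test], "train": files[n_test:]}
        for part, part_files in parts.items():
            out_dir = Path(dst_dir) / part / class_dir.name
            out_dir.mkdir(parents=True, exist_ok=True)
            for f in part_files:
                shutil.copy2(f, out_dir / f.name)
        counts[class_dir.name] = (len(parts["train"]), len(parts["test"]))
    return counts
```

The files are copied rather than moved, so the hand-sorted source folders stay intact and the split can be redone with a different ratio or seed.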
    

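Image part

  • Run src/image/scripts/create_data.py to generate train.txt and test.txt for the image part. These files list the training and test files together with their class labels.
  • Run src/image/scripts/create_lmdb.sh to generate the LMDB files for the image part.

Text part

  • Run src/words/scripts/create_data.py to generate train.txt and test.txt for the text part. These files list the training and test files together with their class labels.
  • Run src/words/scripts/create_lmdb.sh to generate the LMDB files for the text part.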
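
To make the listing files concrete, here is a minimal sketch of how train.txt and test.txt could be produced from a train/test directory tree. It is not the project's create_data.py, whose contents are not shown here. It assumes the `<relative path> <integer label>` line format read by Caffe's convert_imageset tool, and the function names `build_listing` and `write_listings` are hypothetical.

```python
from pathlib import Path


def build_listing(split_dir, class_names):
    """Return lines of the form '<class>/<file> <label>' for one split directory."""
    lines = []
    for label, name in enumerate(class_names):
        class_dir = Path(split_dir) / name
        if not class_dir.is_dir():
            continue  # a class may have no images in this split
        for f in sorted(class_dir.glob("*.jpg")):
            lines.append(f"{name}/{f.name} {label}")
    return lines


def write_listings(data_dir, out_dir):
    """Write train.txt and test.txt into out_dir and return the class order."""
    data_dir, out_dir = Path(data_dir), Path(out_dir)
    # Derive the label order from the train split so both files share one mapping.
    class_names = sorted(p.name for p in (data_dir / "train").iterdir() if p.is_dir())
    for split in ("train", "test"):
        lines = build_listing(data_dir / split, class_names)
        (out_dir / f"{split}.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return class_names
```

Using one label mapping for both files matters: if train and test numbered the classes independently, the reported test accuracy would be meaningless.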

1.3 Parameters

Adjust src/image/model/image_solver.prototxt and src/words/model/words_solver.prototxt to suit your situation. For how to set the individual values, refer to the solver files of other models.
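
As a rough guide to what a solver file contains, the sketch below renders a set of standard Caffe solver fields as prototxt text. The field names are regular Caffe solver parameters, but every value, the file name, and the snapshot path are placeholders chosen for illustration; they are not the settings used in this project.

```python
def render_solver(params):
    """Render a dict as Caffe solver prototxt text ('key: value' per line)."""
    lines = []
    for key, value in params.items():
        if isinstance(value, str) and key != "solver_mode":
            value = f'"{value}"'  # string fields are quoted; the solver_mode enum is not
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only; tune them for the actual dataset.
example = {
    "net": "image_train_val.prototxt",  # hypothetical file name
    "test_iter": 100,        # test batches evaluated per test pass
    "test_interval": 500,    # run a test pass every N training iterations
    "base_lr": 0.001,
    "lr_policy": "step",
    "gamma": 0.1,            # multiply the learning rate by this every stepsize iterations
    "stepsize": 10000,
    "momentum": 0.9,
    "weight_decay": 0.0005,
    "display": 20,
    "max_iter": 50000,
    "snapshot": 5000,
    "snapshot_prefix": "snapshots/image",  # hypothetical path
    "solver_mode": "GPU",
}

print(render_solver(example))
```

When fine-tuning from pretrained weights, a smaller base_lr than for training from scratch is a common starting point.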

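1.4 Start training

The scripts src/image/scripts/image_train.sh and src/image/scripts/image_finetune_train.sh train the image model from scratch and fine-tune it, respectively. For the training procedure, see the usual Caffe model training workflow.

Likewise:

The scripts src/words/scripts/words_train.sh and src/words/scripts/words_finetune_train.sh train the text model from scratch and fine-tune it, respectively. For the training procedure, see the usual Caffe model training workflow.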
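
The contents of these shell scripts are not reproduced here. As a sketch of the difference between the two modes, the snippet below builds a standard `caffe train` command line, assuming the Caffe binary was built at build/tools/caffe under the Caffe root. The helper name `caffe_train_command` and the example paths are hypothetical; passing a weights file is what turns training from scratch into fine-tuning.

```python
import subprocess
from pathlib import Path


def caffe_train_command(caffe_root, solver, weights=None, gpu=None):
    """Build the argv list for `caffe train`; pass weights to fine-tune."""
    binary = Path(caffe_root) / "build" / "tools" / "caffe"  # assumed build location
    cmd = [str(binary), "train", f"--solver={solver}"]
    if weights is not None:
        cmd.append(f"--weights={weights}")  # start from pretrained weights
    if gpu is not None:
        cmd.append(f"--gpu={gpu}")
    return cmd


if __name__ == "__main__":
    cmd = caffe_train_command("/path/to/caffe", "src/image/model/image_solver.prototxt")
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment once Caffe is built
```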

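Testing

src/web provides a web interface for testing; run index.py to start it. Before running it, you can change the model file names it refers to. A simple example is shown below: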

[Screenshot: web-demo]

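Other notes

  1. In practice, images for each class are collected by crawling image search engines such as Baidu, Sogou, and Google and then post-processing the results, for example by crawling images for the keyword 档案袋 (document folder) and processing them further. This addresses the low efficiency and small sample size of the workflow of downloading from 12306, cropping, and classifying by hand.

  2. The segmentation of the text part in this project is also far from perfect. Images are classified by cropping the tiles and processing them one by one, so response time is not great. Object detection could be used instead: run a detector over the whole captcha image to detect the 8 images and the text part at once, which speeds up recognition.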
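
The tile-by-tile cropping mentioned in item 2 can be sketched as a fixed grid of crop boxes. The geometry below (a 2×4 grid of 67-pixel tiles with 5-pixel gaps, starting 41 pixels from the top of a roughly 293×190 captcha) is an assumption based on the commonly seen 12306 layout. It is not taken from src/tools/cut_image.py, and the function name `tile_boxes` is hypothetical.

```python
def tile_boxes(rows=2, cols=4, size=67, gap=5, top=41, left=5):
    """Return (left, upper, right, lower) crop boxes in row-major order.

    All default values are assumed layout constants, not project settings.
    """
    boxes = []
    for r in range(rows):
        for c in range(cols):
            x = left + c * (size + gap)
            y = top + r * (size + gap)
            boxes.append((x, y, x + size, y + size))
    return boxes
```

Each box uses the (left, upper, right, lower) convention accepted by Pillow's `Image.crop`, so a tile could be cut out with `image.crop(box)` and then passed to the image classifier. An object detector would replace both this fixed grid and the eight separate classifier calls with a single pass over the full captcha.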


12306-captcha's Issues

Test set and training set

Hey, did you keep separate test and training sets, or did you just use all of the data for training?

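Why does the Check fail when creating the model?

Hello,
I trained following your steps, but creating the model fails with the error below. How can this be fixed? Thanks.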
I0514 15:33:19.444087 3925 net.cpp:84] Creating Layer vgg-train-data
I0514 15:33:19.444136 3925 net.cpp:380] vgg-train-data -> data
I0514 15:33:19.444213 3925 net.cpp:380] vgg-train-data -> label
F0514 15:33:19.444285 3925 data_transformer.cpp:465] Check failed: datum_channels > 0 (0 vs. 0)
*** Check failure stack trace: ***
@ 0x7fcbcd0c3daa (unknown)
@ 0x7fcbcd0c3ce4 (unknown)
@ 0x7fcbcd0c36e6 (unknown)
@ 0x7fcbcd0c6687 (unknown)
@ 0x7fcbcd5aac4b caffe::DataTransformer<>::InferBlobShape()
@ 0x7fcbcd52bb25 caffe::DataLayer<>::DataLayerSetUp()
@ 0x7fcbcd585d64 caffe::BasePrefetchingDataLayer<>::LayerSetUp()
@ 0x7fcbcd4a3cf5 caffe::Net<>::Init()
@ 0x7fcbcd4a5c02 caffe::Net<>::Net()
@ 0x7fcbcd4af126 caffe::Solver<>::InitTrainNet()
@ 0x7fcbcd4b0183 caffe::Solver<>::Init()
@ 0x7fcbcd4b045f caffe::Solver<>::Solver()
@ 0x7fcbcd487241 caffe::Creator_SGDSolver<>()
@ 0x40d6a9 caffe::SolverRegistry<>::CreateSolver()
@ 0x407f5d train()
@ 0x405c5c main
@ 0x7fcbcc0cbf45 (unknown)
@ 0x4064cb (unknown)
@ (nil) (unknown)

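Low recognition rate when testing the model

During training the model reached accuracy = 0.93, but at test time the success rate is very low, with roughly the same probability for every class. Has anyone else run into this? Are the models for the text part and the image part the same?

Manual classification?

Could it be done like this? For example, when the prompt is "档案袋" (document folder): if an image appears many times under the "档案袋" prompt, give that image the "档案袋" label.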
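
The idea in this question can be sketched as a simple vote over (tile, prompt) observations. This is only an illustration of the suggestion, not something the project implements. The function name `vote_labels` and the threshold are hypothetical, and matching tiles by an exact byte hash assumes the same tile is served as identical bytes each time it appears.

```python
import hashlib
from collections import Counter, defaultdict


def vote_labels(observations, min_count=3):
    """Assign each image the prompt it was most often shown with.

    observations: iterable of (image_bytes, prompt) pairs, one per tile seen
    alongside a prompt. Returns {image_hash: prompt} for images whose most
    frequent prompt was seen at least min_count times.
    """
    votes = defaultdict(Counter)
    for image_bytes, prompt in observations:
        votes[hashlib.sha256(image_bytes).hexdigest()][prompt] += 1
    labels = {}
    for image_hash, counter in votes.items():
        prompt, count = counter.most_common(1)[0]
        if count >= min_count:
            labels[image_hash] = prompt
    return labels
```

Because each captcha shows 8 tiles under one prompt and only some of them match it, these votes are noisy. A threshold that is too low would attach the prompt to unrelated tiles, so the resulting labels would still need spot checks.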
