GithubHelp home page GithubHelp logo

jsyqrt / hwr-address Goto Github PK

View Code? Open in Web Editor NEW
33.0 2.0 19.0 112.31 MB

(Deprecated) deep learning based Chinese handwriting character recognition, 基于深度学习的手写汉字地址识别

License: GNU General Public License v3.0

Python 100.00%
handwritten-text-recognition handwriting-recognition

hwr-address's Introduction

hwr-address !!! This is Deprecated!!!

基于深度学习的手写汉字地址识别

  1. 汉字单字识别

    • 使用 8 层CNN和 2 层全连接识别 3755 个 GB2312 一级汉字 [1], 鉴于3755个类训练时间过长,调整为1056个类(县级以上地址的字符与一级汉字的交集)。

    • 使用 2 层CNN和 1 层全连接识别 15 的中文地址关键字, 省,市,县,区,乡,镇,村,巷,弄,路, 街, 社, 组, 队, 州

    • 使用 HCL2000 中文汉字手写数据库,对每个汉字 包含 700 个训练样本和 300 个测试样本 [2]

  2. 地址树构建

    • 使用 民政部 《2013年中华人民共和国县以下行政区划代码》作为知识库, 构建形如

      
       **
       	上海市
       		浦东新区
       			张江镇
       		黄浦区
       		...
       	...
      
      

      的地址树,用于验证识别的汉字。

  3. 综合识别

    • 输入: 单行图片格式中文手写汉字地址

    • 输出: 识别结果(中文字符串)

  4. 模型,框架,工具

    • python语言

    • tensorflow,keras两种实现,相较之下,keras的封装实现效率更高

    • 目录

    
    /src 			# source code
    /data/ 			# dataset and train result
    	address/	# address tree
    	hcl/		# HCL2000 数据库
    	result/		# train result
    	sample/		# test samples,jpg pictures
    /test
    	test_recognize.py # just run it, it will show you the result.
    /tools/			# some io tools
    
    
  5. 运行

    cd test
    python test_recognize.py
    
    
  6. 参考文献

    1. Zhang X Y, Bengio Y, Liu C L. Online and Offline Handwritten Chinese Character Recognition: A Comprehensive Study and New Benchmark[J]. Pattern Recognition, 2016, 61(61):348-360.

    2. Zhang H, Guo J, Chen G, et al. HCL2000 - A Large-scale Handwritten Chinese Character Database for Handwritten Character Recognition[C]// International Conference on Document Analysis and Recognition. IEEE Computer Society, 2009:286-290.

  7. 工具的安装

    • tensorflow, Ubuntu可以直接通过sudo apt-get install tensorflow安装, GPU版本及cuda的安装参见 这里

    • keras, keras是基于 tensorflow 或 theano 的对深度学习模型的更高一层封装,相对 tensorflow 直接实现效率更高

    # keras依赖库
    sudo apt-get install liblapack-dev gfortran python-scipy cython libhdf5-dev python-h5py
    sudo pip install keras
    
    • 注意,在 keras 中如果以 tensorflow 作为后端,数据输入格式为 (channel, row, col)时,需要修改.keras目录下的json配置文件,将 image_dim_order 改为 'th'(theano模式,即Numpy数组的默认模式)。

hwr-address's People

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

hwr-address's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.