hongxi233 / trietree

This project forked from zj463261929/trietree

trietree

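This code uses a trie (dictionary tree) to correct word recognition results, and it works with dictionaries that mix Chinese and English. A trie (trietree) is commonly used to store, count, and look up large numbers of strings.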
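
As a rough illustration of the store, count, and look-up operations mentioned above, here is a minimal sketch of a character-level trie. The names `TrieNode`, `Trie`, `insert`, and `frequency` are hypothetical and are not taken from this repository; the sketch assumes Python 3, where a `str` is iterated one Unicode character at a time, so Chinese and English characters are handled the same way.

```python
class TrieNode:
    """One node per character (hypothetical helper, not the repo's class)."""

    def __init__(self):
        self.children = {}  # maps a single character to the next TrieNode
        self.count = 0      # > 0 only where a stored word ends


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, count=1):
        """Store a word, adding `count` to its frequency."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.count += count

    def frequency(self, word):
        """Return the stored frequency of `word`, or 0 if it was never inserted."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return 0
        return node.count


trie = Trie()
trie.insert("复合", 2)
trie.insert("复合")
trie.insert("word")
print(trie.frequency("复合"), trie.frequency("wor"))
```

A prefix such as "wor" reports 0 here because only complete inserted words carry a count.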

src: corrects word recognition results

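trietree_correct.py is the main code file;
The function that corrects a recognized word is called as: correct_word("复合", 1, trieTree.trie)
The first argument is the word to correct;
the second argument is the maximum edit distance, typically 3 (inclusive);
the third argument is the trie built from the dictionary txt file.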
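
To make the role of the edit-distance argument more concrete, the following is a sketch of one common way to search a trie for words within a given edit distance: walk the trie while carrying one row of the Levenshtein table per node, and prune a branch once every entry in its row exceeds the limit. This is an illustration under assumptions, not the code in trietree_correct.py; the names `Node`, `build_trie`, and `search_within` are hypothetical, and ranking ties by frequency is an assumed choice.

```python
class Node:
    """Trie node (hypothetical, for illustration only)."""

    def __init__(self):
        self.children = {}
        self.word = None  # set to the full word where a dictionary entry ends
        self.freq = 0


def build_trie(entries):
    """Build a trie from (word, frequency) pairs."""
    root = Node()
    for word, freq in entries:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.word = word
        node.freq = freq
    return root


def search_within(root, target, max_dist):
    """Return (word, distance, freq) for stored words within max_dist edits."""
    results = []
    first_row = list(range(len(target) + 1))
    for ch, child in root.children.items():
        _walk(child, ch, target, first_row, max_dist, results)
    # Assumed ranking: smallest distance first, then highest frequency.
    results.sort(key=lambda r: (r[1], -r[2]))
    return results


def _walk(node, ch, target, prev_row, max_dist, results):
    # Compute the next Levenshtein row for the prefix that ends in `ch`.
    row = [prev_row[0] + 1]
    for i in range(1, len(target) + 1):
        insert_cost = row[i - 1] + 1
        delete_cost = prev_row[i] + 1
        replace_cost = prev_row[i - 1] + (target[i - 1] != ch)
        row.append(min(insert_cost, delete_cost, replace_cost))
    if node.word is not None and row[-1] <= max_dist:
        results.append((node.word, row[-1], node.freq))
    # Descend only if some entry in the row is still within the limit.
    if min(row) <= max_dist:
        for next_ch, child in node.children.items():
            _walk(child, next_ch, target, row, max_dist, results)


root = build_trie([("复合", 10), ("复杂", 5), ("apple", 3)])
print(search_within(root, "复台", 1))
```

With a limit of 1, the misrecognized "复台" matches both "复合" and "复杂" at distance 1, and the more frequent entry is listed first; "apple" is pruned early.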

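dict.txt and the other txt files are dictionaries containing Chinese and English words; each line holds a word and its frequency, separated by a space;
test.py is the test file.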
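
The dictionary format described above (one word and its frequency per line, separated by a space) could be read along these lines. This is a sketch only: `parse_dict_lines` and `load_dict` are hypothetical names, UTF-8 is an assumed file encoding, and silently skipping malformed lines is an assumed policy rather than the repository's behavior.

```python
def parse_dict_lines(lines):
    """Yield (word, frequency) pairs from 'word frequency' lines."""
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines (assumed policy)
        word, freq = parts
        try:
            yield word, int(freq)
        except ValueError:
            continue  # frequency column is not an integer


def load_dict(path, encoding="utf-8"):
    """Read a dictionary file; the UTF-8 default is an assumption."""
    with open(path, encoding=encoding) as f:
        return list(parse_dict_lines(f))


sample = ["复合 12\n", "\n", "apple 3\n"]
print(list(parse_dict_lines(sample)))
```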

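Note: an n-gram language model is also used.

_trietree.py: this version handles both Chinese characters and English.

test_dict_chines.py: test code.

wordFrequency: counts word frequencies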

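stopword_path = r'stopwords.dat' : stop words, one ignored word per line; entries may be punctuation marks and the like.
inputpath = r'words.txt' : input, already segmented text with words separated by spaces.
outputpath = r'dict_new.txt' : output, one word and its frequency per line, separated by a space; this is the dictionary that trietree_correct.py needs.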
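
The counting step described above can be sketched as follows, working on in-memory lines rather than on the three files. `count_words` and `format_dict` are hypothetical names, and writing the most frequent words first is an assumption; the actual script may order its output differently.

```python
from collections import Counter


def count_words(lines, stopwords):
    """Count space-separated words, ignoring anything in `stopwords`."""
    counts = Counter()
    for line in lines:
        for word in line.split():
            if word not in stopwords:
                counts[word] += 1
    return counts


def format_dict(counts):
    """Render 'word frequency' lines, most frequent first (assumed order)."""
    return [f"{word} {freq}" for word, freq in counts.most_common()]


segmented = ["复合 材料 , 复合", "apple 复合 ."]
stopwords = {",", "."}
for entry in format_dict(count_words(segmented, stopwords)):
    print(entry)
```

Each printed line has the same word-and-frequency layout that the dictionary files use.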

References:

trietree: http://stevehanov.ca/blog/index.php?id=114
Unified Chinese/English encoding: http://blog.csdn.net/qinbaby/article/details/23201883

Contributors

zj463261929
