easy_textcnn_rnn's Introduction

Easy_TextCnn_Rnn

TensorFlow TextCNN and TextRNN with pretrained word vectors: Chinese text classification using TextCNN and TextRNN respectively

Blog posts for this project:

Text_Cnn

Text_Rnn

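Dataset:

This experiment trains and tests on a subset of THUCNews. Please download the dataset yourself from THUCTC (an efficient Chinese text classification toolkit), and comply with the data provider's open-source license.

The texts cover 10 categories: categories = ['体育', '财经', '房产', '家居', '教育', '科技', '时尚', '时政', '游戏', '娱乐'], with 6,500 samples per category.

cnews.train.txt: training set (5000*10)

cnews.val.txt: validation set (500*10)

cnews.test.txt: test set (1000*10)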
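
A minimal sketch of how these files could be read, assuming each line holds a category label and the text separated by a tab. The file layout is not stated here, so treat it as an assumption. `parse_line`, `load_samples`, and `cat_to_id` are illustrative names, not taken from the repository.

```python
# Category list as given above; the tab-separated "label<TAB>text" layout
# is an assumption about the cnews.*.txt files.
categories = ['体育', '财经', '房产', '家居', '教育', '科技', '时尚', '时政', '游戏', '娱乐']
cat_to_id = {cat: idx for idx, cat in enumerate(categories)}


def parse_line(line):
    """Split one raw line into (label_id, text); return None if malformed."""
    parts = line.rstrip("\n").split("\t", 1)
    if len(parts) != 2 or parts[0] not in cat_to_id:
        return None
    return cat_to_id[parts[0]], parts[1]


def load_samples(lines):
    """Parse an iterable of raw lines, skipping malformed ones."""
    samples = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            samples.append(parsed)
    return samples


# Hypothetical in-memory lines standing in for a real file.
demo = ["体育\t比赛昨晚结束\n", "科技\t新手机发布\n", "bad line without tab\n"]
samples = load_samples(demo)
```

With a real file, the same function could be fed an open file handle, for example `load_samples(open("cnews.train.txt", encoding="utf-8"))`, assuming the files are UTF-8 encoded.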

The training data and the pretrained word vectors can be downloaded here: link: https://pan.baidu.com/s/1daGvDO4UBE5NVrcLaCGeqA extraction code: 9x3i

1. Text classification with TextCNN

Model parameters

parameters.py

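Preprocessing

Use pretrained word vectors for the embedding

Segment sentences into words and remove punctuation

Remove stop words

Convert words to numeric ids

Padding, etc.

The code is in data_processing.py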
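
The preprocessing steps above can be sketched as follows. This is a hedged illustration, not the contents of data_processing.py: the stop-word set, the punctuation set, the reserved ids, and all function names are hypothetical, and the sentences are assumed to be segmented already (for example by a tokenizer such as jieba).

```python
import string

# Hypothetical stop words and punctuation set, for illustration only.
STOP_WORDS = {"的", "了", "是"}
PUNCTUATION = set(string.punctuation) | set(",。!?、;:“”()")
PAD_ID, UNK_ID = 0, 1  # assumed reserved ids


def clean_tokens(tokens):
    """Drop punctuation tokens and stop words from a tokenized sentence."""
    return [t for t in tokens if t not in PUNCTUATION and t not in STOP_WORDS]


def build_vocab(token_lists):
    """Assign ids in first-seen order; 0 and 1 are reserved for padding/unknown."""
    vocab = {"<PAD>": PAD_ID, "<UNK>": UNK_ID}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab, max_len):
    """Map tokens to ids, truncate to max_len, then right-pad with PAD_ID."""
    ids = [vocab.get(t, UNK_ID) for t in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


# Sentences are assumed to be segmented into words already.
raw = [["今天", "的", "比赛", "很", "精彩", "。"], ["新", "手机", "发布", "了"]]
cleaned = [clean_tokens(s) for s in raw]
vocab = build_vocab(cleaned)
padded = [encode(s, vocab, max_len=5) for s in cleaned]
```

Padding every sentence to one fixed length keeps each batch rectangular, which is what a convolutional model expects as input.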

Running steps

Training.py

train and test result

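predict.py uses the trained model to make predictions on the validation texts

evaluating result

On the 5,000 validation texts the accuracy reaches 96.58%; the predictions for the first 10 texts are compared against their original labels.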
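
The accuracy figure and the side-by-side label comparison can be computed along these lines. This is a sketch with made-up label ids, not the output of predict.py; `accuracy` and `compare_head` are illustrative names.

```python
categories = ['体育', '财经', '房产', '家居', '教育', '科技', '时尚', '时政', '游戏', '娱乐']


def accuracy(predicted, expected):
    """Fraction of positions where the predicted id equals the true id."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def compare_head(predicted, expected, n=10):
    """Pair predicted and true category names for the first n samples."""
    rows = []
    for p, e in list(zip(predicted, expected))[:n]:
        rows.append((categories[p], categories[e], p == e))
    return rows


# Hypothetical label ids, not real model output.
predicted = [0, 1, 2, 2]
expected = [0, 1, 2, 3]
acc = accuracy(predicted, expected)
rows = compare_head(predicted, expected)
```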

The network architecture is largely the same as the one shown in the figure in the blog post.
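
The figure itself is not reproduced here. As a rough illustration, the core of a TextCNN as described in Convolutional Neural Networks for Sentence Classification (listed under the references) is a set of filters of different window sizes slid over the word vectors, each followed by max-over-time pooling. The pure-Python sketch below assumes that standard layout; the function names and the toy weights are hypothetical, and a real model would use TensorFlow ops instead.

```python
def conv_max_pool(embedded, filt, bias):
    """Slide one filter over the sentence, apply ReLU, and max-pool over time.

    embedded: list of word vectors (seq_len x dim)
    filt:     list of weight vectors (window x dim)
    """
    window = len(filt)
    best = 0.0  # ReLU outputs are never negative, so 0.0 is a safe floor
    for start in range(len(embedded) - window + 1):
        total = bias
        for offset in range(window):
            word = embedded[start + offset]
            total += sum(w * x for w, x in zip(filt[offset], word))
        best = max(best, max(total, 0.0))
    return best


def textcnn_features(embedded, filters):
    """Concatenate the pooled output of every filter into one feature vector."""
    return [conv_max_pool(embedded, f, b) for f, b in filters]


# Toy sentence of 3 words with 2-dimensional embeddings (made-up numbers).
embedded = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = [
    ([[1.0, 0.0], [0.0, 1.0]], 0.0),               # window size 2
    ([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]], -1.0),  # window size 3
]
features = textcnn_features(embedded, filters)
```

The resulting feature vector has one entry per filter and would be passed to a fully connected layer with softmax over the 10 categories.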

2. Text classification with RNN

1. Text classification with a two-layer RNN

Model parameters

parameters_rnn.py

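Preprocessing

Use pretrained word vectors for the embedding

Segment sentences into words and remove punctuation

Remove stop words

Convert words to numeric ids

Padding

Compute the true length of each sentence in every batch, etc.

The code is in data_processing_rnn.py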
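
The step that differs from the TextCNN pipeline is computing the true sentence lengths per batch. A minimal sketch, assuming sequences are right-padded with id 0 (the padding id and the function names are assumptions, not taken from data_processing_rnn.py):

```python
PAD_ID = 0  # assumed padding id


def sequence_lengths(batch, pad_id=PAD_ID):
    """Count the non-padding ids in each right-padded sequence of a batch."""
    return [sum(1 for tok in seq if tok != pad_id) for seq in batch]


def batches(sequences, batch_size):
    """Yield (batch, true_lengths) pairs over the padded sequences."""
    for start in range(0, len(sequences), batch_size):
        chunk = sequences[start:start + batch_size]
        yield chunk, sequence_lengths(chunk)


# Made-up padded id sequences.
data = [[5, 3, 0, 0], [2, 7, 9, 4], [8, 0, 0, 0]]
result = list(batches(data, batch_size=2))
```

Lengths like these are typically handed to a dynamic RNN so that it stops at the real end of each sentence, for example through the `sequence_length` argument of `tf.nn.dynamic_rnn` in TensorFlow 1.x. Whether this repository does exactly that is not confirmed here.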

Running steps

Training.py

train and test result

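predict.py uses the trained model to make predictions on the validation texts

evaluating result

On the 5,000 validation texts the accuracy reaches 96.7%; the predictions for the first 10 texts are compared against their original labels.

References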

1.Convolutional Neural Networks for Sentence Classification

2.https://github.com/cjymz886/text-cnn

3.http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow

easy_textcnn_rnn's People

Contributors

x-jun-0130

easy_textcnn_rnn's Issues

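Request for more datasets

Hello, could you provide more high-quality datasets like this one?
The current data is about 50,000 samples,
and I would like a bit more. Thanks!

Word vector mapping

Does the code actually use the word vector mapping? It looks like the embedding is just initialized directly.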
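
For context on the question, this is roughly what mapping pretrained vectors into an embedding matrix looks like. It is a generic sketch under stated assumptions (vocabulary ids run from 0 to len(vocab) - 1, id 0 is padding, missing words get small random vectors) and does not describe what this repository's code actually does; `build_embedding_matrix` is a hypothetical name.

```python
import random


def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Build one row per vocabulary id from pretrained vectors.

    vocab:      dict word -> id, ids assumed to be 0..len(vocab)-1
    pretrained: dict word -> list of floats of length dim
    Words without a pretrained vector get small random values;
    id 0 (assumed padding) stays all zeros.
    """
    rng = random.Random(seed)
    matrix = [[0.0] * dim for _ in range(len(vocab))]
    for word, idx in vocab.items():
        if word in pretrained:
            matrix[idx] = list(pretrained[word])
        elif idx != 0:
            matrix[idx] = [rng.uniform(-0.25, 0.25) for _ in range(dim)]
    return matrix


# Made-up vocabulary and vectors.
vocab = {"<PAD>": 0, "比赛": 1, "手机": 2}
pretrained = {"比赛": [0.1, 0.2]}
matrix = build_embedding_matrix(vocab, pretrained, dim=2)
```

If the embedding variable is instead created with a random initializer and never assigned such a matrix, the pretrained vectors are not being used, which is the situation the question describes.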
