
thompsonhe / easy_textcnn_rnn

This project forked from x-jun-0130/easy_textcnn_rnn


TensorFlow, TextCNN, TextRNN: text classification with TextCNN and TextRNN


Easy_TextCnn_Rnn

TensorFlow, TextCNN, TextRNN, pretrained word vectors: Chinese text classification with TextCNN and with TextRNN

Blog posts for this project:

Text_Cnn

Text_Rnn

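Dataset:

This experiment trains and tests on a subset of THUCNews. Please download the dataset yourself from THUCTC: An Efficient Chinese Text Classification Toolkit, and follow the data provider's open-source license.

The texts cover 10 categories: categories = ['体育', '财经', '房产', '家居', '教育', '科技', '时尚', '时政', '游戏', '娱乐'] (sports, finance, real estate, home, education, technology, fashion, politics, games, entertainment), with 6,500 samples per category.

cnews.train.txt: training set (5000*10)

cnews.val.txt: validation set (500*10)

cnews.test.txt: test set (1000*10)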
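As a minimal sketch of reading one of these files: the code below assumes each line holds a category label and the document text separated by a single tab. That layout is an assumption and should be checked against the downloaded files. The helper names `parse_labeled_lines` and `read_labeled_file` are hypothetical and not taken from the repository.

```python
from collections import Counter


def parse_labeled_lines(lines):
    """Split 'label<TAB>text' lines into parallel label and text lists."""
    labels, texts = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumed layout: category label, one tab, then the document text.
        label, _, text = line.partition("\t")
        if text:
            labels.append(label)
            texts.append(text)
    return labels, texts


def read_labeled_file(path):
    """Read a UTF-8 file such as cnews.train.txt (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return parse_labeled_lines(f)


# Small in-memory example instead of the real file.
sample = ["体育\t球队赢得了比赛", "财经\t股市今日上涨", "", "体育\t新赛季开始"]
labels, texts = parse_labeled_lines(sample)
counts = Counter(labels)  # per-category sample counts
```

Counting labels this way gives a quick check that a split has the expected number of samples per category.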

The training data and the pretrained word vectors can be downloaded here: link: https://pan.baidu.com/s/1daGvDO4UBE5NVrcLaCGeqA extraction code: 9x3i
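Pretrained word vectors are usually turned into an embedding matrix with one row per vocabulary id. The sketch below assumes the vectors are stored in a plain-text format with one `word v1 v2 ...` entry per line and an optional `vocab_size dim` header. The actual file format of the download is not stated here, so this is an illustration only. The function names and the choice of id 0 for padding are assumptions.

```python
import random


def parse_word_vectors(lines):
    """Parse 'word v1 v2 ...' lines into a {word: [float, ...]} dict."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) <= 2:
            continue  # skip blanks and a possible 'vocab_size dim' header line
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(word_to_id, vectors, dim, seed=0):
    """Return one row per vocabulary id; unseen words get small random values."""
    rng = random.Random(seed)
    matrix = [[0.0] * dim for _ in range(len(word_to_id))]
    for word, idx in word_to_id.items():
        if word in vectors:
            matrix[idx] = list(vectors[word])
        elif idx != 0:  # id 0 is assumed to be padding and stays all zeros
            matrix[idx] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    return matrix


# Toy vectors; real pretrained vectors have many more dimensions.
lines = ["2 3", "体育 0.1 0.2 0.3", "财经 0.4 0.5 0.6"]
vectors = parse_word_vectors(lines)
word_to_id = {"<PAD>": 0, "体育": 1, "财经": 2, "游戏": 3}
embedding = build_embedding_matrix(word_to_id, vectors, dim=3)
```

A matrix built like this can then be used to initialize the model's embedding layer.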

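1. Text classification with TextCNN

Model parameters

parameters.py

Preprocessing

Embed words with the pretrained word vectors

Segment sentences into words and remove punctuation

Remove stop words

Convert words to numeric IDs

Padding, etc.

The code is in data_processing.py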
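The preprocessing steps listed above can be sketched as follows. This is not the code from data_processing.py. It is a minimal illustration under stated assumptions: a character-level split stands in for a real Chinese word segmenter, ids 0 and 1 are reserved for padding and unknown words, and all function names are hypothetical.

```python
import re

PAD_ID, UNK_ID = 0, 1  # assumed reserved ids for padding and unknown words


def clean_and_tokenize(text, stopwords):
    """Drop punctuation and whitespace, split into tokens, remove stop words."""
    # Keep CJK characters, ASCII letters and digits; everything else is removed.
    text = re.sub(r"[^\u4e00-\u9fa5A-Za-z0-9]", "", text)
    # Stand-in tokenizer: one token per character. A word segmenter would go here.
    return [tok for tok in text if tok not in stopwords]


def build_vocab(token_lists):
    """Assign ids in first-seen order, after the reserved ids."""
    vocab = {"<PAD>": PAD_ID, "<UNK>": UNK_ID}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode_and_pad(tokens, vocab, max_len):
    """Map tokens to ids, truncate to max_len, right-pad with PAD_ID."""
    ids = [vocab.get(tok, UNK_ID) for tok in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


stopwords = {"的", "了"}  # toy stop-word list
docs = ["球队赢得了比赛!", "股市的上涨。"]
tokens = [clean_and_tokenize(d, stopwords) for d in docs]
vocab = build_vocab(tokens)
padded = [encode_and_pad(t, vocab, max_len=6) for t in tokens]
```

Padding every text to the same length lets a batch be fed to the network as a single rectangular array of ids.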

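How to run

Training.py

train and test result

predict.py: uses the trained model to predict labels for the validation texts

evaluating result

The validation results show an accuracy of 96.58% on 5,000 texts. The predictions for the first 10 sentences are compared with their original labels.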
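The evaluation described above comes down to two small computations: the fraction of texts whose predicted label matches the true label, and a side-by-side listing of the first few predictions. A minimal sketch follows. The function names are hypothetical, and the labels are toy data, not results from this project.

```python
def accuracy(predicted, expected):
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def compare_first(predicted, expected, n=10):
    """Return (predicted, expected, is_correct) for the first n samples."""
    return [(p, e, p == e) for p, e in zip(predicted[:n], expected[:n])]


# Toy labels for illustration only.
expected = ["体育", "财经", "科技", "体育"]
predicted = ["体育", "财经", "游戏", "体育"]
acc = accuracy(predicted, expected)
rows = compare_first(predicted, expected, n=10)
```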

The network structure is essentially the same as the figure in the blog post.
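The figure itself is not reproduced here. As a general illustration of the core TextCNN operation described in reference 1, the sketch below applies one convolution filter over sliding word windows, then ReLU, then max-over-time pooling. It is written in plain Python for readability and is not the repository's TensorFlow code. The function name and the toy values are assumptions.

```python
def conv_max_pool(embedded, kernel, bias=0.0):
    """One convolution filter over word windows, ReLU, then max-over-time pooling.

    embedded: list of word vectors (seq_len x dim)
    kernel:   list of weight vectors (window x dim)
    """
    window = len(kernel)
    if len(embedded) < window:
        raise ValueError("sequence is shorter than the filter window")
    features = []
    for start in range(len(embedded) - window + 1):
        total = bias
        for offset in range(window):
            word_vec = embedded[start + offset]
            total += sum(w * x for w, x in zip(kernel[offset], word_vec))
        features.append(max(0.0, total))  # ReLU
    return max(features)  # max-over-time pooling keeps the strongest response


# Toy input: 3 words with 2-dimensional embeddings, one filter of window 2.
embedded = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]
pooled = conv_max_pool(embedded, kernel)
```

In a full model, many filters with several window sizes each produce one pooled value, and the concatenated values feed a fully connected softmax layer.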

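2. Text classification with RNN

1. Text classification with a two-layer RNN

Model parameters

parameters_rnn.py

Preprocessing

Embed words with the pretrained word vectors

Segment sentences into words and remove punctuation

Remove stop words

Convert words to numeric IDs

Padding

Compute the true length of each sentence in every batch, etc.

The code is in data_processing_rnn.py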
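The RNN pipeline adds one step to the preprocessing: recording the true (unpadded) length of each sentence per batch, so that padded time steps can be ignored. A minimal sketch is shown below, assuming right-padding with id 0. The function names are hypothetical and this is not the code from data_processing_rnn.py.

```python
PAD_ID = 0  # assumed padding id


def sequence_lengths(batch, pad_id=PAD_ID):
    """Count the non-padding ids in each right-padded row."""
    return [sum(1 for tok in row if tok != pad_id) for row in batch]


def iter_batches(rows, batch_size):
    """Yield (batch, lengths) pairs over the padded rows."""
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        yield chunk, sequence_lengths(chunk)


# Toy padded id sequences.
rows = [[5, 8, 2, 0, 0], [7, 0, 0, 0, 0], [3, 4, 6, 9, 1]]
batches = list(iter_batches(rows, batch_size=2))
```

In TensorFlow 1.x, lengths like these can be passed as the `sequence_length` argument of `tf.nn.dynamic_rnn`. Whether this repository does exactly that is not stated here.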

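How to run

Training.py

train and test result

predict.py: uses the trained model to predict labels for the validation texts

evaluating result

The validation results show an accuracy of 96.7% on 5,000 texts. The predictions for the first 10 sentences are compared with their original labels.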

References

1.Convolutional Neural Networks for Sentence Classification

2.https://github.com/cjymz886/text-cnn

3.http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow

