GithubHelp home page GithubHelp logo

dali-yy / bert_news_classfication Goto Github PK

View Code? Open in Web Editor NEW
37.0 2.0 6.0 39.19 MB

通过python爬虫获取人民网、新浪等网站新闻作为训练集,基于BERT构建新闻文本分类模型,并结合node.js + vue完成了一个可视化界面。

Python 45.78% JavaScript 31.63% HTML 1.08% EJS 0.38% Vue 21.14%

bert_news_classfication's Introduction

-BERT-

通过python爬虫获取人民网、新浪等网站新闻作为训练集,基于BERT构建新闻文本分类模型,并结合node.js + vue完成了一个可视化界面。

一、新闻爬取

1、python依赖

pip install demjson==2.2.4  // 解析json数据
pip install requests==2.25.1  // 请求网页
pip install Beautifulsoup4==0.0.1  // 解析HTML源码
pip install loguru==0.5.3
pip install install openpyxl==3.0.7

2、爬取过程

主要依赖request库请求网页源码或JavaScript数据,然后通过bs4s识别标签进行解析,获取网页中的新闻内容。不同网页代码框架不同,四个文件针对不同网站的HTML代码进行解析。爬取的新闻内容通过openpyxl保存在excel文件中

二、基于BERT的新闻文本分类模型

1、python 依赖

pip install pytorch-pretrained-bert==0.6.2

CUDA-10.2 PyTorch builds are no longer available for Windows, please use CUDA-11.3

有关pytorch和cuda 10.2 的安装可参见官网:

pytorch:https://pytorch.org/

cuda10.2:https://developer.nvidia.com/cuda-10.2-download-archive

2、代码使用说明

(1)代码中使用的bert预训模型来自哈工大https://github.com/ymcui/Chinese-BERT-wwm

(2)可运行 main.py 文件自行训练模型,使用的数据集类似于 dataset 中的文本格式即可

(3)one_predict.py 可实现对单条新闻文本的分类

(4)file_predict.py 可实现对文件中所有的新闻进行分类

三、web端应用

1、前端

基于 Vue 框架

项目热启动于http://localhost:8080

cd frontend
npm install
npm run dev

2、后端

基于 node.js 中的 express 框架

项目热启动于http://localhost:3000

cd backend
npm install
npm run dev

注意:在使用的时候需要将训练好的分类模型放在 \backend\src\bert_trained 文件夹中,否则无法实现新闻文本分类功能

bert_news_classfication's People

Contributors

dali-yy avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

bert_news_classfication's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.