段清华DEAN's Projects
A corpus of book titles, including some film and game titles.
A book about Text-to-Speech (TTS) in Chinese.
Bot Friday Club - BOT5
Microsoft Bot Framework v4 Adapter for WeChat Individual Account
🤖 A JavaScript framework to create conversational UIs
Question answering over knowledge graphs
Caffe: a fast framework for deep learning. For the most recent version, check out the dev branch. For the latest stable release, check out the master branch.
A framework for creating HTML5 Canvas RPG-type games, built on top of EaselJS
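ChainKnowledgeGraph: an industry-chain knowledge graph covering three entity types (A-share listed companies, industries, and products) and six relation types: company-to-industry membership, parent industry, upstream raw materials of a product, downstream products of a product, a company's main products, and product subcategories. It contains 4,654 listed companies, 511 industries, 95,559 products, 56,824 upstream-material relations, 480 parent-industry relations, 390 downstream-product relations, 52,937 product-subcategory relations, and 3,946 industry-membership relations.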
A dictionary of Chinese character decomposition.
Dialogs for training or setting up a chatbot
ChatterBot is a machine learning, conversational dialog engine.
Tools and resources for Chinese text preprocessing. Validated in two papers: one CCF-C, EI-indexed, and one CCF-B, SCI-indexed.
Pre-Training with Whole Word Masking for Chinese BERT (the Chinese BERT-wwm model series)
A Discourse-Level Named Entity Recognition and Relation Extraction Dataset for Chinese Literature Text
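A corpus of Chinese personal names: Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, and English names.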
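The most comprehensive database of classical Chinese poetry: nearly 14,000 poets from the Tang and Song dynasties, close to 55,000 Tang poems plus 260,000 Song poems.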
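:orange_book: A Chinese Xinhua Dictionary database, including xiehouyu (two-part allegorical sayings), idioms (chengyu), words, and Chinese characters.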
A Chinese sentiment dataset that may be useful for sentiment analysis.
Predict the emotion of a Chinese sentence (Chinese emotion recognition).
CTPN + DenseNet + CTC based end-to-end Chinese OCR implemented using TensorFlow and Keras
A Chinese version of AI Dungeon.
A database of 100,000 Chinese song lyrics.
Chinese named entity recognition and entity extraction; TensorFlow, PyTorch, BiLSTM+CRF.
Chinese natural language processing datasets, material for everyday experiments. Additions via pull requests are welcome.
YOLOv3 + OCR
Ultra-lightweight Chinese OCR with support for vertical text recognition and ncnn inference; psenet (8.5M) + crnn (6.3M) + anglenet (1.5M), with a total model size of only 17M.
A list of commonly used Chinese stop words.