zhoufangquan / poetry

This project forked from xiu-ze/poetry

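A corpus of classical Chinese poetry crawled from the internet, covering the pre-Qin period through the contemporary era, 1014508 poems in total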

poetry's Introduction

Chinese-Poetry-Corpus

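This corpus was collected from the internet and contains classical Chinese poetry from the pre-Qin period through the contemporary era, stored in CSV format. After deduplication it contains 1014508 poems.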

Data description

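  1. Poems are grouped by dynasty and stored in the folder "诗歌数据集" (poetry dataset), in files named "朝代.csv", where 朝代 is the dynasty name. Dynasties with many poems are split across several files so that no single file grows too large;
  2. A poet whose life spans two dynasties is assigned to the dynasty of birth. For example, the works of a poet born in the late Ming who lived into the early Qing appear only in "明.csv";
  3. Each record has five fields: "标题" (title), "朝代" (dynasty), "作者" (author), "体裁" (genre), and "内容" (content). The "体裁" field records the literary form of the poem, such as "五言绝句" (five-character quatrain), "词" (ci), "古风" (old-style verse), and so on;
  4. For the ci genre, a title usually consists of a tune name (词牌名) and a subject (题目). The corpus normalizes ci titles to the form "词牌名[space]题目". Notes: (1) Some ci have no subject and consist of a tune name only; in that case the title is simply "词牌名". (2) Some ci share the same tune name and subject, and the poet distinguishes them with markers such as "其一" / "其二" (no. 1 / no. 2); in that case the title is normalized to "词牌名[space]其X[space]题目", where X stands for 一, 二, 三, and so on;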
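The title convention in item 4 can be taken apart again with a small parser. This is a minimal sketch, not part of the corpus: the names `CiTitle` and `parse_ci_title` are hypothetical, and it assumes that the separator is a plain ASCII space and that the "其X" marker uses Chinese numerals.

```python
import re
from typing import NamedTuple, Optional

# Matches markers such as "其一", "其二", "其十二".
# Assumption: the marker is written with Chinese numerals only.
_ORDINAL = re.compile(r"^其[一二三四五六七八九十百]+$")


class CiTitle(NamedTuple):
    tune: str                # 词牌名 (tune name)
    ordinal: Optional[str]   # "其X" marker, or None if absent
    subject: Optional[str]   # 题目 (subject), or None if absent


def parse_ci_title(title: str) -> CiTitle:
    """Split a normalized ci title into tune name, ordinal marker, and subject."""
    # Split into at most three parts so a subject containing spaces stays intact.
    parts = title.strip().split(" ", 2)
    tune, rest = parts[0], parts[1:]

    ordinal = None
    if rest and _ORDINAL.match(rest[0]):
        ordinal, rest = rest[0], rest[1:]

    subject = " ".join(rest) if rest else None
    return CiTitle(tune, ordinal, subject)
```

For example, `parse_ci_title("水调歌头 其一 中秋")` yields the tune name "水调歌头", the marker "其一", and the subject "中秋", while a bare tune name such as "如梦令" yields `None` for both the marker and the subject.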

Poem counts by dynasty

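Dynasty Count
Pre-Qin (先秦) 576
[dynasty name missing] 9
[dynasty name missing] 753
Wei-Jin (魏晋) 2425
Northern and Southern Dynasties (南北朝) 4705
[dynasty name missing] 1266
[dynasty name missing] 54156
[dynasty name missing] 268665
[dynasty name missing] 25
[dynasty name missing] 8357
[dynasty name missing] 70574
[dynasty name missing] 294587
[dynasty name missing] 246698
Modern (近现代) 30372
Contemporary (当代) 31340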
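Per-dynasty counts like these could be recomputed from the CSV files themselves. The sketch below is an illustration under stated assumptions, not the author's tooling: `iter_poems` and `count_by_dynasty` are hypothetical names, the files are assumed to be UTF-8 encoded with a header row carrying the five field names, and the count relies on the "朝代" field rather than on file names, because the naming of split files is not documented.

```python
import csv
from collections import Counter
from pathlib import Path
from typing import Dict, Iterator


def iter_poems(root: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per poem from every CSV file directly under `root`."""
    for path in sorted(Path(root).glob("*.csv")):
        # Assumption: UTF-8 text; "utf-8-sig" also tolerates a leading BOM.
        with path.open(encoding="utf-8-sig", newline="") as f:
            # Assumption: the first row is a header with the field names.
            yield from csv.DictReader(f)


def count_by_dynasty(root: str) -> Counter:
    """Count poems per value of the "朝代" (dynasty) field."""
    return Counter(row["朝代"] for row in iter_poems(root))
```

Calling `count_by_dynasty("诗歌数据集")` from the repository root would return a `Counter` keyed by dynasty label. Because a dynasty split across several files still shares one "朝代" value, its parts are merged automatically.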

Statistics for the pre-Qin period through the Qing dynasty

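Poems of all genres (shi, ci, qu, and others) from the pre-Qin period through the Qing dynasty total 952816. The counts for ci, five-character quatrains, five-character regulated verse, seven-character quatrains, and seven-character regulated verse are shown in the table below.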

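Genre Count
Ci (词) 83364
Five-character quatrain (五言绝句) 35574
Five-character regulated verse (五言律诗) 145068
Seven-character quatrain (七言绝句) 196356
Seven-character regulated verse (七言律诗) 217215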
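A genre breakdown restricted to the pre-Qin through Qing range could be sketched as follows. The function name `count_genres` is hypothetical, and the code assumes that each row is a dict with the "朝代" and "体裁" fields and that the two later periods are labeled "近现代" and "当代" in the "朝代" field, as they are in the dynasty table.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumption: these labels mark the periods after the Qing dynasty.
MODERN_PERIODS = {"近现代", "当代"}


def count_genres(rows: Iterable[Dict[str, str]]) -> Counter:
    """Count poems per "体裁" (genre), skipping modern and contemporary rows."""
    return Counter(
        row["体裁"] for row in rows if row["朝代"] not in MODERN_PERIODS
    )
```

The rows can come from any source that yields one dict per poem, for example `csv.DictReader` over the corpus files. `count_genres(rows).most_common(5)` then lists the five most frequent genres.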
