poetry's Introduction

Classical Chinese Poetry Database

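This classical Chinese poetry database was crawled from 古诗文网 in 2017. It holds less data than 古诗文网 itself, but the poem data has been cleaned, organized, and formatted to some degree, which makes it convenient for research or creative projects. The database currently contains detailed data for 73,281 poems and 3,156 poets, and it is already used in two applications: the 诗鲸 Android client and the 诗鲸 WeChat mini program.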

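Data Description

1. gushiwen folder

This folder holds the raw content fetched by the crawler. The view folder contains the individual poems, the author folder contains the individual poets, and the ju folder contains famous lines from poems.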

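2. image folder

This folder holds the poets' portrait images. The file image_xxx.jpg is the portrait of the poet whose id is xxx, and the image URL in each record points to this file.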
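
The naming rule can be sketched in Python as follows. The helper names `portrait_filename` and `portrait_url` are hypothetical, and the base URL is copied from the `image` field of the sample records in this README. Whether every poet actually has a portrait file is not confirmed here.

```python
# Base URL copied from the "image" field of the sample records in this README.
IMAGE_BASE_URL = "https://raw.githubusercontent.com/hujiaweibujidao/poetry/master/image"


def portrait_filename(poet_id: int) -> str:
    """Return the portrait file name for a poet id, following image_xxx.jpg."""
    return f"image_{poet_id}.jpg"


def portrait_url(poet_id: int) -> str:
    """Return the raw GitHub URL of a poet's portrait (hypothetical helper)."""
    return f"{IMAGE_BASE_URL}/{portrait_filename(poet_id)}"


print(portrait_url(247))
```

For poet id 247 this reproduces the URL that appears in the sample records.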

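3. data folder

This folder holds the latest cleaned data. Its poetry subdirectory contains the individual poems, poet contains the individual poets, and aio (all in one) holds files that combine the poet and poem data into a single file.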
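
A minimal sketch of reading records from one of these subdirectories might look like the following. It assumes that each record is stored as a UTF-8 file with a `.json` extension holding one JSON object; this README does not state the file names or extensions, so check your local checkout first. The function name `iter_records` and the path `data/poetry` relative to the working directory are illustrative.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_records(folder: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per *.json file found under `folder`.

    Assumption: each file is UTF-8 and holds a single JSON object.
    """
    for path in sorted(folder.rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            yield json.load(handle)


if __name__ == "__main__":
    # Assumed location of a local checkout; adjust to your own path.
    poetry_dir = Path("data") / "poetry"
    if poetry_dir.is_dir():
        count = sum(1 for _ in iter_records(poetry_dir))
        print(f"loaded {count} poem records")
```

Using a generator keeps memory use low, which may matter when walking tens of thousands of files.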

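4. Other folders

The other folders contain older versions of the cleaned data, kept so that the applications stay backward compatible. You can ignore them.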

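Data Structure

1. Poem data

id is the poem's index on 古诗文网 (the site was recently redesigned and no longer uses ids), name is the poem's title, content is the poem's text, dynasty is the dynasty the poem dates from, star is the number of likes the poem had when the data was crawled, poet holds the poet's information, fanyi holds the poem's annotations and translation, shangxi is the appreciation of the poem, and about holds other material about the poem, such as the background of its composition. Anything on 古诗文网 that is neither a translation or annotation nor an appreciation is gathered under about.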

{
  "about": "创作背景\n\n  唐玄宗天宝初年,李白xxx",
  "content": "君不见,黄河之水天上来,奔流到海不复回。xxx",
  "dynasty": "唐代",
  "fanyi": "译文\n你难道看不见那黄河之水从天上奔腾而来,波涛翻滚直奔东海,从不再往回流。xxx",
  "id": 7722,
  "name": "将进酒",
  "poet": {
    "desc": "李白(701年-762年),字太白,号青莲居士,唐朝浪漫主义诗人,被后人誉为“诗仙”。xxx",
    "id": 247,
    "image": "https://raw.githubusercontent.com/hujiaweibujidao/poetry/master/image/image_247.jpg",
    "name": "李白",
    "star": 0
  },
  "shangxi": "鉴赏\n\n  将进酒,唐代以前乐府歌曲的一个题目,内容大多咏唱饮酒放歌之事。xxx",
  "star": 32615,
  "tags": [
    "乐府",
    "唐诗三百首",
    "咏物",
    "抒情",
    "哲理",
    "宴饮"
  ]
}
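
As an illustration, a record shaped like the one above could be mapped onto typed Python objects. The class names `Poem` and `PoetRef` and the function `parse_poem` are hypothetical, and treating fanyi, shangxi, about, and tags as optional is an assumption, since this README does not say which fields are always present. The sample dict is a shortened copy of the record above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PoetRef:
    """Poet summary embedded in a poem record."""
    id: int
    name: str
    desc: str = ""
    image: str = ""
    star: int = 0


@dataclass
class Poem:
    """One poem record; text fields default to empty when missing."""
    id: int
    name: str
    content: str
    dynasty: str
    poet: PoetRef
    star: int = 0
    fanyi: str = ""
    shangxi: str = ""
    about: str = ""
    tags: List[str] = field(default_factory=list)


def parse_poem(raw: dict) -> Poem:
    """Build a Poem from a decoded JSON object (assumed optional fields use .get)."""
    poet = raw["poet"]
    return Poem(
        id=raw["id"],
        name=raw["name"],
        content=raw["content"],
        dynasty=raw["dynasty"],
        poet=PoetRef(
            id=poet["id"],
            name=poet["name"],
            desc=poet.get("desc", ""),
            image=poet.get("image", ""),
            star=poet.get("star", 0),
        ),
        star=raw.get("star", 0),
        fanyi=raw.get("fanyi", ""),
        shangxi=raw.get("shangxi", ""),
        about=raw.get("about", ""),
        tags=list(raw.get("tags", [])),
    )


# Shortened copy of the sample record, with long text fields omitted.
sample = {
    "id": 7722,
    "name": "将进酒",
    "content": "君不见,黄河之水天上来,奔流到海不复回。",
    "dynasty": "唐代",
    "star": 32615,
    "poet": {"id": 247, "name": "李白", "star": 0},
    "tags": ["乐府", "宴饮"],
}

poem = parse_poem(sample)
print(poem.name, poem.poet.name, poem.tags)
```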

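2. Poet data

id is the poet's index on 古诗文网 (the site was recently redesigned and no longer uses ids), name is the poet's name, desc is a short biography, content is the detailed biography, dynasty is the poet's dynasty, and star is the number of likes the poet had when the data was crawled.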

{
  "content": "轶事典故\n\n姓名由来\nxxx",
  "desc": "李白(701年-762年),字太白,号青莲居士,唐朝浪漫主义诗人,被后人誉为“诗仙”。xxx",
  "dynasty": "唐代",
  "id": 247,
  "image": "https://raw.githubusercontent.com/hujiaweibujidao/poetry/master/image/image_247.jpg",
  "name": "李白",
  "star": 4895
}
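
In the samples in this README, the poet object embedded in the poem record has star 0 and no content or dynasty field, while the standalone poet record has star 4895. Assuming the embedded id matches the standalone id (as it does for 247 in these samples), the full poet record can be looked up by id. The function names `index_poets` and `full_poet_for` are hypothetical, and the dicts below are shortened copies of the samples.

```python
from typing import Dict, Iterable, Optional


def index_poets(poets: Iterable[dict]) -> Dict[int, dict]:
    """Map each poet id to its full poet record."""
    return {poet["id"]: poet for poet in poets}


def full_poet_for(poem: dict, poets_by_id: Dict[int, dict]) -> Optional[dict]:
    """Return the full poet record for a poem, or None if the id is unknown."""
    return poets_by_id.get(poem["poet"]["id"])


# Shortened copies of the sample records in this README.
poets = [{"id": 247, "name": "李白", "dynasty": "唐代", "star": 4895}]
poem = {"id": 7722, "name": "将进酒", "poet": {"id": 247, "name": "李白", "star": 0}}

poet = full_poet_for(poem, index_poets(poets))
print(poet["name"], poet["dynasty"], poet["star"])
```

Building the index once and reusing it avoids scanning the poet list for every poem.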

LICENSE

GNU General Public License version 3

Copyright (c) 2018 Javayhu. All rights reserved.

poetry's Issues

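Which crawler did you use?

First of all, thank you for your generous contribution. This dataset is great, thanks!
Second, I would like to ask how the crawler was implemented. What did you use?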
