novel-kg

Building a character-relationship knowledge graph for Jin Yong's novels


Environment

Python 3.6+
MongoDB
Neo4j

⚠️ Start MongoDB and Neo4j before running anything below.

Directory structure

|- 
  |- crawl-baike  crawls Baidu Baike
  |- crawl-novel  crawls the novels
  |- kgqa  knowledge-graph question answering
  |- mongo2neo  imports MongoDB data into Neo4j

Usage

1. Crawl the Jin Yong novel data

With the MongoDB process running, run the spider file xiaoshuo_spider.py. It fetches the novel text and stores it in MongoDB.

cd crawl-baike
scrapy crawl spider_xiaoshuo
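
The README does not show how the spider writes to MongoDB. As a rough illustration only, a Scrapy item pipeline along the following lines could do it. The class name, the connection URI, the database and collection names, and the item fields `novel`, `chapter` and `text` are all assumptions made for this sketch, not details taken from the project.

```python
class MongoNovelPipeline:
    """Scrapy item pipeline that upserts each crawled chapter into MongoDB."""

    def __init__(self, uri="mongodb://localhost:27017", db_name="novel_kg",
                 collection_name="novels", collection=None):
        # Every default here is an assumption, not a value from the project.
        self.uri = uri
        self.db_name = db_name
        self.collection_name = collection_name
        self.collection = collection  # may be injected, e.g. for testing
        self.client = None

    def open_spider(self, spider):
        # Called by Scrapy when the spider starts.
        if self.collection is None:
            # Imported lazily so the class can be exercised without pymongo.
            import pymongo
            self.client = pymongo.MongoClient(self.uri)
            self.collection = self.client[self.db_name][self.collection_name]

    def close_spider(self, spider):
        # Called by Scrapy when the spider finishes.
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        doc = dict(item)
        # Upsert on (novel, chapter) so re-running the crawl
        # does not create duplicate chapters.
        key = {"novel": doc["novel"], "chapter": doc["chapter"]}
        self.collection.update_one(key, {"$set": doc}, upsert=True)
        return item
```

A pipeline like this would be enabled through Scrapy's `ITEM_PIPELINES` setting in the project's settings.py; the module path to use there depends on the actual project layout.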

2. Crawl character relationships

  • Run the conversion script convert.py to export the novel data from MongoDB to local text files.
cd crawl-novel
python convert.py
  • Run extract_persons.py to perform lexical analysis on the novel text and extract person names.
python extract_persons.py
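  • Run the spider, which looks up each person name on Baidu Baike, crawls the related attributes and relationships, and stores them in MongoDB.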
scrapy crawl person_spider
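
The name-extraction step can be pictured with a small sketch. The README does not say which lexical analyzer extract_persons.py uses, so the function below takes already tagged `(word, tag)` pairs and keeps words carrying the `nr` tag, which taggers such as jieba use for person names. The function name and the `min_count` and `min_length` thresholds are hypothetical.

```python
from collections import Counter


def extract_person_names(tagged_words, min_count=2, min_length=2):
    """Return (name, count) pairs for words tagged 'nr', most frequent first.

    tagged_words: iterable of (word, pos_tag) pairs from any POS tagger.
    min_count / min_length: hypothetical filters to drop rare or
    single-character matches, which are often tagging noise.
    """
    counts = Counter(
        word for word, tag in tagged_words
        if tag == "nr" and len(word) >= min_length
    )
    return [(name, n) for name, n in counts.most_common() if n >= min_count]


# Hand-tagged sample standing in for real tagger output.
tagged = [
    ("张无忌", "nr"), ("道", "v"), ("周芷若", "nr"), ("张无忌", "nr"),
    ("光明顶", "ns"), ("周芷若", "nr"), ("张无忌", "nr"), ("赵敏", "nr"),
]
names = extract_person_names(tagged)
print(names)  # [('张无忌', 3), ('周芷若', 2)]
```

Filtering by frequency is one simple way to keep the list of names passed to the Baidu Baike spider short; the actual script may use a different rule.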

3. MongoDB to Neo4j

Run the conversion script mongo2neo.py to import the data from MongoDB into Neo4j.

cd mongo2neo
python mongo2neo.py
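
To give a feel for what such an import involves, here is a minimal sketch that turns one relationship document into a parameterized Cypher statement. The document fields `name`, `target` and `relation`, the `Person` label and the `RELATION` relationship type are assumptions for illustration; the schema that mongo2neo.py actually uses is not described in this README.

```python
def relation_to_cypher(doc):
    """Build a Cypher query and its parameters for one relationship document.

    doc: dict with the (assumed) keys 'name', 'target' and 'relation'.
    MERGE is used instead of CREATE so that importing the same
    document twice does not duplicate nodes or edges.
    """
    query = (
        "MERGE (a:Person {name: $source}) "
        "MERGE (b:Person {name: $target}) "
        "MERGE (a)-[r:RELATION {type: $relation}]->(b)"
    )
    params = {
        "source": doc["name"],
        "target": doc["target"],
        "relation": doc["relation"],
    }
    return query, params


# Example document, shaped the way this sketch assumes.
query, params = relation_to_cypher(
    {"name": "张无忌", "target": "张翠山", "relation": "父亲"}
)
```

Cypher does not allow a relationship type to be passed as a parameter, which is why this sketch stores the relation name as a property on a fixed `RELATION` type. With the official neo4j Python driver, the pair could be executed as `session.run(query, params)`.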

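Results

Character-relationship knowledge graph

Relationship graph of all characters (image: persons relations)

Relationship graph for "张无忌" (Zhang Wuji) (image: 张无忌)

Knowledge-graph question-answering system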

cd kgqa
python app.py

System architecture (image: wenda index)

Q&A about 张无忌 (Zhang Wuji) (image: wenda zhangwuji)

Q&A about 周芷若 (Zhou Zhiruo) (image: wenda zhouzhiruo)
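
How kgqa maps a question to a graph query is not described here. One common and simple approach is template matching, sketched below purely as an illustration: a question of the form "X的Y是谁" ("who is X's Y") is turned into a Cypher lookup. The regular expression, the function name, and the graph schema (`Person` nodes linked by `RELATION` edges with a `type` property) are all hypothetical.

```python
import re

# Matches questions such as "张无忌的父亲是谁?" (who is Zhang Wuji's father?).
QUESTION_PATTERN = re.compile(
    r"^(?P<person>.+?)的(?P<relation>.+?)是谁[??]?$"
)


def question_to_cypher(question):
    """Return (query, params) for a supported question, or None otherwise."""
    match = QUESTION_PATTERN.match(question.strip())
    if match is None:
        return None
    # Assumed schema: (:Person)-[:RELATION {type: ...}]->(:Person)
    query = (
        "MATCH (a:Person {name: $person})"
        "-[r:RELATION {type: $relation}]->(b:Person) "
        "RETURN b.name AS answer"
    )
    params = {
        "person": match.group("person"),
        "relation": match.group("relation"),
    }
    return query, params


result = question_to_cypher("张无忌的父亲是谁?")
```

Questions that do not fit the template return `None`, so a caller can fall back to a default reply. A real system would likely need more templates or a proper intent classifier.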

Contributors

liuyuzhangolvz
