
byronthecoder / novel-kg


This project forked from liuyuzhangolvz/novel-kg


Building a character-relationship knowledge graph from Jin Yong's novels

License: MIT License

Languages: Python 64.40%, CSS 3.56%, HTML 32.04%


novel-kg



Environment

Python 3.6+
MongoDB
Neo4j

⚠️ Start MongoDB and Neo4j first.
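Before running the pipeline it can help to confirm that both databases are up. The following is a minimal sketch, assuming the servers run locally on their default ports (27017 for MongoDB, 7687 for Neo4j's Bolt protocol). It only checks that something accepts TCP connections on those ports, not that it is the right service or that credentials work.

```python
import socket

# Default ports (assumed unchanged): MongoDB 27017, Neo4j Bolt 7687.
DEFAULT_SERVICES = {"MongoDB": 27017, "Neo4j (Bolt)": 7687}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host not resolvable
        return False


def check_services(host: str = "localhost") -> dict:
    """Map each service name to whether its default port is reachable."""
    return {name: is_port_open(host, port) for name, port in DEFAULT_SERVICES.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'reachable' if up else 'NOT reachable'}")
```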

Directory structure

|- 
  |- crawl-baike  crawl Baidu Baike
  |- crawl-novel  crawl the novels
  |- kgqa  knowledge-graph question-answering system
  |- mongo2neo  import MongoDB data into Neo4j

Usage

1. Crawl the Jin Yong novel texts

Start the MongoDB process, then run the spider in xiaoshuo_spider.py; it crawls the novel texts and stores them in MongoDB.

cd crawl-baike
scrapy crawl spider_xiaoshuo
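The README does not show the contents of xiaoshuo_spider.py. As an illustration of the parsing step such a spider performs, the sketch below pulls paragraph text out of a chapter page with the standard-library html.parser. It assumes (hypothetically) that chapter text sits in <p> elements, and the book/title/content field names of the returned dict are invented for illustration. In a Scrapy project a dict like this would typically be yielded as an item and written to MongoDB by an item pipeline.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of every <p> element in an HTML page."""

    def __init__(self):
        super().__init__()
        self._in_p = False
        self._buf = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            text = "".join(self._buf).strip()
            if text:  # skip empty or whitespace-only paragraphs
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:  # ignore text outside <p>, e.g. headings and menus
            self._buf.append(data)


def chapter_document(book, title, html):
    """Build a MongoDB-style document from one chapter page (hypothetical fields)."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return {"book": book, "title": title, "content": "\n".join(parser.paragraphs)}
```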

2. Crawl character relationships

  • Run the conversion script convert.py to export the novel data in MongoDB to local text files.
cd crawl-novel
python convert.py
  • Run extract_persons.py to perform lexical analysis on the novel texts and extract person names.
python extract_persons.py
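  • Run the spider, which looks up each extracted name on Baidu Baike, crawls that person's attributes and relationships, and stores them in MongoDB.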
scrapy crawl person_spider
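The README only says that extract_persons.py performs lexical analysis to extract person names. One common way to do this for Chinese text is part-of-speech tagging, where tag sets such as jieba's mark person names with nr. The sketch below is an assumption, not the project's code: it takes (word, tag) pairs from such a tagger, keeps multi-character tokens tagged nr, and drops names seen fewer than min_count times to filter out one-off mis-tags.

```python
from collections import Counter
from typing import Iterable, List, Tuple

PERSON_TAG = "nr"  # person-name tag in jieba's (ICTCLAS-style) tag set


def extract_person_names(
    tagged: Iterable[Tuple[str, str]], min_count: int = 2
) -> List[Tuple[str, int]]:
    """Return (name, count) pairs for tokens tagged as person names.

    Single-character tokens are ignored, and names seen fewer than
    min_count times are dropped as likely tagging noise. Results are
    ordered from most to least frequent.
    """
    counts = Counter(
        word for word, tag in tagged if tag == PERSON_TAG and len(word) > 1
    )
    return [(name, n) for name, n in counts.most_common() if n >= min_count]
```

With jieba installed, the input pairs could come from `[(p.word, p.flag) for p in jieba.posseg.cut(text)]`; loading a custom dictionary of character names with `jieba.load_userdict` may improve recall for unusual names.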

3. MongoDB to Neo4j

Run the conversion script mongo2neo.py to import the data in MongoDB into Neo4j.

cd mongo2neo
python mongo2neo.py
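mongo2neo.py itself is not shown. The following is a minimal sketch of such a conversion, assuming a hypothetical MongoDB document shape of {"name": ..., "relations": [{"relation": ..., "target": ...}]}. It turns each relation into a parameterized Cypher MERGE statement, so re-running the import does not duplicate nodes or edges. Cypher cannot parameterize relationship types, so the sketch uses a single RELATION type and keeps the Baike relation label (e.g. 母亲, "mother") as a property.

```python
from typing import Dict, Iterable, List, Tuple

# One fixed relationship type; the relation label is stored as a property
# because Cypher does not allow relationship types as parameters.
MERGE_QUERY = (
    "MERGE (a:Person {name: $source}) "
    "MERGE (b:Person {name: $target}) "
    "MERGE (a)-[:RELATION {label: $label}]->(b)"
)


def build_merge_statements(docs: Iterable[Dict]) -> List[Tuple[str, Dict[str, str]]]:
    """Turn person documents into (query, params) pairs for a Neo4j session.

    Each doc is assumed (hypothetically) to look like
    {"name": "张无忌", "relations": [{"relation": "母亲", "target": "殷素素"}]}.
    """
    statements = []
    seen = set()
    for doc in docs:
        source = doc["name"]
        for rel in doc.get("relations", []):  # persons without relations add nothing
            key = (source, rel["relation"], rel["target"])
            if key in seen:
                continue  # skip exact duplicates within this batch
            seen.add(key)
            params = {"source": source, "label": rel["relation"], "target": rel["target"]}
            statements.append((MERGE_QUERY, params))
    return statements
```

Each (query, params) pair could be passed to `session.run(query, params)` from the official neo4j Python driver, with the documents read through pymongo's `collection.find()`.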

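Results

Character-relationship knowledge graph

Relationship graph of all characters [figure: persons relations]

Relationship graph for Zhang Wuji (张无忌) [figure: 张无忌]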

Knowledge-graph question-answering system

cd kgqa
python app.py
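How app.py answers questions is not described here. A common approach for this kind of system is template matching: recognize a question pattern, extract the entities, and fill in a Cypher query. The sketch below handles one hypothetical template, "X的Y是谁" ("Who is X's Y?"), and assumes a graph of Person nodes joined by RELATION edges carrying a label property. Both the template and the schema are assumptions, not the project's actual design.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical template "X的Y是谁" ("Who is X's Y?"): X stops at the first 的,
# Y is everything up to the trailing 是谁, with an optional question mark.
QUESTION = re.compile(r"^(?P<person>[^的]+?)的(?P<relation>.+?)是谁[?？]?$")

# Assumed schema: (:Person {name})-[:RELATION {label}]->(:Person {name})
ANSWER_QUERY = (
    "MATCH (a:Person {name: $person})-[:RELATION {label: $relation}]->(b:Person) "
    "RETURN b.name AS answer"
)


def parse_question(question: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Map a supported question to (Cypher query, parameters), else None."""
    match = QUESTION.match(question.strip())
    if match is None:
        return None
    params = {"person": match.group("person"), "relation": match.group("relation")}
    return ANSWER_QUERY, params
```

Adding more templates (for example, "X和Y是什么关系", "What is the relationship between X and Y?") would mean one more pattern and query pair per question type; this keeps each supported question explicit, at the cost of rejecting any phrasing no template covers.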

System architecture [figure: wenda index]

Q&A about Zhang Wuji (张无忌) [figure: wenda zhangwuji]

Q&A about Zhou Zhiruo (周芷若) [figure: wenda zhouzhiruo]

Contributors

liuyuzhangolvz
