
chunxi-alpc / textgrapher-1 Goto Github PK


This project forked from liuhuanyong/textgrapher


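Text Content Grapher based on key-information extraction using NLP methods. Given an input document, the project extracts its key information, structures it, and organizes it into a graph, producing a graph-based presentation of the article's semantic content.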

Python 100.00%

textgrapher-1's Introduction

TextGrapher

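Project Introduction

Representing the content of an input text as concisely and faithfully as possible, in a structured, graph-based form, is a hard problem. This project makes an attempt at it: given an input document, it extracts the key information, structures it, and organizes it into a graph that presents the article's semantic content.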

Usage

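from text_grapher import CrimeMining

# Replace the placeholder with the text you want to analyze.
content = 'The text you want to analyze'
handler = CrimeMining()
# Extract key information and write the graph visualization.
handler.main(content)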

The result is saved in the file graph.html.
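If the document lives in a file rather than in a string literal, it can be loaded first and then passed to `handler.main`. The sketch below is only an illustration of that step and is not part of the project: `read_document` and its default list of encodings are hypothetical, chosen on the assumption that Chinese text files are usually saved as UTF-8 or in a GB-family encoding.

```python
from pathlib import Path


def read_document(path, encodings=("utf-8", "gb18030")):
    """Return the text of `path`, trying each encoding in order.

    Hypothetical helper (not part of text_grapher): the file is read as
    bytes and decoded with the first encoding that succeeds.
    """
    raw = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError(f"could not decode {path} with any of {encodings}")
```

The returned string can then be used as `content` in the usage snippet above.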

Example Events

1) ZTE incident (image)

2) Wei Zexi incident (image)

3) Lei Yang incident (image)

4) Classmate murder case (image)

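Summary

1) Representing the content of an input text as concisely and faithfully as possible, in a structured, graph-based form, is a hard problem.
2) This project extracts information using high-frequency words, keywords, named entity recognition, and subject-verb-object phrase recognition, and attempts to organize these three types of information as a graph. This representation is an experiment.
3) Named entity recognition and key-information extraction are limited by the performance of the underlying NLP tools, and the algorithms and approach still have a number of shortcomings.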
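To make the idea of turning extracted words into a graph more concrete, here is a minimal sketch. It is not the project's implementation: it assumes the text has already been split into sentences and tokens (the project relies on LTP for that), and the helper names `top_words` and `cooccurrence_edges`, as well as the toy sentences, are hypothetical. It picks the most frequent tokens and links two of them whenever they occur in the same sentence.

```python
from collections import Counter
from itertools import combinations


def top_words(tokens, k=3, stopwords=frozenset()):
    """Return the k most frequent tokens, ignoring stopwords."""
    counts = Counter(t for t in tokens if t not in stopwords)
    return [word for word, _ in counts.most_common(k)]


def cooccurrence_edges(sentences, keep):
    """Count how often two kept words appear in the same sentence."""
    keep = set(keep)
    edges = Counter()
    for sentence in sentences:
        # Sort so that each pair is always stored in the same order.
        present = sorted(set(sentence) & keep)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


# Toy, pre-tokenized input (illustrative only).
sentences = [
    ["ZTE", "faces", "export", "ban"],
    ["ZTE", "appeals", "ban"],
    ["ban", "affects", "suppliers"],
]
tokens = [t for s in sentences for t in s]
keep = top_words(tokens, k=2)
edges = cooccurrence_edges(sentences, keep)
```

Each key in `edges` is a pair of words that can be drawn as a graph edge, with the count usable as an edge weight.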

Question?

The original version had a few recurring problems, such as garbled text and empty lists. These are fixed in this version.

Environment Setup

1) Python 3.6

2) Install pyltp using pyltp-0.2.1-cp36-cp36m-win_amd64.whl.

3) Download LTP from http://ltp.ai/download.html and use version 3.3.1.

https://github.com/chunxi-alpc/TextGrapher-1/blob/master/image/版本说明.png

1) Create a new folder named ltp.
2) Put the ltp_data folder and the contents extracted from ltp-3.3.1-win-x86-Release.zip into ltp.
3) In sentence_parser.py, change LTP_DIR to point to the new ltp folder.
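A quick check that the model directory is set up correctly can save a confusing load error later. The sketch below is an assumption, not part of the project: `missing_ltp_models` is a hypothetical helper, and the file names in `EXPECTED_MODELS` are the ones LTP 3.x model packages commonly ship, so adjust them to whatever sentence_parser.py actually loads.

```python
import os

# Assumed model file names; adjust to match what sentence_parser.py loads.
EXPECTED_MODELS = ("cws.model", "pos.model", "ner.model", "parser.model")


def missing_ltp_models(ltp_dir, expected=EXPECTED_MODELS):
    """Return the expected model files that are absent from ltp_dir."""
    return [
        name for name in expected
        if not os.path.isfile(os.path.join(ltp_dir, name))
    ]
```

Call it with the same directory that LTP_DIR points to; an empty list means every expected file was found.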

