This project aims to reveal entity relations by randomly selecting documents from the Reuters corpus (from nltk.corpus import reuters), running them through an information extraction pipeline, and transferring the resulting relationships to a Neo4j graph database.
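As a rough illustration of the random selection step, the sketch below samples a few Reuters documents through NLTK. The helper names (sample_file_ids, load_reuters_sample), the sample size, and the seed are assumptions made for this example; the notebooks in the repository may do this differently.

```python
import random


def sample_file_ids(file_ids, k, seed=0):
    """Pick k document ids at random, reproducibly for a given seed."""
    rng = random.Random(seed)
    return rng.sample(list(file_ids), k)


def load_reuters_sample(k=5, seed=0):
    """Return {file_id: raw_text} for k randomly chosen Reuters documents."""
    # Imported lazily so the sampling helper above works without NLTK installed.
    import nltk
    from nltk.corpus import reuters

    nltk.download("reuters", quiet=True)  # fetches the corpus if it is missing
    chosen = sample_file_ids(reuters.fileids(), k, seed)
    return {fid: reuters.raw(fid) for fid in chosen}
```

Calling load_reuters_sample(k=5) needs NLTK installed and, on first use, network access to fetch the corpus.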
Text Cleaning : Text cleaning is the process of preparing raw text for NLP (Natural Language Processing) so that machines can understand human language.
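A minimal cleaning sketch, assuming the raw text only needs HTML entities decoded, Unicode variants normalised, and whitespace collapsed. The function name clean_text and the choice of steps are illustrative assumptions; case and punctuation are left untouched because later steps such as NER tend to rely on them.

```python
import html
import re
import unicodedata


def clean_text(raw):
    """Normalise raw news text while keeping case and punctuation."""
    text = html.unescape(raw)                   # "&amp;" -> "&", "&lt;" -> "<"
    text = unicodedata.normalize("NFKC", text)  # unify Unicode variants
    text = re.sub(r"\s+", " ", text)            # collapse newlines and space runs
    return text.strip()
```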
Named-entity Recognition : Named-entity recognition (NER) is an information extraction task that seeks to locate and classify named entities in text into pre-defined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.
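The sketch below shows how entities and their categories could be pulled out of a text. It assumes a spaCy-style pipeline, that is, a callable returning a document whose ents carry text and label_ attributes; this README does not say which NER library the project uses, so spaCy and the en_core_web_sm model are assumptions here.

```python
def extract_entities(text, nlp):
    """Return (entity_text, label) pairs found by a spaCy-style pipeline."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


def load_default_pipeline():
    """Load spaCy's small English model (it must be installed separately)."""
    import spacy  # lazy import: only needed when a real model is used

    return spacy.load("en_core_web_sm")
```

With the model installed, extract_entities(text, load_default_pipeline()) returns pairs such as an organisation name with the label ORG.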
Coreference Resolution : Coreference resolution is the task of finding all expressions that refer to the same entity in a text. It is an important step for higher-level NLP tasks such as document summarization, question answering, and information extraction.
Entity Linking : Entity linking is the name we give to finding the connections between the entities (names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages) that emerge after performing named-entity recognition.
Relationship Extraction : Relationship extraction is the task of extracting semantic relationships from a text. Extracted relationships usually occur between two or more entities of a certain type (e.g. Person, Organisation, Location) and fall into a number of semantic categories (e.g. married to, employed by, lives in).
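To make the idea concrete, here is a toy pattern-based extractor. It is not the project's pipeline: the surface patterns, the relation labels, and the use of capitalised word runs as a stand-in for NER output are all assumptions chosen to mirror the example categories above.

```python
import re

# Hypothetical surface patterns mapped to relation labels.
PATTERNS = {
    "is married to": "MARRIED_TO",
    "is employed by": "EMPLOYED_BY",
    "lives in": "LIVES_IN",
}

# A run of capitalised words stands in for an entity mention found by NER.
NAME = r"[A-Z]\w*(?: [A-Z]\w*)*"


def extract_relations(sentence):
    """Return (source, relation, target) triples matched in one sentence."""
    triples = []
    for phrase, label in PATTERNS.items():
        regex = re.compile(rf"({NAME}) {re.escape(phrase)} ({NAME})")
        for match in regex.finditer(sentence):
            triples.append((match.group(1), label, match.group(2)))
    return triples
```

A real pipeline would typically rely on dependency parses or a trained model rather than fixed phrases, since patterns like these miss most phrasings.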
Graph : The knowledge graph is created with Neo4j. Neo4j stores and manages data in its more natural, connected state, maintaining data relationships that deliver fast queries and deeper context for analytics, and helps us to understand relations.
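One way to move extracted relations into Neo4j is to export them as a CSV file and import it with Cypher's LOAD CSV. The sketch below assumes triples of the form (source, relation, target); the file name triples.csv, the Entity label, and the RELATION type are placeholders, not the schema used by this project.

```python
import csv
import io

# Cypher that Neo4j could run against the exported file; the Entity label,
# RELATION type and triples.csv name are placeholders, not the project's schema.
LOAD_QUERY = """
LOAD CSV WITH HEADERS FROM 'file:///triples.csv' AS row
MERGE (s:Entity {name: row.source})
MERGE (t:Entity {name: row.target})
MERGE (s)-[:RELATION {type: row.relation}]->(t)
"""


def triples_to_csv(triples, handle):
    """Write (source, relation, target) triples as CSV with a header row."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["source", "relation", "target"])
    for source, relation, target in triples:
        writer.writerow([source, relation, target])


buffer = io.StringIO()
triples_to_csv([("Alice Smith", "EMPLOYED_BY", "Acme Corp")], buffer)
print(buffer.getvalue())
```

Using MERGE rather than CREATE keeps the import idempotent, so re-running it does not duplicate nodes or relationships.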
You can find out how to use the data by following this link: Reuters
- This dataset contains the text of 10,788 news documents totaling 1.3 million words and is publicly available.
- Pull requests are welcome.
- Or clone the repository:
https://github.com/yusufakcakaya/Algorythm-NLP-Entity-Recognition.git
Algorythm-NLP-Entity-Recognition
|
|__ __ datasets : datasets for testing
|
|__ __ 01.Data_exploration.ipynb : get Reuters data from nltk
|
|__ __ 02.I.E.Pipeline : get entity relations
|
|__ __ 03.B.K.G_using_NER.ipynb : creating .csv for neo4j
|
|__ __ 04.neo4j.ipynb : .py version of neo4j
|
|__ __ KeatingDataset.csv : .csv file for neo4j
|
|__ __ README.md : explains the project
|
|__ __ data.csv : .csv file for KeatingDataset.csv
|
|__ __ requirements.txt
Some visualisations:
It shows the relation between source and target.
We can easily see the relations between entities.
The design and construction phases of the project were carried out by 3 collaborators (Arfa Meher, Nichelle Pinto Machado, Yusuf Akcakaya).
- Type of Challenge: Learning
- Duration: 2 weeks
- Deadline: 20/01/2021 4:30 PM
- Team challenge : group