Data is processed, transformed, and loaded into the Neo4j graph database. Using the cleaned and modelled data, authors are disambiguated, reviewers are recommended for incoming publications, and the most influential authors are identified.
The sample data files, in CSV format, are:
- publications.csv,
- authors.csv,
- topics.csv,
- publications_incoming.csv.
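These files can be explored with pandas before loading. A minimal sketch of the kind of duplicate-author check the cleaning step calls for, assuming `author_id` and `name` columns (the real column names may differ):

```python
import pandas as pd

def normalize_name(name: str) -> str:
    # Lowercase, drop dots, and collapse whitespace so that
    # "J. Smith" and "j smith" compare equal.
    return " ".join(name.lower().replace(".", " ").split())

def possible_duplicate_authors(authors: pd.DataFrame, name_col: str = "name") -> pd.DataFrame:
    # Return every row whose normalized name occurs more than once.
    normalized = authors[name_col].map(normalize_name)
    dupes = normalized.duplicated(keep=False)
    return authors[dupes].assign(normalized=normalized[dupes])

# Toy example; the real data lives under data/authors.csv.
authors = pd.DataFrame({"author_id": [1, 2, 3],
                        "name": ["J. Smith", "j smith", "A. Jones"]})
print(possible_duplicate_authors(authors))
```

Flagged groups would still need manual or heuristic review before merging, since distinct authors can share a name.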
- Draft the initial data model (nodes, relationships, and labels) and an ETL strategy to load the relevant data from the publications, authors, and topics CSV files into Neo4j.
- Clean up the datasets, handling, for example, possibly duplicated authors.
- Recommend a group of people to review the incoming publications.
- Identify the most influential authors.
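A minimal sketch of the loading step, using batched, idempotent `MERGE` statements. The labels (`Author`, `Publication`), key properties (`author_id`, `publication_id`), and the `AUTHORED` relationship type are assumptions for illustration; the actual model is in graphDB_model.svg.

```python
def merge_author_cypher() -> str:
    # UNWIND a batch of rows and MERGE on a stable key so that
    # re-running the load does not create duplicate nodes.
    return (
        "UNWIND $rows AS row "
        "MERGE (a:Author {author_id: row.author_id}) "
        "SET a.name = row.name"
    )

def merge_authored_cypher() -> str:
    # Connect each author to the publications they wrote.
    return (
        "UNWIND $rows AS row "
        "MATCH (a:Author {author_id: row.author_id}) "
        "MATCH (p:Publication {publication_id: row.publication_id}) "
        "MERGE (a)-[:AUTHORED]->(p)"
    )

# With the official neo4j driver, a batch would be executed as, e.g.:
# from neo4j import GraphDatabase
# driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
# with driver.session() as session:
#     session.run(merge_author_cypher(), rows=author_rows)
```

Batching rows through `UNWIND` keeps round-trips low, and `MERGE` on a single key property makes the load safe to repeat.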
- Assignment for Knowledge Graph Engineer.pdf file describing the assessment.
- assignment_slideck.pdf file is the slide deck describing the solution process.
- graphDB_model.svg depicts the graph data model.
- data/ folder. Contains the data CSV files.
- notebooks/ folder. Contains the Jupyter notebooks:
- to perform initial data exploration (exploration.ipynb),
- to run the graph data science algorithms (analysis.ipynb).
- src/ folder. Contains the Python script to load the data into the graph database (etl_pandas.py).
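The influence analysis in analysis.ipynb presumably uses the Neo4j Graph Data Science library. A hedged sketch of what a PageRank ranking of authors could look like; the projected graph name, the `Author` label, and the `CO_AUTHOR` relationship type are assumptions, not taken from the actual model:

```python
# Project an in-memory graph of authors and their co-authorship links,
# then stream PageRank scores as a proxy for author influence.
PROJECT = "CALL gds.graph.project('authors', 'Author', 'CO_AUTHOR')"

PAGERANK = (
    "CALL gds.pageRank.stream('authors') "
    "YIELD nodeId, score "
    "RETURN gds.util.asNode(nodeId).name AS author, score "
    "ORDER BY score DESC LIMIT 10"
)

# Executed against a running Neo4j instance with the official driver, e.g.:
# from neo4j import GraphDatabase
# driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
# with driver.session() as session:
#     session.run(PROJECT)
#     top_authors = session.run(PAGERANK).data()
```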