Neo4j Graph Database for Olympic Athletes

This repository contains the input CSV and the Cypher queries used to create and manipulate a graph database of Olympic athletes in Neo4j.

The original CSV is a Kaggle dataset that was scraped from a sports enthusiasts' website. A modified copy is included in this repository because (1) some irrelevant and/or duplicate rows were removed, and (2) a row ID column was added to serve as the unique property of the Participation node (see below).
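The two preprocessing changes can be sketched with Python's standard csv module. This is only a sketch under assumptions, not the script actually used: the README does not say how irrelevant rows were identified or what the added column is called, so this version drops only exact duplicates and uses the hypothetical column name `row_id`.

```python
import csv
import io


def add_row_ids(src, dst, id_column="row_id"):
    """Copy CSV rows from src to dst, dropping exact duplicates and
    prepending a 1-based row ID column (the column name is hypothetical).

    Returns the number of data rows written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    writer.writerow([id_column] + header)
    seen = set()
    row_id = 0
    for row in reader:
        key = tuple(row)
        if key in seen:  # exact duplicate of an earlier row: skip it
            continue
        seen.add(key)
        row_id += 1
        writer.writerow([row_id] + row)
    return row_id


# Tiny in-memory example; quoted fields such as "1,000 metres" survive the round trip.
sample = io.StringIO(
    "ID,Name,Event\n"
    "5,Aaftink,500 metres\n"
    "5,Aaftink,500 metres\n"
    '5,Aaftink,"1,000 metres"\n'
)
out = io.StringIO()
written = add_row_ids(sample, out)
```

With real files, open both the input and the output with `newline=""`, as the csv module documentation recommends.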

A sample of the CSV follows:

| ID | Name | Sex | Age | Height | Weight | Team | NOC | Games | Year | Season | City | Sport | Event | Medal |
|----|------|-----|-----|--------|--------|------|-----|-------|------|--------|------|-------|-------|-------|
| 4 | Edgar Lindenau Aabye | M | 34 | NA | NA | Denmark/Sweden | DEN | 1900 Summer | 1900 | Summer | Paris | Tug-Of-War | Tug-Of-War Men's Tug-Of-War | Gold |
| 5 | Christine Jacoba Aaftink | F | 21 | 185 | 82 | Netherlands | NED | 1988 Winter | 1988 | Winter | Calgary | Speed Skating | Speed Skating Women's 500 metres | NA |
| 5 | Christine Jacoba Aaftink | F | 21 | 185 | 82 | Netherlands | NED | 1988 Winter | 1988 | Winter | Calgary | Speed Skating | Speed Skating Women's 1,000 metres | NA |

Contents

  1. sample_creation.cyp is a script that loads a small sample of the data with simplified relations, for illustration.

  2. queries.cyp contains a series of queries for learning Cypher, Neo4j's query language. Most of them (up to and including query 17) can be run on the sample graph database created by the script above. The queries illustrate:

    • aggregation functions (e.g., count, collect),
    • node identifiers,
    • adding and removing properties (with SET and REMOVE),
    • recursive (variable-length) paths,
    • arithmetic operations,
    • the difference between MERGE and CREATE,
    • the uniqueness constraint,
    • conjunction and disjunction operators,
    • negation and existence operators,
    • the LIMIT and DISTINCT keywords, and
    • the PageRank graph algorithm.
  3. deleting.cyp contains basic queries for deleting relations and nodes and, in a comment, shows how to delete the entire graph database.

  4. bulk_loading.cyp contains the sequence of queries that loads the entire CSV.

  5. The zipped CSV with the athletes' information.
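queries.cyp runs PageRank inside Neo4j; how it is invoked (for example, which plugin or procedure) is not described here. To illustrate what PageRank computes, here is a minimal pure-Python power-iteration sketch on a tiny hypothetical graph of Participation-to-Medal edges. It normalizes scores to sum to 1 and spreads the score of nodes with no outgoing edges uniformly; a Neo4j implementation may scale scores or treat such nodes differently.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of directed (source, target) edges.

    Scores sum to 1; nodes without outgoing edges (dangling nodes) spread
    their score uniformly over all nodes.
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        # Teleport share plus the redistributed dangling mass, same for every node.
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for node in nodes:
            targets = out_links[node]
            for target in targets:
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank


# Hypothetical toy graph: three participations, two of them linked to Gold.
edges = [("p1", "Gold"), ("p2", "Gold"), ("p3", "Silver")]
scores = pagerank(edges)
```

Gold ends up with the highest score because it receives the most incoming edges, which is the intuition behind ranking nodes with PageRank.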

Data modeling

Nodes

The following types of nodes are created:

  • Athlete: an athlete with their basic, in principle "immutable", information such as name, sex, height and weight.
  • Team: e.g., "Denmark"
  • Game: e.g., "Summer 1992"
  • Event: e.g., "Sailing Women's Windsurfer"
  • Sport: e.g., "Sailing"
  • Medal: one of only three possible values: Gold, Silver or Bronze.
  • Participation: see second paragraph below.

Searching for information stored as a property is more expensive than finding it as a node. Also, graph embeddings take nodes and edges into account, not properties. For these reasons, Medal, for example, is modelled as a node rather than as a property.

The last node type, Participation, represents an athlete's participation in an event at a game for a team, with an optional medal. This information is stored both as relations to the corresponding nodes and as properties of the Participation node.

Uniqueness constraints are created on the name property of Team, Game, Event and Sport nodes, on the type property of Medal nodes, and on the identifier properties of Athlete and Participation nodes. Both identifiers come from the CSV and are distinct from Neo4j's internal node identifiers.
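The effect of these constraints, and of MERGE versus CREATE under them, can be pictured with a small in-memory sketch. It is not how Neo4j implements constraints: the `NodeStore` class and its `create`/`merge` methods are hypothetical, and `id` as the key for Athlete and Participation is an assumed property name. `create` mimics CREATE failing on a duplicate unique value, while `merge` mimics MERGE on the unique key, returning the existing node instead of creating a second one.

```python
# Unique key per label, following the constraints listed above.
# "id" for Athlete and Participation is an assumed property name.
UNIQUE_KEYS = {
    "Team": "name", "Game": "name", "Event": "name", "Sport": "name",
    "Medal": "type", "Athlete": "id", "Participation": "id",
}


class NodeStore:
    """Hypothetical toy store enforcing one uniqueness constraint per label."""

    def __init__(self, unique_keys):
        self.unique_keys = unique_keys
        self.nodes = {}  # (label, unique value) -> property dict

    def create(self, label, value):
        """Mimic CREATE: raise if a node with this unique value already exists."""
        key = (label, value)
        if key in self.nodes:
            raise ValueError(f"uniqueness constraint violated for {key}")
        self.nodes[key] = {self.unique_keys[label]: value}
        return self.nodes[key]

    def merge(self, label, value):
        """Mimic MERGE on the unique key: return the existing node or create it."""
        key = (label, value)
        if key not in self.nodes:
            self.nodes[key] = {self.unique_keys[label]: value}
        return self.nodes[key]


store = NodeStore(UNIQUE_KEYS)
first = store.merge("Team", "Denmark")
second = store.merge("Team", "Denmark")  # matched, not duplicated
```

In Cypher, the `merge` call above corresponds to a pattern such as `MERGE (t:Team {name: 'Denmark'})`, which is why bulk loading under uniqueness constraints typically relies on MERGE rather than CREATE.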

Relations

The following relations are created:

  • HAS_SPORT: from Event to Sport node, e.g., from "Sailing Women's Windsurfer" to "Sailing".
  • HAS_ATHLETE, HAS_GAME, HAS_TEAM and HAS_MEDAL: from Participation to the Athlete, Game, Team and Medal nodes, respectively.

Two properties are stored on relations: the athlete's age on the HAS_ATHLETE relation, and the city on the HAS_GAME relation, because a game can take place in several cities. (In hindsight, this modelling is not optimal, since the city is duplicated for every athlete's participation.)
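How one CSV row could map onto this model is sketched below. The function names, the `row_id` column and the Participation property names are hypothetical. Since the relation list above names no relation from Participation to Event, the sketch keeps the event only as a Participation property and links Event to Sport via HAS_SPORT. Values marked "NA" in the CSV are omitted rather than stored.

```python
def parse_number(value):
    """Return None for the CSV's "NA", else an int (or a float if fractional)."""
    if value == "NA":
        return None
    number = float(value)
    return int(number) if number.is_integer() else number


def without_none(props):
    """Drop missing values instead of storing them as properties."""
    return {key: value for key, value in props.items() if value is not None}


def row_to_graph(row):
    """Map one CSV row (dict keyed by column name) to (nodes, relations).

    nodes: {(label, unique value): properties}
    relations: [(start node key, TYPE, end node key, properties)]
    """
    part = ("Participation", row["row_id"])  # row_id is a hypothetical column name
    athlete = ("Athlete", row["ID"])
    game = ("Game", row["Games"])
    team = ("Team", row["Team"])
    event = ("Event", row["Event"])
    sport = ("Sport", row["Sport"])
    medal = None if row["Medal"] == "NA" else ("Medal", row["Medal"])

    nodes = {
        # Participation repeats its context as properties (names are assumptions).
        part: without_none({"game": row["Games"], "team": row["Team"],
                            "event": row["Event"],
                            "medal": medal[1] if medal else None}),
        athlete: without_none({"name": row["Name"], "sex": row["Sex"],
                               "height": parse_number(row["Height"]),
                               "weight": parse_number(row["Weight"])}),
        game: {}, team: {}, event: {}, sport: {},
    }
    relations = [
        # Age lives on HAS_ATHLETE and city on HAS_GAME, as described above.
        (part, "HAS_ATHLETE", athlete, without_none({"age": parse_number(row["Age"])})),
        (part, "HAS_GAME", game, {"city": row["City"]}),
        (part, "HAS_TEAM", team, {}),
        (event, "HAS_SPORT", sport, {}),
    ]
    if medal:  # HAS_MEDAL only exists when a medal was won
        nodes[medal] = {}
        relations.append((part, "HAS_MEDAL", medal, {}))
    return nodes, relations
```

Because HAS_GAME carries the city, every participation in the same game repeats that value, which is the duplication noted in the paragraph above.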

Contributors

nadjet
