stellargraph / stellargraph

StellarGraph - Machine Learning on Graphs

Home Page: https://stellargraph.readthedocs.io/

License: Apache License 2.0

Python 98.78% Shell 1.04% Dockerfile 0.19%
graphs machine-learning machine-learning-algorithms graph-convolutional-networks networkx geometric-deep-learning saliency-map interpretability heterogeneous-networks graph-neural-networks

stellargraph's Issues

Understand HIN GraphSage algorithm

Description

Kevin and Yuriy have implemented a HIN GraphSAGE algorithm. I'd like to understand the implementation.

There are other heterogeneous GCN-like algorithms in the literature; read and understand them. How do they compare? Which algorithms could we implement for the ML library? Can we obtain code and test them on different problems? What input sampling strategies does each algorithm require? How do training and prediction differ?

Done Checklist (Research)

  • Add different algorithms to documentation on Google Docs
  • Add sampling strategies to documentation on Google Docs

Tune node2vec parameters for link prediction demo

Description

Currently, the link prediction demo uses fixed values for the node2vec parameters, e.g., p = q = 1, along with several others. We need to allow these parameters to be tuned to improve link prediction performance.
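To make clear what p and q control, here is a minimal sketch of the node2vec biased second-order random walk on an unweighted, undirected graph stored as an adjacency dict of sets, plus a plain grid search. The names `node2vec_walk` and `grid_search` are hypothetical, and `evaluate(p, q)` stands in for whatever trains embeddings and returns a validation score in the demo; this is not the demo's code.

```python
import random
from typing import Callable, Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]


def node2vec_walk(graph: Graph, start: Hashable, length: int, p: float, q: float,
                  rng: random.Random) -> List[Hashable]:
    """Sample one node2vec walk. p is the return parameter and q the in-out
    parameter; p = q = 1 reduces to a uniform random walk."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = sorted(graph[cur], key=repr)  # fixed order for reproducibility
        if not neighbours:
            break
        if len(walk) == 1:
            # No previous node yet, so the first step is uniform.
            walk.append(rng.choice(neighbours))
            continue
        prev = walk[-2]
        weights = []
        for x in neighbours:
            if x == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif x in graph[prev]:
                weights.append(1.0)  # stay one hop from prev (BFS-like)
            else:
                weights.append(1.0 / q)  # move away from prev (DFS-like)
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


def grid_search(evaluate: Callable[[float, float], float],
                values=(0.25, 0.5, 1.0, 2.0, 4.0)):
    """Score every (p, q) combination and return the best pair and all scores."""
    scores = {(p, q): evaluate(p, q) for p in values for q in values}
    return max(scores, key=scores.get), scores
```

A small p keeps walks close to their start, and a small q pushes them outward; the grid above is only an illustrative choice of values.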

User Story

As a: Research Engineer
I want: to tune the hyper-parameters of the node2vec algorithm
so that: I can achieve the highest performance in link prediction

Done Checklist (Development)

  • Code to tune node2vec hyper-parameters
  • Pull request

Write simple data feed to Graphsage for StellarML library

Description

Currently the Graphsage code from Kevin runs well, but it requires a redis database and is slow for simple single-computer testing.

As part of the StellarML library, we want to feed data into TensorFlow quickly. A simple in-memory GraphSAGE sampler would be a good start.
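A minimal sketch of what an in-memory sampler could look like, assuming the graph fits in a Python adjacency dict in place of redis. It performs GraphSAGE-style fixed-size uniform neighbour sampling (with replacement) for each hop; `sample_neighbourhoods` is a hypothetical name, not Kevin's code or a StellarML API.

```python
import random
from typing import Dict, Hashable, List, Sequence


def sample_neighbourhoods(adj: Dict[Hashable, List[Hashable]], batch: Sequence[Hashable],
                          fanouts: Sequence[int], rng: random.Random,
                          pad=None) -> List[List[Hashable]]:
    """Fixed-size uniform neighbour sampling for a batch of head nodes.

    hops[0] is the batch; hops[k] holds fanouts[k-1] samples (with replacement)
    for every entry of hops[k-1], so len(hops[k]) == len(batch) * prod(fanouts[:k]).
    Nodes without neighbours, and padding entries, are expanded into `pad`.
    """
    hops = [list(batch)]
    for n_samples in fanouts:
        layer = []
        for node in hops[-1]:
            neighbours = adj.get(node, [])
            if neighbours:
                layer.extend(rng.choices(neighbours, k=n_samples))
            else:
                layer.extend([pad] * n_samples)
        hops.append(layer)
    return hops
```

Because every hop has a fixed size, each hops[k] can be reshaped into a dense tensor of node features without ragged batching, which is what makes this layout convenient for TensorFlow.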

Done Checklist (Development)

  • Assumptions of the user story met
  • Produced code for required functionality
  • Branch and Pull Request build on CI
  • Branch and Pull Request pass unit tests on CI
  • Branch and Pull Request pass integration tests on CI
  • Version number reflects new status
  • Peer Code Review Performed
  • Code well commented
  • Documentation on Google Docs
  • Documentation in repo
  • Team demo
  • Mini-meetup talk
  • Stakeholder sign-off

Done Checklist (Research)

  • Code Review
  • Documentation on Google Docs
  • Documentation in repo
  • Team talk
  • Mini-meetup talk

Done Checklist (Bug)

  • Bug fixed
  • Branch and Pull Request build on CI
  • Branch and Pull Request pass unit tests on CI
  • Branch and Pull Request pass integration tests on CI
  • Version number reflects new status
  • Peer Code Review Performed
  • Code well commented

Extend HinSAGE for Link Prediction

Description

Use the documentation for HinSAGE link prediction to create a working link prediction example using the Paradise Papers dataset from the Data team.

User Story

As a: data scientist
I want: to use GraphSAGE layers for link prediction
so that: I can run scalable link prediction

Done Checklist (Development)

  • Working example with Alzheimer data
  • Code well commented
  • Documentation in repo
  • Peer Code Review Performed

Prepare YOW Data Experiment

Description

We need to get started with an interesting demonstration of machine learning on graphs for the YowData! conference in mid-May.
First, we need to decide on a dataset and problem.

Checklist

  • Write-up of dataset and problem options on google docs
  • Initial implementation of solving the selected problem on the dataset

Improve speed of 'local' sampling method for link prediction

Description

Sampling negative edges for link prediction using the nodes' local neighbourhood structure currently uses BFS, which runs very slowly when target nodes more than 5 edges away need to be sampled. This issue is about replacing BFS with DFS to speed up the sampling algorithm.
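A minimal sketch of the DFS idea, assuming an undirected graph stored as an adjacency dict of sets; `dfs_negative_target` is a hypothetical name, not the demo's code. It stops at the first node reached at the requested depth that is not adjacent to the source, instead of expanding every node within that radius as BFS does. One caveat: DFS only guarantees a path of the requested length, so the returned node's shortest-path distance may be smaller.

```python
import random
from typing import Dict, Hashable, Optional, Set


def dfs_negative_target(graph: Dict[Hashable, Set[Hashable]], source: Hashable,
                        depth: int, rng: random.Random) -> Optional[Hashable]:
    """Return a node reached at exactly `depth` hops along a randomised DFS
    from `source` that is not adjacent to `source`, or None if none exists."""
    stack = [(source, 0)]
    visited = {source}  # nodes are marked when pushed, so each is expanded once
    while stack:
        node, d = stack.pop()
        if d == depth:
            if node not in graph[source]:
                return node
            continue  # do not expand past the target depth
        neighbours = list(graph[node])
        rng.shuffle(neighbours)  # randomise which branch is explored first
        for nb in neighbours:
            if nb not in visited:
                visited.add(nb)
                stack.append((nb, d + 1))
    return None
```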

User Story

As a: Research Engineer
I want: to run link prediction experiments as fast as possible
so that: I maximise my efficiency.

Done Checklist (Development)

  • Updated source code to replace BFS for target nodes with DFS.
  • Pull request

YowData: Investigate NetFlix prediction using N2V

Description

For the YowData conference, I'd like to present a recommender example. The Netflix prize dataset is well known, and a large amount of effort has been spent on getting results on this dataset. Good performance on this dataset would be impressive.

Recommender systems are often not thought about in terms of graphs. Therefore, posing this in a graph framework and solving it would be interesting. We can start by using node2vec to extract node embeddings and trying to predict the scores from them.

Done Checklist (Research)

  • Notes or slides on recommendations for movielens with node2vec
  • Code for recommendations for movielens with node2vec

[Dynamic Node2Vec] Investigate temporal updates of skipgram model

Description

Hooman is performing thorough experiments on how his dynamic random walk methods perform as part of an end-to-end dynamic node2vec algorithm. There are difficulties with how the skip-gram model interacts with random walk updates. To get a good publication, we need a description of the skip-gram model and some explanation of how different training update schemes will affect the model.
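As a starting point for documenting the interaction, a minimal sketch of how random walks become skip-gram training data: each walk is treated as a sentence and every node is paired with its neighbours within a window. It uses a fixed window for simplicity (word2vec implementations typically sample a shrunken window per position), and `skipgram_pairs` is a hypothetical name. Updating a walk changes exactly the pairs derived from it, which is the data shift an update scheme has to handle.

```python
from typing import Hashable, Iterable, List, Sequence, Tuple


def skipgram_pairs(walks: Iterable[Sequence[Hashable]],
                   window: int) -> List[Tuple[Hashable, Hashable]]:
    """Generate (target, context) pairs: each node in a walk is paired with
    the nodes at most `window` positions before and after it."""
    pairs = []
    for walk in walks:
        for i, target in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((target, walk[j]))
    return pairs
```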

Done Checklist (Research)

  • Skip-gram model understanding
  • Documentation of skip-gram model
  • Documentation of skip-gram model update techniques

Create baseline skeleton library

Description

Create an initial dummy library using the documentation and pseudocode already accumulated.

Done Checklist

  • Code
  • Pull Request
  • Unit Tests

Test link prediction demo on HIN

Description

The link prediction demo works on homogeneous datasets. We want to test whether it also works on heterogeneous datasets under the assumption that the latter will be treated as homogeneous.
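One way to treat a heterogeneous network as homogeneous is sketched below, assuming edges arrive as (source, edge_type, target) triples and that node IDs are unique across node types (if they are not, prefix them with their type first). `to_homogeneous` is a hypothetical helper: it simply drops the types and merges parallel edges of different types.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

TypedEdge = Tuple[Hashable, str, Hashable]  # (source, edge_type, target)


def to_homogeneous(edges: Iterable[TypedEdge]) -> Dict[Hashable, Set[Hashable]]:
    """Collapse a typed edge list into an undirected, untyped adjacency dict.
    Edge types are discarded, so parallel edges of different types merge."""
    adj: Dict[Hashable, Set[Hashable]] = {}
    for src, _etype, dst in edges:
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set()).add(src)
    return adj
```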

User Story

As a: Research Engineer
I want: to make sure that the link prediction demo works for both homogeneous and heterogeneous networks
so that: I can tackle more general analytics problems.

Done Checklist (Development)

  • Determine heterogeneous dataset for testing
  • Link prediction demo works with the selected heterogeneous network as input.
  • Pull request

Scalable Node Attribute Inference for Graphs

Description

Build a scalable implementation of node attribute inference (NAI) for graphs that works for graphs with at least 10M nodes.

Value

Besides satisfying stakeholders' requirements for scalable attribute inference tasks on large graph datasets (thus expanding the NAI capability of Release 1), this should allow us to find an optimal scalable architecture for other ML tasks on graphs, such as link prediction and classification, recommendations, etc.

Investigate Metapath2Vec paper for link prediction on heterogeneous graphs

Description

I want to understand the Metapath2Vec algorithm for representation learning in heterogeneous graphs.

User Story

As a: Research Engineer
I want: to understand the MetaPath2Vec algorithm for representation learning on heterogeneous graphs
so that: I can use it for node attribute inference and link prediction.

Done Checklist (Development)

  • Document differences between Metapath2Vec and Node2Vec algorithms
  • Determine scalability issues that are unique to Metapath2Vec
  • Search for reference implementation and, if found, run some experiments on test graphs to better understand its performance.
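The main algorithmic difference from Node2Vec is how walks are generated. A minimal sketch of a metapath-guided walk, assuming an adjacency dict, a node-to-type dict, and a symmetric metapath such as ["author", "paper", "author"]; at each step the next node is chosen uniformly among neighbours of the type the scheme requires. `metapath_walk` is a hypothetical name, not the reference implementation.

```python
import random
from typing import Dict, Hashable, List, Sequence, Set


def metapath_walk(adj: Dict[Hashable, Set[Hashable]], node_type: Dict[Hashable, str],
                  start: Hashable, metapath: Sequence[str], length: int,
                  rng: random.Random) -> List[Hashable]:
    """Sample one metapath-guided random walk. The metapath's first and last
    types must be equal so the scheme can repeat; the walk stops early if the
    current node has no neighbour of the required type."""
    if len(metapath) < 2 or metapath[0] != metapath[-1] or node_type[start] != metapath[0]:
        raise ValueError("start node type must match a symmetric metapath")
    walk = [start]
    cycle = len(metapath) - 1  # length of the repeating part of the scheme
    step = 0
    while len(walk) < length:
        step += 1
        wanted = metapath[step % cycle]
        candidates = sorted((n for n in adj[walk[-1]] if node_type[n] == wanted), key=repr)
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk
```

Because neighbours of the "wrong" type are ignored, the choice of metapath decides which relations the embeddings capture, which is one of the differences worth documenting.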

Extend HinSAGE demo code for unsupervised learning

Description

Implement the wrappers / additional layers for unsupervised learning around the HinSAGE demo code. Create a working example using the risk net dataset.

User Story

As a: data scientist
I want: to run unsupervised learning using GraphSAGE
so that: I can transform my large dataset into node embeddings

Done Checklist

  • Peer Code Review Performed
  • Code well commented
  • Documentation in repo

PoC for Unsupervised GraphSAGE

Description

Currently the GraphSage unsupervised method is not in the library. This task is to add a simple unsupervised GraphSage module in the stellar-ml library.
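What mainly distinguishes unsupervised GraphSAGE from the supervised version is its graph-based loss: a node's embedding should score highly against a node that co-occurs with it on a short random walk and low against negative samples. A minimal sketch of that loss for a single node, with embeddings as plain lists of floats; `unsupervised_sage_loss` is a hypothetical name, not a stellar-ml API.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    # Numerically stable log(1 / (1 + exp(-x))).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def unsupervised_sage_loss(z_u, z_pos, z_negs) -> float:
    """-log sigma(z_u . z_v) - sum_n log sigma(-z_u . z_n), where v co-occurs
    with u on a short random walk and the z_n are sampled negatives. Summing
    over the Q negatives equals Q times their mean, i.e. a Monte Carlo
    estimate of the paper's Q * E[log sigma(-z_u . z_vn)] term."""
    loss = -log_sigmoid(dot(z_u, z_pos))
    for z_n in z_negs:
        loss -= log_sigmoid(-dot(z_u, z_n))
    return loss
```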

User Story

As a: Data Scientist
I want: everything that GraphSAGE offers
so that: I have freedom for my unsupervised method experiments

Done Checklist (Development)

  • Branch and Pull Request build on CI
  • Code well commented

Write YOWData! presentation

Description

I'll be presenting at YOWData! on the 15th (at 5pm) so I need to prepare some slides!

Done Checklist

  • Slides on Google Docs
  • Give YOWData! presentation

Explore Aboleth for library design choices

Description

Can we borrow some of its design choices, e.g., base classes and inheritance, layer composition, and pipelining?
Link: https://github.com/data61/aboleth

Checklist

  • List of Aboleth base classes with descriptions, perhaps indicating which base classes could be borrowed for our library
  • pseudo code for node2vec+logistic workflow with the graph ML library (as we imagine it)
  • pseudo code for GraphSAGE

Improve data splitting code for link prediction

Description

The node splitter developed for the link prediction demo of issue #8 needs to be improved so that negative samples are more challenging, i.e., they should not be selected randomly from all pairs of disconnected nodes but rather from pairs of disconnected nodes that are nearby in the graph.
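A minimal sketch of "nearby" negative sampling, assuming an undirected adjacency dict of sets: a BFS with a distance cutoff finds nodes 2 to `max_dist` hops from a randomly chosen node, and any of them forms a non-edge that is harder to separate from true edges than a uniformly random pair. `nodes_within` and `sample_nearby_negatives` are hypothetical names, not the edge splitter's API.

```python
import random
from collections import deque
from typing import Dict, Hashable, List, Set, Tuple


def nodes_within(graph: Dict[Hashable, Set[Hashable]], source: Hashable,
                 max_dist: int) -> Dict[Hashable, int]:
    """BFS from `source`, stopping at `max_dist` hops; returns shortest distances."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_dist:
            continue  # do not expand beyond the cutoff
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def sample_nearby_negatives(graph: Dict[Hashable, Set[Hashable]], n: int, max_dist: int,
                            rng: random.Random, max_tries: int = 1000) -> List[Tuple[Hashable, Hashable]]:
    """Sample up to `n` distinct unordered pairs at distance 2..max_dist.
    Distance >= 2 guarantees the pair is neither a self-pair nor an edge."""
    nodes = sorted(graph, key=repr)
    negatives: Set[Tuple[Hashable, Hashable]] = set()
    for _ in range(max_tries):
        if len(negatives) >= n:
            break
        u = rng.choice(nodes)
        candidates = sorted((v for v, d in nodes_within(graph, u, max_dist).items() if d >= 2), key=repr)
        if candidates:
            v = rng.choice(candidates)
            negatives.add((u, v) if repr(u) <= repr(v) else (v, u))
    return sorted(negatives, key=repr)
```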

User Story

As a: Research Engineer
I want: to use my data to correctly evaluate my link prediction algorithm
so that: I am confident about its performance on unseen data.

Done Checklist (Development)

  • Edge splitter class with improved sampling algorithm
  • Integration of new edge splitter class with baseline link prediction demo
  • Pull Request

Create stellar-ml library structure, and populate with base classes

Description

  • Define the library's structure, base classes, methods, some helper functions, etc.
  • Create unit tests for all the library's base classes and helper functions

User Story

As a: developer of the library
I want: to see a clear structure of base classes to inherit from, their methods, and examples of composing workflows from them.

Done Checklist (Development)

  • Assumptions of the user story met
  • Produced code for required functionality
  • Branch and Pull Request pass unit tests on CI
  • Peer Code Review Performed
  • Code well commented
  • Documentation in repo

Clean up Movielens using HIN Graphsage and move to demos

Description

The movielens recommender demo developed for YOWData could be useful for other problems (Anna would like to try it out to see if it will work for the medicare dataset).

Currently the code is rough and ready, so I'd like to tidy it up, add documentation and have a quick-to-run test case (say on movielens 100k).

Done Checklist (Research)

  • Code Review
  • Documentation in repo
  • Code well commented

Investigate message passing for node2vec

Description

Node2vec can be implemented in a message-passing framework. However, this is strictly only true for prediction. Can we also place training in a message passing framework?

User Story

As a: developer of the graphml library
I want: to train and predict using node2vec in a message-passing framework
so that: I can train node2vec in a one-step scalable fashion.

Done Checklist (Research)

  • Code Review
  • Documentation on Google Docs

Run HIN Graphsage for Movielens 1M dataset with node attributes

Description

Kevin has written code for HIN GraphSage, I'd like to use this to make predictions on the Movielens 1M dataset with the same train/test split as other examples and using intrinsic user/movie features.

Done Checklist

  • Obtain performance numbers for node2vec features
  • Obtain performance numbers for intrinsic features
  • Documentation on Google Docs
  • Code Review

Unit tests for link prediction demo

Description

We need unit tests for the link prediction utility classes.

User Story

As a: Research Engineer
I want: to make sure that changes to the link prediction code are not breaking existing functionality
so that: I can be certain that my code works correctly as it is expanded and improved.

Done Checklist (Development)

  • Create test directory for link prediction demo
  • Add test for link prediction code
  • Pull request

Prepare for GraphSAGE/HinSAGE usage during Hackathon

Description

Prepare for the Spotify hackathon to allow everyone to use GraphSAGE/HinSAGE with ease on the day. Investigate the dataset and prepare notes on any requirements such as AWS setup, input batch preparation code, etc.

User Story

As a: Hackathoner
I want: to run Stellar's graph ML algorithms during the Hackathon
so that: we can win the Spotify competition

Done Checklist (Development)

  • Documentation on Google Docs
  • Documentation in repo

Improve link prediction demo to handle non-integer node IDs

Description

The current implementation of the link prediction demo assumes that node IDs are integers. This is a restrictive assumption because for some datasets the node IDs are not integers. This causes the link prediction demo to fail with an Exception. We need to generalise the code so that it handles non-integer node IDs.
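A common way to remove the integer assumption is to keep the original IDs at the edges of the code and map them to contiguous integer positions wherever arrays are indexed. A minimal sketch; `NodeIndex` is a hypothetical helper, not existing demo code.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


class NodeIndex:
    """Bidirectional map between arbitrary hashable node IDs (strings, tuples,
    ints, ...) and contiguous integer positions 0..n-1."""

    def __init__(self, node_ids: Iterable[Hashable]):
        self.ids: List[Hashable] = []
        self.pos: Dict[Hashable, int] = {}
        for node in node_ids:
            if node not in self.pos:  # duplicates keep their first position
                self.pos[node] = len(self.ids)
                self.ids.append(node)

    def encode_edges(self, edges: Iterable[Tuple[Hashable, Hashable]]) -> List[Tuple[int, int]]:
        """Translate (u, v) ID pairs into (row, column) integer positions."""
        return [(self.pos[u], self.pos[v]) for u, v in edges]

    def decode(self, index: int) -> Hashable:
        """Recover the original node ID for an integer position."""
        return self.ids[index]
```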

User Story

As a: Research Engineer
I want: to perform link prediction on a variety of network datasets stored in valid EPGM format
so that: I can be certain that the link prediction algorithms generalise

Done Checklist (Development)

  • Update implementation to handle non-integer node IDs
  • Add unit tests
  • Pull request

Collaborate with Platform Team on caching architecture

Description

The platform team is building an experimental stack. We need to ensure that this meets the needs of the ML and Data teams.

User Story

As an: IA ML dev
I want: to ensure that I'm building in sync with the platform team
so that: there is no wasted effort

Done Checklist (Development)

  • Documentation on Google Docs

Graph Machine Learning library that is easy to use and contribute to

Description

Create a machine learning library in Python that is simple to use and simple to contribute to. The library should focus on deep learning algorithms on graphs, and not attempt to duplicate existing algorithms, e.g., community detection, random forests, etc.

Value

This library will allow Data Scientists and Researchers to create models over network datasets with minimal overhead. The goal is to allow a fast experiment cycle time, with minimal assumed knowledge. For Researchers, it should be a place to add new algorithms, get their algorithms seen, and supply functions for building new deep learning models on graphs.

YowData: Prepare spammers example

Description

I want to present an example at the YowData conference. The spammers dataset is an interesting case for applying graph ML. I want to prepare the spammers dataset and run node attribute inference on it.

Note:
Anna has done some investigation into using GraphSage and node2vec, so I will find out what has been done so far.

Done Checklist (Research)

  • Gave presentation at YowData

Investigate Turi for graph processing in the ML library and platform

Description

Apple has open-sourced Turi, which is a powerful graph processing framework. We should evaluate this tech with the following criteria:

  • functionality
  • ease of use
  • ease of scalability
  • performance

This task would be to ingest a dataset with more than 1M edges and perform a set of graph tasks, e.g., BFS/DFS, graph traversal, grabbing neighbours, and random sampling.

User Story

As a: data scientist
I want: the graph processing part of the library to be fast and have lots of functionality
so that: I can move on to my tensorflow part to build my model

Done Checklist (Development)

  • Small experiment setup to run the evaluation, e.g. python script file
  • Documentation on Google Docs
  • Team demo

Update EPGM class to use networkx v2.*

Description

Currently, our graph processing module requires an earlier version of networkx, e.g., 1.*. Newer versions of networkx, namely 2.*, have changed how nodes and edges are returned to the user. We need to update our code to work with the newer version of networkx because it is becoming more common, and the version mismatch often causes problems.

User Story

As a: Research Engineer
I want: my network analytics library to work with the latest version of python modules
so that: I can make use of the latest developments and improvements in these modules

Done Checklist (Development)

  • Produced code for required functionality
  • Unit tests updated and new ones added as necessary
  • Pull request

Git workflow demo

Description

Use example git repositories to understand the git and github workflow.

User Story

As a: Research engineer
I want: to understand git and github workflows
so that: I can work with the rest of the team to develop the ML library.

Done Checklist (Development)

  • Create test repos
  • Document the workflow for forking a repo, developing new code, and putting the code back to original repo via a pull request

Investigate Apache Tinkerpop and write GraphSAGE input preparation in gremlin-python

Description

Investigate Gremlin's viability to efficiently prepare inputs for GraphSAGE from a graph database as well as from local memory.

User Story

As a: data scientist
I want: to use gremlin to prepare inputs for my graph ML tasks.
so that: I can efficiently prepare batch inputs from various graph data sources.

Done Checklist (Development)

  • Code well commented
  • Documentation
  • Peer Code Review Performed
  • Mini-meetup talk

Build inductive NAI with GCN

Steps:
Given a full graph G:

  1. Randomly select a test set of nodes V_test and remove them from G, resulting in G_train = G - V_test.
  2. Compute \hat{A}_train and X_train from G_train.
  3. Train the GCN on G_train (feeding \hat{A}_train, X_train) and save the trained model.
  4. Compute \hat{A} and X for the full graph G, preserving the order of the nodes shared by G and G_train, i.e., update \hat{A}_train and X_train with the test nodes to obtain the full graph's \hat{A} and X.
  5. Do a forward pass of the updated \hat{A}, X through the trained GCN model, predicting attributes for the test nodes.
  6. Evaluate the predictions by comparing them with the true test node attributes.

Repeat steps 1-6 to obtain average prediction metrics.
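The node-ordering requirement in steps 2 and 4 can be sketched in plain Python. This assumes the usual GCN normalisation \hat{A} = \tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2} (the issue does not specify one), an undirected graph without self-loops stored as an adjacency dict of sets, and hypothetical helper names. The full-graph ordering is the training ordering with the test nodes appended, so rows of X and the trained weights line up in both passes.

```python
import math
from typing import Dict, Hashable, List, Sequence, Set

Matrix = List[List[float]]


def normalized_adjacency(adj: Dict[Hashable, Set[Hashable]], order: Sequence[Hashable]) -> Matrix:
    """\\hat{A} for the subgraph induced by `order`, rows/columns in that order."""
    keep = set(order)
    # Degree in A + I, counting only neighbours inside the (sub)graph.
    deg = {u: 1 + sum(1 for v in adj[u] if v in keep) for u in order}
    return [[(1.0 if (u == v or v in adj[u]) else 0.0) / math.sqrt(deg[u] * deg[v])
             for v in order] for u in order]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(a_hat: Matrix, x: Matrix, w: Matrix) -> Matrix:
    """One GCN propagation step, ReLU(\\hat{A} X W)."""
    return [[max(0.0, h) for h in row] for row in matmul(matmul(a_hat, x), w)]


# Usage: test node "c" is appended after the training nodes.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
train_order = ["a", "b"]
full_order = train_order + ["c"]
a_train = normalized_adjacency(adj, train_order)  # step 2
a_full = normalized_adjacency(adj, full_order)    # step 4
```

Note that the training nodes' rows of \hat{A} change in step 4 (here node "b" gains a neighbour), which is exactly why the forward pass must use the recomputed full-graph \hat{A} rather than \hat{A}_train.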

Investigate feature alignment for link prediction

Description

Investigate whether link features obtained from G_train and G_test are aligned, and whether/how this affects performance of the link prediction classifier.
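For reference, link features are typically built from node embeddings with element-wise binary operators such as those used in the node2vec paper (average, Hadamard, weighted L1 and L2). A minimal sketch, assuming embeddings are plain lists keyed by node ID; `link_features` is a hypothetical name. These features are not invariant to a rotation of the embedding space, so if embeddings for G_train and G_test come from separate training runs, features for the same pair can differ even when the learned structure is equivalent.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Sequence, Tuple

Vector = Sequence[float]

BINARY_OPERATORS: Dict[str, Callable[[Vector, Vector], List[float]]] = {
    "average": lambda a, b: [(x + y) / 2 for x, y in zip(a, b)],
    "hadamard": lambda a, b: [x * y for x, y in zip(a, b)],
    "l1": lambda a, b: [abs(x - y) for x, y in zip(a, b)],
    "l2": lambda a, b: [(x - y) ** 2 for x, y in zip(a, b)],
}


def link_features(emb: Dict[Hashable, Vector], pairs: Iterable[Tuple[Hashable, Hashable]],
                  operator: str = "hadamard") -> List[List[float]]:
    """Combine the embeddings of each (u, v) pair into one link feature vector."""
    op = BINARY_OPERATORS[operator]
    return [op(emb[u], emb[v]) for u, v in pairs]
```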

We need to remove the Confluence docs, so it would be good to retrieve the link prediction code from there first.

Done Checklist (Research)

  • Experimental code/visualisations in 'alignment' branch in stellar-ml-sandbox/link-prediction
  • Documentation on Google Docs

Start moving code from link-prediction/utils to stellar ML library

Description

Some of the code in link-prediction/utils is mature enough to be integrated into the stellar ML library.

User Story

As a: Research Engineer
I want: to transfer mature code from demo code into the stellar graph ML library
so that: it can be re-used by other IA members and properly unit tested with CI.

Done Checklist (Development)

  • Produced code for required functionality
  • Branch and Pull Request build on CI
  • Branch and Pull Request pass unit tests on CI
  • Peer Code Review Performed
  • Code well commented

Organise reference datasets

Description

Organise the reference datasets with a readme.

User Story

As an: IA team member
I want: to have easy access to well defined data sets
So that: I can test my code and minimise dataset confusion

Done Checklist (Bug)

  • Documented dataset procedure
  • Document

Graph splitting based on edge type to predict

Description

The data splitter for link prediction should be able to split the graph based on the type of the edge to predict. It should also be able to split based on an edge property. For example, we should be able to split based on timestamps if edges have them as a property.
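A minimal sketch of a split that combines both criteria, assuming edges are (source, target, properties) tuples with the edge type stored under a "label" key and a numeric timestamp under a caller-chosen key; `split_by_type_and_time` is a hypothetical name. Edges of other types always stay in the training graph, and edges of the target type are split by a time cutoff. A random split by type would work the same way, sampling the candidate edges instead of thresholding them.

```python
from typing import Any, Dict, Hashable, Iterable, List, Tuple

Edge = Tuple[Hashable, Hashable, Dict[str, Any]]  # (source, target, properties)


def split_by_type_and_time(edges: Iterable[Edge], edge_type: str, time_key: str,
                           cutoff: float) -> Tuple[List[Edge], List[Edge]]:
    """Put edges of `edge_type` with properties[time_key] >= cutoff in the
    test set; every other edge stays in the training graph."""
    train, test = [], []
    for edge in edges:
        props = edge[2]
        if props.get("label") == edge_type and props[time_key] >= cutoff:
            test.append(edge)
        else:
            train.append(edge)
    return train, test
```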

User Story

As a: Research Engineer
I want: to prepare my data
so that: I can perform link prediction on HINs based on edge types and properties

Done Checklist (Research)

  • Implement data splitting based on edge type and/or edge property, e.g., timestamp
  • Pull request
  • Unit tests

Organize external engagements with Jia and Jesse

Description

To engage successfully with the research groups led by Jia and Jesse, we need to map out the research interests of both groups and match them with research questions of relevance to us.

User Story

As a: researcher collaborating with the Stellar project
I want: to research graph technologies that are of interest to Stellar
so that: we can get publications for our research and support from Stellar.

Done Checklist (Research)

  • Documentation of ongoing engagements on Google Docs
  • Schedule of meetings with Jesse and Jia
  • Outline of scope of research.
  • AC review

Set up Travis for stellar-ml

Description

Set up continuous integration for automated tests in the library. Create a buildkite.yml file.

Done Checklist

  • Triggered commits for BuildKite
  • Writing to build-bots

Graphsage demo in StellarML Library

Description

Currently StellarML Library has some base classes. We have working Graphsage code from Kevin. We should implement a demo using the classes from StellarML, moving code to this style as required.

Done Checklist (Development)

  • Assumptions of the user story met
  • Produced code for required functionality
  • Branch and Pull Request build on CI
  • Branch and Pull Request pass unit tests on CI
  • Branch and Pull Request pass integration tests on CI
  • Version number reflects new status
  • Peer Code Review Performed
  • Code well commented
  • Documentation on Google Docs
  • Documentation in repo
  • Team demo
  • Mini-meetup talk
  • Stakeholder sign-off

Done Checklist (Research)

  • Code Review
  • Documentation on Google Docs
  • Documentation in repo
  • Team talk
  • Mini-meetup talk

Done Checklist (Bug)

  • Bug fixed
  • Branch and Pull Request build on CI
  • Branch and Pull Request pass unit tests on CI
  • Branch and Pull Request pass integration tests on CI
  • Version number reflects new status
  • Peer Code Review Performed
  • Code well commented

Move link prediction demo from stellar-ml-sandbox to stellar-ml repo

Description

We need to move the demo for link prediction from the stellar-ml-sandbox repo to here.

User Story

As a: Research Engineer
I want: to have all my code relating to graph ML in one place
so that: I can more effectively develop the graph-ml library and share code with my team

Done Checklist (Development)

  • Moved code from stellar-ml-sandbox to stellar-ml repo
  • Pull request
