yilinjuang / github-repo-recommender

GitHub Repo Recommender System. 2017 Network Science Final Project.

Python 79.72% C++ 19.62% Makefile 0.66%
github network-science recommender-system recommendation-system repository

Octomender

Octomender = Octopus (GitHub) + Recommender

GitHub Repo Recommender System.

2017 Network Science Final Project with J. C. Liang.

Requirements

  • python3
  • NetworkX: High-productivity software for complex networks.
  • NumPy
  • SciPy
  • OpenMP>=4.0: C/C++ API that supports multi-platform shared memory multiprocessing programming.

Dataset

GitHub Archive

Preprocessing

Parse raw JSON data files into three pickle data files.

  • output-data-basename.user: map of user id (str) to user name (str)
  • output-data-basename.repo: map of repo id (int) to repo name (str)
  • output-data-basename.edge: list of tuples of user-repo edge (str, int)
Usage: parse.py {-m|--member|-w|--watch} {<input-json-directory>|<input-json-file>} <output-data-basename>
  -m, --member      parse MemberEvent.
  -w, --watch       parse WatchEvent.
Ex:    parse.py -m 2017-06-01-0.json data
Ex:    parse.py --watch json/2017-05/ data/2017-05

For the raw JSON data format, refer to GitHub API v3.
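
As a rough illustration of the three outputs, the sketch below extracts them from WatchEvent records. It is not the actual parse.py: the function name parse_watch_events is hypothetical, and it assumes one JSON event per line with the fields type, actor.id, actor.login, repo.id and repo.name. Writing the results out with pickle is left aside.

```python
import json


def parse_watch_events(lines):
    """Collect users, repos and user-repo edges from WatchEvent records.

    Hypothetical helper; assumes one JSON event per line.
    """
    users = {}  # user id (str) -> user name (str)
    repos = {}  # repo id (int) -> repo name (str)
    edges = []  # list of (user id, repo id) tuples
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("type") != "WatchEvent":
            continue  # other event types are ignored in this sketch
        user_id = str(event["actor"]["id"])
        repo_id = int(event["repo"]["id"])
        users[user_id] = event["actor"]["login"]
        repos[repo_id] = event["repo"]["name"]
        edges.append((user_id, repo_id))
    return users, repos, edges


# Made-up sample records, for illustration only.
sample = [
    '{"type": "WatchEvent", "actor": {"id": 1, "login": "alice"},'
    ' "repo": {"id": 10, "name": "alice/demo"}}',
    '{"type": "PushEvent", "actor": {"id": 2, "login": "bob"},'
    ' "repo": {"id": 11, "name": "bob/tool"}}',
]
users, repos, edges = parse_watch_events(sample)
print(users, repos, edges)
```

The user id is cast to str and the repo id to int only to match the types listed above.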

The same parser, but run with multiprocessing. The default number of processes is 16.

Usage: parse.py {-m|--member|-w|--watch} {<input-json-directory>|<input-json-file>} <output-data-basename> [n-process]
  -m, --member      parse MemberEvent.
  -w, --watch       parse WatchEvent.
  n-process         number of processes when multiprocessing.
Ex:    parse.py -m 2017-06-01-0.json data
Ex:    parse.py --watch json/2017-05/ data/2017-05 32

Merge multiple pickle data files into one.

Usage: mergedata.py <input-data-dir> <output-data-basename>
Ex:    mergedata.py data/2016-010203/ data/2016-Q1
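
A minimal sketch of what merging could look like, assuming each input is a (user map, repo map, edge list) triple as described under parsing. The name merge_data is hypothetical, and dropping duplicate edges is an assumption of this sketch, not confirmed behavior of mergedata.py.

```python
def merge_data(parts):
    """Merge several (users, repos, edges) triples into one.

    Hypothetical helper; deduplicating edges is an assumption.
    """
    users, repos, edge_set = {}, {}, set()
    for part_users, part_repos, part_edges in parts:
        users.update(part_users)   # later parts win on id collisions
        repos.update(part_repos)
        edge_set.update(part_edges)
    return users, repos, sorted(edge_set)


# Made-up monthly data, for illustration only.
january = ({"1": "alice"}, {10: "alice/demo"}, [("1", 10)])
february = (
    {"1": "alice", "2": "bob"},
    {11: "bob/tool"},
    [("1", 10), ("2", 11)],
)
users, repos, edges = merge_data([january, february])
print(users, repos, edges)
```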

Generate the bipartite graph and, optionally, project it to a unipartite graph.

Usage: generate.py <input-data-basename> <output-graph-basename> [-p|--project]
  -p, --project     project to unipartite graph (multigraph).
Ex:    generate.py data/2017-05 graph/2017-05
Ex:    generate.py data/2016-Q1 graph/2016-Q1 -p

For the bipartite graph implementation, refer to algorithms.bipartite of NetworkX.
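
To show what the projection computes, here is a standard-library sketch that deliberately does not use NetworkX. In a multigraph projection onto repos, two repos are joined once per user they share; the sketch stores that multiplicity as a count per repo pair instead of as parallel edges. The name project_repos and the sample edges are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def project_repos(edges):
    """Map each repo pair to the number of users the two repos share.

    Hypothetical helper; counts stand in for parallel multigraph edges.
    """
    repos_by_user = defaultdict(set)
    for user_id, repo_id in edges:
        repos_by_user[user_id].add(repo_id)
    multiplicity = Counter()
    for repo_ids in repos_by_user.values():
        # Every pair of repos touched by the same user shares that user.
        for pair in combinations(sorted(repo_ids), 2):
            multiplicity[pair] += 1
    return multiplicity


# Made-up user-repo edges, for illustration only.
edges = [("1", 10), ("1", 11), ("2", 10), ("2", 11), ("3", 11), ("3", 12)]
multiplicity = project_repos(edges)
print(dict(multiplicity))
```

Projecting onto users instead works the same way with the roles of the two columns swapped.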

Filter the multigraph into a simple graph using one of several modes.

Usage: filter.py {-m|-t|-p} <input-unipartite-nxgraph> <output-filtered-nxgraph>
  -m                filtering mode: Multiplicity > 1.
  -t                filtering mode: Top % of multiplicity.
  -p                filtering mode: Multiplicity proportion > threshold.
Ex:    filter.py -m graph/2017-05_user.nxgraph graph/2017-05_user_m.nxgraph
Ex:    filter.py -t graph/2016-Q1_repo.nxgraph graph/2016-Q1_repo_t.nxgraph
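
The two simpler modes might look like the sketch below, which works on a plain mapping from node pair to multiplicity. Both function names are hypothetical. The -m rule follows the description above; the reading of -t as "keep the top fraction of pairs ranked by multiplicity" is an assumption, and -p is not sketched because its proportion is not defined here.

```python
import math


def filter_multiplicity(multiplicity, threshold=1):
    """Keep pairs whose multiplicity is greater than the threshold (-m)."""
    return {pair for pair, count in multiplicity.items() if count > threshold}


def filter_top_fraction(multiplicity, fraction):
    """Keep the top fraction of pairs by multiplicity (assumed -t reading)."""
    ranked = sorted(multiplicity.items(), key=lambda item: (-item[1], item[0]))
    keep = math.ceil(len(ranked) * fraction)
    return {pair for pair, _ in ranked[:keep]}


# Made-up multiplicities, for illustration only.
multiplicity = {(10, 11): 3, (11, 12): 1, (10, 12): 2, (12, 13): 1}
kept_m = filter_multiplicity(multiplicity)
kept_t = filter_top_fraction(multiplicity, 0.25)
print(sorted(kept_m), sorted(kept_t))
```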

Convert NetworkX Graph object (.nxgraph) to edge list.

Usage: nxgraph2edgelist.py <input-nxgraph> <output-edgelist-basename>
Ex:    nxgraph2edgelist.py graph/2017-05_bi.nxgraph graph/2017-05_bi
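
The exact edge list layout is not specified here. The sketch below assumes the simplest one, a whitespace-separated node pair per line, and the name write_edgelist is hypothetical.

```python
import io


def write_edgelist(edges, out):
    """Write one 'source target' pair per line (assumed layout)."""
    for source, target in edges:
        out.write(f"{source} {target}\n")


# Made-up edges written to an in-memory buffer, for illustration only.
buffer = io.StringIO()
write_edgelist([("1", 10), ("2", 11)], buffer)
print(buffer.getvalue(), end="")
```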

SVD Predictor

Octomender

Build

make

Run

Usage: ./octomender <input-edgelist>
Ex:    ./octomender graph/2017-05_bi.edgelist

Or redirect the output to a file.

Usage: ./octomender <input-edgelist> > output.log
Ex:    ./octomender graph/2017-05_bi.edgelist > log/2017-05.log

Convert the log file to a readable format, translating each repo id to its repo name.

Usage: whatsthisrepoid.py <input-log-file> <input-repo-data-file>
Ex:    whatsthisrepoid.py log/2017-05.log data/2017-05.repo

Look up a user or repo by id or name to get the corresponding name or id.

Usage: lookup.py <input-data-file> <query>
Ex:    lookup.py data/2017-05.user frankyjuang
Ex:    lookup.py data/2017-05.user 6175880
Ex:    lookup.py data/2017-05.repo tensorflow/tensorflow
Ex:    lookup.py data/2017-05.repo 45717250
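
A two-way lookup over an id-to-name map can be sketched as follows. This is not the actual lookup.py; the function name and the sample maps are hypothetical. Ids are compared as strings so that the same query works for user ids (str) and repo ids (int).

```python
def lookup(mapping, query):
    """Return (id, name) pairs whose id or name equals the query string.

    Hypothetical helper over an id -> name map.
    """
    matches = []
    for item_id, name in mapping.items():
        if str(item_id) == query or name == query:
            matches.append((item_id, name))
    return matches


# Made-up maps, for illustration only.
users = {"1": "alice", "2": "bob"}
repos = {10: "alice/demo"}
print(lookup(users, "alice"))
print(lookup(repos, "10"))
print(lookup(users, "carol"))
```

The linear scan keeps the sketch short; for repeated queries a reverse name-to-id map would avoid rescanning.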
