hui-cc / smallk

This project forked from smallk/smallk

High-performance Non-negative Matrix Factorizations (NMF) - Python/C++

Dockerfile 0.30% Makefile 3.40% C++ 81.59% Python 10.92% Shell 2.27% CSS 0.07% HTML 0.08% TeX 0.20% MATLAB 0.67% Ruby 0.51%

smallk's Introduction

Notes for smallk release 2017/07/21:

  1. All code compiled with gcc 7.1.0.
  2. Vagrant installation upgraded.
  3. Docker installation available.
  4. macOS Sierra SIP (System Integrity Protection) issue resolved.

dblp: computer science bibliography ground truth data for graph analytics

We provide new data sets of the DBLP computer science bibliography network with richer metadata and verifiable ground-truth knowledge, which can foster future research in community finding and interpretation of communities in large networks.

There are six files in total:

dblp15_graph.mtx: adjacency matrix of the graph
dblp15_graph_weighted.mtx: weighted adjacency matrix, where each weight is the number of times two authors have collaborated
dblp15_ground_truth.mtx: ground-truth matrix, where an (i,j) entry equal to 1 means that author i published in venue j
dblp15_ground_truth_split.mtx: split ground-truth matrix, in which the original ground-truth communities are split into connected components
dblp15_authors.txt: list of author names as they appear in the dblp.xml file, in an order consistent with all the matrices
dblp15_venues.txt: list of venue keys, as described in the paper, in an order consistent with the matrix in dblp15_ground_truth.mtx
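
As a rough illustration of how these files fit together, the sketch below reads a ground-truth matrix and groups author names by venue. It rests on assumptions not stated above: that the .mtx files use the Matrix Market coordinate format (as the extension suggests) with real, integer, or pattern entries, and that the two .txt files hold one name per line, with line k matching index k. The function names are hypothetical and are not part of smallk. In practice `scipy.io.mmread` is the usual way to load such files; the reader here uses only the standard library.

```python
from collections import defaultdict


def read_mtx_coordinate(path):
    """Read a Matrix Market coordinate file into (n_rows, n_cols, entries).

    entries maps zero-based (row, col) pairs to float values. Files with the
    'pattern' field store no values, so every listed entry becomes 1.0.
    Files marked 'symmetric' store one triangle, so entries are mirrored.
    Assumption: the field is real, integer, or pattern (not complex).
    """
    with open(path, encoding="utf-8") as handle:
        header = handle.readline().lower().split()
        if header[:3] != ["%%matrixmarket", "matrix", "coordinate"]:
            raise ValueError("expected a Matrix Market coordinate file")
        field, symmetry = header[3], header[4]
        # Skip comments and blank lines; the first other line is the size line.
        for line in handle:
            if line.strip() and not line.startswith("%"):
                break
        else:
            raise ValueError("missing size line")
        n_rows, n_cols, _nnz = (int(token) for token in line.split())
        entries = {}
        for line in handle:
            tokens = line.split()
            if not tokens:
                continue
            # Indices in the file are 1-based.
            row, col = int(tokens[0]) - 1, int(tokens[1]) - 1
            value = 1.0 if field == "pattern" else float(tokens[2])
            entries[(row, col)] = value
            if symmetry == "symmetric" and row != col:
                entries[(col, row)] = value
    return n_rows, n_cols, entries


def read_names(path):
    """Read one name per line (assumed layout of the .txt files)."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def venues_to_authors(ground_truth_entries, authors, venues):
    """Group author names by venue key; entry (i, j) != 0 puts author i in venue j."""
    members = defaultdict(list)
    for (author_idx, venue_idx), value in sorted(ground_truth_entries.items()):
        if value != 0:
            members[venues[venue_idx]].append(authors[author_idx])
    return dict(members)
```

With the data set downloaded, a call might look like `venues_to_authors(read_mtx_coordinate("dblp15_ground_truth.mtx")[2], read_names("dblp15_authors.txt"), read_names("dblp15_venues.txt"))`. A dictionary of entries is convenient for a small demonstration, but a sparse-matrix library is a better fit for the full graph.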

Community discovery is an important task for revealing structures in large networks. The massive size of contemporary social networks poses a tremendous challenge to the scalability of traditional graph clustering algorithms and the evaluation of discovered communities. Our methodology uses a divide-and-conquer strategy to discover hierarchical community structure, non-overlapping within each level. Our algorithm is based on the highly efficient Rank-2 Symmetric Nonnegative Matrix Factorization. We solve several implementation challenges to boost its efficiency on modern CPU architectures, specifically for very sparse adjacency matrices that represent a wide range of social networks. Empirical results have shown that our algorithm has competitive overall efficiency, and that the non-overlapping communities found by our algorithm recover the ground-truth communities better than state-of-the-art algorithms for overlapping community detection. These results are part of an upcoming publication cited below.

  1. Rundong Du, Da Kuang, Barry Drake and Haesun Park, Georgia Institute of Technology, "Hierarchical Community Detection via Rank-2 Symmetric Nonnegative Matrix Factorization", submitted 2017.
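
To make the divide-and-conquer idea concrete, here is a minimal sketch of recursive bisection with a rank-2 symmetric NMF. It is not the smallk implementation: it assumes a small dense NumPy adjacency matrix and uses a generic damped multiplicative update to approximate A by H H^T with H nonnegative, whereas the paragraph above describes a solver tuned for very sparse matrices. All function names, parameters, and stopping rules are illustrative assumptions.

```python
import numpy as np


def rank2_symnmf(A, n_iter=500, seed=0, eps=1e-12):
    """Approximate A by H @ H.T with a nonnegative H of shape (n, 2)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    # Random positive start, scaled to the magnitude of the entries of A.
    H = rng.random((n, 2)) * np.sqrt(max(A.mean(), eps))
    for _ in range(n_iter):
        numerator = A @ H
        denominator = H @ (H.T @ H) + eps
        # Damped multiplicative update; it keeps every entry of H nonnegative.
        H *= 0.5 + 0.5 * numerator / denominator
    return H


def bisect(A, **kwargs):
    """Label each node 0 or 1 according to its larger coordinate in H."""
    H = rank2_symnmf(A, **kwargs)
    return np.argmax(H, axis=1)


def split_hierarchy(A, nodes=None, depth=2, min_size=2):
    """Recursively bisect the graph; return a list of node-index arrays.

    Communities at the same level do not overlap, because every node is
    assigned to exactly one side of each split.
    """
    if nodes is None:
        nodes = np.arange(A.shape[0])
    if depth == 0 or len(nodes) < 2 * min_size:
        return [nodes]
    # Restrict the adjacency matrix to the current community and split it.
    labels = bisect(A[np.ix_(nodes, nodes)])
    left, right = nodes[labels == 0], nodes[labels == 1]
    if len(left) == 0 or len(right) == 0:
        return [nodes]  # the split was degenerate, so keep this community whole
    return (split_hierarchy(A, left, depth - 1, min_size)
            + split_hierarchy(A, right, depth - 1, min_size))
```

Calling `split_hierarchy(A, depth=3)` on a symmetric nonnegative matrix returns up to eight leaf communities. The multiplicative update is chosen here for brevity; it converges slowly and depends on the random start, so results on real networks should not be read as representative of the method in the cited paper.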

smallk's People

Contributors

ascripka, bldrake, courtarro, surban3
