
yaobaiwei / gminer


An efficient large-scale graph mining framework.

License: Apache License 2.0

CMake 2.35% Makefile 7.59% C++ 76.69% Shell 0.01% C 13.36%

gminer's Introduction

G-Miner

G-Miner is a general distributed system aimed at graph mining over large-scale graphs.

Graph mining is one of the most important areas in data mining. However, scalable solutions for graph mining are still lacking, as existing studies focus on sequential algorithms. While many distributed graph processing systems have been proposed in recent years, most of them were designed to parallelize computations such as PageRank and Breadth-First Search, which keep state on individual vertices and propagate updates along edges. Graph mining, on the other hand, may generate many subgraphs, whose number can far exceed the number of vertices. This inevitably leads to much higher computational and space complexity, rendering existing graph systems inefficient. We propose G-Miner, a distributed system with a new architecture designed for general graph mining. G-Miner adopts a unified programming framework for implementing a wide range of graph mining algorithms. We model subgraph processing as independent tasks, and design a novel task pipeline to streamline task processing for better CPU, network and I/O utilization.

Feature Highlights

  • General Graph Mining Schema: G-Miner aims to provide a unified programming framework for implementing distributed algorithms for a wide range of graph mining applications. To design this framework, we have summarized common patterns of existing graph mining algorithms.

  • Task Model: G-Miner supports asynchronous execution of various types of operations (i.e., CPU, network, disk) and efficient load balancing by modeling a graph mining job as a set of independent tasks. A task consists of three fields: subgraph, candidates, and context.

  • Task-Pipeline: G-Miner provides the task-pipeline, which is designed to asynchronously process the following three major operations in G-Miner: (1) CPU computation to process the update operation on each task, (2) network communication to pull candidates from remote machines, and (3) disk writes/reads to buffer intermediate tasks on the local disk of each machine.
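To make the task model above more concrete, here is a minimal C++ sketch of a task with the three fields named in the list (subgraph, candidates, context) and a toy queue of independent tasks. All names here (`VertexId`, `Subgraph`, `Task`, `TaskQueue`, `make_seed_task`) are hypothetical and chosen for illustration; they are not G-Miner's actual classes, and the real system runs its pipeline asynchronously across CPU, network, and disk rather than through a single in-memory queue.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical names for illustration only; not G-Miner's actual classes.
using VertexId = std::uint64_t;

// The part of the graph a task has collected so far.
struct Subgraph {
    std::unordered_map<VertexId, std::vector<VertexId>> adj;
};

// A task carries the three fields named above.
struct Task {
    Subgraph subgraph;                 // partial subgraph grown so far
    std::vector<VertexId> candidates;  // vertices to pull before the next update
    int context = 0;                   // algorithm-specific state (assumed: a counter)
};

// Seed a task from one vertex: its neighbors become the first candidates.
Task make_seed_task(VertexId v, const std::vector<VertexId>& neighbors) {
    Task t;
    t.subgraph.adj[v] = neighbors;
    t.candidates = neighbors;
    return t;
}

// Toy FIFO of independent tasks; any worker may pop and process one.
class TaskQueue {
public:
    void push(Task t) { tasks_.push_back(std::move(t)); }
    bool empty() const { return tasks_.empty(); }
    Task pop() {
        assert(!tasks_.empty());
        Task t = std::move(tasks_.front());
        tasks_.pop_front();
        return t;
    }
private:
    std::deque<Task> tasks_;
};
```

Because each task owns its own subgraph and candidate list, tasks in this sketch can be moved between queues (or, in a real system, between machines or to disk) without shared state, which is the property the task model relies on for load balancing.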

Getting Started

  • Dependencies Install

To install G-Miner's dependencies (G++, MPI, JDK, HDFS), please follow the instructions on our project webpage.

[New] Since v1.1.0, we use the ZMQ library to support asynchronous communication, so please also install libzmq according to this instruction.

  • Build

Please manually MODIFY the dependency path for MPI/HDFS/ZMQ in CMakeLists.txt at the root directory.

$ export GMINER_HOME=/path/to/gminer_root  # must configure this ENV
$ cd $GMINER_HOME
$ ./auto-build.sh

Academic Paper

[EuroSys 2018] G-Miner: An Efficient Task-Oriented Graph Mining System. Hongzhi Chen, Miao Liu, Yunjian Zhao, Xiao Yan, Da Yan, James Cheng.

[SIGMOD DEMO 2019] Large Scale Graph Mining with G-Miner. Hongzhi Chen, Xiaoxi Wang, Chenghuan Huang, Juncheng Fang, Yifan Hou, Changji Li, James Cheng.

Acknowledgement

The subgraph-centric vertex-pulling API is attributed to our prior work G-thinker.

License

Copyright 2018 Husky Data Lab, CUHK

gminer's People

Contributors

wuyifan18, yaobaiwei

gminer's Issues

a minor bug in the graph partition module.

Hi @yaobaiwei, thanks for the great work.
I have read the source code of the graph partition module and found a minor bug in how the priority for allocating blocks is calculated. The relevant code is as follows:

priority = cmIter->second * (1 - assigned[j] / capacity); //calculate the priority of each work for current block

Here, the integer operands should be converted to double before the division; otherwise the division truncates.
I have fixed the bug and opened a pull request.
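The effect of the truncation can be seen in a small standalone sketch. It assumes the weight, `assigned[j]`, and `capacity` are all of type `int`, which the issue implies but the quoted line does not show; the function and parameter names below are hypothetical and do not come from the G-Miner source.

```cpp
#include <cassert>

// Mirrors the quoted expression with assumed int operands.
// assigned / capacity is integer division, so it is 0 whenever
// 0 <= assigned < capacity, and the priority ignores the load.
double priority_truncating(int weight, int assigned, int capacity) {
    assert(capacity > 0);
    return weight * (1 - assigned / capacity);
}

// Casting one operand to double makes the division fractional,
// so the priority shrinks as the worker fills up.
double priority_fixed(int weight, int assigned, int capacity) {
    assert(capacity > 0);
    return weight * (1.0 - static_cast<double>(assigned) / capacity);
}
```

With a weight of 10 and a worker that is half full (`assigned = 5`, `capacity = 10`), the truncating version returns 10 while the fixed version returns 5, so only the latter penalizes workers that already hold many blocks.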

OGBNPapers100M

I am trying to run normal_bdg partitioning on the ogbnpapers100M dataset, but I get the following error:

2024-04-05 02:07:14,842 WARN  [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[bafs-01:61282:0:61282] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
BFD: DWARF error: section .debug_info is larger than its filesize! (0x93ef57 vs 0x530ea0)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x93ef57 vs 0x530ea0)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x93ef57 vs 0x530ea0)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x93ef57 vs 0x530ea0)
==== backtrace (tid:  61282) ====                                                                                                                                   
 0 0x0000000000beb3d9 os::Linux::chained_handler()  ???:0                 
 1 0x0000000000bf10eb JVM_handle_linux_signal()  ???:0
 2 0x0000000000be3c8c signalHandler()  ???:0
 3 0x0000000000014420 __funlockfile()  :0 
 4 0x0000000000048b3a strtoul()  ???:0
 5 0x000000000001a9d5 normal_BDGPartitioner::to_vertex()  ???:0
 6 0x0000000000026593 Driver<normal_BDGPartVertex>::load_graph()  ???:0
 7 0x0000000000030ecf BPartitioner<normal_BDGPartVertex>::run()  ???:0
 8 0x000000000001b579 partitioner_exec()  ???:0
 9 0x000000000001608b main()  ???:0
10 0x0000000000024083 __libc_start_main()  ???:0
11 0x00000000000162be _start()  ???:0
=================================
[bafs-01:61282] *** Process received signal ***
[bafs-01:61282] Signal: Segmentation fault (11)
[bafs-01:61282] Signal code:  (-6)
[bafs-01:61282] Failing at address: 0xef62
[bafs-01:61282] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7fb41d223420]
[bafs-01:61282] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x48b3a)[0x7fb41cc8eb3a]
[bafs-01:61282] [ 2] /root/GMiner//release/partition(_ZN21normal_BDGPartitioner9to_vertexEPc+0xd5)[0x56156c3079d5]
[bafs-01:61282] [ 3] /root/GMiner//release/partition(_ZN6DriverI20normal_BDGPartVertexE10load_graphEPKc+0xb3)[0x56156c313593]
[bafs-01:61282] [ 4] /root/GMiner//release/partition(_ZN12BPartitionerI20normal_BDGPartVertexE3runERK12WorkerParams+0xbf)[0x56156c31decf]
[bafs-01:61282] [ 5] /root/GMiner//release/partition(_Z16partitioner_execiPPcRK12WorkerParams+0xf9)[0x56156c308579]
[bafs-01:61282] [ 6] /root/GMiner//release/partition(main+0xcb)[0x56156c30308b]
[bafs-01:61282] [ 7] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7fb41cc6a083]
[bafs-01:61282] [ 8] /root/GMiner//release/partition(_start+0x2e)[0x56156c3032be]
[bafs-01:61282] *** End of error message ***

I modified the normal sample file as follows:

0       1 2 3 40
1       3 2 4 5
2       3 1 3 5
3       4 2 4 5 13
4       4 1 3 5 9
5       4 1 2 3 4
6       3 7 9 10
7       4 6 8 10 11
8       3 7 9 10
9       4 4 6 8 10
10      4 6 7 8 9
11      4 7 12 14 15
12      3 11 13 15
13      4 3 12 14 15
14      3 11 13 15
15      4 11 12 13 14 16
16      1
40      2 3 4

by adding nodes 16 and 40 to the graph, but this gives the same error. Is there a requirement on the graph structure? If so, is it possible to run on the ogbnpapers100M dataset?
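One way to narrow this down is to check the input file before handing it to the partitioner. The sketch below assumes each line has the shape "vertex id, neighbor count, then exactly that many neighbor ids", separated by whitespace. That shape is only inferred from most lines of the sample above and is not confirmed by any G-Miner documentation quoted here; the function name is hypothetical. Under that assumption, the modified lines for vertices 0, 15, and 16 would not match their declared counts, which would be consistent with the crash inside `strtoul` called from `to_vertex`, though that link is a guess.

```cpp
#include <sstream>
#include <string>

// Returns true when a line has the ASSUMED shape:
//   <id> <count> <neighbor_1> ... <neighbor_count>
// with whitespace (tab or spaces) between fields.
bool line_matches_assumed_format(const std::string& line) {
    std::istringstream in(line);
    unsigned long id = 0;
    unsigned long declared = 0;
    if (!(in >> id >> declared)) {
        return false;  // missing id or count
    }
    unsigned long neighbor = 0;
    unsigned long seen = 0;
    while (in >> neighbor) {
        ++seen;
    }
    // in.eof() is false if parsing stopped on a non-numeric token.
    return in.eof() && seen == declared;
}
```

Running this over every line of the file and printing the ones that fail gives a quick list of lines to inspect by hand before rerunning the partitioner.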

loadFileSystems error

hi,
when I run this command, I get an error as below:

mpiexec -n 1 $GMINER_HOME/release/put /root/GMiner/sample-datasets/normal_sample.adj /hdfs/dataset/

loadFileSystems error:
(unable to get stack trace for java.lang.NoClassDefFoundError exception: ExceptionUtils::getStackTrace error.)
hdfsBuilderConnect(forceNewInstance=0, nn=master, port=9000, kerbTicketCachePath=(NULL), userName=(NULL)) error:
(unable to get stack trace for java.lang.NoClassDefFoundError exception: ExceptionUtils::getStackTrace error.)
Failed to connect to HDFS!
Attempting to use an MPI routine before initializing MPICH

My hadoop version is 2.7.5. When I use hadoop fs -put, it can upload successfully.
Could you please help me to figure out what's wrong? Thanks for your help.

put dataset returns error

Dear author, thanks for the great work.
When I try to put graph data to hdfs, it reports the error as below:

$ mpiexec -n 1 $GMINER_HOME/release/put $GMINER_HOME/sample-datasets/normal_sample.adj /GMiner/normal_sample
readDirect: FSDataInputStream#read error:
java.lang.UnsupportedOperationException: Byte-buffer read unsupported by input stream
	at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:146)

Could you help me figure out what's going wrong? Thanks!
