pkumod / accelerating-tc

Source code of "Accelerating triangle counting on GPU", accepted by SIGMOD'21 - By Lin Hu, Prof. Lei Zou, Yu Liu

C++ 33.84% Makefile 0.22% Cuda 8.67% C 57.01% Shell 0.27%

accelerating-tc's Issues

In TriCore's implementation, why are m and n compared this way?

The code first calls warp_binary_kernel without checking whether m == n. Then there is an if (m != n) check, followed by another warp_binary_kernel call. Next time, m != n of course holds, so warp_binary_kernel is called again, and then we..

I am wondering about the correctness of the current implementation.

I guess it could be rewritten like:
if (m==n){
// ...
}
else{
//....
}
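
To make the comparison concrete, here is a minimal host-side sketch of the two control-flow shapes. It rests on assumptions not confirmed against the repository: that the second launch handles the reversed partition pair (n, m), and that each launch can be modeled by logging its arguments. The names `Launches`, `launch`, `schedule_as_described`, and `schedule_if_else` are hypothetical and exist only for this illustration.

```cpp
#include <utility>
#include <vector>

// Each entry records one launch as (source partition, target partition).
using Launches = std::vector<std::pair<int, int>>;

// Hypothetical stand-in for a warp_binary_kernel launch: it only logs the pair.
void launch(Launches& log, int src, int dst) { log.emplace_back(src, dst); }

// Shape described in the issue: one unconditional launch,
// then a second launch guarded by m != n.
Launches schedule_as_described(int m, int n) {
    Launches log;
    launch(log, m, n);
    if (m != n) {
        launch(log, n, m);  // assumption: the second launch is the reversed pair
    }
    return log;
}

// Shape proposed in the issue: an explicit if/else on m == n.
Launches schedule_if_else(int m, int n) {
    Launches log;
    if (m == n) {
        launch(log, m, n);
    } else {
        launch(log, m, n);
        launch(log, n, m);
    }
    return log;
}
```

Under these assumptions both shapes issue the same launches in the same order, so the rewrite would be a readability change rather than a correctness fix. If the real host code differs from the description above, this conclusion does not carry over.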

By the way, although I have raised a lot of questions, I really appreciate your code! It is a good starting point for me to learn the TC problem. Thanks!

In TriCore's implementation, does the binary search have a problem?

Here, when we store the edge list into the "local" shared-memory array, we use: local[p * 32 + i] = a[i * degree_m / 32];
Notice that this does not store the first few levels of the binary search tree; it simply samples a at indices that form an arithmetic progression. For example, with degree_m = 4 the cached source indices are [0, 0, 0, 0, 0, 0, 0, 0 (the first value, eight times), 1, 1, 1, 1, 1, 1, 1, 1 (the second value, eight times), ...]. This does not correspond to the index used later in the search, Y = local[p * 32 + r];. What we want there is the r-th value in the edge list, but what we actually get is a sample from the arithmetic progression.
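
The index mapping in question can be reproduced on the host. The sketch below only mirrors the expression `i * degree_m / 32` from the kernel; the helper names `sample_index` and `sample_indices` are hypothetical and not part of the repository.

```cpp
#include <array>
#include <cstdint>

// Index into a that lane i caches, mirroring
// local[p * 32 + i] = a[i * degree_m / 32] (integer division).
constexpr uint32_t sample_index(uint32_t lane, uint32_t degree_m) {
    return lane * degree_m / 32;
}

// All 32 cached source indices for one warp.
std::array<uint32_t, 32> sample_indices(uint32_t degree_m) {
    std::array<uint32_t, 32> idx{};
    for (uint32_t lane = 0; lane < 32; ++lane) {
        idx[lane] = sample_index(lane, degree_m);
    }
    return idx;
}
```

For degree_m = 4 this yields eight copies each of indices 0, 1, 2, and 3, matching the example above; for degree_m = 64 it yields every second index. So local[r] holds a[r * degree_m / 32] rather than a[r], and whether that is a bug depends on whether the later search scales r back by degree_m / 32 before indexing a.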

__global__ void warp_binary_kernel(const uint32_t* __restrict__ edge_m, const uint32_t* __restrict__ node_index_m, uint32_t edge_m_count, uint32_t* __restrict__ adj_m, uint32_t start_node_n, const uint32_t* __restrict__ node_index_n, uint32_t node_index_n_count, uint32_t* __restrict__ adj_n, uint64_t *results) {
    //phase 1, partition
    uint64_t count = 0;
    __shared__ uint32_t local[BLOCKSIZE];

    uint32_t i = threadIdx.x % 32;
    uint32_t p = threadIdx.x / 32;
    for (uint32_t tid = (threadIdx.x + blockIdx.x * blockDim.x) / 32; tid < edge_m_count; tid += blockDim.x * gridDim.x / 32) {
        uint32_t node_m = edge_m[tid];
        uint32_t node_n = adj_m[tid];
        if (node_n < start_node_n || node_n >= start_node_n + node_index_n_count) {
            continue;
        }

        uint32_t degree_m = node_index_m[node_m + 1] - node_index_m[node_m];
        uint32_t degree_n = node_index_n[node_n + 1 - start_node_n] - node_index_n[node_n - start_node_n];
        uint32_t* a = adj_m + node_index_m[node_m];
        uint32_t* b = adj_n + node_index_n[node_n - start_node_n];
        if(degree_m < degree_n){
            uint32_t temp = degree_m;
            degree_m = degree_n;
            degree_n = temp;
            uint32_t *aa = a;
            a = b;
            b = aa;
        }

        //initial cache
        local[p * 32 + i] = a[i * degree_m / 32];
        __syncthreads();

        //search
        uint32_t j = i;
        while(j < degree_n){
            uint32_t X = b[j];
            uint32_t Y;
            //phase 1: cache
            int32_t bot = 0;
            int32_t top = 32;
            int32_t r;
            while(top > bot + 1){
                r = (top + bot) / 2;
                Y = local[p * 32 + r];
                if(X == Y){
                    count++;
                    bot = top + 32;
                }
                if(X < Y){
                    top = r;
                }
                if(X > Y){
                    bot = r;
                }
            }
            //phase 2
            bot = bot * degree_m / 32;
            top = top * degree_m / 32 - 1;
            while(top >= bot){
                r = (top + bot) / 2;
                Y = a[r];
                if(X == Y){
                    count++;
                }
                if(X <= Y){
                    top = r - 1;
                }
                if(X >= Y){
                    bot = r + 1;
                }
            }
            j += 32;
        }
        __syncthreads();
    }
    results[blockDim.x * blockIdx.x + threadIdx.x] = count;
}
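
One way to probe the concern is to replay a single warp's search on the CPU and compare it against a plain set intersection. The following is a sketch under stated assumptions, not the authors' code: both adjacency lists are taken to be sorted ascending without duplicates, the function names `two_level_count` and `reference_count` are hypothetical, the search bounds use 64-bit signed integers so that top can reach -1 without relying on unsigned wraparound, and a guard for an empty list is added that the kernel does not have.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <utility>
#include <vector>

// Sequential replay of the kernel's cache-then-refine search for one edge.
uint64_t two_level_count(std::vector<uint32_t> a, std::vector<uint32_t> b) {
    if (a.size() < b.size()) std::swap(a, b);  // a is the longer list, as in the kernel
    if (a.empty()) return 0;                   // guard added for this sketch only
    const int64_t degree_m = static_cast<int64_t>(a.size());

    // Initial cache: 32 evenly spaced samples of a.
    uint32_t local[32];
    for (int64_t i = 0; i < 32; ++i) {
        local[i] = a[static_cast<std::size_t>(i * degree_m / 32)];
    }

    uint64_t count = 0;
    for (uint32_t X : b) {
        // Phase 1: binary search over the cached samples.
        int64_t bot = 0, top = 32;
        while (top > bot + 1) {
            const int64_t r = (top + bot) / 2;
            const uint32_t Y = local[r];
            if (X == Y) { count++; bot = top + 32; }  // found: forces both loops to end
            if (X < Y) top = r;
            if (X > Y) bot = r;
        }
        // Phase 2: scale the sample interval back to indices of a and refine.
        bot = bot * degree_m / 32;
        top = top * degree_m / 32 - 1;
        while (top >= bot) {
            const int64_t r = (top + bot) / 2;
            const uint32_t Y = a[static_cast<std::size_t>(r)];
            if (X == Y) count++;
            if (X <= Y) top = r - 1;
            if (X >= Y) bot = r + 1;
        }
    }
    return count;
}

// Reference answer using the standard library.
uint64_t reference_count(const std::vector<uint32_t>& a, const std::vector<uint32_t>& b) {
    std::vector<uint32_t> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out.size();
}
```

Calling both functions on the same pair of sorted lists, for example `{1, 3, 5, 7}` and `{3, 4, 7}`, shows whether the sampled cache and the phase-2 rescaling agree with an ordinary intersection. Agreement on a handful of inputs is evidence, not a proof, and the sketch says nothing about the GPU-specific parts such as the `__syncthreads()` placement.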
