GithubHelp home page GithubHelp logo

cuda_brute_triangle's Introduction

CUDA_brute_triangle

brute force examines all possible n choose 3 triangles only once

UPDATE: Optimized both implementations, so new times posted.Run using the --use_fast_math flag, and must have a GPU with compute capability >=3.5. Best performance is max_register set to 32.

This code goes through every possible 3 point combination of a set of points, makes a triangle of those three points, then evalutes how many other points are within that triangle.

In this simple example case, the objective is to find which triangle of the possible set contains within its borders the greatest number of other points(exclusive). The two different CPU and GPU functions return that max number of points, and the INDEXES of the three points which created that optimal triangle.

NOTE: Since the GPU version does not proceed in a serial fashion, if there is more than one combination associated with the optimal value it will return a valid combination, BUT not necessarly the same(first seen) combination the CPU version returned. The overall answer will be correct, but a hueristic needs to be added for the CUDA version to return a specific arrangement if more than one does indeed exist.

While many CUDA GPU implementations of algorithms many only be 10-100 times faster than a single core CPU implementation, this problem has a much greater difference in performance.

The larger the data set of points, the greater the outperformance of the CUDA GPU implementation. For data sets of points >=400 the GPU CUDA implementation was at least 400x times faster than a 3.9 Ghz CPU implementation(including all host-device, device-device and device-host memory copies).

No overlocking of GPU, is running at stock 706 Mhz for older Kepler Tesla K20c

Optimal Triangle Running Time comparison:

Number of pointsIntel I-3770K 3.9 Ghz CPU time Tesla K20c GPU time CUDA Speedup
300 13,198 ms 32 ms 412.438x
400 42,710 ms 99 ms 431.32x
500 103,731 ms 240 ms 432.2x
700 411,401 ms 912 ms 451.01x
1300 5,075,524 ms 10,788 ms 470.47x

New Times for GTX 980 and GTX 780ti

Number of pointsIntel I-3770K 3.9 Ghz CPU time GTX 980 time GTX 780ti time GTX Titan X time
700 411,401 ms 716 ms 655 ms 573 ms
1300 5,075,524 ms 8,422 ms 7,720 ms 6,735 ms

The running time is apx (N choose 3)*N, where N is the number of 2-D points in the array.

What makes this even more impressive is that the GPU actually has to do over 10x more work for the same answer (this is apparent in code, compare CPU version to GPU).

AMD users, post your times and code. I would like to compare results. Same goes to the multi-core CPU crowd who thinks a Xeon can beat a GPU at this type of problem. Prove it ..

<script> (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); ga('create', 'UA-60172288-1', 'auto'); ga('send', 'pageview'); </script>

cuda_brute_triangle's People

Contributors

olegkonings avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

Forkers

hush-z nkapgate

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.