
cutropicalgemm.jl's People

Contributors

arrogantgao, giggleliu

cutropicalgemm.jl's Issues

Failure of BenchmarkTools

BenchmarkTools does not report consistent timings:

julia> using TropicalNumbers, CUDA, BenchmarkTools, LinearAlgebra, CuTropicalGEMM

julia> a = Tropical.(CUDA.randn(4096, 4096));

julia> @btime $a * $a;
  3.375 μs (7 allocations: 256 bytes)

julia> @benchmark $a * $a
BenchmarkTools.Trial: 158 samples with 8 evaluations.
 Range (min … max):   3.554 μs …    1.733 s  ┊ GC (min … max): 0.00% … 0.07%
 Time  (median):      3.976 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   13.475 ms ± 137.779 ms  ┊ GC (mean ± σ):  0.06% ± 0.01%

  █                                                          ▄
  █▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▄
  3.55 μs       Histogram: log(frequency) by time      13.5 ms <

 Memory estimate: 256 bytes, allocs estimate: 7.

Compared with timings taken directly from the C-CUDA tests, the result of @benchmark is correct.
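GPU work is launched asynchronously, so a microsecond-scale reading for a 4096 × 4096 product most likely measures only the launch. A minimal sketch of the usual workaround is to synchronize inside the timed expression. Whether CUDA.@sync actually waits on the work issued by this package's kernel is an assumption here; the explicit CUDA.device_synchronize() variant is included as a more conservative alternative.

```julia
using TropicalNumbers, CUDA, BenchmarkTools, LinearAlgebra, CuTropicalGEMM

a = Tropical.(CUDA.randn(4096, 4096));

# Variant 1: block the host until the GPU work queued by `a * a` has finished,
# so the timer covers the computation and not only the launch.
# Assumption: the kernel runs on a stream that CUDA.@sync waits for.
@btime CUDA.@sync $a * $a;

# Variant 2: an explicit device-wide barrier after the product.
@btime begin
    $a * $a
    CUDA.device_synchronize()
end;
```

If both variants report times close to the mean shown by @benchmark above, that would support the reading that the fast samples were unsynchronized launches.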

Optimizations for long and narrow matrices

In this package we use a padding strategy to handle the boundary elements, as in GEMM, and the minimum block size is set to $64 \times 32$ for matrix A and $32 \times 64$ for matrix B.
As a result, for narrow matrices, which are widely used in tensor network calculations, a large share of the computation is wasted.
For example, when the matrices have size $4 \times 4 \times 10^6$, what is actually computed are matrices of size $64 \times 32 \times 10^6$, and only $\frac{1}{128}$ of these calculations are useful.

Optimizations for such long and narrow matrices are needed.
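The $\frac{1}{128}$ figure can be reproduced with a short calculation. This is only a sketch: the helper names `padded` and `useful_fraction` are hypothetical, and it assumes the three sizes are read as M × K × N, with A (M × K) padded to multiples of 64 × 32 and B (K × N) padded to multiples of 32 × 64.

```julia
# Round n up to the next multiple of the block size b.
padded(n, b) = cld(n, b) * b

# Fraction of the padded M×K×N multiply-add work that touches real data.
# bm, bk, bn are the assumed block sizes along M, K and N.
function useful_fraction(m, k, n; bm = 64, bk = 32, bn = 64)
    return (m * k * n) // (padded(m, bm) * padded(k, bk) * padded(n, bn))
end

useful_fraction(4, 4, 10^6)   # 1//128
```

Since $10^6$ is already a multiple of 64, only the two short dimensions are padded, which gives $(4 \cdot 4)/(64 \cdot 32) = 1/128$.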

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Polish README

A good template to follow (it just went through a strict JOSS review):
https://github.com/TensorBFS/TensorInference.jl

Some aspects that could be improved:

  • The benchmark should be less messy, e.g. use @btime instead of @benchmark. The benchmark plot is not very good (it follows the bad example in TropicalGEMM). I liked the previous bar plot better.
  • Missing section: How to contribute.
  • Warn users that this package is under the GPL license and explain why.

Investigate the performance issues and consider moving to GemmKernels.jl

Sorry for the earlier chaos; I thought these parts would not be published as part of the package.

The following changes have been made:

  • The .so file is uploaded to gist as an artifact, so there is no longer any binary in the repo.
  • I relocated all the files into the folders src, test and benchmark.
  • Scripts used for the benchmarks are provided, including the fallback implementation in CUDA.jl. However, I found something strange: it seems that CUDA.@sync does not work when calling a function from a .so library, so I could not benchmark our code in Julia.

The new benchmark result is shown here:
(image: benchmark plot)

Originally posted by @ArrogantGao in #1 (comment)

Unstable result in the GenericTensorNetwork example

I see. I notice that although the code is much faster and all tests pass, the current version still cannot produce the correct result in the following test case.

The output is probabilistic, so it is very likely that you did not sync threads after some computation. Can you please help me make the following code produce the correct result?

using GenericTensorNetworks, GenericTensorNetworks.Graphs
using CUDA
g = Graphs.random_regular_graph(200, 3)
optimizer = TreeSA(ntrials=3)
gp = IndependentSet(g; optimizer=optimizer)
contraction_complexity(gp)
@time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
using CuTropicalGEMM
# If you run the following line multiple times, the result changes.
@time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)

Originally posted by @GiggleLiu in #9 (comment)

Cleanup repo

Some files are not used.

  • travis.yml
  • Artifacts.toml
