una-dinosauria / rayuela.jl

Code for my PhD thesis: a library of quantization-based methods for fast similarity search in high dimensions. Presented at ECCV'18.

License: MIT License

Languages: Julia 96.97%, Python 3.03%
Topics: computer-vision, eccv-18, julia, nearest-neighbor-search

rayuela.jl's People

Contributors: philipbadams, una-dinosauria

rayuela.jl's Issues

LoadError: context should be active

Hello, I occasionally get this error when using CUDA.

Training a chain quantizer
 -2 2.394080e+04... 0.24 secs updating C
ERROR: LoadError: context should be active
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] device at /home/xhanko1/.julia/packages/CUDAdrv/JWljj/src/context.jl:165 [inlined]
 [3] (::getfield(CuArrays.CUBLAS, Symbol("##3#5")))() at /home/xhanko1/.julia/packages/CuArrays/PD3UJ/src/blas/CUBLAS.jl:25
 [4] get!(::getfield(CuArrays.CUBLAS, Symbol("##3#5")), ::Dict{CUDAdrv.CuContext,Ptr{Nothing}}, ::CUDAdrv.CuContext) at ./dict.jl:453
 [5] handle at /home/xhanko1/.julia/packages/CuArrays/PD3UJ/src/blas/CUBLAS.jl:20 [inlined]
 [6] macro expansion at /home/xhanko1/.julia/packages/CuArrays/PD3UJ/src/blas/error.jl:43 [inlined]
 [7] gemm!(::Char, ::Char, ::Float32, ::CuArrays.CuArray{Float32,2}, ::CuArrays.CuArray{Float32,2}, ::Float32, ::CuArrays.CuArray{Float32,2}) at /home/xhanko1/.julia/packages/CuArrays/PD3UJ/src/blas/wrappers.jl:888
 [8] gemm at /home/xhanko1/.julia/packages/CuArrays/PD3UJ/src/blas/wrappers.jl:903 [inlined]
 [9] quantize_chainq_cuda!(::Array{Int16,2}, ::Array{Float32,2}, ::Array{Array{Float32,2},1}, ::Array{Array{Float32,2},1}, ::UnitRange{Int64}) at /home/xhanko1/.julia/dev/Rayuela/src/ChainQ.jl:239
 [10] quantize_chainq(::Array{Float32,2}, ::Array{Array{Float32,2},1}, ::Bool, ::Bool) at /home/xhanko1/.julia/dev/Rayuela/src/ChainQ.jl:325
 [11] train_chainq(::Array{Float32,2}, ::Int64, ::Int64, ::Array{Float32,2}, ::Array{Int16,2}, ::Array{Array{Float32,2},1}, ::Int64, ::Bool) at /home/xhanko1/.julia/dev/Rayuela/src/ChainQ.jl:401
 [12] run_demos(::String, ::Int64, ::Int64, ::Int64, ::Int64) at /home/xhanko1/.julia/dev/Rayuela/demos/demos_train_query_base.jl:57
 [13] top-level scope at /home/xhanko1/.julia/dev/Rayuela/demos/demos_train_query_base.jl:171 [inlined]
 [14] top-level scope at ./none:0
 [15] include at ./boot.jl:326 [inlined]
 [16] include_relative(::Module, ::String) at ./loading.jl:1038
 [17] include(::Module, ::String) at ./sysimg.jl:29
 [18] include(::String) at ./client.jl:403
 [19] top-level scope at none:0
in expression starting at /home/xhanko1/.julia/dev/Rayuela/demos/demos_train_query_base.jl:170

What is weird is that sometimes it lets me train both ChainQ and LSQ, and sometimes I get this error. Does anyone have any pointers as to what the cause could be?
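
For reference, the error is thrown by CUDAdrv when the context that the cached CUBLAS handle was created in is no longer the task's current context. A minimal diagnostic sketch, using the CUDAdrv-era API that Rayuela depends on (names unverified, treat as an assumption):

using CUDAdrv

# device(ctx) at context.jl:165 errors when ctx is not the current context;
# printing what is current right before training can confirm the mismatch.
ctx = CuCurrentContext()
println("current context: ", ctx)
println("device: ", CUDAdrv.device(ctx))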

LSQ training got stuck

Hi, we're trying to reproduce the ECCV'18 paper.

The trainer gets stuck at this stage:

Running CUDA LSQ training... 
**********************************************************************************************
Training LSQ GPU with 7 codebooks, 4 perturbations, 4 icm iterations and random order = true
**********************************************************************************************
Doing fast bin codebook update... done in 0.129 seconds.
 -2 1.913506e+04 
Creating 100000 random states... done in 0.15 seconds
^^^ stuck on this stage for 3 hours ^^^^^^

We checked the GPU utilization and found it was zero.
Is this expected?

Reproduction of ECCV'18 paper

Hi!

I would like to reproduce the recall@1 vs. time plots in the LSQ++ paper. Thanks for open-sourcing the corresponding code!

Installing Julia 0.7 and Rayuela.jl was relatively painless. However, it is not obvious what the entry point into the code should be.

For example, trying to use a function from demo.jl gives:

https://gist.github.com/mdouze/05161b06a3c524cd0955e99a378507a0

Loading the data is fine and running the training is OK, but there seems to be a missing function qerror, which makes me think that this is probably not the right entry point.

Sorry, I am not familiar with Julia.

Any help is appreciated!
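
In case it helps while the entry point is sorted out, here is a minimal stand-in for the missing qerror, assuming the usual Rayuela conventions (X is the d×n data, B the m×n codes, C a vector of m codebooks, each d×h). This is a sketch, not the library's own implementation:

using Statistics

# Mean squared quantization error: reconstruct each vector from its codes
# and measure the average squared distance to the original.
function qerror_sketch(X::Matrix{Float32}, B::Matrix{Int16}, C::Vector{Matrix{Float32}})
    Xhat = zeros(Float32, size(X))
    for i in 1:length(C)
        Xhat .+= C[i][:, B[i, :]]   # add the selected codeword from codebook i
    end
    return mean(sum(abs2, X .- Xhat, dims=1))
end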

Get rid of CUBLAS

CUBLAS.jl does not seem to be supported anymore; we should probably switch to CuArrays.jl (see the sketch after the logs below).

It also freaks out when there is more than one Julia process and you call `using CUBLAS`:

From worker 10:  [7] (::Base.Distributed.##105#107{Base.Distributed.CallMsg{:call_fetch},Base.Distributed.MsgHeader,TCPSocket})() at ./event.jl:73
        From worker 18:  [7] (::Base.Distributed.##105#107{Base.Distributed.CallMsg{:call_fetch},Base.Distributed.MsgHeader,TCPSocket})() at ./event.jl:73
        From worker 12:  [7] (::Base.Distributed.##105#107{Base.Distributed.CallMsg{:call_fetch},Base.Distributed.MsgHeader,TCPSocket})() at ./event.jl:73
        From worker 8:   [5] run_work_thunk(::Base.Distributed.##106#108{Base.Distributed.CallMsg{:call_fetch}}, ::Bool) at ./distributed/process_messages.jl:56
        From worker 8:   [6] macro expansion at ./distributed/process_messages.jl:268 [inlined]
        From worker 8:   [7] (::Base.Distributed.##105#107{Base.Distributed.CallMsg{:call_fetch},Base.Distributed.MsgHeader,TCPSocket})() at ./event.jl:73
WARNING: Node state is inconsistent: node 9 failed to load cache from /home/julieta/.julia/lib/v0.6/CUBLAS.ji. Got:
WARNING: InitError: "cublas not initialized"
during initialization of module CUBLAS
[... the same three-line warning repeats for nodes 3, 21, 25, 23, 15, 7, 6, 13, 10, 18, 20, 17, 12, and 8 ...]
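
For the switch itself, CuArrays exposes the same functionality through plain array operations, so most of our CUBLAS.jl calls should reduce to ordinary multiplications. A small sketch of the intended replacement:

using CuArrays, LinearAlgebra

A = cu(randn(Float32, 128, 10_000))   # copy host arrays to the GPU
B = cu(randn(Float32, 10_000, 256))
C = A * B                             # dispatches to CUBLAS gemm under the hood
C_host = Array(C)                     # copy the result back to the host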

LSQ++ in 16x4 (nbits=4) - Does NOT scale up to large training sets

Hello,

I have run LSQ++ with M=16 codebooks (number of subspaces) and codes encoded in nbits=4 on BigANN1M and Deep1M. When increasing the size of the training set, I observe a drop in recall (@1, @10, @100) for both datasets. Please find attached graphics that illustrate the problem.
[Two screenshots: recall dropping as the training-set size increases, one per dataset]

I have used the FAISS implementation of LSQ++ (faiss.LocalSearchQuantizer(d, M, nbits)). @mdouze

Have you experienced this issue when testing LSQ++ 16x4?
I did a grid search over niter_train and niter_ils_train but have observed no difference in the drop...

Cheers
@k-amara

Error in update_centers!

I get the following error when running demos_train_query_base.jl:

Training an optimized product quantizer
  0 3.384649e+04... ERROR: LoadError: MethodError: no method matching update_centers!(::Array{Float32,2}, ::Nothing, ::Array{Int64,1}, ::Array{Bool,1}, ::Array{Float32,2}, ::Array{Float64,1})
Closest candidates are:
  update_centers!(::AbstractArray{#s110,2} where #s110<:Real, ::Nothing, ::Array{Int64,1}, ::Array{Bool,1}, ::AbstractArray{#s109,2} where #s109<:AbstractFloat, ::Array{Int64,1}) at /home/s2eghbal/.julia/packages/Clustering/YmmQw/src/kmeans.jl:269
  update_centers!(::AbstractArray{#s110,2} where #s110<:Real, ::Array{W<:Real,1}, ::Array{Int64,1}, ::Array{Bool,1}, ::AbstractArray{#s109,2} where #s109<:Real, ::Array{W<:Real,1}) where W<:Real at /home/s2eghbal/.julia/packages/Clustering/YmmQw/src/kmeans.jl:316
  update_centers!(::Any, ::Any, ::Any, ::Any) at /home/s2eghbal/.julia/packages/Clustering/YmmQw/src/fuzzycmeans.jl:42
Stacktrace:
 [1] train_opq(::Array{Float32,2}, ::Int64, ::Int64, ::Int64, ::String, ::Bool) at /home/s2eghbal/.julia/dev/Rayuela/src/OPQ.jl:121
 [2] experiment_opq(::Array{Float32,2}, ::Array{Float32,2}, ::Array{Float32,2}, ::Array{UInt32,1}, ::Int64, ::Int64, ::String, ::Int64, ::Int64, ::Bool) at /home/s2eghbal/.julia/dev/Rayuela/src/OPQ.jl:155
 [3] run_demos(::String, ::Int64, ::Int64, ::Int64, ::Int64) at /home/s2eghbal/.julia/dev/Rayuela/demos/demos_train_query_base.jl:35
 [4] top-level scope at /home/s2eghbal/.julia/dev/Rayuela/demos/demos_train_query_base.jl:171 [inlined]
 [5] top-level scope at ./none:0
 [6] include at ./boot.jl:326 [inlined]
 [7] include_relative(::Module, ::String) at ./loading.jl:1038
 [8] include(::Module, ::String) at ./sysimg.jl:29
 [9] include(::String) at ./client.jl:403
 [10] top-level scope at none:0
in expression starting at /home/s2eghbal/.julia/dev/Rayuela/demos/demos_train_query_base.jl:170
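
This looks like an upstream API change: the Clustering.jl release installed here expects integer counts (Array{Int64,1}) where Rayuela passes Array{Float64,1}. Until the call site is updated, one hedged workaround is to pin Clustering.jl to an older release; the version below is a guess and may need adjusting:

using Pkg

# Hypothetical pin; pick whichever Clustering.jl release still accepts
# cweights::Array{Float64,1} in update_centers!.
Pkg.pin(PackageSpec(name="Clustering", version=v"0.12.2"))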

Dimensions mismatch in ERVQ

Some experiments running overnight crashed with this error:

=== Iteration 88 / 100 ===
Updating codebook 1... ERROR: LoadError: DimensionMismatch("tried to assign 0-element array to 784×1 destination")
Stacktrace:
 [1] throw_setindex_mismatch(::Array{Int64,1}, ::Tuple{Int64,Int64}) at ./indices.jl:94
 [2] setindex_shape_check(::Array{Int64,1}, ::Int64, ::Int64) at ./indices.jl:146
 [3] macro expansion at ./multidimensional.jl:554 [inlined]
 [4] _unsafe_setindex!(::IndexLinear, ::Array{Float32,2}, ::Array{Int64,1}, ::Base.Slice{Base.OneTo{Int64}}, ::Int64) at ./multidimensional.jl:549
 [5] macro expansion at ./multidimensional.jl:541 [inlined]
 [6] _setindex! at ./multidimensional.jl:537 [inlined]
 [7] setindex!(::Array{Float32,2}, ::Array{Int64,1}, ::Colon, ::Int64) at ./abstractarray.jl:968
 [8] train_ervq(::Array{Float32,2}, ::Array{Int16,2}, ::Array{Array{Float32,2},1}, ::Int64, ::Int64, ::Int64, ::Bool) at /home/julieta/.julia/v0.6/Rayuela/src/ERVQ.jl:70
 [9] experiment_ervq_query_base(::Array{Float32,2}, ::Array{Int16,2}, ::Array{Array{Float32,2},1}, ::Array{Float32,2}, ::Array{UInt32,1}, ::Int64, ::Int64, ::Int64, ::Int64, ::Bool) at /home/julieta/.julia/v0.6/Rayuela/src/ERVQ.jl:160
 [10] run_demos_query_base(::String, ::Int64, ::Int64, ::Int64, ::Int64) at /home/julieta/.julia/v0.6/Rayuela/demos/demos.jl:210
 [11] macro expansion at /home/julieta/.julia/v0.6/Rayuela/demos/demos.jl:350 [inlined]
 [12] anonymous at ./<missing>:?
 [13] include_from_node1(::String) at ./loading.jl:576
 [14] include(::String) at ./sysimg.jl:14
 [15] process_options(::Base.JLOptions) at ./client.jl:305
 [16] _start() at ./client.jl:371
while loading /home/julieta/.julia/v0.6/Rayuela/demos/demos.jl, in expression starting on line 346
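
The 0-element assignment suggests a cluster went empty during the codebook update, so the slice of points assigned to it is empty. A hedged guard for that case (all names invented; this is not the actual ERVQ.jl code):

using Statistics

# Skip empty clusters instead of overwriting their centroid with an empty slice.
function update_codebook!(C::Matrix{Float32}, X::Matrix{Float32}, assignments::Vector{Int})
    for k in 1:size(C, 2)
        members = findall(isequal(k), assignments)
        isempty(members) && continue   # keep (or later re-seed) the previous centroid
        C[:, k] = vec(mean(X[:, members], dims=2))
    end
    return C
end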

Reproducing ECCV 18 plots

This is a meta-issue of #26

We want to make it easy for people to reproduce our main results. These are the things that are missing for push-button reproducibility.

  • Make sure code works on Julia 1.0
  • Add a flag to use the GPU or use CPU only
  • Expose flag to choose codebook update method for LSQ/LSQ++
  • Homogenize the API of GPU and CPU methods (currently they are methods with different names)
  • Make demo code return and save time spent (currently only returns and saves results)
  • Add plotting code (a rough sketch is included below)
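
A rough sketch of the plotting code in the last item, with placeholder data standing in for the recall curves the demos save:

using Plots

N = [1, 2, 5, 10, 20, 50, 100]
recall_lsq   = sort(rand(length(N)))   # placeholder data
recall_lsqpp = sort(rand(length(N)))   # placeholder data
plot(N, [recall_lsq recall_lsqpp], xscale=:log10, marker=:circle,
     label=["LSQ" "LSQ++"], xlabel="N", ylabel="Recall@N")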

MethodError: no method matching update_centers! when running demo

Hi, thank you for this library!

I followed the instructions and ran the first demo, but I got an error while training OPQ:

ERROR: LoadError: MethodError: no method matching update_centers!(::Array{Float32,2}, ::Nothing, ::Array{Int64,1}, ::Array{Bool,1}, ::Array{Float32,2}, ::Array{Float64,1})
Closest candidates are:
  update_centers!(::AbstractArray{#s108,2} where #s108<:Real, ::Nothing, ::Array{Int64,1}, ::Array{Bool,1}, ::AbstractArray{#s107,2} where #s107<:AbstractFloat, ::Array{Int64,1}) at /home/zjt/.julia/packages/Clustering/tt9vc/src/kmeans.jl:283
  update_centers!(::AbstractArray{#s108,2} where #s108<:Real, ::Array{W<:Real,1}, ::Array{Int64,1}, ::Array{Bool,1}, ::AbstractArray{#s107,2} where #s107<:Real, ::Array{W<:Real,1}) where W<:Real at /home/zjt/.julia/packages/Clustering/tt9vc/src/kmeans.jl:330
  update_centers!(::Any, ::Any, ::Any, ::Any) at /home/zjt/.julia/packages/Clustering/tt9vc/src/fuzzycmeans.jl:48
Stacktrace:
 [1] train_opq(::Array{Float32,2}, ::Int64, ::Int64, ::Int64, ::String, ::Bool) at /home/zjt/.julia/dev/Rayuela/src/OPQ.jl:121
 [2] experiment_opq(::Array{Float32,2}, ::Array{Float32,2}, ::Array{Float32,2}, ::Array{UInt32,1}, ::Int64, ::Int64, ::String, ::Int64, ::Int64, ::Bool) at /home/zjt/.julia/dev/Rayuela/src/OPQ.jl:155
 [3] run_demos(::String, ::Int64, ::Int64, ::Int64, ::Int64) at /home/zjt/.julia/dev/Rayuela/demos/demos_train_query_base.jl:35
 [4] top-level scope at /home/zjt/.julia/dev/Rayuela/demos/demos_train_query_base.jl:171 [inlined]
 [5] top-level scope at ./none:0
 [6] include at ./boot.jl:317 [inlined]
 [7] include_relative(::Module, ::String) at ./loading.jl:1044
 [8] include(::Module, ::String) at ./sysimg.jl:29
 [9] include(::String) at ./client.jl:392
 [10] top-level scope at none:0
in expression starting at /home/zjt/.julia/dev/Rayuela/demos/demos_train_query_base.jl:170

Could you help me solve it?

Search code in Julia

Without multithreading and due to JuliaLang/julia#939, implementing the lookup-table-based search of MCQ in Julia is simply not competitive. We should revisit this once Julia gets decent multithreading and faster partial sort.
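
For concreteness, the lookup-table-based search in question looks roughly like this (a sketch, not the planned implementation): precompute an h×m table of query-to-centroid distances, score each database vector with m table lookups, then partial-sort the scores.

# tables is h×m (query-to-centroid distances per codebook); B is m×n codes,
# stored 0-based in this sketch.
function adc_scan(tables::Matrix{Float32}, B::Matrix{UInt8})
    h, m = size(tables)
    n = size(B, 2)
    dists = zeros(Float32, n)
    @inbounds for j in 1:n, i in 1:m
        dists[j] += tables[Int(B[i, j]) + 1, i]
    end
    return dists
end

tables = rand(Float32, 256, 8)     # h=256 centroids, m=8 codebooks
B = rand(UInt8, 8, 100_000)        # codes for 100k database vectors
top100 = partialsortperm(adc_scan(tables, B), 1:100)   # the slow partial sort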

LSQ training on the GPU

We kind of already have all the components; we just have to glue them together, and then we should be able to train on big-ish datasets in reasonable time.

LoadError: type CuArray has no field buf

I followed the instructions and ran the first demo, but I got an error while training ChainQ:

ERROR: LoadError: type CuArray has no field buf
Stacktrace:
 [1] getproperty(::Any, ::Symbol) at ./sysimg.jl:18
 [2] quantize_chainq_cuda!(::Array{Int16,2}, ::Array{Float32,2}, ::Array{Array{Float32,2},1}, ::Array{Array{Float32,2},1}, ::UnitRange{Int64}) at /home/zjt/.julia/dev/Rayuela/src/ChainQ.jl:242
 [3] quantize_chainq(::Array{Float32,2}, ::Array{Array{Float32,2},1}, ::Bool, ::Bool) at /home/zjt/.julia/dev/Rayuela/src/ChainQ.jl:325
 [4] train_chainq(::Array{Float32,2}, ::Int64, ::Int64, ::Array{Float32,2}, ::Array{Int16,2}, ::Array{Array{Float32,2},1}, ::Int64, ::Bool) at /home/zjt/.julia/dev/Rayuela/src/ChainQ.jl:401
 [5] run_demos(::String, ::Int64, ::Int64, ::Int64, ::Int64) at /home/zjt/.julia/dev/Rayuela/demos/demos_train_query_base.jl:57
 [6] top-level scope at /home/zjt/.julia/dev/Rayuela/demos/demos_train_query_base.jl:171 [inlined]
 [7] top-level scope at ./none:0
 [8] include at ./boot.jl:317 [inlined]
 [9] include_relative(::Module, ::String) at ./loading.jl:1044
 [10] include(::Module, ::String) at ./sysimg.jl:29
 [11] include(::String) at ./client.jl:392
 [12] top-level scope at none:0
in expression starting at /home/zjt/.julia/dev/Rayuela/demos/demos_train_query_base.jl:170

Thank you!

Error in CuArrays.CuArray

When running demos_train_query_base.jl, I get the following error in line 65 of encode_icm_cuda_single:

WARNING: CuArrays.BLAS is deprecated, use CUBLAS instead.
  likely near /home/sepehr/mcq/demos_train_query_base.jl:174
ERROR: LoadError: MethodError: no method matching CuArrays.CuArray{Float32,N} where N(::Int64)
Closest candidates are:
  CuArrays.CuArray{Float32,N} where N(!Matched::AbstractArray{S,N}) where {T, N, S} at /home/sepehr/.julia/packages/CuArrays/PD3UJ/src/array.jl:93
  CuArrays.CuArray{Float32,N} where N(!Matched::LinearAlgebra.UniformScaling, !Matched::Tuple{Int64,Int64}) at /home/sepehr/.julia/packages/GPUArrays/t8tJB/src/construction.jl:30
  CuArrays.CuArray{Float32,N} where N(!Matched::LinearAlgebra.UniformScaling, !Matched::Integer, !Matched::Integer) at /home/sepehr/.julia/packages/GPUArrays/t8tJB/src/construction.jl:34
  ...
Stacktrace:
 [1] encode_icm_cuda_single(::Array{Float32,2}, ::Array{Int16,2}, ::Array{Array{Float32,2},1}, ::Array{Int64,1}, ::Int64, ::Int64, ::Bool, ::Bool) at /home/sepehr/.julia/dev/Rayuela/src/LSQ_GPU.jl:65
 [2] encode_icm_cuda(::Array{Float32,2}, ::Array{Int16,2}, ::Array{Array{Float32,2},1}, ::Array{Int64,1}, ::Int64, ::Int64, ::Bool, ::Int64, ::Bool) at /home/sepehr/.julia/dev/Rayuela/src/LSQ_GPU.jl:231
 [3] train_lsq_cuda(::Array{Float32,2}, ::Int64, ::Int64, ::Array{Float32,2}, ::Array{Int16,2}, ::Array{Array{Float32,2},1}, ::Int64, ::Int64, ::Int64, ::Bool, ::Int64, ::Int64, ::Bool) at /home/sepehr/.julia/dev/Rayuela/src/LSQ_GPU.jl:300
 [4] experiment_lsq_cuda(::Array{Float32,2}, ::Array{Int16,2}, ::Array{Array{Float32,2},1}, ::Array{Float32,2}, ::Array{Float32,2}, ::Array{Float32,2}, ::Array{UInt32,1}, ::Int64, ::Int64, ::Int64, ::Int64, ::Int64, ::Bool, ::Int64, ::Int64, ::Int64, ::Int64, ::Bool) at /home/sepehr/.julia/dev/Rayuela/src/LSQ_GPU.jl:345
 [5] run_demos(::String, ::Int64, ::Int64, ::Int64, ::Int64) at /home/sepehr/mcq/demos_train_query_base.jl:76
 [6] top-level scope at /home/sepehr/mcq/demos_train_query_base.jl:175 [inlined]
 [7] top-level scope at ./none:0
 [8] include at ./boot.jl:326 [inlined]
 [9] include_relative(::Module, ::String) at ./loading.jl:1038
 [10] include(::Module, ::String) at ./sysimg.jl:29
 [11] exec_options(::Base.JLOptions) at ./client.jl:267
 [12] _start() at ./client.jl:436
in expression starting at /home/sepehr/mcq/demos_train_query_base.jl:174
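
For context, Julia 0.7 removed the old Array{T}(n)-style constructors in favour of an explicit `undef`, and CuArrays followed suit. If that is the cause, the fix at LSQ_GPU.jl:65 is presumably the change below (an untested sketch):

using CuArrays

n = 1024
# Pre-0.7 style, which now throws the MethodError above:
#   CuArrays.CuArray{Float32}(n)
# 0.7+ style with an explicit `undef`:
buf = CuArrays.CuArray{Float32}(undef, n)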

Use CUDAapi for building artefacts

Rayuela.jl/deps/build.jl

Lines 58 to 59 in a3a1bed

`/usr/local/cuda/bin/nvcc -ptx ../src/cudautils.cu -o cudautils.ptx -arch=compute_35`
`/usr/local/cuda/bin/nvcc --shared -Xcompiler -fPIC -shared ../src/cudautils.cu -o cudautils.so -arch=compute_35`

While looking at https://discourse.julialang.org/t/freeing-memory-in-the-gpu-with-cudadrv-cudanative-cuarrays/10946, I ran into issues building the package, because you can't assume nvcc lives there, and you might need to pass -ccbin options. CUDAapi does that for you; see e.g. https://github.com/JuliaGPU/CUDAnative.jl/blob/1833651e180fa71157a31f0b6d2588a0ad338c7e/test/perf/launch_overhead/build.jl

The same goes for the `-arch` options: better to figure those out accurately by looking at CUDAdrv, for maximal compatibility with user GPUs.
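
Something like the following in deps/build.jl, instead of the hard-coded paths; the CUDAapi calls are from memory and should be double-checked:

using CUDAapi

toolkit = find_toolkit()                    # locate the CUDA installation(s)
nvcc = find_cuda_binary("nvcc", toolkit)    # instead of /usr/local/cuda/bin/nvcc
run(`$nvcc -ptx ../src/cudautils.cu -o cudautils.ptx`)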

Port CUDA code to CuArrays

My understanding is that CuArrays.jl will eventually remove the need to write CUDA code and then bind it from Julia; rather, the goal is to write kernel code in Julia and have it compile to GPU code automagically.

When that happens, we should probably port our CUDA code to use CuArrays.
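
That style already works today via CUDAnative; here is a toy vector-add sketch of what our ported kernels could look like (not Rayuela code, and the @cuda keyword syntax assumes the CUDAnative releases current around Julia 1.0):

using CUDAnative, CuArrays

function vadd_kernel(a, b, c)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(c)
        @inbounds c[i] = a[i] + b[i]
    end
    return nothing
end

a = cu(rand(Float32, 1024))
b = cu(rand(Float32, 1024))
c = similar(a)
@cuda threads=256 blocks=4 vadd_kernel(a, b, c)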

Expose all parameters

We want to do automatic algorithm configuration, and for that we need to expose all the hyperparameters at the top level. The ones that come to mind right now are listed below; a sketch of what this could look like follows the list.

  • Whether to sample with or without replacement when perturbing the solution
  • Whether to initialize with OPQ/OTQ or neither.
  • Whether to reset the Bs after each iteration during OTQ/LSQ training.
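
Every keyword name below is invented and up for discussion; this just illustrates the kind of top-level surface we would expose:

function train_lsq_options(; sample_with_replacement::Bool=true,   # perturbation sampling
                             init::Symbol=:opq,                    # :opq, :otq, or :none
                             reset_B_each_iter::Bool=false)        # reset the Bs during OTQ/LSQ
    return (sample_with_replacement=sample_with_replacement,
            init=init,
            reset_B_each_iter=reset_B_each_iter)
end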

chain quantizer GPU buffer error

When running the chain quantizer, I get the following error:

Training a chain quantizer
 -2 2.394428e+04... 0.60 secs updating C
ERROR: LoadError: ArgumentError: cannot take the CPU address of a GPU buffer
Stacktrace:
 [1] unsafe_convert(::Type{Ptr{Float32}}, ::CUDAdrv.Mem.Buffer) at /home/s2eghbal/.julia/packages/CUDAdrv/lu32K/src/memory.jl:20
 [2] macro expansion at /home/s2eghbal/.julia/packages/CUDAdrv/lu32K/src/execution.jl:171 [inlined]
 [3] #_cudacall#24(::Int64, ::Tuple{Int64,Int64}, ::Int64, ::CUDAdrv.CuStream, ::typeof(CUDAdrv._cudacall), ::CUDAdrv.CuFunction, ::Type{Tuple{Ptr{Float32},Ptr{Float32},Int32,Int32}}, ::Tuple{CUDAdrv.Mem.Buffer,CUDAdrv.Mem.Buffer,Int32,Int32}) at /home/s2eghbal/.julia/packages/CUDAdrv/lu32K/src/execution.jl:154
 [4] (::getfield(CUDAdrv, Symbol("#kw##_cudacall")))(::NamedTuple{(:blocks, :threads),Tuple{Int64,Tuple{Int64,Int64}}}, ::typeof(CUDAdrv._cudacall), ::CUDAdrv.CuFunction, ::Type, ::Tuple{CUDAdrv.Mem.Buffer,CUDAdrv.Mem.Buffer,Int32,Int32}) at ./none:0
 [5] #cudacall#23 at /home/s2eghbal/.julia/packages/CUDAdrv/lu32K/src/execution.jl:146 [inlined]
 [6] (::getfield(CUDAdrv, Symbol("#kw##cudacall")))(::NamedTuple{(:blocks, :threads),Tuple{Int64,Tuple{Int64,Int64}}}, ::typeof(CUDAdrv.cudacall), ::CUDAdrv.CuFunction, ::Type, ::CUDAdrv.Mem.Buffer, ::CUDAdrv.Mem.Buffer, ::Int32, ::Int32) at ./none:0
 [7] vec_add(::Int64, ::Tuple{Int64,Int64}, ::CUDAdrv.Mem.Buffer, ::CUDAdrv.Mem.Buffer, ::Int32, ::Int32) at /home/s2eghbal/.julia/dev/Rayuela/src/CudaUtilsModule.jl:75
 [8] quantize_chainq_cuda!(::Array{Int16,2}, ::Array{Float32,2}, ::Array{Array{Float32,2},1}, ::Array{Array{Float32,2},1}, ::UnitRange{Int64}) at /home/s2eghbal/.julia/dev/Rayuela/src/ChainQ.jl:242
 [9] quantize_chainq(::Array{Float32,2}, ::Array{Array{Float32,2},1}, ::Bool, ::Bool) at /home/s2eghbal/.julia/dev/Rayuela/src/ChainQ.jl:325
 [10] train_chainq(::Array{Float32,2}, ::Int64, ::Int64, ::Array{Float32,2}, ::Array{Int16,2}, ::Array{Array{Float32,2},1}, ::Int64, ::Bool) at /home/s2eghbal/.julia/dev/Rayuela/src/ChainQ.jl:401
 [11] run_demos(::String, ::Int64, ::Int64, ::Int64, ::Int64) at /home/s2eghbal/.julia/dev/Rayuela/demos/demos_train_query_base.jl:60
 [12] top-level scope at /home/s2eghbal/.julia/dev/Rayuela/demos/demos_train_query_base.jl:174 [inlined]
 [13] top-level scope at ./none:0
 [14] include at ./boot.jl:326 [inlined]
 [15] include_relative(::Module, ::String) at ./loading.jl:1038
 [16] include(::Module, ::String) at ./sysimg.jl:29
 [17] include(::String) at ./client.jl:403
 [18] top-level scope at none:0
in expression starting at /home/s2eghbal/.julia/dev/Rayuela/demos/demos_train_query_base.jl:173

CUDA out of memory issue

Sorry to bother again.
What is the minimum memory requirement for the GPU?

Creating 500000 random states... done in 4.35 seconds
ERROR: LoadError: CUDA error: out of memory (code #2, ERROR_OUT_OF_MEMORY)
Stacktrace:
 [1] macro expansion at /usr/local/google/home/fchern/.julia/packages/CUDAdrv/LC5XS/src/base.jl:147 [inlined]
 [2] #alloc#3(::CUDAdrv.Mem.CUmem_attach, ::Function, ::Int64, ::Bool) at /usr/local/google/home/fchern/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:161
 [3] alloc at /usr/local/google/home/fchern/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:157 [inlined] (repeats 2 times)
 [4] (::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}})() at /usr/local/google/home/fchern/.julia/packages/CuArrays/f4Eke/src/memory.jl:251
 [5] lock(::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}}, ::ReentrantLock) at ./lock.jl:101
 [6] macro expansion at ./util.jl:213 [inlined]
 [7] alloc(::Int64) at /usr/local/google/home/fchern/.julia/packages/CuArrays/f4Eke/src/memory.jl:221
 [8] CuArrays.CuArray{Float32,2}(::Tuple{Int64,Int64}) at /usr/local/google/home/fchern/.julia/packages/CuArrays/f4Eke/src/array.jl:45
 [9] similar at /usr/local/google/home/fchern/.julia/packages/CuArrays/f4Eke/src/array.jl:61 [inlined]
 [10] gemm at /usr/local/google/home/fchern/.julia/packages/CuArrays/f4Eke/src/blas/wrap.jl:903 [inlined]
 [11] encode_icm_cuda_single(::Array{Float32,2}, ::Array{Int16,2}, ::Array{Array{Float32,2},1}, ::Array{Int64,1}, ::Int64, ::Int64, ::Bool, ::Bool) at /usr/local/google/home/fchern/.julia/environments/v0.7/dev/Rayuela/src/LSQ_GPU.jl:71
 [12] encode_icm_cuda(::Array{Float32,2}, ::Array{Int16,2}, ::Array{Array{Float32,2},1}, ::Array{Int64,1}, ::Int64, ::Int64, ::Bool, ::Int64, ::Bool) at /usr/local/google/home/fchern/.julia/environments/v0.7/dev/Rayuela/src/LSQ_GPU.jl:249
 [13] experiment_lsq_cuda(::Array{Float32,2}, ::Array{Int16,2}, ::Array{Array{Float32,2},1}, ::Array{Float32,2}, ::Array{Float32,2}, ::Array{Float32,2}, ::Array{UInt32,1}, ::Int64, ::Int64, ::Int64, ::Int64, ::Int64, ::Bool, ::Int64, ::Int64, ::Int64, ::Int64, ::Bool) at /usr/local/google/home/fchern/.julia/environments/v0.7/dev/Rayuela/src/LSQ_GPU.jl:352
 [14] run_demos(::String, ::Int64, ::Int64, ::Int64, ::Int64) at /usr/local/google/home/fchern/.julia/environments/v0.7/dev/Rayuela/demos/demos_train_query_base.jl:72
 [15] top-level scope at /usr/local/google/home/fchern/.julia/environments/v0.7/dev/Rayuela/demos/demos_train_query_base.jl:171 [inlined]
 [16] top-level scope at ./none:0
 [17] include at ./boot.jl:317 [inlined]
 [18] include_relative(::Module, ::String) at ./loading.jl:1038
 [19] include(::Module, ::String) at ./sysimg.jl:29
 [20] include(::String) at ./client.jl:398
 [21] top-level scope at none:0
in expression starting at /usr/local/google/home/fchern/.julia/environments/v0.7/dev/Rayuela/demos/demos_train_query_base.jl:170

Julia 0.7/1.0

Julia 0.7/1.0 is out! The upgrade is potentially breaking, but 1.0 promises to be stable for 5 years or so.

Let's bring this code up to speed! 🙂

MethodError: no method matching repick_unused_centers

I followed the instructions and ran the first demo, but I got an error while training RVQ:

ERROR: LoadError: MethodError: no method matching repick_unused_centers(::Array{Float32,2}, ::Array{Float32,1}, ::Array{Float32,2}, ::Array{Int64,1})
Closest candidates are:
  repick_unused_centers(::AbstractArray{#s108,2} where #s108<:Real, ::Array{#s107,1} where #s107<:Real, ::AbstractArray{#s106,2} where #s106<:AbstractFloat, ::Array{Int64,1}, ::Distances.SemiMetric) at /home/zjt/.julia/packages/Clustering/tt9vc/src/kmeans.jl:377
Stacktrace:
 [1] quantize_rvq(::Array{Float32,2}, ::Array{Array{Float32,2},1}, ::Bool) at /home/zjt/.julia/dev/Rayuela/src/RVQ.jl:51
 [2] experiment_rvq(::Array{Float32,2}, ::Array{Float32,2}, ::Array{Float32,2}, ::Array{UInt32,1}, ::Int64, ::Int64, ::Int64, ::Int64, ::Bool) at /home/zjt/.julia/dev/Rayuela/src/RVQ.jl:142
 [3] run_demos(::String, ::Int64, ::Int64, ::Int64, ::Int64) at /home/zjt/.julia/dev/Rayuela/demos/demos_train_query_base.jl:41
 [4] top-level scope at /home/zjt/.julia/dev/Rayuela/demos/demos_train_query_base.jl:171 [inlined]
 [5] top-level scope at ./none:0
 [6] include at ./boot.jl:317 [inlined]
 [7] include_relative(::Module, ::String) at ./loading.jl:1044
 [8] include(::Module, ::String) at ./sysimg.jl:29
 [9] include(::String) at ./client.jl:392
 [10] top-level scope at none:0
in expression starting at /home/zjt/.julia/dev/Rayuela/demos/demos_train_query_base.jl:170
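
Judging from the closest candidate, newer Clustering.jl releases require an explicit metric as a fifth argument, so the call at RVQ.jl:51 probably just needs a Distances.SqEuclidean() appended. A toy call with made-up data to illustrate the expected signature (unverified; repick_unused_centers is a Clustering.jl internal):

using Clustering, Distances

X = rand(Float32, 8, 100)        # data
costs = rand(Float32, 100)       # per-point assignment costs
centers = rand(Float32, 8, 16)   # current centroids
unused = [3, 7]                  # indices of empty clusters
Clustering.repick_unused_centers(X, costs, centers, unused, SqEuclidean())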

Remove unused CUDA kernels

Some CUDA code is never called from Julia, which is confusing, e.g.

__device__ void _veccost(
    float *d_rx,             // data to use (X)
    float *d_codebooks,      // codebooks (C)
    unsigned char *d_codes,  // the codes (B)
    float *d_veccost,        // where to save the cost
    int m,                   // number of codebooks
    int n) {                 // number of vectors in X
  // FIXME hard-coding 256 entries in each codebook, and 128 dimensions
  const int H = 256;  // size of each codebook
  const int D = 128;  // dimensionality of each vector

as reported in #39. We should remove unused code in general.

Problem in running OPQ

Hi,
I am trying to run demos_train_query_base.jl, but I get the following error; it seems there is a problem with the call to the update_centers! function.

WARNING: CuArrays.BLAS is deprecated, use CUBLAS instead.
  likely near /home/sepehr/mcq/demos_train_query_base.jl:172
┌ Warning: implicit `dims=2` argument now has to be passed explicitly to specify that distances between columns should be computed
│   caller = ip:0x0
└ @ Core :-1
┌ Warning: implicit `dims=2` argument now has to be passed explicitly to specify that distances between columns should be computed
│   caller = _kmeans!(::Array{Float32,2}, ::Nothing, ::Array{Float32,2}, ::Array{Int64,1}, ::Array{Float32,1}, ::Array{Int64,1}, ::Array{Float64,1}, ::Int64, ::Float64, ::Int64, ::Distances.SqEuclidean) at kmeans.jl:115
└ @ Clustering ~/.julia/packages/Clustering/pvAp6/src/kmeans.jl:115
ERROR: LoadError: MethodError: no method matching update_centers!(::Array{Float32,2}, ::Nothing, ::Array{Int64,1}, ::Array{Bool,1}, ::Array{Float32,2}, ::Array{Float32,1})
Closest candidates are:
  update_centers!(::AbstractArray{T<:Real,2}, ::Nothing, ::Array{Int64,1}, ::Array{Bool,1}, ::AbstractArray{T<:Real,2}, !Matched::Array{Float64,1}) where T<:Real at /home/sepehr/.julia/packages/Clustering/pvAp6/src/kmeans.jl:247
  update_centers!(::AbstractArray{T<:Real,2}, !Matched::AbstractArray{#s106,1} where #s106<:Real, ::Array{Int64,1}, ::Array{Bool,1}, ::AbstractArray{T<:Real,2}, !Matched::Array{Float64,1}) where T<:Real at /home/sepehr/.julia/packages/Clustering/pvAp6/src/kmeans.jl:295
  update_centers!(::Any, ::Any, ::Any, ::Any) at /home/sepehr/.julia/packages/Clustering/pvAp6/src/fuzzycmeans.jl:30
Stacktrace:
 [1] train_opq(::Array{Float32,2}, ::Int64, ::Int64, ::Int64, ::String, ::Bool) at /home/sepehr/.julia/dev/Rayuela/src/OPQ.jl:121
 [2] experiment_opq(::Array{Float32,2}, ::Array{Float32,2}, ::Array{Float32,2}, ::Array{UInt32,1}, ::Int64, ::Int64, ::String, ::Int64, ::Int64, ::Bool) at /home/sepehr/.julia/dev/Rayuela/src/OPQ.jl:155
 [3] run_demos(::String, ::Int64, ::Int64, ::Int64, ::Int64) at /home/sepehr/mcq/demos_train_query_base.jl:37
 [4] top-level scope at /home/sepehr/mcq/demos_train_query_base.jl:173 [inlined]
 [5] top-level scope at ./none:0
 [6] include at ./boot.jl:326 [inlined]
 [7] include_relative(::Module, ::String) at ./loading.jl:1038
 [8] include(::Module, ::String) at ./sysimg.jl:29
 [9] exec_options(::Base.JLOptions) at ./client.jl:267
 [10] _start() at ./client.jl:436
in expression starting at /home/sepehr/mcq/demos_train_query_base.jl:172
