cqldriver.jl's Introduction

CQLdriver

This Julia package is an interface to ScyllaDB / Cassandra. It is based on the DataStax C/C++ driver, which implements the CQL v3 binary protocol. The package is missing many features, but it does two things quite well:

  • write very many rows quickly
  • read very many rows quickly

It is probably straightforward to extend this package with other features, but I haven't taken the time to do so. If you find it useful but are missing a small set of features, file an issue and I can probably implement them. CQLdriver depends on, and is compatible with, DataFrames and JuliaDB.

Currently the following data-types are supported:

Julia type       CQL type
-------------    ---------
Vector{UInt8}    BLOB
String           TEXT
String           VARCHAR
Date             DATE
Int8             TINYINT
Int16            SMALLINT
Int32            INT
Int64            BIGINT
Int64            COUNTER
Bool             BOOLEAN
Float32          FLOAT
Float64          DOUBLE
DateTime         TIMESTAMP
UUID             UUID
UUID             TIMEUUID
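The table can be read as a lookup from a CQL column type to the Julia element type you get back. The sketch below encodes it as a plain Dict. The names CQL_TO_JULIA and julia_type are hypothetical and not part of CQLdriver's API; the package's internal mapping may be organised differently.

```julia
using Dates, UUIDs

# Hypothetical lookup from CQL type name to Julia type, transcribed from the
# type list above. This is not CQLdriver's internal representation.
const CQL_TO_JULIA = Dict{String,DataType}(
    "BLOB"      => Vector{UInt8},
    "TEXT"      => String,
    "VARCHAR"   => String,
    "DATE"      => Date,
    "TINYINT"   => Int8,
    "SMALLINT"  => Int16,
    "INT"       => Int32,     # CQL spells its 32-bit integer type INT
    "BIGINT"    => Int64,
    "COUNTER"   => Int64,
    "BOOLEAN"   => Bool,
    "FLOAT"     => Float32,
    "DOUBLE"    => Float64,
    "TIMESTAMP" => DateTime,
    "UUID"      => UUID,
    "TIMEUUID"  => UUID,
)

# Case-insensitive lookup; throws a KeyError for types not in the list.
julia_type(cqltype::AbstractString) = CQL_TO_JULIA[uppercase(cqltype)]
```

Note that the mapping is many-to-one: TEXT and VARCHAR both come back as String, and UUID and TIMEUUID both come back as UUID.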

Example use

Starting / Closing a session

cqlinit() returns a tuple of two pointers (the session and the cluster) and a UInt16 error code that you can check. A code of 0x0000 means the connection succeeded. Keyword arguments let you tune performance characteristics of the connection and supply credentials.

julia> using CQLdriver
julia> const CQL_OK = 0x0000
julia> # hosts are passed as a single comma-separated string
julia> session, cluster, err = cqlinit("192.168.1.128, 192.168.1.140")
julia> @assert err == CQL_OK
julia> cqlclose(session, cluster)

julia> hosts = "192.168.1.128, 192.168.1.140"
julia> session, cluster, err = cqlinit(hosts, threads = 1, connections = 2,
                                       queuesize = 4096, bytelimit = 65536, requestlimit = 256,
                                       username = "admin", password = "s3cr!t")
julia> @assert err == CQL_OK
julia> cqlclose(session, cluster)
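cqlinit(), cqlwrite(), cqlread() and cqlexec() report status through a UInt16 code rather than by throwing. If you prefer exceptions, a small wrapper can turn a nonzero code into an error. check_cql below is a hypothetical helper, not part of CQLdriver, and it only formats the code; it does not know what each code means.

```julia
const CQL_OK = 0x0000

# Hypothetical helper: return nothing on success, otherwise throw an
# ErrorException that names the operation and shows the code in hex.
function check_cql(err::UInt16, op::AbstractString)
    err == CQL_OK && return nothing
    error("$op failed with CQL error code 0x$(string(err, base = 16, pad = 4))")
end
```

For example, calling `check_cql(err, "cqlinit")` right after connecting stops a script at the first failed call instead of letting it continue with an unusable session.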

The driver automatically tries to discover all nodes in the cluster and keeps the connections alive.

Writing data

cqlwrite() takes a DataFrame with named columns, or a JuliaDB table. The column names must match those of the table you are writing to. By default it writes 1000 rows per batch and makes up to 5 attempts at writing each batch.

For appending new rows to tables:

julia> using CQLdriver, DataFrames
julia> table = "data.refrigerator"
julia> data = DataFrame(veggies = ["Carrots", "Broccoli"], amount = [3, 5])
julia> err = cqlwrite(session, table, data)
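The batching behaviour described above (1000 rows per batch, up to 5 attempts per batch) follows a common pattern, sketched here in plain Julia. The sketch assumes a caller-supplied function write_batch that sends one chunk and returns a UInt16 code; both write_in_batches and write_batch are hypothetical names, and this is not CQLdriver's actual implementation.

```julia
# Hypothetical sketch: split `rows` into chunks of `batchsize`, pass each chunk to
# `write_batch` (which returns a UInt16 code), and retry a failing chunk up to
# `retries` times. Returns 0x0000 if every chunk succeeded, else the failing code.
function write_in_batches(write_batch, rows::AbstractVector; batchsize::Int = 1000, retries::Int = 5)
    for chunk in Iterators.partition(rows, batchsize)
        err = 0x0001                      # assume failure until an attempt succeeds
        for _ in 1:retries
            err = write_batch(chunk)
            err == 0x0000 && break
        end
        err == 0x0000 || return err       # stop at the first chunk that never succeeded
    end
    return 0x0000
end
```

Passing a mock write_batch makes the retry logic testable without a running cluster.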

Updating a table requires additional arguments. Consider the following statement, which updates a table that uses counters:

UPDATE data.car SET speed = speed + ?, temp = temp + ? WHERE partid = ?

The call below is equivalent: the columns passed as the data argument fill the SET clause, and the columns passed as update fill the WHERE clause.

julia> table = "data.car"
julia> data = DataFrame(speed = [1, 2], temp = [4, 5], partid = ["wheel1", "wheel2"])
julia> err = cqlwrite(session,
                      table,
                      data[:, [:speed, :temp]],    # counter columns for the SET clause
                      update = data[:, [:partid]], # key columns for the WHERE clause
                      batchsize = 10000,
                      retries = 6,
                      counter = true)
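To make the correspondence between the arguments and the CQL statement concrete, the hypothetical helper below assembles the same counter UPDATE from column names: each column in the first list becomes a `col = col + ?` term in the SET clause, and each column in the second becomes a `col = ?` condition in the WHERE clause. CQLdriver may build its statement differently; this only illustrates the mapping.

```julia
# Hypothetical helper: build a counter UPDATE with one bind marker (?) per column.
function counter_update_cql(table::AbstractString, setcols, wherecols)
    sets  = join(("$c = $c + ?" for c in setcols), ", ")
    conds = join(("$c = ?" for c in wherecols), " AND ")
    return "UPDATE $table SET $sets WHERE $conds"
end
```

Calling `counter_update_cql("data.car", [:speed, :temp], [:partid])` yields the statement quoted above.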

Reading data

By default, cqlread() fetches data in pages of 10000 rows, retries each page up to 5 times, and collates everything into a DataFrame with typed, named columns.

julia> query = "SELECT * FROM data.car"
julia> err, output = cqlread(session, query)

(0x0000, 2×3 DataFrames.DataFrame
│ Row │ speed │ temp │ partid   │
├─────┼───────┼──────┼──────────┤
│ 1   │ 1     │ 4    │ "wheel1" │
│ 2   │ 2     │ 5    │ "wheel2" │)

Changing the page size (pgsize) may affect performance. The strlen keyword increases the number of characters allowed for string types.

julia> query = "SELECT * FROM data.bigtable LIMIT 1000000"
julia> err, output = cqlread(session, 
                             query, 
                             pgsize = 15000, 
                             retries = 6, 
                             strlen = 1024)
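The paging-with-retries behaviour described for cqlread() can be outlined in plain Julia. Here fetch_page is a hypothetical caller-supplied function that takes a page number and returns a (code, rows, has_more) tuple, and read_all_pages is likewise hypothetical; this shows only the control flow, not CQLdriver's implementation.

```julia
# Hypothetical sketch: fetch pages until the source reports no more data,
# retrying each page up to `retries` times and collating all rows.
function read_all_pages(fetch_page; retries::Int = 5)
    rows = Any[]
    page = 1
    while true
        err, chunk, more = 0x0001, Any[], false   # assume failure until a fetch succeeds
        for _ in 1:retries
            err, chunk, more = fetch_page(page)
            err == 0x0000 && break
        end
        err == 0x0000 || return err, rows        # give up: return the code and rows read so far
        append!(rows, chunk)
        more || return 0x0000, rows               # last page reached
        page += 1
    end
end
```

A larger page means fewer round trips but more memory per page, which is one reason the page size can affect performance.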

You can also pass a vector of queries; the driver executes them asynchronously and returns a vector of the resulting DataFrames.

julia> query = ["SELECT * FROM data.bigtable WHERE driver=124","SELECT * FROM data.smalltable WHERE car=144"]
julia> err, output = cqlread(session, 
                             query, 
                             concurrency=500, 
                             timeout = 12000)
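Running many queries with a cap on how many are in flight at once can be illustrated with Base's asyncmap, whose ntasks keyword limits the number of concurrent tasks and which returns results in input order. This is a hypothetical Julia-level analogue of the concurrency option, not how CQLdriver dispatches queries. run_queries is an illustrative name, and runquery is assumed to return a (code, result) tuple the way cqlread() does.

```julia
# Hypothetical sketch: run every query with at most `concurrency` in flight,
# keep the results in input order, and report the first nonzero code (if any).
function run_queries(runquery, queries::AbstractVector{<:AbstractString}; concurrency::Int = 500)
    results = asyncmap(runquery, queries; ntasks = concurrency)
    i = findfirst(r -> r[1] != 0x0000, results)
    err = i === nothing ? 0x0000 : results[i][1]
    return err, [r[2] for r in results]
end
```

Because the results keep input order, the i-th output always corresponds to the i-th query, regardless of which finished first.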

Executing commands

cqlexec() runs a command on the database and returns 0x0000 on success.

julia> cmd = "CREATE TABLE test.example (id int, data text, PRIMARY KEY (id));"
julia> err = cqlexec(session, cmd)
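Because cqlexec() takes the command as a plain string, schema statements can be generated from Julia data when several tables share a layout. The helper below is hypothetical (not part of CQLdriver) and reproduces the CREATE TABLE command above from a list of column => type pairs. It does no quoting or validation of names, so use it only with trusted input.

```julia
# Hypothetical helper: build a CREATE TABLE statement from `name => cqltype` pairs
# and a primary-key expression. Names are inserted verbatim (no quoting/escaping).
function create_table_cql(table::AbstractString, columns::AbstractVector{<:Pair}, key::AbstractString)
    cols = join(("$name $ctype" for (name, ctype) in columns), ", ")
    return "CREATE TABLE $table ($cols, PRIMARY KEY ($key));"
end
```

The result can be passed straight to cqlexec, e.g. `err = cqlexec(session, create_table_cql("test.example", ["id" => "int", "data" => "text"], "id"))`.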

cqldriver.jl's People

Contributors

chrisalexander-kensho, garethhu, lukemerrick, r3tex, racinmat, raikao, s439

cqldriver.jl's Issues

Imports fix and uuid support

Hi, I have two local branches: one adds the missing dependencies for this project, and the other adds UUID support. If you add me to the repo, I can raise a PR for these.

ERROR: LoadError: UndefVarError: is_linux not defined when building

(v1.3) pkg> build CQLdriver
  Building CQLdriver → `~/.julia/packages/CQLdriver/QjN09/deps/build.log`
┌ Error: Error building `CQLdriver`: 
│ ERROR: LoadError: UndefVarError: is_linux not defined
│ Stacktrace:
│  [1] top-level scope at /home/sgao/.julia/packages/CQLdriver/QjN09/deps/build.jl:1
│  [2] include at ./boot.jl:328 [inlined]
│  [3] include_relative(::Module, ::String) at ./loading.jl:1105
│  [4] include(::Module, ::String) at ./Base.jl:31
│  [5] include(::String) at ./client.jl:424
│  [6] top-level scope at none:5
│ in expression starting at /home/sgao/.julia/packages/CQLdriver/QjN09/deps/build.jl:1
└ @ Pkg.Operations /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/Pkg/src/backwards_compatible_isolation.jl:649

Please provide authentication for CQL connection

My issue is as follows:

julia> session, cluster, err = cqlinit("192.168.0.60")
Error in CQL operation: Session Connect
Authentication required but no auth provider set
(Ptr{CQLdriver.CassSession} @0x00000000048c8a90, Ptr{CQLdriver.CassCluster} @0x00000000047bf8b0, 0x0100)

How do I provide authentication? Please let me know if I am missing something.

Error: could not load library "libcassandra.so.2"

It looks like the driver is looking for a shared object file from libcassandra, which is not present in any of the usual locations.

julia> session, cluster, err = cqlinit("foo, bar, baz")
ERROR: could not load library "libcassandra.so.2"
dlopen(libcassandra.so.2.dylib, 0x0001): tried: '/Applications/Julia-1.7.app/Contents/Resources/julia/lib/julia/libcassandra.so.2.dylib' (no such file), '/Applications/Julia-1.7.app/Contents/Resources/julia/bin/../lib/libcassandra.so.2.dylib' (no such file), 'libcassandra.so.2.dylib' (no such file), '/usr/local/lib/libcassandra.so.2.dylib' (no such file), '/usr/lib/libcassandra.so.2.dylib' (no such file)
Stacktrace:
 [1] cql_cluster_new
   @ ~/.julia/packages/CQLdriver/4R8iQ/src/cqlwrapper.jl:107 [inlined]
 [2] cqlinit(hosts::String; username::String, password::String, whitelist::String, blacklist::String, threads::Int64, connections::Int64, queuesize::Int64, bytelimit::Int64, requestlimit::Int64)
   @ CQLdriver ~/.julia/packages/CQLdriver/4R8iQ/src/CQLdriver.jl:55
 [3] cqlinit(hosts::String)
   @ CQLdriver ~/.julia/packages/CQLdriver/4R8iQ/src/CQLdriver.jl:55
 [4] top-level scope
   @ REPL[2]:1

I verified my cassandra installation is correct and even tried installing the C++ drivers that this library seems to be based on:

$ brew info cassandra
cassandra: stable 4.0.5 (bottled)
Eventually consistent, distributed key-value store
https://cassandra.apache.org
/opt/homebrew/Cellar/cassandra/4.0.5 (1,168 files, 70.5MB) *
  Poured from bottle on 2022-08-01 at 14:22:30
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/cassandra.rb
License: Apache-2.0
==> Dependencies
Build: libcython ✔
Required: openjdk@11 ✔, [email protected] ✔, six ✔
==> Caveats
To restart cassandra after an upgrade:
  brew services restart cassandra
Or, if you don't want/need a background service you can just run:
  /opt/homebrew/opt/cassandra/bin/cassandra -f
==> Analytics
install: 2,862 (30 days), 8,196 (90 days), 27,313 (365 days)
install-on-request: 2,860 (30 days), 8,191 (90 days), 27,301 (365 days)
build-error: 0 (30 days)
$ brew info cassandra-cpp-driver
cassandra-cpp-driver: stable 2.16.2 (bottled), HEAD
DataStax C/C++ Driver for Apache Cassandra
https://docs.datastax.com/en/developer/cpp-driver/latest
/opt/homebrew/Cellar/cassandra-cpp-driver/2.16.2 (10 files, 2.5MB) *
  Poured from bottle on 2022-08-02 at 10:31:25
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/cassandra-cpp-driver.rb
License: Apache-2.0
==> Dependencies
Build: cmake ✔
Required: libuv ✔, [email protected] ✔
==> Options
--HEAD
	Install HEAD version
==> Analytics
install: 29 (30 days), 56 (90 days), 209 (365 days)
install-on-request: 29 (30 days), 56 (90 days), 209 (365 days)
build-error: 2 (30 days)

It seems to me that I might simply be missing a symlink to the shared object, or that Julia is looking for it in the wrong places, because the Cassandra drivers work fine with cqlsh and with Python on my computer. Any idea where I can find the missing file, or whether I am missing a dependency for this library?

Making JuliaDB an optional dependency, or creating a new package that adds JuliaDB support

JuliaDB is currently going through a period of frozen dependencies: it suffers from an obscure bug, and to catch it, all of its dependencies have upper bounds on their versions, some of them several years old.
This makes this package almost unusable with newer Julia or newer package versions, even when the user does not need JuliaDB.

Although it's always possible to fork it and use a modified version on your own,
this is currently the only registered package with a CQL driver, so I think it's worth making it more accessible.

I can make the necessary PR and create a bridge package between this one and JuliaDB if that makes sense.
