GithubHelp home page GithubHelp logo

cash's Introduction

hive-cassandra-handler

This handler provides hive support for cassandra.

Check out and Installation

check out the project from https://github.com/tuplejump/cash.git.

> git clone https://github.com/tuplejump/cash.git

Go to branch cas-support-cql

> git checkout cas-support-cql

Go to cassandra-handler folder run the maven package command.

> mvn package

This generates a hive-cassandra-x.x.x.jar file in target folder and all other project dependencies are downloaded to target/dependency.

Copy the target/hive-cassandra-x.x.x.jar to the hive lib directory

Copy target/dependency/cassandra-all-x.x.x.jar and target/dependency/cassandra-thrift-x.x.x.jar to hive lib directory.

Running

Start hive

> bin/hive

Create a database in hive (or use an existing one)

To create a cql3 table in cassandra from hive, execute the following command

 hive> CREATE EXTERNAL TABLE test.messages(message_id string, author string, body string)
    STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler'
    WITH SERDEPROPERTIES ("cql.primarykey" = "message_id, author", "comment"="check", "read_repair_chance" = "0.2",
    "dclocal_read_repair_chance" = "0.14", "gc_grace_seconds" = "989898", "bloom_filter_fp_chance" = "0.2",
    "compaction" = "{'class' : 'LeveledCompactionStrategy'}", "replicate_on_write" = "false", "caching" = "all")

where 'test' is the keyspace in cassandra. The above query also creates a column family in cassandra if does not exist.

Note: By default hive tries to use cassandra running on localhost at port 9160. To change these specify cassandra.host and cassandra.port in the SerDeProperties while creating a table. Hive connects to the specified cassandra instance for further queries to the table.

To create a keyspace that does not exist in cassandra execute the following query

hive> CREATE EXTERNAL TABLE test.messages(row_key string, col1 string, col2 string)
STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler' WITH SERDEPROPERTIES("cql.primarykey" = "row_key")
TBLPROPERTIES ("cassandra.ks.name" = "mycqlks", "cassandra.ks.stratOptions"="'DC':1, 'DC2':1",
"cassandra.ks.strategy"="NetworkTopologyStrategy");

Note: For brevity, only minimal SERDEPROPERTIES are given in the above query.

If 'test' keyspace does not exist in cassandra it will be created.

Inserting values into CQL3 table through hive:

hive> insert into table messages select * from tweets;

The values from tweets table are appended to messages table.

Note: With Cassandra INSERT OVERWRITE is same as INSERT INTO as Cassandra merges changes if keys are same.

Retrieving values from a CQL3 table using hive:

hive> select * from messages;

Note: If local mode execution is not enabled, hive compiler generates map-reduce jobs for most queries. These jobs are then submitted to the Map-Reduce cluster. The map-reduce jobs need hive-cassandra-handler, cassandra-all and cassandra-thrift jars. Point the HIVE_AUX_JARS_PATH environment variable to the location containing these jars to run those jobs successfully.

While CqlStorageHandler is used to create/access cql3 tables in cassandra, CassandraStorageHandler can be used to create/access thrift tables in cassandra.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.