stratio / deep-spark

Connecting Apache Spark with different data stores [DEPRECATED]

Home Page: http://stratio.github.io/deep-spark

License: Apache License 2.0

Java 97.79% Scala 0.11% Shell 2.10%

deep-spark's Introduction

Disclaimer: As of 01/06/2015 this project has been deprecated. Thank you for your understanding and continued help throughout the project's life.

What is Deep?

Deep is a thin integration layer between Apache Spark and several NoSQL datastores. We currently support Apache Cassandra, MongoDB, ElasticSearch, Aerospike, HDFS, S3 and any database accessible through JDBC, and in the near future we will add support for several other datastores.

Install ojdbc driver

In order to compile the deep-jdbc module, it is necessary to add the Oracle ojdbc driver to your local Maven repository. You can download it from http://www.oracle.com/technetwork/database/features/jdbc/default-2280470.html: on that page, click "Accept License Agreement" and then download the ojdbc7.jar library. You need a free Oracle account to download the official driver.

To install the ojdbc driver in your local repository, execute the command below, pointing -Dfile at the jar you downloaded:

mvn install:install-file -Dfile=<path/to/ojdbc7.jar> -DgroupId=com.oracle -DartifactId=ojdbc7 -Dversion=12.1.0.2 -Dpackaging=jar

Compiling Deep

After that, you can compile Deep by executing the following steps:

cd deep-parent

mvn clean install

Creating a Deep Distribution

If you want to create a Deep distribution, execute the following steps:

cd deep-scripts

./make-distribution-deep.sh

During the creation you'll see the following question:

What tag want to use for Aerospike native repository?

Type 0.7.0 and press Enter.

Apache Cassandra integration

The integration is not based on Cassandra's Hadoop interface.

Deep comes with a user-friendly API that lets developers create Spark RDDs mapped to Cassandra column families. We provide two different interfaces:

  • The first one lets developers map Cassandra tables to plain old Java objects (POJOs), just as if you were using any other ORM. We call this API the 'entity objects' API. This abstraction is quite handy: it lets you work on RDDs of your own entities, and under the hood Deep transparently maps Cassandra's columns to entity properties. Your domain entities must be correctly annotated using Deep annotations (take a look at the deep-core example entities in package com.stratio.deep.core.entity).

  • The second one is a more generic 'cell' API that lets developers work on RDD<com.stratio.deep.entity.Cells>, where a 'Cells' object is a collection of com.stratio.deep.entity.Cell objects. Column metadata is automatically fetched from the data store. This interface is a little more cumbersome to work with (see the sketch just below), but has the advantage that it doesn't require the definition of additional entity classes. Example: you have a table called 'users' and you decide to use the 'Cells' interface. Once you get an instance 'c' of the Cells object, you can get the value of column 'address' with c.getCellByName("address").getCellValue(). Please refer to the Deep API documentation to learn more about the Cells and Cell objects.
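Below is a minimal Java sketch of that access pattern. Only Cells, getCellByName and getCellValue come from the description above (plus the 'users'/'address' example); the wrapper class and method are illustrative scaffolding:

import com.stratio.deep.entity.Cells; // package as named above; from 0.4.x this lives under com.stratio.deep.commons.entity

public class CellsApiSketch {
    // 'c' is one Cells instance taken from an RDD<Cells> mapped to the
    // 'users' table; see "First steps with Spark and Cassandra" below
    // for how such an RDD is created.
    static void printAddress(Cells c) {
        // Look up the 'address' column by name and read its value.
        Object address = c.getCellByName("address").getCellValue();
        System.out.println("address = " + address);
    }
}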

We encourage you to read the more comprehensive documentation hosted on the Openstratio website.

Deep comes with an example subproject called 'deep-examples' containing a set of working examples, both in Java and Scala. Please refer to the deep-examples project README for further information on how to set up a working environment.

MongoDB integration

The Spark-MongoDB connector is based on the Hadoop-MongoDB connector.

Support for MongoDB has been added in version 0.3.0.

We provide two different interfaces:

  • ORM API: you just have to annotate your POJOs with Deep annotations and the magic begins; you will be able to connect MongoDB with Spark using your own model entities (a minimal annotated entity is sketched after this list).

  • Generic cell API: you do not need to specify the collection's schema or add anything to your POJOs; each document is transformed into a "Cells" object.
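Below is a minimal sketch of such an annotated entity. The @DeepEntity and @DeepField annotations and the IDeepType interface appear elsewhere in this README (in the import list and the issues below); the entity itself, its field and the collection it maps to are hypothetical:

import com.stratio.deep.commons.annotations.DeepEntity;
import com.stratio.deep.commons.annotations.DeepField;
import com.stratio.deep.commons.entity.IDeepType;

// Hypothetical domain entity mapped to a MongoDB collection (the same
// annotations are used for Cassandra tables). fieldName ties the Java
// property to the document field.
@DeepEntity
public class MessageEntity implements IDeepType {

    @DeepField(fieldName = "message")
    private String message;

    public String getMessage() { return message; }

    public void setMessage(String message) { this.message = message; }
}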

We added a few working examples for MongoDB in the deep-examples subproject; take a look at:

Entities:

  • com.stratio.deep.examples.java.ReadingEntityFromMongoDB
  • com.stratio.deep.examples.java.WritingEntityToMongoDB
  • com.stratio.deep.examples.java.GroupingEntityWithMongoDB

Cells:

  • com.stratio.deep.examples.java.ReadingCellFromMongoDB
  • com.stratio.deep.examples.java.WritingCellToMongoDB
  • com.stratio.deep.examples.java.GroupingCellWithMongoDB

You can check out our first steps guide here:

First steps with Deep-MongoDB

We are working on further improvements!

ElasticSearch integration

Support for ElasticSearch has been added in version 0.5.0.

Aerospike integration

Support for Aerospike has been added in version 0.6.0.

Examples:

Entities:

  • com.stratio.deep.examples.java.ReadingEntityFromAerospike
  • com.stratio.deep.examples.java.WritingEntityToAerospike
  • com.stratio.deep.examples.java.GroupingEntityWithAerospike

Cells:

  • com.stratio.deep.examples.java.ReadingCellFromAerospike
  • com.stratio.deep.examples.java.WritingCellToAerospike
  • com.stratio.deep.examples.java.GroupingCellWithAerospike

JDBC integration

Support for JDBC has been added in version 0.7.0.

Examples:

Entities:

  • com.stratio.deep.examples.java.ReadingEntityWithJdbc
  • com.stratio.deep.examples.java.WritingEntityWithJdbc

Cells:

  • com.stratio.deep.examples.java.ReadingCellWithJdbc
  • com.stratio.deep.examples.java.WritingCellWithJdbc

Requirements

  • Cassandra, we tested versions from 1.2.8 up to 2.0.11 (for Spark <=> Cassandra integration).
  • MongoDB, we tested the integration with MongoDB versions 2.2, 2.4 and 2.6 using Standalone, Replica Set and Sharded Cluster (for Spark <=> MongoDB integration).
  • ElasticSearch, 1.3.0+
  • Aerospike, 3.3.0+
  • Spark 1.1.1
  • Apache Maven >= 3.0.4
  • Java 1.7
  • Scala 2.10.3

Configure the development and test environment

  • Clone the project

  • To configure a development environment in Eclipse: import as a Maven project. In IntelliJ: open the project by selecting the deep-parent POM file

  • Install the project in your local Maven repository. Enter the deep-parent subproject and run: mvn clean install (add -DskipTests to skip tests)

  • Put Deep to work on a working Cassandra + Spark cluster. You have several options:

    • Download a pre-configured Stratio platform VM (SDS, Stratio's BigData platform). This VM works on both VirtualBox and VMware, and comes with a fully configured distribution that also includes Stratio Deep. We also distribute the VM with several datasets preloaded in Cassandra. This distribution includes Stratio's customized Cassandra distribution containing our powerful open-source Lucene-based secondary indexes; see the Stratio documentation for further information. Once your VM is up and running you can test Deep using the shell: enter /opt/sds and run bin/stratio-deep-shell.

    • Install a new cluster using the Stratio installer. Please refer to Stratio's website to download the installer and its documentation.

    • You already have a working Cassandra server on your development machine: you need a Spark + Deep bundle; we suggest creating one by running:

      cd deep-scripts

      ./make-distribution-deep.sh

    This will build a Spark distribution package with Stratio Deep and Cassandra's jars included (depending on your machine this script could take a while, since it compiles Spark from sources). The package will be called spark-deep-distribution-X.Y.Z.tgz; untar it to a folder of your choice, enter that folder and issue ./stratio-deep-shell. This will start an interactive shell where you can test Stratio Deep (note that this starts a development cluster with MASTER="local").

    • You already have a working installation of Cassandra and Spark on your development machine: this is the most difficult way to start testing Deep, but if you know what you're doing, you will have to:

      1. copy the Stratio Deep jars to Spark's 'jars' folder ($SPARK_HOME/jars).
      2. copy Cassandra's jars to Spark's 'jars' folder.
      3. copy the Datastax Java Driver jar (v2.0.x) to Spark's 'jars' folder.
      4. start the Spark shell and import the following:

      import com.stratio.deep.commons.annotations._
      import com.stratio.deep.commons.config._
      import com.stratio.deep.commons.entity._
      import com.stratio.deep.core.context._
      import com.stratio.deep.cassandra.config._
      import com.stratio.deep.cassandra.extractor._
      import com.stratio.deep.mongodb.config._
      import com.stratio.deep.mongodb.extractor._
      import com.stratio.deep.es.config._
      import com.stratio.deep.es.extractor._
      import com.stratio.deep.aerospike.config._
      import com.stratio.deep.aerospike.extractor._
      import org.apache.spark.rdd._
      import org.apache.spark.SparkContext._
      import org.apache.spark.sql.api.java.JavaSQLContext
      import org.apache.spark.sql.api.java.JavaSchemaRDD
      import org.apache.spark.sql.api.java.Row
      import scala.collection.JavaConversions._

Once you have a working development environment you can finally start testing Deep. These are the basic steps you will always have to perform in order to use Deep:

First steps with Spark and Cassandra

  • Build an instance of a configuration object: this lets you tell Deep the Cassandra endpoint, the keyspace, the table you want to access and much more. It also lets you specify which interface to use (the domain entity or the generic interface). We have a factory that helps you create a configuration object using a fluent API. Creating a configuration object is an expensive operation. Please take the time to read the Java and Scala examples provided in the 'deep-examples' subproject and the comprehensive documentation at the OpenStratio website.
  • Create an RDD: use the DeepSparkContext helper methods, providing the configuration object you've just instantiated.
  • Perform some computation over the RDD(s): this is up to you; we only help you fetch the data efficiently from Cassandra, and the full Spark API is at your disposal.
  • (optional) Write the computation results out to Cassandra: we provide a way to efficiently save the result of your computation to Cassandra. In order to do that you must have another configuration object where you specify the output keyspace/column family. We can create the output column family for you if needed. Please refer to the comprehensive Stratio Deep documentation at the Stratio website. A sketch of these steps follows this list.
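Below is a minimal Java sketch of these four steps. It uses only names that appear in this README and its issues (the DeepJobConfigFactory fluent API, renamed ConfigFactory from 0.4.x on, and DeepSparkContext.createRDD); the endpoint, keyspace and table values and the 'deepContext' variable are assumptions, not the definitive API:

// 1. Build the configuration object with the fluent factory API.
IDeepJobConfig config = DeepJobConfigFactory.create()
        .host("localhost").rpcPort(9160)   // assumed Cassandra endpoint
        .keyspace("test").table("users")   // assumed keyspace/table
        .initialize();

// 2. Create an RDD; 'deepContext' is a DeepSparkContext built beforehand.
//    From 0.4.x on this is the single createRDD(...) method (see the
//    migration notes below).
JavaRDD<Cells> rdd = deepContext.createRDD(config);

// 3. Perform any computation with the regular Spark API.
long howManyUsers = rdd.count();

// 4. (optional) Save the results back to Cassandra through a second,
//    output-side configuration object (output keyspace/column family).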

First steps with Spark and MongoDB

  • Build an instance of a configuration object: this lets you tell Stratio Deep the MongoDB endpoint, the MongoDB database and collection you want to access and much more. It also lets you specify which interface to use (the domain entity). We have a factory that helps you create a configuration object using a fluent API. Creating a configuration object is an expensive operation. Please take the time to read the Java and Scala examples provided in the 'deep-examples' subproject and the comprehensive Deep documentation at the OpenStratio website.
  • Create an RDD: use the DeepSparkContext helper methods, providing the configuration object you've just instantiated.
  • Perform some computation over the RDD(s): this is up to you; we only help you fetch the data efficiently from MongoDB, and the full Spark API is at your disposal.
  • (optional) Write the computation results out to MongoDB: we provide a way to efficiently save the result of your computation to MongoDB. A MongoDB configuration sketch follows this list.
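The MongoDB flavour of the first step follows the same fluent-factory pattern. Only IMongoDeepJobConfig is named in this README; the factory method and the host/database/collection builder calls below are assumptions:

// MongoDB-side configuration sketch (builder names assumed).
IMongoDeepJobConfig config = DeepJobConfigFactory.createMongoDB()
        .host("localhost:27017")                    // assumed endpoint
        .database("test").collection("messages")    // assumed names
        .initialize();

// From here on the steps match the Cassandra sketch above:
// deepContext.createRDD(config), then any Spark computation.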

Migrating from version 0.2.9

From version 0.4.x on, Deep supports multiple datastores. In order to correctly implement this new feature, Deep underwent a huge refactoring between versions 0.2.9 and 0.4.x. To port your code to the new version you should take into account a few changes we made.

New Project Structure

From version 0.4.x on, Deep supports multiple datastores; in your project you should import only the Maven dependency you will use: deep-cassandra, deep-mongodb, deep-elasticsearch or deep-aerospike.

Changes to 'com.stratio.deep.entity.Cells'

  • Until version 0.4.x, a 'Cells' object was implicitly associated to a record coming from a specific table. When performing a join in Spark, 'Cell' objects coming from different tables are mixed into a single 'Cells' object. Deep now keeps track of the original table a 'Cell' object comes from, changing the internal structure of 'Cells' so that each 'Cell' is associated to its table.
    1. If you are a user of 'Cells' objects returned from Deep, nothing changes for you. The 'Cells' API keeps working as usual.
    2. If you manually create 'Cells' objects you can keep using the original API; in this case each 'Cell' you add to your 'Cells' object is automatically associated to a default table name.
    3. You can specify the default table name, or let Deep choose an internal default table name for you.
    4. We added a new constructor to 'Cells' accepting the default table name. This way the 'old' API will always manipulate 'Cell' objects associated to the specified default table.
    5. For each method manipulating the content of a 'Cells' object, we added a new method that also accepts the table name: if you call the method whose signature does not have the table name, the action is performed over the 'Cell'(s) associated to the default table; otherwise it is performed over the 'Cell'(s) associated to the specified table.
    6. size() and isEmpty() compute their results taking into account all the 'Cell' objects contained.
    7. size(String tableName) and isEmpty(String tableName) compute their results taking into account only the 'Cell' objects associated to the specified table.
    8. Obviously, when dealing with 'Cells' objects, Deep always associates a 'Cell' to the correct table name.

Examples:

Cells cells1 = new Cells(); // instantiate a Cells object whose default table name is generated internally.
Cells cells2 = new Cells("my_default_table"); // creates a new Cells object whose default table name is specified by the user
cells2.add(new Cell(...)); // adds to the 'cells2' object a new Cell object associated to the default table
cells2.add("my_other_table", new Cell(...)); // adds to the 'cells2' object a new Cell associated to "my_other_table"  

Changes to objects hierarchy

  • The IDeepJobConfig interface has been split into the ICassandraDeepJobConfig and IMongoDeepJobConfig sub-interfaces. Each sub-interface exposes only the configuration properties that make sense for its database. com.stratio.deep.config.DeepJobConfigFactory's factory methods now return the proper sub-interface.
  • DeepSparkContext has been split into CassandraDeepSparkContext and MongoDeepSparkContext.
  • DeepJobConfigFactory has been renamed to ConfigFactory (to reduce verbosity).

RDD creation

The methods used to create Cell and Entity RDDs have been merged into a single method (illustrated below):

  • DeepSparkContext: createRDD(...)
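A one-line sketch of the merged call; 'deepContext' and 'config' are as in the first-steps sketch above, and whether you get an 'entity' RDD or a Cells RDD is determined by the configuration object you pass in:

// Both 'entity' and 'cell' RDDs are now created with the same call:
JavaRDD<Cells> rdd = deepContext.createRDD(config);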

deep-spark's People

Contributors

aagea, albertoperezsanz, aperez-stratio, darroyo-stratio, dgomezperez, formacionmf, hrodriguez-stratio, josegom, mariomgal, opuertas, pmadrigal, rcrespodelosreyes, robertomorandeira, smola, stratioadmin


deep-spark's Issues

All steps to Aerospike interaction do not work properly

Hi,

As Stratio Manager does not support Aerospike yet, and the Vagrant sandbox doesn't contain Aerospike bundles either, I had to build deep-spark from sources.

bin/stratio-deep-shell does not see anything in com.stratio.*, so I had to manually launch Spark via bin/spark-shell --jars $(echo lib/*.jar | tr ' ' ',').

Then "First Steps with Stratio Deep and Aerospike" does not work, as there are no MessageTestEntity and WordCount in the compiled jars. And I haven't managed to implement those in plain Scala to paste into the REPL (something about a NoSuchMethod exception and <init>).

Then I composed a custom SBT project and put all the jars in the lib dir as unmanaged dependencies. Now it complains:

[error] (run-main-f) java.lang.NoSuchMethodError: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet;
java.lang.NoSuchMethodError: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet;
        at akka.actor.ActorCell$.<init>(ActorCell.scala:336)
        at akka.actor.ActorCell$.<clinit>(ActorCell.scala)
        at akka.actor.RootActorPath.$div(ActorPath.scala:159)
        at akka.actor.LocalActorRefProvider.<init>(ActorRefProvider.scala:464)
        at akka.remote.RemoteActorRefProvider.<init>(RemoteActorRefProvider.scala:124)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  // ...

What is the correct path through the documentation and examples to launch deep-spark with Aerospike?

scala.MatchError: interface java.util.List

Hi,
I am trying to create a SchemaRDD of a Cassandra table which contains a column with List as its datatype.
My bean class looks like this:

import org.apache.cassandra.db.marshal.ListType;
import java.util.List;

@DeepField(fieldName = "follow_user", validationClass = ListType.class)
private List follow_user;

To create the SchemaRDD I did the following:

JavaSchemaRDD schemaSummaryPeople = sqlContext.applySchema(inputSummaryRDD, bean.class);

But it gives me an error:
Exception in thread "main" scala.MatchError: interface java.util.List (of class java.lang.Class)
at org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$getSchema$1.apply(JavaSQLContext.scala:216)
at org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$getSchema$1.apply(JavaSQLContext.scala:215)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at org.apache.spark.sql.api.java.JavaSQLContext.getSchema(JavaSQLContext.scala:215)
at org.apache.spark.sql.api.java.JavaSQLContext.applySchema(JavaSQLContext.scala:100)

I get this same error when I try Set instead of List.

ERROR: parameter class java.util.ArrayList does not have a Cassandra marshaller (v0.4.0)

This error occurs in version 0.4.0 when writing to Cassandra a Cells RDD that contains a collection (List, Map or Set). Here is the trace when trying to write an ArrayList:

com.stratio.deep.exception.DeepGenericException: parameter class java.util.ArrayList does not have a Cassandra marshaller
at com.stratio.deep.rdd.CassandraRDDUtils.marshallerInstance(CassandraRDDUtils.java:167)
at com.stratio.deep.entity.CellValidator.cellValidator(CellValidator.java:229)
at com.stratio.deep.entity.CassandraCell.getValueType(CassandraCell.java:139)
at com.stratio.deep.entity.CassandraCell.<init>(CassandraCell.java:153)
at com.stratio.deep.entity.CassandraCell.create(CassandraCell.java:91)
at com.stratio.deep.entity.CassandraCell.create(CassandraCell.java:66)
at com.stratio.quantum.QuantumALS$1.call(QuantumALS.java:131)
at com.stratio.quantum.QuantumALS$1.call(QuantumALS.java:122)
at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:923)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at com.stratio.deep.rdd.CassandraRDDUtils$1.apply(CassandraRDDUtils.java:132)
at com.stratio.deep.rdd.CassandraRDDUtils$1.apply(CassandraRDDUtils.java:125)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

The code snippet:

    JavaRDD<Cells> outputRDD = features.map(new Function<Tuple2<Object, double[]>, Cells>() {
        @Override
        public Cells call(Tuple2<Object, double[]> t) throws Exception {
            List<Double> dl = new ArrayList<Double>();
            for (double d : t._2()) {
                dl.add(d);
            }
            Cell c1 = CassandraCell.create("id", t._1(), true, false);
            Cell c2 = CassandraCell.create("features", dl); <== ERROR 
            return new Cells(c1, c2);
        }
    });
    CassandraRDD.saveRDDToCassandra(outputRDD, featuresConfig);

json schema not retrieved

I used the Stratio MongoDB connector to retrieve the content of a MongoDB collection (of JSON documents).
The results only had values and not the keys, thus rendering it unqueryable.

jdbc sql server does not work

Two issues that I found connecting to SQL Server are:

The connection string created with the fluent builder does not work; you have to set it like this:

val conn_str = "jdbc:sqlserver://xxxxxxxx.database.windows.net:1433;database=master;user=insights@oxub00w1l9;password=xxxxxxxxxx;encrypt=true;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"

val tableConfig: JdbcDeepJobConfig[Cells] = JdbcConfigFactory
  .createJdbc
  .database("master") 
  .table("information_schema.tables")
  .driverClass("com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .connectionUrl(conn_str)
  .initialize

This was hard to figure out how to use: if you do not set the connection URL, the builder creates a connection string that does not work.

The second thing I found that does not work right is that the query the code above creates is not valid.

tableConfig.getQuery() gets you a query of:
SELECT information_schema.tables.* FROM insights-playland-db.information_schema.tables information_schema.tables

If you execute this query, even without Spark, you get this error:

val conn = DriverManager.getConnection(tableConfig.getConnectionUrl, tableConfig.getUsername, tableConfig.getPassword)
val statement: Statement = conn.createStatement
val query: SelectQuery = tableConfig.getQuery

val resultSet = statement.executeQuery(query.toString)

Exception in thread "main" com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near '.'.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:216)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1515)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:792)
at com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:689)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:5696)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1715)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:180)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:155)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeQuery(SQLServerStatement.java:616)
at TestSQLStatements$.main(TestSQLStatements.scala:40)
at TestSQLStatements.main(TestSQLStatements.scala)

I also found that if the database name has a "-" in it, then you get this error:

Exception in thread "main" com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near '-'.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:216)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1515)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:792)
at com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:689)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:5696)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1715)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:180)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:155)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeQuery(SQLServerStatement.java:616)
at TestSQLStatements$.main(TestSQLStatements.scala:40)
at TestSQLStatements.main(TestSQLStatements.scala)

I tried on a database other than master with no "-" in its name, and on a table with no "." in its name, and it created a query like this:

 SELECT errorlevelcounts.* FROM playland2.errorlevelcounts errorlevelcounts

this query gives an error of:
com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name 'playland2.errorlevelcounts'.

If you remove the database name playland2:

 SELECT errorlevelcounts.* FROM errorlevelcounts errorlevelcounts

this query is valid in SQL Server.

Deep integration with Spark version 1.4

Hi,

I was trying to integrate stratio-deep with Spark 1.4.1. Stratio Deep compiles properly against Spark 1.4.1, but while creating the distribution I am getting the error below.

[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Spark Project Networking 1.3.1
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ spark-network-common_2.10 ---
[INFO]
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-versions) @ spark-network-common_2.10 ---
[INFO]
[INFO] --- scala-maven-plugin:3.2.0:add-source (eclipse-add-source) @ spark-network-common_2.10 ---
[INFO] Add Source directory: /tmp/stratio-deep-distribution/stratiospark/network/common/src/main/scala
[INFO] Add Test Source directory: /tmp/stratio-deep-distribution/stratiospark/network/common/src/test/scala
[INFO]
[INFO] --- build-helper-maven-plugin:1.8:add-source (add-scala-sources) @ spark-network-common_2.10 ---
[INFO] Source directory: /tmp/stratio-deep-distribution/stratiospark/network/common/src/main/scala added.
[INFO]
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ spark-network-common_2.10 ---
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ spark-network-common_2.10 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /tmp/stratio-deep-distribution/stratiospark/network/common/src/main/resources
[INFO] Copying 3 resources
[INFO]
[INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile-first) @ spark-network-common_2.10 ---
[INFO] Using zinc server for incremental compilation
[INFO] compiler plugin: BasicArtifact(org.scalamacros,paradise_2.10.4,2.0.1,null)
[info] Compiling 43 Java sources to /tmp/stratio-deep-distribution/stratiospark/network/common/target/scala-2.10/classes...
[info] Error occurred during initialization of VM
[info] java.lang.Error: Properties init: Could not determine current working directory.
[info] at java.lang.System.initProperties(Native Method)
[info] at java.lang.System.initializeSystemClass(System.java:1119)
[info]
[error] Compile failed at Aug 24, 2015 1:10:20 PM [0.056s]
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 5.302 s]
[INFO] Spark Project Networking ........................... FAILURE [ 0.783 s]
[INFO] Spark Project Shuffle Streaming Service ............ SKIPPED
[INFO] Spark Project Core ................................. SKIPPED
[INFO] Spark Project Bagel ................................ SKIPPED
[INFO] Spark Project GraphX ............................... SKIPPED
[INFO] Spark Project Streaming ............................ SKIPPED
[INFO] Spark Project Catalyst ............................. SKIPPED
[INFO] Spark Project SQL .................................. SKIPPED
[INFO] Spark Project ML Library ........................... SKIPPED
[INFO] Spark Project Tools ................................ SKIPPED
[INFO] Spark Project Hive ................................. SKIPPED
[INFO] Spark Project REPL ................................. SKIPPED
[INFO] Spark Project YARN ................................. SKIPPED
[INFO] Spark Project Assembly ............................. SKIPPED
[INFO] Spark Project External Twitter ..................... SKIPPED
[INFO] Spark Project External Flume Sink .................. SKIPPED
[INFO] Spark Project External Flume ....................... SKIPPED
[INFO] Spark Project External MQTT ........................ SKIPPED
[INFO] Spark Project External ZeroMQ ...................... SKIPPED
[INFO] Spark Project External Kafka ....................... SKIPPED
[INFO] Spark Project Examples ............................. SKIPPED
[INFO] Spark Project YARN Shuffle Service ................. SKIPPED
[INFO] Spark Project External Kafka Assembly .............. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 7.636 s
[INFO] Finished at: 2015-08-24T13:10:20+05:30
[INFO] Final Memory: 47M/318M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile (scala-compile-first) on project spark-network-common_2.10: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile failed. CompileFailed -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :spark-network-common_2.10
Cannot make Spark distribution

Meanwhile, I saw that Spark version 1.5 has been released, so when can we expect this integration?

Thanks & regards.

deep-jdbc build fails

$ mvn clean install
...
Downloaded: http://maven.restlet.org/org/restlet/jse/org.restlet.ext.ssl/2.1.2/org.restlet.ext.ssl-2.1.2.jar (37 KB at 76.4 KB/sec)
Downloaded: http://maven.restlet.org/org/restlet/jse/org.restlet/2.1.2/org.restlet-2.1.2.jar (713 KB at 628.0 KB/sec)
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 04:04 min
[INFO] Finished at: 2015-03-07T23:24:46-05:00
[INFO] Final Memory: 19M/339M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project deep-jdbc: Could not resolve dependencies for project com.stratio.deep:deep-jdbc:jar:0.7.0: The following artifacts could not be resolved: com.stratio.deep:deep-commons:jar:tests:0.7.0, com.stratio.deep:deep-core:jar:tests:0.7.0: Could not find artifact com.stratio.deep:deep-commons:jar:tests:0.7.0 in Neo4j releases (http://m2.neo4j.org/content/repositories/releases) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.

NullPointerException with a blob key [stratio-deep version 0.2.9]

Hello,

I'm using stratio-deep version 0.2.9.

I'm doing some tests to check how Stratio works; one of them has the column key defined as blob type. When I run the test, the following NullPointerException is thrown:

 Exception in thread "main" java.lang.NullPointerException
    at com.stratio.deep.entity.CellValidator.<init>(CellValidator.java:292)
    at com.stratio.deep.entity.CellValidator.cellValidator(CellValidator.java:212)
    at com.stratio.deep.entity.Cell.getValueType(Cell.java:158)
    at com.stratio.deep.entity.Cell.<init>(Cell.java:192)
    at com.stratio.deep.entity.Cell.create(Cell.java:134)
    at com.stratio.deep.config.GenericDeepJobConfig.initColumnDefinitionMap(GenericDeepJobConfig.java:299)
    at com.stratio.deep.config.GenericDeepJobConfig.columnDefinitions(GenericDeepJobConfig.java:286)
    at com.stratio.deep.config.GenericDeepJobConfig.initialize(GenericDeepJobConfig.java:449)
    at com.stratio.examples.JavaExample.main(JavaExample.java:79)

To resolve the problem I have added, in the class com.stratio.deep.utils.AnnotationUtils, the corresponding entries to the maps MAP_JAVA_TYPE_TO_ABSTRACT_TYPE, MAP_ABSTRACT_TYPE_CLASSNAME_TO_JAVA_TYPE and MAP_ABSTRACT_TYPE_CLASS_TO_ABSTRACT_TYPE.

    public static final Map<Class, AbstractType<?>> MAP_JAVA_TYPE_TO_ABSTRACT_TYPE =
            ImmutableMap.<Class, AbstractType<?>>builder()
                    .put(String.class, UTF8Type.instance)
                    .put(Integer.class, Int32Type.instance)
                    .put(Boolean.class, BooleanType.instance)
                    .put(Date.class, TimestampType.instance)
                    .put(BigDecimal.class, DecimalType.instance)
                    .put(Long.class, LongType.instance)
                    .put(Double.class, DoubleType.instance)
                    .put(Float.class, FloatType.instance)
                    .put(InetAddress.class, InetAddressType.instance)
                    .put(Inet4Address.class, InetAddressType.instance)
                    .put(Inet6Address.class, InetAddressType.instance)
                    .put(BigInteger.class, IntegerType.instance)
                    .put(UUID.class, UUIDType.instance)
// Begin Resolve NullPointerException
                    .put(ByteBuffer.class, BytesType.instance)
// End Resolve NullPointerException
                    .build();

    /**
     * Static map of associations between a cassandra marshaller fully qualified class name and the corresponding
     * Java class.
     */
    public static final Map<String, Class> MAP_ABSTRACT_TYPE_CLASSNAME_TO_JAVA_TYPE =
            ImmutableMap.<String, Class>builder()
                    .put(UTF8Type.class.getCanonicalName(), String.class)
                    .put(Int32Type.class.getCanonicalName(), Integer.class)
                    .put(BooleanType.class.getCanonicalName(), Boolean.class)
                    .put(TimestampType.class.getCanonicalName(), Date.class)
                    .put(DateType.class.getCanonicalName(), Date.class)
                    .put(DecimalType.class.getCanonicalName(), BigDecimal.class)
                    .put(LongType.class.getCanonicalName(), Long.class)
                    .put(DoubleType.class.getCanonicalName(), Double.class)
                    .put(FloatType.class.getCanonicalName(), Float.class)
                    .put(InetAddressType.class.getCanonicalName(), InetAddress.class)
                    .put(IntegerType.class.getCanonicalName(), BigInteger.class)
                    .put(UUIDType.class.getCanonicalName(), UUID.class)
                    .put(TimeUUIDType.class.getCanonicalName(), UUID.class)
                    .put(SetType.class.getCanonicalName(), Set.class)
                    .put(ListType.class.getCanonicalName(), List.class)
                    .put(MapType.class.getCanonicalName(), Map.class)
// Begin Resolve NullPointerException
                    .put(BytesType.class.getCanonicalName(), ByteBuffer.class)
// End Resolve NullPointerException
                    .build();

    /**
     * Static map of associations between cassandra marshaller Class objects and their instance.
     */
    public static final Map<Class<?>, AbstractType<?>> MAP_ABSTRACT_TYPE_CLASS_TO_ABSTRACT_TYPE =
            ImmutableMap.<Class<?>, AbstractType<?>>builder()
                    .put(UTF8Type.class, UTF8Type.instance)
                    .put(Int32Type.class, Int32Type.instance)
                    .put(BooleanType.class, BooleanType.instance)
                    .put(TimestampType.class, TimestampType.instance)
                    .put(DateType.class, DateType.instance)
                    .put(DecimalType.class, DecimalType.instance)
                    .put(LongType.class, LongType.instance)
                    .put(DoubleType.class, DoubleType.instance)
                    .put(FloatType.class, FloatType.instance)
                    .put(InetAddressType.class, InetAddressType.instance)
                    .put(IntegerType.class, IntegerType.instance)
                    .put(UUIDType.class, UUIDType.instance)
                    .put(TimeUUIDType.class, TimeUUIDType.instance)
// Begin Resolve NullPointerException
                    .put(BytesType.class, BytesType.instance)
// End Resolve NullPointerException
                    .build();

With these changes, it seems to work fine.

Regards.

Problems with uppercase in keyspace and table name

Hello,

We have a keyspace named Keyspace1 and a table named Standard1, and I get exceptions when running some tests over Stratio.

The following code (from the class com.stratio.examples.JavaExample) is the way we create an IDeepJobConfig:

// Configuration and initialization
IDeepJobConfig config = DeepJobConfigFactory.create()
        .host(cassandraHost).rpcPort(cassandraPort)
        .keyspace(keyspaceName).table(tableName)
        .username(userName).password(password)
        .inputColumns("key")
        .initialize();

I have tried to initialize the keyspaceName and tableName variables in two ways. The first one is this:

String keyspaceName = "Keyspace1";
String tableName = "Standard1";

When it is run, the exception thrown is:

Exception in thread "main" com.datastax.driver.core.exceptions.InvalidQueryException: Keyspace 'keyspace1' does not exist
        at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:35)
        at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:256)
        at com.datastax.driver.core.SessionManager.setKeyspace(SessionManager.java:336)
        at com.datastax.driver.core.Cluster.connect(Cluster.java:228)
        at com.stratio.deep.config.GenericDeepJobConfig.getSession(GenericDeepJobConfig.java:151)
        at com.stratio.deep.config.GenericDeepJobConfig.fetchTableMetadata(GenericDeepJobConfig.java:194)
        at com.stratio.deep.config.GenericDeepJobConfig.validate(GenericDeepJobConfig.java:547)
        at com.stratio.deep.config.GenericDeepJobConfig.initialize(GenericDeepJobConfig.java:447)
        at com.stratio.examples.JavaExample.main(JavaExample.java:91)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Keyspace 'keyspace1' does not exist
        at com.datastax.driver.core.Responses$Error.asException(Responses.java:97)
        at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:108)
        at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:235)
        at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:367)
        at com.datastax.driver.core.Connection$Dispatcher.messageReceived(Connection.java:571)
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
        at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

And the second way is using double quotes:

String keyspaceName = "\"Keyspace1\"";
String tableName = "\"Standard1\""; 

The exception thrown is:

com.stratio.deep.exception.DeepIOException: com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured columnfamily "Standard1"
    at com.stratio.deep.cql.DeepRecordReader$RowIterator.executeQuery(DeepRecordReader.java:601)
    at com.stratio.deep.cql.DeepRecordReader$RowIterator.<init>(DeepRecordReader.java:191)
    at com.stratio.deep.cql.DeepRecordReader.initialize(DeepRecordReader.java:121)
    at com.stratio.deep.cql.DeepRecordReader.<init>(DeepRecordReader.java:92)
    at com.stratio.deep.rdd.CassandraRDD.initRecordReader(CassandraRDD.java:275)
    at com.stratio.deep.rdd.CassandraRDD.compute(CassandraRDD.java:188)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
    at org.apache.spark.scheduler.Task.run(Task.scala:53)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:211)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured columnfamily "Standard1"
    at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:35)
    at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:256)
    at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:172)
    at com.datastax.driver.core.SessionManager.execute(SessionManager.java:92)
    at com.datastax.driver.core.SessionManager.execute(SessionManager.java:88)
    at com.stratio.deep.cql.DeepRecordReader$RowIterator.executeQuery(DeepRecordReader.java:582)
    ... 20 more
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured columnfamily "Standard1"
    at com.datastax.driver.core.Responses$Error.asException(Responses.java:97)
    at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:108)
    at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:235)
    at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:367)
    at com.datastax.driver.core.Connection$Dispatcher.messageReceived(Connection.java:571)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    ... 3 more

It seems that the problem comes from the DataStax Java Driver, but I don't know if there is a way to use the DataStax Java Driver from the Stratio-Deep code to avoid these errors.

On the other hand, are there any restrictions or problems when using uppercase and lowercase in Cassandra with Stratio?

Thanks,
Ernesto.

Cannot make Spark distribution

Hi,
I am trying to create a Deep distribution by using the command:
cd deep-scripts
./make-distribution-deep.sh

I got this error:

[INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile-first) @ spark-streaming-flume_2.10 ---
[WARNING] Zinc server is not available at port 3030 - reverting to normal incremental compile
[INFO] Using incremental compilation
[INFO] compiler plugin: BasicArtifact(org.scalamacros,paradise_2.10.4,2.0.1,null)
[INFO] Compiling 6 Scala sources and 1 Java source to /tmp/stratio-deep-distribution/stratiospark/external/flume/target/scala-2.10/classes...
[ERROR] /tmp/stratio-deep-distribution/stratiospark/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeBatchFetcher.scala:22: object Throwables is not a member of package com.google.common.base
[ERROR] import com.google.common.base.Throwables
[ERROR] ^
[ERROR] /tmp/stratio-deep-distribution/stratiospark/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeBatchFetcher.scala:59: not found: value Throwables
[ERROR] Throwables.getRootCause(e) match {
[ERROR] ^
[ERROR] /tmp/stratio-deep-distribution/stratiospark/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumePollingInputDStream.scala:26: object util is not a member of package com.google.common
[ERROR] import com.google.common.util.concurrent.ThreadFactoryBuilder
[ERROR] ^
[ERROR] /tmp/stratio-deep-distribution/stratiospark/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumePollingInputDStream.scala:69: not found: type ThreadFactoryBuilder
[ERROR] Executors.newCachedThreadPool(new ThreadFactoryBuilder().setDaemon(true).
[ERROR] ^
[ERROR] /tmp/stratio-deep-distribution/stratiospark/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumePollingInputDStream.scala:76: not found: type ThreadFactoryBuilder
[ERROR] new ThreadFactoryBuilder().setDaemon(true).setNameFormat("Flume Receiver Thread - %d").build())
[ERROR] ^
[ERROR] 5 errors found
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 13.422 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 10.889 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 8.517 s]
[INFO] Spark Project Core ................................. SUCCESS [04:52 min]
[INFO] Spark Project Bagel ................................ SUCCESS [ 30.599 s]
[INFO] Spark Project GraphX ............................... SUCCESS [01:33 min]
[INFO] Spark Project Streaming ............................ SUCCESS [02:08 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [02:18 min]
[INFO] Spark Project SQL .................................. SUCCESS [02:47 min]
[INFO] Spark Project ML Library ........................... SUCCESS [02:51 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 17.748 s]
[INFO] Spark Project Hive ................................. SUCCESS [02:03 min]
[INFO] Spark Project REPL ................................. SUCCESS [ 59.954 s]
[INFO] Spark Project YARN Parent POM ...................... SUCCESS [ 4.491 s]
[INFO] Spark Project YARN Stable API ...................... SUCCESS [01:03 min]
[INFO] Spark Project Assembly ............................. SUCCESS [01:06 min]
[INFO] Spark Project External Twitter ..................... SUCCESS [ 23.976 s]
[INFO] Spark Project External Flume Sink .................. SUCCESS [ 30.382 s]
[INFO] Spark Project External Flume ....................... FAILURE [ 7.310 s]
[INFO] Spark Project External MQTT ........................ SKIPPED
[INFO] Spark Project External ZeroMQ ...................... SKIPPED
[INFO] Spark Project External Kafka ....................... SKIPPED
[INFO] Spark Project Examples ............................. SKIPPED
[INFO] Spark Project YARN Shuffle Service ................. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 24:13 min
[INFO] Finished at: 2015-06-01T14:00:46+05:30
[INFO] Final Memory: 93M/1012M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile (scala-compile-first) on project spark-streaming-flume_2.10: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile failed. CompileFailed -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :spark-streaming-flume_2.10

Please provide me suggestions to solve this issue.

Thanks!

Stratio Sandbox's stratio-deep-shell sc not found (0.91)

Version: Stratio Sandbox 0.91

To reproduce: Followed Sandbox installation instructions. Imported appliance into VirtualBox. Started Sandbox and started services after changing /etc/hosts. Created Cassandra schema in cqlsh and started stratio-deep-shell from /opt/sds/spark/bin.

[root@sandbox bin]# ./stratio-deep-shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/sds/spark/lib/spark-assembly-1.0.0-hadoop1.0.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/sds/spark/lib/spark-examples-1.0.0-hadoop1.0.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Welcome to
[ASCII-art Spark banner, garbled in transcription]
Powered by Spark v1.0.0

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_55)
Type in expressions to have them evaluated.
Type :help for more information.
21:38:49,763 INFO [spark-akka.actor.default-dispatcher-5] Slf4jLogger:80 - Slf4jLogger started
21:38:49,892 INFO [spark-akka.actor.default-dispatcher-4] Remoting:74 - Starting remoting
21:38:50,230 INFO [spark-akka.actor.default-dispatcher-5] Remoting:74 - Remoting started; listening on addresses :[akka.tcp://spark@sandbox:43350]
21:38:50,236 INFO [spark-akka.actor.default-dispatcher-4] Remoting:74 - Remoting now listens on addresses: [akka.tcp://spark@sandbox:43350]
Failed to load native Mesos library from
java.lang.UnsatisfiedLinkError: no mesos in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1886)
at java.lang.Runtime.loadLibrary0(Runtime.java:849)
at java.lang.System.loadLibrary(System.java:1088)
at org.apache.mesos.MesosNativeLibrary.load(MesosNativeLibrary.java:52)
at org.apache.mesos.MesosNativeLibrary.load(MesosNativeLibrary.java:64)
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1542)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:307)
at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:957)
at $iwC$$iwC.<init>(<console>:8)
at $iwC.<init>(<console>:14)
at <init>(<console>:16)
at .<init>(<console>:20)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:121)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:120)
at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:263)
at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:120)
at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:56)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:913)
at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:142)
at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:56)
at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:104)
at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:56)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:930)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Spark context available as sc.
Loading /opt/sds/spark/bin/stratio-deep-init.scala...
import com.stratio.deep.annotations.DeepEntity
import com.stratio.deep.annotations.DeepField
import com.stratio.deep.entity.IDeepType
import org.apache.cassandra.db.marshal.Int32Type
import org.apache.cassandra.db.marshal.LongType
import com.stratio.deep.config.{DeepJobConfigFactory=>Cfg, _}
import com.stratio.deep.entity._

import com.stratio.deep.context._
import com.stratio.deep.rdd._
import com.stratio.deep.rdd.mongodb._
import com.stratio.deep.testentity._
<console>:33: error: not found: value sc
val deepContext = new DeepSparkContext(sc)
^

scala>
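The UnsatisfiedLinkError above is the real problem: the SparkContext constructor dies while loading the native Mesos library, so sc is never actually bound, even though the shell still prints "Spark context available as sc." A hedged workaround sketch, assuming a local master is acceptable for the tutorial and that DeepSparkContext exposes the usual master/app-name constructor:

scala> val deepContext = new DeepSparkContext("local[2]", "stratio-deep-shell")  // assumed constructor; bypasses the Mesos master set by the launch script

Alternatively, installing the Mesos native library and exporting MESOS_NATIVE_LIBRARY (or adding it to java.library.path) before launching the shell should let the configured master work as intended.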

Failed to build deep-jdbc

I was trying to build deep-spark with the command mvn clean install -DskipTests inside the deep-parent directory, and at the end I got an error stating that:

[INFO] deep parent ........................................ SUCCESS [ 22.328 s]
[INFO] deep commons ....................................... SUCCESS [ 33.724 s]
[INFO] deep core .......................................... SUCCESS [ 25.067 s]
[INFO] deep cassandra ..................................... SUCCESS [ 29.947 s]
[INFO] deep mongodb ....................................... SUCCESS [ 22.710 s]
[INFO] deep elasticsearch ................................. SUCCESS [ 23.845 s]
[INFO] deep aerospike ..................................... SUCCESS [ 23.377 s]
[INFO] deep jdbc .......................................... FAILURE [ 0.147 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:01 min
[INFO] Finished at: 2015-05-29T16:22:24+05:30
[INFO] Final Memory: 77M/718M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project deep-jdbc: Could not resolve dependencies for project com.stratio.deep:deep-jdbc:jar:0.8.0-SNAPSHOT: Failure to find com.oracle:ojdbc7:jar:12.1.0.2 in http://m2.neo4j.org/content/repositories/releases was cached in the local repository, resolution will not be reattempted until the update interval of Neo4j releases has elapsed or updates are forced -> [Help 1]

Can you please provide the proper steps to build this project?

Thanks,
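The error message itself names the missing piece: com.oracle:ojdbc7:jar:12.1.0.2 is not published in any public Maven repository, so it must be installed into the local repository by hand before deep-jdbc can resolve it. A sketch, assuming the driver has already been downloaded from Oracle (the jar path below is a placeholder):

rm -rf ~/.m2/repository/com/oracle/ojdbc7                     # drop the cached "resolution will not be reattempted" marker
mvn install:install-file -Dfile=/path/to/ojdbc7.jar -DgroupId=com.oracle -DartifactId=ojdbc7 -Dversion=12.1.0.2 -Dpackaging=jar
cd deep-parent && mvn clean install -DskipTests               # rebuild; deep-jdbc now resolves the driver from the local repository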

Stratio Sandbox's stratio-deep-shell hangs running "first steps" tutorial

Sandbox version 0.91

[root@sandbox bin]# ./stratio-deep-shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/sds/spark/lib/spark-assembly-1.0.0-hadoop1.0.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/sds/spark/lib/spark-examples-1.0.0-hadoop1.0.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Welcome to
[ASCII-art Stratio Deep banner]
Powered by Spark v1.0.0

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_55)
Type in expressions to have them evaluated.
Type :help for more information.
23:38:25,730 INFO [spark-akka.actor.default-dispatcher-5] Slf4jLogger:80 - Slf4jLogger started
23:38:25,825 INFO [spark-akka.actor.default-dispatcher-5] Remoting:74 - Starting remoting
23:38:26,167 INFO [spark-akka.actor.default-dispatcher-5] Remoting:74 - Remoting started; listening on addresses :[akka.tcp://spark@sandbox:33621]
23:38:26,171 INFO [spark-akka.actor.default-dispatcher-3] Remoting:74 - Remoting now listens on addresses: [akka.tcp://spark@sandbox:33621]
Spark context available as sc.
Loading /opt/sds/spark/bin/stratio-deep-init.scala...
import com.stratio.deep.annotations.DeepEntity
import com.stratio.deep.annotations.DeepField
import com.stratio.deep.entity.IDeepType
import org.apache.cassandra.db.marshal.Int32Type
import org.apache.cassandra.db.marshal.LongType
import com.stratio.deep.config.{DeepJobConfigFactory=>Cfg, _}
import com.stratio.deep.entity._

import com.stratio.deep.context._
import com.stratio.deep.rdd._
import com.stratio.deep.rdd.mongodb._
import com.stratio.deep.testentity._
deepContext: com.stratio.deep.context.DeepSparkContext = com.stratio.deep.context.DeepSparkContext@227330c

scala> val config : ICassandraDeepJobConfig[Cells] = Cfg.create().host("localhost").rpcPort(9160).keyspace("crawler").table("Page").initialize
config: com.stratio.deep.config.ICassandraDeepJobConfig[com.stratio.deep.entity.Cells] = com.stratio.deep.config.CellDeepJobConfig@6f16befb

scala> val rdd: CassandraRDD[Cells] = deepContext.cassandraGenericRDD(config)
rdd: com.stratio.deep.rdd.CassandraRDD[com.stratio.deep.entity.Cells] = CassandraCellRDD[0] at RDD at CassandraRDD.java:173

scala> val containsAbcRDD = rdd filter {c :Cells => c.getCellByName("domainName").getCellValue.asInstanceOf[String].contains("abc.es") }
containsAbcRDD: org.apache.spark.rdd.RDD[com.stratio.deep.entity.Cells] = FilteredRDD[1] at filter at <console>:41

scala> containsAbcRDD.count

The count hangs at this point and never returns.
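A couple of hedged checks before treating this as a bug (the names come from the session above; the calls are plain Spark RDD API):

scala> rdd.partitions.length    // 0 would mean no Cassandra token ranges were found
scala> containsAbcRDD.take(1)   // fetch one element instead of scanning the whole table

If take(1) hangs too, the likely culprit is connectivity rather than Deep: the config uses host("localhost").rpcPort(9160), so Cassandra's Thrift port has to be reachable from the Spark workers, not only from cqlsh on the sandbox host.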
