stratio / deep-spark

Connecting Apache Spark with different data stores [DEPRECATED]

Home Page: http://stratio.github.io/deep-spark

License: Apache License 2.0

Java 97.79% Scala 0.11% Shell 2.10%

deep-spark's Introduction

Disclaimer: As of 01/06/2015 this project has been deprecated. Thank you for your understanding and continued help throughout the project's life.

What is Deep?

Deep is a thin integration layer between Apache Spark and several NoSQL datastores. We currently support Apache Cassandra, MongoDB, ElasticSearch, Aerospike, HDFS, S3 and any database accessible through JDBC, and in the near future we will add support for several other datastores.

Install ojdbc driver

In order to compile the deep-jdbc module, it is necessary to add the Oracle ojdbc driver to your local Maven repository. You can download it from http://www.oracle.com/technetwork/database/features/jdbc/default-2280470.html: on that page, click "Accept License Agreement" and then download the ojdbc7.jar library. You need a free Oracle account to download the official driver.

To install the ojdbc driver in your local repository, execute the command below, pointing -Dfile at the jar you downloaded:

mvn install:install-file -Dfile=<path/to/ojdbc7.jar> -DgroupId=com.oracle -DartifactId=ojdbc7 -Dversion=12.1.0.2 -Dpackaging=jar

Compiling Deep

After that, you can compile Deep by executing the following steps:

cd deep-parent

mvn clean install

Creating a Deep Distribution

If you want to create a Deep distribution, execute the following steps:

cd deep-scripts

./make-distribution-deep.sh

During the creation you'll see the following question:

What tag want to use for Aerospike native repository?

Type 0.7.0 and press Enter.

Apache Cassandra integration

The integration is not based on Cassandra's Hadoop interface.

Deep comes with a user-friendly API that lets developers create Spark RDDs mapped to Cassandra column families. We provide two different interfaces:

  • The first one lets developers map Cassandra tables to plain old Java objects (POJOs), just as if you were using any other ORM. We call this API the 'entity objects' API. This abstraction is quite handy: it lets you work on RDDs of your own entities, and under the hood Deep transparently maps Cassandra's columns to entity properties. Your domain entities must be correctly annotated using Deep annotations (take a look at the deep-core example entities in package com.stratio.deep.core.entity).

  • The second one is a more generic 'cell' API that lets developers work on RDD<com.stratio.deep.entity.Cells>, where a 'Cells' object is a collection of com.stratio.deep.entity.Cell objects. Column metadata is automatically fetched from the data store. This interface is a little more cumbersome to work with (see the sketch just below), but has the advantage that it doesn't require the definition of additional entity classes. Example: you have a table called 'users' and you decide to use the 'Cells' interface. Once you get an instance 'c' of the Cells object, you can get the value of column 'address' with c.getCellByName("address").getCellValue(). Please refer to the Deep API documentation to learn more about the Cells and Cell objects.
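Below is a minimal Java sketch of that access pattern. Only Cells, getCellByName and getCellValue come from the description above (plus the 'users'/'address' example); the wrapper class and method are illustrative scaffolding:

import com.stratio.deep.entity.Cells; // package as named above; from 0.4.x this lives under com.stratio.deep.commons.entity

public class CellsApiSketch {
    // 'c' is one Cells instance taken from an RDD<Cells> mapped to the
    // 'users' table; see "First steps with Spark and Cassandra" below
    // for how such an RDD is created.
    static void printAddress(Cells c) {
        // Look up the 'address' column by name and read its value.
        Object address = c.getCellByName("address").getCellValue();
        System.out.println("address = " + address);
    }
}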

We encourage you to read the more comprehensive documentation hosted on the Openstratio website.

Deep comes with an example subproject called 'deep-examples' containing a set of working examples, both in Java and Scala. Please refer to the deep-examples project README for further information on how to set up a working environment.

MongoDB integration

The Spark-MongoDB connector is based on the Hadoop-MongoDB connector.

Support for MongoDB has been added in version 0.3.0.

We provide two different interfaces:

  • ORM API: you just have to annotate your POJOs with Deep annotations and the magic begins; you will be able to connect MongoDB with Spark using your own model entities (a minimal annotated entity is sketched after this list).

  • Generic cell API: you do not need to specify the collection's schema or add anything to your POJOs; each document is transformed into a "Cells" object.
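Below is a minimal sketch of such an annotated entity. The @DeepEntity and @DeepField annotations and the IDeepType interface appear elsewhere in this README (in the import list and the issues below); the entity itself, its field and the collection it maps to are hypothetical:

import com.stratio.deep.commons.annotations.DeepEntity;
import com.stratio.deep.commons.annotations.DeepField;
import com.stratio.deep.commons.entity.IDeepType;

// Hypothetical domain entity mapped to a MongoDB collection (the same
// annotations are used for Cassandra tables). fieldName ties the Java
// property to the document field.
@DeepEntity
public class MessageEntity implements IDeepType {

    @DeepField(fieldName = "message")
    private String message;

    public String getMessage() { return message; }

    public void setMessage(String message) { this.message = message; }
}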

We added a few working examples for MongoDB in the deep-examples subproject; take a look at:

Entities:

  • com.stratio.deep.examples.java.ReadingEntityFromMongoDB
  • com.stratio.deep.examples.java.WritingEntityToMongoDB
  • com.stratio.deep.examples.java.GroupingEntityWithMongoDB

Cells:

  • com.stratio.deep.examples.java.ReadingCellFromMongoDB
  • com.stratio.deep.examples.java.WritingCellToMongoDB
  • com.stratio.deep.examples.java.GroupingCellWithMongoDB

You can check out our first steps guide here:

First steps with Deep-MongoDB

We are working on further improvements!

ElasticSearch integration

Support for ElasticSearch has been added in version 0.5.0.

Aerospike integration

Support for Aerospike has been added in version 0.6.0.

Examples:

Entities:

  • com.stratio.deep.examples.java.ReadingEntityFromAerospike
  • com.stratio.deep.examples.java.WritingEntityToAerospike
  • com.stratio.deep.examples.java.GroupingEntityWithAerospike

Cells:

  • com.stratio.deep.examples.java.ReadingCellFromAerospike
  • com.stratio.deep.examples.java.WritingCellToAerospike
  • com.stratio.deep.examples.java.GroupingCellWithAerospike

JDBC integration

Support for JDBC has been added in version 0.7.0.

Examples:

Entities:

  • com.stratio.deep.examples.java.ReadingEntityWithJdbc
  • com.stratio.deep.examples.java.WritingEntityWithJdbc

Cells:

  • com.stratio.deep.examples.java.ReadingCellWithJdbc
  • com.stratio.deep.examples.java.WritingCellWithJdbc

Requirements

  • Cassandra, we tested versions from 1.2.8 up to 2.0.11 (for Spark <=> Cassandra integration).
  • MongoDB, we tested the integration with MongoDB versions 2.2, 2.4 and 2.6 using Standalone, Replica Set and Sharded Cluster (for Spark <=> MongoDB integration).
  • ElasticSearch, 1.3.0+
  • Aerospike, 3.3.0+
  • Spark 1.1.1
  • Apache Maven >= 3.0.4
  • Java 1.7
  • Scala 2.10.3

Configure the development and test environment

  • Clone the project

  • To configure a development environment in Eclipse: import as a Maven project. In IntelliJ: open the project by selecting the deep-parent POM file

  • Install the project in your local Maven repository. Enter the deep-parent subproject and run: mvn clean install (add -DskipTests to skip tests)

  • Put Deep to work on a working Cassandra + Spark cluster. You have several options:

    • Download a pre-configured Stratio platform VM (SDS, Stratio's BigData platform). This VM works on both VirtualBox and VMware, and comes with a fully configured distribution that also includes Stratio Deep. We also distribute the VM with several datasets preloaded in Cassandra. This distribution includes Stratio's customized Cassandra distribution containing our powerful open-source Lucene-based secondary indexes; see the Stratio documentation for further information. Once your VM is up and running you can test Deep using the shell: enter /opt/sds and run bin/stratio-deep-shell.

    • Install a new cluster using the Stratio installer. Please refer to Stratio's website to download the installer and its documentation.

    • You already have a working Cassandra server on your development machine: you need a Spark + Deep bundle; we suggest creating one by running:

      cd deep-scripts

      ./make-distribution-deep.sh

    This will build a Spark distribution package with Stratio Deep and Cassandra's jars included (depending on your machine this script could take a while, since it compiles Spark from sources). The package will be called spark-deep-distribution-X.Y.Z.tgz; untar it to a folder of your choice, enter that folder and issue ./stratio-deep-shell. This will start an interactive shell where you can test Stratio Deep (note that this starts a development cluster with MASTER="local").

    • You already have a working installation of Cassandra and Spark on your development machine: this is the most difficult way to start testing Deep, but if you know what you're doing, you will have to:

      1. copy the Stratio Deep jars to Spark's 'jars' folder ($SPARK_HOME/jars).
      2. copy Cassandra's jars to Spark's 'jars' folder.
      3. copy the Datastax Java Driver jar (v2.0.x) to Spark's 'jars' folder.
      4. start the Spark shell and import the following:

      import com.stratio.deep.commons.annotations._
      import com.stratio.deep.commons.config._
      import com.stratio.deep.commons.entity._
      import com.stratio.deep.core.context._
      import com.stratio.deep.cassandra.config._
      import com.stratio.deep.cassandra.extractor._
      import com.stratio.deep.mongodb.config._
      import com.stratio.deep.mongodb.extractor._
      import com.stratio.deep.es.config._
      import com.stratio.deep.es.extractor._
      import com.stratio.deep.aerospike.config._
      import com.stratio.deep.aerospike.extractor._
      import org.apache.spark.rdd._
      import org.apache.spark.SparkContext._
      import org.apache.spark.sql.api.java.JavaSQLContext
      import org.apache.spark.sql.api.java.JavaSchemaRDD
      import org.apache.spark.sql.api.java.Row
      import scala.collection.JavaConversions._

Once you have a working development environment you can finally start testing Deep. These are the basic steps you will always have to perform in order to use Deep:

First steps with Spark and Cassandra

  • Build an instance of a configuration object: this lets you tell Deep the Cassandra endpoint, the keyspace, the table you want to access and much more. It also lets you specify which interface to use (the domain entity or the generic interface). We have a factory that helps you create a configuration object using a fluent API. Creating a configuration object is an expensive operation. Please take the time to read the Java and Scala examples provided in the 'deep-examples' subproject and the comprehensive documentation at the OpenStratio website.
  • Create an RDD: use the DeepSparkContext helper methods, providing the configuration object you've just instantiated.
  • Perform some computation over the RDD(s): this is up to you; we only help you fetch the data efficiently from Cassandra, and the full Spark API is at your disposal.
  • (optional) Write the computation results out to Cassandra: we provide a way to efficiently save the result of your computation to Cassandra. In order to do that you must have another configuration object where you specify the output keyspace/column family. We can create the output column family for you if needed. Please refer to the comprehensive Stratio Deep documentation at the Stratio website. A sketch of these steps follows this list.
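Below is a minimal Java sketch of these four steps. It uses only names that appear in this README and its issues (the DeepJobConfigFactory fluent API, renamed ConfigFactory from 0.4.x on, and DeepSparkContext.createRDD); the endpoint, keyspace and table values and the 'deepContext' variable are assumptions, not the definitive API:

// 1. Build the configuration object with the fluent factory API.
IDeepJobConfig config = DeepJobConfigFactory.create()
        .host("localhost").rpcPort(9160)   // assumed Cassandra endpoint
        .keyspace("test").table("users")   // assumed keyspace/table
        .initialize();

// 2. Create an RDD; 'deepContext' is a DeepSparkContext built beforehand.
//    From 0.4.x on this is the single createRDD(...) method (see the
//    migration notes below).
JavaRDD<Cells> rdd = deepContext.createRDD(config);

// 3. Perform any computation with the regular Spark API.
long howManyUsers = rdd.count();

// 4. (optional) Save the results back to Cassandra through a second,
//    output-side configuration object (output keyspace/column family).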

First steps with Spark and MongoDB

  • Build an instance of a configuration object: this lets you tell Stratio Deep the MongoDB endpoint, the MongoDB database and collection you want to access and much more. It also lets you specify which interface to use (the domain entity). We have a factory that helps you create a configuration object using a fluent API. Creating a configuration object is an expensive operation. Please take the time to read the Java and Scala examples provided in the 'deep-examples' subproject and the comprehensive Deep documentation at the OpenStratio website.
  • Create an RDD: use the DeepSparkContext helper methods, providing the configuration object you've just instantiated.
  • Perform some computation over the RDD(s): this is up to you; we only help you fetch the data efficiently from MongoDB, and the full Spark API is at your disposal.
  • (optional) Write the computation results out to MongoDB: we provide a way to efficiently save the result of your computation to MongoDB. A MongoDB configuration sketch follows this list.
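The MongoDB flavour of the first step follows the same fluent-factory pattern. Only IMongoDeepJobConfig is named in this README; the factory method and the host/database/collection builder calls below are assumptions:

// MongoDB-side configuration sketch (builder names assumed).
IMongoDeepJobConfig config = DeepJobConfigFactory.createMongoDB()
        .host("localhost:27017")                    // assumed endpoint
        .database("test").collection("messages")    // assumed names
        .initialize();

// From here on the steps match the Cassandra sketch above:
// deepContext.createRDD(config), then any Spark computation.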

Migrating from version 0.2.9

From version 0.4.x on, Deep supports multiple datastores. In order to correctly implement this new feature, Deep underwent a huge refactoring between versions 0.2.9 and 0.4.x. To port your code to the new version you should take into account a few changes we made.

New Project Structure

From version 0.4.x on, Deep supports multiple datastores; in your project you should import only the Maven dependency you will use: deep-cassandra, deep-mongodb, deep-elasticsearch or deep-aerospike.

Changes to 'com.stratio.deep.entity.Cells'

  • Until version 0.4.x, a 'Cells' object was implicitly associated to a record coming from a specific table. When performing a join in Spark, 'Cell' objects coming from different tables are mixed into a single 'Cells' object. Deep now keeps track of the original table a 'Cell' object comes from, changing the internal structure of 'Cells' so that each 'Cell' is associated to its table.
    1. If you are a user of 'Cells' objects returned from Deep, nothing changes for you. The 'Cells' API keeps working as usual.
    2. If you manually create 'Cells' objects you can keep using the original API; in this case each 'Cell' you add to your 'Cells' object is automatically associated to a default table name.
    3. You can specify the default table name, or let Deep choose an internal default table name for you.
    4. We added a new constructor to 'Cells' accepting the default table name. This way the 'old' API will always manipulate 'Cell' objects associated to the specified default table.
    5. For each method manipulating the content of a 'Cells' object, we added a new method that also accepts the table name: if you call the method whose signature does not have the table name, the action is performed over the 'Cell'(s) associated to the default table; otherwise it is performed over the 'Cell'(s) associated to the specified table.
    6. size() and isEmpty() compute their results taking into account all the 'Cell' objects contained.
    7. size(String tableName) and isEmpty(String tableName) compute their results taking into account only the 'Cell' objects associated to the specified table.
    8. Obviously, when dealing with 'Cells' objects, Deep always associates a 'Cell' to the correct table name.

Examples:

Cells cells1 = new Cells(); // instantiate a Cells object whose default table name is generated internally.
Cells cells2 = new Cells("my_default_table"); // creates a new Cells object whose default table name is specified by the user
cells2.add(new Cell(...)); // adds to the 'cells2' object a new Cell object associated to the default table
cells2.add("my_other_table", new Cell(...)); // adds to the 'cells2' object a new Cell associated to "my_other_table"  

Changes to objects hierarchy

  • The IDeepJobConfig interface has been split into the ICassandraDeepJobConfig and IMongoDeepJobConfig sub-interfaces. Each sub-interface exposes only the configuration properties that make sense for its database. com.stratio.deep.config.DeepJobConfigFactory's factory methods now return the proper sub-interface.
  • DeepSparkContext has been split into CassandraDeepSparkContext and MongoDeepSparkContext.
  • DeepJobConfigFactory has been renamed to ConfigFactory (to reduce verbosity).

RDD creation

The methods used to create Cell and Entity RDDs have been merged into a single method (illustrated below):

  • DeepSparkContext: createRDD(...)
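A one-line sketch of the merged call; 'deepContext' and 'config' are as in the first-steps sketch above, and whether you get an 'entity' RDD or a Cells RDD is determined by the configuration object you pass in:

// Both 'entity' and 'cell' RDDs are now created with the same call:
JavaRDD<Cells> rdd = deepContext.createRDD(config);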

deep-spark's People

Contributors

aagea, albertoperezsanz, aperez-stratio, darroyo-stratio, dgomezperez, formacionmf, hrodriguez-stratio, josegom, mariomgal, opuertas, pmadrigal, rcrespodelosreyes, robertomorandeira, smola, stratioadmin


deep-spark's Issues

All steps to Aerospike interaction do not work properly

Hi,

As Stratio Manager does not support Aerospike yet, and the Vagrant sandbox doesn't contain Aerospike bundles either, I had to build deep-spark from sources.

bin/stratio-deep-shell does not see anything in com.stratio.*, so I had to manually launch Spark via bin/spark-shell --jars $(echo lib/*.jar | tr ' ' ',').

Then "First Steps with Stratio Deep and Aerospike" does not work, as there are no MessageTestEntity and WordCount in the compiled jars. And I haven't managed to implement those in plain Scala to paste into the REPL (something about a NoSuchMethod exception and <init>).

Then I composed a custom SBT project and put all the jars in the lib dir as unmanaged dependencies. Now it complains:

[error] (run-main-f) java.lang.NoSuchMethodError: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet;
java.lang.NoSuchMethodError: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet;
        at akka.actor.ActorCell$.<init>(ActorCell.scala:336)
        at akka.actor.ActorCell$.<clinit>(ActorCell.scala)
        at akka.actor.RootActorPath.$div(ActorPath.scala:159)
        at akka.actor.LocalActorRefProvider.<init>(ActorRefProvider.scala:464)
        at akka.remote.RemoteActorRefProvider.<init>(RemoteActorRefProvider.scala:124)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  // ...

What is the correct path through the documentation and examples to launch deep-spark with Aerospike?

scala.MatchError: interface java.util.List

Hi,
I am trying to create a SchemaRDD of a Cassandra table which contains a column with List as its datatype.
My bean class looks like this:

import org.apache.cassandra.db.marshal.ListType;
import java.util.List;

@DeepField(fieldName = "follow_user", validationClass = ListType.class)
private List follow_user;

To create the SchemaRDD I did the following:

JavaSchemaRDD schemaSummaryPeople = sqlContext.applySchema(inputSummaryRDD, bean.class);

But it gives me an error:
Exception in thread "main" scala.MatchError: interface java.util.List (of class java.lang.Class)
at org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$getSchema$1.apply(JavaSQLContext.scala:216)
at org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$getSchema$1.apply(JavaSQLContext.scala:215)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at org.apache.spark.sql.api.java.JavaSQLContext.getSchema(JavaSQLContext.scala:215)
at org.apache.spark.sql.api.java.JavaSQLContext.applySchema(JavaSQLContext.scala:100)

I get this same error when I try Set instead of List.

ERROR: parameter class java.util.ArrayList does not have a Cassandra marshaller (v0.4.0)

This error occurs in version 0.4.0 when writing to Cassandra a Cells RDD that contains a collection (List, Map or Set). Here is the trace when trying to write an ArrayList:

com.stratio.deep.exception.DeepGenericException: parameter class java.util.ArrayList does not have a Cassandra marshaller
at com.stratio.deep.rdd.CassandraRDDUtils.marshallerInstance(CassandraRDDUtils.java:167)
at com.stratio.deep.entity.CellValidator.cellValidator(CellValidator.java:229)
at com.stratio.deep.entity.CassandraCell.getValueType(CassandraCell.java:139)
at com.stratio.deep.entity.CassandraCell.<init>(CassandraCell.java:153)
at com.stratio.deep.entity.CassandraCell.create(CassandraCell.java:91)
at com.stratio.deep.entity.CassandraCell.create(CassandraCell.java:66)
at com.stratio.quantum.QuantumALS$1.call(QuantumALS.java:131)
at com.stratio.quantum.QuantumALS$1.call(QuantumALS.java:122)
at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:923)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at com.stratio.deep.rdd.CassandraRDDUtils$1.apply(CassandraRDDUtils.java:132)
at com.stratio.deep.rdd.CassandraRDDUtils$1.apply(CassandraRDDUtils.java:125)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

The code snippet:

    JavaRDD<Cells> outputRDD = features.map(new Function<Tuple2<Object, double[]>, Cells>() {
        @Override
        public Cells call(Tuple2<Object, double[]> t) throws Exception {
            List<Double> dl = new ArrayList<Double>();
            for (double d : t._2()) {
                dl.add(d);
            }
            Cell c1 = CassandraCell.create("id", t._1(), true, false);
            Cell c2 = CassandraCell.create("features", dl); <== ERROR 
            return new Cells(c1, c2);
        }
    });
    CassandraRDD.saveRDDToCassandra(outputRDD, featuresConfig);

json schema not retrieved

I used the Stratio MongoDB connector to retrieve the content of a MongoDB collection (of JSON documents).
The results only had values and not the keys, thus rendering it unqueryable.

jdbc sql server does not work

Two issues that I found connecting to SQL Server are:

The connection string created with the fluent builder does not work; you have to set it like this:

val conn_str = "jdbc:sqlserver://xxxxxxxx.database.windows.net:1433;database=master;user=insights@oxub00w1l9;password=xxxxxxxxxx;encrypt=true;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"

val tableConfig: JdbcDeepJobConfig[Cells] = JdbcConfigFactory
  .createJdbc
  .database("master") 
  .table("information_schema.tables")
  .driverClass("com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .connectionUrl(conn_str)
  .initialize

This was hard to figure out how to use: if you do not set the connection URL, the builder creates a connection string that does not work.

The second thing I found that does not work right is that the query the code above creates is not valid.

tableConfig.getQuery() gets you a query of:
SELECT information_schema.tables.* FROM insights-playland-db.information_schema.tables information_schema.tables

If you execute this query, even without Spark, you get this error:

val conn = DriverManager.getConnection(tableConfig.getConnectionUrl, tableConfig.getUsername, tableConfig.getPassword)
val statement: Statement = conn.createStatement
val query: SelectQuery = tableConfig.getQuery

val resultSet = statement.executeQuery(query.toString)

Exception in thread "main" com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near '.'.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:216)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1515)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:792)
at com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:689)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:5696)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1715)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:180)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:155)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeQuery(SQLServerStatement.java:616)
at TestSQLStatements$.main(TestSQLStatements.scala:40)
at TestSQLStatements.main(TestSQLStatements.scala)

I also found that if the database name has a "-" in it, then you get this error:

Exception in thread "main" com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near '-'.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:216)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1515)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:792)
at com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:689)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:5696)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1715)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:180)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:155)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeQuery(SQLServerStatement.java:616)
at TestSQLStatements$.main(TestSQLStatements.scala:40)
at TestSQLStatements.main(TestSQLStatements.scala)

I tried on a database other than master with no "-" in its name, and on a table with no "." in its name, and it created a query like this:

 SELECT errorlevelcounts.* FROM playland2.errorlevelcounts errorlevelcounts

this query gives an error of:
com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name 'playland2.errorlevelcounts'.

If you remove the database name playland2:

 SELECT errorlevelcounts.* FROM errorlevelcounts errorlevelcounts

this query is valid in SQL Server.

Deep integration with Spark version 1.4

Hi,

I was trying to integrate stratio-deep with Spark 1.4.1. Stratio Deep compiles properly against Spark 1.4.1, but while creating the distribution I am getting the error below.

[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Spark Project Networking 1.3.1
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ spark-network-common_2.10 ---
[INFO]
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-versions) @ spark-network-common_2.10 ---
[INFO]
[INFO] --- scala-maven-plugin:3.2.0:add-source (eclipse-add-source) @ spark-network-common_2.10 ---
[INFO] Add Source directory: /tmp/stratio-deep-distribution/stratiospark/network/common/src/main/scala
[INFO] Add Test Source directory: /tmp/stratio-deep-distribution/stratiospark/network/common/src/test/scala
[INFO]
[INFO] --- build-helper-maven-plugin:1.8:add-source (add-scala-sources) @ spark-network-common_2.10 ---
[INFO] Source directory: /tmp/stratio-deep-distribution/stratiospark/network/common/src/main/scala added.
[INFO]
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ spark-network-common_2.10 ---
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ spark-network-common_2.10 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /tmp/stratio-deep-distribution/stratiospark/network/common/src/main/resources
[INFO] Copying 3 resources
[INFO]
[INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile-first) @ spark-network-common_2.10 ---
[INFO] Using zinc server for incremental compilation
[INFO] compiler plugin: BasicArtifact(org.scalamacros,paradise_2.10.4,2.0.1,null)
[info] Compiling 43 Java sources to /tmp/stratio-deep-distribution/stratiospark/network/common/target/scala-2.10/classes...
[info] Error occurred during initialization of VM
[info] java.lang.Error: Properties init: Could not determine current working directory.
[info] at java.lang.System.initProperties(Native Method)
[info] at java.lang.System.initializeSystemClass(System.java:1119)
[info]
[error] Compile failed at Aug 24, 2015 1:10:20 PM [0.056s]
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 5.302 s]
[INFO] Spark Project Networking ........................... FAILURE [ 0.783 s]
[INFO] Spark Project Shuffle Streaming Service ............ SKIPPED
[INFO] Spark Project Core ................................. SKIPPED
[INFO] Spark Project Bagel ................................ SKIPPED
[INFO] Spark Project GraphX ............................... SKIPPED
[INFO] Spark Project Streaming ............................ SKIPPED
[INFO] Spark Project Catalyst ............................. SKIPPED
[INFO] Spark Project SQL .................................. SKIPPED
[INFO] Spark Project ML Library ........................... SKIPPED
[INFO] Spark Project Tools ................................ SKIPPED
[INFO] Spark Project Hive ................................. SKIPPED
[INFO] Spark Project REPL ................................. SKIPPED
[INFO] Spark Project YARN ................................. SKIPPED
[INFO] Spark Project Assembly ............................. SKIPPED
[INFO] Spark Project External Twitter ..................... SKIPPED
[INFO] Spark Project External Flume Sink .................. SKIPPED
[INFO] Spark Project External Flume ....................... SKIPPED
[INFO] Spark Project External MQTT ........................ SKIPPED
[INFO] Spark Project External ZeroMQ ...................... SKIPPED
[INFO] Spark Project External Kafka ....................... SKIPPED
[INFO] Spark Project Examples ............................. SKIPPED
[INFO] Spark Project YARN Shuffle Service ................. SKIPPED
[INFO] Spark Project External Kafka Assembly .............. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 7.636 s
[INFO] Finished at: 2015-08-24T13:10:20+05:30
[INFO] Final Memory: 47M/318M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile (scala-compile-first) on project spark-network-common_2.10: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile failed. CompileFailed -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :spark-network-common_2.10
Cannot make Spark distribution

Meanwhile, I saw that Spark version 1.5 has been released, so when can we expect this integration?

Thanks & regards.

deep-jdbc build fails

$ mvn clean install
...
Downloaded: http://maven.restlet.org/org/restlet/jse/org.restlet.ext.ssl/2.1.2/org.restlet.ext.ssl-2.1.2.jar (37 KB at 76.4 KB/sec)
Downloaded: http://maven.restlet.org/org/restlet/jse/org.restlet/2.1.2/org.restlet-2.1.2.jar (713 KB at 628.0 KB/sec)
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 04:04 min
[INFO] Finished at: 2015-03-07T23:24:46-05:00
[INFO] Final Memory: 19M/339M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project deep-jdbc: Could not resolve dependencies for project com.stratio.deep:deep-jdbc:jar:0.7.0: The following artifacts could not be resolved: com.stratio.deep:deep-commons:jar:tests:0.7.0, com.stratio.deep:deep-core:jar:tests:0.7.0: Could not find artifact com.stratio.deep:deep-commons:jar:tests:0.7.0 in Neo4j releases (http://m2.neo4j.org/content/repositories/releases) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.

NullPointerException with a blob key [stratio-deep version 0.2.9]

Hello,

I'm using stratio-deep version 0.2.9.

I'm doing some tests to check how Stratio works; one of them has the column key defined as blob type. When I run the test, the following NullPointerException is thrown:

 Exception in thread "main" java.lang.NullPointerException
    at com.stratio.deep.entity.CellValidator.<init>(CellValidator.java:292)
    at com.stratio.deep.entity.CellValidator.cellValidator(CellValidator.java:212)
    at com.stratio.deep.entity.Cell.getValueType(Cell.java:158)
    at com.stratio.deep.entity.Cell.<init>(Cell.java:192)
    at com.stratio.deep.entity.Cell.create(Cell.java:134)
    at com.stratio.deep.config.GenericDeepJobConfig.initColumnDefinitionMap(GenericDeepJobConfig.java:299)
    at com.stratio.deep.config.GenericDeepJobConfig.columnDefinitions(GenericDeepJobConfig.java:286)
    at com.stratio.deep.config.GenericDeepJobConfig.initialize(GenericDeepJobConfig.java:449)
    at com.stratio.examples.JavaExample.main(JavaExample.java:79)

To resolve the problem I have added, in the class com.stratio.deep.utils.AnnotationUtils, the corresponding entries to the maps MAP_JAVA_TYPE_TO_ABSTRACT_TYPE, MAP_ABSTRACT_TYPE_CLASSNAME_TO_JAVA_TYPE and MAP_ABSTRACT_TYPE_CLASS_TO_ABSTRACT_TYPE.

    public static final Map<Class, AbstractType<?>> MAP_JAVA_TYPE_TO_ABSTRACT_TYPE =
            ImmutableMap.<Class, AbstractType<?>>builder()
                    .put(String.class, UTF8Type.instance)
                    .put(Integer.class, Int32Type.instance)
                    .put(Boolean.class, BooleanType.instance)
                    .put(Date.class, TimestampType.instance)
                    .put(BigDecimal.class, DecimalType.instance)
                    .put(Long.class, LongType.instance)
                    .put(Double.class, DoubleType.instance)
                    .put(Float.class, FloatType.instance)
                    .put(InetAddress.class, InetAddressType.instance)
                    .put(Inet4Address.class, InetAddressType.instance)
                    .put(Inet6Address.class, InetAddressType.instance)
                    .put(BigInteger.class, IntegerType.instance)
                    .put(UUID.class, UUIDType.instance)
// Begin Resolve NullPointerException
                    .put(ByteBuffer.class, BytesType.instance)
// End Resolve NullPointerException
                    .build();

    /**
     * Static map of associations between a cassandra marshaller fully qualified class name and the corresponding
     * Java class.
     */
    public static final Map<String, Class> MAP_ABSTRACT_TYPE_CLASSNAME_TO_JAVA_TYPE =
            ImmutableMap.<String, Class>builder()
                    .put(UTF8Type.class.getCanonicalName(), String.class)
                    .put(Int32Type.class.getCanonicalName(), Integer.class)
                    .put(BooleanType.class.getCanonicalName(), Boolean.class)
                    .put(TimestampType.class.getCanonicalName(), Date.class)
                    .put(DateType.class.getCanonicalName(), Date.class)
                    .put(DecimalType.class.getCanonicalName(), BigDecimal.class)
                    .put(LongType.class.getCanonicalName(), Long.class)
                    .put(DoubleType.class.getCanonicalName(), Double.class)
                    .put(FloatType.class.getCanonicalName(), Float.class)
                    .put(InetAddressType.class.getCanonicalName(), InetAddress.class)
                    .put(IntegerType.class.getCanonicalName(), BigInteger.class)
                    .put(UUIDType.class.getCanonicalName(), UUID.class)
                    .put(TimeUUIDType.class.getCanonicalName(), UUID.class)
                    .put(SetType.class.getCanonicalName(), Set.class)
                    .put(ListType.class.getCanonicalName(), List.class)
                    .put(MapType.class.getCanonicalName(), Map.class)
// Begin Resolve NullPointerException
                    .put(BytesType.class.getCanonicalName(), ByteBuffer.class)
// End Resolve NullPointerException
                    .build();

    /**
     * Static map of associations between cassandra marshaller Class objects and their instance.
     */
    public static final Map<Class<?>, AbstractType<?>> MAP_ABSTRACT_TYPE_CLASS_TO_ABSTRACT_TYPE =
            ImmutableMap.<Class<?>, AbstractType<?>>builder()
                    .put(UTF8Type.class, UTF8Type.instance)
                    .put(Int32Type.class, Int32Type.instance)
                    .put(BooleanType.class, BooleanType.instance)
                    .put(TimestampType.class, TimestampType.instance)
                    .put(DateType.class, DateType.instance)
                    .put(DecimalType.class, DecimalType.instance)
                    .put(LongType.class, LongType.instance)
                    .put(DoubleType.class, DoubleType.instance)
                    .put(FloatType.class, FloatType.instance)
                    .put(InetAddressType.class, InetAddressType.instance)
                    .put(IntegerType.class, IntegerType.instance)
                    .put(UUIDType.class, UUIDType.instance)
                    .put(TimeUUIDType.class, TimeUUIDType.instance)
// Begin Resolve NullPointerException
                    .put(BytesType.class, BytesType.instance)
// End Resolve NullPointerException
                    .build();

With these changes, it seems to work fine.

Regards.

Problems with uppercase in keyspace and table name

Hello,

We have a keyspace named Keyspace1 and a table named Standard1, and I get exceptions when running some tests over Stratio.

The following code (from the class com.stratio.examples.JavaExample) is the way we create an IDeepJobConfig:

// Configuration and initialization
IDeepJobConfig config = DeepJobConfigFactory.create()
        .host(cassandraHost).rpcPort(cassandraPort)
        .keyspace(keyspaceName).table(tableName)
        .username(userName).password(password)
        .inputColumns("key")
        .initialize();

I have tried to initialize the keyspaceName and tableName variables in two ways. The first one is this:

String keyspaceName = "Keyspace1";
String tableName = "Standard1";

When it is run, the exception thrown is:

Exception in thread "main" com.datastax.driver.core.exceptions.InvalidQueryException: Keyspace 'keyspace1' does not exist
        at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:35)
        at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:256)
        at com.datastax.driver.core.SessionManager.setKeyspace(SessionManager.java:336)
        at com.datastax.driver.core.Cluster.connect(Cluster.java:228)
        at com.stratio.deep.config.GenericDeepJobConfig.getSession(GenericDeepJobConfig.java:151)
        at com.stratio.deep.config.GenericDeepJobConfig.fetchTableMetadata(GenericDeepJobConfig.java:194)
        at com.stratio.deep.config.GenericDeepJobConfig.validate(GenericDeepJobConfig.java:547)
        at com.stratio.deep.config.GenericDeepJobConfig.initialize(GenericDeepJobConfig.java:447)
        at com.stratio.examples.JavaExample.main(JavaExample.java:91)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Keyspace 'keyspace1' does not exist
        at com.datastax.driver.core.Responses$Error.asException(Responses.java:97)
        at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:108)
        at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:235)
        at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:367)
        at com.datastax.driver.core.Connection$Dispatcher.messageReceived(Connection.java:571)
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
        at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

And the second way is using double quotes:

String keyspaceName = "\"Keyspace1\"";
String tableName = "\"Standard1\""; 

The exception thrown is:

com.stratio.deep.exception.DeepIOException: com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured columnfamily "Standard1"
    at com.stratio.deep.cql.DeepRecordReader$RowIterator.executeQuery(DeepRecordReader.java:601)
    at com.stratio.deep.cql.DeepRecordReader$RowIterator.<init>(DeepRecordReader.java:191)
    at com.stratio.deep.cql.DeepRecordReader.initialize(DeepRecordReader.java:121)
    at com.stratio.deep.cql.DeepRecordReader.<init>(DeepRecordReader.java:92)
    at com.stratio.deep.rdd.CassandraRDD.initRecordReader(CassandraRDD.java:275)
    at com.stratio.deep.rdd.CassandraRDD.compute(CassandraRDD.java:188)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
    at org.apache.spark.scheduler.Task.run(Task.scala:53)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:211)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured columnfamily "Standard1"
    at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:35)
    at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:256)
    at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:172)
    at com.datastax.driver.core.SessionManager.execute(SessionManager.java:92)
    at com.datastax.driver.core.SessionManager.execute(SessionManager.java:88)
    at com.stratio.deep.cql.DeepRecordReader$RowIterator.executeQuery(DeepRecordReader.java:582)
    ... 20 more
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured columnfamily "Standard1"
    at com.datastax.driver.core.Responses$Error.asException(Responses.java:97)
    at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:108)
    at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:235)
    at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:367)
    at com.datastax.driver.core.Connection$Dispatcher.messageReceived(Connection.java:571)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    ... 3 more

It seems that the problem comes from the DataStax Java Driver, but I don't know if there is a way to use the DataStax Java Driver from the Stratio-Deep code to avoid these errors.

On the other hand, are there any restrictions or problems when using uppercase and lowercase in Cassandra with Stratio?

Thanks,
Ernesto.

Cannot make Spark distribution

Hi,
I am trying to create a Deep distribution by using the command:
cd deep-scripts
./make-distribution-deep.sh

I got this error:

[INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile-first) @ spark-streaming-flume_2.10 ---
[WARNING] Zinc server is not available at port 3030 - reverting to normal incremental compile
[INFO] Using incremental compilation
[INFO] compiler plugin: BasicArtifact(org.scalamacros,paradise_2.10.4,2.0.1,null)
[INFO] Compiling 6 Scala sources and 1 Java source to /tmp/stratio-deep-distribution/stratiospark/external/flume/target/scala-2.10/classes...
[ERROR] /tmp/stratio-deep-distribution/stratiospark/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeBatchFetcher.scala:22: object Throwables is not a member of package com.google.common.base
[ERROR] import com.google.common.base.Throwables
[ERROR] ^
[ERROR] /tmp/stratio-deep-distribution/stratiospark/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeBatchFetcher.scala:59: not found: value Throwables
[ERROR] Throwables.getRootCause(e) match {
[ERROR] ^
[ERROR] /tmp/stratio-deep-distribution/stratiospark/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumePollingInputDStream.scala:26: object util is not a member of package com.google.common
[ERROR] import com.google.common.util.concurrent.ThreadFactoryBuilder
[ERROR] ^
[ERROR] /tmp/stratio-deep-distribution/stratiospark/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumePollingInputDStream.scala:69: not found: type ThreadFactoryBuilder
[ERROR] Executors.newCachedThreadPool(new ThreadFactoryBuilder().setDaemon(true).
[ERROR] ^
[ERROR] /tmp/stratio-deep-distribution/stratiospark/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumePollingInputDStream.scala:76: not found: type ThreadFactoryBuilder
[ERROR] new ThreadFactoryBuilder().setDaemon(true).setNameFormat("Flume Receiver Thread - %d").build())
[ERROR] ^
[ERROR] 5 errors found
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 13.422 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 10.889 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 8.517 s]
[INFO] Spark Project Core ................................. SUCCESS [04:52 min]
[INFO] Spark Project Bagel ................................ SUCCESS [ 30.599 s]
[INFO] Spark Project GraphX ............................... SUCCESS [01:33 min]
[INFO] Spark Project Streaming ............................ SUCCESS [02:08 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [02:18 min]
[INFO] Spark Project SQL .................................. SUCCESS [02:47 min]
[INFO] Spark Project ML Library ........................... SUCCESS [02:51 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 17.748 s]
[INFO] Spark Project Hive ................................. SUCCESS [02:03 min]
[INFO] Spark Project REPL ................................. SUCCESS [ 59.954 s]
[INFO] Spark Project YARN Parent POM ...................... SUCCESS [ 4.491 s]
[INFO] Spark Project YARN Stable API ...................... SUCCESS [01:03 min]
[INFO] Spark Project Assembly ............................. SUCCESS [01:06 min]
[INFO] Spark Project External Twitter ..................... SUCCESS [ 23.976 s]
[INFO] Spark Project External Flume Sink .................. SUCCESS [ 30.382 s]
[INFO] Spark Project External Flume ....................... FAILURE [ 7.310 s]
[INFO] Spark Project External MQTT ........................ SKIPPED
[INFO] Spark Project External ZeroMQ ...................... SKIPPED
[INFO] Spark Project External Kafka ....................... SKIPPED
[INFO] Spark Project Examples ............................. SKIPPED
[INFO] Spark Project YARN Shuffle Service ................. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 24:13 min
[INFO] Finished at: 2015-06-01T14:00:46+05:30
[INFO] Final Memory: 93M/1012M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile (scala-compile-first) on project spark-streaming-flume_2.10: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile failed. CompileFailed -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :spark-streaming-flume_2.10

Please provide me suggestions to solve this issue.

Thanks!

Stratio Sandbox's stratio-deep-shell sc not found (0.91)

Version: Stratio Sandbox 0.91

To reproduce: Followed Sandbox installation instructions. Imported appliance into VirtualBox. Started Sandbox and started services after changing /etc/hosts. Created Cassandra schema in cqlsh and started stratio-deep-shell from /opt/sds/spark/bin.

[root@sandbox bin]# ./stratio-deep-shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/sds/spark/lib/spark-assembly-1.0.0-hadoop1.0.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/sds/spark/lib/spark-examples-1.0.0-hadoop1.0.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Welcome to
[ASCII-art Spark banner, garbled in transcription]
Powered by Spark v1.0.0

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_55)
Type in expressions to have them evaluated.
Type :help for more information.
21:38:49,763 INFO [spark-akka.actor.default-dispatcher-5] Slf4jLogger:80 - Slf4jLogger started
21:38:49,892 INFO [spark-akka.actor.default-dispatcher-4] Remoting:74 - Starting remoting
21:38:50,230 INFO [spark-akka.actor.default-dispatcher-5] Remoting:74 - Remoting started; listening on addresses :[akka.tcp://spark@sandbox:43350]
21:38:50,236 INFO [spark-akka.actor.default-dispatcher-4] Remoting:74 - Remoting now listens on addresses: [akka.tcp://spark@sandbox:43350]
Failed to load native Mesos library from
java.lang.UnsatisfiedLinkError: no mesos in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1886)
at java.lang.Runtime.loadLibrary0(Runtime.java:849)
at java.lang.System.loadLibrary(System.java:1088)
at org.apache.mesos.MesosNativeLibrary.load(MesosNativeLibrary.java:52)
at org.apache.mesos.MesosNativeLibrary.load(MesosNativeLibrary.java:64)
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1542)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:307)
at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:957)
at $iwC$$iwC.<init>(<console>:8)
at $iwC.<init>(<console>:14)
at <init>(<console>:16)
at .<init>(<console>:20)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:121)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:120)
at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:263)
at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:120)
at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:56)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:913)
at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:142)
at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:56)
at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:104)
at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:56)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:930)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Spark context available as sc.
Loading /opt/sds/spark/bin/stratio-deep-init.scala...
import com.stratio.deep.annotations.DeepEntity
import com.stratio.deep.annotations.DeepField
import com.stratio.deep.entity.IDeepType
import org.apache.cassandra.db.marshal.Int32Type
import org.apache.cassandra.db.marshal.LongType
import com.stratio.deep.config.{DeepJobConfigFactory=>Cfg, _}
import com.stratio.deep.entity._

import com.stratio.deep.context._
import com.stratio.deep.rdd._
import com.stratio.deep.rdd.mongodb._
import com.stratio.deep.testentity._
<console>:33: error: not found: value sc
val deepContext = new DeepSparkContext(sc)
^

scala>
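The UnsatisfiedLinkError above is the real problem: the SparkContext constructor dies while loading the native Mesos library, so sc is never actually bound, even though the shell still prints "Spark context available as sc." A hedged workaround sketch, assuming a local master is acceptable for the tutorial and that DeepSparkContext exposes the usual master/app-name constructor:

scala> val deepContext = new DeepSparkContext("local[2]", "stratio-deep-shell")  // assumed constructor; bypasses the Mesos master set by the launch script

Alternatively, installing the Mesos native library and exporting MESOS_NATIVE_LIBRARY (or adding it to java.library.path) before launching the shell should let the configured master work as intended.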

Failed to build deep-jdbc

I was trying to build deep-spark with the command mvn clean install -DskipTests inside the deep-parent directory, and at the end I got an error stating that:

[INFO] deep parent ........................................ SUCCESS [ 22.328 s]
[INFO] deep commons ....................................... SUCCESS [ 33.724 s]
[INFO] deep core .......................................... SUCCESS [ 25.067 s]
[INFO] deep cassandra ..................................... SUCCESS [ 29.947 s]
[INFO] deep mongodb ....................................... SUCCESS [ 22.710 s]
[INFO] deep elasticsearch ................................. SUCCESS [ 23.845 s]
[INFO] deep aerospike ..................................... SUCCESS [ 23.377 s]
[INFO] deep jdbc .......................................... FAILURE [ 0.147 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:01 min
[INFO] Finished at: 2015-05-29T16:22:24+05:30
[INFO] Final Memory: 77M/718M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project deep-jdbc: Could not resolve dependencies for project com.stratio.deep:deep-jdbc:jar:0.8.0-SNAPSHOT: Failure to find com.oracle:ojdbc7:jar:12.1.0.2 in http://m2.neo4j.org/content/repositories/releases was cached in the local repository, resolution will not be reattempted until the update interval of Neo4j releases has elapsed or updates are forced -> [Help 1]

Can you please provide the proper steps to build this project?

Thanks,
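The error message itself names the missing piece: com.oracle:ojdbc7:jar:12.1.0.2 is not published in any public Maven repository, so it must be installed into the local repository by hand before deep-jdbc can resolve it. A sketch, assuming the driver has already been downloaded from Oracle (the jar path below is a placeholder):

rm -rf ~/.m2/repository/com/oracle/ojdbc7                     # drop the cached "resolution will not be reattempted" marker
mvn install:install-file -Dfile=/path/to/ojdbc7.jar -DgroupId=com.oracle -DartifactId=ojdbc7 -Dversion=12.1.0.2 -Dpackaging=jar
cd deep-parent && mvn clean install -DskipTests               # rebuild; deep-jdbc now resolves the driver from the local repository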

Stratio Sandbox's stratio-deep-shell hangs running "first steps" tutorial

Sandbox version 0.91

[root@sandbox bin]# ./stratio-deep-shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/sds/spark/lib/spark-assembly-1.0.0-hadoop1.0.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/sds/spark/lib/spark-examples-1.0.0-hadoop1.0.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Welcome to
[ASCII-art Stratio Deep banner]
Powered by Spark v1.0.0

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_55)
Type in expressions to have them evaluated.
Type :help for more information.
23:38:25,730 INFO [spark-akka.actor.default-dispatcher-5] Slf4jLogger:80 - Slf4jLogger started
23:38:25,825 INFO [spark-akka.actor.default-dispatcher-5] Remoting:74 - Starting remoting
23:38:26,167 INFO [spark-akka.actor.default-dispatcher-5] Remoting:74 - Remoting started; listening on addresses :[akka.tcp://spark@sandbox:33621]
23:38:26,171 INFO [spark-akka.actor.default-dispatcher-3] Remoting:74 - Remoting now listens on addresses: [akka.tcp://spark@sandbox:33621]
Spark context available as sc.
Loading /opt/sds/spark/bin/stratio-deep-init.scala...
import com.stratio.deep.annotations.DeepEntity
import com.stratio.deep.annotations.DeepField
import com.stratio.deep.entity.IDeepType
import org.apache.cassandra.db.marshal.Int32Type
import org.apache.cassandra.db.marshal.LongType
import com.stratio.deep.config.{DeepJobConfigFactory=>Cfg, _}
import com.stratio.deep.entity._

import com.stratio.deep.context._
import com.stratio.deep.rdd._
import com.stratio.deep.rdd.mongodb._
import com.stratio.deep.testentity._
deepContext: com.stratio.deep.context.DeepSparkContext = com.stratio.deep.context.DeepSparkContext@227330c

scala> val config : ICassandraDeepJobConfig[Cells] = Cfg.create().host("localhost").rpcPort(9160).keyspace("crawler").table("Page").initialize
config: com.stratio.deep.config.ICassandraDeepJobConfig[com.stratio.deep.entity.Cells] = com.stratio.deep.config.CellDeepJobConfig@6f16befb

scala> val rdd: CassandraRDD[Cells] = deepContext.cassandraGenericRDD(config)
rdd: com.stratio.deep.rdd.CassandraRDD[com.stratio.deep.entity.Cells] = CassandraCellRDD[0] at RDD at CassandraRDD.java:173

scala> val containsAbcRDD = rdd filter {c :Cells => c.getCellByName("domainName").getCellValue.asInstanceOf[String].contains("abc.es") }
containsAbcRDD: org.apache.spark.rdd.RDD[com.stratio.deep.entity.Cells] = FilteredRDD[1] at filter at <console>:41

scala> containsAbcRDD.count

The count hangs at this point and never returns.
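A couple of hedged checks before treating this as a bug (the names come from the session above; the calls are plain Spark RDD API):

scala> rdd.partitions.length    // 0 would mean no Cassandra token ranges were found
scala> containsAbcRDD.take(1)   // fetch one element instead of scanning the whole table

If take(1) hangs too, the likely culprit is connectivity rather than Deep: the config uses host("localhost").rpcPort(9160), so Cassandra's Thrift port has to be reachable from the Spark workers, not only from cqlsh on the sandbox host.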
