You get an RDD[Row] with newDf.rdd, and saving an RDD[Row] is currently not supported. It would be a good idea to support it directly.
As a workaround, can you try mapping it into an RDD of tuples, which is currently supported, with something like:
val rdd = newDf.map(row => (row(1), row(2), ...))
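The workaround maps each Row into a tuple, but `row(i)` returns `Any`, so each value usually needs converting to a concrete type first. Below is a minimal plain-Scala sketch, assuming the row's values are available as a `Seq[Any]` (Spark's `Row.toSeq` returns one), so it can be tried without a cluster. `RowToTupleSketch`, `cellToString` and `toTuple3` are hypothetical names, and mapping a null cell to an empty string is a choice made for this sketch, not connector behavior.

```scala
object RowToTupleSketch {
  // Null-safe alternative to `row(i).toString`: a null cell becomes ""
  // instead of throwing a NullPointerException (empty-string default is an assumption).
  def cellToString(value: Any): String =
    Option(value).map(_.toString).getOrElse("")

  // Converts the first three values of a row into a (rowKey, col1, col2)
  // tuple of Strings, the shape the connector can write.
  def toTuple3(values: Seq[Any]): (String, String, String) =
    (cellToString(values(0)), cellToString(values(1)), cellToString(values(2)))

  def main(args: Array[String]): Unit = {
    // Hypothetical sample row: a String, an Int and a null cell.
    val values: Seq[Any] = Seq("k1", 42, null)
    println(toTuple3(values))
  }
}
```

Inside a Spark job the same helper could be applied as `newDf.rdd.map(row => RowToTupleSketch.toTuple3(row.toSeq))`.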
from spark-hbase-connector.
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.functions.{col, concat, lit}
import it.nerdammer.spark.hbase._

object SparkHBase {
  def main(args: Array[String]): Unit = {
    // Local Spark context; spark.hbase.host points the connector at the HBase host.
    val sparkConf = new SparkConf()
      .setAppName("HbaseSpark")
      .setMaster("local[*]")
      .set("spark.hbase.host", "localhost")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new SQLContext(sc)

    // Read \u0001-delimited text with the spark-csv package. Without a header,
    // the columns are named C0, C1, ... ("\u0001" replaces the octal escape "\001",
    // which newer Scala versions reject).
    val df = sqlContext
      .read
      .format("com.databricks.spark.csv")
      .option("delimiter", "\u0001")
      .load("/Users/11130/small")

    // Composite row key of the form C3_C5_C0.
    val df1 = df.withColumn("row_key", concat(col("C3"), lit("_"), col("C5"), lit("_"), col("C0")))
    df1.registerTempTable("mytable")

    // row_key first, followed by the 20 data columns.
    val newDf = sqlContext.sql("Select row_key, C0, C1, C2, C3, C4, C5, C6, C7," +
      "C8, C9, C10, C11, C12, C13, C14, C15, C16, C17, C18, C19 from mytable")

    // RDD[Row] cannot be saved directly, so convert each Row into a 21-element
    // tuple of Strings (row(i) is typed Any). Note: .toString throws a
    // NullPointerException if any cell is null.
    val rdd = newDf.rdd
    val finalRdd = rdd.map(row => (row(0).toString, row(1).toString, row(2).toString, row(3).toString, row(4).toString, row(5).toString, row(6).toString,
      row(7).toString, row(8).toString, row(9).toString, row(10).toString, row(11).toString, row(12).toString, row(13).toString,
      row(14).toString, row(15).toString, row(16).toString, row(17).toString, row(18).toString, row(19).toString, row(20).toString))

    // The first tuple element becomes the HBase row key; the remaining 20 values
    // are written, in order, to these columns of column family "mycf".
    finalRdd.toHBaseTable("mytable")
      .toColumns("event_id", "device_id", "uidx", "session_id", "server_ts", "client_ts", "event_type", "data_set_name",
        "screen_name", "card_type", "widget_item_whom", "widget_whom", "widget_v_position", "widget_item0_h_position",
        "publisher_tag", "utm_medium", "utm_source", "utmCampaign", "referrer_url", "notificationClass")
      .inColumnFamily("mycf")
      .save()

    sc.stop()
  }
}
Thanks, this works. I just needed to convert each value from Any to String.
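In the working job, the first tuple element (`row_key`) becomes the HBase row key and the other 20 elements are matched by position to the 20 names passed to `toColumns`, so the tuple has 21 elements, just under the 22-element tuple limit of Scala 2. The sketch below models that layout in plain Scala so the pairing can be checked without a cluster. `HBaseRowLayoutSketch`, `rowKey` and `toCells` are hypothetical names; this illustrates the positional mapping, not the connector's internal implementation.

```scala
object HBaseRowLayoutSketch {
  // Same string shape as concat(col("C3"), lit("_"), col("C5"), lit("_"), col("C0")).
  // Spark's concat returns null when any input is null; this sketch does not model that.
  def rowKey(c3: String, c5: String, c0: String): String = s"${c3}_${c5}_${c0}"

  // Splits a flattened row into (rowKey, column -> value), pairing the values
  // after the row key with column names by position. Fails fast on a count mismatch.
  def toCells(values: Seq[String], columns: Seq[String]): (String, Map[String, String]) = {
    require(values.nonEmpty, "a row needs at least a row key")
    val rest = values.tail
    require(rest.size == columns.size,
      s"expected ${columns.size} column values after the row key, got ${rest.size}")
    (values.head, columns.zip(rest).toMap)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample values, two columns only.
    val columns = Seq("event_id", "device_id")
    val key = rowKey("20170101", "web", "e1")
    println(toCells(Seq(key, "e1", "d9"), columns))
  }
}
```

Checking the column count this way catches an off-by-one between the SELECT list and the `toColumns` names before anything is written.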
Related Issues
- How is the Performance of this Connector?
- Upload package to https://spark-packages.org/
- Can this framework use Java?
- there is not spark-hbase-connector_2.11 Could it support scala 2.11?
- Running in Spark 2.2
- spark streaming with hbase ERROR
- I got this error when running the connector: java.lang.NoClassDefFoundError: org/apache/zookeeper/KeeperException Any idea ?
- ClusterId read in ZooKeeper is null
- Continuously INFO JobScheduler: Added jobs for time
- Whether to support kerberos authentication access?
- Caused by: java.lang.RuntimeException: java.lang.NullPointerException at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:208)
- how do i apply a customized partitioner to hBaseRDD?
- how do i generate a pairRDD?
- IllegalArgumentException: Unexpected number of columns: expected 2 or 1, returned 1
- Pyspark support
- java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
- tuple limit 22 while inserting data into hbase
- Supports hbase-1.2.X?
- Spark2.4.2, Hbase1.4.9 run error, can not find the class : java.lang.ClassNotFoundException: org.apache.hadoop.hbase.regionserver.StoreFileWriter