
spark-in-action / first-edition


The book's repo

Home Page: https://www.manning.com/books/spark-in-action

Shell 0.30% Scala 36.98% Python 27.29% Java 31.67% HTML 0.92% JavaScript 2.84%

first-edition's Introduction

Spark in Action book repository

Current edition: Manning Early Access Program (MEAP)

The MEAP publishing date is 2015.04.04.
Manning's book forum: https://forums.manning.com/forums/spark-in-action

The repo contains book listings organized by chapter and programming language (if applicable):

ch02
  ├── scala
  │    ├── ch02-listings.scala
  │    ├── scala-file.scala
  │    └── ...
  ├── java
  │    ├── Class1.java
  │    ├── Class2.java
  │    └── ...
  ├── python
  │    ├── ch02-listings.py
  │    ├── python-file.py
  │    └── ...
ch03
  ├── scala
  │    ├── ch03-listings.scala
  │    ├── scala-file.scala
  │    └── ...
  ├── java
  │    ├── Class1.java
  │    ├── Class2.java
  │    └── ...
  ├── python
  │    ├── ch03-listings.py
  │    ├── python-file.py
  │    └── ...

We tried to organize the listings so that you have minimal distractions while going through the book.
We hope you'll find the content useful, and that you'll have fun reading the book and working through the examples.

As part of Manning's "in action" series, it's a hands-on, tutorial-style book.

Thank you,
Petar Zečević and Marko Bonaći

first-edition's People

Contributors

mbonaci, zecevicp


first-edition's Issues

Why are the files in ch06output/output-*.txt empty?

import org.apache.spark._
import org.apache.spark.streaming._

// 5-second mini-batch interval
val ssc = new StreamingContext(sc, Seconds(5))

val filestream = ssc.textFileStream("/home/spark/ch06input")

import java.sql.Timestamp
case class Order(time: java.sql.Timestamp, orderId: Long, clientId: Long, symbol: String, amount: Int, price: Double, buy: Boolean)

import java.text.SimpleDateFormat
val orders = filestream.flatMap(line => {
  val dateFormat = new SimpleDateFormat("yyyy-MM-dd hh:mm:ss")
  val s = line.split(",")
  try {
    assert(s(6) == "B" || s(6) == "S")
    List(Order(new Timestamp(dateFormat.parse(s(0)).getTime()), s(1).toLong, s(2).toLong, s(3), s(4).toInt, s(5).toDouble, s(6) == "B"))
  } catch {
    case e: Throwable =>
      println("Wrong line format (" + e + "): " + line)
      List()
  }
})

// Count buy and sell orders within each mini-batch
val numPerType = orders.map(o => (o.buy, 1L)).reduceByKey((c1, c2) => c1 + c2)

numPerType.repartition(1).saveAsTextFiles("/home/spark/ch06output/output", "txt")

ssc.start()

I've been following chapter 6 step by step: 1. run spark-shell --master local[*]; 2. create the StreamingContext; 3. call ssc.start(); 4. execute the shell script; 5. wait for results. I'm running in the spark-shell (inside the VM, of course), but I couldn't get any results from this case study: all files in the ch06output/ directory are empty. I don't know why. Can anyone help me?
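
A common cause (an assumption on my part, not confirmed in this thread) is that `textFileStream` only detects files that *appear* in the monitored directory after the stream starts, and expects them to be moved in atomically; a file written slowly in place can be missed entirely, leaving the per-batch counts empty. A minimal sketch of feeding the stream safely, using temporary directories in place of the book's `/home/spark/ch06input` path:

```python
import os
import tempfile

# textFileStream only picks up files that appear in the monitored directory
# after the stream starts; writing slowly in place can be missed. Write to a
# staging path first, then rename into the directory in one atomic step.
staging = tempfile.mkdtemp()        # staging area outside the input dir
input_dir = tempfile.mkdtemp()      # stands in for /home/spark/ch06input

tmp_path = os.path.join(staging, "orders.txt")
with open(tmp_path, "w") as f:      # the slow write happens here, unseen
    f.write("2016-03-22 20:25:28,1,80,EPE,710,51.00,B\n")

final_path = os.path.join(input_dir, "orders.txt")
os.rename(tmp_path, final_path)     # atomic rename: the stream sees a new file

print(sorted(os.listdir(input_dir)))  # → ['orders.txt']
```

The book's splitter script follows the same pattern (copy, then `mv` into the input directory), which is why running it only *after* `ssc.start()` matters.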

installation of Vagrant

Section 1.5.1 (Downloading and starting the VM) talks about installing Oracle VirtualBox and Vagrant.
I already have VirtualBox running. Do I need to install Vagrant on my Mac, or inside VirtualBox?

Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class

While doing spark-submit for the real-time dashboard application, I get the following error:
spark@spark-in-action:~/uc1-docker$ ./run-all.sh
Zookeeper already started
Kafka already started
sia-dashboard already running
Submitting Spark job
Starting Kafka direct stream to broker list: 192.168.10.2:9092
Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
at kafka.utils.Pool.<init>(Pool.scala:28)
at kafka.consumer.FetchRequestAndResponseStatsRegistry$.<init>(FetchRequestAndResponseStats.scala:60)
at kafka.consumer.FetchRequestAndResponseStatsRegistry$.<clinit>(FetchRequestAndResponseStats.scala)
at kafka.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:39)
at org.apache.spark.streaming.kafka.KafkaCluster.connect(KafkaCluster.scala:52)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers$1.apply(KafkaCluster.scala:345)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers$1.apply(KafkaCluster.scala:342)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at org.apache.spark.streaming.kafka.KafkaCluster.org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers(KafkaCluster.scala:342)
at org.apache.spark.streaming.kafka.KafkaCluster.getPartitionMetadata(KafkaCluster.scala:125)
at org.apache.spark.streaming.kafka.KafkaCluster.getPartitions(KafkaCluster.scala:112)
at org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(KafkaUtils.scala:211)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:484)
at org.sia.loganalyzer.StreamingLogAnalyzer$.main(StreamingLogAnalyzer.scala:76)
at org.sia.loganalyzer.StreamingLogAnalyzer.main(StreamingLogAnalyzer.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: scala.collection.GenTraversableOnce$class
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 25 more
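
A likely cause (my diagnosis, not an authors' reply): implementation classes like `scala.collection.GenTraversableOnce$class` exist only in Scala 2.11 and earlier; Scala 2.12 changed the trait encoding and removed them. So a Kafka jar built for Scala 2.11 loaded on a 2.12+ runtime fails exactly this way. Scala artifacts encode their binary version in the jar name, which a small check can exploit (the jar and version below are hypothetical):

```python
# Classes like GenTraversableOnce$class exist only in Scala <= 2.11; a jar
# built for 2.11 loaded on a 2.12+ runtime fails with NoClassDefFoundError.
# Scala jars encode the binary version in the artifact name, so comparing
# suffixes exposes such a mismatch.
def scala_binary_version(jar_name: str) -> str:
    """Extract the Scala version from a name like kafka_2.11-0.8.2.1.jar."""
    stem = jar_name.split("_")[-1]   # "2.11-0.8.2.1.jar"
    return stem.split("-")[0]        # "2.11"

spark_scala = "2.12"                 # hypothetical Scala version of the Spark build
jar = "kafka_2.11-0.8.2.1.jar"       # hypothetical jar pulled into the assembly

if scala_binary_version(jar) != spark_scala:
    print("mismatch:", jar, "was built for Scala", scala_binary_version(jar))
```

If the versions disagree, rebuilding the application against the Spark cluster's Scala version (or running a matching Spark build) should resolve the error.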

Discretized streams - number of buy or sell orders per second?

The task is to count the number of buy and sell orders per second.

The code example does not take the timestamp into account at all. How is it possible to know that what is reduced actually falls within one second? What I understood was that the mini-batch fires every 3 seconds.

Honestly, I am quite confused. When we say "the number of sell and buy orders per second", what do we mean exactly? Orders whose timestamps fall within the same one-second interval, or whatever arrives each second, independently of the timestamp?

Can this at least be clarified?
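
For what it's worth, one reading (my interpretation, not an authors' answer): `reduceByKey` in the listing counts orders per *mini-batch interval*, ignoring the embedded timestamp; counting per timestamped second would instead mean grouping on the parsed time field. A small pure-Python sketch of the difference, using hypothetical order lines in the book's CSV format:

```python
from collections import Counter

# Hypothetical lines: "timestamp,orderId,clientId,symbol,amount,price,B/S"
lines = [
    "2016-03-22 20:25:28,1,80,EPE,710,51.00,B",
    "2016-03-22 20:25:28,2,70,NFLX,158,8.00,B",
    "2016-03-22 20:25:29,3,53,VALE,284,5.00,S",
]

# Per-batch count (what reduceByKey over one mini-batch produces): every
# line that arrived in this batch is counted, regardless of its timestamp.
per_batch = Counter(line.split(",")[6] for line in lines)

# Per-timestamped-second count: group on the time field inside the data.
per_second = Counter((line.split(",")[0], line.split(",")[6]) for line in lines)

print(per_batch)    # counts across the whole batch: {'B': 2, 'S': 1}
print(per_second)   # counts keyed by (second, buy/sell)
```

The two agree only if each mini-batch happens to contain exactly one second's worth of timestamped data, which arrival timing does not guarantee.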
