Comments (3)

christopherbozeman commented on May 26, 2024

The bootstrap action assumes you are using YARN and comes configured for it. You still need to submit the application with the master set to 'yarn' (https://spark.apache.org/docs/latest/submitting-applications.html).

What problem are you running into specifically?

from emr-bootstrap-actions.

ankurmitujjain commented on May 26, 2024

I am trying to run the Spark example JavaKinesisWordCountASLYARN.
I created an EMR cluster and used the spark-install bootstrap action to install Spark 1.3.0c.

I also updated the "EMR_DefaultRole" IAM role and attached the "AdministratorAccess" and "AmazonKinesisFullAccess" policies to it.

[Screenshot: IAM roles]

I followed the instructions for running the example in the Java source file:

/home/hadoop/spark/examples/src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASLYARN.java

Example:
 *      $ export AWS_ACCESS_KEY_ID=<your-access-key>
 *      $ export AWS_SECRET_KEY=<your-secret-key>
 *      $ $SPARK_HOME/bin/run-example \
 *            org.apache.spark.examples.streaming.JavaKinesisWordCountASLYARN mySparkStream \
 *            https://kinesis.us-east-1.amazonaws.com
 *
 * There is a companion helper class called KinesisWordCountProducerASL which puts dummy data
 *   onto the Kinesis stream.
 * Usage instructions for KinesisWordCountProducerASL are provided in the class definition.
 */

I first created the Kinesis stream "mySparkStream".

Then, using PuTTY, I ran the following commands on the EMR master node to set the AWS keys:

[Screenshot: setting the AWS keys]

I then used the companion helper class KinesisWordCountProducerASL to put dummy data onto the stream:

[Screenshot: message producer]

I verified that the records were written to the Kinesis stream/shard:

[Screenshot: record counts in Kinesis]

After writing the data, I ran the following command:

./run-example  org.apache.spark.examples.streaming.JavaKinesisWordCountASLYARN mySparkStream https://kinesis.us-east-1.amazonaws.com

and got the following error:

[hadoop@ip-10-76-215-114 bin]$ ./run-example  org.apache.spark.examples.streaming.JavaKinesisWordCountASLYARN mySparkStream https://kinesis.us-east-1.amazonaws.com
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/03/27 09:46:09 INFO spark.SparkContext: Running Spark version 1.3.0
15/03/27 09:46:09 WARN spark.SparkConf:
SPARK_CLASSPATH was detected (set to '/home/hadoop/spark/conf:/home/hadoop/conf:/home/hadoop/spark/classpath/emr/*:/home/hadoop/spark/classpath/emrfs/*:/home/hadoop/share/hadoop/common/lib/*:/home/hadoop/share/hadoop/common/lib/hadoop-lzo.jar').
This is deprecated in Spark 1.0+.

Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath

15/03/27 09:46:09 WARN spark.SparkConf: Setting 'spark.executor.extraClassPath' to '/home/hadoop/spark/conf:/home/hadoop/conf:/home/hadoop/spark/classpath/emr/*:/home/hadoop/spark/classpath/emrfs/*:/home/hadoop/share/hadoop/common/lib/*:/home/hadoop/share/hadoop/common/lib/hadoop-lzo.jar' as a work-around.
15/03/27 09:46:09 WARN spark.SparkConf: Setting 'spark.driver.extraClassPath' to '/home/hadoop/spark/conf:/home/hadoop/conf:/home/hadoop/spark/classpath/emr/*:/home/hadoop/spark/classpath/emrfs/*:/home/hadoop/share/hadoop/common/lib/*:/home/hadoop/share/hadoop/common/lib/hadoop-lzo.jar' as a work-around.
15/03/27 09:46:10 INFO spark.SecurityManager: Changing view acls to: hadoop
15/03/27 09:46:10 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/03/27 09:46:10 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/03/27 09:46:11 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/03/27 09:46:11 INFO Remoting: Starting remoting
15/03/27 09:46:11 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:53585]
15/03/27 09:46:11 INFO util.Utils: Successfully started service 'sparkDriver' on port 53585.
15/03/27 09:46:11 INFO spark.SparkEnv: Registering MapOutputTracker
15/03/27 09:46:11 INFO spark.SparkEnv: Registering BlockManagerMaster
15/03/27 09:46:11 INFO storage.DiskBlockManager: Created local directory at /mnt/spark/spark-deaa2da9-53bc-4ceb-9659-25ce2ac1904e/blockmgr-1ce48168-1548-4dc3-86e8-ff2ff4f3bad5
15/03/27 09:46:12 INFO storage.MemoryStore: MemoryStore started with capacity 265.4 MB
15/03/27 09:46:12 INFO spark.HttpFileServer: HTTP File server directory is /mnt/spark/spark-0a9ca43d-0d92-4959-9d0a-1bf34537c6fc/httpd-89729846-8a4f-4b20-b7bb-086590343ee4
15/03/27 09:46:12 INFO spark.HttpServer: Starting HTTP Server
15/03/27 09:46:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/03/27 09:46:12 INFO server.AbstractConnector: Started [email protected]:52041
15/03/27 09:46:12 INFO util.Utils: Successfully started service 'HTTP file server' on port 52041.
15/03/27 09:46:12 INFO spark.SparkEnv: Registering OutputCommitCoordinator
15/03/27 09:46:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/03/27 09:46:12 INFO server.AbstractConnector: Started [email protected]:4040
15/03/27 09:46:12 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
15/03/27 09:46:12 INFO ui.SparkUI: Started SparkUI at http://ip-10-76-215-114.ec2.internal:4040
15/03/27 09:46:13 INFO spark.SparkContext: Added JAR file:/home/hadoop/spark/lib/spark-examples-1.3.0-hadoop2.4.0.jar at http://10.76.215.114:52041/jars/spark-examples-1.3.0-hadoop2.4.0.jar with timestamp 1427449573470
15/03/27 09:46:13 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
15/03/27 09:46:13 ERROR cluster.YarnClusterSchedulerBackend: Application ID is not set.
15/03/27 09:46:14 INFO netty.NettyBlockTransferService: Server created on 57600
15/03/27 09:46:14 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/03/27 09:46:14 INFO storage.BlockManagerMasterActor: Registering block manager ip-10-76-215-114.ec2.internal:57600 with 265.4 MB RAM, BlockManagerId(<driver>, ip-10-76-215-114.ec2.internal, 57600)
15/03/27 09:46:14 INFO storage.BlockManagerMaster: Registered BlockManager
Exception in thread "main" java.lang.NullPointerException
        at org.apache.spark.deploy.yarn.ApplicationMaster$.sparkContextInitialized(ApplicationMaster.scala:581)
        at org.apache.spark.scheduler.cluster.YarnClusterScheduler.postStartHook(YarnClusterScheduler.scala:32)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
        at org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:642)
        at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:75)
        at org.apache.spark.streaming.api.java.JavaStreamingContext.<init>(JavaStreamingContext.scala:132)
        at org.apache.spark.examples.streaming.JavaKinesisWordCountASLYARN.main(JavaKinesisWordCountASLYARN.java:127)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[hadoop@ip-10-76-215-114 bin]$


christopherbozeman commented on May 26, 2024

This is a duplicate of issue 80. JavaKinesisWordCountASLYARN was added back when the examples hard-coded the master to local. Given apache/spark@d16e161, this example is no longer valid as written; the better approach is to follow the stock example, which does not set the master in code.
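Following that advice, here is a minimal sketch, not an official recipe, of submitting the stock example to YARN from the master node. It assumes: SPARK_HOME is /home/hadoop/spark (the path in the logs above); the stock class org.apache.spark.examples.streaming.JavaKinesisWordCountASL ships in the same examples jar the driver log reports and takes the same two arguments (stream name, endpoint URL) as the YARN variant; and yarn-client mode is used so the driver runs on the node you are logged in to. The function name stock_kinesis_submit_cmd is hypothetical; it only prints the command, one argument per line, so it can be checked before it is run.

```shell
#!/usr/bin/env bash
# Sketch only: print a spark-submit command for the stock Kinesis word-count
# example on YARN (client mode), one argument per line.
# Assumption: SPARK_HOME defaults to the EMR bootstrap location seen in the logs.
SPARK_HOME="${SPARK_HOME:-/home/hadoop/spark}"
# Assumption: the stock example class is in the examples jar named in the driver log.
EXAMPLES_JAR="$SPARK_HOME/lib/spark-examples-1.3.0-hadoop2.4.0.jar"

# Hypothetical helper. Usage: stock_kinesis_submit_cmd <stream-name> <endpoint-url>
stock_kinesis_submit_cmd() {
  if [ "$#" -ne 2 ]; then
    echo "usage: stock_kinesis_submit_cmd <stream-name> <endpoint-url>" >&2
    return 1
  fi
  # The stock example does not hard-code a master, so this --master flag applies.
  printf '%s\n' \
    "$SPARK_HOME/bin/spark-submit" \
    --master yarn-client \
    --class org.apache.spark.examples.streaming.JavaKinesisWordCountASL \
    "$EXAMPLES_JAR" \
    "$1" "$2"
}
```

The AWS_ACCESS_KEY_ID and AWS_SECRET_KEY exports from the example's header comment still apply. To run the printed command in bash 4 or later: `mapfile -t cmd < <(stock_kinesis_submit_cmd mySparkStream https://kinesis.us-east-1.amazonaws.com) && "${cmd[@]}"`. Alternatively, Spark 1.x's bin/run-example passes the MASTER environment variable (default local[*]) to spark-submit as --master, so prefixing a run-example command for the stock class with MASTER=yarn-client should have the same effect.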
