Comments (40)
+1. Spark 1.4.1 is now released... I'd really appreciate it if you could quickly include this one...
Thank you
from emr-bootstrap-actions.
Waiting for this release on AWS EMR... it has major bug fixes.
Thanks
Waiting as well for 1.4.1 due to the several bug fixes. Thanks
+1
It's coming...
Great............
Is it there?
@christopherbozeman Can you please tell me how much time it'll take?
Thanks
Any update on this issue? I'd really appreciate being able to use the latest bug-fixed version. Thanks
Spark 1.4.1 is now available as a native application with EMR's new release; see https://forums.aws.amazon.com/ann.jspa?annID=3160.
Hi Chris,
How do I enable dynamic allocation? It requires copying the shuffle jar and making the following changes in yarn-site.xml (ref: http://www.slideshare.net/ozax86/spark-on-yarn-with-dynamic-resource-allocation):
yarn.nodemanager.aux-services = spark_shuffle,mapreduce_shuffle
yarn.nodemanager.aux-services.spark_shuffle.class = org.apache.spark.network.yarn.YarnShuffleService
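For reference, here is how those two settings would look inside yarn-site.xml — a sketch built only from the property names and values quoted above:

```xml
<!-- yarn-site.xml additions for the Spark external shuffle service
     (sketch based on the two properties listed above) -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>spark_shuffle,mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
```

The NodeManagers also need spark-yarn-shuffle (the shuffle jar) on their classpath for the `spark_shuffle` service class to load.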
@PKUKILLA @christopherbozeman answered that in #32 with the links:
Please see https://forums.aws.amazon.com/ann.jspa?annID=3160 and
http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-spark-configure.html#spark-dynamic-allocation
@christopherbozeman This page needs updating for the dynamic feature because it calls out a fixed instance-count, true?
http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-spark-launch.html
Create the cluster with the following command:
aws emr create-cluster --name "Spark cluster" --release-label emr-4.0.0 --applications Name=Spark --ec2-attributes KeyName=myKey --instance-type m3.xlarge --instance-count 3 --use-default-roles
Thanks @christopherbozeman. Do you think you are also going to add 1.4.1 support to the
"old" bootstrap action in this GH project? We rely heavily on it, and we are not yet ready to move to Hadoop 2.6 and Hive 1.0. It would be great to have both the "automated" way and the "manual" way to install Spark, so that we can test all the pieces step by step before moving a production system to a new set of upgraded frameworks. Also, can you please give the community any hints about how long this project (the emr-bootstrap-action to install Spark) will still be maintained (and offered)? Appreciated. Thanks as always.
Thanks, it works.
@christopherbozeman Does the EMR 4.0.0 version of Spark contain the patch from christopherbozeman/spark@316b2e0? It doesn't look like it does, as when I insert into a Hive table using Spark SQL, it creates temporary files in S3 and then appears to get stuck when trying to move them to their right place.
Considering also the issues reported for Hive (#154) and Ganglia (#153), is there any possibility of getting Spark 1.4.1 available as a bootstrap action (a.k.a. "the usual way"), so that in the meantime we can get it working on the 3.8.0 AMI (and Hadoop 2.4)? The upgrade of our system is stuck because of this: 1.4.0 has known blocking bugs, so there is no chance to move forward from Spark 1.3.1 until you kindly upgrade the emr-bootstrap-actions support as well. I think many people would really appreciate it. Thanks.
Stuck on the upgrade to Spark 1.4.0 using AMI 3.8.0 due to https://issues.apache.org/jira/browse/SPARK-8368. So we can't move forward even to 1.4.0 unless we switch to AMI 4.0.0. PLEASE upgrade the emr-bootstrap-action to support Spark 1.4.1 on AMI 3.8.0; this is really a big issue for many people!
Same here, would appreciate getting 1.4.1 integrated here while EMR 4.0 matures.
Furthermore, we actually CAN'T switch to AMI 4.0.0 since we are leveraging DataPipeline that, obviously, doesn't currently support such AMI version: read https://forums.aws.amazon.com/thread.jspa?messageID=662004 and https://forums.aws.amazon.com/thread.jspa?messageID=658891 for references.
+1, I think EMR 4.0.0 is not mature enough to replace all the applications available on AMI 3.8.0.
Spark 1.4.1 is now available for the Spark bootstrap action for EMR AMI 3.x and can be requested with version "1.4.1.a".
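For anyone else wanting to try it, a cluster-launch sketch — the bootstrap-action S3 path and the -v flag are assumptions based on this repo's usual install-spark usage, so double-check them against the README:

```shell
# Sketch: request the 1.4.1.a build via the Spark bootstrap action on a 3.x AMI.
# The BA path and the -v version flag are assumptions; verify against the README.
aws emr create-cluster \
  --name "Spark 1.4.1.a cluster" \
  --ami-version 3.8.0 \
  --instance-type m3.xlarge --instance-count 3 \
  --ec2-attributes KeyName=myKey \
  --bootstrap-actions Path=s3://support.elasticmapreduce/spark/install-spark,Args=[-v,1.4.1.a]
```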
@christopherbozeman Can you answer my previous question about the EMR 4.0 version of Spark containing christopherbozeman/spark@316b2e0?
Hi christopherbozeman,
Thank you for the update. Could you please provide some feedback on the configuration that I am trying to use? My configuration:
-ami-version : 3.3 (which defaults to 3.3.2)
-spark: 1.4.1.a
etc.
The question is, should I use ami-version 3.3 or the latest ami-version? I want to use the emr-4.0.0 release label as it provides Spark 1.4.1, but it provides Hadoop 2.6.0 whereas I want to use 2.4.0.
@Sazpaimon I dug into your comment on #142 (comment) and determined that christopherbozeman/spark@316b2e0 is a NOOP (the underlying RDD interaction with the Hadoop output format takes care of the S3 direct write). What performs the magic of not creating extra temporary paths when writing to S3 is the code that EMR added to Hive, which gets included by the Spark BA installed when the -h option is supplied. This is what is missing from EMR release 4.0.0. Also, since Spark 1.4 only supports up to Hive 0.13 (https://issues.apache.org/jira/browse/SPARK-8065), the native Spark in EMR release 4.0.0 cannot just use the Hive 1.0 jars to fix the issue. I'll report this issue internally to the development team so it is resolved in a future EMR release. At this time, the ugly workaround would be to take the Hive jars from an EMR AMI 3.x cluster with Hive 0.13 that are pruned for Spark (~spark/classpath/hive/*), copy them to the master of an EMR 4.0.0 cluster, and then append the jars to the Spark classpath.
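A rough sketch of that workaround. Everything here beyond the AMI 3.x pruned-Hive directory named in the comment is an assumption (the staging bucket, target directory, and the choice of spark-defaults.conf for the classpath append), so verify the paths on your own clusters:

```shell
# Sketch of the workaround: move the Spark-pruned Hive 0.13 jars from an
# AMI 3.x cluster to an EMR 4.0.0 master, then add them to Spark's classpath.
# Bucket and directory names below are placeholders.

# 1. On the AMI 3.x master, stage the pruned Hive jars in S3:
aws s3 cp --recursive /home/hadoop/spark/classpath/hive/ s3://mybucket/emr-hive013-jars/

# 2. On the EMR 4.0.0 master, pull them down:
mkdir -p /home/hadoop/hive013-jars
aws s3 cp --recursive s3://mybucket/emr-hive013-jars/ /home/hadoop/hive013-jars/

# 3. Append the directory to the Spark classpath, e.g. via spark-defaults.conf:
echo "spark.driver.extraClassPath   /home/hadoop/hive013-jars/*" >> /etc/spark/conf/spark-defaults.conf
echo "spark.executor.extraClassPath /home/hadoop/hive013-jars/*" >> /etc/spark/conf/spark-defaults.conf
```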
@rajatdt - why are you avoiding Hadoop 2.6.0?
Hi,
Can you please specify the comment that I made on this issue? I think you have the wrong guy here.
Regards,
Rajat Dikshit
@rajatdt - in reference to #142 (comment). Why are you needing to use Hadoop 2.4.0?
I was trying to work on a project with outdated instructions, so I started working and lost track of the updated versions.
@christopherbozeman Thanks. I know exactly the piece of code you're talking about (I've had to decompile Amazon's Hive distribution for debugging purposes more times than I'd care to admit) and I'll give your suggestion a shot next time I need EMR 4.0
@christopherbozeman: Unfortunately I'm facing issues with this v1.4.1.a when trying to run my Spark driver (yarn-cluster mode) on EMR with both AMI 3.7.0 and 3.8.0. In particular, when trying to create a Hive external table backed by S3 I get:
15/09/09 10:09:57 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: org.apache.http.params.HttpConnectionParams.setSoKeepalive(Lorg/apache/http/params/HttpParams;Z)V
java.lang.NoSuchMethodError: org.apache.http.params.HttpConnectionParams.setSoKeepalive(Lorg/apache/http/params/HttpParams;Z)V
at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:95)
at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:198)
at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:132)
at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:431)
at com.amazon.ws.emr.hadoop.fs.guice.EmrFSProdModule.createAmazonS3Client(EmrFSProdModule.java:125)
at com.amazon.ws.emr.hadoop.fs.guice.EmrFSProdModule.createAmazonS3(EmrFSProdModule.java:165)
at com.amazon.ws.emr.hadoop.fs.guice.EmrFSBaseModule.provideAmazonS3(EmrFSBaseModule.java:81)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.google.inject.internal.ProviderMethod.get(ProviderMethod.java:104)
at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1031)
at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
at com.google.inject.Scopes$1$1.get(Scopes.java:65)
at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
at com.google.inject.internal.SingleFieldInjector.inject(SingleFieldInjector.java:53)
at com.google.inject.internal.MembersInjectorImpl.injectMembers(MembersInjectorImpl.java:110)
at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:94)
at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:54)
at com.google.inject.internal.InjectorImpl$4$1.call(InjectorImpl.java:978)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:974)
at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1009)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.initialize(EmrFileSystem.java:105)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2445)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2479)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2461)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:372)
at org.apache.hadoop.hive.common.FileUtils.isLocalFile(FileUtils.java:430)
at org.apache.hadoop.hive.common.FileUtils.isLocalFile(FileUtils.java:414)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:9887)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9180)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:422)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:345)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:326)
at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:155)
at org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:326)
at org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:316)
at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:473)
at org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:950)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:950)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:144)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:128)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:755)
at myCompany.myPackage.otherPackage.ReadStuffUsingHive.apply(ReadStuffUsingHive.scala:12)
at myCompany.myPackage.BatchSparkDriver$.main(BatchSparkDriver.scala:200)
at myCompany.myPackage.BatchSparkDriver.main(BatchSparkDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:483)
15/09/09 10:09:57 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.NoSuchMethodError: org.apache.http.params.HttpConnectionParams.setSoKeepalive(Lorg/apache/http/params/HttpParams;Z)V)
15/09/09 10:09:57 INFO spark.SparkContext: Invoking stop() from shutdown hook
Please note that the very same app, built for and deployed on Spark 1.3.1 (AMI 3.7.0), used to always work smoothly on EMR. Also, the same app built for Spark 1.4.1 has been successfully run/tested on a private physical cluster (CentOS based, with Hadoop 2.4, Hive 0.13, Java 7, Scala 2.10).
Any hints? Thanks in advance!
@erond the error is likely a version conflict/mismatch on dependencies. Can I have your spark-submit arguments?
Of course @christopherbozeman. I launch the Spark driver as an EmrActivity step within a DataPipeline:
"step" : ["
s3://elasticmapreduce/libs/script-runner/script-runner.jar,
file:///home/hadoop/spark/bin/spark-submit,
--class,myCompany.myPackage.BatchSparkDriver,
--name,\"BatchSparkDriver on DP #{runsOn.@pipelineId}\",
--files,/home/hadoop/spark/conf/hive-site.xml,
--driver-class-path,/home/hadoop/spark/lib/datanucleus-api-jdo-3.2.6.jar:/home/hadoop/spark/lib/datanucleus-core-3.2.10.jar:/home/hadoop/spark/lib/datanucleus-rdbms-3.2.9.jar:/home/hadoop/spark/classpath/emr/mysql-connector-java-5.1.30.jar:hive-site.xml,
--master,yarn-cluster,
--driver-memory,512m,
--num-executors,3,
--executor-memory,2176m,
s3://myCompany-bucket/path/to/my-app-1.2.3-SNAPSHOT.jar,
(then driver's args)
"]
@christopherbozeman,
Is there any way to use Spark 1.5.0 with EMR?
@christopherbozeman To the best of your knowledge, has anyone else experienced what I reported when upgrading to 1.4.1.e? Do you have any advice? Thank you very much.
When can we expect Spark 1.5.0 on EMR?
@erond Please try build 1.4.1.b, which was pushed with #163, to see if it resolves the issue.
@njvijay and @PKUKILLA see issue #160 regarding Spark 1.5.
@christopherbozeman thanks for your update. Unfortunately, it still fails with the very same error, with both AMI 3.7.0 and 3.8.0.
Hi there - thanks for your contribution. We're updating this repository to include more relevant and recent information.
As such, we're cleaning up and closing old issues and PRs.
Feel free to open an issue if you still use EMR and would like to see an example of something!