Comments (40)

ankurmitujjain avatar ankurmitujjain commented on May 26, 2024

+1. Spark 1.4.1 is now released. I'd really appreciate it if you could include it soon.

Thank you

from emr-bootstrap-actions.

mkanchwala avatar mkanchwala commented on May 26, 2024

Waiting for this release on AWS EMR; it has major bug fixes.

Thanks

erond avatar erond commented on May 26, 2024

Waiting as well for 1.4.1 due to the several bug fixes. Thanks

MattFlower avatar MattFlower commented on May 26, 2024

+1

christopherbozeman avatar christopherbozeman commented on May 26, 2024

It's coming...

ankurmitujjain avatar ankurmitujjain commented on May 26, 2024

Great............

ankurmitujjain avatar ankurmitujjain commented on May 26, 2024

is it there?

mkanchwala avatar mkanchwala commented on May 26, 2024

@christopherbozeman Can you please tell me how much time it'll take?

Thanks

erond avatar erond commented on May 26, 2024

Any update on this issue? I'd really appreciate being able to use the latest bug-fix version. Thanks

christopherbozeman avatar christopherbozeman commented on May 26, 2024

Spark 1.4.1 is now available as a native application with EMR's new release; see https://forums.aws.amazon.com/ann.jspa?annID=3160.

PKUKILLA avatar PKUKILLA commented on May 26, 2024

Hi Chris,
How do I enable dynamic allocation? It requires copying the shuffle jar and making the following changes in yarn-site.xml (ref: http://www.slideshare.net/ozax86/spark-on-yarn-with-dynamic-resource-allocation):

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>spark_shuffle,mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
        <value>org.apache.spark.network.yarn.YarnShuffleService</value>
    </property>

jkleckner avatar jkleckner commented on May 26, 2024

@PKUKILLA @christopherbozeman answered that in #32 with the links:
Please see https://forums.aws.amazon.com/ann.jspa?annID=3160 and
http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-spark-configure.html#spark-dynamic-allocation
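
For readers comparing this with the yarn-site.xml changes quoted earlier: the YARN shuffle service is normally paired with a few Spark properties on the application side. Below is a minimal sketch in Python that assembles those properties as `spark-submit --conf` flags. The property names are standard Spark settings. The helper name `dynamic_allocation_conf`, the executor bounds, and `my-app.jar` are illustrative assumptions, and the linked EMR documentation may configure this differently.

```python
def dynamic_allocation_conf(min_executors=1, max_executors=10):
    """Return spark-submit --conf flags that turn on dynamic allocation.

    The executor bounds are placeholder values, not recommendations.
    """
    settings = {
        # Let Spark grow and shrink the executor count with the workload.
        "spark.dynamicAllocation.enabled": "true",
        # Dynamic allocation relies on the external shuffle service.
        "spark.shuffle.service.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }
    flags = []
    for key, value in settings.items():
        flags.extend(["--conf", f"{key}={value}"])
    return flags


if __name__ == "__main__":
    # my-app.jar is a hypothetical application jar.
    command = ["spark-submit"] + dynamic_allocation_conf(1, 20) + ["my-app.jar"]
    print(" ".join(command))
```

The flags are returned as a list so they can be spliced into an existing argument list rather than pasted into a shell string.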

jkleckner avatar jkleckner commented on May 26, 2024

@christopherbozeman This page needs updating for the dynamic feature because it calls out a fixed instance-count, true?

http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-spark-launch.html

Create the cluster with the following command:

aws emr create-cluster --name "Spark cluster" --release-label emr-4.0.0 --applications Name=Spark --ec2-attributes KeyName=myKey --instance-type m3.xlarge --instance-count 3 --use-default-roles

erond avatar erond commented on May 26, 2024

Thanks @christopherbozeman. Do you think you are also going to add 1.4.1 support to the "old" bootstrap action from this GH project? We rely on it heavily, and we are not yet ready to move to Hadoop 2.6 and Hive 1.0. It would be great to have both the "automated" way and the "manual" way to install Spark, so that we can test all the pieces step by step before moving a production system to a new set of upgraded frameworks. Also, can you please give the community any hints about how long this project (emr-bootstrap-action to install Spark) will still be maintained (and offered)? Appreciated. Thanks as always.

PKUKILLA avatar PKUKILLA commented on May 26, 2024

thanks it works

Sazpaimon avatar Sazpaimon commented on May 26, 2024

@christopherbozeman Does the EMR 4.0.0 version of Spark contain the patch from christopherbozeman/spark@316b2e0? It doesn't look like it does, as when I insert into a Hive table using Spark SQL, it creates temporary files in S3 and then appears to get stuck when trying to move them to their right place.

erond avatar erond commented on May 26, 2024

Considering also the issues reported for Hive (#154) and Ganglia (#153), is there any possibility of getting Spark 1.4.1 as a bootstrap action (a.k.a. "the usual way"), so that in the meantime it works on the 3.8.0 AMI (and Hadoop 2.4)? The upgrade of our system is stuck because of this: 1.4.0 has known blocking bugs, so there is no chance of moving forward from Spark 1.3.1 until you kindly upgrade the emr-bootstrap-action support as well. I think many people would really appreciate it. Thanks.

erond avatar erond commented on May 26, 2024

Stuck on the upgrade to Spark 1.4.0 using AMI 3.8.0 due to https://issues.apache.org/jira/browse/SPARK-8368. So we can't move to 1.4.0 either unless we switch to AMI 4.0.0. PLEASE, upgrade the emr-bootstrap-action to support Spark 1.4.1 on AMI 3.8.0, this is really a big issue for many people!

knowak avatar knowak commented on May 26, 2024

Same here, would appreciate getting 1.4.1 integrated here while EMR 4.0 matures.

erond avatar erond commented on May 26, 2024

Furthermore, we actually CAN'T switch to AMI 4.0.0 since we are leveraging DataPipeline that, obviously, doesn't currently support such AMI version: read https://forums.aws.amazon.com/thread.jspa?messageID=662004 and https://forums.aws.amazon.com/thread.jspa?messageID=658891 for references.

ankurmitujjain avatar ankurmitujjain commented on May 26, 2024

+1, I think EMR 4.0.0 is not mature enough to replace all the applications available on AMI 3.8.0.

christopherbozeman avatar christopherbozeman commented on May 26, 2024

Spark 1.4.1 is now available for the Spark bootstrap action for EMR AMI 3.x and can be requested by version "1.4.1.a".

Sazpaimon avatar Sazpaimon commented on May 26, 2024

@christopherbozeman Can you answer my previous question about the EMR 4.0 version of Spark containing christopherbozeman/spark@316b2e0?

rajatdt avatar rajatdt commented on May 26, 2024

Hi christopherbozeman,

Thank you for the update. Could you please provide some feedback on the configuration that I am trying to use? My configuration:
-ami-version : 3.3 (which defaults to 3.3.2)
-spark: 1.4.1.a
etc.
The question is: should I use ami-version 3.3 or the latest ami-version? I would like to use the emr-4.0.0 release label because it provides Spark 1.4.1, but it comes with Hadoop 2.6.0, whereas I want to use 2.4.0.

christopherbozeman avatar christopherbozeman commented on May 26, 2024

@Sazpaimon I dug into your comment on #142 (comment) and determined that christopherbozeman/spark@316b2e0 is a NOOP (the underlying RDD interaction with the Hadoop output format takes care of the S3 direct write). What performs the magic of not creating extra temporary paths when writing to S3 is the code that EMR added to Hive, which the Spark BA includes when the -h option is supplied. This is what is missing from EMR release 4.0.0. Also, Spark 1.4 only supports up to Hive 0.13 (https://issues.apache.org/jira/browse/SPARK-8065), so the native Spark in EMR release 4.0.0 cannot just use the Hive 1.0 jars to fix the issue. I'll report this internally to the development team so it can be resolved in a future EMR release. For now, the ugly workaround would be to take the Hive jars from EMR AMI 3.x with Hive 0.13 that are pruned for Spark (~spark/classpath/hive/*), copy them to the master of an EMR 4.0.0 cluster, and then append the jars to the Spark classpath.
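
The last step of that workaround, appending the copied jars to the Spark classpath, could be sketched roughly as follows in Python. This assumes the Hive jars have already been copied into some directory on the EMR 4.0.0 master. The function name `append_jars_to_classpath` and the paths in the usage comment are hypothetical. The resulting string is the colon-separated form that options such as `--driver-class-path` expect.

```python
from pathlib import Path


def append_jars_to_classpath(existing_classpath, jar_dir):
    """Append every *.jar in jar_dir to a colon-separated classpath.

    Entries already on the classpath are kept first and never duplicated,
    so the original search order is preserved.
    """
    parts = [entry for entry in existing_classpath.split(":") if entry]
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        if str(jar) not in parts:
            parts.append(str(jar))
    return ":".join(parts)


# Hypothetical usage; both paths below are placeholders:
#   cp = append_jars_to_classpath("/path/to/existing.jar", "/path/to/copied/hive/jars")
#   then pass cp to spark-submit via --driver-class-path
```

Appending rather than prepending keeps the cluster's own jars ahead of the copied ones. Whether that ordering suits this workaround is something to verify on a test cluster.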

christopherbozeman avatar christopherbozeman commented on May 26, 2024

@rajatdt - why are you avoiding Hadoop 2.6.0?

rajatdt avatar rajatdt commented on May 26, 2024

Hi,

Can you please point me to the comment that I made on this issue? I think that you have the wrong guy here.

Regards

Rajat Dikshit

christopherbozeman avatar christopherbozeman commented on May 26, 2024

@rajatdt - in reference to #142 (comment). Why do you need to use Hadoop 2.4.0?

rajatdt avatar rajatdt commented on May 26, 2024

I was trying to work on a project with outdated instructions, so I started working and lost track of the updated versions.

Sazpaimon avatar Sazpaimon commented on May 26, 2024

@christopherbozeman Thanks. I know exactly the piece of code you're talking about (I've had to decompile Amazon's Hive distribution for debugging purposes more times than I'd care to admit) and I'll give your suggestion a shot next time I need EMR 4.0

erond avatar erond commented on May 26, 2024

@christopherbozeman: Unfortunately I'm facing issues with v1.4.1.a when trying to run my Spark driver (YARN-cluster mode) on EMR with both AMI 3.7.0 and 3.8.0. In particular, when trying to create a Hive external table backed by S3, I get:

15/09/09 10:09:57 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: org.apache.http.params.HttpConnectionParams.setSoKeepalive(Lorg/apache/http/params/HttpParams;Z)V
java.lang.NoSuchMethodError: org.apache.http.params.HttpConnectionParams.setSoKeepalive(Lorg/apache/http/params/HttpParams;Z)V
at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:95)
at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:198)
at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:132)
at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:431)
at com.amazon.ws.emr.hadoop.fs.guice.EmrFSProdModule.createAmazonS3Client(EmrFSProdModule.java:125)
at com.amazon.ws.emr.hadoop.fs.guice.EmrFSProdModule.createAmazonS3(EmrFSProdModule.java:165)
at com.amazon.ws.emr.hadoop.fs.guice.EmrFSBaseModule.provideAmazonS3(EmrFSBaseModule.java:81)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.google.inject.internal.ProviderMethod.get(ProviderMethod.java:104)
at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1031)
at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
at com.google.inject.Scopes$1$1.get(Scopes.java:65)
at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
at com.google.inject.internal.SingleFieldInjector.inject(SingleFieldInjector.java:53)
at com.google.inject.internal.MembersInjectorImpl.injectMembers(MembersInjectorImpl.java:110)
at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:94)
at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:54)
at com.google.inject.internal.InjectorImpl$4$1.call(InjectorImpl.java:978)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:974)
at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1009)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.initialize(EmrFileSystem.java:105)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2445)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2479)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2461)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:372)
at org.apache.hadoop.hive.common.FileUtils.isLocalFile(FileUtils.java:430)
at org.apache.hadoop.hive.common.FileUtils.isLocalFile(FileUtils.java:414)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:9887)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9180)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:422)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:345)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:326)
at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:155)
at org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:326)
at org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:316)
at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:473)
at org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:950)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:950)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:144)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:128)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:755)
at myCompany.myPackage.otherPackage.ReadStuffUsingHive.apply(ReadStuffUsingHive.scala:12)
at myCompany.myPackage.BatchSparkDriver$.main(BatchSparkDriver.scala:200)
at myCompany.myPackage.BatchSparkDriver.main(BatchSparkDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:483)
15/09/09 10:09:57 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.NoSuchMethodError: org.apache.http.params.HttpConnectionParams.setSoKeepalive(Lorg/apache/http/params/HttpParams;Z)V)
15/09/09 10:09:57 INFO spark.SparkContext: Invoking stop() from shutdown hook

Please note that the very same app, built for and deployed on Spark 1.3.1 (AMI 3.7.0), used to work smoothly on EMR. Also, the same app built for Spark 1.4.1 has been successfully run and tested on a private physical cluster (CentOS based, with Hadoop 2.4, Hive 0.13, Java 7, Scala 2.10).

Any hints? Thanks in advance!

christopherbozeman avatar christopherbozeman commented on May 26, 2024

@erond the error is likely a version conflict/mismatch on dependencies. Can I have your spark-submit arguments?
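
One way to check for such a conflict is to see which jars on the cluster ship the class named in the stack trace. A `NoSuchMethodError` often means that more than one copy of the class is present and an older one was loaded first. The Python sketch below scans directories of jars for a given class. The function name `jars_containing` and the directories in the usage comment are assumptions for illustration, not a confirmed diagnosis of this particular failure.

```python
import sys
import zipfile
from pathlib import Path


def jars_containing(class_name, search_dirs):
    """Return paths of all jars under search_dirs that contain class_name."""
    # A class a.b.C is stored inside a jar as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for directory in search_dirs:
        root = Path(directory)
        if not root.is_dir():
            continue
        for jar in sorted(root.rglob("*.jar")):
            try:
                with zipfile.ZipFile(jar) as archive:
                    if entry in archive.namelist():
                        hits.append(str(jar))
            except zipfile.BadZipFile:
                # Skip files that are named *.jar but are not valid archives.
                continue
    return hits


if __name__ == "__main__" and len(sys.argv) > 2:
    # Example invocation (the directory is only an example):
    #   python find_class.py org.apache.http.params.HttpConnectionParams /home/hadoop/spark/lib
    for path in jars_containing(sys.argv[1], sys.argv[2:]):
        print(path)
```

If more than one jar is listed, comparing their versions and their order on the driver and executor classpaths is a reasonable next step.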

erond avatar erond commented on May 26, 2024

of course @christopherbozeman. I launch the Spark driver as an EmrActivity's step within a DataPipeline:

"step" : ["
   s3://elasticmapreduce/libs/script-runner/script-runner.jar,
   file:///home/hadoop/spark/bin/spark-submit,
    --class,myCompany.myPackage.BatchSparkDriver,
    --name,\"BatchSparkDriver on DP #{runsOn.@pipelineId}\",
    --files,/home/hadoop/spark/conf/hive-site.xml,
    --driver-class-path,/home/hadoop/spark/lib/datanucleus-api-jdo-3.2.6.jar:/home/hadoop/spark/lib/datanucleus-core-3.2.10.jar:/home/hadoop/spark/lib/datanucleus-rdbms-3.2.9.jar:/home/hadoop/spark/classpath/emr/mysql-connector-java-5.1.30.jar:hive-site.xml,
    --master,yarn-cluster,
    --driver-memory,512m,
    --num-executors,3,
    --executor-memory,2176m,
    s3://myCompany-bucket/path/to/my-app-1.2.3-SNAPSHOT.jar,
    (then driver's args)
  "]

PKUKILLA avatar PKUKILLA commented on May 26, 2024

@christopher,
Is there any way to use Spark 1.5.0 with EMR?

erond avatar erond commented on May 26, 2024

@christopherbozeman to the best of your knowledge, has anyone else experienced the same problem I reported when upgrading to 1.4.1.e? Do you have any advice? Thank you very much.

njvijay avatar njvijay commented on May 26, 2024

When can we expect Spark 1.5.0 on emr?

christopherbozeman avatar christopherbozeman commented on May 26, 2024

@erond Please try build 1.4.1.b, which was pushed with #163, to see if it resolves the issue.

christopherbozeman avatar christopherbozeman commented on May 26, 2024

@njvijay and @PKUKILLA see issue #160 regarding Spark 1.5.

erond avatar erond commented on May 26, 2024

@christopherbozeman thanks for your update. Unfortunately, it still fails with the very same error, with both AMI 3.7.0 and 3.8.0.

dacort avatar dacort commented on May 26, 2024

Hi there - thanks for your contribution. We're updating this repository to include more relevant and recent information.

As such, we're cleaning up and closing old issues and PRs.

Feel free to open an issue if you still use EMR and would like to see an example of something!
