
Comments (27)

milescrawford avatar milescrawford commented on May 26, 2024 2

Hi guys, I've just got spark with scala 2.11 up and running on EMR - no trouble yet, but I haven't done much heavy lifting.

A few steps:

  1. I use a bootstrap action to upgrade to Java 8. https://gist.github.com/pstorch/c217d8324c4133a003c4

  2. Follow the standard instructions for building a Spark jar with yarn support and scala 2.11: http://spark.apache.org/docs/latest/building-spark.html

  3. Then, since EMR runs YARN for you, I've just been submitting jobs using the spark-submit script, while specifying my custom build via the spark.yarn.jar setting.

This sidesteps Amazon's concept of "steps", so might not be practical for everyone, but seems to be working for me. Let me know if there's some fatal flaw in this plan I haven't run into yet! :)
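Step 3 above could look roughly like the following. This is only a dry-run sketch, not the commenter's actual command: the assembly location, application jar, and main class are hypothetical placeholders, and it assumes a Spark 1.x spark-submit, where `--master yarn-cluster` and the `spark.yarn.jar` property are available. The script prints the command it would run rather than executing it.

```shell
#!/usr/bin/env bash
# Dry-run sketch: assemble a spark-submit command that points YARN at a
# custom Scala 2.11 Spark assembly. All paths and names are hypothetical.
set -euo pipefail

# Hypothetical location of the custom-built assembly jar (e.g. uploaded to HDFS).
SPARK_ASSEMBLY="${SPARK_ASSEMBLY:-hdfs:///user/hadoop/spark-assembly-custom-2.11.jar}"
# Hypothetical application jar and main class.
APP_JAR="${APP_JAR:-my-app_2.11.jar}"
APP_CLASS="${APP_CLASS:-com.example.MyJob}"

build_submit_cmd() {
  # spark.yarn.jar is the Spark 1.x property naming the Spark jar YARN should use.
  printf '%s\n' \
    spark-submit \
    --master yarn-cluster \
    --conf "spark.yarn.jar=${SPARK_ASSEMBLY}" \
    --class "${APP_CLASS}" \
    "${APP_JAR}"
}

# Print the command, one argument per line, instead of running it.
SUBMIT_CMD="$(build_submit_cmd)"
echo "${SUBMIT_CMD}"
```

Printing the arguments first makes it easy to check that the custom assembly, and not the cluster's default Spark jar, is the one being handed to YARN before anything is actually submitted.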

from emr-bootstrap-actions.

christopherbozeman avatar christopherbozeman commented on May 26, 2024

So are you thinking an option that allows for opt-in to a Spark build with scala 2.11?

derrickburns avatar derrickburns commented on May 26, 2024

Yes, please.

derrickburns avatar derrickburns commented on May 26, 2024

Right now, the only thing gating my adoption of 2.11 is the absence of EMR
support for Spark on 2.11. :)

aniketbhatnagar avatar aniketbhatnagar commented on May 26, 2024

+1 for this. The current workaround is to build your own Spark assembly with Scala 2.11 and use the user-provided jar option (-u) in the Spark EMR bootstrap script to place it ahead of all other dependencies.

schmmd avatar schmmd commented on May 26, 2024

This is gating us too.

schmmd avatar schmmd commented on May 26, 2024

@aniketbhatnagar does your workaround work well for you?

derrickburns avatar derrickburns commented on May 26, 2024

No

christopherbozeman avatar christopherbozeman commented on May 26, 2024

I have an experimental Spark 1.2.1 built with Scala 2.11 for testing. Request version 1.2.1.a-2.11.

Example:

aws --region us-east-1 emr create-cluster --name spark-ami332-121a-scala-2.11 --ami-version 3.3 --instance-type m3.xlarge --instance-count 3 --use-default-role --ec2-attributes KeyName=<blah>,SubnetId=<blah> --applications Name=Hive --bootstrap-actions Name=InstallSpark,Path=s3://support.elasticmapreduce/spark/install-spark,Args=[-x,-v,1.2.1.a-2.11] 

derrickburns avatar derrickburns commented on May 26, 2024

Thanks!

Sazpaimon avatar Sazpaimon commented on May 26, 2024

Why does the 2.11 installer comment out copying the Hive libraries to the classpath? Doing this makes Spark use its built-in Hive libraries, which do not have the improvements of the EMR Hive distribution.

Also, I don't know if this is an issue with the standard 1.2.1.a build, but spark-assembly-1.2.1-hadoop2.4.0.jar cannot be used for pyspark unless it's repacked with Java 1.6.

christopherbozeman avatar christopherbozeman commented on May 26, 2024

@Sazpaimon - there was a conflict that would break the 2.11 build. I have not yet returned to investigate further given the experimental status of Spark with Scala 2.11.

Sazpaimon avatar Sazpaimon commented on May 26, 2024

It seems that all of the other install scripts also comment out the Hive libraries. For example, https://s3.amazonaws.com/support.elasticmapreduce/spark/install-spark-script.py, which is for stable versions, has the Hive lib copy commented out. That prevents me from reading my ORC tables created with Hive on EMR unless I copy the libraries myself (or unless there's some spark-sql setting I'm missing).

kekedie avatar kekedie commented on May 26, 2024

+1

tperrigo avatar tperrigo commented on May 26, 2024

Just wanted to check on the status of supporting Scala 2.11. When I saw the examples on the project home page using --ami-version 3.6 (which, according to http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/ami-versions-supported.html, has Scala 2.11.1), I tried submitting a job built for 2.11 (Spark 1.3.0), but ran into version conflict issues. Is there a workaround for this in the meantime?

Thanks!

christopherbozeman avatar christopherbozeman commented on May 26, 2024

At this point Spark is built with Scala 2.10 and it is installed by the bootstrap as well.

The experimental 2.11 build can be requested with version 1.2.1.a-2.11; see my example a few comments earlier. At this point, Spark with 2.11 breaks too many things to be the default build.

stmcpherson avatar stmcpherson commented on May 26, 2024

Can we close this issue?

christopherbozeman avatar christopherbozeman commented on May 26, 2024

Closing issue. Spark on Scala 2.11 is experimental only, given its experimental status with the Apache project itself. For those interested in Spark on Scala 2.11, please see https://github.com/awslabs/emr-bootstrap-actions/blob/master/spark/examples/building-spark-for-emr.md as a starting point for a custom build.

christopherbozeman avatar christopherbozeman commented on May 26, 2024

@Sazpaimon regarding the EMR Hive jars, see the readme and pull request #93, which added the "-h" install-script argument to opt in to including the EMR Hive jars in the Spark classpath.

luisobo avatar luisobo commented on May 26, 2024

@christopherbozeman hi, Scala 2.11 is no longer experimental as of 1.3.0. Would it make sense to add support for 2.11 now?

Happy to help. Thanks!

schmmd avatar schmmd commented on May 26, 2024

Do we have support for Scala 2.11 yet in EMR? It's already the default on the master branch.

adamnisenbaum avatar adamnisenbaum commented on May 26, 2024

+1 for scala 2.11 on spark in emr

darkone23 avatar darkone23 commented on May 26, 2024

Just chiming in to say that I also use custom bootstrap hacks to get EMR to support spark and scala 2.11 - now that it is the default in spark land it would be great to see it supported on EMR.

pjgg avatar pjgg commented on May 26, 2024

+1

strias avatar strias commented on May 26, 2024

+1

omiddjoudi avatar omiddjoudi commented on May 26, 2024

+1

arvy avatar arvy commented on May 26, 2024

+1
