Comments (27)
Hi guys, I've just got spark with scala 2.11 up and running on EMR - no trouble yet, but I haven't done much heavy lifting.
A few steps:
- I use a bootstrap action to upgrade to Java 8: https://gist.github.com/pstorch/c217d8324c4133a003c4
- Follow the standard instructions for building a Spark jar with YARN support and Scala 2.11: http://spark.apache.org/docs/latest/building-spark.html
- Then, since EMR runs YARN for you, I've just been submitting jobs using the spark-submit script, while specifying my custom build via the spark.yarn.jar option.
This sidesteps Amazon's concept of "steps", so might not be practical for everyone, but seems to be working for me. Let me know if there's some fatal flaw in this plan I haven't run into yet! :)
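As a rough sketch of that last step, the submission might look something like the following. It assumes a Spark 1.x build, where the custom assembly is passed through the spark.yarn.jar configuration property; the HDFS path, class name, and application jar are hypothetical placeholders, not values from this thread. The helper only prints the command (a dry run) so it can be checked before being run on the master node.

```shell
#!/usr/bin/env bash
# Dry-run helper: prints a spark-submit command instead of running it.
# Assumption: Spark 1.x, where spark.yarn.jar points YARN at a custom assembly.
build_submit_cmd() {
  local assembly_jar="$1"   # hypothetical location of the custom Scala 2.11 assembly
  local main_class="$2"     # hypothetical application entry point
  local app_jar="$3"        # hypothetical application jar built against Scala 2.11
  printf 'spark-submit --master yarn-cluster --conf spark.yarn.jar=%s --class %s %s\n' \
    "$assembly_jar" "$main_class" "$app_jar"
}

# Example with placeholder values; replace them with your own paths and class.
build_submit_cmd \
  "hdfs:///user/hadoop/spark-assembly-scala-2.11.jar" \
  "com.example.MyJob" \
  "my-job_2.11.jar"
```

Keeping the assembly on HDFS (rather than a local path) is one common choice so that YARN containers on every node can fetch it; whether that suits a given cluster is a judgment call.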
from emr-bootstrap-actions.
So are you thinking an option that allows for opt-in to a Spark build with scala 2.11?
Yes, please.
On Thu, Jan 15, 2015 at 10:41 AM, Christopher Bozeman wrote:
> So are you thinking an option that allows for opt-in to a Spark build with scala 2.11?
Right now, the only thing gating my adoption of 2.11 is the absence of EMR
support for Spark on 2.11. :)
On Thu, Jan 15, 2015 at 10:45 AM, Derrick Burns wrote:
> Yes, please.
+1 for this. Current workaround is to build your own spark assembly with scala 2.11 and use the user provided jar option (-u) in the spark emr bootstrap script to place it ahead of all other dependencies.
This is gating us too.
@aniketbhatnagar does your workaround work well for you?
No
On Feb 9, 2015, at 9:13 AM, Michael Schmitz wrote:
> @aniketbhatnagar does your workaround work well for you?
I have an experimental Spark 1.2.1 built with Scala 2.11 for testing. Request version 1.2.1.a-2.11.
Example:
aws --region us-east-1 emr create-cluster \
  --name spark-ami332-121a-scala-2.11 \
  --ami-version 3.3 \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --use-default-role \
  --ec2-attributes KeyName=<blah>,SubnetId=<blah> \
  --applications Name=Hive \
  --bootstrap-actions Name=InstallSpark,Path=s3://support.elasticmapreduce/spark/install-spark,Args=[-x,-v,1.2.1.a-2.11]
Thanks!
On Fri, Feb 13, 2015 at 12:09 PM, Christopher Bozeman wrote:
> I have an experimental Spark 1.2.1 built with Scala 2.11 for testing. Request version 1.2.1.a-2.11.
> Example:
> aws --region eu-central-1 emr create-cluster --name spark-ami332-121a-scala-2.11 --ami-version 3.3 --instance-type m3.xlarge --instance-count 3 --use-default-role --ec2-attributes KeyName=,SubnetId= --applications Name=Hive --bootstrap-actions Name=InstallSpark,Path=s3://support.elasticmapreduce/spark/install-spark,Args=[-x,-v,1.2.1.a-2.11]
Why does the 2.11 installer comment out copying the Hive libraries to the classpath? Doing this makes Spark use its built-in Hive libraries, which do not have the improvements of the EMR Hive distribution.
Also, I don't know if this is an issue with the standard 1.2.1.a build, but spark-assembly-1.2.1-hadoop2.4.0.jar cannot be used for pyspark unless it's repacked with Java 1.6.
@Sazpaimon - there was a conflict that would break the 2.11 build. I have not yet returned to investigate further given the experimental status of Spark with Scala 2.11.
It seems that all of the other install scripts also comment out the Hive libraries. For example, https://s3.amazonaws.com/support.elasticmapreduce/spark/install-spark-script.py, which is for stable versions, has the Hive lib copy commented out. This prevents me from reading my ORC tables created with Hive on EMR unless I copy the libraries myself (unless there's some spark-sql setting I'm missing).
+1
Just wanted to check on the status of supporting Scala 2.11. When I saw the examples on the project home page using --ami-version 3.6 (which, according to http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/ami-versions-supported.html, has Scala 2.11.1), I tried submitting a job built for 2.11 (Spark 1.3.0), but ran into version conflict issues. Is there a workaround for this in the meantime?
Thanks!
At this point Spark is built with Scala 2.10 and it is installed by the bootstrap as well.
The experimental 2.11 build can be requested with version 1.2.1.a-2.11. See a few comments earlier. At this point, Spark with 2.11 breaks too many items to be the default build.
Can we close this issue?
Closing issue. Spark on Scala 2.11 is experimental only, given its experimental status with the Apache project itself. For those interested in Spark on Scala 2.11, please see https://github.com/awslabs/emr-bootstrap-actions/blob/master/spark/examples/building-spark-for-emr.md as a starting point for a custom build.
@Sazpaimon regarding the EMR Hive jars, see the readme and pull request #93, which added a "-h" install-script argument that opts in to including the EMR Hive jars in the Spark classpath.
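For illustration, the bootstrap action from the earlier create-cluster example could be extended with the "-h" argument roughly as follows. This is a sketch, not a tested invocation: it assumes "-h" can be combined with the "-x" and "-v" arguments shown earlier in the thread, and the version string is left as a placeholder (the readme is the place to confirm both). The helper is hypothetical and only prints the value to pass to --bootstrap-actions.

```shell
#!/usr/bin/env bash
# Prints a --bootstrap-actions value for the install-spark script.
# Assumption: -h may be combined with -x and -v; confirm against the readme.
bootstrap_action() {
  local version="$1"     # Spark version to request (placeholder)
  local with_hive="$2"   # "yes" adds -h to include the EMR Hive jars
  local args="-x"
  if [ "$with_hive" = "yes" ]; then
    args="$args,-h"
  fi
  printf 'Name=InstallSpark,Path=s3://support.elasticmapreduce/spark/install-spark,Args=[%s,-v,%s]\n' \
    "$args" "$version"
}

# Example with a placeholder version string.
bootstrap_action "<version>" "yes"
```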
@christopherbozeman hi, Scala 2.11 is no longer experimental as of 1.3.0. Would it make sense to add support for 2.11 now?
Happy to help. Thanks!
Do we have support for Scala 2.11 yet in EMR? It's already the default on the master branch.
+1 for scala 2.11 on spark in emr
Just chiming in to say that I also use custom bootstrap hacks to get EMR to support spark and scala 2.11 - now that it is the default in spark land it would be great to see it supported on EMR.
+1
+1
+1
+1