This is a virtual machine sandbox image to practice and learn Big Data and Data Science applications.
Running Big Data applications (Spark / Cassandra / Hadoop) can be a little convoluted because of all the dependencies, and even more of a hassle on Windows. We hope this VM Sandbox will make things easier.
Elephant Scale teaches Big Data & Data Science classes. This sandbox is a replica of our virtualized environment.
Check out our training classes in Big Data and Data Science.
Currently, an OVA-based virtual machine image is available. Docker images are coming 'soon'.
Note : These are LARGE downloads (~10 GB in size). Download when you have good bandwidth.
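Since the image is roughly 10 GB, it can help to confirm there is enough free disk space before starting the download. The following is a minimal sketch and not part of the sandbox itself: the function name `has_free_space` and the 10 GB threshold are illustrative assumptions, and importing the image into a VM player will likely need additional space beyond the download.

```shell
# Succeed if the filesystem holding DIR has at least NEED_GB gigabytes free.
# has_free_space is a hypothetical helper name, not something shipped with the sandbox.
has_free_space() {
    dir="$1"
    need_gb="$2"
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # on the second line, column 4 is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    need_kb=$((need_gb * 1024 * 1024))
    [ "$avail_kb" -ge "$need_kb" ]
}

# Assumed threshold: 10 GB, matching the approximate download size.
if has_free_space "$HOME" 10; then
    echo "Enough free space for the download"
else
    echo "Less than 10 GB free, consider freeing up space first"
fi
```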
- Latest version : V3
- Release date : 2017-05-02
- Download link
- For older versions see changelog
- You need a virtual machine 'player'. Any of these would work:
- Download the latest sandbox image
- Double-click the 'OVA' file to open it.
Login : student
password : bigdata123
See intro lab for a screencast.
Connectivity:
- Use VM GUI : when you open this OVA file in a VM environment you will be logged into the Ubuntu desktop
- SSH via port 22 inside the VM
- from the host machine, use port 2222:
$ ssh -l student -p 2222 localhost
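To avoid retyping the user and port, the connection details from the command above can be stored as a host alias in an SSH client config file. This is a minimal sketch: the alias name `sandbox` and the helper name `add_sandbox_ssh_alias` are illustrative assumptions, while the user, host, and port are taken from the command above.

```shell
# Append a "sandbox" host alias to an SSH client config file.
# The alias name and this function name are hypothetical; adjust as needed.
add_sandbox_ssh_alias() {
    config_file="$1"
    # Do nothing if the alias is already present, so re-running is harmless.
    if [ -f "$config_file" ] && grep -q '^Host sandbox$' "$config_file"; then
        return 0
    fi
    cat >> "$config_file" <<'EOF'
Host sandbox
    HostName localhost
    Port 2222
    User student
EOF
}

# Example (not run here): add_sandbox_ssh_alias "$HOME/.ssh/config"
```

After adding the alias to `~/.ssh/config`, `ssh sandbox` should be equivalent to the longer command shown above.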
This VM is tested with the following Big Data stack:
- Spark v1.6 and Spark v2.x
- Cassandra v3.x
- Kafka v0.10
- Storm v1.x
- Zookeeper v3.4.8
If you are enrolled in our classes, you will get a lab bundle. You can also run any open source labs.
Check out our Sandbox channel for more videos.
- Based on Ubuntu 16.04 LTS
- Most software is in /usr/local/apps (also ~/apps)
- Dev environment : Java / Scala
- Dev environment : Python
- Python 3.6
- Anaconda v4.3.1
- Editors :
- IDEs
- Eclipse Neon - ~/apps/eclipse/java-neon/eclipse/eclipse
- IntelliJ Community Edition - ~/apps/idea/bin/idea.sh
- Big Data applications:
Binaries are downloaded to the ~/files folder (same as ~/Downloads).
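Once logged in to the VM, a quick way to confirm that the locations listed above are present is to test each path. This is a small sketch rather than a tool shipped with the sandbox: `check_paths` is a hypothetical helper name, and the paths passed to it are the ones mentioned in this README.

```shell
# Print "<path>: ok" or "<path>: missing" for every path given as an argument.
# check_paths is an illustrative helper, not part of the sandbox image.
check_paths() {
    for p in "$@"; do
        if [ -e "$p" ]; then
            printf '%s: ok\n' "$p"
        else
            printf '%s: missing\n' "$p"
        fi
    done
}

# Paths taken from the list above; run this inside the VM.
check_paths \
    /usr/local/apps \
    "$HOME/apps/eclipse/java-neon/eclipse/eclipse" \
    "$HOME/apps/idea/bin/idea.sh" \
    "$HOME/files"
```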
See version history in changelog
We welcome your feedback about the sandbox.
- send an email to [email protected]
- or open an issue on the GitHub page