- Update the manager and worker addresses to your nodes' IP addresses.
- Configure your 3-node cluster so that the nodes can log in to one another without a password (for example, with SSH keys).
- Git clone the repository at the `root` directory
- Run `./install.sh`
- Run `./start.sh`
- Run other examples
- If you get a `node-0 not recognized` error, replace `node-0` in the code with your manager's IP address
- The `spark-shell` is located at `/usr/local/spark/bin/spark-shell`
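
The passwordless login and the `node-0` replacement described above could be scripted roughly as follows. This is only a sketch under assumptions, not part of the repository: the function names `setup_passwordless_ssh` and `replace_node0` are hypothetical, passwordless login is assumed to mean SSH key authentication, and `sed -i` is assumed to be GNU sed (as on most Linux nodes).

```shell
#!/usr/bin/env bash

# Hypothetical helper: enable key-based SSH login to each host given as an argument.
# Assumes "login without a password" means SSH key authentication.
setup_passwordless_ssh() {
  local key="$HOME/.ssh/id_ed25519"
  # Generate a key pair with an empty passphrase if none exists yet.
  [ -f "$key" ] || ssh-keygen -t ed25519 -N "" -f "$key"
  local host
  for host in "$@"; do
    # Copies the public key to the host; asks for the password one last time.
    ssh-copy-id -i "$key.pub" "$host"
  done
}

# Hypothetical helper: replace every occurrence of the literal string "node-0"
# with the manager's IP address in all files under a directory.
# Note: this also rewrites longer names that contain "node-0" (e.g. "node-01").
replace_node0() {
  local dir="$1" ip="$2"
  grep -rl -- 'node-0' "$dir" | while IFS= read -r file; do
    sed -i "s/node-0/${ip}/g" "$file"   # GNU sed in-place edit (assumption)
  done
}

# Example usage (the addresses below are placeholders, not real nodes):
#   setup_passwordless_ssh user@10.0.0.2 user@10.0.0.3
#   replace_node0 ./examples 10.0.0.1
```

Running `replace_node0` on a copy of the code first makes it easy to review the changes with `git diff` before starting the cluster.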
- rdd-distinct: remove duplicate elements from the data
- rdd-filter: apply filter to the data
- rdd-flatmap: map each element to a sequence and flatten the results
- rdd-kv: key/value tests
- rdd-map: map tests
- rdd-maxmin: maximum and minimum test
- rdd-union: union test
- text-search: search on the text
- file: file operations
- list: list operations
- max-count: find the maximum number of words in a line
- sort: sort on data
- words: find the word with the most occurrences