This project tests PySpark query performance techniques, such as caching and partitioning, on a local machine.
The Home_Sales.ipynb notebook downloads housing data from an external S3 endpoint and saves it in the data/ directory.
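
The notebook's own download code is not reproduced here. As a rough illustration of the download-and-save step, the sketch below uses only the Python standard library. The function name fetch_to_data_dir and its parameters are hypothetical, and the real endpoint URL must be supplied by the caller.

```python
from pathlib import Path
from urllib.request import urlretrieve


def fetch_to_data_dir(url: str, data_dir: str = "data") -> Path:
    """Download `url` into `data_dir` unless a local copy already exists.

    Hypothetical helper: the notebook may fetch the file differently.
    """
    target_dir = Path(data_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    # Name the local file after the last path segment of the URL.
    target = target_dir / url.rstrip("/").rsplit("/", 1)[-1]

    # Skip the network call when the file is already on disk.
    if not target.exists():
        urlretrieve(url, str(target))
    return target
```

Skipping the download when the file already exists means repeated notebook runs work from the same local copy.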
To keep comparisons between partitioned and non-partitioned data fair, both versions are saved locally before profiling, so each query reads from local disk.
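
One way to set up such a comparison is sketched below, assuming an existing SparkSession and a DataFrame already loaded from the data/ directory. The output paths, the partition column name date_built, and the helper names time_call and compare_layouts are illustrative assumptions, not taken from the notebook.

```python
import time


def time_call(fn):
    """Run fn() once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def compare_layouts(spark, df, base_dir="data", partition_col="date_built"):
    """Write df in both layouts, then time the same query against each.

    `spark` is a SparkSession and `df` a Spark DataFrame supplied by the caller.
    The column name and paths are assumptions made for illustration.
    """
    flat_path = f"{base_dir}/home_sales_flat"          # hypothetical path
    part_path = f"{base_dir}/home_sales_partitioned"   # hypothetical path

    # Save both copies first so that neither timing includes the write.
    df.write.mode("overwrite").parquet(flat_path)
    df.write.mode("overwrite").partitionBy(partition_col).parquet(part_path)

    timings = {}
    for name, path in (("flat", flat_path), ("partitioned", part_path)):
        loaded = spark.read.parquet(path)
        # The final count() is an action, so it forces Spark to run the query.
        _, timings[name] = time_call(
            lambda: loaded.groupBy(partition_col).count().count()
        )
    return timings
```

A single run is noisy, so repeating each query several times and comparing the typical elapsed time gives a more reliable picture.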
No external code or other content has been used in this project, except where specifically provided in the assignment resources.