This repository contains data sets copied from the Cloudera QuickStart VM.
Here are the instructions to set up this repository.
- Clone the repository
git clone https://github.com/dgadiraju/retail_db.git
- This creates a folder called retail_db.
- The folder contains 6 subfolders:
  - customers
  - departments
  - categories
  - products
  - orders
  - order_items
- The files are plain text files. Records are delimited by a newline character, and fields within each record are delimited by commas.
- The tables contain sample data that can be used to create scenarios.
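
As a minimal sketch of the file format described above (newline-delimited records, comma-delimited fields), the following Python reads every file in one table folder into a list of records. The helper names `parse_records` and `read_table` are hypothetical, the sample lines are made up rather than taken from the repository, and UTF-8 encoding is an assumption.

```python
import csv
from pathlib import Path


def parse_records(lines):
    """Split newline-delimited records into lists of comma-delimited fields."""
    # csv.reader handles the comma splitting; blank lines are skipped.
    return [row for row in csv.reader(lines) if row]


def read_table(folder):
    """Read all files in one table folder (for example retail_db/orders)."""
    records = []
    # File names inside each folder are not assumed; every regular file is read.
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            # Assumption: the files are UTF-8 encoded text.
            with path.open(newline="", encoding="utf-8") as handle:
                records.extend(parse_records(handle))
    return records


# Made-up sample lines for illustration only, not real repository data.
sample = ["1,alpha,10.5\n", "2,beta,20.0\n"]
print(parse_records(sample))
```

After cloning, something like `read_table("retail_db/orders")` should return one list of string fields per record. All fields come back as strings, so any numeric conversion is left to the caller.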
You can also create the tables with all their relationships and load the data into them by using create_db.sql.
You can sign up for our courses to learn about Spark, Kafka, and other important technologies by clicking here.