lcardno10 / data-engineering-test-python-pyspark

This project forked from jsainsburyplc/aspire-data-test-python

Standard tech test for Data Engineers

License: MIT License

data-engineering-test-python-pyspark's Introduction

Data Test - Starter Project

Prerequisites

Python (3.8.* or later)

You can install Python either from source or with pyenv.

Check that Python is installed:

python --version
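The version requirement above can also be checked from inside Python. This is a minimal sketch, not part of the project: the name python_is_supported and the MIN_VERSION constant are hypothetical, and the minimum of 3.8 is taken from the prerequisite stated above.

```python
import sys

# Minimum version taken from the prerequisites above (assumption: 3.8).
MIN_VERSION = (3, 8)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # Print the running interpreter version and whether it meets the minimum.
    print(sys.version.split()[0], "supported:", python_is_supported())
```

Comparing only the major and minor components keeps the check independent of the patch release.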

Preferably an IDE such as Visual Studio Code

https://code.visualstudio.com/

Dependencies and data

Creating a virtual environment

Ensure your pip (package manager) is up to date:

pip install --upgrade pip

To check your pip version run:

pip --version

Create the virtual environment in the root of the cloned project:

python -m venv .venv

Activating the newly created virtual environment

Keep the virtual environment active whenever you work on this project. On macOS or Linux, activate it with:

source ./.venv/bin/activate
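If you are unsure whether the environment is active, one common check is to compare sys.prefix with sys.base_prefix, which differ inside an environment created by venv. The helper below is a hypothetical sketch, not something the project provides.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when the interpreter is running inside a venv.

    Inside an environment created with `python -m venv`, sys.prefix points
    at the environment while sys.base_prefix points at the base install.
    The arguments exist only so the check can be exercised with fake values.
    """
    prefix = sys.prefix if prefix is None else prefix
    if base_prefix is None:
        base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return prefix != base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```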

Installing Python requirements

This will install some of the packages you might find useful:

pip install -r requirements.txt

Running tests to ensure everything is working correctly

pytest

Generating the data

A data generator is included as part of the project in ./input_data_generator/main_data_generator.py. It lets you generate a configurable number of months of data.

To run the data generator use:

python ./input_data_generator/main_data_generator.py

This should produce customers, products and transaction data under ./input_data/starter.
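To confirm that the generator wrote something, you can count the files it produced. The sketch below is an illustration only: the function name summarise_generated_data is hypothetical, and because this README does not describe the file names or layout under ./input_data/starter, the code simply groups whatever it finds by top-level entry.

```python
from collections import Counter
from pathlib import Path


def summarise_generated_data(root="./input_data/starter"):
    """Count files beneath `root`, grouped by top-level entry name.

    Returns an empty dict when `root` does not exist, which usually means
    the data generator has not been run yet.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return {}
    counts = Counter()
    for path in root_path.rglob("*"):
        if path.is_file():
            # A file directly under root is grouped under its own name;
            # a nested file is grouped under its first sub-directory.
            counts[path.relative_to(root_path).parts[0]] += 1
    return dict(counts)


if __name__ == "__main__":
    for name, file_count in sorted(summarise_generated_data().items()):
        print(f"{name}: {file_count} file(s)")
```

An empty result after running the generator suggests it wrote to a different location than the one stated above.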

Getting started

The skeleton of a possible solution is provided in ./solution/solution_start.py. You do not have to use this code if you want to approach the problem in a different way.

data-engineering-test-python-pyspark's People

Contributors

gdfsquiq, joshi-hiren, oscar-barlow
