- Introduction
- Setup
- Data
- Notebooks
- Source Code
- Modelling with Prefect and MLflow
- Continuous Integration
- Assets
- Contributing
- License
This project is a part of the MLOps course at xHEC. The main goal of the project is to apply machine learning operations (MLOps) principles to a student dataset, ensuring that the model is reproducible, scalable, and maintainable.
To set up the environment, follow these steps:
- Create and activate the conda environment, then install the Python dependencies:
  cd xhec-mlops-project-student
  conda env create -f environment.yml
  conda activate xhec-mlops
  pip install -r requirements.txt
- Run the web service locally:
  cd src/web_service
  uvicorn main:app --reload
- Build and run the Docker image:
  docker build -t <image-name:tag> -f <dockerfile-name> .
  docker run -p <host-port>:<container-port> <image-name:tag>
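The uvicorn main:app command tells uvicorn to import the module main and serve the ASGI callable named app that it finds there. The project's own main.py is not reproduced here and may well be built on a web framework; the sketch below only illustrates, with no dependencies, the minimal shape such an app object can take. The JSON payload is a hypothetical placeholder.

```python
import json


async def app(scope, receive, send):
    """Minimal ASGI application: answers every HTTP request with a JSON body."""
    if scope["type"] != "http":
        return  # this sketch ignores lifespan and websocket events

    # Hypothetical payload, used only to show a complete response cycle.
    body = json.dumps({"status": "ok"}).encode()

    await send(
        {
            "type": "http.response.start",
            "status": 200,
            "headers": [(b"content-type", b"application/json")],
        }
    )
    await send({"type": "http.response.body", "body": body})
```

Saved as main.py, this file could be served with the same uvicorn command as above.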
The dataset used in this project is the abalone.csv file, located in the data directory. This dataset contains information about abalones, a type of marine mollusk, and is used to predict the age of abalones from various physical measurements.
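As a rough illustration of the prediction target, the sketch below reads the CSV and derives an age from the ring count. It assumes the file follows the layout of the UCI Abalone dataset, with a header row and a column named Rings, where age in years is conventionally taken as rings + 1.5. The column name and the helper names are assumptions, not taken from the project's code.

```python
import csv
from typing import Iterable

# Assumption: the ring count lives in a column named "Rings" (UCI layout).
RINGS_COLUMN = "Rings"


def ages_from_rows(rows: Iterable[dict[str, str]]) -> list[float]:
    """Convert ring counts to ages in years using age = rings + 1.5."""
    return [int(row[RINGS_COLUMN]) + 1.5 for row in rows]


def load_ages(path: str = "data/abalone.csv") -> list[float]:
    """Read the CSV at `path` and return one age per abalone."""
    with open(path, newline="") as f:
        return ages_from_rows(csv.DictReader(f))
```

Calling load_ages() from the repository root would then return one age per row of the dataset, provided the assumed column name matches.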
The eda.ipynb notebook, located in the notebooks directory, contains exploratory data analysis of the abalone dataset. It provides insights into the distribution of the data, relationships between variables, and other aspects that help in building a machine learning model.
The modelling.ipynb notebook, located in the notebooks directory, contains the machine learning model built for predicting the age of abalones. It includes data preprocessing, model training, and evaluation steps.
The preprocessing.py file, located in the src/modelling directory, contains functions for preprocessing the abalone dataset.
The predicting.py file, located in the src/modelling directory, contains functions for making predictions using the trained machine learning model.
The utils.py file, located in the src/modelling directory, contains utility functions used in the modelling process.
The modelling process is orchestrated with Prefect, while tracking and logging are handled by MLflow, in a script named my_prefect.py. This script orchestrates loading the data, preprocessing, training the model, logging metrics to MLflow, and saving the model.
- Start Prefect Server:
  - Navigate to the src/modelling directory.
  - Run:
    prefect server start --host 0.0.0.0
  - Configure Prefect:
    prefect config set PREFECT_API_URL=http://0.0.0.0:4200/api
- Start MLflow UI (in a new terminal tab or window):
  mlflow ui --host 0.0.0.0 --port 5002
- Execute the Flow:
  - With the Prefect server and MLflow UI running, execute your script:
    python my_prefect.py
Visit the Prefect UI at http://0.0.0.0:4200 and the MLflow UI at http://0.0.0.0:5002 to monitor the progress and examine the logged metrics and model. If your browser does not accept 0.0.0.0, use http://localhost:4200 and http://localhost:5002 instead.
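Since the flow expects both servers to be running, it can help to confirm that they answer before launching the script. The following is a minimal sketch using only the standard library. It assumes both servers run on the local machine on the ports used above, and the helper names are hypothetical.

```python
import urllib.error
import urllib.request

# Assumption: both servers run locally, on the ports from the commands above.
SERVICES = {
    "Prefect": "http://127.0.0.1:4200",
    "MLflow": "http://127.0.0.1:5002",
}

# The services are local, so bypass any proxy configured in the environment.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url`, whatever the status."""
    try:
        with _opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, name resolution failure, and similar.
        return False


def unreachable_services(services: dict[str, str]) -> list[str]:
    """Return the names of the services that did not answer."""
    return [name for name, url in services.items() if not is_reachable(url)]
```

Calling unreachable_services(SERVICES) returns an empty list when both UIs respond; any name in the result points to a server that still needs to be started before running python my_prefect.py.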
The project uses GitHub Actions for continuous integration. The configuration is located in the .github/workflows/ci.yaml file.
The assets directory contains images used in the project, such as PR_right.png and PR_wrong.png.
Madhura Nirale, Dikens Celaj, Steve Moses, Amjad Rehan Ibrahim, Zofia Smolen, Kaan Caylan