
skaftenicki / dtu_mlops


Exercises and supplementary material for the machine learning operations course at DTU.

Home Page: https://skaftenicki.github.io/dtu_mlops/

License: Apache License 2.0

Jupyter Notebook 53.40% Python 45.24% Shell 0.46% Dockerfile 0.71% Makefile 0.18%

dtu_mlops's Introduction

Machine Learning Operations

Repository for course 02476 at DTU.

Check out the homepage!

ℹ️ Course information

  • Course responsible

  • 5 ECTS (European Credit Transfer System), corresponding to 140 hours of work

  • 3 week period in January

  • Master level course

  • Grade: Pass/not passed

  • Type of assessment: oral presentation + project report

  • Recommended prerequisites: DTU course 02456 (Deep Learning) or experience with the following topics:

    • General understanding of machine learning (datasets, probability, classifiers, overfitting etc.)
    • Basic knowledge of deep learning (backpropagation, convolutional neural networks, auto-encoders etc.)
    • Coding in PyTorch. The first day we provide some exercises in PyTorch to get everyone's skills up-to-date as fast as possible.

💻 Course setup

Start by cloning or downloading this repository

git clone https://github.com/SkafteNicki/dtu_mlops

If you do not have git installed (yet), we will touch upon it in the course. The folder will contain all the exercise material and lectures for this course. Additionally, you should join our Slack channel, which we use for communication. If the link has expired, write to me.

📂 Course organization

We highly recommend that when going through the material you use the homepage which is the corresponding GitHub Pages version of this repository that is more nicely rendered, and also includes some special HTML magic provided by Material for MkDocs.

The course is divided into sessions, denoted by capital S, and modules, denoted by capital M. A session corresponds to a full day of work if you are following the course, meaning approximately 9 hours of work. Each session (S) corresponds to a topic within MLOps and consists of multiple modules (M) that each cover a specific topic.

Importantly, we distinguish between core modules and optional modules. Core modules will be marked by

!!! info "Core Module"

at the top of their corresponding page. Core modules are important to go through to be able to pass the course. We still highly recommend doing the optional modules.

Additionally, be aware of the following icons throughout the course material:

  • This icon can be expanded to show code belonging to a given exercise

    ??? example

      I will contain some code for an exercise.
    
  • This icon can be expanded to show a solution for a given exercise

    ??? success "Solution"

      I will present a solution to the exercise.
    
  • This icon (1) can be expanded to show a hint or a note for a given exercise { .annotate }

    1. :man_raising_hand: I am a hint or note

🆒 MLOps: What is it?

Machine Learning Operations (MLOps) is a relatively new field that has risen to prominence as machine learning, and deep learning in particular, has become a widely available technology. The term itself is a compound of "machine learning" and "operations" and covers everything that has to do with the management of the production ML lifecycle.

The lifecycle of production ML can largely be divided into three phases:

  1. Design: The initial phase starts with an investigation of the problem. Based on this analysis, several requirements can be prioritized for what we want our future model to do. Since machine learning models need data for training, we also investigate in this step what data we have and whether we need to source it in some other way.

  2. Model development: Based on the design phase, we can begin to devise machine learning algorithms to solve our problems. As always, the initial step often involves doing some data analysis to make sure that our model is learning the signal that we want it to learn. Second is the machine learning engineering phase, where the particular model architecture is chosen. Finally, we also need to do validation and testing to make sure that our model is generalizing well.

  3. Operations: Based on the model development phase, we now have a model that we want to use. The operations phase is where we create an automated pipeline that makes sure that whenever we make changes to our codebase, they get automatically incorporated into our model, such that we do not slow down production. Equally important is the ongoing monitoring of already deployed models to make sure that they behave exactly as we specified.

It is important to note that the three steps form a cycle, meaning that successfully deploying a machine learning model is not the end of it. Your initial requirements may change, forcing you to revisit the design phase. Some new algorithms may show promising results, so you revisit the model development phase to implement them. Finally, you may try to cut the cost of running your model in production, making you revisit the operations phase and try to optimize some steps.

The focus in this course is particularly on the Operations part of MLOps as this is what many data scientists are missing in their toolbox to take all the knowledge they have about data processing and model development into a production setting.

❔ Learning objectives

General course objective

Introduce the student to a number of coding practices that will help them organize, scale, monitor and deploy machine learning models in either a research or production setting. Provide hands-on experience with a number of frameworks, both local and in the cloud, for working with large-scale machine learning models.

This includes:

  • Organize code in an efficient way for easy maintainability and shareability
  • Understand the importance of reproducibility and how to create reproducible containerized applications and experiments
  • Capable of using version control to efficiently collaborate on code development
  • Knowledge of continuous integration (CI) and continuous machine learning (CML) for automating code development
  • Being able to debug, profile, visualize and monitor multiple experiments to assess model performance
  • Capable of using online cloud-based computing services to scale experiments
  • Demonstrate knowledge about different distributed training paradigms within machine learning and how to apply them
  • Deploy machine learning models, both locally and in the cloud
  • Conduct a research project in collaboration with fellow students using the frameworks taught in the course
  • Have lots of fun and share memes! :)

📓 References

Additional reading resources (in no particular order):

  • Ref 1 Introduction blog post for those who have never heard about MLOps and want to get an overview.

  • Ref 2 Great document from Google about the different levels of MLOps.

  • Ref 3 Another introduction to the principles of MLOps and the different stages of MLOps.

  • Ref 4 Great paper about technical debt in machine learning.

  • Ref 5 Interview study that uncovers many of the pain points that ML engineers go through when doing MLOps.

Other courses with content similar to this:

  • Made with ML. Great online MLOps course that also covers additional topics on the foundations of working with ML.

  • Full stack deep learning. Another MLOps online course going through the whole developer pipeline.

  • MLOps Zoomcamp. MLOps online course that includes many of the same topics.

👨‍🏫 Contributing

If you want to contribute to the course, we are happy to have you! Anything from fixing typos to adding new content is welcome. For building the course material locally, it is a simple two-step process:

pip install -r requirements.txt
mkdocs serve

This starts a local server that you can access at http://127.0.0.1:8000 and that automatically updates when you make changes to the course material. When you have something that you want to contribute, please make a pull request.

❕ License

I highly value open source, and the content of this course is therefore free to use under the Apache 2.0 license. If you use parts of this course in your work, please cite using:

@misc{skafte_mlops,
    author       = {Nicki Skafte Detlefsen},
    title        = {Machine Learning Operations},
    howpublished = {\url{https://github.com/SkafteNicki/dtu_mlops}},
    year         = {2024}
}

dtu_mlops's People

Contributors

1p0d, akua21, albertebaht, albertkjoller, anampavicic, andreasaspe, antonrydahl, christianhinge, dependabot[bot], dhsvendsen, fredemandensgit, jaschn, javalborz, jereml99, kasiaotko, laurinedargaud, marcosquilla, michaelfeil, olinestaerke, peterampazzo, rasgaard, rasmuswael, rohboz, shinteki, skaftenicki, sorenhauberg, stefpetro, whitesheep18, yaxin9luo, yecanlee


dtu_mlops's Issues

Answer Error Issue

Hi, I briefly tested the answers you added three days ago and made a pull request to fix an error. Today a friend was doing the homework and asked me why the "official answer" is not working. Would you please check my pull request, or just fix the small errors in your code? Thank you so much for this amazing open source course :)!

Accidental DDoS attack on GitHub

When running the commands at the beginning of module 22, I got the following error message:

{"message":"API rate limit exceeded for 192.38.81.6. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)","documentation_url":"https://docs.github.com/rest/overview/resources-in-the-rest-api#rate-limiting"}

I guess too many people calling the API from the same IP without authentication is triggering some safety features.
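The error message itself points to the usual remedy: authenticated requests get a higher rate limit than anonymous ones, which are counted per IP address. Below is a minimal sketch of attaching a token to a GitHub REST API request using only the Python standard library. The helper name `build_github_request` and the placeholder token are illustrative assumptions, not part of the course material; a real token would be a personal access token that you create yourself.

```python
import urllib.request
from typing import Optional


def build_github_request(url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GitHub REST API request, authenticated only when a token is given."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # Hypothetical token value; in practice, read it from a secret store
        # or an environment variable rather than hard-coding it.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)


# Build (but do not send) a request against the rate limit status endpoint.
request = build_github_request("https://api.github.com/rate_limit", token="example-token")
print(request.has_header("Authorization"))
```

Passing the resulting object to `urllib.request.urlopen` would send the request. Whether this resolves the error in module 22 depends on how the commands there call the API, which is not shown here.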

Mention pytorch-lightning's LightningCLI

Given what is currently in the course, it might be worth mentioning pytorch-lightning's LightningCLI. pytorch-lightning is already covered in M14 - Minimizing boilerplate, but LightningCLI could be a good complement to what is in other sections. The main goals that LightningCLI addresses are:

  • Automatically save the full config (reproducibility)
  • Separation of code from config (good practice)
  • Make things automatically configurable (minimize boilerplate)

When the code is well written, i.e. parameters have type hints, are described in docstrings, and are defined where they are used (M7 - Good coding practice), all of these parameters automatically become configurable (M10 - Config files). There is no need to learn how to use another framework like Hydra.
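The general idea of deriving configuration options from type hints can be sketched with the standard library alone. This is not LightningCLI's implementation or API. It is a simplified illustration under stated assumptions: `train`, `parser_from_signature`, and `run_from_args` are hypothetical names, and every parameter is assumed to have a simple type hint and a default value.

```python
import argparse
import inspect


def train(lr: float = 1e-3, epochs: int = 10, model_name: str = "cnn") -> dict:
    """Hypothetical training entry point; it only echoes its configuration."""
    return {"lr": lr, "epochs": epochs, "model_name": model_name}


def parser_from_signature(func) -> argparse.ArgumentParser:
    """Create one command-line option per type-hinted parameter of func."""
    parser = argparse.ArgumentParser(description=func.__doc__)
    for name, param in inspect.signature(func).parameters.items():
        # The type hint becomes the argparse converter, the default is reused.
        parser.add_argument(f"--{name}", type=param.annotation, default=param.default)
    return parser


def run_from_args(func, argv):
    """Parse argv into keyword arguments and call func with them."""
    args = parser_from_signature(func).parse_args(argv)
    return func(**vars(args))


config = run_from_args(train, ["--lr", "0.01", "--epochs", "5"])
print(config)  # {'lr': 0.01, 'epochs': 5, 'model_name': 'cnn'}
```

Adding a new parameter to `train` makes it configurable from the command line without touching the parsing code, which is the boilerplate reduction the issue describes.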
