decodingml / llm-twin-course

🤖 Learn for free how to build an end-to-end production-ready LLM & RAG system using LLMOps best practices: source code + 11 hands-on lessons

License: MIT License

Python 79.41% TypeScript 15.09% Dockerfile 1.90% Makefile 2.92% Cadence 0.67%
aws bytewax comet-ml generative-ai large-language-models machine-learning-engineering ml-system-design mlops qdrant qwak

llm-twin-course's Introduction

LLM Twin Course: Building Your Production-Ready AI Replica

An End-to-End Framework for Production-Ready LLM Systems by Building Your LLM Twin

From data gathering to productionizing LLMs using LLMOps good practices.

by Paul Iusztin, Alexandru Vesa and Alexandru Razvant


Why is this course different?

By finishing the "LLM Twin: Building Your Production-Ready AI Replica" free course, you will learn how to design, train, and deploy a production-ready LLM twin of yourself powered by LLMs, vector DBs, and LLMOps good practices.

Why should you care? 🫵

→ No more isolated scripts or Notebooks! Learn production ML by building and deploying an end-to-end production-grade LLM system.

What will you learn to build by the end of this course?

You will learn how to architect and build a real-world LLM system from start to finish, from data collection to deployment.

You will also learn to leverage MLOps best practices, such as experiment trackers, model registries, prompt monitoring, and versioning.

The end goal? Build and deploy your own LLM twin.

What is an LLM Twin? It is an AI character that learns to write like a specific person by incorporating that person's style and personality into an LLM.

The architecture of the LLM twin is split into 4 Python microservices:

The data collection pipeline

  • Crawl your digital data from various social media platforms.
  • Clean, normalize and load the data to a Mongo NoSQL DB through a series of ETL pipelines.
  • Send database changes to a RabbitMQ queue using the CDC pattern.
  • ☁️ Deployed on AWS.
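
As a rough illustration of the CDC step above, the sketch below turns a MongoDB change-stream-style event into a JSON message that could be pushed to a queue. The function names `change_event_to_message` and `forward_changes`, the message layout, and the choice of which operations to forward are assumptions made for this example, not the course's code. The `publish` callable stands in for a RabbitMQ client.

```python
import json
from typing import Any, Callable, Dict, Iterable, Optional


def change_event_to_message(event: Dict[str, Any]) -> Optional[str]:
    """Serialize a change event to JSON, or return None to skip it."""
    # Assumed policy: only forward operations that can carry a full document.
    if event.get("operationType") not in {"insert", "update", "replace"}:
        return None
    document = dict(event.get("fullDocument") or {})
    # The document id is stored as a string so the payload stays JSON-friendly.
    document["_id"] = str(event["documentKey"]["_id"])
    payload = {
        "collection": event["ns"]["coll"],
        "operation": event["operationType"],
        "document": document,
    }
    return json.dumps(payload, sort_keys=True)


def forward_changes(events: Iterable[Dict[str, Any]],
                    publish: Callable[[str], None]) -> int:
    """Publish every forwardable event and return how many were sent."""
    sent = 0
    for event in events:
        message = change_event_to_message(event)
        if message is not None:
            publish(message)
            sent += 1
    return sent
```

In a real setup, `events` could be the change stream returned by pymongo's `collection.watch()`, and `publish` a thin wrapper around a RabbitMQ channel. Fields that are not JSON-serializable (dates, for example) would need their own handling.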

The feature pipeline

  • Consume messages from a queue through a Bytewax streaming pipeline.
  • Every message will be cleaned, chunked, embedded (using Superlinked), and loaded into a Qdrant vector DB in real time.
  • ☁️ Deployed on AWS.
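
The clean and chunk steps above can be sketched in plain Python. This is a minimal illustration under assumed rules (whitespace normalization, fixed-size word windows with overlap). The helper names and default sizes are hypothetical, and the embedding and Qdrant loading steps are left out on purpose.

```python
import re
from typing import List


def clean_text(text: str) -> str:
    """Collapse runs of whitespace into single spaces and strip the ends."""
    return re.sub(r"\s+", " ", text).strip()


def chunk_words(text: str, chunk_size: int = 5, overlap: int = 2) -> List[str]:
    """Split text into word windows of chunk_size that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    message = "  LLM twins\n\nlearn to   write like their author  "
    print(chunk_words(clean_text(message), chunk_size=4, overlap=1))
```

Overlapping windows are one common way to keep a sentence that straddles a chunk boundary retrievable from either side. The sizes would normally be tuned to the embedding model.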

The training pipeline

  • Create a custom dataset based on your digital data.
  • Fine-tune an LLM using QLoRA.
  • Use Comet ML's experiment tracker to monitor the experiments.
  • Evaluate and save the best model to Comet's model registry.
  • ☁️ Deployed on Qwak.
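
For intuition on why QLoRA keeps fine-tuning affordable: LoRA freezes a weight matrix of shape d × k and trains two low-rank factors, B (d × r) and A (r × k), so each adapted matrix has r(d + k) trainable parameters. QLoRA additionally stores the frozen base weights in 4-bit precision. The dimensions below are illustrative assumptions, not the course's settings.

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Parameters in the two LoRA factors B (d x r) and A (r x k)."""
    return r * (d + k)


def full_params(d: int, k: int) -> int:
    """Parameters in the original dense weight matrix (d x k)."""
    return d * k


if __name__ == "__main__":
    d, k, r = 4096, 4096, 16  # assumed layer size and LoRA rank
    lora = lora_trainable_params(d, k, r)
    full = full_params(d, k)
    print(f"{lora} of {full} weights are trainable ({lora / full:.2%})")
    # prints: 131072 of 16777216 weights are trainable (0.78%)
```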

The inference pipeline

  • Load and quantize the fine-tuned LLM from Comet's model registry.
  • Deploy it as a REST API.
  • Enhance the prompts using RAG.
  • Generate content using your LLM twin.
  • Monitor the LLM using Comet's prompt monitoring dashboard.
  • ☁️ Deployed on Qwak.
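
The "enhance the prompts using RAG" step can be pictured with a toy retriever. The sketch below ranks stored chunks by cosine similarity to a query vector and pastes the best ones into the prompt. The in-memory `store`, the hand-written vectors, and the prompt wording are all assumptions for illustration. In the course, retrieval goes through the Qdrant vector DB and the vectors come from an embedding model.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float],
             store: List[Tuple[str, Sequence[float]]],
             top_k: int = 2) -> List[str]:
    """Return the texts of the top_k chunks most similar to the query."""
    ranked = sorted(store,
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:top_k]]


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Prepend the retrieved chunks to the user's question."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Use the context below to answer in the author's style.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    store = [
        ("A post about cats", [1.0, 0.0]),
        ("A post about LLMs", [0.0, 1.0]),
        ("A post about both", [0.7, 0.7]),
    ]
    chunks = retrieve([0.0, 1.0], store, top_k=2)
    print(build_prompt("What do you think about LLMs?", chunks))
```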


Alongside the 4 microservices, you will learn to integrate 3 serverless tools.

Who is this for?

Audience: MLE, DE, DS, or SWE who want to learn to engineer production-ready LLM systems using LLMOps good principles.

Level: intermediate

Prerequisites: basic knowledge of Python, ML, and the cloud

How will you learn?

The course contains 11 hands-on written lessons and the open-source code you can access on GitHub.

You can read everything and try out the code at your own pace.

Costs?

The articles and code are completely free. They will always remain free.

But if you plan to run the code while reading, be aware that we use several cloud tools that might generate additional costs.

The cloud computing platforms (AWS, Qwak) have a pay-as-you-go pricing plan. Qwak offers a few hours of free computing. Thus, we did our best to keep costs to a minimum.

For the other serverless tools (Qdrant, Comet), we will stick to their freemium version, which is free of charge.

Lessons

Important

The course is a work in progress. We plan to release a new lesson every 2 weeks.

To understand the entire code step-by-step, check out our articles ↓

The course is split into 11 lessons. Every Medium article will be its own lesson.

System Design

  1. An End-to-End Framework for Production-Ready LLM Systems by Building Your LLM Twin

Data Engineering: Gather & store the data for your LLM twin

  1. The Importance of Data Pipelines in the Era of Generative AI
  2. Change Data Capture: Enabling Event-Driven Architectures

Feature Pipeline: prepare data for LLM fine-tuning & RAG

  1. SOTA Python Streaming Pipelines for Fine-tuning LLMs and RAG - in Real-Time!
  2. The 4 Advanced RAG Algorithms You Must Know to Implement

Training Pipeline: fine-tune your LLM twin

  1. Training data preparation [Module 3] …WIP
  2. Fine-tuning LLM [Module 3] …WIP
  3. LLM evaluation [Module 4] …WIP
  4. Quantization [Module 5] …WIP

Inference Pipeline: serve your LLM twin

  1. Build the digital twin inference pipeline [Module 6] …WIP
  2. Deploy the digital twin as a REST API [Module 6] …WIP

Meet your teachers!

The course is created under the Decoding ML umbrella by:

Paul Iusztin
Senior ML & MLOps Engineer
Alexandru Vesa
Senior AI Engineer
Răzvanț Alexandru
Senior ML Engineer

License

This course is an open-source project released under the MIT license. Thus, as long as you distribute our LICENSE and acknowledge our work, you can safely clone or fork this project and use it as a source of inspiration for whatever you want (e.g., university projects, college degree projects, personal projects, etc.).

🏆 Contribution

A big "Thank you 🙏" to all our contributors! This course is possible only because of their efforts.

llm-twin-course's People

Contributors

alexandruvesa, iusztinpaul, rsergiuistoc

llm-twin-course's Issues

Getting error when running "make local-test-cdc" in module-2

I'm using a Mac M1 (macOS Sonoma). I ran "make local-start". After that, when I ran "make local-test-cdc", I got: "An error occurred: mongo1:30001: [Errno 8] nodename nor servname provided, or not known (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms),mongo3:30003: [Errno 8] nodename nor servname provided, or not known (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms),mongo2:30002: [Errno 8] nodename nor servname provided, or not known (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 30s, Topology Description: , , ]>".

Can anyone help me fix this error?
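
The `[Errno 8] nodename nor servname provided, or not known` part of the message is a hostname lookup failure: the machine running the test could not resolve `mongo1`, `mongo2`, or `mongo3`. A small diagnostic sketch (the helper name `resolvable_hosts` is made up for this example) can confirm which of those names resolve from the host:

```python
import socket
from typing import Dict, Iterable


def resolvable_hosts(hosts: Iterable[str]) -> Dict[str, bool]:
    """Map each hostname to True if the local resolver can look it up."""
    results = {}
    for host in hosts:
        try:
            socket.getaddrinfo(host, None)
            results[host] = True
        except socket.gaierror:
            # Same family of error as "[Errno 8] nodename nor servname provided".
            results[host] = False
    return results


if __name__ == "__main__":
    for host, ok in resolvable_hosts(["mongo1", "mongo2", "mongo3"]).items():
        print(f"{host}: {'resolves' if ok else 'does NOT resolve'}")
```

If the names do not resolve, they are probably known only inside the Docker network. How to make them reachable from the host depends on the local setup and is not described in the report above.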
