🎓 The Practical Data Scientist

🤗 Welcome!

This is a course which teaches you how to turn raw data into useful insights.

This course aims wide. In 6 weeks, we will cover all topics required to join a Data Science team. This includes data munging, data exploration, and machine learning, using Python frameworks like numpy, pandas, matplotlib, and sklearn (for a full list, check the syllabus and the learning goals section).

This course is lean. You will learn just enough to analyse datasets from scratch. However, Data Science is a vast subject, so additional resources are provided for deeper dives into any of its subfields.

This course is pragmatic. Every lecture consists of slides explaining key theoretical concepts, followed by a hands-on Python notebook with coding exercises.

✅ What this course will do:

  • teach you all the theory and skills required to load, manipulate, and analyse structured and unstructured datasets
  • give you hands-on experience training and evaluating Machine Learning models
  • make you employable in the Data Science industry
  • turn you into a jack of all trades
  • guide you to become a master of few
  • share some low quality AI memes

❌ What this course won't do:

  • give you 10 years of experience in ML
  • turn you into a Deep Learning wizard
  • make you publish a paper at NeurIPS
  • build Skynet

🤔 Why this course?

There are excellent online degrees that focus on ML theory, and great practical tutorials that cover frameworks. The Practical Data Scientist blends both in a guided package to bootstrap your Data Science career. This course's mission is to enable all coders to get out there and analyse the world's problems one dataset at a time.

👽 Who is this for?

This course is perfect for the beginner coder looking to start a career in Data Science, the software engineer curious about Machine Learning technologies, or the AI enthusiast searching for more practical experience.

Beginner programming skills are required. A little Python experience (can you define a function?) and some statistical basics (what's a standard deviation?) are recommended.
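As a quick self-check for the two questions above, here is a minimal sketch using only the Python standard library. The function name `sample_std` and the example values are made up for illustration and are not part of the course material.

```python
import math
import statistics


def sample_std(values):
    """Return the sample standard deviation (n - 1 denominator)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    # Average squared distance from the mean, with Bessel's correction.
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance)


heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0]  # made-up example data
result = sample_std(heights_cm)

# Cross-check against the standard library implementation.
assert math.isclose(result, statistics.stdev(heights_cm))
print(round(result, 2))  # 7.91
```

If both the function definition and the formula feel familiar, you likely have the background this course assumes.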

🧑‍🏫 Who teaches this?

Teacher: Amric Trudel has a Science degree from McGill University (Montréal) and studied Computer Science at 42 Paris. He has been the president of 42 Artificial Intelligence, an association in which he developed training programs on Machine Learning for his fellow students. Amric has also given introductory Machine Learning workshops to large company executives. He now works at OCTO Technology as a Data Science consultant and industrializes data-oriented projects for clients like InVivo, TotalEnergies, and Accenture.

🚀 How should I use this course?

The Practical Data Scientist is taught as a live online course on Jungle Program. Join the next micro-class and find more details here.

The course content is also open-source and free to use. Here are a few ideas:

  • For each lecture, read the slides, then go through the notebooks. Complete the 💪 and 🧠 exercises, then flip through the additional resources.
  • Skip straight to a particular section/lecture if you have already taken 50 billion ML courses.
  • Forget about the slides, find the notebook for that method you can't remember how to use, and copy-paste to your heart's content.
  • Test yourself with the assignments. Become the nerdiest Pokemon trainer there ever was.

Notebooks can be viewed in GitHub, viewed in nbviewer, run locally with Jupyter, run with mybinder, or run with Google Colab.

Learning Goals

Data scientists can tame all types of data and reveal their secrets. This course takes us through all the Python tools needed to turn raw data into useful insights.

  • Data Munging
    Students can manipulate and visualise tabular, time series, image, text, and geospatial datasets
  • Unsupervised Learning
    Students can use clustering, dimensionality reduction, and anomaly detection methods
  • Supervised Learning
    Students can train regression and classification models
  • Evaluation and Optimisation
    Students can build accurate Machine Learning models
  • Exploratory Data Analysis
    Students can analyse a public dataset from scratch

Syllabus

Prepwork

Week 1: Data Munging

The nitty-gritty of data analysis: this chapter teaches you how to load, clean, and manipulate basic datasets in Python.
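To give a flavour of what "load, clean, and manipulate" can look like, here is a minimal pandas sketch. The CSV content, column names, and cleaning choice (dropping rows with a missing value) are invented for illustration and are not taken from the course notebooks.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a raw data file.
raw_csv = io.StringIO(
    "city,temperature_c,date\n"
    "Paris,21.5,2021-06-01\n"
    "Paris,,2021-06-02\n"
    "Lyon,25.0,2021-06-01\n"
    "Lyon,27.0,2021-06-02\n"
)

# Load: parse the date column while reading.
df = pd.read_csv(raw_csv, parse_dates=["date"])

# Clean: drop rows with a missing temperature.
clean = df.dropna(subset=["temperature_c"])

# Manipulate: average temperature per city.
mean_by_city = clean.groupby("city")["temperature_c"].mean()
print(mean_by_city)
```

Dropping incomplete rows is only one possible cleaning strategy; filling in missing values is another common choice, depending on the dataset.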

Week 2: Data Exploration

Beyond tables: learn how to extract and communicate key insights from time-series, text, and image datasets.

Week 3: Unsupervised Learning

Analytical power-up: this chapter adds clustering, dimensionality reduction, and anomaly detection to your data analysis techniques.
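The three families of methods named above can be sketched with scikit-learn as follows. This is only an illustration on synthetic data: the choice of KMeans, PCA, and IsolationForest, and every parameter value, are assumptions rather than the specific methods taught in the lectures.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

# Synthetic data: 150 points in 5 dimensions, grouped around 3 centers.
X, _ = make_blobs(n_samples=150, n_features=5, centers=3, random_state=0)

# Clustering: assign each point to one of 3 groups.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Dimensionality reduction: project to 2 dimensions, e.g. for plotting.
X_2d = PCA(n_components=2).fit_transform(X)

# Anomaly detection: flag roughly 5% of points as outliers (-1) vs inliers (1).
flags = IsolationForest(contamination=0.05, random_state=0).fit_predict(X)

print(labels[:10], X_2d.shape, (flags == -1).sum())
```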

Week 4: Supervised Learning

Moving past data summaries: this chapter introduces predictive models to solve fundamental machine learning tasks.
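As a rough idea of what training a predictive model involves, here is a minimal classification sketch with scikit-learn. The dataset (iris), the model (logistic regression), and the split ratio are assumptions chosen for brevity, not necessarily what the lectures use.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data to estimate performance on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.2f}")
```

Keeping the test set out of training is what makes the reported accuracy a fair estimate.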

Week 5: Advanced Supervised Learning

Taming the beast: this chapter shows how to build accurate and effective machine learning models.
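One common route to more reliable models is cross-validated evaluation combined with a hyperparameter search. The sketch below assumes those are among the techniques meant here; the dataset, the model, and the grid of `C` values are illustrative choices only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = load_iris(return_X_y=True)

# Evaluation: 5-fold cross-validation gives 5 accuracy estimates instead of 1.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.2f}")

# Optimisation: try several regularisation strengths and keep the best one.
param_grid = {"C": [0.1, 1.0, 10.0]}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 2))
```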

Week 6: Non-Linear Models

Week 7: Final Project

This chapter tests your skills on a public dataset by completing an exploratory data analysis report, and training at least one Machine Learning model.

Week 8: Final Presentation

In the last week of this course, you will finalize your project, submit your notebook, and present your results in front of your peers in a 10-minute oral presentation.

Assignments

The coursework is split into five small assignments, a final project, and a final presentation.

The small assignments let you synthesise the previous course content on your own and put it into practice. They are all coding exercises: you will be given resources and/or code stubs, and will submit runnable code and some observations.

The final project tests everything that you have learnt from this course. It is a Python notebook report like those data scientists make to share their experimental progress. It tests your ability to design, carry out, and communicate machine learning experiments. This is complemented by the final presentation, a 10-minute talk to synthesize and discuss the results.

License

This work is licensed under a Creative Commons Attribution 4.0 International License. See the LICENSE.txt file for details.

Contributors

camille-vanhoffelen, atrudel
