sandy4321 / -scalable-data-analysis-in-python-with-dask

This project forked from packtpublishing/-scalable-data-analysis-in-python-with-dask

Scalable Data Analysis in Python with Dask, by Packt Publishing

License: MIT License

Jupyter Notebook 100.00%

Scalable Data Analysis in Python with Dask [Video]

This is the code repository for Scalable Data Analysis in Python with Dask [Video], published by Packt. It contains all the supporting project files necessary to work through the video course from start to finish.

About the Video Course

Data analysts, machine learning professionals, and data scientists often use tools such as Pandas, Scikit-learn, and NumPy for data analysis on their personal computers. However, these tools do not scale beyond a single machine, so applying an analysis to a larger dataset forces the analyst to rewrite the computation. If you work on big data with Pandas, you know you can end up waiting up to a whole minute for a simple average of a series, and that’s just for a couple of million rows!

In this course, you’ll learn to scale your data analysis and execute distributed data science projects, from data ingestion through data manipulation to visualization, using Dask. You’ll explore the Dask framework and see how it can be used with other common Python tools such as NumPy, Pandas, matplotlib, Scikit-learn, and more. You’ll work on large datasets, performing exploratory data analysis to investigate each dataset and then report your findings. You’ll learn by applying data analysis principles and statistical techniques to the same massive datasets across different systems.

Throughout the course, we’ll go over the various techniques, modules, and features that Dask has to offer. Finally, you’ll learn to use its offering for machine learning, the Dask-ML package. You’ll also start using parallel processing in your data tasks on your own system, without moving to a distributed environment.

What You Will Learn

  • Understand the concept of block algorithms and how Dask leverages them to load large data.
  • Implement various examples using Dask Arrays, Bags, and DataFrames for efficient parallel computing.
  • Combine Dask with existing Python packages such as NumPy and Pandas.
  • See how Dask works under the hood and the various built-in algorithms it has to offer.
  • Leverage the power of Dask in a distributed setting and explore its various schedulers.
  • Implement an end-to-end machine learning pipeline in a distributed setting using Dask and Scikit-learn.
  • Use Dask Arrays, Bags, and DataFrames for parallel and out-of-core (larger-than-memory) computations.
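
The idea of a block algorithm mentioned in the list above can be sketched in plain Python. The example below is only an illustration and is not Dask's implementation: the helper names `iter_blocks`, `block_summary`, and `blocked_mean` are hypothetical, and the thread pool is used merely to show that each block can be processed independently. The mean of a sequence is computed by reducing each block to a small partial result (sum and count) and then combining the partial results.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, Sequence, Tuple


def iter_blocks(values: Sequence[float], block_size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of at most block_size items."""
    for start in range(0, len(values), block_size):
        yield values[start:start + block_size]


def block_summary(block: Sequence[float]) -> Tuple[float, int]:
    """Reduce one block to the pair (sum, count)."""
    return sum(block), len(block)


def blocked_mean(values: Sequence[float], block_size: int = 1000) -> float:
    """Compute the mean by combining per-block partial results."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Each block is summarized independently, so the work can be handed
    # to a pool of workers. (Hypothetical setup: threads are used only to
    # illustrate independence; they do not speed up pure-Python sums.)
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(block_summary, iter_blocks(values, block_size)))
    total = sum(block_sum for block_sum, _ in partials)
    count = sum(block_count for _, block_count in partials)
    if count == 0:
        raise ValueError("cannot take the mean of an empty sequence")
    return total / count


data = list(range(1, 10_001))
print(blocked_mean(data, block_size=1_000))  # prints 5000.5
```

Because only the small per-block summaries are combined at the end, the same pattern still works when the blocks are read one at a time from disk instead of being held in a single list.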

Instructions and Navigation

Assumed Knowledge

This course is for data scientists, machine learning engineers, and data engineers who want to perform predictive analytics and data science tasks at scale. To fully benefit from the coverage included in this course, a working knowledge of Python coding and familiarity with Python libraries would be beneficial.

Technical Requirements

This course has the following software requirements:

  • Python 3.6
  • Jupyter Notebook
  • IPython package
  • NumPy
  • SciPy
  • Numba
  • Dask
  • Scikit-learn
  • Any web browser for running Jupyter Notebook
  • Basic understanding of programming concepts such as loops and conditional statements
  • Familiarity with Python syntax
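
One way to check the software requirements above before starting is a short script like the following. It is a minimal sketch, not part of the course materials: the helper `find_missing` is hypothetical, and the import names in `REQUIRED_MODULES` are assumptions about how each listed package is imported. It uses only the standard library and reports which of the listed packages cannot be found in the current environment.

```python
import importlib.util
import sys

# Assumed mapping from the names in the requirements list to the
# top-level module each package is imported as.
REQUIRED_MODULES = {
    "Jupyter Notebook": "notebook",
    "IPython": "IPython",
    "NumPy": "numpy",
    "SciPy": "scipy",
    "Numba": "numba",
    "Dask": "dask",
    "Scikit-learn": "sklearn",
}


def find_missing(modules):
    """Return the display names whose module cannot be located."""
    return [
        label
        for label, module_name in modules.items()
        if importlib.util.find_spec(module_name) is None
    ]


print("Python", ".".join(str(part) for part in sys.version_info[:3]))
missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All listed packages were found.")
```

The script only checks that each module can be located; it does not import the packages or compare version numbers.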
Contributors

mohdkashif93, packt-itservice, sanjeetkumar13
