golamrashed / azure-dp-100-data-scientist-study-guide

This project forked from igponce/azure-dp-100-data-scientist-study-guide

Azure DP-100 Data Scientist Study Guide

Azure DP-100 Exam Data Scientist Study Guide

Microsoft Certified: Azure Data Scientist Associate study guide (unofficial)

Link to certification: Azure DP-100

How to build

This little book is written in Markdown using MDBook. To render it, install Rust first and then install MDBook with cargo:

cargo install mdbook

Once MDBook is installed, compile the book by changing to the root folder of this repository and running:

mdbook build
2019-03-06 22:53:13 [INFO] (mdbook::book): Book building has started
2019-03-06 22:53:13 [INFO] (mdbook::book): Running the html backend

The compiled book is located in the 📁 book folder.

Book contents

Certification Objectives:

Define and prepare the development environment

  • Select development environment. May include but is not limited to: assess the deployment environment constraints, analyze and recommend tools that meet system requirements, select the development environment

  • Set up development environment. May include but is not limited to: create an Azure data science environment, configure data science work environments Azure Data Science Virtual Machines

  • Quantify the business problem. May include but is not limited to: define technical success metrics, quantify risks

Prepare data

Transform data into usable datasets

  • develop data structures
  • design a data sampling strategy
  • design the data preparation flow

Perform Exploratory Data Analysis (EDA)

  • Review visual analytics data to discover patterns and determine next steps
  • Identify anomalies, outliers, and other data inconsistencies
  • Create descriptive statistics for a dataset
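
As a small illustration of the EDA objectives above, the following sketch computes descriptive statistics for a single numeric column and flags outliers with the common 1.5 × IQR rule. It is not taken from the guide itself: the function names `describe` and `iqr_outliers`, the sample values, and the choice of the IQR rule are all assumptions made for the example.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    ordered = sorted(values)
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return {
        "count": len(ordered),
        "mean": statistics.fmean(ordered),
        "stdev": statistics.stdev(ordered),
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
    }


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical sample column with one obviously suspicious reading.
sample = [10, 12, 11, 13, 12, 11, 95]
summary = describe(sample)
suspicious = iqr_outliers(sample)
```

In practice a library such as pandas would usually do this work; the standard-library version is used here only to keep the sketch self-contained.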

Cleanse and transform data

  • Resolve anomalies, outliers, and other data inconsistencies.
  • Standardize data formats
  • Set the granularity for data

Perform feature engineering

Perform feature extraction

โป Perform feature extraction algorithms on numerical data

  • Perform feature extraction algorithms on non-numerical data
  • Scale features
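
The feature extraction and scaling objectives above can be sketched with three tiny helpers: min-max scaling, z-score standardization, and one-hot encoding of a non-numerical column. These are generic techniques chosen for illustration, not methods prescribed by the guide, and the helper names are hypothetical.

```python
import statistics


def min_max_scale(values):
    """Rescale numbers linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return [0.0 for _ in values]
    return [(v - mean) / stdev for v in values]


def one_hot(labels):
    """Encode categorical labels as 0/1 vectors, one column per category.

    Columns follow the alphabetical order of the distinct labels.
    """
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


scaled = min_max_scale([2, 4, 6])
z_scores = standardize([2, 4, 6])
encoded = one_hot(["red", "blue", "red"])
```

Min-max scaling is sensitive to outliers because a single extreme value stretches the range, which is one reason standardization is often preferred for data that has not been cleansed yet.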

Perform feature selection

  • Define the optimality criteria
  • Apply feature selection algorithms
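
One simple way to read the two objectives above is as a filter method: pick an optimality criterion (here, absolute Pearson correlation with the target) and keep the best-scoring features. The sketch below is only one possible interpretation; the criterion, the helper names `pearson` and `select_top_k`, and the toy data are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def select_top_k(features, target, k):
    """Return the names of the k features most correlated with the target."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy dataset: one informative, one noisy, one constant feature.
features = {
    "size": [1, 2, 3, 4, 5],
    "noise": [3, 1, 4, 1, 5],
    "constant": [7, 7, 7, 7, 7],
}
target = [2, 4, 6, 8, 10]
chosen = select_top_k(features, target, k=2)
```

A correlation filter only captures linear, one-feature-at-a-time relationships, so wrapper or embedded methods may rank features differently.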

Develop Models

  • Select an algorithmic approach

    • Determine appropriate performance metrics
    • Implement appropriate algorithms
    • Consider data preparation steps that are specific to the selected algorithms
  • Split datasets

    • Determine ideal split based on the nature of the data
    • Determine number of splits
    • Determine relative size of splits
    • Ensure splits are balanced
  • Identify data imbalances

    • Resample a dataset to impose balance
    • Adjust performance metric to resolve imbalances
    • Implement penalization
  • Train the model

    • Select early stopping criteria
    • Tune hyper-parameters
  • Evaluate model performance

    • Score models against evaluation metrics
    • Implement cross-validation
    • Identify and address overfitting
    • Identify root cause of performance results
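
To make the "Split datasets" and "Implement cross-validation" items above more concrete, here is a minimal sketch of a stratified train/test split (so that each class keeps roughly the same share in both parts) and a plain k-fold index generator. The function names, the default test fraction, and the interleaved fold assignment are illustrative assumptions, not part of the exam outline.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split sample indices so each class is represented proportionally.

    Returns (train_indices, test_indices). Every class contributes at
    least one sample to the test set, so very small classes may end up
    with no training samples.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


def k_fold_indices(n_samples, k):
    """Yield (training_indices, validation_indices) for each of k folds."""
    # Interleaved assignment: sample i goes to fold i % k.
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        yield training, validation


# Hypothetical imbalanced label column: 8 samples of "a", 4 of "b".
labels = ["a"] * 8 + ["b"] * 4
train_idx, test_idx = stratified_split(labels, test_fraction=0.25)
folds = list(k_fold_indices(10, 5))
```

Interleaved folds assume the rows are not ordered in a meaningful way; for time series or grouped data, the split strategy would need to respect that ordering or grouping.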
