
ynwa-algo / ibm-ml-ai-cert-project1


IBM ML/AI Certification Projects Repo - EDA, Supervised ML

Jupyter Notebook 100.00%
exploratory-data-analysis feature-engineering feature-selection machinelearning supervised-classification-methods supervised-machine-learning

ibm-ml-ai-cert-project1's Introduction

IBM-ML-AI-CERT-PROJECT - EDA and Supervised Learning:

Repository for my project-based IBM AI/ML certification course.

The Colab notebooks reflect a structured approach to ML/AI problem solving and serve as my general workflow template:

  1. After the problem is defined and understood, plan the data collection and retrieve the data. The Colab files with the phrases "Data Retrieval" and "Data Pull" in their names, in that sequence, are the structured approach (template) to follow for data-pull best practices.
  2. The Colab file with the phrase "Data_Cleaning" in its name is a good template/workflow for cleaning data for ML models.
  3. The Colab file with the phrase "EDA" in its name is a good template to guide EDA (exploratory data analysis). Which exploratory techniques to prioritize will depend on the specific work and use case.
  4. The Colab file with the phrase "feature engineering" in its name is a guide to feature engineering.

Python File Structure Per Workflow

A Files: Data Collection, Cleaning

A1 -IBM_ML_AI_DataRetrieval_SQL_WK2.1

A2 -IBM_ML_AI_WK2_2_Data_Cleaning_Lab

A3 -ibm_ml_ai_datapullsqlwk2_2

A4 -Wk2Data_Cleaning_Lab2

B Files: Exploratory Data Analysis (EDA)

B1 -ML_EDA_FLOW_Wk3_1c_EDA

B2 -Week3_1c_EDA

C Files: Feature Engineering and other Data Preprocessing

C1 -Wk31d_Feature_Engineering

C2 -Wk3_Feature_Engineering2_PCA

D Files: Feature Engineering / Hypothesis Testing (Preproc part 2)

D1 -Wk41e_Hypothesis_Testing

D2 -Wk41f_HypothesisTesting_2

E Files: Regression - Train/Test Split (simple linear regression) & Polynomial Regression

Data files for section E: car price.CSV for simple linear regression; Ames_housing data for polynomial regression; encoded_car_data_PolyFeat.csv

E1 - L2_wk1_linear_regression

E2 - 02bL2_Wk2_LAB_Regression_Train_Test_Split

E3 - 02cL2_Wk2_Polynomial_Regression
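
As a minimal sketch of the train/test split and simple linear regression steps covered in the E notebooks, the following uses only the Python standard library. The data is synthetic, and the helper names (`train_test_split_simple`, `fit_simple_linear_regression`, `mean_squared_error`) are hypothetical and chosen for illustration; the notebooks themselves may rely on a library such as scikit-learn instead.

```python
import random


def train_test_split_simple(xs, ys, test_fraction=0.3, seed=42):
    """Shuffle paired samples and split them into train and test parts."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [xs[i] for i in train_idx]
    y_train = [ys[i] for i in train_idx]
    x_test = [xs[i] for i in test_idx]
    y_test = [ys[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


def fit_simple_linear_regression(xs, ys):
    """Closed-form least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic, noise-free data (an assumption for illustration): y = 2x + 1
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]

# Fit on the training part only, then score on the held-out part.
x_train, x_test, y_train, y_test = train_test_split_simple(xs, ys)
slope, intercept = fit_simple_linear_regression(x_train, y_train)
test_predictions = [slope * x + intercept for x in x_test]
test_mse = mean_squared_error(y_test, test_predictions)
```

Scoring on the held-out part rather than the training part is the point of the split: it estimates how the fitted line behaves on data it has not seen.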

Standardization, chaining steps using pipelines, transformations etc.

F Files: Cross Validation; Grid_CV and Regularization (using Ridge, Lasso, E-Net)

Data files for section E - encoded car data, see notebooks.

F3 and F4 -> This workflow spends time on using the pipeline approach to chain ML steps, on using Gridsearch to select a model's hyperparameters against validation data, and finally on showing the impact of PCA (principal component analysis). F4 introduces PCA for reducing dimensionality.

F1 - 02cL2_Wk3_DEMO_Cross_Validation
F2 - 02dL2_Wk4_DEMO_Regularization
F3 - 02eL2_WK5_LAB_Regularization_jupyterlite.ipynb
F4 - 02eeL2_WK5_Regularization_Techniques.ipynb
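
The pipeline and grid-search workflow described for F3 and F4 could look roughly like the sketch below. It assumes scikit-learn and NumPy are installed and uses synthetic data; the step names (`poly`, `scale`, `pca`, `ridge`) and the parameter grid are illustrative choices, not values taken from the notebooks.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic regression data (an assumption for illustration):
# the target depends on one linear and one quadratic term.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=120)

# Chain preprocessing and the model so every CV fold refits all steps
# on its own training part only.
pipeline = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("ridge", Ridge()),
])

# Hyperparameters are addressed as <step name>__<parameter name>.
param_grid = {
    "poly__degree": [1, 2],
    "pca__n_components": [None, 2],  # None keeps all components
    "ridge__alpha": [0.1, 1.0, 10.0],
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="r2")
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
```

Putting the scaler and PCA inside the pipeline, rather than applying them to the full dataset first, keeps validation folds from leaking into the preprocessing. Comparing the grid results with `pca__n_components` set to `None` and to `2` is one way to see the impact of dimensionality reduction on the score.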


G Files: ML Classification + Advanced Model Types/Reg Techniques: Bagging, Boosting and other Ensemble methods

Logistic regression error metrics (precision, recall, confusion matrix, etc.); the KNN approach to classification; support vector machines; Bagging, Gradient Boosting, XGBoost, Stacking.

G1 - 03aL3_Wk1_Logistic_Regression_Error_Metrics.ipynb
G2 - 03bL3.Wk2_KNN1.ipynb
G3 - 03cL3_Wk3_SVM.ipynb
G4 - 03dL3_Wk3_SVM_RBF.ipynb
G5 - 03dL3_WK3_Decision_Trees.ipynb
G6 - 03e_L3_Wk5_Bagging.ipynb

G7 - 03gL3_Wk5_GradientBoosting_and_Stacking.ipynb

G8 - 3gL3_Wk3_Ada_Boost.ipynb

G9 - 3fL3_WK5_Stacking_Classification.ipynb

G10 - 3g_L3_Wk5_XGBoost.ipynb
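
The error metrics named for the G section (confusion matrix, precision, recall) can be computed by hand for a binary classifier, as in this standard-library sketch. The labels are made up for illustration and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and their harmonic mean (F1), guarding against 0/0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels and predictions (an assumption for illustration).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
# tp=3, fp=1, fn=1, tn=3, so precision = recall = f1 = 0.75
```

Precision measures how many predicted positives were correct, while recall measures how many actual positives were found; which one matters more depends on the cost of false positives versus false negatives in the use case.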


H Files: Dealing with imbalanced datasets; typical patterns of imbalanced data challenges

  • Class Re-weighting method to adjust the impacts of different classes in model training processes
  • Oversampling and Undersampling to generate synthetic datasets and rebalance classes
  • Evaluate consolidated classifiers using robust metrics such as F-score and AUC

H1 - 3hL3Wk6__imbalanced_data.ipynb
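
Two of the techniques listed above, class re-weighting and oversampling, can be sketched with the standard library alone. The weighting uses the common "balanced" heuristic, n_samples / (n_classes * class_count). The oversampling shown is plain random duplication of minority samples, not a synthetic-sample method such as SMOTE. The data and function names are hypothetical.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [s for s, label in zip(samples, labels) if label == cls]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Made-up imbalanced dataset (an assumption): 90 negatives, 10 positives.
labels = [0] * 90 + [1] * 10
samples = [[float(i)] for i in range(100)]

weights = balanced_class_weights(labels)
resampled_samples, resampled_labels = random_oversample(samples, labels)
resampled_counts = Counter(resampled_labels)
```

With this data the minority class receives a weight of 5.0 against roughly 0.56 for the majority class, and the resampled set holds 90 samples of each class. Resampling should be applied to the training split only, so that the evaluation data keeps the original class distribution.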
