GithubHelp home page GithubHelp logo

vvagias / ailibrary Goto Github PK

View Code? Open in Web Editor NEW

This project forked from accessibleai/ailibrary

0.0 0.0 0.0 52.5 MB

License: Apache License 2.0

Python 99.20% Shell 0.08% Jupyter Notebook 0.35% CSS 0.37%

ailibrary's Introduction

The cnvrg AI Library

The AI Library is an ML package manager.
Build, share and reuse machine learning components that are packaged with code, docker containers and compute engine types.
Quickly execute across any kind of infrastructure and automatically track hyper-parameters, metrics, utilization, and more.

Content

  1. 'sk' directories: Those include the AI libraries for machine learning which are backed by scikit learn. Scikit learn is a free software machine learning module in Python language. It features various classification, regression and clustering algorithms. In the cnvrg library we use mostly classification, regression and clustering algorithms.

  2. 'tf2' directories: Those include the AI libraries for deep learning. The purpose of those libraries is not to build or create models, but rather to leverage prepared model structures.
    Currently, the library includes commonly known successful models structures like ResNet50 and vgg16, and they are imported with ImageNet weights, so when used they perform transfer learning.

  3. 'pr' directories: Those include libraries which enables pre-processing of various types of files.

  4. Directories begins with '_': Those are not for user's use, but for internal testing at cnvrg.io and version control.

Metrics

There are some metrics which are created in the libraries.
The metrics are changed between families of algorithm (e.g: classification and regression algorithms has different metrics).

Classification Metrics

model - The name of the output model file.
folds - The number of folds used in the cross validation (if demanded by the user).
train_acc - float in (0, 1).
train_loss - calculated by default by mean_squared_error.
test_acc - float in (0, 1).
test_loss - calculated by default by mean_squared_error.
Example:

Regression Metrics

model - The name of the output model file.
folds - The number of folds used in the cross validation (if demanded by the user).
train_loss_MSE - The train error evaluated by mean squared error.
train_loss_MAE - The train error evaluated by mean absolute error.
train_loss_R2 - The train error evaluated by R2.
test_loss_MSE - The test error evaluated by mean squared error.
test_loss_MAE - The test error evaluated by mean absolute error.
test_loss_R2 - The test error evaluated by R2.
Example:

Deep Learning Metrics

SOON

Clustering Metrics

SOON
Example:

Visualizations

There are several types of visualizations which are built-in the libraries and are plotted automatically.
Not all the libraries plot all the possible visualizations due to differences in the the algorithms.

Classification Report

Classification Report is a textual report shows some main classification metrics.
It is important to notice that classification report is relevant only for classification algorithms.
The metrics which it shows are: precision, recall, f1-score and support.

  • Precision -

  • Recall -

  • F1-score -

  • Support -

Examples:

  1. Iris data set (3 labels):
    first_class_rep

  2. 2-labels data set:
    second_class_rep

Confusion Matrix

Confusion Matrix is a table which evaluates accuracy by computing the confusion matrix with each row corresponding to the true class.
Each row of the matrix represents the instances in a predicted class while each column represents the instances in an actual class (or vice versa).
The name stems from the fact that it makes it easy to see if the system is confusing two classes (i.e. commonly mislabeling one as another).
For instance, lets say we have a data set with 4 different labels: a, b, c, d.
So, the confusion matrix is going to be 4 * 4 table where the y-axis describes the true label, and the x-axis describes the predicted label.

Examples:

  1. 2-labels data set:
    first_conf_mat

  2. 3-labels data set:
    second_conf_mat

Feature Importance

Feature importance is a bars chars where the features names are the x-axis and the y-axis is a range of floats between 0 to 1 (or the opposite).
Each bar, belongs the single feature, represents a float number which is the score of importance of the specific feature in the prediction process.
It can be used for feature selection, dimensionality reduction, improving estimators accuracy and boosting performance.

Examples:
1)
first_fea_imp

second_fea_imp

ROC curve

A ROC (receiver operating characteristic curve) is a graphical plot that illustrates the diagnostic ability of a binary classifier.
The ROC curve is created by plotting the true positive rate (TP/(TP+FN)) against the false positive rate (FP/(FP + TN)) at various threshold settings.
Note: this implementation is restricted to the binary classification task.

Examples:
1)
first_roc_curve

Correlation

Correlation is any statistical relationship between two random variables or bivariate data.
In the broadest sense correlation is any statistical association, though it commonly refers to the degree to which a pair of variables are linearly related.
Correlations are useful because they can indicate a predictive relationship that can be exploited in practice.
Data Scientist or Statisticians might be interested in correlation in order to make dimensionality reduction, improving estimators accuracy and boosting performance.

Examples: 1)
first_corr

feature-vs-feature

When our library detects strong correlation (smaller than -0.7 or greater than 0.7), it automatically produces a scatter plot which presents the correlation.

Examples:

  1. Watch the following correlation table:
    second_corr
    As you can see (in the popped window), there is a high correlation (0.93) between the two features.
    So, the library automatically produces the following scatter plot:
    first-fea-vs-fea
    Which shows the strong relation between the two (where the first grows, the second grows almost linearly).

ailibrary's People

Contributors

leah4kosh avatar yochze avatar omerliberman avatar tomerhasson avatar aaroncnvrg avatar snirbenyosef avatar vvagias avatar robert-cnvrg avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.