antoniopenta / ml-project-structure

Easy and powerful template for ML projects

License: Apache License 2.0

Python 84.15% Shell 15.85%
machine-learning project-structure luigi-workflows luigi luigi-pipeline

Prerequisites

pip install -r requirements.txt

Getting Started

The following command creates the directory structure specified in template.json for your ML project:

python build.py -dir project_example/ -template_file template.json

An example of a structure that can be defined in template.json:

{
  "num_algorithm": 2,
  "num_training_testing_validation": 2,
  "directory_training_suffix": "training",
  "datasets": [
    "dataset_1"
  ],
  "main_directories": [
    "config_pipelines",
    "data_experiments",
    "@framework",
    "@jupyter",
    "@luigi_pipeline"
  ],
  "sub_directories": [
    {
      "father": "config_pipelines",
      "dirs": [
        "directory_algoritm_suffix*num_algorithm",
        "data_generation",
        "data_processing",
        "metrics"
      ]
    }
  ]
}
  • The @ prefix in "@framework" marks the folder as a Python module.
  • num_algorithm specifies how many algorithms you would like to test.
  • "directory_algoritm_suffix*num_algorithm" generates multiple folders: the folder-name suffix (directory_algoritm_suffix) and the number of folders (num_algorithm) are both specified in the template as well.
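The expansion rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code of build.py: the helper names (expand_entry, make_dir, build), the empty __init__.py created for "@" folders, and the zero-based index appended to each generated folder are all assumptions (folder names such as algorithm0 in the experiment table suggest, but do not confirm, that indexing). The example template above does not define directory_algoritm_suffix, so the demo supplies a hypothetical value. Keys whose meaning the README does not describe (datasets, num_training_testing_validation, directory_training_suffix) are ignored.

```python
import os
import tempfile


def expand_entry(entry, template):
    """Expand one directory entry from template.json into (name, is_module) pairs.

    Hypothetical rules, assumed rather than taken from build.py:
    - a leading "@" marks a Python module folder;
    - "suffix_key*count_key" yields template[count_key] folders named
      template[suffix_key] followed by a zero-based index.
    """
    is_module = entry.startswith("@")
    name = entry[1:] if is_module else entry
    if "*" in name:
        suffix_key, count_key = name.split("*", 1)
        return [(f"{template[suffix_key]}{i}", is_module)
                for i in range(template[count_key])]
    return [(name, is_module)]


def make_dir(path, is_module):
    """Create a folder; module folders also get an empty __init__.py (assumption)."""
    os.makedirs(path, exist_ok=True)
    if is_module:
        open(os.path.join(path, "__init__.py"), "a").close()


def build(root, template):
    """Create the template's main_directories and sub_directories under root."""
    for entry in template["main_directories"]:
        for name, is_module in expand_entry(entry, template):
            make_dir(os.path.join(root, name), is_module)
    for group in template.get("sub_directories", []):
        for entry in group["dirs"]:
            for name, is_module in expand_entry(entry, template):
                make_dir(os.path.join(root, group["father"], name), is_module)


if __name__ == "__main__":
    template = {
        "num_algorithm": 2,
        "directory_algoritm_suffix": "algorithm",  # hypothetical value, not in the README example
        "main_directories": ["config_pipelines", "data_experiments", "@framework"],
        "sub_directories": [{
            "father": "config_pipelines",
            "dirs": ["directory_algoritm_suffix*num_algorithm", "metrics"],
        }],
    }
    with tempfile.TemporaryDirectory() as root:
        build(root, template)
        # prints ['algorithm0', 'algorithm1', 'metrics']
        print(sorted(os.listdir(os.path.join(root, "config_pipelines"))))
        # prints True: "@framework" became a package folder
        print(os.path.exists(os.path.join(root, "framework", "__init__.py")))
```

Keeping the "@" and "*" handling in one small function means main directories and sub-directories follow exactly the same rules.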

Luigi Pipeline for experiments

The pipeline_example folder contains a dummy example of how to use a Luigi pipeline to evaluate a KMeans algorithm.

More information on the amazing Luigi framework (or "Gigino" to friends in Naples) can be found at https://github.com/spotify/luigi

The main idea is to define the experiments in an Excel spreadsheet, as follows:

experiment | diminstance | clusters | n_features | random_state | file_dataframe | file_label_true | k | file_label_predicted | file_metrics
1 | 100@data_generation | 10@data_generation | 5@data_generation | 0@data_generation | data_experiments/data_generation/file_dataframe.csv@file | data_experiments/data_generation/file_label_true.csv@file | 10@kmeansalgo0 | data_experiments/algorithm0/file_label_predicted_algorithm0_1.csv@file | data_experiments/metrics/metrics_algorithm_1.csv@file
2 | 100@data_generation | 10@data_generation | 5@data_generation | 0@data_generation | data_experiments/data_generation/file_dataframe.csv@file | data_experiments/data_generation/file_label_true.csv@file | 20@kmeansalgo0 | data_experiments/algorithm0/file_label_predicted_algorithm0_2.csv@file | data_experiments/metrics/metrics_algorithm_2.csv@file
3 | 100@data_generation | 10@data_generation | 5@data_generation | 0@data_generation | data_experiments/data_generation/file_dataframe.csv@file | data_experiments/data_generation/file_label_true.csv@file | 30@kmeansalgo0 | data_experiments/algorithm0/file_label_predicted_algorithm0_3.csv@file | data_experiments/metrics/metrics_algorithm_3.csv@file

Each row is an experiment, and each column is an attribute of the configuration file. The @ separator specifies the key of the dictionary (the section) in the configuration file under which the attribute is stored. For example:

experiment | k
1 | 10@kmeansalgo0

becomes the following in the configuration file:

[kmeansalgo0]
k = 10

The configuration file is extracted from the Excel file by the Python script update_config_files.py.
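The "value@section" convention can be illustrated with a short sketch. It is not the code of update_config_files.py; row_to_config is a hypothetical helper, and it assumes the experiment row has already been read from the sheet (for example with pandas.read_excel(..., sheet_name=...) followed by DataFrame.to_dict(orient="records")), a step left out here to keep the sketch standard-library only.

```python
import configparser
import io


def row_to_config(row):
    """Convert one experiment row (column name -> cell) into a ConfigParser.

    Sketch only: a cell of the form "value@section" becomes `column = value`
    under [section]; cells without "@" (such as the experiment id) are skipped.
    """
    # interpolation=None so that values containing "%" are stored verbatim
    config = configparser.ConfigParser(interpolation=None)
    for column, cell in row.items():
        cell = str(cell)
        if "@" not in cell:
            continue
        value, section = cell.rsplit("@", 1)  # split on the last "@"
        if not config.has_section(section):
            config.add_section(section)
        config.set(section, column, value)
    return config


if __name__ == "__main__":
    row = {
        "experiment": 1,
        "k": "10@kmeansalgo0",
        "file_metrics": "data_experiments/metrics/metrics_algorithm_1.csv@file",
    }
    buffer = io.StringIO()
    row_to_config(row).write(buffer)
    print(buffer.getvalue())
```

For experiment 1 the [kmeansalgo0] section ends up with k = 10, and the file paths are grouped under a [file] section. Note that configparser lowercases option names by default, which is harmless here because the column names are already lowercase.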

The Bash script exp_cluster.sh runs the pipeline in two steps.

First, the configuration file is created from the settings defined for experiment 1:

python scripts/update_config_files.py -excel_file experimental_settings/experiments_metafile.xlsx -sheet exp_cluster -experiment 1 -conf_file config_pipelines/data_generation/evaluation_pipeline.conf

Then the pipeline is launched with the configuration file created above:

luigi --module luigi_pipeline.evaluation_pipeline GenerateData --conf config_pipelines/data_generation/evaluation_pipeline.conf --local-scheduler --no-lock
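Running several experiments means repeating both steps for each row of the sheet. A possible Python driver is sketched below; it is a hypothetical stand-in for exp_cluster.sh (whose full contents are not shown here), it reuses only the commands and paths given above, and the names commands_for and run_experiments are illustrative. Because every experiment rewrites the same evaluation_pipeline.conf, the experiments are run one after another rather than in parallel.

```python
import subprocess
import sys

# Paths and names copied from the two commands above.
EXCEL_FILE = "experimental_settings/experiments_metafile.xlsx"
SHEET = "exp_cluster"
CONF_FILE = "config_pipelines/data_generation/evaluation_pipeline.conf"


def commands_for(experiment):
    """Return the two commands for one experiment: write the config, then run Luigi."""
    update_config = [
        sys.executable, "scripts/update_config_files.py",
        "-excel_file", EXCEL_FILE,
        "-sheet", SHEET,
        "-experiment", str(experiment),
        "-conf_file", CONF_FILE,
    ]
    run_pipeline = [
        "luigi", "--module", "luigi_pipeline.evaluation_pipeline", "GenerateData",
        "--conf", CONF_FILE,
        "--local-scheduler", "--no-lock",
    ]
    return [update_config, run_pipeline]


def run_experiments(experiments):
    """Run the experiments sequentially; stop at the first failing command."""
    for experiment in experiments:
        for command in commands_for(experiment):
            subprocess.run(command, check=True)  # raises CalledProcessError on failure
```

Calling run_experiments([1, 2, 3]) from the repository root would cover the three rows of the example sheet. sys.executable is used instead of a bare python so that the script runs under the same interpreter as the driver.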

Authors

Antonio Penta
