dsa2's Introduction

Check here whether the latest commit is working:

Testing code

 Main:                test_fast_linux, test_full
 Multi:               test_fast_linux, test_full
 Preprocessors Check: test_preprocess

Looking for contributors

 Help maintain this Data Science / ML repo and set up its roadmap.
 The goal is to unify Data Science and Machine Learning.
 The basic idea is to have one single dictionary/JSON for
        model, compute, and data definitions,
 --> easy to define, easy to track, easy to modify.
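
As a rough illustration of the single-dictionary idea, the sketch below keeps model, compute, and data settings in one plain Python dict and round-trips it through JSON. Every key name and value here is a hypothetical placeholder chosen for illustration; it is not the schema dsa2 actually uses.

```python
import copy
import json

# Hypothetical layout: all keys and values are illustrative assumptions,
# not the real dsa2 schema.
config = {
    "model":   {"name": "my_classifier", "params": {"n_estimators": 100}},
    "compute": {"metrics": ["accuracy"], "n_jobs": 1},
    "data":    {"path_train": "data/input/mydata/train/",
                "cols_num": ["age"], "cols_cat": ["sex"], "col_y": "label"},
}


def with_override(cfg, section, key, value):
    """Return a copy of cfg with one setting changed; the original is untouched."""
    new_cfg = copy.deepcopy(cfg)
    new_cfg[section][key] = value
    return new_cfg


# Easy to track: the whole experiment definition serializes to one JSON string.
config_json = json.dumps(config, indent=2, sort_keys=True)
restored = json.loads(config_json)

# Easy to modify: derive a variant without editing the base definition.
variant = with_override(config, "compute", "n_jobs", 4)
```

Because the definition is plain data, it can be saved next to the trained model and compared between runs.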

Install

 git clone 
 cd dsa2
 pip install -r zrequirements.txt

Basic usage

python  titanic_classifier.py  preprocess    --nsample 1000
python  titanic_classifier.py  train         --nsample 2000
python  titanic_classifier.py  predict
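
The commands above follow a "script, subcommand, options" pattern. A minimal sketch of that pattern is shown below, assuming `argparse` for parsing; the function bodies are placeholders, and how `titanic_classifier.py` actually dispatches its commands is not shown in this README.

```python
import argparse


# Placeholder step functions: the real steps live in titanic_classifier.py.
def preprocess(nsample=None):
    return f"preprocess nsample={nsample}"


def train(nsample=None):
    return f"train nsample={nsample}"


def predict(nsample=None):
    return f"predict nsample={nsample}"


COMMANDS = {"preprocess": preprocess, "train": train, "predict": predict}


def main(argv=None):
    """Parse 'command [--nsample N]' and call the matching step function."""
    parser = argparse.ArgumentParser(description="Sketch of a subcommand script")
    parser.add_argument("command", choices=sorted(COMMANDS))
    parser.add_argument("--nsample", type=int, default=None)
    args = parser.parse_args(argv)
    return COMMANDS[args.command](nsample=args.nsample)


# Same arguments as on the command line, passed as a list.
result_train = main(["train", "--nsample", "2000"])
result_predict = main(["predict"])
```

In a real script, `main()` would be called with no arguments so that it reads the options from the command line.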

How to train on a new dataset?

1) Put your data file   in   data/input/mydata/raw/   

2) Update script        in   data/input/mydata/clean.py
   to load column names, basic profile...


3) Run  python clean.py train_test
    which generates train and test data in :   
       data/input/mydata/train/features.parquet   target.parquet  (y label)        
       data/input/mydata/test/features.parquet    target.parquet  (y label)                
            
4) Copy  titanic_classifier.py  to  mydata_classifier.py

5) Modify the script     mydata_classifier.py
    to match your dataset and the models you want to test.
      
6) Run 
    python  mydata_classifier.py  train
    python  mydata_classifier.py  predict
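
Step 3 can be pictured with the sketch below, which shuffles a DataFrame and splits it into features and target for train and test. It assumes pandas; the function names, the 80/20 split, and the column name `label` are hypothetical, and the real `clean.py` may work differently.

```python
import os

import pandas as pd


def split_train_test(df, col_y, test_frac=0.2, seed=42):
    """Shuffle rows, then return ((X_train, y_train), (X_test, y_test))."""
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_test = int(len(shuffled) * test_frac)
    test, train = shuffled.iloc[:n_test], shuffled.iloc[n_test:]

    def to_xy(part):
        # Features are every column except the label; target is the label only.
        return part.drop(columns=[col_y]), part[[col_y]]

    return to_xy(train), to_xy(test)


def save_split(features, target, out_dir):
    """Write features.parquet and target.parquet (needs a parquet engine such as pyarrow)."""
    os.makedirs(out_dir, exist_ok=True)
    features.to_parquet(os.path.join(out_dir, "features.parquet"))
    target.to_parquet(os.path.join(out_dir, "target.parquet"))


# Tiny made-up frame, for illustration only.
df = pd.DataFrame({
    "age":   [22, 38, 26, 35, 54, 2, 27, 14, 58, 20],
    "label": [0, 1, 1, 1, 0, 0, 1, 1, 1, 0],
})
(train_X, train_y), (test_X, test_y) = split_train_test(df, col_y="label")
```

Calling `save_split(train_X, train_y, "data/input/mydata/train/")` and the same for the test pair would produce the file layout listed in step 3.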

Examples

  In example/

List of preprocessors

    #### Data Over/Under sampling 
    prepro_sampler.pd_autoencoder(df,col, pars)
    
    prepro_sampler.pd_col_genetic_transform(df,col, pars)        
    prepro_sampler.pd_colcat_encoder_generic(df,col, pars)
    
    prepro_sampler.pd_filter_resample(df,col, pars)
    prepro_sampler.pd_filter_rows(df,col, pars)


    #### Category, Numerical
    prepro.pd_autoencoder(df,col, pars)
    prepro.pd_col_genetic_transform(df,col, pars)
    
    prepro.pd_colcat_bin(df,col, pars)
    prepro.pd_colcat_encoder_generic(df,col, pars)
    prepro.pd_colcat_minhash(df,col, pars)
    prepro.pd_colcat_to_onehot(df,col, pars)
    
    prepro.pd_colcross(df,col, pars)
    prepro.pd_coldate(df,col, pars)
    
    prepro.pd_colnum(df,col, pars)
    prepro.pd_colnum_bin(df,col, pars)
    prepro.pd_colnum_binto_onehot(df,col, pars)
    prepro.pd_colnum_normalize(df,col, pars)
    prepro.pd_colnum_quantile_norm(df,col, pars)

    
    #### Text        
    prepro.pd_coltext(df,col, pars)
    prepro.pd_coltext_clean(df,col, pars)
    prepro.pd_coltext_universal_google(df,col, pars)
    prepro.pd_coltext_wordfreq(df,col, pars)
    
    
    #### Target label encoding
    prepro.pd_coly(df,col, pars)
    
    prepro.pd_filter_resample(df,col, pars)
    prepro.pd_filter_rows(df,col, pars)
    prepro.pd_label_clean(df,col, pars)


    #### Time Series 
    prepro_tseries.pd_ts_autoregressive(df,col, pars)
    prepro_tseries.pd_ts_basic(df,col, pars)
    prepro_tseries.pd_ts_date(df,col, pars)
    
    prepro_tseries.pd_ts_detrend(df,col, pars)
    prepro_tseries.pd_ts_generic(df,col, pars)
    prepro_tseries.pd_ts_groupby(df,col, pars)
    prepro_tseries.pd_ts_identity(df,col, pars)
    prepro_tseries.pd_ts_lag(df,col, pars)
    prepro_tseries.pd_ts_onehot(df,col, pars)
    prepro_tseries.pd_ts_rolling(df,col, pars)
    prepro_tseries.pd_ts_template(df,col, pars)
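
All preprocessors listed above share the call shape `(df, col, pars)`. The sketch below shows what a custom function with that shape might look like. The function name, the `stats` key, and the return value (a new DataFrame plus a dict of fitted parameters) are assumptions made for illustration; the README does not state what the real preprocessors return.

```python
import pandas as pd


def pd_colnum_minmax_sketch(df, col, pars):
    """Hypothetical preprocessor: scale the numeric columns in `col` to [0, 1].

    df   : input DataFrame
    col  : list of column names to transform
    pars : dict of options; an optional "stats" entry reuses fitted min/max
    """
    pars = dict(pars or {})
    stats = pars.get("stats")
    if stats is None:
        # Fit step: record min and max of each column.
        stats = {c: (float(df[c].min()), float(df[c].max())) for c in col}

    out = pd.DataFrame(index=df.index)
    for c in col:
        lo, hi = stats[c]
        span = (hi - lo) or 1.0  # avoid division by zero on constant columns
        out[c + "_minmax"] = (df[c] - lo) / span
    return out, {"stats": stats}


# Fit on training data, then reuse the fitted statistics on new data.
df_train = pd.DataFrame({"age": [10.0, 20.0, 30.0]})
df_new, fitted = pd_colnum_minmax_sketch(df_train, ["age"], {})

df_pred = pd.DataFrame({"age": [20.0]})
df_pred_new, _ = pd_colnum_minmax_sketch(df_pred, ["age"], fitted)
```

Returning the fitted statistics lets the same transformation be replayed at prediction time without recomputing it from the new data.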

dsa2's People

Contributors

arita37, soheil-star01, vladimir9390, priyabrata409, notamine, akouaouchissam, mozin, elaynousse, cihanyatbaz, hammadtufail, ananthmanoj, manoharkaranth, mysterious588, bhbharat, deepsourcebot, stamatelou, ekotik, abdallah1097, sachin636, iamrehman, yashkumar1992
