
yangjiamu / sklearn-transform


This project forked from coreylynch/sklearn-transform


Collection of scripts for doing common transformations in machine learning

Python 100.00%

sklearn-transform's Introduction

# Categorical DataFrames to Sparse SVM-Light Format

Converts a categorical design matrix X to a sparse CSR matrix, then writes it to SVM-light format.

This takes a pandas DataFrame with categorical features, converts category values to a sparse one-hot representation, then writes the sparse matrix to SVM-light format.
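
The conversion described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual implementation: the helper name `one_hot_to_csr` and the column names are hypothetical, and it assumes the categorical columns contain no missing values. It builds one sparse one-hot block per column, stacks the blocks into a single CSR matrix, and hands the result to scikit-learn's `dump_svmlight_file`.

```python
import io

import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.datasets import dump_svmlight_file


def one_hot_to_csr(df, cat_columns):
    """Hypothetical helper: one-hot encode the given columns into a CSR matrix."""
    n_rows = len(df)
    blocks = []
    for col in cat_columns:
        # factorize maps each distinct value to an integer code 0..k-1,
        # numbered in order of first appearance (assumes no missing values).
        codes, uniques = pd.factorize(df[col])
        block = sparse.csr_matrix(
            (np.ones(n_rows), (np.arange(n_rows), codes)),
            shape=(n_rows, len(uniques)),
        )
        blocks.append(block)
    # Place the per-column blocks side by side.
    return sparse.hstack(blocks, format="csr")


df = pd.DataFrame({"proto": ["tcp", "udp", "tcp"],
                   "color": ["red", "blue", "red"]})
X_sparse = one_hot_to_csr(df, ["proto", "color"])
y = np.array([1.0, 0.0, 1.0])

# Write to an in-memory binary buffer instead of a file on disk.
buf = io.BytesIO()
dump_svmlight_file(X_sparse, y, buf, zero_based=True)
svm_text = buf.getvalue().decode("utf-8")
```

Each row of `X_sparse` has exactly one non-zero entry per categorical column, so the written file stays small even when a column has many distinct values.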

SVM-light is a text-based format with one sample per line. It does not store zero-valued features, which makes it suitable for sparse datasets.

The first element of each line can be used to store a target variable to predict.
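
To make the line layout concrete, here is a small pure-Python sketch of how one sample could be turned into one line of text. The function `to_svmlight_line` is hypothetical and uses simplified number formatting; the files written by scikit-learn use their own formatting for values.

```python
def to_svmlight_line(target, features, zero_based=True):
    """Format one dense sample as 'target index:value ...', skipping zeros."""
    offset = 0 if zero_based else 1
    pairs = [
        f"{index + offset}:{value:g}"
        for index, value in enumerate(features)
        if value != 0  # zero-valued features are simply not written
    ]
    return " ".join([f"{target:g}"] + pairs)


zero_based_line = to_svmlight_line(1.0, [0, 1, 0, 0, 2])
one_based_line = to_svmlight_line(1.0, [0, 1, 0, 0, 2], zero_based=False)
```

Here `zero_based_line` is `"1 1:1 4:2"` and `one_based_line` is `"1 2:1 5:2"`: the target comes first, followed by only the non-zero `index:value` pairs.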

Parameters

X : pandas DataFrame, shape = [n_samples, n_features]
    Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape = [n_samples]
    Target values.

filename : string or file-like in binary mode
    If a string, the path of the file that will contain the data. If file-like, the data will be written to it; the file should be opened in binary mode.

cat_columns : array-like
    List of categorical columns.

num_columns : array-like, optional
    List of numerical columns.

zero_based : boolean, optional
    Whether column indices should be written zero-based (True) or one-based (False).

comment : string, optional
    Comment to insert at the top of the file. This should be either a Unicode string, which will be encoded as UTF-8, or an ASCII byte string.

query_id : array-like, shape = [n_samples]
    Array containing pairwise preference constraints (qid in svmlight format).
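
The `zero_based`, `comment`, and `query_id` parameters share their names with parameters of scikit-learn's `dump_svmlight_file`. Assuming they are passed through to that function unchanged (the header in the example output below suggests it is used for writing), their effect can be observed with scikit-learn alone. The helper names and the toy matrix below are made up for illustration.

```python
import io

import numpy as np
from scipy import sparse
from sklearn.datasets import dump_svmlight_file

# Toy data: two samples, three features.
X = sparse.csr_matrix(np.array([[1.0, 0.0, 2.0],
                                [0.0, 3.0, 0.0]]))
y = np.array([1.0, 0.0])
qid = np.array([7, 8])


def dump_to_text(**kwargs):
    """Hypothetical helper: dump X, y to an in-memory buffer and return the text."""
    buf = io.BytesIO()
    dump_svmlight_file(X, y, buf, **kwargs)
    return buf.getvalue().decode("utf-8")


def data_lines(text):
    """Return only the sample lines, dropping '#' header lines."""
    return [line for line in text.splitlines() if not line.startswith("#")]


zero_lines = data_lines(dump_to_text(zero_based=True))
one_lines = data_lines(dump_to_text(zero_based=False))
qid_lines = data_lines(dump_to_text(zero_based=True, query_id=qid))
header_lines = [line for line in dump_to_text(comment="demo run").splitlines()
                if line.startswith("#")]
```

With `zero_based=True` the first feature of the first sample is written with index 0, with `zero_based=False` it gets index 1. When `query_id` is given, a `qid:<n>` token appears directly after the target, and the `comment` text shows up in the `#` header lines at the top.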

Examples

    import pandas as pd
    import numpy as np

    # dump_categorical_df_to_svm_light is the function documented above;
    # import it from wherever this repository's script lives in your project.

    category_data_1 = ['tcp', 'udp', 'udp', 'tcp', 'dns', 'tcp']
    category_data_2 = ['red', 'blue', 'red', 'green', 'blue', 'red']
    numerical_data = [1, 2, 1, 1, 3, 4]
    data = {'category_data_1': category_data_1,
            'category_data_2': category_data_2,
            'numerical_data': numerical_data}
    X = pd.DataFrame(data)
    y = np.array([1., 0., 1., 1., 0., 0.])

    cat_columns = ['category_data_1', 'category_data_2']
    num_columns = ['numerical_data']

    # Writes the one-hot encoded data to a file named 'example'.
    dump_categorical_df_to_svm_light(X, y, 'example', cat_columns, num_columns)

    $ head example
    # Generated by dump_svmlight_file from scikit-learn 0.13-git
    # Column indices are zero-based
    1.000000 2:1.0000000000000000e+00 4:2.0000000000000000e+00
    0.000000 0:1.0000000000000000e+00 2:1.0000000000000000e+00 4:2.0000000000000000e+00
    1.000000 0:1.0000000000000000e+00 4:2.0000000000000000e+00
    1.000000 2:1.0000000000000000e+00 3:1.0000000000000000e+00 4:1.0000000000000000e+00
    0.000000 1:1.0000000000000000e+00 2:1.0000000000000000e+00 4:3.0000000000000000e+00
    0.000000 2:1.0000000000000000e+00 4:5.0000000000000000e+00
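
One way to sanity-check a written file is to read it back with scikit-learn's `load_svmlight_file`. The sketch below does not call this repository's function; it writes two sample lines in the same format as the output above to a temporary file and loads them again, assuming zero-based column indices.

```python
import os
import tempfile

from sklearn.datasets import load_svmlight_file

# Two lines in the same layout as the example output above.
sample = (
    "1.000000 0:1.0000000000000000e+00 4:2.0000000000000000e+00\n"
    "0.000000 1:1.0000000000000000e+00 2:1.0000000000000000e+00 "
    "4:3.0000000000000000e+00\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example")
    with open(path, "w") as handle:
        handle.write(sample)
    # Returns a SciPy CSR matrix and a NumPy array of targets.
    X_loaded, y_loaded = load_svmlight_file(path, zero_based=True)

dense = X_loaded.toarray()
```

`X_loaded` comes back as a sparse matrix with one row per line, and `y_loaded` holds the leading target values, so `dense` can be compared against the original DataFrame after one-hot encoding.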

sklearn-transform's People

Contributors

coreylynch
