
amirhszd / jostar

Feature Selection Library for Data Sciences in Python

Home Page: https://www.mdpi.com/2072-4292/13/16/3241

License: MIT License

optimization feature-selection feature genetic-algorithm particle-swarm-optimization ant-colony-optimization metaheuristics differential-evolution sequential-search multiobjective-optimization

jostar's Introduction

Jostar

Feature Selection Module for Data Sciences in Python

Jostar, from the Farsi word جستار meaning finder, is a Python-based feature selection module comprising nine feature selection approaches, ranging from single-objective to multi-objective methods, for regression and classification tasks. The algorithms to date are:

  • Ant Colony Optimization (ACO)
  • Differential Evolution (DE)
  • Genetic Algorithm (GA)
  • Plus-L Minus-R (LRS)
  • Non-dominated Sorting Genetic Algorithm (NSGAII)
  • Particle Swarm Optimization (PSO)
  • Simulated Annealing (SA)
  • Sequential Forward Selection (SFS)
  • Sequential Backward Selection (SBS)
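
SFS is the simplest of the methods listed above. The following is a minimal, library-independent sketch of the idea, not jostar's implementation. It assumes a hypothetical user-supplied score(subset) callable in place of a fitted model and mirrors the weight convention from the GA documentation (+1 to maximize, -1 to minimize):

```python
from typing import Callable, List, Sequence


def sequential_forward_selection(
    n_features: int,
    n_f: int,
    score: Callable[[Sequence[int]], float],
    weight: int = 1,
) -> List[int]:
    """Greedily grow a feature subset, one feature at a time (illustrative sketch)."""
    selected: List[int] = []
    remaining = list(range(n_features))
    while len(selected) < n_f and remaining:
        # Try adding each remaining feature and keep the one with the best weighted score.
        best = max(remaining, key=lambda j: weight * score(selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy score: each feature has a fixed "importance" and a subset scores their sum.
importance = [0.1, 0.9, 0.3, 0.7, 0.2]
print(sequential_forward_selection(5, 2, lambda s: sum(importance[j] for j in s)))  # [1, 3]
```

SBS is the mirror image: it starts from all features and removes the least useful one at each step. LRS alternates L forward steps with R backward steps.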

Features

  • User-friendly, sklearn-like interface (just call fit)
  • Thorough documentation and explanation of hyperparameters and their ranges
  • Tune hyperparameters easily
  • Generate rankings of the selected features
  • Display the results of your classification or regression task

Example

With only a few lines of code:

from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import r2_score
from jostar.algorithms import ACO
import pandas as pd

# Load a CSV whose first column is the target and whose remaining columns are features
# (replace the file name with the path to your own data)
data = pd.read_csv("regression_data_v2.csv").to_numpy()
x = data[:, 1:]
y = data[:, 0]
model = PLSRegression()

# Select 5 features with ACO over 30 iterations; weight=1 maximizes the R^2 score
aco = ACO(model=model, n_f=5, weight=1, scoring=r2_score, n_iter=30)

aco.fit(x, y, decor=0.95, scale=True)

[Figure: Evolution of the Pareto front created via NSGAII]
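
NSGAII ranks candidate feature subsets by Pareto dominance rather than by a single score. The following is a minimal sketch of that ranking step, not jostar's implementation. It assumes two objectives that are both minimized, here hypothetically the number of selected features and a validation error. It uses a simple repeated scan instead of the bookkeeping of NSGA-II's fast non-dominated sort:

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(points: Sequence[Sequence[float]]) -> List[List[int]]:
    """Split point indices into successive Pareto fronts; front 0 is the best."""
    remaining = set(range(len(points)))
    fronts: List[List[int]] = []
    while remaining:
        # A point belongs to the current front if no other remaining point dominates it.
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Hypothetical objectives per candidate subset: (number of features, validation error).
objectives = [(2, 0.30), (3, 0.20), (4, 0.25), (5, 0.10), (3, 0.35)]
print(non_dominated_sort(objectives))  # [[0, 1, 3], [2, 4]]
```

Front 0 corresponds to the Pareto front. NSGA-II also uses a crowding distance to prefer well-spread solutions within a front, which this sketch omits.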

Installation

Use pip as below to install jostar.

pip install jostar

To test whether your installation was successful, change to the project directory and run pytest:

pytest

Documentation

Jostar comes with thorough in-line documentation. Below is the docstring for the Genetic Algorithm (GA) class.

class GA(BaseFeatureSelector):

    def __init__(self, model, n_f, weight, scoring, n_gen=1000, n_pop=20, cv=None,
                 cross_perc=0.5, mut_perc=0.3, mut_rate=0.02, beta=5,
                 verbose=True, random_state=None, **kwargs):
        """
        Genetic Algorithm (GA) is a widely used global optimization algorithm
        first introduced by Holland. GA is inspired by natural selection in
        evolutionary theory. Properties of GA such as the probabilities of
        mutation and crossover determine the specifics of the search done in
        each iteration. Additionally, one can set the proportion of the
        population on which crossover or mutation is performed.

        Parameters
        ----------
        model : class
            Instantiated sklearn regression or classification estimator.
        n_f : int
            Number of features to select.
        weight : int
            Maximization or minimization objective: use +1 for maximization
            and -1 for minimization.
        scoring : callable
            Callable sklearn score or custom score function that takes in
            y_true and y_pred.
        n_gen : int, optional
            Maximum number of generations (iterations). For more complex
            problems it is better to set this parameter larger.
            The default is 1000.
        n_pop : int, optional
            Population size at each iteration. Typically, this parameter is
            set to 10*n_f, but it depends on the complexity of the model, and
            users are advised to tune it for their problem. The default is 20.
        cv : class, optional
            Instantiated sklearn cross-validation class. The default is None.
        cross_perc : float, optional
            The fraction of the population to perform crossover on. A common
            choice for this parameter is 0.5. The larger cross_perc is, the
            more the current population is exploited. The default is 0.5.
        mut_perc : float, optional
            The fraction of the population to perform mutation on. This is
            usually chosen smaller than cross_perc. The larger mut_perc is,
            the more the search explores. The default is 0.3.
        mut_rate : float, optional
            The mutation rate, i.e., the probability of mutation for each
            individual in a population. It is often chosen small to preserve
            the current good solutions. The default is 0.02.
        beta : int, optional
            Selection pressure for crossover. The higher this parameter, the
            stricter the selection of parents for crossover. This value is
            typically an integer in [1, 10]. The default is 5.
        verbose : bool, optional
            Whether to print progress messages. The default is True.
        random_state : int, optional
            Determines the random state for random number generation,
            cross-validation, and the classification/regression model.
            The default is None.

        Returns
        -------
        Instantiated optimization model class.

        """

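The docstring describes each GA hyperparameter but not how they interact. The following is a minimal, stdlib-only sketch of one way they can fit together for fixed-size subset selection. It is not jostar's GA. It assumes a hypothetical score(subset) callable in place of a fitted sklearn model. The Boltzmann-style parent selection driven by beta, the union-based crossover, and the per-feature swap mutation are illustrative choices. The defaults mirror the signature above, except that n_gen is lowered to 50 for a quick demo:

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Subset = Tuple[int, ...]


def ga_select(
    n_features: int,
    n_f: int,
    score: Callable[[Sequence[int]], float],
    weight: int = 1,
    n_gen: int = 50,
    n_pop: int = 20,
    cross_perc: float = 0.5,
    mut_perc: float = 0.3,
    mut_rate: float = 0.02,
    beta: float = 5.0,
    random_state: Optional[int] = None,
) -> Subset:
    """Search for a size-n_f feature subset maximizing weight * score(subset) (sketch)."""
    rng = random.Random(random_state)
    features = range(n_features)

    def fitness(s: Subset) -> float:
        # weight=+1 maximizes the score, weight=-1 minimizes it.
        return weight * score(s)

    def select_parent(pop: List[Subset], fits: List[float]) -> Subset:
        # Boltzmann roulette wheel: a larger beta favors fitter parents more strongly.
        best, worst = max(fits), min(fits)
        spread = (best - worst) or 1.0
        probs = [math.exp(beta * (f - best) / spread) for f in fits]
        return rng.choices(pop, weights=probs, k=1)[0]

    def crossover(a: Subset, b: Subset) -> Subset:
        # The child keeps n_f features drawn from the union of both parents.
        pool = sorted(set(a) | set(b))
        return tuple(sorted(rng.sample(pool, n_f)))

    def mutate(s: Subset) -> Subset:
        # Each selected feature is swapped for an unselected one with probability mut_rate.
        chosen = set(s)
        for f in list(chosen):
            if rng.random() < mut_rate:
                outside = [g for g in features if g not in chosen]
                if outside:
                    chosen.remove(f)
                    chosen.add(rng.choice(outside))
        return tuple(sorted(chosen))

    pop = [tuple(sorted(rng.sample(features, n_f))) for _ in range(n_pop)]
    n_cross = round(cross_perc * n_pop)
    n_mut = round(mut_perc * n_pop)
    for _ in range(n_gen):
        fits = [fitness(s) for s in pop]
        children = [crossover(select_parent(pop, fits), select_parent(pop, fits))
                    for _ in range(n_cross)]
        mutants = [mutate(rng.choice(pop)) for _ in range(n_mut)]
        # Elitist survival: keep the n_pop fittest distinct subsets.
        merged = set(pop) | set(children) | set(mutants)
        pop = sorted(merged, key=lambda s: (fitness(s), s), reverse=True)[:n_pop]
    return max(pop, key=fitness)


# Hypothetical toy score: the sum of fixed per-feature importances.
importance = [0.1, 0.9, 0.3, 0.7, 0.2]
print(ga_select(5, 2, lambda s: sum(importance[j] for j in s), random_state=0))
```

Raising cross_perc adds more recombined children per generation (exploitation of current parents), while raising mut_perc or mut_rate injects more random swaps (exploration).
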
Developing Jostar

If you would like to develop and run tests for Jostar, run the following command in your virtual environment to install the dev dependencies:

$ pip install -e .[dev]

Acknowledgement and References

Jostar is an extended Python version of YPEA, which was developed in MATLAB. If you find this project useful in your research, please consider citing our paper.

@article{Hassanzadeh_2021, title={Broadacre Crop Yield Estimation Using Imaging Spectroscopy from 
Unmanned Aerial Systems (UAS): A Field-Based Case Study with Snap Bean}, volume={13}, 
ISSN={2072-4292}, url={http://dx.doi.org/10.3390/rs13163241}, DOI={10.3390/rs13163241}, 
number={16}, journal={Remote Sensing}, publisher={MDPI AG}, 
author={Hassanzadeh, Amirhossein and Zhang, Fei and van Aardt, Jan and Murphy, 
Sean P. and Pethybridge, Sarah J.}, year={2021}, month={Aug}, pages={3241}}

jostar's People

Contributors

amirhszd

jostar's Issues

dataset issue

When I use my dataset, which consists of 5 inputs and 1 output (all numerical), I get this error:
index 0 is out of bounds for axis 0 with size 0
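
This message is NumPy's IndexError for reading position 0 of an empty array, so empty or mis-shaped inputs are worth ruling out first. The following is a minimal pre-flight check. check_inputs is a hypothetical helper, not part of jostar. It assumes x is a 2-D array-like of features and y a 1-D target. As an unconfirmed precaution, it also requires n_f to be smaller than the number of feature columns, so that with 5 inputs there is still something left to choose:

```python
from typing import Sequence


def check_inputs(x: Sequence[Sequence[float]], y: Sequence[float], n_f: int) -> None:
    """Hypothetical sanity check to run before calling fit (not part of jostar)."""
    n_samples = len(x)
    if n_samples == 0:
        raise ValueError("x has no rows; check how the data was loaded and sliced")
    n_features = len(x[0])
    if any(len(row) != n_features for row in x):
        raise ValueError("rows of x have different lengths")
    if len(y) != n_samples:
        raise ValueError(f"x has {n_samples} rows but y has {len(y)} values")
    # Assumption: n_f should leave at least one feature unselected.
    if not 1 <= n_f < n_features:
        raise ValueError(f"n_f must be between 1 and {n_features - 1}, got {n_f}")


# Five numerical inputs and one output, as in the report above.
x = [[0.1, 0.2, 0.3, 0.4, 0.5], [0.6, 0.7, 0.8, 0.9, 1.0]]
y = [1.0, 2.0]
check_inputs(x, y, n_f=3)  # passes silently
```

The same function accepts 2-D NumPy arrays such as the x and y produced by the example above, because it only relies on len() and row iteration.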

Installation error: 'sklearn' PyPI package is deprecated, use 'scikit-learn' rather than 'sklearn' for pip commands

I was trying to install Jostar when I encountered the following error:

`pip install jostar
Collecting jostar
  Downloading jostar-0.0.4-py3-none-any.whl (65 kB)
     ---------------------------------------- 65.8/65.8 kB 3.7 MB/s eta 0:00:00
Collecting pathos
  Downloading pathos-0.3.0-py3-none-any.whl (79 kB)
     ---------------------------------------- 79.8/79.8 kB 4.3 MB/s eta 0:00:00
Collecting sklearn
  Downloading sklearn-0.0.post1.tar.gz (3.6 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [18 lines of output]
      The 'sklearn' PyPI package is deprecated, use 'scikit-learn'
      rather than 'sklearn' for pip commands.

      Here is how to fix this error in the main use cases:
      - use 'pip install scikit-learn' rather than 'pip install sklearn'
      - replace 'sklearn' by 'scikit-learn' in your pip requirements files
        (requirements.txt, setup.py, setup.cfg, Pipfile, etc ...)
      - if the 'sklearn' package is used by one of your dependencies,
        it would be great if you take some time to track which package uses
        'sklearn' instead of 'scikit-learn' and report it to their issue tracker
      - as a last resort, set the environment variable
        SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True to avoid this error

      More information is available at
      https://github.com/scikit-learn/sklearn-pypi-package

      If the previous advice does not cover your use case, feel free to report it at
      https://github.com/scikit-learn/sklearn-pypi-package/issues/new
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.`

My environment
Windows 10 computer
Used Anaconda to create a new environment
Python 3.9

Expected behavior
The tool installs correctly.

Screenshots
N/A

Additional context
N/A
