inoueakimitsu / milwrap

Wrapping single instance learning algorithms for fitting them to data for multiple instance learning

License: MIT License

Python 1.98% Jupyter Notebook 98.02%
multiple-instance-learning multi-class-classification large-data sklearn machine-learning python

milwrap's Introduction

milwrap

Python package for multiple instance learning (MIL). This wraps single instance learning algorithms so that they can be fitted to data for MIL.

Features

  • supports count-based multiple instance assumptions (see Wikipedia)
  • supports the multi-class setting
  • supports scikit-learn algorithms (such as RandomForestClassifier, SVC, LogisticRegression)

Installation

pip install milwrap

Usage

For more information, see the article "Use scikit-learn models in multiple instance learning based on the count-based assumption".

# Prepare single-instance supervised-learning algorithm
# Note: only supports models with predict_proba() method.
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression()

# Wrap it with MilCountBasedMultiClassLearner
from milwrap import MilCountBasedMultiClassLearner 
mil_learner = MilCountBasedMultiClassLearner(clf)

# Prepare the following dataset
#
# - bags ... list of np.ndarray
#            (num_instance_in_the_bag * num_features)
# - lower_threshold ... np.ndarray (num_bags * num_classes)
# - upper_threshold ... np.ndarray (num_bags * num_classes)
# - n_classes ... int (number of classes)
#
# bags[i_bag] contains at least lower_threshold[i_bag, i_class]
# and at most upper_threshold[i_bag, i_class] instances of class i_class.

# run multiple instance learning
clf_mil, y_mil = mil_learner.fit(
    bags,
    lower_threshold,
    upper_threshold,
    n_classes,
    max_iter=10)

# after multiple instance learning,
# you can predict instance class
clf_mil.predict([instance_feature])

See tests/test_countbased.py for an example of a fully working test data generation process.
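
As a rough illustration of the input shapes described above, the following sketch builds a toy dataset with NumPy only. It is not the generation process used in tests/test_countbased.py; the Gaussian blobs, the bag sizes, and the `margin` value are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_classes = 3
n_features = 5
n_bags = 4

# Hypothetical toy data: each class is a Gaussian blob around its own center.
centers = rng.normal(scale=5.0, size=(n_classes, n_features))

bags = []
counts = np.zeros((n_bags, n_classes), dtype=int)
for i_bag in range(n_bags):
    n_instances = int(rng.integers(20, 40))
    # Hidden instance labels; in real MIL data these are not observed.
    labels = rng.integers(0, n_classes, size=n_instances)
    bags.append(centers[labels] + rng.normal(size=(n_instances, n_features)))
    counts[i_bag] = np.bincount(labels, minlength=n_classes)

# Loosen the exact per-class counts into intervals (margin is arbitrary).
margin = 2
lower_threshold = np.maximum(counts - margin, 0)
upper_threshold = counts + margin
```

The resulting `bags`, `lower_threshold`, `upper_threshold`, and `n_classes` have the shapes that the usage snippet expects.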

License

milwrap is available under the MIT License.

milwrap's Issues

Allow y to be initialized externally

With the current initialization of y, an instance can never be initially assigned to a rare class. This may cause learning to fail.

Allow for subsampling when inputting large training data

When the total number of instances is large, the computation time of most scikit-learn trainers becomes unacceptably long.
In early iterations of multiple instance learning, it may not be necessary to include all instances in the training data.
Therefore, we will add a feature that adjusts the number of instances subsampled for training according to how far multiple instance learning has converged.
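
A minimal sketch of what such subsampling might look like is shown below. It is not part of milwrap; the function names `subsample_fraction` and `subsample_indices` are hypothetical. For simplicity it ties the fraction to the iteration number rather than to a measured convergence criterion.

```python
import numpy as np


def subsample_fraction(i_iter, max_iter, start=0.1, end=1.0):
    """Linearly grow the fraction of instances used, from start to end."""
    if max_iter <= 1:
        return end
    return start + (end - start) * i_iter / (max_iter - 1)


def subsample_indices(n_instances, fraction, rng):
    """Pick a random subset of instance indices without replacement."""
    n_keep = min(n_instances, max(1, int(round(n_instances * fraction))))
    return rng.choice(n_instances, size=n_keep, replace=False)
```

Early iterations would then train on a small random subset, and later iterations on progressively more of the data.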

How to wrap a deep learning model

start_date: 2021/10/16
due_date: 2021/10/16
progress: 0
parent: 0
dependon:
  - '1'

We need to create a procedure to wrap deep learning models.

If the model exposes an interface similar to scikit-learn's, there is no need to change the current implementation.

For Keras, you can use keras.wrappers.scikit_learn.KerasClassifier.

For PyTorch, I do not know how to do this at the moment.
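
To illustrate what "an interface similar to scikit-learn" could mean here, the sketch below defines a small NumPy softmax classifier as a stand-in for a deep model. The class `TinySoftmaxClassifier` is hypothetical. Beyond `predict_proba()`, which the README names as required, the exact set of methods and attributes that milwrap relies on is an assumption.

```python
import numpy as np


class TinySoftmaxClassifier:
    """Hypothetical stand-in for a deep model with a scikit-learn-like interface."""

    def __init__(self, n_epochs=200, lr=0.1):
        self.n_epochs = n_epochs
        self.lr = lr

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Map arbitrary class labels to 0..n_classes-1 and one-hot encode them.
        onehot = np.eye(len(self.classes_))[np.searchsorted(self.classes_, y)]
        self.W_ = np.zeros((X.shape[1], len(self.classes_)))
        self.b_ = np.zeros(len(self.classes_))
        # Plain batch gradient descent on the cross-entropy loss.
        for _ in range(self.n_epochs):
            grad = (self.predict_proba(X) - onehot) / len(X)
            self.W_ -= self.lr * X.T @ grad
            self.b_ -= self.lr * grad.sum(axis=0)
        return self

    def predict_proba(self, X):
        logits = np.asarray(X, dtype=float) @ self.W_ + self.b_
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        exp = np.exp(logits)
        return exp / exp.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[self.predict_proba(X).argmax(axis=1)]
```

A wrapper around a real deep learning framework would follow the same shape: train in `fit()`, return class probabilities from `predict_proba()`, and return class labels from `predict()`.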

Initial labels for imbalance data entry

For data with large imbalance between classes, the initial labels may be inappropriate.

For example, rare classes cannot be assigned as initial labels and therefore cannot be trained.
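
One possible way to build initial labels that always include rare classes is to start from the lower thresholds, as sketched below. This is not milwrap's current initialization. The function `initial_labels_for_bag` is hypothetical, and it assumes the thresholds are feasible for the bag size. Which instance receives which label is random here; the sketch only guarantees the per-class counts.

```python
import numpy as np


def initial_labels_for_bag(n_instances, lower, upper, rng):
    """Return initial labels whose per-class counts lie within [lower, upper]."""
    lower = np.asarray(lower, dtype=int)
    upper = np.asarray(upper, dtype=int)
    if not lower.sum() <= n_instances <= upper.sum():
        raise ValueError("thresholds are infeasible for this bag size")
    # Start from the lower bounds so every class with lower > 0 is represented.
    counts = lower.copy()
    # Hand out the remaining instances one at a time to the class with most slack.
    for _ in range(n_instances - counts.sum()):
        counts[np.argmax(upper - counts)] += 1
    labels = np.repeat(np.arange(len(counts)), counts)
    rng.shuffle(labels)
    return labels
```

For a bag of 10 instances with `lower = [1, 0, 2]` and `upper = [3, 10, 4]`, class 0 receives at least one initial label even though it is rare.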

pypi cannot show emoticons in README.md

On PyPI, the emoticons in the heading of README.md are garbled.

I think it is necessary either to remove the emoticons from README.md or to add a step in setup.py that strips emoticons from the strings registered on PyPI.
