
ntucllab / libact


Pool-based active learning in Python

Home Page: http://libact.readthedocs.org/

License: BSD 2-Clause "Simplified" License

Python 75.12% C 8.93% C++ 14.02% Cython 1.93%
machine-learning-library active-learning machine-learning uncertainty-sampling

libact's People

Contributors

alexandreabraham, ariapoy, dlackty, eugene-yang, hsuantien, iamyuanchung, jkleint, kh-huang, kjacks21, lazywei, lsc36, sian-chen, skgg, tungen, wadkar, yangarbiter

libact's Issues

Problems installing in Linux

Hello,

I am trying to install libact on the HPC facilities of my university. However, I get the following error every time I try to install it:

error: Command "gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/rmegret/irodriguez/anaconda3/envs/bee/lib/python3.6/site-packages/numpy/core/include -I/usr/include/lapacke -I/home/rmegret/irodriguez/anaconda3/envs/bee/include/python3.6m -c libact/query_strategies/src/variance_reduction/variance_reduction.c -o build/temp.linux-x86_64-3.6/libact/query_strategies/src/variance_reduction/variance_reduction.o -std=c11" failed with exit status 1

I have tried pip and cloning the repo and then using setup.py.

Just in case, here are the specifications of the HPC: https://www.hpcf.upr.edu/documentation/boqueron/#ffs-tabbed-15

Allow make_query to return multiple items (or the entire scored set)

In certain applications, you might want to know what the top N unlabelled entities are so that a human can go through and do batch labeling offline. Right now I have a particularly hacky way of getting multiple results out, which just assumes the majority class in the update, but it would be great to tweak the make_query function to return an arbitrary number of ordered results for batch label processing.

for i in range(20):
    item_to_investigate = qs.make_query()
    # Hack: pretend the item was labeled with the majority class (0)
    # so that the next call returns a different item.
    libact_ds.update(item_to_investigate, 0)
    print(item_to_investigate)

Happy to contribute code to try to help this happen!
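
To make the request concrete, here is a minimal sketch of what "return the top N" could look like once a per-entry score is available. The function name `top_n_queries` and the score dictionary are hypothetical and are not part of libact's API; how a given query strategy would expose its scores is left open.

```python
import heapq


def top_n_queries(scores, n):
    """Return the ids of the n highest-scoring unlabeled entries, best first.

    `scores` maps an entry id to a query score (higher = more worth labeling).
    Both the function and the score dictionary are hypothetical, not libact API.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    # nlargest tracks only n candidates instead of sorting every entry.
    return heapq.nlargest(n, scores, key=scores.get)


# Example: five unlabeled entries with made-up scores.
example_scores = {10: 0.91, 11: 0.15, 12: 0.67, 13: 0.98, 14: 0.40}
batch = top_n_queries(example_scores, 3)
print(batch)
```

Unlike the loop above, this does not need to write placeholder labels into the dataset in order to obtain more than one candidate.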

HintSVM mldataset - Buffer dtype mismatch error

Hi,

I am trying to use the HintSVM query strategy with the vehicle dataset from mldata.
However, I get the following error and I don't understand why:

File "testing.py", line 60, in run
    ask_id = qs.make_query()
  File "/usr/local/lib/python3.5/site-packages/libact-0.1.2-py3.5-macosx-10.12-x86_64.egg/libact/query_strategies/hintsvm.py", line 151, in make_query
    np.array([x.tolist() for x in unlabeled_pool]), self.svm_params)
  File "libact/query_strategies/_hintsvm.pyx", line 16, in libact.query_strategies._hintsvm.hintsvm_query (libact/query_strategies/_hintsvm.c:1836)
ValueError: Buffer dtype mismatch, expected 'float64_t' but got 'long'

I don't get this error when I use other strategies (UncertaintySampling, Quire).

def split_scale_train_test(name_dataset,test_size):
    # choose a dataset with unbalanced class instances
    #data = sklearn.datasets.fetch_mldata('segment')
    data = sklearn.datasets.fetch_mldata(name_dataset)

    X = StandardScaler().fit_transform(data['data'])
    target = np.unique(data['target'])
    # mapping the targets to 0 to n_classes-1
    y = np.array([np.where(target == i)[0][0] for i in data['target']])

    X_trn, X_tst, y_trn, y_tst = \
        train_test_split(X, y, test_size=test_size, stratify=y)

    # making sure each class appears once initially
    init_y_ind = np.array(
        [np.where(y_trn == i)[0][0] for i in range(len(target))])
    y_ind = np.array([i for i in range(len(X_trn)) if i not in init_y_ind])
    trn_ds = Dataset(
        np.vstack((X_trn[init_y_ind], X_trn[y_ind])),
        np.concatenate((y_trn[init_y_ind], [None] * (len(y_ind)))))

    tst_ds = Dataset(X_tst, y_tst)

    fully_labeled_trn_ds = Dataset(
        np.vstack((X_trn[init_y_ind], X_trn[y_ind])),
        np.concatenate((y_trn[init_y_ind], y_trn[y_ind])))

    cost_matrix = 2000. * np.random.rand(len(target), len(target))
    np.fill_diagonal(cost_matrix, 0)

    return trn_ds, tst_ds, y_trn, y_tst, fully_labeled_trn_ds, cost_matrix


def run(trn_ds, tst_ds, lbr, model, qs, quota):
    E_in, E_out = [], []
    score_train = []
    score_test = []

    for _ in range(quota):
        ask_id = qs.make_query()
        X, _ = zip(*trn_ds.data)
        lb = lbr.label(X[ask_id])
        trn_ds.update(ask_id, lb)

        model.train(trn_ds)
        E_in = np.append(E_in, 1 - model.score(trn_ds))
        E_out = np.append(E_out, 1 - model.score(tst_ds))
        score_train = np.append(score_train,model.score(trn_ds)*100)
        score_test = np.append(score_test,model.score(tst_ds)*100)

    return E_in, E_out, score_train, score_test


qs5 = HintSVM(trn_ds5, cl=1.0, ch=1.0, p=0.5)
model = SVM(kernel='rbf', C=n_C, gamma=n_gamma, decision_function_shape='ovr')
E_in_5, E_out_5, score_train_5, score_test_5 = run(trn_ds5, tst_ds, idealLabels, model, qs5, quota_to_query)
results_out.append(E_out_5.tolist())
results_score.append(score_test_5.tolist())

Do you have any insight into this error?

Thank you.
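
The message "expected 'float64_t' but got 'long'" suggests that an integer array reached a Cython routine that expects float64. The traceback does not show which argument that was, so the following is only a sketch of the kind of cast that could be tried before the data is handed to the query strategy. The helper name `as_float64` is hypothetical.

```python
import numpy as np


def as_float64(values):
    """Return `values` as a C-contiguous float64 array (copies only if needed)."""
    return np.ascontiguousarray(values, dtype=np.float64)


# An integer array, which Cython reports as 'long' on many platforms.
int_features = np.array([[1, 2], [3, 4]])
float_features = as_float64(int_features)
print(int_features.dtype, "->", float_features.dtype)
```

Whether the features, the labels, or another array needs this treatment here is an open question; checking the `dtype` of each array that is passed in is a reasonable first step.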

QS: Model type check at constructor

For QSs that rely on a user-given model, a type check should be performed in the constructor, since different QSs require different capabilities (e.g. UncertaintySampling requires a ContinuousModel).
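
A minimal sketch of such a constructor check is given below. The classes are stand-ins defined only for this example (the names `Model` and `ContinuousModel` follow the issue text, `UncertaintySamplingSketch` is hypothetical); they are not libact's actual implementation.

```python
class Model:
    """Stand-in base class; not libact's actual implementation."""


class ContinuousModel(Model):
    """Stand-in for a model that can output real-valued scores."""

    def predict_real(self, feature):
        raise NotImplementedError


class UncertaintySamplingSketch:
    """Hypothetical query strategy that validates its model up front."""

    def __init__(self, model):
        if not isinstance(model, ContinuousModel):
            raise TypeError(
                "model must be a ContinuousModel, got %s" % type(model).__name__
            )
        self.model = model


# A plain Model is rejected at construction time rather than at query time.
try:
    UncertaintySamplingSketch(Model())
    rejected = False
except TypeError as exc:
    rejected = True
    message = str(exc)

accepted = UncertaintySamplingSketch(ContinuousModel())
```

Failing in the constructor gives the user an error that names the required capability, instead of an attribute error deep inside the first query.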

Enhancement for unit testing

For now, the unit tests for the active learning algorithms rely on results from real-world data with fixed random seeds. If a future modification to these algorithms conflicts with the current tests, it will need to be handled carefully.

A more rigorous approach is to design artificial datasets for testing. We'll leave that as a future development goal.
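
As one illustration of what an artificial dataset could look like, the sketch below generates two clusters that cannot overlap, so the correct labels are known by construction. The function name and its parameters are hypothetical and only meant to show the idea.

```python
import random


def make_two_clusters(n_per_class=20, gap=10.0, seed=0):
    """Generate two well-separated 1-D clusters with known labels.

    Class 0 is centred at 0.0 and class 1 at `gap`. The noise is uniform in
    [-1, 1], so with gap > 2 the two classes never overlap.
    """
    rng = random.Random(seed)  # local generator, no global seed side effects
    features, labels = [], []
    for label in (0, 1):
        centre = label * gap
        for _ in range(n_per_class):
            features.append([centre + rng.uniform(-1.0, 1.0)])
            labels.append(label)
    return features, labels


X, y = make_two_clusters()
# Because the clusters cannot overlap, a threshold halfway between the
# centres separates them exactly.
predicted = [int(x[0] > 5.0) for x in X]
```

A test built on data like this can assert properties that hold for any correct implementation, rather than exact numbers tied to one random seed.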

Is a specific version of Python required when compiling? Compile error using "python setup.py install"

Hello, and thank you for providing this project.

After installing the dependencies, I ran
python setup.py install

but I got some errors:

Platform Detection: Linux. Link to liblapacke...
running install
running build
running build_py
running build_ext
building 'libact.query_strategies._variance_reduction' extension
C compiler: x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC

compile options: '-I/usr/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/lapacke -I/usr/include/python2.7 -c'
extra options: '-std=c11'
x86_64-linux-gnu-gcc: libact/query_strategies/src/variance_reduction/variance_reduction.c
libact/query_strategies/src/variance_reduction/variance_reduction.c:26:15: error: variable ‘moduledef’ has initializer but incomplete type
static struct PyModuleDef moduledef = {
^
libact/query_strategies/src/variance_reduction/variance_reduction.c:27:5: error: ‘PyModuleDef_HEAD_INIT’ undeclared here (not in a function)
PyModuleDef_HEAD_INIT,
^
...

I wondered whether I need to specify the version of Python, so I tried
python3 setup.py install
The installation still fails, but the error changes:
File "setup.py", line 13, in
from Cython.Build import cythonize
ImportError: No module named 'Cython'

However, I have already installed Cython using "pip install Cython".

It would be very kind of you to tell me which versions of the dependencies are required,

or how to modify "-I/usr/include/lapacke -I/usr/include/python2.7" in the compile options.

Many Thanks
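
PyModuleDef belongs to the Python 3 C API, so the first error is consistent with the extension being compiled against the Python 2.7 headers shown in the compile options. The second error may mean that "pip install Cython" installed Cython for a different interpreter than the one started by python3. The sketch below, run with the same interpreter that will run setup.py, reports which interpreter is in use and whether it can import Cython. The function name is hypothetical.

```python
import importlib.util
import sys


def describe_build_environment(module_name="Cython"):
    """Report which interpreter is running and whether it can import a module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "python_version": "%d.%d" % sys.version_info[:2],
        "module": module_name,
        "importable": spec is not None,
    }


info = describe_build_environment()
print(info)
if not info["importable"]:
    # Installing through the same interpreter avoids a pip/python mismatch.
    print("Try: %s -m pip install %s" % (info["executable"], info["module"]))
```

Invoking pip as a module of the target interpreter (for example `python3 -m pip install Cython`) installs the package for that interpreter rather than for whichever one the bare `pip` command belongs to.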

Installation using pip fails for python 2

I tried to install libact using sudo pip install libact and got the following error message:

libact/query_strategies/variance_reduction.c:26:15: error: variable ‘moduledef’ has initializer but incomplete type

You can see the full error message here.

I also tried installing with the setup.py script, which worked just fine. The Python 3 installation using pip also worked on the same machine.
I did some googling and the error looked similar to the one here, but I can't look into it because setup.py worked.
Just wanted to let you know.

Is there a way to perform batch mode active learning ?

Hi,

Instead of having unlabeled data arrive as a stream, I would like to know whether libact can perform batch-mode active learning, meaning that the user can select multiple images at once (positives and negatives).

Thank you in advance.

Fix Travis Python 3.5 build

Python 3.5 seems to import everything before running the unit tests. The _variance_reduction native extension is built and installed, but the import fails:

ImportError: Failed to import test module: libact.query_strategies
Traceback (most recent call last):
  File "/opt/python/3.5.0/lib/python3.5/unittest/loader.py", line 462, in _find_test_path
    package = self._get_module_from_name(name)
  File "/opt/python/3.5.0/lib/python3.5/unittest/loader.py", line 369, in _get_module_from_name
    __import__(name)
  File "/home/travis/build/ntucllab/libact/libact/query_strategies/__init__.py", line 16, in <module>
    from .variance_reduction import VarianceReduction
  File "/home/travis/build/ntucllab/libact/libact/query_strategies/variance_reduction.py", line 11, in <module>
    from libact.query_strategies import _variance_reduction
ImportError: cannot import name '_variance_reduction'

Build/install log of extension:

running build_ext
building 'libact.query_strategies._variance_reduction' extension
C compiler: gcc -pthread -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC
creating build/temp.linux-x86_64-3.5
creating build/temp.linux-x86_64-3.5/libact
creating build/temp.linux-x86_64-3.5/libact/query_strategies
compile options: '-I/home/travis/virtualenv/python3.5.0/lib/python3.5/site-packages/numpy/core/include -I/opt/python/3.5.0/include/python3.5m -c'
extra options: '-std=c11'
Warning: Can't read registry to find the necessary compiler setting
Make sure that Python modules winreg, win32api or win32con are installed.
gcc: libact/query_strategies/variance_reduction.c
gcc -pthread -shared -L/opt/python/3.5.0/lib -Wl,-rpath=/opt/python/3.5.0/lib build/temp.linux-x86_64-3.5/libact/query_strategies/variance_reduction.o -L/opt/python/3.5.0/lib -lpython3.5m -o build/lib.linux-x86_64-3.5/libact/query_strategies/_variance_reduction.cpython-35m-x86_64-linux-gnu.so -llapacke -llapack -lblas
running install_lib
creating /home/travis/virtualenv/python3.5.0/lib/python3.5/site-packages/libact
creating /home/travis/virtualenv/python3.5.0/lib/python3.5/site-packages/libact/query_strategies
copying build/lib.linux-x86_64-3.5/libact/query_strategies/_variance_reduction.cpython-35m-x86_64-linux-gnu.so -> /home/travis/virtualenv/python3.5.0/lib/python3.5/site-packages/libact/query_strategies

scikit-learn model adapter

Since we use scikit-learn models a lot, we should define an adapter from scikit-learn models to libact models.
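
One possible shape for such an adapter is sketched below. It assumes a dataset can be iterated as (feature, label) pairs with None marking unlabeled entries, and it exposes train, predict and score methods. The class names are hypothetical, and a tiny stand-in estimator is used so that the example runs without scikit-learn; any object with fit and predict methods could be wrapped the same way.

```python
class SklearnAdapterSketch:
    """Wrap any estimator with fit/predict so it exposes train/predict/score."""

    def __init__(self, estimator):
        self.estimator = estimator

    @staticmethod
    def _labeled(pairs):
        # Keep only entries whose label is known (None marks "unlabeled").
        labeled = [(x, y) for x, y in pairs if y is not None]
        features = [x for x, _ in labeled]
        labels = [y for _, y in labeled]
        return features, labels

    def train(self, pairs):
        features, labels = self._labeled(pairs)
        self.estimator.fit(features, labels)
        return self

    def predict(self, features):
        return self.estimator.predict(features)

    def score(self, pairs):
        # Accuracy over the labeled entries only.
        features, labels = self._labeled(pairs)
        predictions = self.predict(features)
        hits = sum(1 for p, y in zip(predictions, labels) if p == y)
        return hits / len(labels)


class MajorityClass:
    """Tiny stand-in estimator so the sketch runs without scikit-learn."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


data = [([0.0], 1), ([1.0], 1), ([2.0], 0), ([3.0], None)]
model = SklearnAdapterSketch(MajorityClass()).train(data)
accuracy = model.score(data)
```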

Probabilistic models

I would like to ask which classifiers are considered probabilistic, so that they can be combined with query strategies like Uncertainty Sampling.

Thanks in advance.
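
For context on why the distinction matters: uncertainty sampling needs a per-class score for each unlabeled entry, not just a predicted label. In scikit-learn terms, classifiers that provide predict_proba (LogisticRegression is one example) can supply such scores. The sketch below shows one common variant, least-confidence selection, on made-up probabilities; the function name is hypothetical and this is not necessarily how libact implements it.

```python
def least_confident(probabilities):
    """Return the index of the row whose top class probability is lowest."""
    if not probabilities:
        raise ValueError("need at least one row of class probabilities")
    confidences = [max(row) for row in probabilities]
    return min(range(len(confidences)), key=confidences.__getitem__)


# Made-up predicted probabilities for three unlabeled entries (rows sum to 1).
proba = [
    [0.90, 0.10],
    [0.55, 0.45],
    [0.20, 0.80],
]
query_index = least_confident(proba)
```

The second row is selected because its most likely class has the lowest probability of the three, which is the entry the model is least sure about.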

Next stage

  1. Implement more classical query strategies.
  2. Add examples for using all query strategies.

Clarify semantics of Model.predict_real

Currently Model.predict_real is connected to predict_proba in scikit-learn, which returns an array of n_classes floats representing the probabilities of the corresponding labels. But decision_function is another candidate, and its return shape varies from model to model. For example (in our case n_samples = 1):

  • LogisticRegression: (n_samples,) if n_classes == 2 else (n_samples, n_classes)
  • C-SVC: (n_samples, n_classes * (n_classes-1) / 2)

We need to decide what we want so that the interface is well defined. @hsuantien can you give us some advice on this?
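
The two shape rules listed above can be written out as a small helper, which makes the mismatch with a fixed n_classes-wide output easy to see. The function and its `kind` strings are hypothetical, and it encodes the shapes exactly as stated in this issue; the shapes a particular scikit-learn version returns may differ, for instance for binary problems.

```python
def decision_function_shape(kind, n_samples, n_classes):
    """Expected decision_function output shape for the two cases listed above."""
    if n_classes < 2:
        raise ValueError("need at least two classes")
    if kind == "logistic_regression":
        return (n_samples,) if n_classes == 2 else (n_samples, n_classes)
    if kind == "c_svc_one_vs_one":
        # One column per pair of classes.
        return (n_samples, n_classes * (n_classes - 1) // 2)
    raise ValueError("unknown kind: %r" % kind)


for k in (2, 3, 4):
    print(
        k,
        decision_function_shape("logistic_regression", 1, k),
        decision_function_shape("c_svc_one_vs_one", 1, k),
    )
```

With four classes, for example, the pairwise rule gives six columns while predict_proba gives four, so the two cannot share one interface without an explicit conversion.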

Identify whether the relabeling in sklearn will cause problem

sklearn internally relabels the given labels to the range 0 to n_labels. If I understand correctly, it does so in the order in which the data is passed to the fit method.
So if updating an unlabeled entry changes the order of the data passed to fit, the values returned by our model's predict_real method might be in the wrong order.
One proposal for solving this problem is to manage the relabeling ourselves in the model classes.
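
A minimal sketch of managing the relabeling ourselves is shown below, assuming the full set of possible labels is known up front. The class name `LabelIndex` is hypothetical. The mapping is fixed once, so the column that corresponds to a label does not depend on the order in which labeled data arrives.

```python
class LabelIndex:
    """Fixed label <-> column mapping, independent of the order data arrives in."""

    def __init__(self, all_labels):
        # Sort once so the mapping is deterministic for a given label set.
        self.labels = sorted(set(all_labels))
        self.index = {label: i for i, label in enumerate(self.labels)}

    def encode(self, labels):
        return [self.index[label] for label in labels]

    def decode(self, indices):
        return [self.labels[i] for i in indices]


mapping = LabelIndex(["cat", "dog", "bird"])
first_batch = mapping.encode(["dog", "cat"])
later_batch = mapping.encode(["bird", "dog"])  # "dog" keeps the same index
```

A model class could encode labels with such a mapping before fitting and use it again to interpret the columns of its real-valued output.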

Incompatibility with plotly and cufflinks

Hello,
I have found that your library is not compatible with the Python packages plotly and cufflinks. I tested it on a fresh install of Ubuntu 16.04 with Anaconda installed.
Everything was fine until I installed plotly and cufflinks:


pip install plotly --upgrade
pip install cufflinks --upgrade

After that, running python setup.py test fails with:

======================================================================
ERROR: query_strategies (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: query_strategies
Traceback (most recent call last):
  File "/path/anaconda3/lib/python3.5/unittest/loader.py", line 153, in loadTestsFromName
    module = __import__(module_name)
  File "/path/libact/libact/query_strategies/__init__.py", line 20, in <module>
    from ._variance_reduction import estVar
ImportError: /usr/lib/liblapacke.so.3: undefined symbol: dpotrf2_
