ardalanm / pylightgbm

Python binding for Microsoft LightGBM

License: Other

Python 17.33% Jupyter Notebook 82.67%

pylightgbm's Issues

is_unbalance

Hi Ardalan,

Thank you for making the changes for early stopping, max_bin and verbose. Would you mind adding the parameter is_unbalance as well?

Thanks again,

Chris

Cannot train for n_iteration greater than 2600?

Hi.
It seems I cannot train the model when num_iterations is set above 3000. Setting it to 5000 throws this error:
[LightGBM] [Info] 209.081899 seconds elapsed, finished iteration 2600
Traceback (most recent call last):
  File "/home/lemma/miniconda2/lib/python2.7/site-packages/pylightgbm/models.py", line 143, in fit
    with open(self.param['output_model'], mode='r') as file:
IOError: [Errno 2] No such file or directory: '/tmp/tmpNEm79g/LightGBM_model.txt'
Command exited with non-zero status 1

Does this need to be fixed, or is it unnecessary because the result would be the same beyond a certain number of iterations?
Anything below 3000 works fine, though.
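
The traceback shows fit opening the model file without first checking whether the LightGBM process finished successfully, so a crashed or killed trainer surfaces only as a missing file. Below is a minimal sketch of a more defensive pattern, assuming the trainer is launched as a subprocess. run_and_read_model is a hypothetical helper for illustration, not pylightgbm's actual code.

```python
import os
import subprocess


def run_and_read_model(cmd, model_path):
    """Run a training command and return the model text it wrote.

    Hypothetical helper: `cmd` is the argument list for the trainer and
    `model_path` is where the trainer is expected to write its model.
    """
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    log = proc.stdout.decode(errors="replace")
    if proc.returncode != 0:
        # Surface the trainer's own output instead of a later "file not found".
        raise RuntimeError(
            "trainer exited with status {}:\n{}".format(proc.returncode, log)
        )
    if not os.path.isfile(model_path):
        raise RuntimeError("trainer finished but wrote no model at " + model_path)
    with open(model_path, mode="r") as fh:
        return fh.read()
```

With a check like this, the error message would carry the trainer's last log lines, which is usually the quickest way to see why training stopped early.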

what's your next plan?

Can I contribute to this repo?

Error in installing Via Ipython terminal

Hello,

Whenever I try to install it from the IPython terminal or the Anaconda command prompt, it throws the following error, saying that the procedure entry point SSL_COMP_free_compression_methods could not be located.

A screenshot is attached. Would you know how to fix this error? I could not find anything by searching for these keywords on Google, so apologies if it is not entirely related. Any help is appreciated.

Support init score

LightGBM accepts an init score (as an array) through an additional file (train.txt.init).

Can you support this as well, as an input to the fit() function?

It is well suited to regression tasks, where an init score of zeros is a poor choice and the mean of the target is a better one.

thx
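
For illustration, here is a minimal sketch of producing such a file by hand, assuming the LightGBM CLI convention of one score per line in a file named after the training data with an .init suffix. write_init_scores is a hypothetical helper, not part of pylightgbm.

```python
def write_init_scores(train_path, y):
    """Write `<train_path>.init` holding the target mean, once per row.

    Hypothetical helper: `train_path` is the training data file that
    LightGBM will read, and `y` is the regression target.
    """
    y = [float(v) for v in y]
    mean = sum(y) / len(y)
    init_path = train_path + ".init"
    with open(init_path, "w") as fh:
        for _ in y:
            # One init score per training row, in the same order as the data.
            fh.write("{:.6f}\n".format(mean))
    return init_path
```

The file has to sit in the same folder as the training data so that LightGBM can pick it up.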

dart, max_depth

Hi @ArdalanM, can you please also support dart (boosting, drop_rate) and max_depth params?

thx

max_bin and early_stopping_rounds

Thank you @ArdalanM for creating the wrapper which looks great!

Is it possible to:

add max_bin to the parameters, and
add a verbose/silent flag to control whether LightGBM's progress messages are printed? This would help with
extracting the best round from the progress messages when early stopping is used.

Thanks,

PermissionError: [WinError 32] The process cannot access "LightGBM_model.txt"

The model trains and then fails at the very last step.

The model still outputs a prediction when asked to.

[LightGBM] [Info] 0.018901 seconds elapsed, finished iteration 99
[LightGBM] [Info] 0.019084 seconds elapsed, finished iteration 100
[LightGBM] [Info] Finished training

... Lots of whitespace ...

---------------------------------------------------------------------------
PermissionError                           Traceback (most recent call last)
<ipython-input-11-f6bccd7e15ae> in <module>()
     17 #x_train, x_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.2)
     18 
---> 19 clf.fit(X, y)
     20 print("Mean Square Error: ", metrics.mean_squared_error(y, clf.predict(X)))

C:\Anaconda\envs\py35\lib\site-packages\pylightgbm\models.py in fit(self, X, y, test_data)
    110         with open(self.param['output_model'], mode='r') as file:
    111             self.model = file.read()
--> 112             shutil.rmtree(tmp_dir)
    113 
    114         if test_data and self.param['early_stopping_round'] > 0:

C:\Anaconda\envs\py35\lib\shutil.py in rmtree(path, ignore_errors, onerror)
    486             os.close(fd)
    487     else:
--> 488         return _rmtree_unsafe(path, onerror)
    489 
    490 # Allow introspection of whether or not the hardening against symlink

C:\Anaconda\envs\py35\lib\shutil.py in _rmtree_unsafe(path, onerror)
    381                 os.unlink(fullname)
    382             except OSError:
--> 383                 onerror(os.unlink, fullname, sys.exc_info())
    384     try:
    385         os.rmdir(path)

C:\Anaconda\envs\py35\lib\shutil.py in _rmtree_unsafe(path, onerror)
    379         else:
    380             try:
--> 381                 os.unlink(fullname)
    382             except OSError:
    383                 onerror(os.unlink, fullname, sys.exc_info())

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\admin\\AppData\\Local\\Temp\\tmpehya862g\\LightGBM_model.txt'

Model prediction call:

[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Finished loading 100 models
[LightGBM] [Info] Finished initializing prediction
[LightGBM] [Info] Finished prediction

... Lots of whitespace ...

Mean Square Error:  668.323460051

I am wondering whether this error affects the model, and what can be done to stop it from being thrown. If any other information would help, please let me know. Thanks.
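
In the traceback above, shutil.rmtree(tmp_dir) at line 112 appears to be indented inside the with open(...) block, so the model file would still be open when its directory is deleted, and Windows refuses to delete open files. Since the model text is read at line 111 before the delete, that would also explain why prediction still works. Below is a minimal sketch of the reordered cleanup. read_model_and_cleanup is a hypothetical helper, not the wrapper's actual code.

```python
import os
import shutil


def read_model_and_cleanup(tmp_dir, model_name="LightGBM_model.txt"):
    """Read the model text, then delete the temporary directory.

    Hypothetical helper: `tmp_dir` is the directory the trainer wrote to.
    """
    model_path = os.path.join(tmp_dir, model_name)
    with open(model_path, mode="r") as fh:
        model = fh.read()
    # The file handle is closed once the with block ends, so the
    # directory can now be removed on Windows as well.
    shutil.rmtree(tmp_dir)
    return model
```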

'LIGHTGBM_EXEC' environment variable, cannot be found

The examples show:

path_to_exec = "~/Documents/apps/LightGBM/lightgbm"

This path does not exist in the package

ls /Users/me/LightGBM/
CMakeLists.txt	README.md	docs		include		python-package	tests
LICENSE		build		examples	pmml		src		windows

...and when I try to build a model I get an error

import pandas as pd
from pylightgbm.models import GBMClassifier

df = pd.read_csv('my_data.csv')

params = {'exec_path': path_to_exec,
      'num_iterations': 1000, 'learning_rate': 0.01,
      'min_data_in_leaf': 1, 'num_leaves': 5,
      'metric': 'binary_error', 'verbose': False,
      'early_stopping_round': 20}

GBMClassifier(params).fit(df['X_var'], df['y_var'])

pyLightGBM is looking for 'LIGHTGBM_EXEC' environment variable, cannot be found.
exec_path will be deprecated in favor of environment variable

/Users/me/anaconda/lib/python2.7/site-packages/pylightgbm/models.pyc in fit(self, X, y, test_data, init_scores)
    129 
    130             process = subprocess.Popen([self.exec_path, "config={}".format(conf_filepath)],
--> 131                                        stdout=subprocess.PIPE, bufsize=1)
    132 
    133         else:

/Users/me/anaconda/lib/python2.7/subprocess.pyc in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags)
    708                                 p2cread, p2cwrite,
    709                                 c2pread, c2pwrite,
--> 710                                 errread, errwrite)
    711         except Exception:
    712             # Preserve original exception in case os.close raises.

/Users/me/anaconda/lib/python2.7/subprocess.pyc in _execute_child(self, args, executable, preexec_fn, close_fds, cwd, env, universal_newlines, startupinfo, creationflags, shell, to_close, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite)
   1333                         raise
   1334                 child_exception = pickle.loads(data)
-> 1335                 raise child_exception
   1336 
   1337 

OSError: [Errno 13] Permission denied
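
The directory listing above shows no lightgbm binary next to CMakeLists.txt, and subprocess.Popen does not expand ~ on its own. On POSIX systems, [Errno 13] Permission denied is what the OS typically reports when the path names a directory or a file without the execute bit. Below is a minimal sketch of a check that could be run before training. resolve_exec is a hypothetical helper, not part of pylightgbm.

```python
import os


def resolve_exec(path):
    """Return an absolute path to an executable file, or raise ValueError.

    Hypothetical helper for validating the path to the lightgbm binary.
    """
    # Popen passes the string to the OS as-is, so expand ~ explicitly.
    full = os.path.abspath(os.path.expanduser(path))
    if os.path.isdir(full):
        raise ValueError("{} is a directory, not the lightgbm binary".format(full))
    if not os.path.isfile(full):
        raise ValueError("{} does not exist; has LightGBM been built?".format(full))
    if not os.access(full, os.X_OK):
        raise ValueError("{} is not executable (try chmod +x)".format(full))
    return full
```

Passing the returned path as exec_path, or exporting it as LIGHTGBM_EXEC, would rule out the most common path mistakes before the subprocess is started.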

how can i use this package

Hello.
This is my first time using C++ source and header files from Python.

If I want to use this package, should I put LightGBM's src into LighGBM/lightgbm? Then I could build a classifier.

PS: Is the directory name wrong? LighGBM -> LightGBM?

FileNotFoundError after validation

I got the error after validation.
Here is the code.

import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from pylightgbm.models import GBMClassifier
from sklearn.metrics import roc_auc_score

diabetes = load_diabetes()

split_ind = 200

X = diabetes['data']
y = diabetes['target']

X = pd.DataFrame(X).add_prefix('c')
y = pd.Series(y)
y = (y>150)*1

X_train, X_test = X[:split_ind], X[split_ind:]
y_train, y_test = y[:split_ind], y[split_ind:]

# Named exec_path rather than exec to avoid shadowing the Python builtin.
exec_path = "~/LightGBM/lightgbm"

clf = GBMClassifier(exec_path=exec_path,
                    num_iterations=3000,
                    metric='auc',
                    early_stopping_round=20)

clf.fit(X_train, y_train, 
        test_data=[(X_test, y_test)])
clf.param['num_iterations'] = clf.best_round # Also .set_params wouldn't work
clf.fit(X_train, y_train)

The second clf.fit raises FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/~', apparently because the 'with process.stdout:' block in 'def fit' does not write LightGBM_model.

Direct using of categorical features

LightGBM can use categorical features directly.

There is a categorical_feature parameter in LightGBM docs to deal with this behavior:
https://github.com/Microsoft/LightGBM/blob/master/docs/Parameters.md

It would be nice to add categorical feature support to pyLightGBM.

Set LightGBM path with an environment variable

At the moment you need to pass the LightGBM path to the constructor, which can be a bad idea since your code won't work in other environments.

One solution is to check for an environment variable, such as LIGHTGBM_PATH, that contains the LightGBM path.
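
The proposed lookup can be sketched as follows. The variable name LIGHTGBM_PATH is the suggestion from this issue (the error message quoted in other issues on this page refers to LIGHTGBM_EXEC), and find_lightgbm is a hypothetical helper, not the wrapper's actual API.

```python
import os


def find_lightgbm(exec_path=None, env_var="LIGHTGBM_PATH", environ=None):
    """Return the LightGBM path from the argument or the environment.

    Hypothetical helper: an explicit `exec_path` wins; otherwise the
    environment variable named by `env_var` is used.
    """
    environ = os.environ if environ is None else environ
    candidate = exec_path or environ.get(env_var)
    if not candidate:
        raise RuntimeError(
            "pass exec_path or set the {} environment variable".format(env_var)
        )
    return os.path.expanduser(candidate)
```

Keeping the explicit argument as an override means existing code keeps working while new code can rely on the environment alone.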

new param max_depth added in LightGBM

Hi,

LightGBM has a new setting for controlling overfitting on small datasets: max_depth

microsoft/LightGBM#35

thx for adding

pyLightGBM is looking for LIGHTGBM_EXEC environment Variable

Hello,

I have been trying to run the pylightgbm code, but it raises this exception:

" pyLightGBM is looking for 'LIGHTGBM_EXEC' environment variable, cannot be found.
exec_path will be deprecated in favor of environment variable "

I have tried specifying the path to the installed lightgbm package, and also pointing the path at the lightgbm and pylightgbm files downloaded from their respective GitHub sources, but none of them seem to provide a 'LIGHTGBM_EXEC' file or folder.

I think that is why the exception keeps being raised. Is there a workaround for this issue?

I have also tried following the link below, which I thought might provide a workaround, but the last two lines are NOT clear when run from the conda prompt. Which source does pip install -r requirements.txt point to? The file is certainly not present in the pep8 package that was downloaded. And why is setup.py run at the end? What package does it install, given that pip was already used to install packages?

https://github.com/ArdalanM/pyLightGBM/blob/master/.travis.yml

- conda create -q -n test-environment python=$TRAVIS_PYTHON_VERSION
- source activate test-environment
- pip install pytest pytest-cov python-coveralls pytest-xdist coverage #we need this version of coverage for coveralls.io to work
- pip install pep8 pytest-pep8
- pip install -r requirements.txt
- python setup.py install

Any suggestions or feedback are welcome.

missing log output

Even with verbose=True I do not get all of LightGBM's log output.
The full output would make it easier to debug problems in the LightGBM config.

best_iteration for multi-class classification

There seems to be an issue with getting the best round for a multi-class model.
According to the [LightGBM] [Info] output the best iteration is 450, but clf.best_round is 9429.

Thanks!

Permission Denied Error?!

I got this error. Any idea?
ps: ubuntu 14.04 with python 2.7

[Improvement] pickle support

The xgboost sklearn wrapper for Python has pickle support. It would be great if lightgbm models could be serialized just as easily.

file format issues

Sometimes I get
input format error, should be LibSVM
input format error, should be LibSVM
but have no real clue how to debug it. Do you have any suggestions?

The strange thing is that the training worked just fine:

[LightGBM] [Info] 3.972398 seconds elapsed, finished 59 iteration
[LightGBM] [Info] 4.042168 seconds elapsed, finished 60 iteration
[LightGBM] [Info] Finish train

The problem occurs when I try to predict new values.

result file not found

Calling clf.predict(X) seems to cause a
FileNotFoundError: [Errno 2] No such file or directory: 'pathToLightGBM/lightgbm_models/32028_1476974477/LightGBM_predict_result_32028_1476974523.txt' for me.
I just downloaded the package from GitHub and created a GBMClassifier; I am not sure whether installing via pip ... would be required.

edit

I am using a mac / osx 10.11.6

but I can see the files in the finder:

edit2

I noticed that no LightGBM_predict_result_32028_1476974809.txt result file was created; only 3 other files were.

comparison with original lightGBM python package

Hi, how does pyLightGBM differ from https://github.com/Microsoft/LightGBM/tree/master/python-package, and is this package still maintained?

can't find parameter min_sum_hessian_in_leaf in init

question: find generated path

Is it possible to output the path where the lightgbm model is generated?
I am trying to figure out a problem with lightgbm (microsoft/LightGBM#46) and would like to look at the raw files.

what is "bagging_freq"?

Can someone explain what is the meaning of parameter "bagging_freq"?

python example error

clf.fit(x_train, y_train, test_data=[(x_test, y_test)])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/apps/duane_tmp/anaconda2/lib/python2.7/site-packages/pylightgbm/models.py", line 131, in fit
    stdout=subprocess.PIPE, bufsize=1)
  File "/home/apps/duane_tmp/anaconda2/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/home/apps/duane_tmp/anaconda2/lib/python2.7/subprocess.py", line 1343, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

Hello, when I try the Python example, the line clf.fit(x_train, y_train, test_data=[(x_test, y_test)]) gives me the error above. Could you help me fix it? Thanks!

IOError when calling fit()

IOError when calling fit()
OS = Windows 10 x64
Cloned and built it just a few hours ago.

est = GBMRegressor(exec_path="O:/Coding/LightGBM/", 
                   config='', 
                   application='regression', 
                   num_iterations=2500, 
                   learning_rate=0.1, 
                   num_leaves=127, 
                   tree_learner='serial', 
                   num_threads=4, 
                   min_data_in_leaf=100, 
                   metric='l2', 
                   feature_fraction=1.0, 
                   feature_fraction_seed=2, 
                   bagging_fraction=1.0, 
                   bagging_freq=0, 
                   bagging_seed=3, 
                   metric_freq=1, 
                   early_stopping_round=0)
est.fit(X, y, test_data=[(X_holdout, y_holdout)])

IOError                                   Traceback (most recent call last)
in ()
----> 1 est.fit(X, y, test_data=[(X_holdout, y_holdout)])

C:\Users\ihopethiswillfi\Anaconda2\lib\site-packages\pylightgbm-0.2-py2.7.egg\pylightgbm\models.pyc in fit(self, X, y, test_data)
71 os.system("{} config={}".format(self.exec_path, self.config))
72
---> 73 with open(self.param['output_model'], mode='rb') as file:
74 self.model = file.read()
75

IOError: [Errno 2] No such file or directory: 'c:\users\ihopethiswillfi\appdata\local\temp\tmpksw8jv\LightGBM_model.txt'

save and load model

Hi,

can you please provide some example how to save and load model?

Thx

CalledProcessError

Hey!

Thanks for creating this wrapper, really appreciate it. I'm getting the following error when attempting to use GridSearchCV with pyLightGBM, and I'm not sure whether it's an issue with LightGBM or pyLightGBM. Note that the same error also occurs if I use it within loops, outside of GridSearchCV.

Link to notebook:

https://github.com/NickBuchny/UTSCProjects/blob/master/Fitting%2BMicrosoft%2527s%2BLightGBM%2Bto%2Bthe%2BTESS%2B19h_44d%2Bdataset.%2B.ipynb

GBM predict with returned non-zero exit status 1 error

Hello ArdalanM, this Python wrapper makes GBM great and simple to use.

I am running your notebook regression_example_kaggle_allstate.ipynb and get an error on the gbmr.predict line. The output is:

<ipython-input-10-0c40d9335a9b> in <module>()
     22 
     23 gbmr.fit(X_train, y_train, test_data=[(X_valid, y_valid)])
---> 24 print("Mean Square Error: ", metrics.mean_absolute_error(y_true=(np.exp(y_valid)-1), y_pred=(np.exp(gbmr.predict(X_valid))-1)))

/usr/local/lib/python2.7/dist-packages/pylightgbm/models.pyc in predict(self, X)
    122 
    123         process = subprocess.check_output([self.exec_path, "config={}".format(conf_filepath)],
--> 124                                           universal_newlines=True)
    125 
    126         if self.verbose:

/usr/lib/python2.7/subprocess.pyc in check_output(*popenargs, **kwargs)
    572         if cmd is None:
    573             cmd = popenargs[0]
--> 574         raise CalledProcessError(retcode, cmd, output=output)
    575     return output
    576 

CalledProcessError: Command '['/home/lyz/Workspace/Github/LightGBM/lightgbm', 'config=/tmp/tmphdAl6R/predict.conf']' returned non-zero exit status 1

Really appreciate your help.
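
The CalledProcessError above reports only the exit status, not what LightGBM printed before it exited. Below is a minimal sketch of capturing that output for debugging, assuming the same kind of command list that appears in the error message. run_with_log is a hypothetical helper, not part of pylightgbm.

```python
import subprocess


def run_with_log(cmd):
    """Run `cmd` and return its output; on failure, include the output in the error.

    Hypothetical helper: `cmd` is an argument list such as
    [path_to_lightgbm, "config=/path/to/predict.conf"].
    """
    try:
        # Merge stderr into stdout so error messages are captured as well.
        return subprocess.check_output(
            cmd, stderr=subprocess.STDOUT, universal_newlines=True
        )
    except subprocess.CalledProcessError as err:
        raise RuntimeError(
            "command failed with status {}:\n{}".format(err.returncode, err.output)
        )
```

Running the failing command this way should show the message LightGBM printed before exiting, which normally names the parameter or input file it rejected.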
