ardalanm / pylightgbm
Python binding for Microsoft LightGBM
License: Other
Hi Ardalan,
Thank you for making the changes for early stopping, max_bin and verbose. Would you mind adding the parameter is_unbalance as well?
Thanks again,
Chris
Hi.
It seems that I cannot train the model when num_iterations is set above 3000. Setting it to 5000 throws this error:
[LightGBM] [Info] 209.081899 seconds elapsed, finished iteration 2600
Traceback (most recent call last):
File "/home/lemma/miniconda2/lib/python2.7/site-packages/pylightgbm/models.py", line 143, in fit
with open(self.param['output_model'], mode='r') as file:
IOError: [Errno 2] No such file or directory: '/tmp/tmpNEm79g/LightGBM_model.txt'
Command exited with non-zero status 1
Does this need to be fixed, or does it not matter because the result will be the same however large num_iterations is?
Below 3000 is fine though.
Can I contribute to this repo?
Hello,
Whenever I try to install it from the IPython terminal or the Anaconda command prompt, it fails with an error saying that the procedure entry point SSL_COMP_free_compression_methods could not be located.
A screenshot is attached. Would you know how to fix this error? I have not found anything by searching for it on Google, so apologies if it is not entirely related. Any help is appreciated.
LightGBM can take initial scores (as an array) from an additional file (train.txt.init).
Could you support this as well, as an input to the fit() function?
It is very useful for regression tasks, where initializing with zeros is a poor choice and the mean of the target is a better one.
thx
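The request above relies on LightGBM's command-line convention of reading initial scores from a file named after the training file with `.init` appended, one score per line. A minimal sketch of producing such a file with the target mean as the starting score follows; `write_init_scores` is a hypothetical helper for illustration and is not part of pyLightGBM.

```python
import os
import tempfile


def write_init_scores(train_path, targets):
    """Write one initial score per training row to `<train_path>.init`.

    Hypothetical helper: every row starts from the mean of the targets
    instead of zero.
    """
    targets = list(targets)
    mean_score = sum(targets) / float(len(targets))
    init_path = train_path + ".init"
    with open(init_path, "w") as handle:
        for _ in targets:
            handle.write("{}\n".format(mean_score))
    return init_path, mean_score


# Example: three training rows, so three identical initial scores.
tmp_dir = tempfile.mkdtemp()
init_path, mean_score = write_init_scores(
    os.path.join(tmp_dir, "train.txt"), [1.0, 2.0, 6.0]
)
```

How fit() would accept such scores is left open here; the sketch only covers the file layout.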
Hi @ArdalanM, could you please also support the dart (boosting, drop_rate) and max_depth params?
thx
Thank you @ArdalanM for creating the wrapper which looks great!
Is it possible to:
Thanks,
The model trains and then fails at the very last step.
The model still outputs a prediction when asked to.
[LightGBM] [Info] 0.018901 seconds elapsed, finished iteration 99
[LightGBM] [Info] 0.019084 seconds elapsed, finished iteration 100
[LightGBM] [Info] Finished training
... Lots of whitespace ...
---------------------------------------------------------------------------
PermissionError Traceback (most recent call last)
<ipython-input-11-f6bccd7e15ae> in <module>()
17 #x_train, x_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.2)
18
---> 19 clf.fit(X, y)
20 print("Mean Square Error: ", metrics.mean_squared_error(y, clf.predict(X)))
C:\Anaconda\envs\py35\lib\site-packages\pylightgbm\models.py in fit(self, X, y, test_data)
110 with open(self.param['output_model'], mode='r') as file:
111 self.model = file.read()
--> 112 shutil.rmtree(tmp_dir)
113
114 if test_data and self.param['early_stopping_round'] > 0:
C:\Anaconda\envs\py35\lib\shutil.py in rmtree(path, ignore_errors, onerror)
486 os.close(fd)
487 else:
--> 488 return _rmtree_unsafe(path, onerror)
489
490 # Allow introspection of whether or not the hardening against symlink
C:\Anaconda\envs\py35\lib\shutil.py in _rmtree_unsafe(path, onerror)
381 os.unlink(fullname)
382 except OSError:
--> 383 onerror(os.unlink, fullname, sys.exc_info())
384 try:
385 os.rmdir(path)
C:\Anaconda\envs\py35\lib\shutil.py in _rmtree_unsafe(path, onerror)
379 else:
380 try:
--> 381 os.unlink(fullname)
382 except OSError:
383 onerror(os.unlink, fullname, sys.exc_info())
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\admin\\AppData\\Local\\Temp\\tmpehya862g\\LightGBM_model.txt'
Model prediction call:
[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Finished loading 100 models
[LightGBM] [Info] Finished initializing prediction
[LightGBM] [Info] Finished prediction
... Lots of whitespace ...
Mean Square Error: 668.323460051
Just wondering whether this error affects the model, and what can be done to stop it from being thrown. If you need any other information, please let me know. Thanks.
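In the traceback above, the model text is read on line 111 before shutil.rmtree fails on line 112, which is consistent with prediction still working afterwards. One possible way to keep the cleanup from aborting fit is to retry and then give up quietly. The sketch below uses a hypothetical helper, `remove_tmp_dir`; it is not pyLightGBM's code, and the retry count and delay are arbitrary.

```python
import os
import shutil
import tempfile
import time


def remove_tmp_dir(path, retries=3, delay=0.1):
    """Try to delete `path`; return True if it is gone afterwards."""
    for _ in range(retries):
        try:
            shutil.rmtree(path)
            return True
        except OSError:
            # On Windows another process may still hold a file handle;
            # wait briefly and try again.
            time.sleep(delay)
    # Last resort: delete what can be deleted and leave the rest behind.
    shutil.rmtree(path, ignore_errors=True)
    return not os.path.exists(path)


# Example with a throwaway directory holding a placeholder model file.
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "LightGBM_model.txt"), "w") as handle:
    handle.write("placeholder model text")
removed = remove_tmp_dir(tmp_dir)
```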
The examples show:
path_to_exec = "~/Documents/apps/LightGBM/lightgbm"
This path does not exist in the package
ls /Users/me/LightGBM/
CMakeLists.txt README.md docs include python-package tests
LICENSE build examples pmml src windows
...and when I try to build a model I get an error
import pandas as pd
from pylightgbm.models import GBMClassifier
df = pd.read_csv('my_data.csv')
params = {'exec_path': path_to_exec,
'num_iterations': 1000, 'learning_rate': 0.01,
'min_data_in_leaf': 1, 'num_leaves': 5,
'metric': 'binary_error', 'verbose': False,
'early_stopping_round': 20}
GBMClassifier(params).fit(df['X_var'], df['y_var'])
pyLightGBM is looking for 'LIGHTGBM_EXEC' environment variable, cannot be found.
exec_path will be deprecated in favor of environment variable
/Users/me/anaconda/lib/python2.7/site-packages/pylightgbm/models.pyc in fit(self, X, y, test_data, init_scores)
129
130 process = subprocess.Popen([self.exec_path, "config={}".format(conf_filepath)],
--> 131 stdout=subprocess.PIPE, bufsize=1)
132
133 else:
/Users/me/anaconda/lib/python2.7/subprocess.pyc in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags)
708 p2cread, p2cwrite,
709 c2pread, c2pwrite,
--> 710 errread, errwrite)
711 except Exception:
712 # Preserve original exception in case os.close raises.
/Users/me/anaconda/lib/python2.7/subprocess.pyc in _execute_child(self, args, executable, preexec_fn, close_fds, cwd, env, universal_newlines, startupinfo, creationflags, shell, to_close, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite)
1333 raise
1334 child_exception = pickle.loads(data)
-> 1335 raise child_exception
1336
1337
OSError: [Errno 13] Permission denied
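An `[Errno 13] Permission denied` from subprocess.Popen usually means the path points at a directory or at a file without the execute bit, and the listing above shows no `lightgbm` file at the top level of the checkout. A small diagnostic sketch that tells these cases apart follows; `describe_exec_path` is a hypothetical helper, not part of pyLightGBM.

```python
import os
import tempfile


def describe_exec_path(path):
    """Return a short diagnosis of why `path` may not be runnable."""
    path = os.path.expanduser(path)
    if not os.path.exists(path):
        return "missing"
    if os.path.isdir(path):
        return "directory"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"


# Example: a directory is not something Popen can execute.
status = describe_exec_path(tempfile.gettempdir())
```

Running this on the value passed as exec_path before calling fit narrows down whether the binary still has to be built or only needs its permissions fixed.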
Hello.
This is my first time using C++ source and header files from Python.
To use this package, should I put LightGBM's src into LighGBM/lightgbm? After that I could build a classifier.
P.S. Is the directory name wrong? LighGBM -> LightGBM?
I get an error after validation.
Here is the code.
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from pylightgbm.models import GBMClassifier
from sklearn.metrics import roc_auc_score
diabetes = load_diabetes()
split_ind = 200
X = diabetes['data']
y = diabetes['target']
X = pd.DataFrame(X).add_prefix('c')
y = pd.Series(y)
y = (y>150)*1
X_train, X_test = X[:split_ind], X[split_ind:]
y_train, y_test = y[:split_ind], y[split_ind:]
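exec_path = "~/LightGBM/lightgbm"  # path to the compiled LightGBM executable
clf = GBMClassifier(exec_path=exec_path,
                    num_iterations=3000,
                    metric='auc',
                    early_stopping_round=20)
# First fit: use the held-out split for early stopping.
clf.fit(X_train, y_train,
        test_data=[(X_test, y_test)])
# Second fit: retrain for the best number of rounds found above.
clf.param['num_iterations'] = clf.best_round # Also .set_params wouldn't work
clf.fit(X_train, y_train)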
The second clf.fit call raises "FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/~'", because in fit the 'with process.stdout:' block does not write LightGBM_model.
LightGBM can use categorical feature directly.
There is a categorical_feature parameter in the LightGBM docs to deal with this behavior:
https://github.com/Microsoft/LightGBM/blob/master/docs/Parameters.md
It would be nice to add categorical feature support to pyLightGBM.
At the moment you need to pass the LightGBM path to the constructor, which can be a bad idea since your code won't work in other environments.
One solution is to check for an environment variable, like LIGHTGBM_PATH, which should contain the LightGBM path.
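The proposal above can be sketched as a small lookup that prefers an explicit argument and falls back to the environment. The function name `resolve_lightgbm_exec` and the precedence order are assumptions made for illustration; the variable name LIGHTGBM_PATH is the one suggested in this issue, while error messages quoted elsewhere on this page mention LIGHTGBM_EXEC.

```python
import os


def resolve_lightgbm_exec(exec_path=None, env_var="LIGHTGBM_PATH"):
    """Pick the LightGBM path: explicit argument first, then the environment."""
    candidate = exec_path or os.environ.get(env_var)
    if not candidate:
        raise ValueError(
            "No LightGBM path given; pass exec_path or set {}".format(env_var)
        )
    # Expand a leading "~" so paths like "~/LightGBM/lightgbm" work.
    return os.path.expanduser(candidate)


# Example: the path comes from the environment when no argument is given.
os.environ["LIGHTGBM_PATH"] = "~/LightGBM/lightgbm"
resolved = resolve_lightgbm_exec()
```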
Hi,
LightGBM has a new setting to control overfitting on small datasets: max_depth.
Thanks in advance for adding it.
Hello,
I have been trying to run the pylightgbm code, but it raises this exception:
" pyLightGBM is looking for 'LIGHTGBM_EXEC' environment variable, cannot be found.
exec_path will be deprecated in favor of environment variable "
I have tried pointing the path at the installed lightgbm package, and at the lightgbm and pylightgbm files downloaded from their respective GitHub repositories, but none of them seems to contain a 'LIGHTGBM_EXEC' file or folder.
I think that is why the exception keeps being raised. Is there a workaround for this issue?
I have also looked at the following link, which might provide a workaround, but its last two lines are not clear to me when run from the conda prompt. Which requirements file does pip install point to? It is certainly not in the pep8 package that was downloaded. And why is setup.py run at the end? Which package does it install, given that pip was already used to install packages?
https://github.com/ArdalanM/pyLightGBM/blob/master/.travis.yml
Any suggestions or feedback are welcome.
Even with verbose=True I do not get the full LightGBM log output.
Having it would help to debug problems in the LightGBM config.
There seems to be an issue with getting the best round for a multi-class model.
According to the [LightGBM] [Info] output the best iteration is 450, but clf.best_round is 9429.
Thanks!
The xgboost sklearn wrapper for Python has pickle support. It would be great if LightGBM models could be serialized just as easily.
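Tracebacks elsewhere on this page show fit reading the trained model into memory as text (`self.model = file.read()`), which suggests that the state worth saving is plain Python data. A minimal sketch of pickling such state and reading it back follows; `save_state` and `load_state` are hypothetical helpers, and whether a wrapper instance can be rebuilt from this state alone is an assumption.

```python
import os
import pickle
import tempfile


def save_state(path, param, model_text):
    """Pickle the parameters and the model text together in one file."""
    with open(path, "wb") as handle:
        pickle.dump({"param": dict(param), "model": model_text}, handle)


def load_state(path):
    """Return the (param, model_text) pair written by save_state."""
    with open(path, "rb") as handle:
        state = pickle.load(handle)
    return state["param"], state["model"]


# Example round trip with placeholder values.
state_path = os.path.join(tempfile.mkdtemp(), "gbm_state.pkl")
save_state(state_path, {"num_iterations": 100}, "placeholder model text")
param, model_text = load_state(state_path)
```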
Sometimes I get
input format error, should be LibSVMinput format error, should be LibSVM
but I have no real clue how to debug it. Do you have any suggestions?
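One way to narrow this down, assuming the message means that a line of the input file did not parse as LibSVM (`label index:value index:value ...`), is to scan the file for lines that do not match that shape. The checker below is a hypothetical debugging aid, not part of LightGBM or pyLightGBM.

```python
def find_bad_libsvm_lines(lines):
    """Return (line_number, text) pairs for lines that are not `label idx:val ...`."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # ignore blank lines
        fields = text.split()
        try:
            float(fields[0])  # the label must be numeric
            for field in fields[1:]:
                index, value = field.split(":")  # exactly one colon per feature
                int(index)
                float(value)
        except ValueError:
            bad.append((number, text))
    return bad


# Example: lines 3 and 4 are malformed (missing indices, non-numeric value).
sample = ["1 1:0.5 3:2.0", "0 2:1.5", "1 0.5 2.0", "0 4:abc"]
bad_lines = find_bad_libsvm_lines(sample)
```

To check a real file, pass an open file handle instead of the sample list.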
The strange thing is that the training worked just fine:
[LightGBM] [Info] 3.972398 seconds elapsed, finished 59 iteration
[LightGBM] [Info] 4.042168 seconds elapsed, finished 60 iteration
[LightGBM] [Info] Finish train
The problem occurs when I try to predict new values.
Calling clf.predict(X) raises a
FileNotFoundError: [Errno 2] No such file or directory: 'pathToLightGBM/lightgbm_models/32028_1476974477/LightGBM_predict_result_32028_1476974523.txt'
for me.
I just downloaded the package from GitHub and created a GBMClassifier; I am not sure whether installing via pip ... is required.
I am using a Mac with OS X 10.11.6,
but I can see the files in the Finder:
I noticed that no LightGBM_predict_result_32028_1476974809.txt result file was created; only 3 other files were.
Hi, how does pyLightGBM differ from https://github.com/Microsoft/LightGBM/tree/master/python-package? Is this package still maintained?
Is it possible to output the path where the LightGBM model is generated?
I am trying to figure out a LightGBM problem (microsoft/LightGBM#46) and would like to look at the raw files.
Can someone explain the meaning of the parameter "bagging_freq"?
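In LightGBM's parameter documentation, bagging_freq is the frequency of bagging: 0 disables it, and k means a new random subset of rows is drawn every k iterations, with bagging_fraction below 1.0 needed for it to have any effect. The sketch below only illustrates that schedule. It is a simplification rather than LightGBM's code, and it assumes iterations are counted from 0.

```python
import random


def bagging_schedule(num_iterations, bagging_freq, bagging_fraction, num_rows, seed=3):
    """Return {iteration: sorted row indices} for iterations that draw a new bag."""
    if bagging_freq <= 0 or bagging_fraction >= 1.0:
        return {}  # bagging disabled: every iteration uses all rows
    rng = random.Random(seed)
    bag_size = int(num_rows * bagging_fraction)
    schedule = {}
    for iteration in range(num_iterations):
        # A fresh subset is drawn only every `bagging_freq` iterations;
        # iterations in between reuse the most recent subset.
        if iteration % bagging_freq == 0:
            schedule[iteration] = sorted(rng.sample(range(num_rows), bag_size))
    return schedule


# Example: 10 iterations, resampling 80% of 10 rows every 5 iterations.
schedule = bagging_schedule(num_iterations=10, bagging_freq=5,
                            bagging_fraction=0.8, num_rows=10)
```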
clf.fit(x_train, y_train, test_data=[(x_test, y_test)])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/apps/duane_tmp/anaconda2/lib/python2.7/site-packages/pylightgbm/models.py", line 131, in fit
stdout=subprocess.PIPE, bufsize=1)
File "/home/apps/duane_tmp/anaconda2/lib/python2.7/subprocess.py", line 711, in __init__
errread, errwrite)
File "/home/apps/duane_tmp/anaconda2/lib/python2.7/subprocess.py", line 1343, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
Hello, when I run the Python example, the line clf.fit(x_train, y_train, test_data=[(x_test, y_test)]) gives me the error above. Could you help me fix it? Thanks!
IOError when calling fit()
OS = Windows 10 x64
Cloned and built it just a few hours ago.
est = GBMRegressor(exec_path="O:/Coding/LightGBM/",
config='',
application='regression',
num_iterations=2500,
learning_rate=0.1,
num_leaves=127,
tree_learner='serial',
num_threads=4,
min_data_in_leaf=100,
metric='l2',
feature_fraction=1.0,
feature_fraction_seed=2,
bagging_fraction=1.0,
bagging_freq=0,
bagging_seed=3,
metric_freq=1,
early_stopping_round=0)
est.fit(X, y, test_data=[(X_holdout, y_holdout)])
IOError Traceback (most recent call last)
in ()
----> 1 est.fit(X, y, test_data=[(X_holdout, y_holdout)])
C:\Users\ihopethiswillfi\Anaconda2\lib\site-packages\pylightgbm-0.2-py2.7.egg\pylightgbm\models.pyc in fit(self, X, y, test_data)
71 os.system("{} config={}".format(self.exec_path, self.config))
72
---> 73 with open(self.param['output_model'], mode='rb') as file:
74 self.model = file.read()
75
IOError: [Errno 2] No such file or directory: 'c:\users\ihopethiswillfi\appdata\local\temp\tmpksw8jv\LightGBM_model.txt'
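In the traceback above, fit launches LightGBM with os.system and then opens the output model unconditionally, so a failed run only surfaces as a missing file. The exec_path in this report also ends in a directory ("O:/Coding/LightGBM/") rather than an executable, which may be the cause. The sketch below checks the exit status and the output file explicitly; `run_and_read` is a hypothetical helper, and the Python interpreter stands in for the LightGBM binary.

```python
import os
import subprocess
import sys
import tempfile


def run_and_read(command, output_path):
    """Run `command`, then return the text of `output_path`, failing loudly otherwise."""
    process = subprocess.Popen(command, stdout=subprocess.PIPE,
                               stderr=subprocess.STDOUT)
    log, _ = process.communicate()
    if process.returncode != 0:
        raise RuntimeError(
            "command exited with status {}:\n{}".format(
                process.returncode, log.decode("utf-8", "replace"))
        )
    if not os.path.isfile(output_path):
        raise RuntimeError(
            "command succeeded but {} was not written".format(output_path))
    with open(output_path) as handle:
        return handle.read()


# Stand-in for the LightGBM executable: a Python one-liner that writes a file.
out_path = os.path.join(tempfile.mkdtemp(), "LightGBM_model.txt")
writer = "import sys; open(sys.argv[1], 'w').write('model')"
model_text = run_and_read([sys.executable, "-c", writer, out_path], out_path)
```

With this shape, a wrong executable path or a LightGBM configuration error is reported together with the process log instead of an IOError about the temp file.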
Hi,
can you please provide some example how to save and load model?
Thx
Hey!
Thanks for creating this wrapper, really appreciate it. I'm getting the following error when attempting to use GridSearchCV with pyLightGBM, and I'm not sure whether it's an issue with LightGBM or pyLightGBM. Note that the same error also occurs outside of GridSearchCV if I use it within loops.
Link to notebook:
Hello ArdalanM, this Python wrapper makes GBM great and simple to use.
I am now running your notebook regression_example_kaggle_allstate.ipynb and get an error on the gbmr.predict line. The output message is:
<ipython-input-10-0c40d9335a9b> in <module>()
22
23 gbmr.fit(X_train, y_train, test_data=[(X_valid, y_valid)])
---> 24 print("Mean Square Error: ", metrics.mean_absolute_error(y_true=(np.exp(y_valid)-1), y_pred=(np.exp(gbmr.predict(X_valid))-1)))
/usr/local/lib/python2.7/dist-packages/pylightgbm/models.pyc in predict(self, X)
122
123 process = subprocess.check_output([self.exec_path, "config={}".format(conf_filepath)],
--> 124 universal_newlines=True)
125
126 if self.verbose:
/usr/lib/python2.7/subprocess.pyc in check_output(*popenargs, **kwargs)
572 if cmd is None:
573 cmd = popenargs[0]
--> 574 raise CalledProcessError(retcode, cmd, output=output)
575 return output
576
CalledProcessError: Command '['/home/lyz/Workspace/Github/LightGBM/lightgbm', 'config=/tmp/tmphdAl6R/predict.conf']' returned non-zero exit status 1
Really appreciate your help.