kaz-Anova / StackNet
StackNet is a computational, scalable and analytical Meta modelling framework
License: MIT License
I have tried some of the algorithms, like SVM and KNN, but they seem too slow to run on my computer.
Are there any ways to make those algorithms run faster?
I am seeing this in the output:
[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Finished loading 1001 models
[LightGBM] [Info] Finished initializing prediction
[LightGBM] [Info] Data file /Users/.../dev/snexample/models/9ogh6b9bl344q13keqvq3k7bco.test doesn't contain a label column
[LightGBM] [Info] Finished prediction
Is the missing label column significant?
Hi,
I want to try your package, but I can't run it on Linux:
java -jar StackNet.jar train sparse=false has_head=false model=model pred_file=pred.csv train_file=train.csv params=stakcnet.txt verbose=true threads=7 metric=logloss stackdata=false seed=1 folds=5
it gives immediately the following error:
Error: Could not find or load main class –jar
It seems to be some namespace/classpath issue; could you please either provide instructions on how to build the jar or specify the correct namespace/classpath?
Thanks,
Valentin.
I am trying the Zillow model with a different dataset, but I keep running out of memory even with the -Xmx128g setting.
[2.0, <=2.0]
[7.0, <=7.0]
[44142.0, <=44142.0]
Level: 1 dimensionality: 11
Starting cross validation
Fitting model: 1
Fitting model: 2
Fitting model: 3
Fitting model: 4
Fitting model: 5
Fitting model: 6
Fitting model: 7
Fitting model: 8
Exception in thread "Thread-1" Exception in thread "Thread-3" Exception in thread "Thread-2" java.lang.OutOfMemoryError: Java heap space
at manipulate.copies.copies.Copy(copies.java:183)
at matrix.fsmatrix.GetData(fsmatrix.java:220)
at matrix.fsmatrix.Copy(fsmatrix.java:388)
at ml.Bagging.BaggingRegressor.fit(BaggingRegressor.java:1443)
at ml.Bagging.BaggingRegressor.run(BaggingRegressor.java:241)
at java.lang.Thread.run(Thread.java:745)
I tried deleting model 8, which is GradientBoostingForestRegressor,
but I still get the same result - I guess the next model also ran out of memory.
Any suggestions?
Hi Marios,
Thanks for sharing StackNet, a great tool for stacking, but I'm still not clear on how to tune a single model. For example, if my parameter file is as follows:
XgboostRegressor booster:gblinear objective:reg:linear max_leaves:0 num_round:500 eta:0.1 threads:3 gamma:1 max_depth:4 colsample_bylevel:1.0 min_child_weight:4.0 max_delta_step:0.0 subsample:0.8 colsample_bytree:0.5 scale_pos_weight:1.0 alpha:10.0 lambda:1.0 seed:1 verbose:false
LightgbmRegressor verbose:false
What should I put on the command line? Is it the same as this one?
java -Xmx12048m -jar StackNet.jar train task=regression sparse=true has_head=false output_name=datasettwo model=model2 pred_file=pred2.csv train_file=dataset2_train.txt test_file=dataset2_test.txt test_target=false params=dataset2_params.txt verbose=true threads=1 metric=mae stackdata=false seed=1 folds=4 bins=3
Thank you!
When I have test_file specified in the train task it works, but when I run predict separately with task=regression I get:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.ClassCastException: ml.stacknet.StackNetRegressor cannot be cast to ml.stacknet.StackNetClassifier
at stacknetrun.runstacknet.main(runstacknet.java:775)
... 5 more
I encountered this problem with my own features in the Kaggle Quora competition.
Exception in thread "Thread-6" java.lang.ArrayIndexOutOfBoundsException: 750
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3002)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3042)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3042)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
at ml.Tree.DecisionTreeRegressor.fit(DecisionTreeRegressor.java:2382)
at ml.Tree.DecisionTreeRegressor.run(DecisionTreeRegressor.java:483)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "Thread-3" java.lang.NullPointerException
at ml.Tree.DecisionTreeRegressor.isfitted(DecisionTreeRegressor.java:3275)
at ml.Tree.scoringhelperv2.(scoringhelperv2.java:107)
at ml.Tree.RandomForestRegressor.predict2d(RandomForestRegressor.java:744)
at ml.Tree.GradientBoostingForestClassifier.fit(GradientBoostingForestClassifier.java:2353)
at ml.Tree.GradientBoostingForestClassifier.run(GradientBoostingForestClassifier.java:382)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "Thread-1112" java.lang.ArrayIndexOutOfBoundsException: 983
at ml.Tree.DecisionTreeClassifier.expand_node(DecisionTreeClassifier.java:3185)
at ml.Tree.DecisionTreeClassifier.expand_node(DecisionTreeClassifier.java:3225)
at ml.Tree.DecisionTreeClassifier.expand_node(DecisionTreeClassifier.java:3225)
at ml.Tree.DecisionTreeClassifier.expand_node(DecisionTreeClassifier.java:3221)
at ml.Tree.DecisionTreeClassifier.expand_node(DecisionTreeClassifier.java:3225)
at ml.Tree.DecisionTreeClassifier.fit(DecisionTreeClassifier.java:2576)
at ml.Tree.DecisionTreeClassifier.run(DecisionTreeClassifier.java:537)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "Thread-1137" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-1617" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-1642" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-2075" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-2284" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-2365" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-2382" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-2463" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-2608" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-2721" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-2802" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-2883" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-2996" java.lang.ArrayIndexOutOfBoundsException
logloss : 0.257352525104832
logloss : 2.4462523363834303
Exception in thread "main" java.lang.NullPointerException
at ml.Tree.DecisionTreeClassifier.isfitted(DecisionTreeClassifier.java:3458)
at ml.Tree.scoringhelpercatv2.(scoringhelpercatv2.java:107)
at ml.Tree.RandomForestClassifier.predict_proba(RandomForestClassifier.java:795)
at ml.stacknet.StackNetClassifier.fit(StackNetClassifier.java:3532)
at stacknetrun.runstacknet.main(runstacknet.java:437)
I am sure that there are no zero values like 0.000000 in the sparse file and that all the data is in order.
What's more, I have 489 features. Is that too many?
/(ㄒoㄒ)/~~
I am getting this error on Mac. The console output is
$ java -Xmx12048m -jar StackNet.jar train task=regression sparse=true has_head=false output_name=datasettwo model=model2 pred_file=pred2.csv train_file=dataset2_train.txt test_file=dataset2_test.txt test_target=false params=dataset2_params.txt verbose=true threads=1 metric=mae stackdata=false seed=1 folds=4 bins=3
parameter name : task value : regression
parameter name : sparse value : true
parameter name : has_head value : false
parameter name : output_name value : datasettwo
parameter name : model value : model2
parameter name : pred_file value : pred2.csv
parameter name : train_file value : dataset2_train.txt
parameter name : test_file value : dataset2_test.txt
parameter name : test_target value : false
parameter name : params value : dataset2_params.txt
parameter name : verbose value : true
parameter name : threads value : 1
parameter name : metric value : mae
parameter name : stackdata value : false
parameter name : seed value : 1
parameter name : folds value : 4
parameter name : bins value : 3
[4793209, 88528]
Loaded File: dataset2_train.txt
Total rows in the file: 88528
Total columns in the file: undetrmined-Sparse
Number of elements : 4793209
The filedataset2_train.txt was loaded successfully with :
Rows : 88528
Columns (excluding target) : 1
Delimeter was :
Loaded sparse train data with 88528 and columns 58
loaded data in : 6.587000
Binning parameters
[-0.0131, <=-0.0131]
[0.0247, <=0.0247]
[0.4187, <=0.4187]
Level: 1 dimensionality: 12
Starting cross validation
Fitting model : 0
mae : 0.0534568661948753
Fitting model : 1
mae : 0.0531896123303922
Fitting model : 2
mae : 0.053078546424724905
Fitting model : 3
mae : 0.053564587280493355
Fitting model : 4
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.IllegalStateException: Tree is not fitted
at ml.Bagging.scoringhelperbagv2.<init>(scoringhelperbagv2.java:109)
at ml.Bagging.BaggingRegressor.predict2d(BaggingRegressor.java:669)
at ml.Bagging.BaggingRegressor.predict_proba(BaggingRegressor.java:1875)
at ml.stacknet.StackNetRegressor.fit(StackNetRegressor.java:3065)
at stacknetrun.runstacknet.main(runstacknet.java:522)
... 5 more
Python version 3.6.1
Java version: 1.8.0_121
Any suggestions on how to get around this issue?
After building 2 layers of stacking by:
java -Xmx12048m -jar StackNet.jar train task=regression sparse=true has_head=false output_name=datasettwo model=model2 pred_file=pred2.csv train_file=dataset2_train.txt test_file=dataset2_test.txt test_target=false params=dataset2_params.txt verbose=true threads=1 metric=mae stackdata=false seed=1 folds=4 bins=3
The following error was raised:
Loaded File: dataset2_test.txt
Total rows in the file: 2985217
Total columns in the file: undetrmined-Sparse
Number of elements : 386731415
Loaded sparse test data with 2985217 and columns 170
loading test data lasted : 613.503000
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.OutOfMemoryError: Java heap space
at utilis.map.intint.IntIntMapminus4a.<init>(IntIntMapminus4a.java:47)
at matrix.smatrix.buildmap(smatrix.java:76)
at ml.stacknet.StackNetRegressor.predict2d(StackNetRegressor.java:1496)
at ml.stacknet.StackNetRegressor.predict_proba(StackNetRegressor.java:3942)
at stacknetrun.runstacknet.main(runstacknet.java:691)
... 5 more
Any advice ? Thanks!
I wonder whether the output can give some hint about whether I selected the right base models.
I saw something like:
Total ranks after target inclusion: 212409
Gain from before : 152317
percentage of unique ranks versus elements size: 13.892484314388511%
Gain of percentage of unique ranks versus elements size: 9.962202794207%
What do these mean? Are they just for monitoring?
Thanks
Hi !
I'm having some trouble trying to make StackNet work. I have a very humble PC (which is why I don't want to run the included examples; they would probably kill my potato PC) and I would like to try some small models, in order to get a feel for the software and how to use it :)
So, for example, I set up a parameters file with the following contents:
LogisticRegression Type:Liblinear C:0.8 threads:1 usescale:True maxim_Iteration:100 seed:1 verbose:false
LogisticRegression Type:Liblinear C:0.5 RegularizationType:L1 threads:1 usescale:True maxim_Iteration:20 seed:1 verbose:false
RandomForestClassifier verbose:false
I run java -jar etc. and it seems to be running, but after a while I get this error:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: exceptions.IllegalStateException: The last layer of StackNet cannot have a classifier
at ml.stacknet.StackNetRegressor.fit(StackNetRegressor.java:2516)
at stacknetrun.runstacknet.main(runstacknet.java:524)
So, I want to know what is causing this error and how to fix it.
Thanks in advance,
This is a really interesting piece of software, amazing job ! :)
I am a Kaggler trying to improve myself. :) Thanks for the great tool!
I saw you have cases combining two StackNets, so I am wondering what the typical strategy for using StackNet is. After some data cleaning and feature engineering, you run StackNet. How do you do model diagnosis with the results of StackNet? How do you gradually improve the final model?
Thanks,
Jing
Is it possible to use different feature subsets for individual models? If yes, how can it be configured?
Hello @kaz-Anova, have you ever seen this issue (the system cannot find the files for several models, which I guess may be caused by empty values in several columns of the datasettwo_test files) when running the program? Any ideas how to deal with this? Thanks.
Would it be better if the output of output_name were in .txt format with the target variable appended? It could then be fed right into the next level without additional conversion.
Hi
I'm trying to train a StackNet on sparse data. The problem is a classification problem with 9 possible categories. My training file is in sparse format, like this:
0 2:1 6:1 13:1 17:1 22:1 23:1 30:1 42:1 47:1 59:1 67:1 71:1 72:1 84:1 86:1
1 2:1 17:1 22:1 42:1 43:1 45:1 47:1 57:1 59:1 67:1 70:1 72:1 86:1 88:1 99:1
etc etc
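For reference, the lines above are standard libsvm-style sparse format: a label followed by index:value pairs, with zero entries omitted. A minimal sketch of how one such line decomposes (illustrative only, not StackNet's actual parser):

```python
def parse_libsvm_line(line: str):
    """Split 'label index:value index:value ...' into a label and a dict."""
    parts = line.split()
    label = float(parts[0])
    feats = {int(i): float(v) for i, v in (p.split(":") for p in parts[1:])}
    return label, feats

label, feats = parse_libsvm_line("0 2:1 6:1 13:1")
print(label, feats)  # 0.0 {2: 1.0, 6: 1.0, 13: 1.0}
```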
In the parameters file I have a list of classifiers. When I start the training, it gives an error after some time.
Fitting model : 9
(this model is a LightgbmClassifier)
Exception in thread "Thread-11935" java.lang.IllegalStateException: The produced score in temporary file /home/andresh/data-science/StackNet/models/nlr9tp06r037rshv48jde0rupi.pred is not of correct size
at ml.lightgbm.LightgbmClassifier.predict_proba(LightgbmClassifier.java:806)
at ml.Bagging.scoringhelpercatbagv2.score(scoringhelpercatbagv2.java:158)
at ml.Bagging.scoringhelpercatbagv2.run(scoringhelpercatbagv2.java:188)
at java.lang.Thread.run(Thread.java:745)
The process doesn't stop; it keeps training and even finishes. But I'm concerned about what this means and how it affects training.
In other experiments, the process freezes when trying to fit the models in the next fold.
Thanks, and any help is really appreciated :)
When I was trying to predict on a new file with the trained model, I specified the model directory with "model=models". However, StackNet failed to load the models. Does that mean I need to rerun StackNet with the new pred_file?
When I use KerasnnRegressor on Mac, it gives me this error:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.IllegalStateException: Tree is not fitted
at ml.Bagging.scoringhelperbagv2.(scoringhelperbagv2.java:109)
at ml.Bagging.BaggingRegressor.predict2d(BaggingRegressor.java:669)
at ml.Bagging.BaggingRegressor.predict_proba(BaggingRegressor.java:1875)
at ml.stacknet.StackNetRegressor.fit(StackNetRegressor.java:3065)
at stacknetrun.runstacknet.main(runstacknet.java:522)
... 5 more
Can you help me with that? Thanks!
I used:
KerasnnRegressor loss:mean_absolute_error standardize:true use_log1p:true shuffle:true batch_normalization:true weight_init:lecun_uniform momentum:0.9 optimizer:adam use_dense:true l2:0.000001,0.000001 hidden:50,50 activation:relu,relu droupouts:0.1,0.1 epochs:20 lr:0.01 batch_size:8 stopping_rounds:10 validation_split:0.2 seed:1 verbose:false
Why were no tests included in the source?
Hi @kaz-Anova,
I encountered the following issue when calling XgboostRegressor during the prediction stage of the Zillow competition; default parameters were used according to the tutorial. I would appreciate it if you could look into the issue and help out.
C:\Users\meipaopao\StackNet\models\n6ut4jf5f4n3e5hjdl51n1p0ji0.pred (The system cannot find the file specified)
Exception in thread "Thread-46536" java.lang.NegativeArraySizeException
at io.input.Retrievecolumn(input.java:1455)
at ml.xgboost.XgboostRegressor.predict2d(XgboostRegressor.java:638)
at ml.Bagging.scoringhelperbagv2.score(scoringhelperbagv2.java:158)
at ml.Bagging.scoringhelperbagv2.run(scoringhelperbagv2.java:185)
at java.lang.Thread.run(Unknown Source)
C:\Users\meipaopao\StackNet\models\doh9b4mmpej0cem7eik7q170h20.pred (The system cannot find the file specified)
Exception in thread "Thread-46537" java.lang.NegativeArraySizeException
at io.input.Retrievecolumn(input.java:1455)
at ml.xgboost.XgboostRegressor.predict2d(XgboostRegressor.java:638)
at ml.Bagging.scoringhelperbagv2.score(scoringhelperbagv2.java:158)
at ml.Bagging.scoringhelperbagv2.run(scoringhelperbagv2.java:185)
at java.lang.Thread.run(Unknown Source)
Thank you very much!
Andy
I followed the steps for the Zillow example. Every time I tune the parameters, the files in the model folder seem to change. Do I have to delete the files or just keep them? Does this affect the results?
I know how to tune a model in level 1: just put that model on the first line of param.txt and rerun once the k-fold is finished.
However, I don't know how to tune a model at level 2. Does anyone know?
Hi, I just ran the file make_stacknet_data.py and it shows the following error:
Traceback (most recent call last):
File "D:/J/stacknet/make_stacknet_data.py", line 113, in
main()
File "D:/J/stacknet/make_stacknet_data.py", line 104, in main
dataset2()
File "D:/J/stacknet/make_stacknet_data.py", line 93, in dataset2
fromsparsetofile("dataset2_train.txt", x_train, deli1=" ", deli2=":",ytarget=y_train)
File "D:/J/stacknet/make_stacknet_data.py", line 25, in fromsparsetofile
if ytarget!=None:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I don't understand why this happens.
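For context, this ValueError is a general NumPy behavior rather than anything StackNet-specific: comparing an array to None with `!=` is element-wise and yields a boolean array, whose truth value is ambiguous inside an `if`. A minimal sketch of the error and the usual identity-check fix (hypothetical data):

```python
import numpy as np

ytarget = np.array([0.5, 1.0, 2.0])  # stands in for the real target vector

# `if ytarget != None:` raises, because the element-wise comparison
# returns an array like [True, True, True], not a single boolean.
try:
    if ytarget != None:  # noqa: E711 - reproduces the reported error
        pass
except ValueError as e:
    print("raises:", e)

# The usual fix is an identity check, which never touches the array values:
if ytarget is not None:
    print("target has", len(ytarget), "rows")
```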
ATT
Hello,
Thank you for providing an example for the Zillow competition. I tried running the example but ran into exactly the same problem as here: #16
I can't run LightGBM and I have exactly the same output error:
Starting cross validation
Fitting model : 0
Exception in thread "Thread-1" java.lang.IllegalStateException: failed to create LIGHTgbm subprocess with config name /Users/hadoop/StackNet/models/ucsbmggugdanr6qc19a64a7sv10.conf
at ml.lightgbm.LightgbmRegressor.create_light_suprocess(LightgbmRegressor.java:426)
at ml.lightgbm.LightgbmRegressor.fit(LightgbmRegressor.java:1885)
at ml.lightgbm.LightgbmRegressor.run(LightgbmRegressor.java:516)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.IllegalStateException: Tree is not fitted
at ml.Bagging.scoringhelperbagv2.(scoringhelperbagv2.java:109)
at ml.Bagging.BaggingRegressor.predict2d(BaggingRegressor.java:669)
at ml.Bagging.BaggingRegressor.predict_proba(BaggingRegressor.java:1875)
at ml.stacknet.StackNetRegressor.fit(StackNetRegressor.java:3065)
at stacknetrun.runstacknet.main(runstacknet.java:522)
... 5 more
I tried to run lightgbm by itself by running it with the config file:
./lightgbm config=~ ./models/ucsbmggugdanr6qc19a64a7sv10.conf task=train
But I receive a 'Permission denied' error. I tried running the same command with sudo, but I get
sudo: ./lightgbm: command not found
I checked and the jar file is in the same folder as the lib folder. Here is a screenshot of how my Stacknet folder is organized:
Thank you for your help.
I am guessing they are the H2O components.
I cannot compile the source because of them.
Can you include instructions on your setup so that I can compile StackNet with these components?
If I want to use xgboost as first layer model, what should I do?
Hi, dear StackNet developers and the author @kaz-Anova,
I am new to Kaggle, and I found StackNet a COOL tool to use. However, it is not as GOOD as I expected with the default parameters. I tried tuning and adding features, but none of that has given me any improvement yet. That's why I'm posting this, hoping someone can guide me on how to tune a BETTER StackNet (parameters, folds, regressors, etc.) for the Zillow competition.
As the docs say:
https://github.com/kaz-Anova/StackNet/tree/master/example/zillow_regression_sparse
StackNet alone ONLY achieves around 0.0647 on the LB, which is not good compared with other kernels. Some kernels achieve a high score with a single model (LightGBM alone up to 0.0644). And there are also kernels that use a 2-layer traditional stacking and achieve 0.0645 (https://www.kaggle.com/wangsg/ensemble-stacking-lb-0-644/comments), which is much better than StackNet as far as the leaderboard is concerned.
So my question is: why is StackNet not working very well on the LB in the Zillow competition?
If so, could @kaz-Anova please help update the parameters and regressors in the example so that the performance gets better? (I tried a lot, but did not improve it.) If so, more people (especially freshmen like me) will be happier to use StackNet.
(As far as I've tried, my 5-fold LightGBM average works much worse than my single-run LightGBM.)
In my opinion, the Zillow leaderboard test data is evaluated on 2016.10 ~ 2016.12. However, the data between 2016.10 and 2016.12 are very sparse in the training set. So, a K-fold may be a bad way to approach this competition.
If so, would it be possible for StackNet to **in the future support a DIFFERENT out-of-fold scheme, not just K-fold**, so that more flexible blending (i.e. divide the data into two parts, then use only historical data to train, predict on future data, and use the future data to do the stacking) or a sliding-window algorithm would be supported? (You know, especially for time-related problems, it is sometimes bad to leak the future into the past with K-fold or reusable K-fold.)
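The two-part temporal split described above can already be prepared outside StackNet by filtering the files before training; a minimal sketch under assumed month labels (the data and field names here are hypothetical, and none of this is a built-in StackNet feature):

```python
import numpy as np

# Hypothetical rows: the transaction month of each sample, plus features/target.
months = np.array([1, 3, 5, 7, 10, 11, 12])
X = np.arange(14, dtype=float).reshape(7, 2)
y = np.array([0.1, 0.0, -0.1, 0.2, 0.1, 0.0, -0.2])

# Train on history only; hold out the future months the leaderboard scores on,
# instead of letting K-fold mix future rows into the training folds.
future = months >= 10
X_train, y_train = X[~future], y[~future]
X_holdout, y_holdout = X[future], y[future]

print(X_train.shape, X_holdout.shape)  # (4, 2) (3, 2)
```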
To @kaz-Anova: sincere congratulations on the high score that you and your team achieved. However, considering StackNet's current performance, I am very curious whether you are still using StackNet in the competition as a strong predictor (rather than as a weak model for averaging).
Apologies for my rudeness (if that is the case), and I surely know that one can achieve a better LB score just by combining with other kernels. But my point is that the StackNet baseline is now so far from the other kernels. Are there any practical methods (or tricks) that you would like to share to make StackNet work better?
P.S. I am now a fan of StackNet, and I want to express my gratitude to @kaz-Anova for the convenience that the powerful StackNet brings us. I wish it could be even BETTER in the future.
Sincerely
What is the best way to get a dataset into .libsvm format? I'm trying to work on the Zillow dataset on Kaggle and have done the feature manipulation I want in R, but now I need the dataset in .libsvm format so I can try StackNet for the first time. I have no idea how to get the data into this format, and no solution from googling is working either.
How do I use R, or something else, to get the data into the desired format?
Thanks
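One common route (an illustration, not an official StackNet recommendation) is to move the prepared features into Python and use scikit-learn's `dump_svmlight_file`, which writes the `label index:value ...` lines and drops zero entries automatically. The data below is a toy stand-in for the real Zillow features:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.datasets import dump_svmlight_file

# Toy stand-ins for the real feature matrix and target (hypothetical values).
X = csr_matrix(np.array([[0.0, 1.5, 0.0],
                         [2.0, 0.0, 3.0]]))
y = np.array([0.1, -0.2])

# Writes one "label index:value ..." line per row; zeros are omitted.
dump_svmlight_file(X, y, "train.libsvm", zero_based=True)

with open("train.libsvm") as f:
    print(f.read())
```

Staying in R, one commonly cited option is `e1071::write.matrix.csr` on a SparseM matrix, or simply writing the label/index:value lines yourself with `writeLines`.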
Hi, I tried to replicate one of the examples provided in this repo, in this case the Amazon one. I ran the code using param_amazon_linear as documented in that example, but all I got was this:
> java -Xmx3048m -jar StackNet.jar train train_file=train.sparse test_file=test.sparse params=param_amazon_linear.txt pred_file=amazon_linear_pred.csv test_target=false verbose=true Threads=1 sparse=true folds=5 seed=1 metric=auc
parameter name : train_file value : train.sparse
parameter name : test_file value : test.sparse
parameter name : params value : param_amazon_linear.txt
parameter name : pred_file value : amazon_linear_pred.csv
parameter name : test_target value : false
parameter name : verbose value : true
parameter name : threads value : 1
parameter name : sparse value : true
parameter name : folds value : 5
parameter name : seed value : 1
parameter name : metric value : auc
a train method needs to have a task which may be regression or classification
After checking for a while, I found it didn't produce any output file. Is there something I did wrong?
Additional note: I had already produced train.sparse and test.sparse by running prepare_data.py.
CatBoost (https://tech.yandex.com/catboost/) is a very powerful gradient boosting library for machine learning.
I hope someone can add this tool to StackNet.
Hi. Thanks for the StackNet classifier.
I encountered an exception when I tried to add some more features to the Kaggle Quora problem.
Exception in thread "Thread-23" java.lang.ArrayIndexOutOfBoundsException: 12
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3011)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3042)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3042)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3042)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3042)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
at ml.Tree.DecisionTreeRegressor.fit(DecisionTreeRegressor.java:2382)
at ml.Tree.DecisionTreeRegressor.run(DecisionTreeRegressor.java:483)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "Thread-5" java.lang.NullPointerException
at ml.Tree.DecisionTreeRegressor.isfitted(DecisionTreeRegressor.java:3275)
at ml.Tree.scoringhelperv2.<init>(scoringhelperv2.java:107)
at ml.Tree.RandomForestRegressor.predict2d(RandomForestRegressor.java:744)
at ml.Tree.GradientBoostingForestClassifier.fit(GradientBoostingForestClassifier.java:2353)
at ml.Tree.GradientBoostingForestClassifier.run(GradientBoostingForestClassifier.java:382)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "main" java.lang.NullPointerException
at ml.Tree.scoringhelperfv2.<init>(scoringhelperfv2.java:107)
at ml.Tree.GradientBoostingForestClassifier.predict_proba(GradientBoostingForestClassifier.java:603)
at ml.stacknet.StackNetClassifier.fit(StackNetClassifier.java:2438)
at stacknetrun.runstacknet.main(runstacknet.java:385)
Exception in thread "Thread-28783" java.lang.NullPointerException
at ml.Tree.DecisionTreeRegressor.isfitted(DecisionTreeRegressor.java:3275)
at ml.Tree.scoringhelperv2.<init>(scoringhelperv2.java:107)
at ml.Tree.RandomForestRegressor.predictfs(RandomForestRegressor.java:590)
at ml.Tree.scoringhelperfv2.score(scoringhelperfv2.java:149)
at ml.Tree.scoringhelperfv2.run(scoringhelperfv2.java:175)
at java.lang.Thread.run(Thread.java:745)
I used paramsv1.txt but added more threads to each base classifier.
Hi, if I try to solve a multi-label problem, how should I prepare the data file and parameter file?
I see your scripts save the train file with the following Python command:
np.savetxt(train_file, X, delimiter=",", fmt='%.5f')
I have a couple of questions
I accidentally misspelled UseConstant:true as UseConstance:true in my params file. After correcting my mistake I noticed that my CV results got worse. I then changed to UseConstant:false and got the same exact result as with UseConstant:true. I then got rid of UseConstant completely and got the same results as when I misspelled it, as I'd expect. It seems as if the mere presence of UseConstant determines whether a constant is used, not the actual true/false setting. Is this the expected behavior?
Thanks for sharing your code; it helped me a lot! I wonder how you determined the weights for each model when merging them. In the Zillow example, you chose 0.25 for your own model and 0.75 for the other, and I really want to understand the method you used for determining these weights. I would appreciate it if you could teach me about this~ ^_^
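For context, merging two submissions with 0.25/0.75 is just a weighted average of the predictions; in practice the weights are usually picked by validation or leaderboard score rather than derived analytically. A minimal sketch with hypothetical prediction vectors:

```python
import numpy as np

pred_stacknet = np.array([0.10, 0.20, 0.30])  # hypothetical model outputs
pred_other    = np.array([0.20, 0.10, 0.40])

# Convex combination: the weights sum to 1, chosen e.g. by a small grid
# search on a holdout set.
blend = 0.25 * pred_stacknet + 0.75 * pred_other
print(blend)  # [0.175 0.125 0.375]
```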
Thanks for this example, I just submitted - I am ranked 78 atm :-)
In your code, the sparse data is treated differently from dense/full data. How is that handled? Are you estimating the missing data before fitting, or does the fit simply ignore the missing data?
I tried using the example as-is, except for changing the bag and bin counts, and I get the following error.
I have ubuntu 14.04 (trusty)
Java 1.8
Command and error
StackNet$ java -Xmx12048m -jar StackNet.jar train task=regression sparse=true has_head=false output_name=datasettwo model=model2 pred_file=pred2.csv train_file=dataset2_train.txt test_file=dataset2_test.txt test_target=false params=dataset2_params.txt verbose=true threads=1 metric=mae stackdata=false seed=1 folds=5 bins=2
parameter name : task value : regression
parameter name : sparse value : true
parameter name : has_head value : false
parameter name : output_name value : datasettwo
parameter name : model value : model2
parameter name : pred_file value : pred2.csv
parameter name : train_file value : dataset2_train.txt
parameter name : test_file value : dataset2_test.txt
parameter name : test_target value : false
parameter name : params value : dataset2_params.txt
parameter name : verbose value : true
parameter name : threads value : 1
parameter name : metric value : mae
parameter name : stackdata value : false
parameter name : seed value : 1
parameter name : folds value : 5
parameter name : bins value : 2
[4793209, 88528]
Loaded File: dataset2_train.txt
Total rows in the file: 88528
Total columns in the file: undetrmined-Sparse
Number of elements : 4793209
The filedataset2_train.txt was loaded successfully with :
Rows : 88528
Columns (excluding target) : 1
Delimeter was :
Loaded sparse train data with 88528 and columns 58
loaded data in : 10.403000
Binning parameters
[0.005, <=0.005]
[0.4187, <=0.4187]
Level: 1 dimensionality: 12
Starting cross validation
Fitting model : 0
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.IllegalStateException: Tree is not fitted
at ml.Bagging.scoringhelperbagv2.&lt;init&gt;(scoringhelperbagv2.java:109)
at ml.Bagging.BaggingRegressor.predict2d(BaggingRegressor.java:669)
at ml.Bagging.BaggingRegressor.predict_proba(BaggingRegressor.java:1875)
at ml.stacknet.StackNetRegressor.fit(StackNetRegressor.java:3065)
at stacknetrun.runstacknet.main(runstacknet.java:522)
... 5 more
Hi, I was running the code on my computer (Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz, 12 GB RAM). I tried to train with 30k rows and 318 variables, but something strange happened: there was a sudden drop in computing activity, and no further progress was made in fitting the current model or continuing to the next fold (I use 5-fold cross validation, 8 models in the first layer and 1 model in the second layer, with all 4 of my threads). With this data, the drop came after about 30 minutes of running StackNet. Here is a screenshot from my last run.
I also tried to play with the threads option, but nothing changed.
I believe there is an incorrect error check in the file SklearnknnClassifier.java:
if ( !metric.equals("rbf") && !metric.equals("poly")&& !metric.equals("sigmoid") && !metric.equals("linear") ){
throw new IllegalStateException(" metric has to be between 'rbf', 'poly', 'sigmoid' or 'linear'" );
}
if ( !metric.equals("uniform") && !metric.equals("distance") ){
throw new IllegalStateException(" metric has to be between 'uniform' or 'distance'" );
}
I think the first metric.equals check block is incorrect and you only want the second one. I couldn't get the SklearnknnClassifier to work.
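To make the report concrete: the first block tests kernel names ('rbf', 'poly', 'sigmoid', 'linear') that belong to the SVM wrapper, so any valid KNN weighting value fails it. A hypothetical corrected check (the method name `validateKnnWeights` is made up for illustration) would keep only the second test:

```java
// Hypothetical fix sketch: a KNN classifier should only validate its
// weighting scheme; the kernel check looks copy-pasted from the SVM wrapper
// and rejects every valid KNN value before the real check is reached.
public class KnnParamCheck {
    static void validateKnnWeights(String weights) {
        if (!weights.equals("uniform") && !weights.equals("distance")) {
            throw new IllegalStateException(
                " weights has to be between 'uniform' or 'distance'");
        }
    }
}
```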
Also, in the docs the parameters section for KerasnnClassifier section has the wrong header (it has the header 'SklearnsvmClassifier'). There are also some typos like dropout being labeled 'droupout' and 'Toral' instead of 'Total.'
Thanks for creating and sharing the StackNet tool.
Hello. I tried to run StackNet with CSV files, but I got an error like this:
> java -Xmx3048m -jar StackNet.jar train task=classification train_file=train.vm2.csv test_file=test.vm2.csv params=param_amazon_linear.txt pred_file=linear_pred.csv test_target=false verbose=true Threads=4 sparse=true folds=5 seed=1 metric=auc has_head=true
parameter name : task value : classification
parameter name : train_file value : train.vm2.csv
parameter name : test_file value : test.vm2.csv
parameter name : params value : param_amazon_linear.txt
parameter name : pred_file value : linear_pred.csv
parameter name : test_target value : false
parameter name : verbose value : true
parameter name : threads value : 4
parameter name : sparse value : true
parameter name : folds value : 5
parameter name : seed value : 1
parameter name : metric value : auc
parameter name : has_head value : true
[0, 29943]
Exception in thread "main" java.lang.IllegalStateException: File train.vm2.csv failed to import at bufferreader
at io.input.readsmatrixdata(input.java:1327)
at stacknetrun.runstacknet.main(runstacknet.java:425)
Something seems wrong with the dimensions reported above. I also checked my CSV files, which are comma-delimited as StackNet expects (the true dimensions are 29943 rows x 319 columns for train and 318 columns for test).
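One quick thing to rule out with "failed to import at bufferreader" errors is a ragged row: a single line with a different number of commas (a stray quote, an embedded comma, or a truncated last line) can break the import. A small sketch for checking this, assuming a plain comma-delimited file with no quoted fields:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical debugging aid: report the 1-based line numbers whose
// comma-separated column count differs from the first line's.
// Note: a naive split cannot handle quoted fields containing commas.
public class CsvShapeCheck {
    static List<Integer> badLines(String csv) {
        List<Integer> bad = new ArrayList<>();
        String[] lines = csv.split("\n");
        int expected = -1;
        for (int i = 0; i < lines.length; i++) {
            int cols = lines[i].split(",", -1).length; // keep trailing empties
            if (expected == -1) expected = cols;       // first line sets the shape
            else if (cols != expected) bad.add(i + 1);
        }
        return bad;
    }
}
```

Reading the file into a `String` first (e.g. with `java.nio.file.Files`) and passing it through this check will point at the exact offending row, if any.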
If I am using a 2-layer StackNet for a regression problem, can I use a regression algorithm in the second layer? The documentation says:
"A train method needs at least a train_file and a params_file. It also needs at least two algorithms, and the last layer must not contain a regressor unless the metric is auc and the problem is binary"
Also, is the prediction order (rows) preserved, or will the outputs be shuffled?
Thanks
In my 2 runs for Zillow on the same set of data, the public scores differed by 0.0002. While this is not a huge difference, and some variation is expected in stochastic processes, the difference is significant in this competition. More importantly, it makes step-by-step model improvement difficult due to noise.
Any suggestions for making runs more repeatable? I'm willing to accept longer computation time for better repeatability.
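Beyond fixing the `seed` parameter, some run-to-run variance can come from thread scheduling in the underlying learners, so it may never reach zero. A common mitigation, offered here only as a sketch and not as a StackNet feature, is to run the whole pipeline several times with different seeds and average the prediction files:

```java
// Hypothetical sketch: average predictions from several seeded runs.
// Each row of `runs` holds one full run's predictions (e.g. seed=1,2,3);
// the averaged column is more stable than any single run.
public class SeedAverage {
    static double[] average(double[][] runs) {
        int n = runs[0].length;
        double[] out = new double[n];
        for (double[] run : runs) {
            for (int i = 0; i < n; i++) {
                out[i] += run[i] / runs.length; // running mean accumulation
            }
        }
        return out;
    }
}
```

This trades roughly k times the computation for a smoother, more repeatable score, which matches the stated willingness to spend more time for repeatability.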
Hi,
I encountered an error while running StackNet. Here is the command:
java -Xmx12144m -jar StackNet.jar train train_file='/home/jlu/Experiments/Examples/Instacart/imba/data/nz_train_slim.csv' test_file='/home/jlu/Experiments/Examples/Instacart/imba/data/all_data_test_V1.csv' has_head=true params='/home/jlu/Experiments/Examples/Instacart/imba/paramsv1.txt' sparse=false pred_file='/home/jlu/Experiments/Examples/Instacart/imba/data/stacknet_pred_V1.csv' test_target=false verbose=true Threads=10 folds=5 seed=1 metric=auc output_name=restack_instacart folds=10 seed=1 task=classification
Here is the error message. What does the InvocationTargetException imply here?
parameter name : train_file value : /home/jlu/experiments/examples/instacart/imba/data/nz_train_slim.csv
parameter name : test_file value : /home/jlu/experiments/examples/instacart/imba/data/all_data_test_v1.csv
parameter name : has_head value : true
parameter name : params value : /home/jlu/experiments/examples/instacart/imba/paramsv1.txt
parameter name : sparse value : false
parameter name : pred_file value : /home/jlu/experiments/examples/instacart/imba/data/stacknet_pred_v1.csv
parameter name : test_target value : false
parameter name : verbose value : true
parameter name : threads value : 10
parameter name : folds value : 5
parameter name : seed value : 1
parameter name : metric value : auc
parameter name : output_name value : restack_instacart
parameter name : folds value : 10
parameter name : seed value : 1
parameter name : task value : classification
Completed: 5.00 %
Completed: 10.00 %
Completed: 15.00 %
Completed: 20.00 %
Completed: 25.00 %
Completed: 30.00 %
Completed: 35.00 %
Completed: 40.00 %
Completed: 45.00 %
Completed: 50.00 %
Completed: 55.00 %
Completed: 60.00 %
Completed: 65.00 %
Completed: 70.00 %
Completed: 75.00 %
Completed: 80.00 %
Completed: 85.00 %
Completed: 90.00 %
Completed: 95.00 %
Completed: 100.00 %
Loaded File: /home/jlu/Experiments/Examples/Instacart/imba/data/nz_train_slim.csv
Total rows in the file: 8474661
Total columns in the file: 78
Weighted variable : -1 counts: 0
Int Id variable : -1 str id: -1 counts: 0
Target Variables : 1 values : [0]
Actual columns number : 77
Number of Skipped rows : 0
Actual Rows (removing the skipped ones) : 8474661
Loaded dense train data with 8474661 and columns 77
loaded data in : 125.971000
Level: 1 dimensionality: 893
Starting cross validation
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.NegativeArraySizeException
at matrix.fsmatrix.<init>(fsmatrix.java:85)
at ml.stacknet.StackNetClassifier.fit(StackNetClassifier.java:2749)
at stacknetrun.runstacknet.main(runstacknet.java:471)
... 5 more
I got the following error when using lightGBM:
Exception in thread "Thread-376" java.lang.IllegalStateException: failed to create LIGHTgbm subprocess with config name ~/models/fjufcncb20qtl7f7ehcpm5b6tn0.conf
at ml.lightgbm.LightgbmRegressor.create_light_suprocess(LightgbmRegressor.java:426)
at ml.lightgbm.LightgbmRegressor.fit(LightgbmRegressor.java:1566)
at ml.lightgbm.LightgbmRegressor.run(LightgbmRegressor.java:514)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.IllegalStateException: Tree is not fitted
at ml.Bagging.scoringhelperbagv2.<init>(scoringhelperbagv2.java:95)
at ml.Bagging.BaggingRegressor.predict2d(BaggingRegressor.java:350)
at ml.Bagging.BaggingRegressor.predict_proba(BaggingRegressor.java:1785)
at ml.stacknet.StackNetRegressor.fit(StackNetRegressor.java:2632)
at stacknetrun.runstacknet.main(runstacknet.java:525)
... 5 more
However, if I run LightGBM directly with the same config file, it seems to be fine:
./lightgbm config=~/models/fjufcncb20qtl7f7ehcpm5b6tn0.conf task=train
[LightGBM] [Warning] Unknown parameter in config file: categorical_feature=
[LightGBM] [Warning] Stopped training because there are no more leaves that meet the split requirements.
And I can see some output in the model file after I run with lightgbm only:
tree
num_class=1
num_tree_per_iteration=1
label_index=0
max_feature_idx=18
objective=regression
boost_from_average
feature_names=Column_0 Column_1 Column_2 Column_3 Column_4 Column_5 Column_6 Column_7 Column_8 Column_9 Column_10 Column_11 Column_12 Column_13 Column_14 Column_15 Column_16 Column_17 Column_18
feature_infos=[16801:17165] [0:20] [0:16] [2:22741] [6037:6111] [33339295:34816009] none [1:240] [31:275] [1:5637] [60371011.101001002:61110091.001012005] [1286:3101] [95982:399675] [0:18] [1885:2015] [100:9948100] [1044:27750000] [278:24499999.999999996] [49.079999999999998:321936.09000000003]
Tree=0
num_leaves=2
split_feature=0
split_gain=-1
threshold=0
decision_type=0
default_value=0
left_child=-1
right_child=-2
leaf_parent=0 0
leaf_value=0.01149876969316053 0.01149876969316053
leaf_count=0 0
internal_value=0
internal_count=0
shrinkage=1
has_categorical=0
Here is my config file
boosting=gbdt
objective=regression
learning_rate=0.002
min_sum_hessian_in_leaf=0.001
min_data_in_leaf=20
feature_fraction=0.5
min_gain_to_split=1.0
bagging_fraction=0.9
poission_max_delta_step=0.0
lambda_l1=0.0
lambda_l2=0.0
scale_pos_weight=1.0
max_depth=4
num_threads=10
num_iterations=100
feature_fraction_seed=2
bagging_seed=2
drop_seed=2
data_random_seed=2
num_leaves=60
bagging_freq=1
xgboost_dart_mode=false
drop_rate=0.1
skip_drop=0.5
max_drop=50
top_rate=0.1
other_rate=0.1
huber_delta=0.1
fair_c=0.1
max_bin=255
min_data_in_bin=5
uniform_drop=false
two_round=false
is_unbalance=false
categorical_feature=
bin_construct_sample_cnt=1000000
is_sparse=true
verbosity=0
data=~/models/fjufcncb20qtl7f7ehcpm5b6tn0.train
output_model=~/model/models/fjufcncb20qtl7f7ehcpm5b6tn0.mod
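One possible cause worth checking: the config paths use `~`, and while the shell expands `~` when you run `./lightgbm` by hand, Java's file APIs and `ProcessBuilder` do not, which would explain why the manual run succeeds and the StackNet-launched subprocess fails. A small sketch of the workaround (the helper name `expandHome` is made up; simply writing absolute paths in the config achieves the same thing):

```java
// Hypothetical helper: expand a leading '~' the way a shell would, because
// Java does not do this, so data=~/models/foo.train can silently fail
// inside a Java-launched subprocess while working on the command line.
public class HomeExpand {
    static String expandHome(String path) {
        if (path.equals("~")) return System.getProperty("user.home");
        if (path.startsWith("~/")) {
            return System.getProperty("user.home") + path.substring(1);
        }
        return path; // already absolute or relative: leave untouched
    }
}
```

Also note the config writes `output_model=~/model/models/...` but the data path uses `~/models/...`; if the `model` vs `models` difference is not intentional, that mismatch alone could make the subprocess fail.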
I have tried the StackNet example from CMD under Windows, and the following problem happens. Could @kaz-Anova or someone else give me tips on how to fix it? Thanks a lot.
C:\Users\User>java -jar StackNet.jar train task=classification sparse=false has_head=false model=model train_file=train_iris.csv test_file=test_iris.csv test_target=true params=params.txt verbose=true threads=4 metric=logloss stackdata=false
parameter name : task value : classification
parameter name : sparse value : false
parameter name : has_head value : false
parameter name : model value : model
parameter name : train_file value : train_iris.csv
parameter name : test_file value : test_iris.csv
parameter name : test_target value : true
parameter name : params value : params.txt
parameter name : verbose value : true
parameter name : threads value : 4
parameter name : metric value : logloss
parameter name : stackdata value : false
Completed: 4.04 %
Completed: 8.08 %
Completed: 12.12 %
Completed: 16.16 %
Completed: 20.20 %
Completed: 24.24 %
Completed: 28.28 %
Completed: 32.32 %
Completed: 36.36 %
Completed: 40.40 %
Completed: 44.44 %
Completed: 48.48 %
Completed: 52.53 %
Completed: 56.57 %
Completed: 60.61 %
Completed: 64.65 %
Completed: 68.69 %
Completed: 72.73 %
Completed: 76.77 %
Completed: 80.81 %
Completed: 84.85 %
Completed: 88.89 %
Completed: 92.93 %
Completed: 96.97 %
Loaded File: train_iris.csv
Total rows in the file: 99
Total columns in the file: 5
Weighted variable : -1 counts: 0
Int Id variable : -1 str id: -1 counts: 0
Target Variables : 1 values : [0]
Actual columns number : 4
Number of Skipped rows : 0
Actual Rows (removing the skipped ones) : 99
Loaded dense train data with 99 and columns 4
loaded data in : 0.100000
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.IllegalStateException: File params.txt failed to import at bufferreader params.txt (The system cannot find the file specified.)
at io.input.StackNet_Configuration(input.java:1650)
at stacknetrun.runstacknet.main(runstacknet.java:441)
Is there a reason for removing outliers? Could it throw away valuable information?
This works very well: it supports many algorithms and data formats.
In real-world work, training data may contain millions of samples and millions of features.
Can StackNet support online learning algorithms?