kaz-Anova / StackNet
StackNet is a computational, scalable and analytical Meta modelling framework
License: MIT License
I have tried some of the algorithms, like SVM and KNN, but they seem too slow to run on my computer.
Are there any ways to make those algorithms run faster?
I am seeing this in the output:
[LightGBM] [Info] Finished loading parameters
[LightGBM] [Info] Finished loading 1001 models
[LightGBM] [Info] Finished initializing prediction
[LightGBM] [Info] Data file /Users/.../dev/snexample/models/9ogh6b9bl344q13keqvq3k7bco.test doesn't contain a label column
[LightGBM] [Info] Finished prediction
Is the missing label column significant?
Hi,
I want to try your package, but I can't run it on Linux:
java -jar StackNet.jar train sparse=false has_head=false model=model pred_file=pred.csv train_file=train.csv params=stakcnet.txt verbose=true threads=7 metric=logloss stackdata=false seed=1 folds=5
it gives immediately the following error:
Error: Could not find or load main class –jar
It seems to be some namespace/classpath issue; could you please either provide instructions on how to build the jar or specify the correct namespace/classpath?
Thanks,
Valentin.
I am trying the Zillow model with a different dataset, but I keep running out of memory even with the -Xmx128g setting.
[2.0, <=2.0]
[7.0, <=7.0]
[44142.0, <=44142.0]
Level: 1 dimensionality: 11
Starting cross validation
Fitting model: 1
Fitting model: 2
Fitting model: 3
Fitting model: 4
Fitting model: 5
Fitting model: 6
Fitting model: 7
Fitting model: 8
Exception in thread "Thread-1" Exception in thread "Thread-3" Exception in thread "Thread-2" java.lang.OutOfMemoryError: Java heap space
at manipulate.copies.copies.Copy(copies.java:183)
at matrix.fsmatrix.GetData(fsmatrix.java:220)
at matrix.fsmatrix.Copy(fsmatrix.java:388)
at ml.Bagging.BaggingRegressor.fit(BaggingRegressor.java:1443)
at ml.Bagging.BaggingRegressor.run(BaggingRegressor.java:241)
at java.lang.Thread.run(Thread.java:745)
I tried deleting model 8, which is GradientBoostingForestRegressor,
but I still get the same result - I guess the next model also ran out of memory.
Any suggestions?
Hi Marios,
Thanks for sharing StackNet, a great tool for stacking, but I'm still not clear on how to tune a single model. For example, if my parameter file is as follows:
XgboostRegressor booster:gblinear objective:reg:linear max_leaves:0 num_round:500 eta:0.1 threads:3 gamma:1 max_depth:4 colsample_bylevel:1.0 min_child_weight:4.0 max_delta_step:0.0 subsample:0.8 colsample_bytree:0.5 scale_pos_weight:1.0 alpha:10.0 lambda:1.0 seed:1 verbose:false
LightgbmRegressor verbose:false
What should I put on the command line? Is it the same as this one?
java -Xmx12048m -jar StackNet.jar train task=regression sparse=true has_head=false output_name=datasettwo model=model2 pred_file=pred2.csv train_file=dataset2_train.txt test_file=dataset2_test.txt test_target=false params=dataset2_params.txt verbose=true threads=1 metric=mae stackdata=false seed=1 folds=4 bins=3
Thank you!
When I have test_file specified in the train task it works, but when I run predict separately with task=regression I get:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.ClassCastException: ml.stacknet.StackNetRegressor cannot be cast to ml.stacknet.StackNetClassifier
at stacknetrun.runstacknet.main(runstacknet.java:775)
... 5 more
I encountered this problem with my own features in the Kaggle Quora competition.
Exception in thread "Thread-6" java.lang.ArrayIndexOutOfBoundsException: 750
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3002)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3042)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3042)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
at ml.Tree.DecisionTreeRegressor.fit(DecisionTreeRegressor.java:2382)
at ml.Tree.DecisionTreeRegressor.run(DecisionTreeRegressor.java:483)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "Thread-3" java.lang.NullPointerException
at ml.Tree.DecisionTreeRegressor.isfitted(DecisionTreeRegressor.java:3275)
at ml.Tree.scoringhelperv2.(scoringhelperv2.java:107)
at ml.Tree.RandomForestRegressor.predict2d(RandomForestRegressor.java:744)
at ml.Tree.GradientBoostingForestClassifier.fit(GradientBoostingForestClassifier.java:2353)
at ml.Tree.GradientBoostingForestClassifier.run(GradientBoostingForestClassifier.java:382)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "Thread-1112" java.lang.ArrayIndexOutOfBoundsException: 983
at ml.Tree.DecisionTreeClassifier.expand_node(DecisionTreeClassifier.java:3185)
at ml.Tree.DecisionTreeClassifier.expand_node(DecisionTreeClassifier.java:3225)
at ml.Tree.DecisionTreeClassifier.expand_node(DecisionTreeClassifier.java:3225)
at ml.Tree.DecisionTreeClassifier.expand_node(DecisionTreeClassifier.java:3221)
at ml.Tree.DecisionTreeClassifier.expand_node(DecisionTreeClassifier.java:3225)
at ml.Tree.DecisionTreeClassifier.fit(DecisionTreeClassifier.java:2576)
at ml.Tree.DecisionTreeClassifier.run(DecisionTreeClassifier.java:537)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "Thread-1137" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-1617" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-1642" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-2075" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-2284" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-2365" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-2382" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-2463" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-2608" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-2721" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-2802" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-2883" java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-2996" java.lang.ArrayIndexOutOfBoundsException
logloss : 0.257352525104832
logloss : 2.4462523363834303
Exception in thread "main" java.lang.NullPointerException
at ml.Tree.DecisionTreeClassifier.isfitted(DecisionTreeClassifier.java:3458)
at ml.Tree.scoringhelpercatv2.(scoringhelpercatv2.java:107)
at ml.Tree.RandomForestClassifier.predict_proba(RandomForestClassifier.java:795)
at ml.stacknet.StackNetClassifier.fit(StackNetClassifier.java:3532)
at stacknetrun.runstacknet.main(runstacknet.java:437)
I am sure that there are no zero values like 0.000000 in the sparse file and that all the data is in order.
What's more, I have 489 features. Is that too many?
/(ㄒoㄒ)/~~
I am getting this error on Mac. The console output is
$ java -Xmx12048m -jar StackNet.jar train task=regression sparse=true has_head=false output_name=datasettwo model=model2 pred_file=pred2.csv train_file=dataset2_train.txt test_file=dataset2_test.txt test_target=false params=dataset2_params.txt verbose=true threads=1 metric=mae stackdata=false seed=1 folds=4 bins=3
parameter name : task value : regression
parameter name : sparse value : true
parameter name : has_head value : false
parameter name : output_name value : datasettwo
parameter name : model value : model2
parameter name : pred_file value : pred2.csv
parameter name : train_file value : dataset2_train.txt
parameter name : test_file value : dataset2_test.txt
parameter name : test_target value : false
parameter name : params value : dataset2_params.txt
parameter name : verbose value : true
parameter name : threads value : 1
parameter name : metric value : mae
parameter name : stackdata value : false
parameter name : seed value : 1
parameter name : folds value : 4
parameter name : bins value : 3
[4793209, 88528]
Loaded File: dataset2_train.txt
Total rows in the file: 88528
Total columns in the file: undetrmined-Sparse
Number of elements : 4793209
The filedataset2_train.txt was loaded successfully with :
Rows : 88528
Columns (excluding target) : 1
Delimeter was :
Loaded sparse train data with 88528 and columns 58
loaded data in : 6.587000
Binning parameters
[-0.0131, <=-0.0131]
[0.0247, <=0.0247]
[0.4187, <=0.4187]
Level: 1 dimensionality: 12
Starting cross validation
Fitting model : 0
mae : 0.0534568661948753
Fitting model : 1
mae : 0.0531896123303922
Fitting model : 2
mae : 0.053078546424724905
Fitting model : 3
mae : 0.053564587280493355
Fitting model : 4
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.IllegalStateException: Tree is not fitted
at ml.Bagging.scoringhelperbagv2.<init>(scoringhelperbagv2.java:109)
at ml.Bagging.BaggingRegressor.predict2d(BaggingRegressor.java:669)
at ml.Bagging.BaggingRegressor.predict_proba(BaggingRegressor.java:1875)
at ml.stacknet.StackNetRegressor.fit(StackNetRegressor.java:3065)
at stacknetrun.runstacknet.main(runstacknet.java:522)
... 5 more
Python version 3.6.1
Java version: 1.8.0_121
Any suggestions on how to get around this issue?
After building 2 layers of stacking by:
java -Xmx12048m -jar StackNet.jar train task=regression sparse=true has_head=false output_name=datasettwo model=model2 pred_file=pred2.csv train_file=dataset2_train.txt test_file=dataset2_test.txt test_target=false params=dataset2_params.txt verbose=true threads=1 metric=mae stackdata=false seed=1 folds=4 bins=3
The following error was raised:
Loaded File: dataset2_test.txt
Total rows in the file: 2985217
Total columns in the file: undetrmined-Sparse
Number of elements : 386731415
Loaded sparse test data with 2985217 and columns 170
loading test data lasted : 613.503000
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.OutOfMemoryError: Java heap space
at utilis.map.intint.IntIntMapminus4a.<init>(IntIntMapminus4a.java:47)
at matrix.smatrix.buildmap(smatrix.java:76)
at ml.stacknet.StackNetRegressor.predict2d(StackNetRegressor.java:1496)
at ml.stacknet.StackNetRegressor.predict_proba(StackNetRegressor.java:3942)
at stacknetrun.runstacknet.main(runstacknet.java:691)
... 5 more
Any advice ? Thanks!
I wonder whether the output can give some hint about whether I selected the right base models.
I saw something like:
Total ranks after target inclusion: 212409
Gain from before : 152317
percentage of unique ranks versus elements size: 13.892484314388511%
Gain of percentage of unique ranks versus elements size: 9.962202794207%
What do these mean? Are they just for monitoring?
Thanks
Hi !
I'm having some trouble trying to make StackNet work. I have a very humble PC (which is why I don't want to run the included examples; they would probably kill my potato PC) and I would like to try some small models, in order to get a feel for the software and how to use it :)
So, for example, I set up a parameters file with the following contents:
LogisticRegression Type:Liblinear C:0.8 threads:1 usescale:True maxim_Iteration:100 seed:1 verbose:false
LogisticRegression Type:Liblinear C:0.5 RegularizationType:L1 threads:1 usescale:True maxim_Iteration:20 seed:1 verbose:false
RandomForestClassifier verbose:false
I run java -jar etc. and it seems to be running, but after a while I get this error:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: exceptions.IllegalStateException: The last layer of StackNet cannot have a classifier
at ml.stacknet.StackNetRegressor.fit(StackNetRegressor.java:2516)
at stacknetrun.runstacknet.main(runstacknet.java:524)
So, I want to know what is causing this error and how to fix it.
Thanks in advance,
This is a really interesting piece of software, amazing job ! :)
I am a Kaggler trying to improve myself. :) Thanks for the great tool!
I saw you have cases combining two StackNets, so I am wondering what the typical strategy for using StackNet is. After some data cleaning and feature engineering, you run StackNet. How do you do model diagnosis with the results of StackNet? How do you gradually improve the final model?
Thanks,
Jing
Is it possible to use different feature subsets for individual models? If yes, how can it be configured?
Hello @kaz-Anova, have you ever seen this issue (the system cannot find the files for several models, which I guess may be caused by empty values in several columns of the datasettwo_test files) when running the program? Any ideas how to deal with this? Thanks.
Would it be better if the output of output_name were in .txt format with the target variable appended? It could then be fed right into the next level without additional conversion.
Hi
I'm trying to train a StackNet on sparse data. The problem is a classification problem with 9 possible categories. My training file is in sparse format, like this:
0 2:1 6:1 13:1 17:1 22:1 23:1 30:1 42:1 47:1 59:1 67:1 71:1 72:1 84:1 86:1
1 2:1 17:1 22:1 42:1 43:1 45:1 47:1 57:1 59:1 67:1 70:1 72:1 86:1 88:1 99:1
etc etc
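For reference, the lines above are standard libsvm-style sparse format: a label followed by index:value pairs, with zero entries omitted. A minimal sketch of how one such line decomposes (illustrative only, not StackNet's actual parser):

```python
def parse_libsvm_line(line: str):
    """Split 'label index:value index:value ...' into a label and a dict."""
    parts = line.split()
    label = float(parts[0])
    feats = {int(i): float(v) for i, v in (p.split(":") for p in parts[1:])}
    return label, feats

label, feats = parse_libsvm_line("0 2:1 6:1 13:1")
print(label, feats)  # 0.0 {2: 1.0, 6: 1.0, 13: 1.0}
```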
In the parameters file I have a list of classifiers. When I start the training, it gives an error after some time.
Fitting model : 9
(this model is a LightgbmClassifier)
Exception in thread "Thread-11935" java.lang.IllegalStateException: The produced score in temporary file /home/andresh/data-science/StackNet/models/nlr9tp06r037rshv48jde0rupi.pred is not of correct size
at ml.lightgbm.LightgbmClassifier.predict_proba(LightgbmClassifier.java:806)
at ml.Bagging.scoringhelpercatbagv2.score(scoringhelpercatbagv2.java:158)
at ml.Bagging.scoringhelpercatbagv2.run(scoringhelpercatbagv2.java:188)
at java.lang.Thread.run(Thread.java:745)
The process doesn't stop; it keeps training and even finishes. But I'm concerned about what this means and how it affects training.
In other experiments, the process freezes when trying to fit the models in the next fold.
Thanks, and any help is really appreciated :)
When I was trying to predict on a new file with the trained model, I specified the model directory with "model=models". However, StackNet failed to load the models. Does that mean I need to rerun StackNet with the new pred_file?
When I use KerasnnRegressor on Mac, it gives me this error:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.IllegalStateException: Tree is not fitted
at ml.Bagging.scoringhelperbagv2.(scoringhelperbagv2.java:109)
at ml.Bagging.BaggingRegressor.predict2d(BaggingRegressor.java:669)
at ml.Bagging.BaggingRegressor.predict_proba(BaggingRegressor.java:1875)
at ml.stacknet.StackNetRegressor.fit(StackNetRegressor.java:3065)
at stacknetrun.runstacknet.main(runstacknet.java:522)
... 5 more
Can you help me with that? Thanks!
I used:
KerasnnRegressor loss:mean_absolute_error standardize:true use_log1p:true shuffle:true batch_normalization:true weight_init:lecun_uniform momentum:0.9 optimizer:adam use_dense:true l2:0.000001,0.000001 hidden:50,50 activation:relu,relu droupouts:0.1,0.1 epochs:20 lr:0.01 batch_size:8 stopping_rounds:10 validation_split:0.2 seed:1 verbose:false
Why were no tests included in the source?
Hi @kaz-Anova,
I encountered the following issue when calling XgboostRegressor during the prediction stage of the Zillow competition; default parameters were used according to the tutorial. I would appreciate it if you could look into the issue and help out.
C:\Users\meipaopao\StackNet\models\n6ut4jf5f4n3e5hjdl51n1p0ji0.pred (The system cannot find the file specified)
Exception in thread "Thread-46536" java.lang.NegativeArraySizeException
at io.input.Retrievecolumn(input.java:1455)
at ml.xgboost.XgboostRegressor.predict2d(XgboostRegressor.java:638)
at ml.Bagging.scoringhelperbagv2.score(scoringhelperbagv2.java:158)
at ml.Bagging.scoringhelperbagv2.run(scoringhelperbagv2.java:185)
at java.lang.Thread.run(Unknown Source)
C:\Users\meipaopao\StackNet\models\doh9b4mmpej0cem7eik7q170h20.pred (The system cannot find the file specified)
Exception in thread "Thread-46537" java.lang.NegativeArraySizeException
at io.input.Retrievecolumn(input.java:1455)
at ml.xgboost.XgboostRegressor.predict2d(XgboostRegressor.java:638)
at ml.Bagging.scoringhelperbagv2.score(scoringhelperbagv2.java:158)
at ml.Bagging.scoringhelperbagv2.run(scoringhelperbagv2.java:185)
at java.lang.Thread.run(Unknown Source)
Thank you very much!
Andy
I followed the steps for the Zillow example. Every time I tune the parameters, the files in the model folder seem to change. Do I have to delete the files or just keep them? Does this affect the results?
I know how to tune a model in level 1: just put that model on the first line of param.txt and rerun once the k-fold is finished.
However, I don't know how to tune a model at level 2. Does anyone know?
Hi, I just ran the file make_stacknet_data.py and it shows the following error:
Traceback (most recent call last):
File "D:/J/stacknet/make_stacknet_data.py", line 113, in
main()
File "D:/J/stacknet/make_stacknet_data.py", line 104, in main
dataset2()
File "D:/J/stacknet/make_stacknet_data.py", line 93, in dataset2
fromsparsetofile("dataset2_train.txt", x_train, deli1=" ", deli2=":",ytarget=y_train)
File "D:/J/stacknet/make_stacknet_data.py", line 25, in fromsparsetofile
if ytarget!=None:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I don't understand why this happens.
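For context, this ValueError is a general NumPy behavior rather than anything StackNet-specific: comparing an array to None with `!=` is element-wise and yields a boolean array, whose truth value is ambiguous inside an `if`. A minimal sketch of the error and the usual identity-check fix (hypothetical data):

```python
import numpy as np

ytarget = np.array([0.5, 1.0, 2.0])  # stands in for the real target vector

# `if ytarget != None:` raises, because the element-wise comparison
# returns an array like [True, True, True], not a single boolean.
try:
    if ytarget != None:  # noqa: E711 - reproduces the reported error
        pass
except ValueError as e:
    print("raises:", e)

# The usual fix is an identity check, which never touches the array values:
if ytarget is not None:
    print("target has", len(ytarget), "rows")
```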
ATT
Hello,
Thank you for providing an example for the Zillow competition. I tried running the example but ran into exactly the same problem as here: #16
I can't run LightGBM and I have exactly the same output error:
Starting cross validation
Fitting model : 0
Exception in thread "Thread-1" java.lang.IllegalStateException: failed to create LIGHTgbm subprocess with config name /Users/hadoop/StackNet/models/ucsbmggugdanr6qc19a64a7sv10.conf
at ml.lightgbm.LightgbmRegressor.create_light_suprocess(LightgbmRegressor.java:426)
at ml.lightgbm.LightgbmRegressor.fit(LightgbmRegressor.java:1885)
at ml.lightgbm.LightgbmRegressor.run(LightgbmRegressor.java:516)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.IllegalStateException: Tree is not fitted
at ml.Bagging.scoringhelperbagv2.(scoringhelperbagv2.java:109)
at ml.Bagging.BaggingRegressor.predict2d(BaggingRegressor.java:669)
at ml.Bagging.BaggingRegressor.predict_proba(BaggingRegressor.java:1875)
at ml.stacknet.StackNetRegressor.fit(StackNetRegressor.java:3065)
at stacknetrun.runstacknet.main(runstacknet.java:522)
... 5 more
I tried to run lightgbm by itself by running it with the config file:
./lightgbm config=~ ./models/ucsbmggugdanr6qc19a64a7sv10.conf task=train
But I receive a 'Permission denied' error. I tried running the same command with sudo, but I get
sudo: ./lightgbm: command not found
I checked and the jar file is in the same folder as the lib folder. Here is a screenshot of how my Stacknet folder is organized:
Thank you for your help.
I am guessing they are the H2O components.
I cannot compile the source because of them.
Can you include instructions on your setup so that I can compile StackNet with these components?
If I want to use xgboost as first layer model, what should I do?
Hi, dear StackNet developers and the author @kaz-Anova,
I am new to Kaggle, and I found StackNet a COOL tool to use. However, it is not as GOOD as I expected with the default parameters. I tried tuning and adding features, but none of that has given me any improvement yet. That's why I'm posting this, hoping someone can guide me on how to tune a BETTER StackNet (parameters, folds, regressors, etc.) for the Zillow competition.
As the docs say:
https://github.com/kaz-Anova/StackNet/tree/master/example/zillow_regression_sparse
StackNet alone ONLY achieves around 0.0647 on the LB, which is not good compared with other kernels. Some kernels achieve a high score with a single model (LightGBM alone up to 0.0644). And there are also kernels that use a 2-layer traditional stacking and achieve 0.0645 (https://www.kaggle.com/wangsg/ensemble-stacking-lb-0-644/comments), which is much better than StackNet as far as the leaderboard is concerned.
So my question is: why is StackNet not working very well on the LB in the Zillow competition?
If so, could @kaz-Anova please help update the parameters and regressors in the example so that the performance gets better? (I tried a lot, but did not improve it.) If so, more people (especially freshmen like me) will be happier to use StackNet.
(As far as I've tried, my 5-fold LightGBM average works much worse than my single-run LightGBM.)
In my opinion, the Zillow leaderboard test data is evaluated on 2016.10 ~ 2016.12. However, the data between 2016.10 and 2016.12 are very sparse in the training set. So, a K-fold may be a bad way to approach this competition.
If so, would it be possible for StackNet to **in the future support a DIFFERENT out-of-fold scheme, not just K-fold**, so that more flexible blending (i.e. divide the data into two parts, then use only historical data to train, predict on future data, and use the future data to do the stacking) or a sliding-window algorithm would be supported? (You know, especially for time-related problems, it is sometimes bad to leak the future into the past with K-fold or reusable K-fold.)
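The two-part temporal split described above can already be prepared outside StackNet by filtering the files before training; a minimal sketch under assumed month labels (the data and field names here are hypothetical, and none of this is a built-in StackNet feature):

```python
import numpy as np

# Hypothetical rows: the transaction month of each sample, plus features/target.
months = np.array([1, 3, 5, 7, 10, 11, 12])
X = np.arange(14, dtype=float).reshape(7, 2)
y = np.array([0.1, 0.0, -0.1, 0.2, 0.1, 0.0, -0.2])

# Train on history only; hold out the future months the leaderboard scores on,
# instead of letting K-fold mix future rows into the training folds.
future = months >= 10
X_train, y_train = X[~future], y[~future]
X_holdout, y_holdout = X[future], y[future]

print(X_train.shape, X_holdout.shape)  # (4, 2) (3, 2)
```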
To @kaz-Anova: sincere congratulations on the high score that you and your team achieved. However, considering StackNet's current performance, I am very curious whether you are still using StackNet in the competition as a strong predictor (rather than as a weak model for averaging).
Apologies for my rudeness (if that is the case), and I surely know that one can achieve a better LB score just by combining with other kernels. But my point is that the StackNet baseline is now so far from the other kernels. Are there any practical methods (or tricks) that you would like to share to make StackNet work better?
P.S. I am now a fan of StackNet, and I want to express my gratitude to @kaz-Anova for the convenience that the powerful StackNet brings us. I wish it could be even BETTER in the future.
Sincerely
What is the best way to get a dataset into .libsvm format? I'm trying to work on the Zillow dataset on Kaggle and have done the feature manipulation I want in R, but now I need the dataset in .libsvm format so I can try StackNet for the first time. I have no idea how to get the data into this format, and no solution from googling is working either.
How do I use R, or something else, to get the data into the desired format?
Thanks
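One common route (an illustration, not an official StackNet recommendation) is to move the prepared features into Python and use scikit-learn's `dump_svmlight_file`, which writes the `label index:value ...` lines and drops zero entries automatically. The data below is a toy stand-in for the real Zillow features:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.datasets import dump_svmlight_file

# Toy stand-ins for the real feature matrix and target (hypothetical values).
X = csr_matrix(np.array([[0.0, 1.5, 0.0],
                         [2.0, 0.0, 3.0]]))
y = np.array([0.1, -0.2])

# Writes one "label index:value ..." line per row; zeros are omitted.
dump_svmlight_file(X, y, "train.libsvm", zero_based=True)

with open("train.libsvm") as f:
    print(f.read())
```

Staying in R, one commonly cited option is `e1071::write.matrix.csr` on a SparseM matrix, or simply writing the label/index:value lines yourself with `writeLines`.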
Hi, I tried to replicate one of the examples provided in this repo, in this case the Amazon one. I ran the code using param_amazon_linear as documented in that example, but all I got was this:
> java -Xmx3048m -jar StackNet.jar train train_file=train.sparse test_file=test.sparse params=param_amazon_linear.txt pred_file=amazon_linear_pred.csv test_target=false verbose=true Threads=1 sparse=true folds=5 seed=1 metric=auc
parameter name : train_file value : train.sparse
parameter name : test_file value : test.sparse
parameter name : params value : param_amazon_linear.txt
parameter name : pred_file value : amazon_linear_pred.csv
parameter name : test_target value : false
parameter name : verbose value : true
parameter name : threads value : 1
parameter name : sparse value : true
parameter name : folds value : 5
parameter name : seed value : 1
parameter name : metric value : auc
a train method needs to have a task which may be regression or classification
After checking for a while, I found it didn't produce any output file. Is there something I did wrong?
Additional note: I had already produced train.sparse and test.sparse by running prepare_data.py.
CatBoost (https://tech.yandex.com/catboost/) is a very powerful gradient boosting library for machine learning.
I hope someone can add this tool to StackNet.
Hi. Thanks for the StackNet classifier.
I encountered an exception when I tried to add some more features to the Kaggle Quora problem.
Exception in thread "Thread-23" java.lang.ArrayIndexOutOfBoundsException: 12
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3011)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3042)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3042)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3042)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3042)
at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
at ml.Tree.DecisionTreeRegressor.fit(DecisionTreeRegressor.java:2382)
at ml.Tree.DecisionTreeRegressor.run(DecisionTreeRegressor.java:483)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "Thread-5" java.lang.NullPointerException
at ml.Tree.DecisionTreeRegressor.isfitted(DecisionTreeRegressor.java:3275)
at ml.Tree.scoringhelperv2.<init>(scoringhelperv2.java:107)
at ml.Tree.RandomForestRegressor.predict2d(RandomForestRegressor.java:744)
at ml.Tree.GradientBoostingForestClassifier.fit(GradientBoostingForestClassifier.java:2353)
at ml.Tree.GradientBoostingForestClassifier.run(GradientBoostingForestClassifier.java:382)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "main" java.lang.NullPointerException
at ml.Tree.scoringhelperfv2.<init>(scoringhelperfv2.java:107)
at ml.Tree.GradientBoostingForestClassifier.predict_proba(GradientBoostingForestClassifier.java:603)
at ml.stacknet.StackNetClassifier.fit(StackNetClassifier.java:2438)
at stacknetrun.runstacknet.main(runstacknet.java:385)
Exception in thread "Thread-28783" java.lang.NullPointerException
at ml.Tree.DecisionTreeRegressor.isfitted(DecisionTreeRegressor.java:3275)
at ml.Tree.scoringhelperv2.<init>(scoringhelperv2.java:107)
at ml.Tree.RandomForestRegressor.predictfs(RandomForestRegressor.java:590)
at ml.Tree.scoringhelperfv2.score(scoringhelperfv2.java:149)
at ml.Tree.scoringhelperfv2.run(scoringhelperfv2.java:175)
at java.lang.Thread.run(Thread.java:745)
I used paramsv1.txt but added more threads to each base classifier.
Hi, if I try to solve a multi-label problem, how should I prepare the data file and parameter file?
I see your scripts save the train file with the following Python command:
np.savetxt(train_file, X, delimiter=",", fmt='%.5f')
I have a couple of questions
I accidentally misspelled UseConstant:true as UseConstance:true in my params file. After correcting my mistake I noticed that my CV results got worse. I then changed to UseConstant:false and got the same exact result as with UseConstant:true. I then got rid of UseConstant completely and got the same results as when I misspelled it, as I'd expect. It seems as if the mere presence of UseConstant determines whether a constant is used, not the actual true/false setting. Is this the expected behavior?
Thanks for sharing your code; it helped me a lot! I wonder how you determined the weights for each model when merging them. In the Zillow example, you chose 0.25 for your own model and 0.75 for the other, and I really want to understand the method you used for determining these weights. I would appreciate it if you could teach me about this~ ^_^
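For context, merging two submissions with 0.25/0.75 is just a weighted average of the predictions; in practice the weights are usually picked by validation or leaderboard score rather than derived analytically. A minimal sketch with hypothetical prediction vectors:

```python
import numpy as np

pred_stacknet = np.array([0.10, 0.20, 0.30])  # hypothetical model outputs
pred_other    = np.array([0.20, 0.10, 0.40])

# Convex combination: the weights sum to 1, chosen e.g. by a small grid
# search on a holdout set.
blend = 0.25 * pred_stacknet + 0.75 * pred_other
print(blend)  # [0.175 0.125 0.375]
```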
Thanks for this example, I just submitted - I am ranked 78 atm :-)
In your code, the sparse data is treated differently from dense/full data. How is that handled? Are you estimating the missing data before fitting, or does the fit simply ignore the missing data?
I tried using the example as-is, except for changing the bag and bin counts, and I get the following error.
I have ubuntu 14.04 (trusty)
Java 1.8
Command and error
StackNet$ java -Xmx12048m -jar StackNet.jar train task=regression sparse=true has_head=false output_name=datasettwo model=model2 pred_file=pred2.csv train_file=dataset2_train.txt test_file=dataset2_test.txt test_target=false params=dataset2_params.txt verbose=true threads=1 metric=mae stackdata=false seed=1 folds=5 bins=2
parameter name : task value : regression
parameter name : sparse value : true
parameter name : has_head value : false
parameter name : output_name value : datasettwo
parameter name : model value : model2
parameter name : pred_file value : pred2.csv
parameter name : train_file value : dataset2_train.txt
parameter name : test_file value : dataset2_test.txt
parameter name : test_target value : false
parameter name : params value : dataset2_params.txt
parameter name : verbose value : true
parameter name : threads value : 1
parameter name : metric value : mae
parameter name : stackdata value : false
parameter name : seed value : 1
parameter name : folds value : 5
parameter name : bins value : 2
[4793209, 88528]
Loaded File: dataset2_train.txt
Total rows in the file: 88528
Total columns in the file: undetrmined-Sparse
Number of elements : 4793209
The filedataset2_train.txt was loaded successfully with :
Rows : 88528
Columns (excluding target) : 1
Delimeter was :
Loaded sparse train data with 88528 and columns 58
loaded data in : 10.403000
Binning parameters
[0.005, <=0.005]
[0.4187, <=0.4187]
Level: 1 dimensionality: 12
Starting cross validation
Fitting model : 0
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.IllegalStateException: Tree is not fitted
at ml.Bagging.scoringhelperbagv2.&lt;init&gt;(scoringhelperbagv2.java:109)
at ml.Bagging.BaggingRegressor.predict2d(BaggingRegressor.java:669)
at ml.Bagging.BaggingRegressor.predict_proba(BaggingRegressor.java:1875)
at ml.stacknet.StackNetRegressor.fit(StackNetRegressor.java:3065)
at stacknetrun.runstacknet.main(runstacknet.java:522)
... 5 more
Hi, I was running the code on my computer (Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz, 12 GB RAM). I tried to train with 30k rows and 318 variables, but something strange happened: there was a sudden drop in computing activity, and no further progress was made in fitting the current model or continuing to the next fold (I use 5-fold cross validation, 8 models in the first layer and 1 model in the second layer, with all 4 of my threads). With this data, the drop came after about 30 minutes of running StackNet. Here is a screenshot from my last run.
I also tried to play with the threads option, but nothing changed.
I believe there is an incorrect error check in the file SklearnknnClassifier.java:
if ( !metric.equals("rbf") && !metric.equals("poly")&& !metric.equals("sigmoid") && !metric.equals("linear") ){
throw new IllegalStateException(" metric has to be between 'rbf', 'poly', 'sigmoid' or 'linear'" );
}
if ( !metric.equals("uniform") && !metric.equals("distance") ){
throw new IllegalStateException(" metric has to be between 'uniform' or 'distance'" );
}
I think the first metric.equals check block is incorrect and you only want the second one. I couldn't get the SklearnknnClassifier to work.
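To make the report concrete: the first block tests kernel names ('rbf', 'poly', 'sigmoid', 'linear') that belong to the SVM wrapper, so any valid KNN weighting value fails it. A hypothetical corrected check (the method name `validateKnnWeights` is made up for illustration) would keep only the second test:

```java
// Hypothetical fix sketch: a KNN classifier should only validate its
// weighting scheme; the kernel check looks copy-pasted from the SVM wrapper
// and rejects every valid KNN value before the real check is reached.
public class KnnParamCheck {
    static void validateKnnWeights(String weights) {
        if (!weights.equals("uniform") && !weights.equals("distance")) {
            throw new IllegalStateException(
                " weights has to be between 'uniform' or 'distance'");
        }
    }
}
```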
Also, in the docs the parameters section for KerasnnClassifier section has the wrong header (it has the header 'SklearnsvmClassifier'). There are also some typos like dropout being labeled 'droupout' and 'Toral' instead of 'Total.'
Thanks for creating and sharing the StackNet tool.
Hello. I tried to run StackNet with CSV files, but I got an error like this:
> java -Xmx3048m -jar StackNet.jar train task=classification train_file=train.vm2.csv test_file=test.vm2.csv params=param_amazon_linear.txt pred_file=linear_pred.csv test_target=false verbose=true Threads=4 sparse=true folds=5 seed=1 metric=auc has_head=true
parameter name : task value : classification
parameter name : train_file value : train.vm2.csv
parameter name : test_file value : test.vm2.csv
parameter name : params value : param_amazon_linear.txt
parameter name : pred_file value : linear_pred.csv
parameter name : test_target value : false
parameter name : verbose value : true
parameter name : threads value : 4
parameter name : sparse value : true
parameter name : folds value : 5
parameter name : seed value : 1
parameter name : metric value : auc
parameter name : has_head value : true
[0, 29943]
Exception in thread "main" java.lang.IllegalStateException: File train.vm2.csv failed to import at bufferreader
at io.input.readsmatrixdata(input.java:1327)
at stacknetrun.runstacknet.main(runstacknet.java:425)
Something seems wrong with the dimensions reported above. I also checked my CSV files, which are comma-delimited as StackNet expects (the true dimensions are 29943 rows x 319 columns for train and 318 columns for test).
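One quick thing to rule out with "failed to import at bufferreader" errors is a ragged row: a single line with a different number of commas (a stray quote, an embedded comma, or a truncated last line) can break the import. A small sketch for checking this, assuming a plain comma-delimited file with no quoted fields:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical debugging aid: report the 1-based line numbers whose
// comma-separated column count differs from the first line's.
// Note: a naive split cannot handle quoted fields containing commas.
public class CsvShapeCheck {
    static List<Integer> badLines(String csv) {
        List<Integer> bad = new ArrayList<>();
        String[] lines = csv.split("\n");
        int expected = -1;
        for (int i = 0; i < lines.length; i++) {
            int cols = lines[i].split(",", -1).length; // keep trailing empties
            if (expected == -1) expected = cols;       // first line sets the shape
            else if (cols != expected) bad.add(i + 1);
        }
        return bad;
    }
}
```

Reading the file into a `String` first (e.g. with `java.nio.file.Files`) and passing it through this check will point at the exact offending row, if any.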
If I am using a 2-layer StackNet for a regression problem, can I use a regression algorithm in the second layer? The documentation says:
"A train method needs at least a train_file and a params_file. It also needs at least two algorithms, and the last layer must not contain a regressor unless the metric is auc and the problem is binary"
Also, is the prediction order (rows) preserved, or will the outputs be shuffled?
Thanks
In my 2 runs for Zillow on the same set of data, the public scores differed by 0.0002. While this is not a huge difference, and some variation is expected in stochastic processes, the difference is significant in this competition. More importantly, it makes step-by-step model improvement difficult due to noise.
Any suggestions for making runs more repeatable? I'm willing to accept longer computation time for better repeatability.
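Beyond fixing the `seed` parameter, some run-to-run variance can come from thread scheduling in the underlying learners, so it may never reach zero. A common mitigation, offered here only as a sketch and not as a StackNet feature, is to run the whole pipeline several times with different seeds and average the prediction files:

```java
// Hypothetical sketch: average predictions from several seeded runs.
// Each row of `runs` holds one full run's predictions (e.g. seed=1,2,3);
// the averaged column is more stable than any single run.
public class SeedAverage {
    static double[] average(double[][] runs) {
        int n = runs[0].length;
        double[] out = new double[n];
        for (double[] run : runs) {
            for (int i = 0; i < n; i++) {
                out[i] += run[i] / runs.length; // running mean accumulation
            }
        }
        return out;
    }
}
```

This trades roughly k times the computation for a smoother, more repeatable score, which matches the stated willingness to spend more time for repeatability.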
Hi,
I encountered an error while running StackNet. Here is the command:
java -Xmx12144m -jar StackNet.jar train train_file='/home/jlu/Experiments/Examples/Instacart/imba/data/nz_train_slim.csv' test_file='/home/jlu/Experiments/Examples/Instacart/imba/data/all_data_test_V1.csv' has_head=true params='/home/jlu/Experiments/Examples/Instacart/imba/paramsv1.txt' sparse=false pred_file='/home/jlu/Experiments/Examples/Instacart/imba/data/stacknet_pred_V1.csv' test_target=false verbose=true Threads=10 folds=5 seed=1 metric=auc output_name=restack_instacart folds=10 seed=1 task=classification
Here is the error message. What does the InvocationTargetException imply here?
parameter name : train_file value : /home/jlu/experiments/examples/instacart/imba/data/nz_train_slim.csv
parameter name : test_file value : /home/jlu/experiments/examples/instacart/imba/data/all_data_test_v1.csv
parameter name : has_head value : true
parameter name : params value : /home/jlu/experiments/examples/instacart/imba/paramsv1.txt
parameter name : sparse value : false
parameter name : pred_file value : /home/jlu/experiments/examples/instacart/imba/data/stacknet_pred_v1.csv
parameter name : test_target value : false
parameter name : verbose value : true
parameter name : threads value : 10
parameter name : folds value : 5
parameter name : seed value : 1
parameter name : metric value : auc
parameter name : output_name value : restack_instacart
parameter name : folds value : 10
parameter name : seed value : 1
parameter name : task value : classification
Completed: 5.00 %
Completed: 10.00 %
Completed: 15.00 %
Completed: 20.00 %
Completed: 25.00 %
Completed: 30.00 %
Completed: 35.00 %
Completed: 40.00 %
Completed: 45.00 %
Completed: 50.00 %
Completed: 55.00 %
Completed: 60.00 %
Completed: 65.00 %
Completed: 70.00 %
Completed: 75.00 %
Completed: 80.00 %
Completed: 85.00 %
Completed: 90.00 %
Completed: 95.00 %
Completed: 100.00 %
Loaded File: /home/jlu/Experiments/Examples/Instacart/imba/data/nz_train_slim.csv
Total rows in the file: 8474661
Total columns in the file: 78
Weighted variable : -1 counts: 0
Int Id variable : -1 str id: -1 counts: 0
Target Variables : 1 values : [0]
Actual columns number : 77
Number of Skipped rows : 0
Actual Rows (removing the skipped ones) : 8474661
Loaded dense train data with 8474661 and columns 77
loaded data in : 125.971000
Level: 1 dimensionality: 893
Starting cross validation
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.NegativeArraySizeException
at matrix.fsmatrix.<init>(fsmatrix.java:85)
at ml.stacknet.StackNetClassifier.fit(StackNetClassifier.java:2749)
at stacknetrun.runstacknet.main(runstacknet.java:471)
... 5 more
I got the following error when using lightGBM:
Exception in thread "Thread-376" java.lang.IllegalStateException: failed to create LIGHTgbm subprocess with config name ~/models/fjufcncb20qtl7f7ehcpm5b6tn0.conf
at ml.lightgbm.LightgbmRegressor.create_light_suprocess(LightgbmRegressor.java:426)
at ml.lightgbm.LightgbmRegressor.fit(LightgbmRegressor.java:1566)
at ml.lightgbm.LightgbmRegressor.run(LightgbmRegressor.java:514)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.IllegalStateException: Tree is not fitted
at ml.Bagging.scoringhelperbagv2.<init>(scoringhelperbagv2.java:95)
at ml.Bagging.BaggingRegressor.predict2d(BaggingRegressor.java:350)
at ml.Bagging.BaggingRegressor.predict_proba(BaggingRegressor.java:1785)
at ml.stacknet.StackNetRegressor.fit(StackNetRegressor.java:2632)
at stacknetrun.runstacknet.main(runstacknet.java:525)
... 5 more
However, if I run LightGBM directly with the same config file, it seems to be fine:
./lightgbm config=~/models/fjufcncb20qtl7f7ehcpm5b6tn0.conf task=train
[LightGBM] [Warning] Unknown parameter in config file: categorical_feature=
[LightGBM] [Warning] Stopped training because there are no more leaves that meet the split requirements.
And I can see some output in the model file after I run with lightgbm only:
tree
num_class=1
num_tree_per_iteration=1
label_index=0
max_feature_idx=18
objective=regression
boost_from_average
feature_names=Column_0 Column_1 Column_2 Column_3 Column_4 Column_5 Column_6 Column_7 Column_8 Column_9 Column_10 Column_11 Column_12 Column_13 Column_14 Column_15 Column_16 Column_17 Column_18
feature_infos=[16801:17165] [0:20] [0:16] [2:22741] [6037:6111] [33339295:34816009] none [1:240] [31:275] [1:5637] [60371011.101001002:61110091.001012005] [1286:3101] [95982:399675] [0:18] [1885:2015] [100:9948100] [1044:27750000] [278:24499999.999999996] [49.079999999999998:321936.09000000003]
Tree=0
num_leaves=2
split_feature=0
split_gain=-1
threshold=0
decision_type=0
default_value=0
left_child=-1
right_child=-2
leaf_parent=0 0
leaf_value=0.01149876969316053 0.01149876969316053
leaf_count=0 0
internal_value=0
internal_count=0
shrinkage=1
has_categorical=0
Here is my config file
boosting=gbdt
objective=regression
learning_rate=0.002
min_sum_hessian_in_leaf=0.001
min_data_in_leaf=20
feature_fraction=0.5
min_gain_to_split=1.0
bagging_fraction=0.9
poission_max_delta_step=0.0
lambda_l1=0.0
lambda_l2=0.0
scale_pos_weight=1.0
max_depth=4
num_threads=10
num_iterations=100
feature_fraction_seed=2
bagging_seed=2
drop_seed=2
data_random_seed=2
num_leaves=60
bagging_freq=1
xgboost_dart_mode=false
drop_rate=0.1
skip_drop=0.5
max_drop=50
top_rate=0.1
other_rate=0.1
huber_delta=0.1
fair_c=0.1
max_bin=255
min_data_in_bin=5
uniform_drop=false
two_round=false
is_unbalance=false
categorical_feature=
bin_construct_sample_cnt=1000000
is_sparse=true
verbosity=0
data=~/models/fjufcncb20qtl7f7ehcpm5b6tn0.train
output_model=~/model/models/fjufcncb20qtl7f7ehcpm5b6tn0.mod
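One possible cause worth checking: the config paths use `~`, and while the shell expands `~` when you run `./lightgbm` by hand, Java's file APIs and `ProcessBuilder` do not, which would explain why the manual run succeeds and the StackNet-launched subprocess fails. A small sketch of the workaround (the helper name `expandHome` is made up; simply writing absolute paths in the config achieves the same thing):

```java
// Hypothetical helper: expand a leading '~' the way a shell would, because
// Java does not do this, so data=~/models/foo.train can silently fail
// inside a Java-launched subprocess while working on the command line.
public class HomeExpand {
    static String expandHome(String path) {
        if (path.equals("~")) return System.getProperty("user.home");
        if (path.startsWith("~/")) {
            return System.getProperty("user.home") + path.substring(1);
        }
        return path; // already absolute or relative: leave untouched
    }
}
```

Also note the config writes `output_model=~/model/models/...` but the data path uses `~/models/...`; if the `model` vs `models` difference is not intentional, that mismatch alone could make the subprocess fail.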
I have tried the StackNet example from CMD under Windows, and the following problem happens. Could @kaz-Anova or someone else give me tips on how to fix it? Thanks a lot.
C:\Users\User>java -jar StackNet.jar train task=classification sparse=false has_head=false model=model train_file=train_iris.csv test_file=test_iris.csv test_target=true params=params.txt verbose=true threads=4 metric=logloss stackdata=false
parameter name : task value : classification
parameter name : sparse value : false
parameter name : has_head value : false
parameter name : model value : model
parameter name : train_file value : train_iris.csv
parameter name : test_file value : test_iris.csv
parameter name : test_target value : true
parameter name : params value : params.txt
parameter name : verbose value : true
parameter name : threads value : 4
parameter name : metric value : logloss
parameter name : stackdata value : false
Completed: 4.04 %
Completed: 8.08 %
Completed: 12.12 %
Completed: 16.16 %
Completed: 20.20 %
Completed: 24.24 %
Completed: 28.28 %
Completed: 32.32 %
Completed: 36.36 %
Completed: 40.40 %
Completed: 44.44 %
Completed: 48.48 %
Completed: 52.53 %
Completed: 56.57 %
Completed: 60.61 %
Completed: 64.65 %
Completed: 68.69 %
Completed: 72.73 %
Completed: 76.77 %
Completed: 80.81 %
Completed: 84.85 %
Completed: 88.89 %
Completed: 92.93 %
Completed: 96.97 %
Loaded File: train_iris.csv
Total rows in the file: 99
Total columns in the file: 5
Weighted variable : -1 counts: 0
Int Id variable : -1 str id: -1 counts: 0
Target Variables : 1 values : [0]
Actual columns number : 4
Number of Skipped rows : 0
Actual Rows (removing the skipped ones) : 99
Loaded dense train data with 99 and columns 4
loaded data in : 0.100000
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.IllegalStateException: File params.txt failed to import at bufferreader params.txt (The system cannot find the file specified.)
at io.input.StackNet_Configuration(input.java:1650)
at stacknetrun.runstacknet.main(runstacknet.java:441)
Is there a reason for removing outliers? Could it throw away valuable information?
This works very well: it supports many algorithms and data formats.
In real-world work, training data may contain millions of samples and millions of features.
Can StackNet support online learning algorithms?