Comments (9)
Sure, you can remove them, or at least the first one. But I am a little worried about this case. This function (the iterative biconjugate gradient method) should usually converge quickly and thus not print much output. Please check whether your LASSO model works well on the test data. Thanks!
from smile.
The LASSO model is not working well on the test data, but that is the actual goal this time. I'm writing an example of how it can go wrong. Thank you for the heads-up though! 👍
Interesting. Does it not work well because LASSO is not a good fit for the problem? Or because of the parameter settings (e.g. the regularization factor)?
It's because there are too few data points and no actual statistical relation within the data.
As John Tukey once said:
The data may not contain the answer. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
This is the case in my example, which is meant to make people aware that machine learning cannot perform miracles on every single dataset.
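To make the point about data without any real relation concrete, here is a minimal Java sketch (not taken from the blog example or from Smile). It draws a random target and many independent random features, then reports the largest absolute Pearson correlation found. The class name, method names, seed, and the sizes in `main` are all assumptions chosen for illustration.

```java
import java.util.Random;

public class SpuriousCorrelation {

    /** Pearson correlation between two equal-length arrays. */
    static double correlation(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Largest |correlation| between a noise target and p independent noise features. */
    static double maxAbsCorrelation(int n, int p, long seed) {
        Random rng = new Random(seed);
        double[] y = new double[n];
        for (int i = 0; i < n; i++) y[i] = rng.nextGaussian();
        double best = 0;
        double[] x = new double[n];
        for (int j = 0; j < p; j++) {
            // Each feature is pure noise, generated independently of y.
            for (int i = 0; i < n; i++) x[i] = rng.nextGaussian();
            best = Math.max(best, Math.abs(correlation(x, y)));
        }
        return best;
    }

    public static void main(String[] args) {
        // Illustrative sizes (assumptions): few samples, many features.
        System.out.println("1 feature:      " + maxAbsCorrelation(100, 1, 42L));
        System.out.println("27000 features: " + maxAbsCorrelation(100, 27000, 42L));
    }
}
```

With enough noise features, some of them will correlate with the target by chance alone, so a sparse model can pick them up on the training data even though they carry no signal for new data.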
Is your data size less than the dimensionality?
Yes, it's 100 data points (with ranks 1 to 100) and 27000+ features, so this can never go well :) But since it's an example for my blog rather than an actual dataset that I want to use for predictions, I still worked it out to show what goes wrong when you do this kind of thing.
In the end, the trained LASSO model should predict a rank value for a new data point, but it predicts worse than just predicting the average (50). This makes it a perfect example of what can go wrong if you have no clue what you are doing 👍
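For reference, the "just predict the average" baseline can be quantified with a few lines of plain Java. This is only a sketch: it assumes the targets are exactly the ranks 1 to 100 and uses RMSE as the error measure, which the thread does not specify. The class and method names are hypothetical.

```java
public class MeanBaseline {

    /** Arithmetic mean of the given values. */
    static double mean(double[] values) {
        double sum = 0;
        for (double v : values) sum += v;
        return sum / values.length;
    }

    /** Root-mean-square error of always predicting the same constant. */
    static double rmseOfConstant(double[] targets, double constant) {
        double sumSq = 0;
        for (double t : targets) {
            double err = t - constant;
            sumSq += err * err;
        }
        return Math.sqrt(sumSq / targets.length);
    }

    public static void main(String[] args) {
        // Assumption: the targets are the ranks 1..100.
        double[] ranks = new double[100];
        for (int i = 0; i < 100; i++) ranks[i] = i + 1;

        double m = mean(ranks); // 50.5 for ranks 1..100
        System.out.println("mean          = " + m);
        System.out.println("baseline RMSE = " + rmseOfConstant(ranks, m));
    }
}
```

Under these assumptions the mean is 50.5 and the baseline RMSE is sqrt(833.25), roughly 28.87, so a model "predicts worse than the average" when its test RMSE exceeds that value.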
That is the true reason. It is known as the small sample size problem. I have a paper (http://lectures.molgen.mpg.de/networkanalysis13/LDA_cancer_classif.pdf) that deals with it.
Cool, thanks! I'll read that and see if I can incorporate it, if that's OK with you?
Try FLD in SMILE for your case. I don't remember if I implemented this algorithm there. Thanks!