Comments (10)
I am really not too enthusiastic about such a proposal, for the following reasons:
- It will create a large volume of boilerplate code, which will be a maintenance burden
- It will be hard to test
- It will bring us users who do not want to learn Python, and who will thus give us questions that we cannot answer and uninformative bug reports, and force us to write much more documentation
You may think that I am cynical, but I am trying to think in the long run and to make sure that the project doesn't implode under its own weight.
from scikit-learn.
I'm not quite sure what kind of boilerplate you're thinking of. I expected the command line program to be standalone and quite small, actually.
Also, since the command would use pickle for persistence, people could apply a few preprocessing steps (feature extraction, PCA, ...), get their pickled object and work from there, in Python.
So I guess the only one of your arguments I really agree with is the second.
This feature is not a must for me so if people don't like it too much, no problem!
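
To make the "standalone and quite small" idea concrete, here is a minimal sketch of what such a command could look like. It is not an existing scikit-learn tool: the function names (`load_class`, `read_csv`, `main`), the argument layout, and the CSV convention (no header, numeric features, target in the last column) are all assumptions made for illustration.

```python
import argparse
import csv
import importlib
import pickle


def load_class(dotted_name):
    """Resolve a dotted name such as 'sklearn.svm.LinearSVC' to a class."""
    module_name, _, class_name = dotted_name.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


def read_csv(path):
    """Read a headerless numeric CSV; the last column is taken as the target."""
    X, y = [], []
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            if not row:
                continue  # skip blank lines
            X.append([float(value) for value in row[:-1]])
            y.append(float(row[-1]))
    return X, y


def main(argv=None):
    """Fit the named estimator with default parameters and pickle it."""
    parser = argparse.ArgumentParser(description="Fit an estimator and pickle it.")
    parser.add_argument("estimator", help="dotted class name of the estimator")
    parser.add_argument("train_csv", help="training data, target in last column")
    parser.add_argument("model_out", help="where to write the pickled model")
    args = parser.parse_args(argv)

    X, y = read_csv(args.train_csv)
    model = load_class(args.estimator)().fit(X, y)
    with open(args.model_out, "wb") as handle:
        pickle.dump(model, handle)
```

Calling `main(["sklearn.svm.LinearSVC", "train.csv", "model.pkl"])` (hypothetical file names) would leave a pickle that can be reopened later with `pickle.load` and used from regular Python code. Reading the target as a float is a simplification that would need adjusting for string labels.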
from scikit-learn.
Maybe I am wrong, but I expect the boilerplate code to come from impedance matching Python with a command line.
For point 3, I guess that my answer to your answer is that users who know Python and need to call the scikit via a command line (e.g. to work in a multi-language environment) can cook up the functionality they need very quickly.
I am not in favor at all of this feature as I think that it is extending a bit outside of the scope of the scikit, but as always, I can be convinced, if I see that enough developers feel strongly about this and would maintain it.
from scikit-learn.
I think this is a really important feature for day-to-day practitioners who are not necessarily developers but rather data analysts who want to quickly evaluate the output of the algorithms implemented in the scikit on their own data without having to write boilerplate code themselves.
It will be even more important once we implement an online API, so that we can naturally handle infinite byte streams in a Unix pipe.
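
As a rough illustration of the streaming idea, the sketch below consumes lines from any iterable (such as `sys.stdin`) in fixed-size mini-batches and feeds them to a model incrementally. It assumes a model exposing a `partial_fit(X, y, classes=...)` method, which is how scikit-learn later exposed incremental learning on estimators such as `SGDClassifier`; the helper names `mini_batches` and `fit_stream` and the line format (comma-separated features, label in the last column) are hypothetical.

```python
from itertools import islice


def mini_batches(lines, batch_size, sep=","):
    """Yield (X, y) pairs of at most batch_size rows from an iterable of lines."""
    rows = (line.strip().split(sep) for line in lines if line.strip())
    while True:
        chunk = list(islice(rows, batch_size))
        if not chunk:
            return  # input exhausted
        X = [[float(value) for value in row[:-1]] for row in chunk]
        y = [row[-1] for row in chunk]  # labels are kept as strings
        yield X, y


def fit_stream(model, lines, classes, batch_size=1000):
    """Update the model one mini-batch at a time, never holding the full stream."""
    for X, y in mini_batches(lines, batch_size):
        model.partial_fit(X, y, classes=classes)
    return model
```

Because only one batch is in memory at a time, something like `fit_stream(model, sys.stdin, classes=["spam", "ham"])` (labels made up for the example) could sit at the end of a Unix pipe of arbitrary length.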
from scikit-learn.
Having the ability to quickly wrap algorithms and predictive models as Unix CLI tools that read stdin and write to stdout would also make it trivial to use the scikit in a Hadoop Streaming environment (or using Apache Pig with the STREAMING command as well).
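
A stdin-to-stdout wrapper of this kind could be sketched as follows. This is an assumption-laden illustration, not an existing tool: it supposes a previously pickled object with a scikit-learn style `predict` method, one comma-separated row of numeric features per input line, and the hypothetical names `predict_stream` and `main`.

```python
import pickle
import sys


def predict_stream(model, lines, out, sep=","):
    """Write one prediction per non-empty input line to the output stream."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        features = [float(value) for value in line.split(sep)]
        # predict expects a 2D input, so wrap the single row in a list
        prediction = model.predict([features])[0]
        out.write(f"{prediction}\n")


def main(argv=None):
    """Load the pickled model named by the first argument and filter stdin."""
    argv = sys.argv[1:] if argv is None else argv
    with open(argv[0], "rb") as handle:
        model = pickle.load(handle)
    predict_stream(model, sys.stdin, sys.stdout)
```

A script calling `main()` could then be used as a filter in a shell pipeline or as a streaming mapper. Predicting row by row keeps the sketch short; batching several lines per `predict` call would likely be much faster in practice.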
from scikit-learn.
As for the priority, I agree this is not a high-priority task: we need to work on the online part first to make this really useful in practice, IMHO.
from scikit-learn.
I think this is something that should be pioneered in a separate package. I feel like closing this issue as I don't see it happening any time soon (and the issue tracker is filling up with "we should implement such and such" as well as PRs).
from scikit-learn.
+1
from scikit-learn.
Thanks for doing the clean-up round ;)
from scikit-learn.
I believe we've created that separate package you're looking for at ETS, and we just publicly released it on Friday! We called it SciKit-Learn Laboratory (SKLL). You can install it from pip with just pip install skll.
Documentation: http://scikit-learn-laboratory.readthedocs.org
Github project: https://github.com/EducationalTestingService/skll
It lets you easily run experiments using a variety of classifiers/regressors when you have pre-generated feature files. We hope other people find it useful, and feedback is always welcome. We use it a lot internally.
from scikit-learn.