
Comments (10)

GaelVaroquaux commented on May 4, 2024

I am really not too enthusiastic about such a proposal for the following reasons:

  1. Will create a large volume of boilerplate code, which will be a maintenance burden
  2. Hard to test
  3. Will attract users who do not want to learn Python, who will in turn send us questions that we cannot answer and uninformative bug reports, and force us to write much more documentation

You may think that I am cynical, but I am trying to think in the long run and to make sure that the project doesn't implode under its own weight.

from scikit-learn.

mblondel commented on May 4, 2024

I'm not quite sure what kind of boilerplate you're thinking of. I expected the command line program to be standalone and quite small, actually.

Also, since the command would use pickle for persistence, people could apply a few preprocessing steps (feature extraction, PCA, ...), get their pickled object and work from there in Python.
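A minimal sketch of that pickle hand-off, using only the standard library. The names fit_centering and apply_centering are hypothetical, and a plain dict stands in for a fitted scikit-learn object; the point is only that state written by a command-line step can be reloaded and reused from Python.

```python
import io
import pickle


def fit_centering(rows):
    """Hypothetical preprocessing step: learn the mean of each column."""
    n = len(rows)
    return {"means": [sum(col) / n for col in zip(*rows)]}


def apply_centering(state, rows):
    """Subtract the learned column means from new rows."""
    return [[x - m for x, m in zip(row, state["means"])] for row in rows]


# Step 1: what the command-line tool would do (fit, then persist with pickle).
buffer = io.BytesIO()  # stands in for a file on disk
pickle.dump(fit_centering([[1.0, 10.0], [3.0, 30.0]]), buffer)

# Step 2: what the user would do later, in an ordinary Python session.
buffer.seek(0)
restored = pickle.load(buffer)
centered = apply_centering(restored, [[2.0, 20.0]])
```

Note that pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.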

So I guess the only one of your arguments I really agree with is 2.

This feature is not a must for me so if people don't like it too much, no problem!


GaelVaroquaux commented on May 4, 2024

Maybe I am wrong, but I expect the boilerplate code to come from impedance matching Python with a command line.

For point 3, I guess that my answer to your answer is that users who know Python and need to call the scikit via a command line (e.g. to work in a multi-language environment) can cook up the functionality they need very quickly.

I am not in favor at all of this feature as I think that it is extending a bit outside of the scope of the scikit, but as always, I can be convinced, if I see that enough developers feel strongly about this and would maintain it.


ogrisel commented on May 4, 2024

I think this is a really important feature for day-to-day practitioners who are not necessarily developers but rather data analysts who want to quickly evaluate the output of algorithms implemented in the scikit on their own data without having to write boilerplate code themselves.

It will be even more important once we implement an online API, so that we can naturally handle infinite byte streams in a Unix pipe.


ogrisel commented on May 4, 2024

Having the ability to quickly wrap algorithms and predictive models as Unix CLI tools that read stdin and write to stdout would also make it trivial to use the scikit in a Hadoop Streaming environment (or using Apache Pig with the STREAMING command as well).
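A rough sketch of such a stdin-to-stdout filter, assuming one comma-separated sample per input line and one prediction per output line. The names predict_sign and stream_predictions are hypothetical, and predict_sign is a toy stand-in for a fitted model's predict method; none of this is an existing scikit-learn interface.

```python
import io


def predict_sign(features):
    """Hypothetical stand-in for a fitted model's predict()."""
    return 1 if sum(features) >= 0 else -1


def stream_predictions(predict, instream, outstream):
    """Read one comma-separated sample per line, write one prediction per line."""
    for line in instream:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        features = [float(value) for value in line.split(",")]
        outstream.write(f"{predict(features)}\n")


# In a real CLI tool the streams would be sys.stdin and sys.stdout;
# in-memory streams are used here so the example runs on its own.
out = io.StringIO()
stream_predictions(predict_sign, io.StringIO("1,2\n\n-3,1\n"), out)
```

Because the function consumes its input line by line, it never needs the whole dataset in memory, which is what makes this shape suitable for pipes and line-oriented streaming tools.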


ogrisel commented on May 4, 2024

As for the priority, I agree this is not a high-priority task: we need to work on the online part first to make this really useful in practice IMHO.


larsmans commented on May 4, 2024

I think this is something that should be pioneered in a separate package. I feel like closing this issue as I don't see it happening any time soon (and the issue tracker is filling up with "we should implement such and such" as well as PRs).


amueller commented on May 4, 2024

+1


amueller commented on May 4, 2024

Thanks for doing the clean-up round ;)


dan-blanchard commented on May 4, 2024

I believe we've created that separate package you're looking for at ETS, and we just publicly released it on Friday! We called it SciKit-Learn Laboratory (SKLL). You can install it with pip: pip install skll.

Documentation: http://scikit-learn-laboratory.readthedocs.org
Github project: https://github.com/EducationalTestingService/skll

It lets you easily run experiments using a variety of classifiers/regressors when you have pre-generated feature files. We hope other people find it useful, and feedback is always welcome. We use it a lot internally.

