cstjean / scikitlearn.jl

Julia implementation of the scikit-learn API https://cstjean.github.io/ScikitLearn.jl/dev/

License: Other

Jupyter Notebook 39.34% Julia 60.66%

scikitlearn.jl's Introduction

ScikitLearn.jl

ScikitLearn.jl implements the popular scikit-learn interface and algorithms in Julia. It supports both models from the Julia ecosystem and those of the scikit-learn library (via PyCall.jl).

Would you rather use a machine-learning framework specially designed for Julia? Check out MLJ.jl, from the Alan Turing Institute.

Disclaimer: ScikitLearn.jl borrows code and documentation from scikit-learn, but it is not an official part of that project. It is licensed under BSD-3.

Main features:

Around 150 Julia and Python models accessed through a uniform interface
Pipelines and FeatureUnions
Cross-validation
Hyperparameter tuning
DataFrames support

Check out the Quick-Start Guide for a tour.

Installation

To install ScikitLearn.jl, type ]add ScikitLearn at the REPL.

To import Python models (optional), ScikitLearn.jl requires the scikit-learn Python library, which will be installed automatically when needed. Most of the examples use PyPlot.jl.

Known issue

On Linux builds, importing Python models via @sk_import is known to fail for Julia v<1.8.4 when the PYTHON environment variable from PyCall.jl is set to "" or conda. This is because the version of libstdcxx loaded by Julia v<1.8.4 isn't compatible with the version of scikit-learn installed via Conda. The easiest and recommended way to resolve this is to upgrade to Julia v>=1.8.4. If you must stick with your current Julia version, you can also resolve this issue by prepending the Conda library directory to your system's LD_LIBRARY_PATH environment variable, as shown below:

ROOT_ENV=`julia -e "using Conda; print(Conda.ROOTENV)"`
export LD_LIBRARY_PATH=$ROOT_ENV"/lib":$LD_LIBRARY_PATH

Documentation

See the manual and example gallery.

Goal

ScikitLearn.jl aims for feature parity with scikit-learn. If you encounter any problem that is solved by that library but not this one, file an issue.

scikitlearn.jl's Issues

ScikitLearn.jl cannot find scikit-learn pkg

On Linux, Julia 0.5.1, I:

Pkg.add("ScikitLearn")
Conda.add("scikit-learn")

Then

using ScikitLearn
@sk_import linear_model: LinearRegression

yields:

ERROR: PyError (:PyImport_ImportModule) <type 'exceptions.ImportError'>
ImportError('No module named sklearn.linear_model',)

Are there some environment variables I need to assert are set up correctly?

fit model gives error "LoadError: UndefVarError: fit! not defined"

It seems impossible to fit a model; it keeps giving errors.
When I run this simple example:

using RDatasets: dataset
using ScikitLearn

@sk_import linear_model: LogisticRegression

iris = dataset("datasets", "iris")

X = convert(Array, iris[[:SepalLength, :SepalWidth, :PetalLength, :PetalWidth]])
y = convert(Array, iris[:Species])

model = fit!(LogisticRegression(), X, y)
accuracy = sum(predict(model, X) .== y) / length(y)
println("accuracy: $accuracy") # accuracy on training set

LoadError: UndefVarError: fit! not defined

pycall api updates

PyCall 1.90.0 is now released, which changes o[:foo] and o["foo"] to o.foo and o."foo", respectively, for Python objects o; see also JuliaPy/PyCall.jl#629.

The old getindex methods still work but are deprecated, so you'll want to put out a new release that uses the new methods and REQUIREs PyCall 1.90.0 to avoid having zillions of deprecation messages.
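A rough sketch of the mechanism behind the new syntax: property access o.foo lowers to Base.getproperty, so a wrapper type can overload it and keep getindex around as a deprecated alias. ToyPyObject and its attrs field are hypothetical stand-ins, not PyCall's actual types or implementation.

```julia
# Hypothetical stand-in for a wrapped Python object (not PyCall's PyObject).
struct ToyPyObject
    attrs::Dict{String,Any}
end

# New style: o.foo lowers to getproperty(o, :foo). getfield is used to reach
# the real field, because o.attrs would recurse into this very method.
Base.getproperty(o::ToyPyObject, s::Symbol) = getfield(o, :attrs)[String(s)]

# Old style: o[:foo] keeps working but goes through a deprecation warning
# (printed only when deprecation warnings are enabled, e.g. --depwarn=yes).
function Base.getindex(o::ToyPyObject, s::Symbol)
    Base.depwarn("o[:$s] is deprecated, use o.$s instead", :getindex)
    return getproperty(o, s)
end

o = ToyPyObject(Dict{String,Any}("coef_" => [1.0, 2.0]))
o.coef_     # new syntax
o[:coef_]   # old syntax, same value
```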

Using ScikitLearn inside a module causes a segmentation fault

Tested in Julia 1.0.3, Julia 1.1, and Julia 0.7.

To recreate the problem:
create package A
pkg] generate A
bash> cd A
pkg] activate .
pkg] add ScikitLearn
julia> edit("src/A.jl")
-----
module A
using ScikitLearn
@sk_import linear_model: LogisticRegression

function testme()
model = LogisticRegression()
end

end
---
julia> using A
julia> A.testme() -> causes a segmentation fault

However, if you use:
julia> include("src/A.jl")
julia> A.testme() -> works

Hyperparameter tuning with GridSearchCV does not work

Hi,
I am trying to reproduce an example from the package documentation (http://scikitlearnjl.readthedocs.io/en/latest/quickstart/#hyperparameter-tuning), but the code does not work. The Julia version I am working with is Version 0.6.0-dev.1258 (2016-11-16 17:32 UTC).

The line fit!(LogisticRegression(), X, y) throws the following error:

Cannot convert an object of type Array{Pair{Symbol,Float64}, 1} to an object of type Dict{Symbol,Any}. This may have arisen from a call to the constructor Dict{Symbol,Any}(...), since type constructors fall back to convert mode.

My code:

using RDatasets: dataset
iris = dataset("datasets", "iris")
X = convert(Array, iris[[:SepalLength, :SepalWidth, :PetalLength, :PetalWidth]])
y = convert(Array, iris[:Species])

using ScikitLearn
using ScikitLearn.CrossValidation: cross_val_score
using ScikitLearn.GridSearch: GridSearchCV
@sk_import linear_model: LogisticRegression

gridsearch = GridSearchCV(LogisticRegression(), Dict(:C => 0.1:0.1:2.0))
fit!(LogisticRegression(), X, y)
println("Best parameters: $(gridsearch.best_params_)")

Creating Custom Transformer

Hi team,

Thanks for the great package. I'm trying to build a custom transformer similar to the procedure described here: http://www.dreisbach.us/blog/building-scikit-learn-compatible-transformers/

# Python reference code
# Dumb transformer: it takes any data and returns a feature vector of [1].
from sklearn.base import TransformerMixin

class DumbFeaturizer(TransformerMixin):
    def __init__(self):
        pass

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [[1] for _ in X]

I'm working on the Kaggle Titanic competition and trying to pipeline every transformation from import to fit/predict.

Here is one of the functions I'm trying to make into a transformer.
It extracts the title from the "Name" column, i.e., from "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" it extracts "Mrs." and populates a new column with the result.

function pp_Title(df::AbstractDataFrame)
    @linq df |>
        transform(
            CptTitle = map(s->match(r"(?<=, ).*?\.", s).match, :Name)
        )
end

Please note that I'm trying to build transformers that accept DataFrames and output DataFrames as well. I will then use DataFrameMapper for the final transformation.

I tried using @pydef, but it fails while fitting:

@pydef type pp_extractTitle <: TransformerMixin
    fit(self, X, y=None) = return self
    
    transform(self, X)=
        @linq df |>
            transform(
                CptTitle = map(s->match(r"(?<=, ).*?\.", s).match, :Name)
            )
end

Thanks a lot
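For reference, the fit/transform protocol from the Python example maps naturally onto Julia methods. The stdlib-only sketch below works on a plain vector of names rather than a DataFrame and defines local fit!/transform functions; TitleExtractor is a hypothetical name, and in a real package one would presumably extend ScikitLearnBase's fit! and transform instead (an assumption; check the ScikitLearn.jl manual).

```julia
# Hypothetical transformer type. A real ScikitLearn.jl transformer would
# presumably extend ScikitLearnBase.fit! and ScikitLearnBase.transform.
struct TitleExtractor end

# Nothing to learn, so fitting returns the transformer (like `return self`).
fit!(t::TitleExtractor, X, y=nothing) = t

# Keep the text between ", " and the first "." (inclusive), e.g. "Mrs.";
# names without that pattern map to `missing`.
function transform(t::TitleExtractor, X::AbstractVector{<:AbstractString})
    return map(X) do s
        m = match(r"(?<=, ).*?\.", s)
        m === nothing ? missing : m.match
    end
end

passengers = ["Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
              "Braund, Mr. Owen Harris"]
titles = transform(fit!(TitleExtractor(), passengers), passengers)   # ["Mrs.", "Mr."]
```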

Adding native julia implementations

Hi,

Do you have any guidelines for adding new estimators?

I have implemented a multiclass perceptron

perceptron implementation

I would like to know how I could adapt it and add it to this repo.

This is different from the perceptron in scikit-learn (Python), since this implementation natively supports multiclass classification (it does not use one-vs-all). The main benefit is that it is faster than the sklearn version.
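A sketch of what "natively multiclass" means here, assuming it is the standard multiclass perceptron: one weight vector per class, prediction by the highest score, and on a mistake the true class's weights move toward x while the predicted class's weights move away. This is not the implementation linked above; MulticlassPerceptron, train! and predict_one are hypothetical names.

```julia
# Hypothetical multiclass perceptron sketch: one weight row per class
# (last column is the bias), predicting the class with the highest score.
struct MulticlassPerceptron
    W::Matrix{Float64}   # n_classes × (n_features + 1)
end

MulticlassPerceptron(n_classes::Int, n_features::Int) =
    MulticlassPerceptron(zeros(n_classes, n_features + 1))

# Append a constant 1.0 so the bias is learned as an ordinary weight.
augment(x::AbstractVector) = vcat(x, 1.0)

predict_one(p::MulticlassPerceptron, x::AbstractVector) = argmax(p.W * augment(x))

# X is n_samples × n_features; y holds class indices in 1:n_classes.
function train!(p::MulticlassPerceptron, X::AbstractMatrix, y::AbstractVector{Int}; epochs::Int=10)
    for _ in 1:epochs
        for i in axes(X, 1)
            x = augment(X[i, :])
            yhat = argmax(p.W * x)
            if yhat != y[i]
                # One joint update across classes (no one-vs-all models):
                # reward the true class, penalize the wrongly predicted one.
                p.W[y[i], :] .+= x
                p.W[yhat, :] .-= x
            end
        end
    end
    return p
end

X = [1.0 0.0; 0.0 1.0; -1.0 -1.0; 2.0 0.0; 0.0 2.0; -2.0 -2.0]
y = [1, 2, 3, 1, 2, 3]
p = train!(MulticlassPerceptron(3, 2), X, y)
[predict_one(p, X[i, :]) for i in axes(X, 1)]   # recovers y on this separable toy data
```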

I looked at the native linear_regression implementation. It seems to solve the problem directly with this line:

results = [ones(size(X, 2), 1) X'] \ y'

So this is not done using SGD. Should we implement an abstraction that can be reused for partial_fit?
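For comparison, the two approaches side by side in a stdlib-only sketch. The line quoted above appears to store X as features × samples (hence the transposes); this sketch uses the samples × features layout instead. fit_linear solves the least-squares problem in one shot with \, while sgd_partial_fit! (a hypothetical helper, not an existing ScikitLearn.jl function) makes one pass of stochastic gradient steps over a mini-batch, which is the kind of update a partial_fit API would expose.

```julia
# Closed-form least squares, samples × features layout.
# Returns [intercept; coefficients].
function fit_linear(X::AbstractMatrix, y::AbstractVector)
    A = [ones(size(X, 1)) X]    # prepend an intercept column
    return A \ y                # least-squares solution (QR-based for non-square A)
end

# Hypothetical partial_fit-style update: one pass of SGD over a mini-batch
# for the squared loss 0.5 * (w ⋅ [1; x] - y)^2, modifying w in place.
function sgd_partial_fit!(w::AbstractVector, X::AbstractMatrix, y::AbstractVector; lr::Real=0.01)
    for i in axes(X, 1)
        x = vcat(1.0, X[i, :])
        err = sum(w .* x) - y[i]    # prediction error for this sample
        w .-= lr .* err .* x        # gradient step on this sample's loss
    end
    return w
end

X = [0.0 0.0; 1.0 0.0; 0.0 1.0; 1.0 1.0; 2.0 1.0]
y = 2 .+ 3 .* X[:, 1] .- X[:, 2]     # exact line: intercept 2, coefficients 3 and -1

coef = fit_linear(X, y)              # ≈ [2.0, 3.0, -1.0]

w = zeros(3)
for epoch in 1:2000                  # repeated calls, as a stream of mini-batches would
    sgd_partial_fit!(w, X, y; lr=0.05)
end
# w is now ≈ [2.0, 3.0, -1.0] as well
```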

Cannot pass scoring to cross_val_score

The following code gives the error below:

using ScikitLearn
using RDatasets

@sk_import linear_model: LinearRegression
using ScikitLearn.CrossValidation: cross_val_score

boston = dataset("MASS", "Boston")

X = Matrix(boston[1:13])
y = Array(boston[:MedV])

lr = LinearRegression()
scores = cross_val_score(lr, X, y,  scoring="r2")

ERROR: ArgumentError: r2 is not a valid scoring value. Valid options are Symbol[:mean_squared_error]
Stacktrace:
 [1] get_scorer(::Symbol) at /home/.julia/v0.6/ScikitLearn/src/scorer.jl:65
 [2] #check_scoring#95(::Bool, ::Function, ::PyCall.PyObject, ::String) at /home/.julia/v0.6/ScikitLearn/src/cross_validation.jl:432
 [3] #cross_val_score#83(::String, ::Int64, ::Int64, ::Int64, ::Void, ::Function, ::PyCall.PyObject, ::Array{Real,2}, ::Array{Float64,1}) at /home/.julia/v0.6/ScikitLearn/src/cross_validation.jl:276
 [4] (::ScikitLearn.Skcore.#kw##cross_val_score)(::Array{Any,1}, ::ScikitLearn.Skcore.#cross_val_score, ::PyCall.PyObject, ::Array{Real,2}, ::Array{Float64,1}) at ./<missing>:0

Any other scoring value gives the same error, except "mean_squared_error", of course.
The same code works in Python.
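Until more scoring values are supported natively, one possible workaround is to compute R² directly from predictions. The sketch below follows the usual definition, 1 - SS_res / SS_tot (the quantity scikit-learn's r2_score reports); r2 is a hypothetical helper name, and wiring it into cross_val_score is not shown because that depends on what the function accepts.

```julia
using Statistics: mean

# Coefficient of determination: 1 - SS_res / SS_tot.
function r2(y_true::AbstractVector, y_pred::AbstractVector)
    ss_res = sum(abs2, y_true .- y_pred)          # residual sum of squares
    ss_tot = sum(abs2, y_true .- mean(y_true))    # total sum of squares
    return 1 - ss_res / ss_tot
end

y = [1.0, 2.0, 3.0, 4.0]
r2(y, y)                               # 1.0: perfect predictions
r2(y, fill(mean(y), 4))                # 0.0: always predicting the mean
r2([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])   # 0.5
```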

Getting a warning when importing some models and an error when importing LogisticRegression

I'm trying to do the Quickstart example, but when running

@sk_import linear_model: LogisticRegression

I get the output:

WARNING: Method definition require(Symbol) in module Base at loading.jl:345 overwritten in module Main at /Users/user/.julia/v0.5/Requires/src/require.jl:12.
ERROR: LoadError: PyError (:PyImport_ImportModule) <type 'exceptions.ImportError'>
ImportError('No module named sklearn.linear_model',)

 in include_from_node1(::String) at ./loading.jl:488
 in include_from_node1(::String) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 in process_options(::Base.JLOptions) at ./client.jl:265
 in _start() at ./client.jl:321
 in _start() at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
while loading /Users/user/Dropbox/ML/beyes.jl, in expression starting on line 12

Asking for scikit-learn (Python)

I get the following message, but scikit-learn is definitely installed in Python:

WARNING: Please install scikit-learn (Python). See http://scikitlearnjl.readthedocs.io/en/latest/models/#installation for instructions.
ERROR: PyError (:PyImport_ImportModule) <type 'exceptions.ImportError'>
ImportError('No module named sklearn',)

fit! function

Hi,

I want to use the DecisionTreeRegressor and the RandomForestRegressor, but when I tried the example, the fit! function didn't work, even after installing all the packages.

Can you help me with this?

Thanks,
Steffie

DataFrames v0.11.6

ScikitLearn is holding DataFrames back at version v0.10.1.

Update to DataFrames v0.11

JuliaData/DataFrames.jl#1232

Importing LabelEncoder without loading ScikitLearn

I am taking some datasets, cleaning them, and encoding categorical variables using tools available in ScikitLearn, then running XGBoost on the clean data.

However, I cannot make predictions using the trained XGBoost model because both ScikitLearn and XGBoost have a function named predict. Refer to the error message below:

WARNING: both ScikitLearn and XGBoost export "predict"; uses of it in module Main must be qualified ERROR: LoadError: UndefVarError: predict not defined

The problem is that qualifying the predict function as XGBoost.predict does not work, and that is the only solution I know of.

Further, I cannot find or understand how to load only LabelEncoder from ScikitLearn without loading the rest of ScikitLearn, and thus its predict function. For example, the forms

using ScikitLearn:LabelEncoder
import ScikitLearn:LabelEncoder
import ScikitLearn:Preprocessing,LabelEncoder

None of these work.

Looking forward to your help.
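In plain Julia, using Module: name brings only the listed name into scope, so another module's export of the same name never collides with it. The toy modules below (ToySK and ToyXGB are hypothetical) only demonstrate that mechanism; whether ScikitLearn.jl exposes LabelEncoder as a directly importable name is not established here.

```julia
# Two hypothetical modules that both export `predict`,
# mimicking the ScikitLearn/XGBoost clash described above.
module ToySK
export predict, encode
predict(x) = "ToySK.predict($x)"
encode(x) = "ToySK.encode($x)"
end

module ToyXGB
export predict
predict(x) = "ToyXGB.predict($x)"
end

# `using M: name` imports only the listed name, so ToySK's `predict`
# never enters this namespace and there is nothing to disambiguate.
using .ToySK: encode
using .ToyXGB

encode(1)    # "ToySK.encode(1)"
predict(1)   # "ToyXGB.predict(1)"
```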

Error on Julia v 0.6

After upgrading to Julia v0.6
ERROR: LoadError: Module ScikitLearn declares precompile(true) but require failed to create a usable precompiled cache file.
Stacktrace:

Running Pkg.build("ScikitLearn") afterwards fixes it, but should that be necessary?

warnings on 0.5

Would be nice to fix these warnings that show up during precompilation as we prepare to release 0.5.

WARNING: deprecated syntax "[a=>b for (a,b) in c]".
Use "Dict(a=>b for (a,b) in c)" instead.

WARNING: deprecated syntax "call(self::PredictScorer, ...)".
Use "(self::PredictScorer)(...)" instead.

WARNING: deprecated syntax "call(::Core.kwftype(PredictScorer), ...)".
Use "(::Core.kwftype(PredictScorer))(...)" instead.

WARNING: deprecated syntax "[a=>b for (a,b) in c]".
Use "Dict(a=>b for (a,b) in c)" instead.

WARNING: deprecated syntax "[a=>b for (a,b) in c]".
Use "Dict(a=>b for (a,b) in c)" instead.

Use NBInclude to test notebooks

We could modify NBInclude to be similar to nosebook.

TODO: support Distributions.jl

The main blocker is JuliaStats/Distributions.jl#436. It's hard to test when random code is not reproducible.
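The pattern such tests usually rely on is passing an explicitly seeded RNG, so that the same seed reproduces the same draws. A minimal stdlib sketch (draw is a hypothetical helper) of what sampling code needs to support in order to be testable this way:

```julia
using Random

# Hypothetical helper: all randomness comes from the rng argument.
draw(rng::AbstractRNG, n::Integer) = rand(rng, n)

a = draw(MersenneTwister(42), 3)
b = draw(MersenneTwister(42), 3)
a == b    # true: same seed, same stream of numbers
```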

Change the CrossValidation module to ModelSelection

It's been done in scikit-learn, and we should follow suit. It's not urgent because we're (almost) not relying on that Python code.

PCA has wrong dimensions

In the following example, there are 10 samples and 5 features. The output should have 10 samples and 2 features, but it has 5 samples:

using ScikitLearn
import ScikitLearn: fit!
@sk_import decomposition: PCA

X = rand(10, 5)
pca = PCA(n_components=2)
fit!(pca, X)
pca.components_

2×5 Array{Float64,2}:
 0.670536  -0.26009   0.325406  -0.324582   0.521049
 0.411367   0.360626  0.461346   0.66101   -0.225726
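For context, in scikit-learn's convention components_ has shape (n_components, n_features), while the reduced data, with shape (n_samples, n_components), is what transforming X returns. A stdlib-only PCA via SVD shows both shapes; this is a sketch, pca is a hypothetical helper, and scikit-learn additionally handles solver choice and sign conventions.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper: PCA via SVD of the centered data (rows are samples).
function pca(X::AbstractMatrix, k::Int)
    Xc = X .- mean(X, dims=1)    # center each feature (column)
    F = svd(Xc)
    components = F.Vt[1:k, :]    # k × n_features, same layout as components_
    Z = Xc * components'         # n_samples × k, the reduced data
    return components, Z
end

X = rand(10, 5)
components, Z = pca(X, 2)
size(components)   # (2, 5)
size(Z)            # (10, 2)
```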

Getting the Coefficients on a Fit Linear Regression

This is kind of a basic question, but I've tried all combinations of syntax and cannot get it to work. I also looked through all of the old issues before posting here, so my apologies if this has already been answered.

I am having issues returning the coefficients from the linear regression. You can see my code below. I looked at the .jl file for it and tried various function/dot notation using coefs and I did the same with coef_ from the python manual. Those attempts aren't shown below because none of them worked.

Can someone please let me know what the command that returns the coefficient array is after the model has been fit? Sorry if this is not the place for such an issue.

#Generate Data

#Distributions
UniRVB = Uniform(0,1);
NRVB = Normal(0,1);

srand(32);
n=1000000;
numX=10;
X=rand(UniRVB,n,numX);
Xmiss=sum(X,2) + rand(NRVB,n);
Xtot=hcat(X,Xmiss);
y=sum(Xtot,2) + rand(NRVB,n);

#Check
@sk_import linear_model: LinearRegression;
reg=LinearRegression();
(SKL.fit!(reg,X,Xmiss))
Xmisspred=SKL.predict(reg,X);
print(sum(Xmisspred-Xmiss)/n)

Can I use SelectFromModel with DecisionTree?

I'm trying to use SelectFromModel with RandomForestClassifier. Is that supported by ScikitLearn in Julia?

using RDatasets: dataset
using ScikitLearn, DecisionTree

iris = dataset("datasets", "iris")
X = convert(Array, iris[[:SepalLength, :SepalWidth, :PetalLength, :PetalWidth]])
y = convert(Array, iris[:Species])
@sk_import ensemble: RandomForestClassifier
@sk_import feature_selection: SelectFromModel


rfc = RandomForestClassifier(n_subfeatures=30, n_trees=350, partial_sampling = 0.4, min_purity_increase = 0.001)
sfm = SelectFromModel(rfc)
fit!(sfm, X, y)

I get the following error:

ERROR: PyError ($(Expr(:escape, :(ccall(#= /home/tas/.julia/packages/PyCall/ttONZ/src/pyfncall.jl:44 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'TypeError'>
TypeError("Cannot clone object '<PyCall.jlwrap RandomForestClassifier\nn_trees: 350\nn_subfeatures: 30\npartial_sampling: 0.4\nmax_depth: -1\nmin_samples_leaf: 1\nmin_samples_split: 2\nmin_purity_increase: 0.001\nclasses: ensemble: >' (type <class 'PyCall.jlwrap'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods.")
File "/home/tas/.julia/conda/3/lib/python3.7/site-packages/sklearn/feature_selection/from_model.py", line 195, in fit
self.estimator_ = clone(self.estimator)
File "/home/tas/.julia/conda/3/lib/python3.7/site-packages/sklearn/base.py", line 60, in clone
% (repr(estimator), type(estimator)))

Stacktrace:
[1] pyerr_check at /home/tas/.julia/packages/PyCall/ttONZ/src/exception.jl:60 [inlined]
[2] pyerr_check at /home/tas/.julia/packages/PyCall/ttONZ/src/exception.jl:64 [inlined]
[3] macro expansion at /home/tas/.julia/packages/PyCall/ttONZ/src/exception.jl:84 [inlined]
[4] __pycall!(::PyCall.PyObject, ::Ptr{PyCall.PyObject_struct}, ::PyCall.PyObject, ::Ptr{Nothing}) at /home/tas/.julia/packages/PyCall/ttONZ/src/pyfncall.jl:44
[5] _pycall!(::PyCall.PyObject, ::PyCall.PyObject, ::Tuple{Array{Float64,2},Array{String,1}}, ::Int64, ::Ptr{Nothing}) at /home/tas/.julia/packages/PyCall/ttONZ/src/pyfncall.jl:29
[6] _pycall!(::PyCall.PyObject, ::PyCall.PyObject, ::Tuple{Array{Float64,2},Array{String,1}}, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/tas/.julia/packages/PyCall/ttONZ/src/pyfncall.jl:11
[7] #call#111(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::PyCall.PyObject, ::Array{Float64,2}, ::Vararg{Any,N} where N) at /home/tas/.julia/packages/PyCall/ttONZ/src/pyfncall.jl:89
[8] (::PyCall.PyObject)(::Array{Float64,2}, ::Vararg{Any,N} where N) at /home/tas/.julia/packages/PyCall/ttONZ/src/pyfncall.jl:89
[9] #fit!#31(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::PyCall.PyObject, ::Array{Float64,2}, ::Vararg{Any,N} where N) at /home/tas/.julia/packages/ScikitLearn/bo2Pt/src/Skcore.jl:100
[10] fit!(::PyCall.PyObject, ::Array{Float64,2}, ::Array{String,1}) at /home/tas/.julia/packages/ScikitLearn/bo2Pt/src/Skcore.jl:100
[11] top-level scope at none:0

PS: There is no issue if I use RandomForestClassifier from ScikitLearn.

Pycall error

julia> using ScikitLearn

julia> @sk_import linear_model: LinearRegression
ERROR: PyError (PyImport_ImportModule

The Python package sklearn could not be found by pyimport. Usually this means
that you did not install sklearn in the Python version being used by PyCall.

PyCall is currently configured to use the Python version at:

/usr/bin/python3

and you should use whatever mechanism you usually use (apt-get, pip, conda,
etcetera) to install the Python package containing the sklearn module.

One alternative is to re-configure PyCall to use a different Python
version on your system: set ENV["PYTHON"] to the path/name of the python
executable you want to use, run Pkg.build("PyCall"), and re-launch Julia.

Another alternative is to configure PyCall to use a Julia-specific Python
distribution via the Conda.jl package (which installs a private Anaconda
Python distribution), which has the advantage that packages can be installed
and kept up-to-date via Julia. As explained in the PyCall documentation,
set ENV["PYTHON"]="", run Pkg.build("PyCall"), and re-launch Julia. Then,
To install the sklearn module, you can use pyimport_conda("sklearn", PKG),
where PKG is the Anaconda package the contains the module sklearn,
or alternatively you can use the Conda package directly (via
using Conda followed by Conda.add etcetera).

) <class 'ModuleNotFoundError'>
ModuleNotFoundError("No module named 'sklearn'")

Stacktrace:
[1] pyimport(::String) at /home/ezio/.julia/packages/PyCall/ttONZ/src/PyCall.jl:544
[2] pyimport_conda(::String, ::String, ::String) at /home/ezio/.julia/packages/PyCall/ttONZ/src/PyCall.jl:702
[3] pyimport_conda at /home/ezio/.julia/packages/PyCall/ttONZ/src/PyCall.jl:701 [inlined]
[4] import_sklearn() at /home/ezio/.julia/packages/ScikitLearn/HK6Vs/src/Skcore.jl:119
[5] top-level scope at /home/ezio/.julia/packages/ScikitLearn/HK6Vs/src/Skcore.jl:153

DataFrameMapper and @sk_import preprocessing: StandardScaler

I am testing out the preprocessing example in the DataFrames section of the manual.
I copied this code:

using ScikitLearn
using DataFrames: DataFrame, NA, DataArray
using DataArrays: @data
@sk_import preprocessing: (LabelBinarizer, StandardScaler)

data = DataFrame(pet=["cat", "dog", "dog", "fish", "cat", "dog", "cat", "fish"],
                 children=[4., 6, 3, 3, 2, 3, 5, 4],
                 salary=[90, 24, 44, 27, 32, 59, 36, 27])
mapper = DataFrameMapper([(:pet, LabelBinarizer()),
                          ([:children], StandardScaler())]);
round(fit_transform!(mapper, copy(data)), 2)

and tried to run it (I have all the packages installed already).

However, it seems that the StandardScaler() function interacts strangely for this part:

([:children], StandardScaler())]);

Julia v0.5 (and v0.4.5) throws this error:

LoadError: TypeError: #call#17: in new, expected Array{Tuple,1}, got Array{Tuple,1}

Strangely enough, it works as intended in Julia v0.6.0-dev without a hitch.

TODO: Fix remaining Python dependencies

train_test_split and all the code relying on importpy (as well as that function itself) ought to be reviewed and fixed to avoid loading scikit-learn unnecessarily.

ERROR: KeyError: AgglomerativeClustering not found

using ScikitLearn
@sk_import cluster:AgglomerativeClustering
ERROR: KeyError: AgglomerativeClustering not found

I can import the other cluster functions without a problem
e.g.
@sk_import cluster: (estimate_bandwidth, MeanShift, MiniBatchKMeans, SpectralClustering)

Any ideas? Could it be something to do with the local Python scikit-learn installation?

Error tagging new release

The tag name "0.4.0" is not of the appropriate SemVer form (vX.Y.Z).
cc: @cstjean

Using DataFrames and ScikitLearn fit! warning

I'm running into another problem using the "fit!" command. I'm using both the DataFrames and ScikitLearn packages, and I get this warning when trying to use "fit!":

WARNING: both ScikitLearn and DataFrames export "fit!"; uses of it in module Main must be qualified

If I then try to use "fit!", I get the error : ERROR: UndefVarError: fit! not defined. However, "fit!" works fine if I have not declared the usage of both packages. Shouldn't using the function work regardless?
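This is Julia's standard behavior when two modules loaded with using export the same name: the unqualified name becomes ambiguous and is left undefined. Qualifying the call, or explicitly importing one of the two (the PCA report above uses import ScikitLearn: fit!), resolves it. ToyDF and ToySKL below are hypothetical modules used only to demonstrate this.

```julia
# Two hypothetical modules that both export `fit!`.
module ToyDF
export fit!
fit!(x) = "ToyDF.fit!($x)"
end

module ToySKL
export fit!
fit!(x) = "ToySKL.fit!($x)"
end

using .ToyDF, .ToySKL

# An unqualified fit!(1) at this point would warn that both modules export
# "fit!" and then throw UndefVarError. Qualified calls always work:
ToyDF.fit!(1)    # "ToyDF.fit!(1)"
ToySKL.fit!(1)   # "ToySKL.fit!(1)"

# An explicit import picks one binding for the unqualified name.
import .ToySKL: fit!
fit!(1)          # "ToySKL.fit!(1)"
```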

Support for NLP Functionality

When you say NLP functionality in the contribution guidelines are you hinting towards a julia implementation of tfidf_vectorizer and count_vectorizer from sklearn?

I'd be very willing to help out with that if I knew more what direction you were wanting to go.

Native GridSearchCV doesn't support predict

using ScikitLearn
@sk_import neighbors: KNeighborsRegressor
X = rand(100, 2)
y = X[:,1] +X[:, 2]
knn = GridSearch.GridSearchCV(KNeighborsRegressor(), cv=5, Dict("n_neighbors"=>[4, 5, 10, 20, 50]))
ScikitLearn.fit!(knn, X, y)
ScikitLearn.predict(knn, X)

ERROR: MethodError: no method matching predict(::ScikitLearn.Skcore.GridSearchCV, ::Array{Float64,2})
Closest candidates are:
  predict(::ScikitLearn.Skcore.FitBit, ::Any...; kwargs...) at C:\PortableSoftware\Scoop\apps\julia\pkgs-0.5.0\v0.5\ScikitLearn\src\sk_utils.jl:73
  predict(::PyCall.PyObject, ::Any...; kwargs...) at C:\PortableSoftware\Scoop\apps\julia\pkgs-0.5.0\v0.5\ScikitLearn\src\Skcore.jl:95
  predict(::ScikitLearn.Skcore.Pipeline, ::Any) at C:\PortableSoftware\Scoop\apps\julia\pkgs-0.5.0\v0.5\ScikitLearn\src\pipeline.jl:85

ScikitLearn.predict(knn.estimator, X)

ERROR: PyError (:PyObject_Call) <class 'sklearn.exceptions.NotFittedError'>
NotFittedError('Must fit neighbors before querying.',)
  File "C:\PortableSoftware\Scoop\apps\python\3.6.0\lib\site-packages\sklearn\neighbors\regression.py", line 144, in predict
    neigh_dist, neigh_ind = self.kneighbors(X)
  File "C:\PortableSoftware\Scoop\apps\python\3.6.0\lib\site-packages\sklearn\neighbors\base.py", line 323, in kneighbors
    raise NotFittedError("Must fit neighbors before querying.")

 in pyerr_check at C:\PortableSoftware\Scoop\apps\julia\pkgs-0.5.0\v0.5\PyCall\src\exception.jl:56 [inlined]
 in pyerr_check at C:\PortableSoftware\Scoop\apps\julia\pkgs-0.5.0\v0.5\PyCall\src\exception.jl:61 [inlined]
 in macro expansion at C:\PortableSoftware\Scoop\apps\julia\pkgs-0.5.0\v0.5\PyCall\src\exception.jl:81 [inlined]
 in #_pycall#66(::Array{Any,1}, ::Function, ::PyCall.PyObject, ::Array{Float64,2}, ::Vararg{Any,N}) at C:\PortableSoftware\Scoop\apps\julia\pkgs-0.5.0\v0.5\PyCall\src\PyCall.jl:550
 in _pycall(::PyCall.PyObject, ::Array{Float64,2}, ::Vararg{Any,N}) at C:\PortableSoftware\Scoop\apps\julia\pkgs-0.5.0\v0.5\PyCall\src\PyCall.jl:538
 in #pycall#70(::Array{Any,1}, ::Function, ::PyCall.PyObject, ::Type{PyCall.PyAny}, ::Array{Float64,2}, ::Vararg{Any,N}) at C:\PortableSoftware\Scoop\apps\julia\pkgs-0.5.0\v0.5\PyCall\src\PyCall.jl:572
 in pycall(::PyCall.PyObject, ::Type{PyCall.PyAny}, ::Array{Float64,2}, ::Vararg{Any,N}) at C:\PortableSoftware\Scoop\apps\julia\pkgs-0.5.0\v0.5\PyCall\src\PyCall.jl:572
 in #call#71(::Array{Any,1}, ::PyCall.PyObject, ::Array{Float64,2}, ::Vararg{Any,N}) at C:\PortableSoftware\Scoop\apps\julia\pkgs-0.5.0\v0.5\PyCall\src\PyCall.jl:575
 in (::PyCall.PyObject)(::Array{Float64,2}, ::Vararg{Any,N}) at C:\PortableSoftware\Scoop\apps\julia\pkgs-0.5.0\v0.5\PyCall\src\PyCall.jl:575
 in #predict#23(::Array{Any,1}, ::Function, ::PyCall.PyObject, ::Array{Float64,2}, ::Vararg{Array{Float64,2},N}) at C:\PortableSoftware\Scoop\apps\julia\pkgs-0.5.0\v0.5\ScikitLearn\src\Skcore.jl:95
 in predict(::PyCall.PyObject, ::Array{Float64,2}) at C:\PortableSoftware\Scoop\apps\julia\pkgs-0.5.0\v0.5\ScikitLearn\src\Skcore.jl:95

PyError with OneHotEncoder (Julia 0.6.0 on Windows10)

I'm getting a PyError with this code.

using DataFrames
using ScikitLearn
@sk_import preprocessing: OneHotEncoder

df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])

mapper = DataFrameMapper([([:B], OneHotEncoder())]);

fit_transform!(mapper, df)

ERROR: PyError (ccall(@pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, arg, C_NULL)) <type 'exceptions.ValueError'>
ValueError('could not convert string to float: M',)
  File "C:\Users\...\.julia\v0.6\Conda\deps\usr\lib\site-packages\sklearn\preprocessing\data.py", line 1844, in fit
    self.fit_transform(X)
  File "C:\Users\...\.julia\v0.6\Conda\deps\usr\lib\site-packages\sklearn\preprocessing\data.py", line 1902, in fit_transform
    self.categorical_features, copy=True)
  File "C:\Users\...\.julia\v0.6\Conda\deps\usr\lib\site-packages\sklearn\preprocessing\data.py", line 1697, in _transform_selected
    X = check_array(X, accept_sparse='csc', copy=copy, dtype=FLOAT_DTYPES)
  File "C:\Users\...\.julia\v0.6\Conda\deps\usr\lib\site-packages\sklearn\utils\validation.py", line 382, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)

It seems specific to OneHotEncoder. For example, LabelBinarizer works fine like this:

mapper = DataFrameMapper([(:B, LabelBinarizer())]);

I'm on Windows 10 using Julia 0.6.0.
Package versions:

- Conda                         0.5.3
- DataArrays                    0.5.3
- DataFrames                    0.10.0
- PyCall                        1.14.0
- ScikitLearn                   0.3.0
- ScikitLearnBase               0.3.0

I let ScikitLearn.jl automatically handle the installation of Python dependencies. The installed versions are:

python                    2.7.13
numpy                     1.13.0
scikit-learn              0.18.2
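Some context that may explain the difference: in scikit-learn versions before 0.20 (the report uses 0.18.2), OneHotEncoder expected integer-coded categories, whereas LabelBinarizer accepts strings directly. For illustration only, a stdlib sketch of what one-hot encoding the B column should yield (onehot is a hypothetical helper; columns follow sorted category order):

```julia
# Hypothetical helper: one column per distinct category (sorted),
# with 1.0 where the row has that category.
function onehot(values::AbstractVector{<:AbstractString})
    cats = sort(unique(values))
    M = zeros(length(values), length(cats))
    for (i, v) in enumerate(values)
        M[i, findfirst(==(v), cats)] = 1.0
    end
    return cats, M
end

cats, M = onehot(["M", "F", "F", "M"])
# cats == ["F", "M"]
# M == [0 1; 1 0; 1 0; 0 1]
```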

GridSearch with pipelines of dataframes

Hello again Cédric,

Following your help on transformer I am now trying to use a GridSearch to optimize the hyperparameters of a RandomForest.

I have a pipeline with lots of transformers which works great with cross-validation and actual prediction; however, I get a type error when trying to use it in a GridSearchCV. It seems like there is an extra argument of type ScikitLearn.Skcore.ParameterGrid in my setup:

pipe = Pipelines.Pipeline([ # This is working fine for cross validation, fitting and predicting
    ("extract_deck",PP_DeckTransformer()),
     ... # A list of 15 transformers
     ("featurize", mapper), # This is a DataFrameMapper to convert to Array
    ("forest", RandomForestClassifier(ntrees=200)) #Hyperparam: nsubfeatures, partialsampling, maxdepth
    ])

X_train = train
Y_train = convert(Array, train[:Survived])

# #Cross Validation - check model accuracy -- This is working fine
# crossval = round(cross_val_score(pipe, X_train, Y_train, cv =10), 2)
# print("\n",crossval,"\n")
# print(mean(crossval))

# GridSearch
grid = Dict(:ntrees => 10:30:240,
            :nsubfeatures => 0:1:13,
            :partialsampling => 0.2:0.1:1.0,
            :maxdepth => -1:2:13
)

gridsearch = GridSearchCV(pipe, grid)
fit!(gridsearch, X_train, Y_train)
println("Best hyper-parameters: $(gridsearch.best_params_)")

The error I get is :

ERROR: LoadError: MethodError: no method matching _fit!(::ScikitLearn.Skcore.GridSearchCV, ::DataFrames.DataFrame, ::Array{Int64,1}, ::ScikitLearn.Skcore.ParameterGrid)
Closest candidates are:
  _fit!(::ScikitLearn.Skcore.BaseSearchCV, !Matched::AbstractArray{T,N}, ::Any, ::Any) at /Users/<user>/.julia/v0.5/ScikitLearn/src/grid_search.jl:254
 in fit!(::ScikitLearn.Skcore.GridSearchCV, ::DataFrames.DataFrame, ::Array{Int64,1}) at /Users/<user>/.julia/v0.5/ScikitLearn/src/grid_search.jl:526
 in include_from_node1(::String) at ./loading.jl:488
 in include_from_node1(::String) at /usr/local/Cellar/julia/0.5.0/lib/julia/sys.dylib:?
 in process_options(::Base.JLOptions) at ./client.jl:262
 in _start() at ./client.jl:318
 in _start() at /usr/local/Cellar/julia/0.5.0/lib/julia/sys.dylib:?
while loading /Users/<path>/Kaggle-001-Julia-MagicalForest.jl, in expression starting on line 538

So the method receives _fit!(::ScikitLearn.Skcore.GridSearchCV, ::DataFrames.DataFrame, ::Array{Int64,1}, ::ScikitLearn.Skcore.ParameterGrid) but expects an array instead of a DataFrame. The thing is, the DataFrame should have been converted away by the DataFrameMapper.

If needed the full code is there https://github.com/mratsim/MachineLearning_Kaggle/blob/9c07a64a981a6512e021ae01623212a278fd05d1/Kaggle%20-%20001%20-%20Titanic%20Survivors/Kaggle-001-Julia-MagicalForest.jl#L530
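A quick sanity check on the size of this search: an exhaustive grid search fits one candidate per combination of values, once per cross-validation fold. A stdlib sketch that expands the grid above (the expansion code is illustrative, not ScikitLearn.jl's ParameterGrid):

```julia
grid = Dict(:ntrees => 10:30:240,            # 10, 40, ..., 220 -> 8 values
            :nsubfeatures => 0:1:13,         # 14 values
            :partialsampling => 0.2:0.1:1.0, # 9 values
            :maxdepth => -1:2:13)            # -1, 1, ..., 13 -> 8 values

# Every combination of one value per key, as a flat vector of Dicts.
ks = collect(keys(grid))
combos = vec([Dict(zip(ks, vals)) for vals in Iterators.product((grid[k] for k in ks)...)])

length(combos)   # 8 * 14 * 9 * 8 = 8064 candidates, each fitted once per fold
```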

Problem with predict_proba

Hello,

First, thanks for this very useful package!
I have a problem when trying to use the predict_proba function. I am running it on a LinearSVC, ScikitLearn.predict_proba(svm_model,X), but I get the following error:

ERROR: LoadError: KeyError: key "predict_proba" not found
in getindex(::PyCall.PyObject, ::String) at /home/theo/.julia/v0.5/PyCall/src/PyCall.jl:261
in #predict_proba#22(::Array{Any,1}, ::Function, ::PyCall.PyObject, ::Array{Float64,2}, ::Vararg{Array{Float64,2},N}) at /home/theo/.julia/v0.5/ScikitLearn/src/Skcore.jl:95

I am running on Linux, with every package updated.

Thanks for your help

Theo

Getting error when executing "Hyperparameter tuning" example in Quick Start Guide.

First of all, many thanks for this excellent package. I am wondering if you could help with the following problem. I am running Julia version 0.4.5 and, as far as I can tell, I have scikit-learn version 0.17.1 installed.

I have gone through the steps listed in the Quick Start Guide one by one.
All the steps work fine and reproduce the results shown in the guide successfully,
apart from this one step in section "Hyperparameter tuning".
Specifically when I execute the statement:

fit!(gridsearch, X, y)

I get the error:

ERROR: MethodError: _fit_and_score has no method matching _fit_and_score(::PyCall.PyObject, ::Array{Float64,2}, ::Array{ByteString,1}, ::Function, ::Array{Int64,1}, ::Array{Int64,1}, ::Int64, ::Dict{Any,Any}, ::Dict{Any,Any})

I cannot see how I can correct this error. Any ideas perhaps? Thanks.

Some notebooks have errors or depwarnings.

Some errors are related to other packages; others are not. I will keep track of progress here:

Error loading ScikitLearn

I get the following error when running "using ScikitLearn" after installing the package, using Julia 1.0.0:

julia> using ScikitLearn
[ Info: Precompiling ScikitLearn [3646fa90-6ef7-5e7e-9f22-8aca16db6324]
ERROR: LoadError: LoadError: syntax: extra token "ScikitLearnBase" after end of expression
Stacktrace:
 [1] include at ./boot.jl:317 [inlined]
 [2] include_relative(::Module, ::String) at ./loading.jl:1038
 [3] include at ./sysimg.jl:29 [inlined]
 [4] include(::String) at /Users/kartikeygupta/.julia/packages/ScikitLearn/ILHSi/src/ScikitLearn.jl:10
 [5] top-level scope at none:0
 [6] include at ./boot.jl:317 [inlined]
 [7] include_relative(::Module, ::String) at ./loading.jl:1038
 [8] include(::Module, ::String) at ./sysimg.jl:29
 [9] top-level scope at none:2
 [10] eval at ./boot.jl:319 [inlined]
 [11] eval(::Expr) at ./client.jl:389
 [12] top-level scope at ./none:3
in expression starting at /Users/kartikeygupta/.julia/packages/ScikitLearn/ILHSi/src/Skcore.jl:15
in expression starting at /Users/kartikeygupta/.julia/packages/ScikitLearn/ILHSi/src/ScikitLearn.jl:12
ERROR: Failed to precompile ScikitLearn [3646fa90-6ef7-5e7e-9f22-8aca16db6324] to /Users/kartikeygupta/.julia/compiled/v1.0/ScikitLearn/tbUuI.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] macro expansion at ./logging.jl:313 [inlined]
 [3] compilecache(::Base.PkgId, ::String) at ./loading.jl:1184
 [4] macro expansion at ./logging.jl:311 [inlined]
 [5] _require(::Base.PkgId) at ./loading.jl:941
 [6] require(::Base.PkgId) at ./loading.jl:852
 [7] macro expansion at ./logging.jl:311 [inlined]
 [8] require(::Module, ::Symbol) at ./loading.jl:834

Warn on old versions of sklearn-python

Hi,
while "@sk_import linear_model: LogisticRegression" works fine, I get an error when trying "@sk_import neural_network: MLPClassifier".

Is it possible to use neural networks with ScikitLearn.jl?

numpy version error

Here is the error.

julia> @sk_import linear_model: LogisticRegression
/Users/zhuj6/.julia/v0.6/Conda/deps/usr/lib/python2.7/site-packages/scipy/__init__.py:110: UserWarning: Numpy 1.8.2 or above is recommended for this version of scipy (detected version 1.7.0)
  UserWarning)
RuntimeError: module compiled against API version 0xb but this version of numpy is 0x7
INFO: Installing sklearn via the Conda scikit-learn package...
Fetching package metadata .........
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /Users/zhuj6/.julia/v0.6/Conda/deps/usr:
#
scikit-learn              0.18.2              np113py27_0  
ERROR: PyError (ccall(@pysym(:PyImport_ImportModule), PyPtr, (Cstring,), name)) <type 'exceptions.ImportError'>
ImportError('cannot import name __check_build',)
  File "/Users/zhuj6/.julia/v0.6/Conda/deps/usr/lib/python2.7/site-packages/sklearn/__init__.py", line 56, in <module>
    from . import __check_build

Stacktrace:
 [1] pyerr_check at /Users/zhuj6/.julia/v0.6/PyCall/src/exception.jl:56 [inlined]
 [2] pyerr_check at /Users/zhuj6/.julia/v0.6/PyCall/src/exception.jl:61 [inlined]
 [3] macro expansion at /Users/zhuj6/.julia/v0.6/PyCall/src/exception.jl:81 [inlined]
 [4] pyimport(::String) at /Users/zhuj6/.julia/v0.6/PyCall/src/PyCall.jl:370
 [5] pyimport_conda(::String, ::String, ::String) at /Users/zhuj6/.julia/v0.6/PyCall/src/PyCall.jl:530
 [6] import_sklearn() at /Users/zhuj6/.julia/v0.6/ScikitLearn/src/Skcore.jl:114

The installed version of NumPy is 1.13.0, yet SciPy somehow detects version 1.7.0.
I ran Conda.update() to update all packages.

README: For newbs... needs scikit-learn

I am just starting to learn machine learning in Julia, and I have no experience with Python. ScikitLearn.jl looks like a nice place to start, but I admit I had a hard time getting it to work. The installation instructions probably assume familiarity with Python, which I don't have.

To get ScikitLearn to work, I had to run Conda.add("scikit-learn"). It's probably obvious, but it might be good to add this to the instructions for newbs like me.

Better yet, is there some way the build script can just add the python library to avoid that step?

mysterious error message

This is a fragment of the very long error message:
expected return statement, got ($(QuoteNode(PyCall.pyimport)))("sklearn.tree")

This is the seemingly trivial code:


using ScikitLearn   # provides @sk_import and fit!

@sk_import ensemble: AdaBoostClassifier
@sk_import tree: DecisionTreeClassifier

# Use the last column as labels and fit a boosted ensemble of depth-6 trees.
function boost_tree(traindata)
    trainx = traindata[:, 1:end-1]
    trainy = traindata[:, end]
    bdt = AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=6), n_estimators=5)
    ScikitLearn.fit!(bdt, trainx, trainy)
    return bdt
end

I have no clue. The documentation is really hard to apply if you try anything slightly different from the examples. Translating from the Python docs to the Julia wrapper is definitely non-obvious.

Compatibility with Julia 0.7

Hello,

Would it be possible to update the code to make it compatible with Julia 0.7?
When I try to compile it, I get:

ERROR: LoadError: LoadError: LoadError: LoadError: type QuoteNode has no field args
Stacktrace:
 [1] getproperty(::Any, ::Symbol) at ./sysimg.jl:18
 [2] @delegate(::LineNumberNode, ::Module, ::Any, ::Any) at /home/theo/.julia/packages/ScikitLearn/ILHSi/src/sk_utils.jl:73
 [3] include at ./boot.jl:317 [inlined]
 [4] include_relative(::Module, ::String) at ./loading.jl:1038
 [5] include at ./sysimg.jl:29 [inlined]
 [6] include(::String) at /home/theo/.julia/packages/ScikitLearn/ILHSi/src/Skcore.jl:5
 [7] top-level scope at none:0
 [8] include at ./boot.jl:317 [inlined]
 [9] include_relative(::Module, ::String) at ./loading.jl:1038
 [10] include at ./sysimg.jl:29 [inlined]
 [11] include(::String) at /home/theo/.julia/packages/ScikitLearn/ILHSi/src/ScikitLearn.jl:10
 [12] top-level scope at none:0
 [13] include at ./boot.jl:317 [inlined]
 [14] include_relative(::Module, ::String) at ./loading.jl:1038
 [15] include(::Module, ::String) at ./sysimg.jl:29
 [16] top-level scope at none:2
 [17] eval at ./boot.jl:319 [inlined]
 [18] eval(::Expr) at ./client.jl:399
 [19] top-level scope at ./none:3
in expression starting at /home/theo/.julia/packages/ScikitLearn/ILHSi/src/sk_utils.jl:236
in expression starting at /home/theo/.julia/packages/ScikitLearn/ILHSi/src/sk_utils.jl:236
in expression starting at /home/theo/.julia/packages/ScikitLearn/ILHSi/src/Skcore.jl:13
in expression starting at /home/theo/.julia/packages/ScikitLearn/ILHSi/src/ScikitLearn.jl:12
ERROR: Failed to precompile ScikitLearn [3646fa90-6ef7-5e7e-9f22-8aca16db6324] to /home/theo/.julia/compiled/v0.7/ScikitLearn/tbUuI.ji.

getindex(o::PyObject, s::Symbol) is deprecated

I got this warning.

┌ Warning: `getindex(o::PyObject, s::Symbol)` is deprecated in favor of dot overloading (`getproperty`) so elements should now be accessed as e.g. `o.s` instead of `o[:s]`.
│   caller = import_sklearn() at Skcore.jl:120

How can I resolve it?
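The warning points at ScikitLearn's own import_sklearn() (Skcore.jl:120), and it refers to Julia's property-access ("dot") overloading: instead of indexing with a Symbol, o[:s], attributes are reached as o.s, which Julia routes to Base.getproperty. The sketch below illustrates only that mechanism. It uses a hypothetical AttrBag type as a stand-in for a PyObject and is not PyCall's actual implementation.

```julia
# Hypothetical stand-in for an object exposing named attributes (like a PyObject).
struct AttrBag
    attrs::Dict{Symbol,Any}
end

# Old style: o[:s]
Base.getindex(b::AttrBag, s::Symbol) = getfield(b, :attrs)[s]

# New style: o.s, implemented by overloading getproperty.
# getfield reaches the real field without recursing back into getproperty.
Base.getproperty(b::AttrBag, s::Symbol) = getfield(b, :attrs)[s]

bag = AttrBag(Dict{Symbol,Any}(:coef_ => [0.5, -1.0], :n_iter_ => 10))
println(bag[:n_iter_])  # old style
println(bag.n_iter_)    # new style, same value
```

Once getproperty is overloaded this way, the struct's own field must be read with getfield, as the sketch does. That is why wrapper types usually keep their internal fields out of the way of user-visible attribute names.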

Using fit! with strings

I'm currently trying to use the "fit!" command to build a classification model. However, the features in my training set contain strings, which the Python model does not seem to accept. I've found that the Python classes "LabelEncoder" and "OneHotEncoder" can transform a fixed set of strings into numerical values. How can they be used from Julia? I've found that "@sk_import preprocessing: LabelEncoder" works, but I'm not sure where to go from here. Thanks!
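LabelEncoder maps each distinct string to an integer code, and one-hot encoding turns each category into its own 0/1 indicator column. Either way, the result is a purely numeric matrix that fit! can consume. Below is a minimal plain-Julia sketch of both ideas. label_encode and one_hot are hypothetical helper names, not ScikitLearn.jl functions. The categories are sorted, as LabelEncoder does.

```julia
# Map each distinct string to a 0-based integer code (sorted categories).
function label_encode(col::AbstractVector{<:AbstractString})
    classes = sort(unique(col))
    index = Dict(c => i - 1 for (i, c) in enumerate(classes))
    return [index[v] for v in col], classes
end

# Expand a string column into 0/1 indicator columns, one per category.
function one_hot(col::AbstractVector{<:AbstractString})
    classes = sort(unique(col))
    X = zeros(Float64, length(col), length(classes))
    for (row, v) in enumerate(col)
        X[row, findfirst(==(v), classes)] = 1.0
    end
    return X, classes
end

colors = ["red", "green", "red", "blue"]
codes, classes = label_encode(colors)   # codes == [2, 1, 2, 0], classes == ["blue", "green", "red"]
onehot, _ = one_hot(colors)             # 4×3 matrix with exactly one 1.0 per row
```

For unordered categories, the one-hot form is usually the safer input to a model, because integer codes impose an artificial ordering on the categories.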

Support for Saving Models

It would be very nice if we could save and load models.

I tried both of the typical ways to do this with sklearn:

@pyimport pickle
open(model_save_path, "w") do io
        pickle.dump(model, io)
end

and

@sk_import externals: joblib
joblib.dump(model, model_save_path)

Both give me an error caused by the Julia-wrapped (jlwrap) Python objects:

ERROR: LoadError: PyError (:PyObject_Call) <type 'exceptions.TypeError'>
TypeError("can't pickle jlwrap objects",)

Is there a way around this?
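This thread does not confirm a workaround for PyObject-backed models, since pickle cannot handle jlwrap objects. For estimators implemented purely in Julia, one possible option is Julia's built-in Serialization standard library. The sketch below round-trips a hypothetical ToyRidge struct, standing in for a fitted model, through an in-memory buffer. Note that Serialization's format is not guaranteed to be readable across Julia versions.

```julia
using Serialization

# Hypothetical stand-in for a fitted pure-Julia model.
struct ToyRidge
    alpha::Float64
    coef::Vector{Float64}
end

model = ToyRidge(0.1, [1.5, -2.0, 0.25])

# Serialize into an in-memory buffer; for a file, pass a stream opened with
# open(path, "w") instead, and read it back from open(path).
io = IOBuffer()
serialize(io, model)

seekstart(io)               # rewind before reading back
restored = deserialize(io)

println(restored.coef)
```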

Precompile error

I got the following error when using ScikitLearn.

julia> using ScikitLearn
INFO: Recompiling stale cache file /Users/zhuj6/.julia/lib/v0.6/ScikitLearn.ji for module ScikitLearn.
WARNING: The call to compilecache failed to create a usable precompiled cache file for module ScikitLearn. Got:
WARNING: Module Iterators uuid did not match cache file.
ERROR: LoadError: Declaring __precompile__(true) is only allowed in module files being imported.
Stacktrace:
 [1] __precompile__(::Bool) at ./loading.jl:335
 [2] __precompile__() at ./loading.jl:331
 [3] include_from_node1(::String) at ./loading.jl:569
 [4] eval(::Module, ::Any) at ./boot.jl:235
 [5] _require(::Symbol) at ./loading.jl:483
 [6] require(::Symbol) at ./loading.jl:398
while loading /Users/zhuj6/.julia/v0.6/ScikitLearn/src/ScikitLearn.jl, in expression starting on line 8

I am using Julia 0.6.0

julia> versioninfo()
Julia Version 0.6.0
Commit 903644385b (2017-06-19 13:05 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

Can I get attributes from the model?

If I use the Python SVC model, can I get attributes such as coef_ and support_vectors_ from the fitted model?

ScikitLearn dependency prevents precompilation of LowRankModels

When I add the __precompile__() magic to the LowRankModels module and import it, I see the following error. Is there a way to accomplish ScikitLearn's goal here without using eval?

WARNING: eval from module ScikitLearnBase to LowRankModels:    
Expr(:block, Expr(:line, 77, :/Users/madeleine/.julia/v0.5/ScikitLearnBase/src/ScikitLearnBase.jl)::Any, Expr(:call, Expr(:., :ScikitLearnBase, :get_params)::Any, Expr(:parameters, Expr(:kw, :deep, true)::Any)::Any, Expr(:::, :estimator, LowRankModels.SkGLRM)::Any)::Any = Expr(:block, Expr(:line, 77, :/Users/madeleine/.julia/v0.5/ScikitLearnBase/src/ScikitLearnBase.jl)::Any, Expr(:call, :simple_get_params, :estimator, Array{Symbol, 1}[
  :fit_params,
  :init,
  :rx,
  :ry,
  :rx_scale,
  :ry_scale,
  :loss,
  :abs_tol,
  :rel_tol,
  :max_iter,
  :inner_iter,
  :k,
  :verbose])::Any)::Any, Expr(:line, 79, :/Users/madeleine/.julia/v0.5/ScikitLearnBase/src/ScikitLearnBase.jl)::Any, Expr(:call, Expr(:., :ScikitLearnBase, :set_params!)::Any, Expr(:parameters, Expr(:..., :new_params)::Any)::Any, Expr(:::, :estimator, LowRankModels.SkGLRM)::Any)::Any = Expr(:block, Expr(:line, 79, :/Users/madeleine/.julia/v0.5/ScikitLearnBase/src/ScikitLearnBase.jl)::Any, Expr(:call, :simple_set_params!, Expr(:parameters, Expr(:kw, :param_names, Array{Symbol, 1}[
  :fit_params,
  :init,
  :rx,
  :ry,
  :rx_scale,
  :ry_scale,
  :loss,
  :abs_tol,
  :rel_tol,
  :max_iter,
  :inner_iter,
  :k,
  :verbose])::Any)::Any, :estimator, :new_params)::Any)::Any, Expr(:line, 82, :/Users/madeleine/.julia/v0.5/ScikitLearnBase/src/ScikitLearnBase.jl)::Any, Expr(:call, Expr(:., :ScikitLearnBase, :clone)::Any, Expr(:::, :estimator, LowRankModels.SkGLRM)::Any)::Any = Expr(:block, Expr(:line, 82, :/Users/madeleine/.julia/v0.5/ScikitLearnBase/src/ScikitLearnBase.jl)::Any, Expr(:call, :simple_clone, :estimator)::Any)::Any)::Any
  ** incremental compilation may be broken for this module **
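One common way to avoid eval is to have a macro return the method definitions. The code is then expanded inside the calling module when that module is compiled, rather than evaluated into it from another module, which is what the warning complains about. The sketch below is only an illustration of that technique. MiniBase, get_params, @declare_params, MiniModels and SkToy are hypothetical names, not ScikitLearnBase's actual API.

```julia
module MiniBase
export get_params, @declare_params

# Generic function that estimator types extend.
function get_params end

# Returns a get_params method for type T; it is expanded in the caller's module,
# so no eval into another module is needed.
macro declare_params(T, names)
    esc(quote
        function $(MiniBase).get_params(est::$T)
            return Dict{Symbol,Any}(n => getfield(est, n) for n in $names)
        end
    end)
end

end # module MiniBase

module MiniModels
using ..MiniBase

struct SkToy
    k::Int
    max_iter::Int
end

@declare_params SkToy [:k, :max_iter]
end # module MiniModels

p = MiniBase.get_params(MiniModels.SkToy(3, 100))
```

Interpolating the module object itself ($(MiniBase)) into the method name makes the generated method extend MiniBase.get_params, whichever module the macro is called from.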

Get rid of Skcore

Right now, ScikitLearn.jl has this peculiar structure, where ScikitLearn.Skcore contains all of the code, and the functions are reexported in the various submodules: ScikitLearn.CrossValidation, ScikitLearn.Pipelines, etc. It's messy. Unfortunately, Julia doesn't seem to support submodules referencing each other:

module A
using ..B: g     # A needs g from B...
f(x) = 10
end

module B
using ..A: f     # ...while B needs f from A
g(x) = f(x)+2
end

> ERROR

whereas Python supports cyclic imports, which makes life a lot easier for the Python code. The options are:

1. Move the Skcore functions into their submodules. This requires eliminating all cycles, and I'm not even sure that's possible. It's non-trivial work.
2. Get rid of the submodules and put everything in ScikitLearn.*
3. Keep the status quo.

Option 2 would be a breaking change (no more submodules, or at least fewer submodules). It's also one more step away from the Python scikit-learn.
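For reference, the status quo (keeping Skcore) reduces to the pattern below. One implementation module owns every function, so functions can call each other without any cross-module cycle. Each public submodule only re-exports names from that implementation module and never imports from its siblings. Outer, Impl, CrossValidation and Pipelines are hypothetical stand-ins for ScikitLearn, Skcore and the real submodules.

```julia
module Outer

# All implementation lives in one module, so f and g can reference each other freely.
module Impl
f(x) = 10
g(x) = f(x) + 2
end

# Public submodules only re-export names from Impl; they never import each other,
# so there is no import cycle.
module CrossValidation
using ..Impl: f
export f
end

module Pipelines
using ..Impl: g
export g
end

end # module Outer

using .Outer.Pipelines
println(g(1))
```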

Can't use ScikitLearn

I can't for the life of me get "using ScikitLearn" to work in Julia. This occurs after running Pkg.add("ScikitLearn") successfully. The error I get is:

ERROR: LoadError: LoadError: ArgumentError: Iterators not found in path
in compilecache at loading.jl:393
in require at /Applications/Julia-0.4.6.app/Contents/Resources/julia/lib/julia/sys.dylib
in include at /Applications/Julia-0.4.6.app/Contents/Resources/julia/lib/julia/sys.dylib
in include_from_node1 at /Applications/Julia-0.4.6.app/Contents/Resources/julia/lib/julia/sys.dylib
in include at /Applications/Julia-0.4.6.app/Contents/Resources/julia/lib/julia/sys.dylib
in include_from_node1 at /Applications/Julia-0.4.6.app/Contents/Resources/julia/lib/julia/sys.dylib
[inlined code] from none:2
in anonymous at no file:0
in process_options at /Applications/Julia-0.4.6.app/Contents/Resources/julia/lib/julia/sys.dylib
in _start at /Applications/Julia-0.4.6.app/Contents/Resources/julia/lib/julia/sys.dylib
while loading /Users/kyledegrave/.julia/v0.4/ScikitLearn/src/preprocessing.jl, in expression starting on line 3
while loading /Users/kyledegrave/.julia/v0.4/ScikitLearn/src/ScikitLearn.jl, in expression starting on line 73
ERROR: Failed to precompile ScikitLearn to /Users/kyledegrave/.julia/lib/v0.4/ScikitLearn.ji
in error at /Applications/Julia-0.4.6.app/Contents/Resources/julia/lib/julia/sys.dylib
in compilecache at loading.jl:400
in require at /Applications/Julia-0.4.6.app/Contents/Resources/julia/lib/julia/sys.dylib

I've never had a problem with any other package, and I can't figure out why. Any help is definitely appreciated!

Linear Regression does not fit intercept

Is there any plan to provide the option to fit the intercept in the linear regression model?
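An intercept can be fitted with plain least squares by appending a column of ones to the design matrix; the coefficient of that column is the intercept. In the Python scikit-learn, this is what LinearRegression's fit_intercept=True option controls. The sketch below uses only Julia's backslash least-squares solve on synthetic, noise-free data. fit_with_intercept is a hypothetical helper, not a ScikitLearn.jl function.

```julia
using LinearAlgebra

# Ordinary least squares with an intercept: augment X with a column of ones.
function fit_with_intercept(X::AbstractMatrix, y::AbstractVector)
    A = hcat(ones(size(X, 1)), X)   # first column models the intercept
    β = A \ y                       # least-squares solution (QR for non-square A)
    return β[1], β[2:end]           # (intercept, slopes)
end

# Synthetic, noise-free data: y = 3 + 2*x1 - x2
X = [1.0 0.0; 2.0 1.0; 3.0 5.0; 4.0 2.0; 0.5 3.0]
y = 3 .+ 2 .* X[:, 1] .- X[:, 2]

intercept, slopes = fit_with_intercept(X, y)
```

Because the data are exact, the recovered values match 3 and [2, -1] up to floating-point rounding.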