javascriptdata / scikit.js
JavaScript package for predictive data analysis and machine learning
License: MIT License
Implement the model_selection train_test_split from sklearn.
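As a rough illustration of what such a split involves, here is a minimal sketch on plain arrays. The name trainTestSplitSketch, the injectable rng parameter and the array-based inputs are assumptions made for this example, not the scikit.js API. The test size is rounded up and the rows are shuffled with Fisher-Yates before being cut.

```javascript
// Hypothetical helper, not the scikit.js implementation.
// X: array of rows, y: array of targets, testSize: fraction in (0, 1),
// rng: function returning a float in [0, 1) (defaults to Math.random).
function trainTestSplitSketch(X, y, testSize = 0.25, rng = Math.random) {
  if (X.length !== y.length) {
    throw new Error('X and y must have the same number of rows');
  }
  // Shuffle the row indices (Fisher-Yates) so X and y stay aligned.
  const idx = X.map((_, i) => i);
  for (let i = idx.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [idx[i], idx[j]] = [idx[j], idx[i]];
  }
  // The first nTest shuffled indices form the test set.
  const nTest = Math.ceil(idx.length * testSize);
  const testIdx = idx.slice(0, nTest);
  const trainIdx = idx.slice(nTest);
  return [
    trainIdx.map((i) => X[i]),
    testIdx.map((i) => X[i]),
    trainIdx.map((i) => y[i]),
    testIdx.map((i) => y[i]),
  ];
}
```

The return order (xTrain, xTest, yTrain, yTest) follows the sklearn convention.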
Build a DecisionTreeClassifier which matches the scikit-learn API.
To start we don't have to mimic the entire API, just start somewhere, and we can help add features to the class.
The contribution guide link returns a 404 error.
Hello!
Thanks for the excellent library. I noticed some SVC files but they are commented out. Is there any reason why SVC is not implemented?
Thank You!
Hi, first of all, thanks for developing this library :)
I was writing code that uses the DecisionTreeClassifier. When I checked my results, I realised that they are not constant between runs.
I set a random number generator with Math.seedrandom(my_seed) to get the same results on every execution, but the results still differ.
This method worked for me with LogisticRegression, where the weights are initialized by TensorFlow.js (if I am not wrong, the user can "fix" this initialization by calling Math.seedrandom() before building the neural network). I know DecisionTreeClassifier is not a neural network, so its random number generation is not the same as in LogisticRegression.
In summary: how can I seed the random number generator to get the same results from a DecisionTreeClassifier?
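The thread does not say how DecisionTreeClassifier draws its random numbers, so the following is only a general pattern and not a confirmed fix. Code that takes its randomness from a local seeded generator (here mulberry32, a small public-domain PRNG) gives the same sequence on every run, independently of Math.random. The helper shuffledIndices is hypothetical and only stands in for "some randomized step".

```javascript
// mulberry32: a small seeded PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let state = seed >>> 0;
  return function () {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical randomized step: returns 0..n-1 in an order
// that depends only on the generator passed in.
function shuffledIndices(n, rng) {
  const idx = Array.from({ length: n }, (_, i) => i);
  for (let i = n - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [idx[i], idx[j]] = [idx[j], idx[i]];
  }
  return idx;
}

// Usage: the same seed reproduces the same order.
// shuffledIndices(10, mulberry32(7));
```

Whether an estimator can be made reproducible this way depends on it accepting a seed or generator, which is an assumption here.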
Hello everyone,
I'm working on a project that needs stratified sampling of the dataset so it can have a more balanced test set.
More on the subject: https://en.wikipedia.org/wiki/Stratified_sampling
I implemented a solution using Danfo.js for that purpose and, if you think it is a good idea, I can open a PR with that as a splitting tool.
Its counterpart in scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedShuffleSplit.html
If you think this would make sense for the project, just let me know. :)
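To make the idea concrete, here is a minimal sketch of a stratified split on plain label arrays. It is not the Danfo.js solution mentioned above (which is not shown in the thread), and it is simpler than StratifiedShuffleSplit: it shuffles the rows of each class separately and takes the same fraction from each, rounding per class. The name stratifiedSplitSketch and the rng parameter are assumptions for this example.

```javascript
// Hypothetical helper: returns row indices for a stratified train/test split.
// labels: array of class labels, testSize: fraction in (0, 1),
// rng: function returning a float in [0, 1).
function stratifiedSplitSketch(labels, testSize = 0.25, rng = Math.random) {
  // Group row indices by class label.
  const byClass = new Map();
  labels.forEach((label, i) => {
    if (!byClass.has(label)) byClass.set(label, []);
    byClass.get(label).push(i);
  });

  const trainIdx = [];
  const testIdx = [];
  for (const indices of byClass.values()) {
    // Shuffle within the class (Fisher-Yates).
    for (let i = indices.length - 1; i > 0; i--) {
      const j = Math.floor(rng() * (i + 1));
      [indices[i], indices[j]] = [indices[j], indices[i]];
    }
    // Take the same fraction of every class for the test set.
    const nTest = Math.round(indices.length * testSize);
    testIdx.push(...indices.slice(0, nTest));
    trainIdx.push(...indices.slice(nTest));
  }
  return { trainIdx, testIdx };
}
```

Because rounding happens per class, very small classes can end up with no test rows; a fuller implementation would need to handle that case.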
Update the build process to allow the TensorFlow versions used by danfo and scikit to be out of sync.
Hey Folks!
I thought it might be a bit easier if we had one issue that had the current "state of the world".
It would have a list of all completed Estimators/Functions and next to each it would have a person's name if someone was working on it or it'd be checked if it was complete and merged in dev.
Ping me in the comments beneath and I'll add you to whichever estimators you want to work on.
I went through the scikit-learn docs yesterday and broke out the Estimators that we would need for an MVP of scikit.js (let's call it version 1).
The focus here is on simple models, plus all the preprocessing and metrics that you'd need to build high-quality models.
linear_model
cluster
neighbors
dummy
impute
preprocessing
pipeline
compose
tree
metrics
So pick whichever ya want, and ping me, and I'll update the issue and put your name next to the Estimator / Function.
Hello folks! Time flies when you're having fun :)
We are rounding the corner on the completion of the MVP / Version 1 list above. I thought it would be good to go through scikit-learn and make a list of the next most important things. That list is below, along with some general todos (docs, tutorials). Feel free to ping me or comment below and grab whatever interests you in the following list.
Onward and Upward!
linear_model
datasets
naive_bayes
svm
model_selection
decomposition
hyper_parameter
ensemble
docs
Is your feature request related to a problem? Please describe.
It would be amazing to have a fast LibSVM implementation for SVC, SVR estimators.
Describe the solution you'd like
Compile the LibSVM project to a wasm file, and use that with the familiar SVM sklearn api
In the model_selection category, implement KFold.
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html#sklearn.model_selection.KFold
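A minimal sketch of the unshuffled case, assuming a hypothetical kFoldSketch function that works on row counts rather than tensors. Folds are consecutive blocks of indices, and when the sample count does not divide evenly the first folds get one extra row, which matches the behaviour described in the sklearn documentation linked above.

```javascript
// Hypothetical helper: returns nSplits { trainIdx, testIdx } pairs.
function kFoldSketch(nSamples, nSplits = 5) {
  if (nSplits < 2 || nSplits > nSamples) {
    throw new Error('nSplits must be between 2 and nSamples');
  }
  const base = Math.floor(nSamples / nSplits);
  const remainder = nSamples % nSplits;
  const folds = [];
  let start = 0;
  for (let k = 0; k < nSplits; k++) {
    // The first `remainder` folds are one row larger.
    const size = base + (k < remainder ? 1 : 0);
    const trainIdx = [];
    const testIdx = [];
    for (let i = 0; i < nSamples; i++) {
      if (i >= start && i < start + size) testIdx.push(i);
      else trainIdx.push(i);
    }
    folds.push({ trainIdx, testIdx });
    start += size;
  }
  return folds;
}
```

Shuffling before splitting is left out of this sketch.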
Is your feature request related to a problem? Please describe.
Like the scikit-learn library we should have a RidgeRegression Estimator.
Describe the solution you'd like
Similar to the LinearRegression issue, I think the best plan of action is to create an SGD solution that delivers the right answer (even if it is slower) and then eventually switch to a faster linear algebra library once we understand that landscape a bit more.
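As a sketch of the gradient-based approach, the following fits ridge regression on plain arrays. It uses full-batch gradient descent rather than SGD or a TensorFlow.js model, and the name fitRidgeSketch and its options are assumptions for this example. It minimizes the squared error plus alpha times the squared norm of the coefficients, with the intercept left unpenalized; setting alpha to 0 reduces it to ordinary linear regression.

```javascript
// Hypothetical helper, not the scikit.js estimator.
// X: array of rows, y: array of targets.
function fitRidgeSketch(X, y, { alpha = 1.0, learningRate = 0.01, epochs = 5000 } = {}) {
  const n = X.length;
  const d = X[0].length;
  const coef = new Array(d).fill(0);
  let intercept = 0;

  for (let epoch = 0; epoch < epochs; epoch++) {
    const gradCoef = new Array(d).fill(0);
    let gradIntercept = 0;
    for (let i = 0; i < n; i++) {
      // Prediction error for row i under the current parameters.
      let pred = intercept;
      for (let j = 0; j < d; j++) pred += coef[j] * X[i][j];
      const err = pred - y[i];
      for (let j = 0; j < d; j++) gradCoef[j] += err * X[i][j];
      gradIntercept += err;
    }
    // Gradient of (1/2n) * (sum of squared errors + alpha * ||coef||^2).
    for (let j = 0; j < d; j++) {
      coef[j] -= (learningRate * (gradCoef[j] + alpha * coef[j])) / n;
    }
    intercept -= (learningRate * gradIntercept) / n;
  }
  return { coef, intercept };
}
```

The learning rate and epoch count need tuning per dataset; unscaled features can make this diverge, which is one reason a linear algebra solver is attractive later.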
It seems based on Node.js currently, but there's WASM around the corner:
Any input on it with respect to this project?
Hi!
Maybe I did something wrong but I think this is the simplest example to get a score out.
When I call model.score(xTest, yTest), I get the error "Labels can't be converted to a 1D Tensor".
const btcusdtData = await ensureCryptoData('BTCUSDT', '1d')
const df = new dnf.DataFrame(btcusdtData)
// Couldn't use a list of column names here unfortunately so I used their index numbers. Would be a lot cooler if I could use 'open', 'high', etc.
const x = df.iloc({ columns: [1, 2, 3, 5, 6, 7, 8, 9, 10, 11] })
const y = df.iloc({ columns: [4] })
const [xTrain, xTest, yTrain, yTest] = sk.trainTestSplit(x.tensor, y.tensor)
const model = new sk.LinearRegression()
await model.fit(xTrain, yTrain)
const score = model.score(xTest, yTest)
Implement BernoulliNB with the scikit-learn API.
Hi,
How do you save and then re-open a trained network?
It doesn't seem to be explained anywhere in the doc.
Thanks.
Is your feature request related to a problem? Please describe.
We should have a robust and well-tested set of metrics that mirror those on sklearn.
Describe the solution you'd like
Many of these will be thin wrappers around the ones that ship with tensorflow.js, but there will be some that we will just have to simply write using basic tensor math.
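For the metrics that have to be written by hand, the math is short. The sketch below uses plain arrays instead of tensors, and the function names are placeholders rather than the scikit.js API; it shows mean squared error, R², and accuracy.

```javascript
// Hypothetical plain-array versions of three common metrics.
function meanSquaredErrorSketch(yTrue, yPred) {
  let sum = 0;
  for (let i = 0; i < yTrue.length; i++) sum += (yTrue[i] - yPred[i]) ** 2;
  return sum / yTrue.length;
}

// R^2 = 1 - (residual sum of squares) / (total sum of squares).
function r2ScoreSketch(yTrue, yPred) {
  const mean = yTrue.reduce((acc, v) => acc + v, 0) / yTrue.length;
  let ssRes = 0;
  let ssTot = 0;
  for (let i = 0; i < yTrue.length; i++) {
    ssRes += (yTrue[i] - yPred[i]) ** 2;
    ssTot += (yTrue[i] - mean) ** 2;
  }
  return 1 - ssRes / ssTot;
}

// Fraction of predictions that exactly match the true label.
function accuracyScoreSketch(yTrue, yPred) {
  let correct = 0;
  for (let i = 0; i < yTrue.length; i++) {
    if (yTrue[i] === yPred[i]) correct++;
  }
  return correct / yTrue.length;
}
```

Input validation (equal lengths, constant targets in R²) is omitted here and would be needed in a tested implementation.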
Is your feature request related to a problem? Please describe.
In order to match the Estimators in scikit-learn, we should make a LinearRegression estimator with the same API.
Describe the solution you'd like
There are a couple of ways to do this. Scikit-learn leans on linear algebra libraries from scipy. I'm honestly not sure if JS (either Node / Browser) has adapters for the same low-level BLAS and LAPACK libraries that scipy uses.
Without access to those libraries, the plan should be to just create a gradient descent solution using a TF model.
Once we know of / can use low-level libraries for solving a linear system of equations, we should trade out the SGD solution above in favor of that.
Additional context
There might be cases where we actually want to have both implementations (linear algebra solver and SGD), because we could deploy different ones to different contexts, i.e. the linear algebra solver on Node and the SGD solver on the web. This would get rid of the need for shipping an entire solver to the client. But that's something that we would need to test later.
The goal of the better Github actions is to
Build a DecisionTreeRegressor which matches some of the API of the scikit-learn DecisionTreeRegressor.
No need to support the entire API to start. If you make the first pass, others can come in, and chip in other features as well.
Is your feature request related to a problem? Please describe.
How are models planned to be saved and loaded?
Describe the solution you'd like
I think we can have fromJson and toJson methods to save and load model params and weights. If this is needed, I can start working on that.
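A minimal sketch of what such a pair of methods could look like, using a made-up TinyLinearModel class so the example is self-contained. The class, its fields and the JSON layout are all assumptions for illustration; the real estimators hold tensors and would need to convert them to arrays first.

```javascript
// Hypothetical model class used only to illustrate toJson / fromJson.
class TinyLinearModel {
  constructor({ fitIntercept = true } = {}) {
    this.fitIntercept = fitIntercept; // hyperparameter
    this.coef = null;                 // learned weights
    this.intercept = 0;               // learned bias
  }

  predict(X) {
    return X.map((row) =>
      row.reduce((acc, x, j) => acc + x * this.coef[j], this.intercept)
    );
  }

  // Serialize hyperparameters and learned weights to a JSON string.
  toJson() {
    return JSON.stringify({
      name: 'TinyLinearModel',
      params: { fitIntercept: this.fitIntercept },
      weights: { coef: this.coef, intercept: this.intercept },
    });
  }

  // Rebuild a model from a string produced by toJson.
  static fromJson(json) {
    const data = JSON.parse(json);
    if (data.name !== 'TinyLinearModel') {
      throw new Error(`Unexpected model type: ${data.name}`);
    }
    const model = new TinyLinearModel(data.params);
    model.coef = data.weights.coef;
    model.intercept = data.weights.intercept;
    return model;
  }
}
```

Storing the class name alongside the weights lets a loader reject a file that belongs to a different estimator.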
Steps to reproduce the behavior:
Desktop:
Implement GaussianNB with the scikit-learn API.
from @dcrescim:
My basic premise (and I think we are aligned here) is that less is more.
If we can build a library with fewer repositories (1 is better than 2), or if we can support the same use cases with fewer npm packages (like supporting node/cpu/wasm with only 1 package), then that is better.
The only "gotcha" which will force my hand into "more repo / more packages" territory is if we can't keep the user experience clean.
So what does the dream scenario look like? Here's some example code:
import { LinearRegression } from 'scikitjs' // uses tfjs library, and whichever (webgl, cpu) backend is better
import { LinearRegression } from 'scikitjs/node' // uses tfjs-node library
import { LinearRegression, tf } from 'scikitjs' // uses tfjs library, with wasm backend
tf.setBackend('wasm')
import { LinearRegression } from 'scikitjs/node-gpu' // uses tfjs-node-gpu library
I don't care too much about that last case in the short term, but it's there just to make sure our code structure could eventually support it one day
Is this related to https://github.com/transitive-bullshit/scikit-learn-ts?
If not, then how do they differ?
Is your feature request related to a problem? Please describe.
Making a great crossValidate function that matches the sklearn api
When PRs are merged onto the main branch:
###Up for debate:
PREFER in HTML (d3.js, tensorflow.js and tfjs-vis expose the global variables d3, tf and tfvis respectively in the browser console):
<script src='src/d3.min.js'></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/[email protected]/dist/tf.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-vis"></script>
Then, in the browser console, when you type tfvis. , d3. or tf. a drop-down list of the available modules/functions appears right after the dot. This makes debugging and testing faster, using just one script tag line in the HTML!
===========================
scikit.js in HTML from your website:
This way sk is not recognizable in the browser console. Can you provide a link/file of scikit.js that exposes a global variable such as "sk", if you have one?
If scikit.js does not have this feature, will it provide one in the future [time frame please]?
Is there a way to work around this [create a global variable "sk" and gain access to all submodules/functions in the browser console for interactive testing]?
Thank you, I'm looking forward to hearing from you soon.
Add Iris, Boston, Wine, etc... to the repo. Add them to the docs site, and write functions to "go and get them".
update coveralls config to gate releases by test coverage
I run scikitjs (version: 1.24.0) in my project and package it with parcel-bundler (version: 1.12.5). When I start the project I get the following error:
If I import the bundle via a script tag (CDN), it works fine.
The error location: /node_modules/scikitjs/dist/es5/index.js (line: 20). If I comment out this line of code, it works fine.
Build a KNeighborsRegressor which matches the sklearn API.
Technically there are 3 different strategies: ‘ball_tree’, ‘kd_tree’ and ‘brute’. This issue is only to support the "brute" method, which checks the distance between the query point and every point in the training input.
The other two are optimizations which try to use trees (kd_tree) and spheres (ball_tree) to speed up this algorithm, but to start let's just do brute.
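The brute-force strategy can be sketched in a few lines on plain arrays. The function name knnRegressBrute is a placeholder, and uniform weighting with Euclidean distance is assumed; the prediction is the mean target of the k closest training rows.

```javascript
// Hypothetical brute-force k-nearest-neighbours regression for one query row.
function knnRegressBrute(XTrain, yTrain, query, k = 5) {
  // Distance from the query to every training row.
  const candidates = XTrain.map((row, i) => ({
    dist: Math.hypot(...row.map((v, j) => v - query[j])),
    target: yTrain[i],
  }));
  // Keep the k closest rows and average their targets.
  candidates.sort((a, b) => a.dist - b.dist);
  const nearest = candidates.slice(0, k);
  return nearest.reduce((acc, c) => acc + c.target, 0) / nearest.length;
}
```

Each prediction scans the whole training set, which is the cost the tree-based strategies try to avoid.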
When scikit-learn tests the effectiveness of their model training, they usually construct fake datasets where they know the underlying model (coefficients). They have two functions for this:
make_regression (https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html)
make_classification (https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html#sklearn.datasets.make_classification)
We should do the same. This issue tracks our implementation of makeRegression and makeClassification, which will be helpful for testing the speed and efficacy of our models.
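A minimal sketch of the regression case, loosely modelled on make_regression but much reduced. The name makeRegressionSketch, its options and the returned shape are assumptions for this example: features are standard normal draws (Box-Muller), the true coefficients are uniform in [0, 100), and optional Gaussian noise is added to the targets.

```javascript
// Hypothetical generator for a synthetic linear regression problem.
function makeRegressionSketch({ nSamples = 100, nFeatures = 2, noise = 0, rng = Math.random } = {}) {
  // Standard normal sample via the Box-Muller transform.
  const randn = () => {
    const u = 1 - rng(); // in (0, 1], avoids log(0)
    const v = rng();
    return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
  };
  // Ground-truth coefficients that a fitted model should recover.
  const coef = Array.from({ length: nFeatures }, () => 100 * rng());
  const X = Array.from({ length: nSamples }, () =>
    Array.from({ length: nFeatures }, randn)
  );
  // Targets are the linear combination plus optional Gaussian noise.
  const y = X.map(
    (row) => row.reduce((acc, x, j) => acc + x * coef[j], 0) + noise * randn()
  );
  return { X, y, coef };
}
```

Returning coef alongside X and y lets a test compare the fitted coefficients against the known ones.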
Build a KNeighborsClassifier which matches the scikit-learn API.
https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
Technically there are 3 different strategies: ‘ball_tree’, ‘kd_tree’ and ‘brute’. This issue is only to support the "brute" method, which checks the distance between the query point and every point in the training input.
The other two are optimizations which try to use trees (kd_tree) and spheres (ball_tree) to speed up this algorithm, but to start let's just do brute.
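For classification, the brute-force method ends in a majority vote rather than an average. The sketch below assumes plain arrays, uniform weights, and a hypothetical function name knnClassifyBrute; it ranks by squared Euclidean distance, which gives the same ordering as the true distance without the square root.

```javascript
// Hypothetical brute-force k-nearest-neighbours classification for one query row.
function knnClassifyBrute(XTrain, yTrain, query, k = 5) {
  // Squared distance from the query to every training row, closest first.
  const nearest = XTrain
    .map((row, i) => ({
      dist: row.reduce((acc, v, j) => acc + (v - query[j]) ** 2, 0),
      label: yTrain[i],
    }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, k);

  // Count votes per label among the k nearest rows.
  const votes = new Map();
  for (const { label } of nearest) {
    votes.set(label, (votes.get(label) || 0) + 1);
  }
  // Return the label with the most votes (first seen wins ties).
  let best = null;
  let bestCount = -1;
  for (const [label, count] of votes) {
    if (count > bestCount) {
      best = label;
      bestCount = count;
    }
  }
  return best;
}
```

Tie-breaking here favours the label of the closer neighbour, which is a choice made for this sketch and may differ from sklearn.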
Convert to Jest tests