pola-rs / valves
general functions for your data .pipe()-lines.
Just for the heck of it. Can we build a text-based NaiveBayes classifier in dataframes? That means splitting on the whitespace and generating tokens.
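As a rough illustration of what this could look like, here is a minimal sketch in pandas. The toy data, the column names `text` and `label`, and the helper names `token_counts`, `fit` and `predict` are all assumptions made for the example, not part of this repo. It splits on whitespace, counts tokens per label, and applies add-alpha smoothing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names "text" and "label" are assumptions.
df = pd.DataFrame({
    "text": ["good great film", "great acting", "bad boring film", "boring plot"],
    "label": ["pos", "pos", "neg", "neg"],
})

def token_counts(frame):
    """Split on whitespace and count how often each token occurs per label."""
    tokens = frame.assign(token=frame["text"].str.split()).explode("token")
    return tokens.groupby(["label", "token"]).size().rename("n").reset_index()

def fit(frame, alpha=1.0):
    """Return log p(token | label) with add-alpha smoothing, and log p(label)."""
    counts = (
        token_counts(frame)
        .pivot(index="token", columns="label", values="n")
        .fillna(0)
    )
    smoothed = counts + alpha
    log_likelihood = np.log(smoothed / smoothed.sum(axis=0))
    log_prior = np.log(frame["label"].value_counts(normalize=True))
    return log_likelihood, log_prior

def predict(text, log_likelihood, log_prior):
    """Pick the label with the highest score; unseen tokens are ignored."""
    known = [t for t in text.split() if t in log_likelihood.index]
    scores = log_likelihood.loc[known].sum(axis=0) + log_prior
    return scores.idxmax()

log_likelihood, log_prior = fit(df)
print(predict("great film", log_likelihood, log_prior))
```

The whole thing is an `explode`, a `groupby` and a pivot, so it should translate to other dataframe libraries without much trouble.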
Given a log of weighted user-item interactions, can we generate an item-item recommendation table and a user-item recommendation table?
Kind of! We can calculate p(item_a | item_b) and p(item_a), which can be reweighted into a table with recommendations. We can also do something similar for users. After all, a user that interacted with items a, b and c will have a score for item x defined via:
p(item_x | user) = p(item_x | item_a, item_b, item_c)
\propto p(item_x | item_a) p(item_x | item_b) p(item_x | item_c)
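One way this could be sketched in pandas is shown below. The column names (`user`, `item`, `weight`), the helper names `item_item` and `user_item`, and the toy log are assumptions for illustration. The estimate of p(item_a | item_b) here is one possible choice: weighted co-occurrence within a user, normalised so that each item_b column sums to 1. The user score is the sum of log-probabilities over the items the user has seen, with a small `eps` added to avoid log(0).

```python
import numpy as np
import pandas as pd

# Hypothetical interaction log; column names are assumptions.
log = pd.DataFrame({
    "user": ["u1", "u1", "u2", "u2", "u2", "u3", "u3"],
    "item": ["a", "b", "a", "b", "c", "b", "c"],
    "weight": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
})

def item_item(log):
    """Estimate p(item_a | item_b) from weighted co-occurrence within users."""
    pairs = log.merge(log, on="user", suffixes=("_a", "_b"))
    pairs = pairs[pairs["item_a"] != pairs["item_b"]]
    pairs = pairs.assign(w=pairs["weight_a"] * pairs["weight_b"])
    table = pairs.pivot_table(
        index="item_a", columns="item_b", values="w", aggfunc="sum", fill_value=0.0
    )
    return table / table.sum(axis=0)  # every item_b column sums to 1

def user_item(log, p, eps=1e-6):
    """Score item x per user as the sum of log p(x | item) over the user's items."""
    log_p = np.log(p + eps)
    # Binary users x items matrix of what each user has interacted with.
    seen = pd.crosstab(log["user"], log["item"]).clip(upper=1)
    seen = seen.reindex(columns=p.columns, fill_value=0)
    scores = seen.dot(log_p.T)
    # Blank out items the user already has, so only new items get a score.
    already = seen.reindex(columns=scores.columns, fill_value=0).astype(bool)
    return scores.mask(already.to_numpy())

p = item_item(log)
print(p)
print(user_item(log, p))
```

Note that the user scores in this sketch ignore the interaction weights; a weighted variant would multiply each log-probability by the user's weight for that item.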
As we compare different tools here, it would be cool to run benchmarks from this repo.
Maybe in CI, and later maybe even a dedicated runner.
These could then be shown on the website. I am already assuming here that polars does great.
Things like weighted mean/sum/std might be good to support.
Someone asked if we could support this: https://pandas.pydata.org/pandas-docs/stable/user_guide/window.html#window-exponentially-weighted
I haven't looked at it much, but it seems like this should be possible with some cumulative expression kung fu.
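To make the idea concrete, here is a minimal pandas sketch. The function names `weighted_mean` and `ewm_mean` are hypothetical and not an existing API in this repo. The first computes a grouped weighted mean from two plain sums. The second expresses the "adjusted" exponentially weighted mean, where the k-th most recent value gets weight (1 - alpha)^k, as a ratio of two cumulative sums.

```python
import numpy as np
import pandas as pd

def weighted_mean(df, value, weight, by):
    """Weighted mean of `value` per group in `by`, built from two sums."""
    tmp = df.assign(_wx=df[value] * df[weight])
    sums = tmp.groupby(by)[["_wx", weight]].sum()
    return (sums["_wx"] / sums[weight]).rename(f"{value}_wmean")

def ewm_mean(x, alpha):
    """Exponentially weighted mean via cumulative sums.

    sum_j (1 - alpha)**(t - j) * x_j  equals  (1 - alpha)**t * cumsum(x * w),
    with w_j = (1 - alpha)**(-j); the leading factor cancels in the ratio.
    """
    t = np.arange(len(x))
    w = (1.0 - alpha) ** (-t)  # grows with t, so very long series can overflow
    return (x * w).cumsum() / w.cumsum()

df = pd.DataFrame({"g": ["a", "a", "b"], "x": [1.0, 3.0, 10.0], "w": [1.0, 3.0, 2.0]})
print(weighted_mean(df, "x", "w", by="g"))

series = pd.Series([1.0, 2.0, 3.0])
print(ewm_mean(series, alpha=0.5))
```

The cumulative-sum form is only a sketch: the weights grow geometrically, so a production version would need a numerically safer formulation for long series.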
Looking at pypi, it seems valve is taken.
@ritchie46 Maybe valves instead?
Also, would we want to host this package on pypi?
Currently, the subtitle of the project lists "data pipelines built with polars".
Considering that we don't just support polars here, might it be better to call it "general functions for your data .pipe()-lines."?
@ritchie46 Do we have any preference for documentation? I could go for something like mkdocs, but I figured I'd check in first because it wouldn't fit the current documentation style.
If you're going to take a random sample of your data, you don't always want to uniformly sample the rows. Instead, you may want to uniformly sample users and keep all their rows. That way, you'll have all interactions/sessions of each sampled user in your subset.
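A minimal pandas sketch of this idea follows. The function name `sample_users`, its parameters, and the toy data are assumptions for illustration. It samples from the distinct users first and then keeps every row belonging to a sampled user.

```python
import pandas as pd

def sample_users(df, user_col, frac, seed=None):
    """Uniformly sample users, then keep all rows of the sampled users."""
    users = df[user_col].drop_duplicates().sample(frac=frac, random_state=seed)
    return df[df[user_col].isin(users)]

# Hypothetical data: 10 users with 3 rows each.
df = pd.DataFrame({
    "user": [f"u{i}" for i in range(10) for _ in range(3)],
    "event": list(range(30)),
})

subset = sample_users(df, "user", frac=0.5, seed=42)
print(subset["user"].nunique(), len(subset))
```

Because the function takes and returns a dataframe, it can be used as `df.pipe(sample_users, "user", frac=0.5)`.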