Comments (2)
This might run on our tokio runtime. Then we could spawn a static task (one that runs for the duration of the polars process) that sleeps most of the time and garbage collects once in a while.
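A minimal sketch of such a sleep-then-collect loop. To keep it self-contained it uses a plain `std::thread` as a stand-in for a tokio task; the function name `spawn_gc_task`, the `stop` flag and the `collect` callback are all hypothetical and not part of polars.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Spawn a background worker that sleeps for `interval`, runs `collect`,
/// and repeats until `stop` is set. (Hypothetical helper, std-only stand-in
/// for a task spawned on an async runtime.)
pub fn spawn_gc_task<F>(
    interval: Duration,
    stop: Arc<AtomicBool>,
    mut collect: F,
) -> JoinHandle<()>
where
    F: FnMut() + Send + 'static,
{
    thread::spawn(move || {
        while !stop.load(Ordering::Relaxed) {
            // Most of the time the worker is asleep.
            thread::sleep(interval);
            // Once in a while it garbage collects.
            collect();
        }
    })
}
```

In a real process the `stop` flag would simply never be set, so the worker lives as long as the process does.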
from polars.
Alright, did a brainstorm. I think we have got some ideas.
Assuming our spill/cache directory `~/.polars/`.
We can dump spilled files under a folder named by a combination of process id and current datetime. This can hold future spill files.
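As a rough illustration, a per-process spill folder could be created like this. The helper name `create_spill_dir` is hypothetical, and the sketch assumes Unix seconds as the "datetime" part of the name, since the comment does not fix a format.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process;
use std::time::{SystemTime, UNIX_EPOCH};

/// Create `<root>/<pid>_<unix_seconds>/` and return its path.
/// (Hypothetical helper; the timestamp format is an assumption.)
pub fn create_spill_dir(root: &Path) -> io::Result<PathBuf> {
    // Seconds since the Unix epoch; falls back to 0 if the clock is before it.
    let secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    let dir = root.join(format!("{}_{}", process::id(), secs));
    fs::create_dir_all(&dir)?;
    Ok(dir)
}
```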
For the caching of the files we should provide a time-to-live, `TTL`. This TTL can for instance be 1 day for files downloaded from the internet.
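A TTL check can be a small pure function over timestamps. The names `is_expired` and `ONE_DAY` below are illustrative only; in practice the `modified` value would likely come from `std::fs::metadata(path)?.modified()?`.

```rust
use std::time::{Duration, SystemTime};

/// Example TTL of one day, as suggested for downloaded files.
pub const ONE_DAY: Duration = Duration::from_secs(24 * 60 * 60);

/// Returns true when something last modified at `modified` is older than
/// `ttl` at time `now`. (Hypothetical helper.)
pub fn is_expired(modified: SystemTime, now: SystemTime, ttl: Duration) -> bool {
    match now.duration_since(modified) {
        Ok(age) => age > ttl,
        // `modified` lies in the future (clock skew): treat the file as fresh.
        Err(_) => false,
    }
}
```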
During startup we create a task that checks for old `pid_datetime` folders whose process is not alive anymore (interrupted process) and for files that surpassed their TTL, and cleans them up.
    ~/.polars/
        # Spills from the streaming engine. For future reference
        pid_datetime/
        pid_datetime/
        # files with a TTL
        cache/
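The startup sweep over this layout might look roughly like the sketch below. It is an assumption-laden illustration: `remove_dead_spill_dirs` and `pid_of` are hypothetical names, and the liveness check is passed in as a closure because there is no portable std-only way to ask whether a pid is alive (and pids can be reused, which a real implementation would need to consider).

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Parse the process id out of a `<pid>_<datetime>` folder name.
fn pid_of(name: &str) -> Option<u32> {
    name.split_once('_')?.0.parse().ok()
}

/// Remove spill folders under `root` whose owning process is no longer alive.
/// Returns the paths that were removed. (Hypothetical helper.)
pub fn remove_dead_spill_dirs(
    root: &Path,
    is_alive: impl Fn(u32) -> bool,
) -> io::Result<Vec<PathBuf>> {
    let mut removed = Vec::new();
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        let path = entry.path();
        if !path.is_dir() {
            continue;
        }
        let name = entry.file_name();
        // Folders that do not match `<pid>_...` (such as `cache/`) are skipped.
        if let Some(pid) = name.to_str().and_then(pid_of) {
            if !is_alive(pid) {
                fs::remove_dir_all(&path)?;
                removed.push(path);
            }
        }
    }
    Ok(removed)
}
```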
The spill manager can be a `static` struct that initially only deals with the downloads, caching and cleanup. I think that we should set an `in-process` bit during downloading so that we don't start duplicate downloads.
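One way to picture this is a process-wide manager that tracks downloads currently in flight. Everything here is hypothetical (`SpillManager`, `try_begin_download`, `finish_download`), and the sketch assumes a set of URLs rather than a single bit, so that different files can be downloaded concurrently.

```rust
use std::collections::HashSet;
use std::sync::{Mutex, OnceLock};

/// Hypothetical process-wide manager; only the duplicate-download guard is sketched.
pub struct SpillManager {
    in_progress: Mutex<HashSet<String>>,
}

static SPILL_MANAGER: OnceLock<SpillManager> = OnceLock::new();

impl SpillManager {
    /// Access the single static instance, creating it on first use.
    pub fn global() -> &'static SpillManager {
        SPILL_MANAGER.get_or_init(|| SpillManager {
            in_progress: Mutex::new(HashSet::new()),
        })
    }

    /// Returns true if the caller should start the download,
    /// false if the same URL is already being downloaded.
    pub fn try_begin_download(&self, url: &str) -> bool {
        self.in_progress.lock().unwrap().insert(url.to_string())
    }

    /// Clear the marker once the download has finished or failed.
    pub fn finish_download(&self, url: &str) {
        self.in_progress.lock().unwrap().remove(url);
    }
}
```

A caller would check `try_begin_download` first and only fetch when it returns `true`, calling `finish_download` afterwards.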