pola-rs / nodejs-polars
nodejs front-end of polars
Home Page: https://pola-rs.github.io/nodejs-polars/
License: MIT License
The new zig-based runtime for JS looks extremely promising:
Specifically, FFI appears much more performant: https://github.com/oven-sh/bun#bunffi-foreign-functions-interface
@ritchie46 @universalmind303 -- has this made your radar yet?
I was considering giving it a run on my side; but figured it made sense to ask you guys first.
If this is of interest, do you have any guidance on how you would like to see it in the project structurally?
Best,
Ryan
node 20 is the latest LTS version of node. As such, we should be testing on it.
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
$ yarn add nodejs-polars
➤ YN0000: ┌ Resolution step
➤ YN0001: │ Error: nodejs-polars-android-arm64@npm:0.6.0: No candidates found
at ce (/Users/chenzili/.cache/node/corepack/yarn/3.2.2/yarn.js:439:7864)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async Promise.allSettled (index 0)
at async go (/Users/chenzili/.cache/node/corepack/yarn/3.2.2/yarn.js:390:10446)
➤ YN0000: └ Completed in 1s 159ms
➤ YN0000: Failed with errors in 1s 161ms
yarn add nodejs-polars
Installs successfully.
tested this with 0.6.0 and 0.5.4
Windows 10
node 16.17.0
When you chain withRowCount to scanCSV, the row count column seems to reset after a while.
Use a CSV with sufficient lines (it seems you need at least 1500 lines). Use scanCSV and withRowCount.
When manually initializing the LazyDataFrame, withRowCount seems to work as intended.
import pl from "nodejs-polars";
import * as fs from "node:fs";
const data = [...Array(5000).keys()];
fs.writeFileSync("data.csv", data.join("\n"));
const lf1 = await pl.DataFrame(data).lazy().withRowCount().collect();
const lf2 = await pl.scanCSV("data.csv").withRowCount().collect();
console.log(lf1);
console.log(lf2);
lf1:
┌────────┬──────────┐
│ row_nr ┆ column_0 │
│ --- ┆ --- │
│ u32 ┆ f64 │
╞════════╪══════════╡
│ 0 ┆ 0.0 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 1 ┆ 1.0 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 2 ┆ 2.0 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 3 ┆ 3.0 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ ... ┆ ... │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 4996 ┆ 4996.0 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 4997 ┆ 4997.0 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 4998 ┆ 4998.0 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 4999 ┆ 4999.0 │
└────────┴──────────┘
lf2:
shape: (4999, 2)
┌────────┬──────┐
│ row_nr ┆ 0 │
│ --- ┆ --- │
│ u32 ┆ i64 │
╞════════╪══════╡
│ 0 ┆ 1 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 1 ┆ 2 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2 ┆ 3 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 3 ┆ 4 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ ... ┆ ... │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 1186 ┆ 4996 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 1187 ┆ 4997 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 1188 ┆ 4998 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 1189 ┆ 4999 │
└────────┴──────┘
I would expect the withRowCount to not reset halfway
What do you think polars should have done?
Is your question related to syntax or how you could do something with the polars library?
Yes
If a question is not yet on stackoverflow, please create a new question and post the link here, so we are noted.
https://stackoverflow.com/questions/76635638/who-decides-what-optional-dependencies-to-install-when-installing-nodejs-polars
Convert a string column into a Datetime column, with the ability to supply a format
argument and other options.
export const to_datetime = (
  format: string,
  time_unit: TimeUnit | null = null,
  time_zone: string | null = null,
  strict: boolean = true,
  exact: boolean = true,
  cache: boolean = true,
  utc: boolean | null = null,
): Expr => {
  ...
}
Python/Rust API: https://pola-rs.github.io/polars/py-polars/html/reference/expressions/api/polars.Expr.str.to_datetime.html#polars-expr-str-to-datetime
Python Implementation: https://github.com/pola-rs/polars/blob/1313e59009edd1d6e6f85ef9be32c4706cc4d0b8/py-polars/polars/expr/string.py#L80-L159
I saw this function but I don't think this is an entrypoint/exposed in the public API:
Line 580 in e6c1edb
0.8.1
macOS 13.4 intel
node 18.16.0
pl.readJSON() parses the newline \n wrongly: \n => \\
test.json file
[
{
"key": "\n所"
}
]
Example
index.js
const pl = require('nodejs-polars');
const df = pl.readJSON("./test.json");
console.log(df.select(pl.col("*")).toRecords());
node index.js
output
[ { key: '\\n�' } ]
[ { key: '\n所' } ]
I want to convert nested json columns in my csv files with col('key').str.jsonExtract, like python with pl.col('key').str.json_extract.
Unfortunately, it looks like the method is not implemented.
With a pl.Date column in the data frame, using the toRecords function produces objects which have dates at the Unix epoch.
import pl from "nodejs-polars";
let df = pl.DataFrame({
date: [new Date()],
});
df = df.withColumn(pl.col("date").cast(pl.Date).alias("date"));
console.log(df.toString());
console.log(df.toRecords());
shape: (1, 1)
┌────────────┐
│ date │
│ --- │
│ date │
╞════════════╡
│ 2023-03-09 │
└────────────┘
{date: Thu Jan 01 1970 01:00:19 GMT+0100 (Greenwich Mean Time)}
shape: (1, 1)
┌────────────┐
│ date │
│ --- │
│ date │
╞════════════╡
│ 2023-03-09 │
└────────────┘
{date: Thu Mar 09 2023 00:00:00 GMT+0100 (Greenwich Mean Time)}
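One plausible reading of the actual output (an assumption, not something the source confirms): a Date column holds days since the Unix epoch, 2023-03-09 is day 19425, and if that number reaches the JS Date constructor as milliseconds the result is about 19 seconds past the epoch, which matches the 01:00:19 GMT+0100 shown. A minimal sketch of the arithmetic in plain JavaScript; daysSinceEpochToDate is a hypothetical helper, not part of nodejs-polars:

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: convert a day count since 1970-01-01 (UTC) to a JS Date.
function daysSinceEpochToDate(days) {
  return new Date(days * MS_PER_DAY);
}

const days = 19425; // 2023-03-09 expressed as days since the Unix epoch

// Misreading the day count as milliseconds lands ~19 s after the epoch.
const misread = new Date(days);
console.log(misread.toISOString()); // 1970-01-01T00:00:19.425Z

// Scaling by milliseconds per day gives the intended calendar date (UTC midnight).
const fixed = daysSinceEpochToDate(days);
console.log(fixed.toISOString()); // 2023-03-09T00:00:00.000Z
```

Note that the scaled value is midnight UTC, so a local-time printout may show a different hour than the expected output above.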
0.7.2
macOS Ventura Version 13.1 (22C65)
ex: node 18.12.1
Using readJSON results in a fatal error.
import pl from "nodejs-polars"
const jsonString = `
{"a", 1, "b", "foo", "c": 3}
{"a": 2, "b": "bar", "c": 6}
`
const df = pl.readJSON(jsonString)
Full trace:
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: ExternalFormat("InvalidToken(44)")', /Users/runner/.cargo/git/checkouts/polars-b0d90607192fd414/43598c3/polars/polars-io/src/ndjson_core/ndjson.rs:161:90
stack backtrace:
0: 0x12325a7b8 - _napi_register_module_v1
1: 0x12327688c - _napi_register_module_v1
2: 0x123257750 - _napi_register_module_v1
3: 0x12325a5cc - _napi_register_module_v1
4: 0x12325bd40 - _napi_register_module_v1
5: 0x12325ba98 - _napi_register_module_v1
6: 0x12325c364 - _napi_register_module_v1
7: 0x12325c184 - _napi_register_module_v1
8: 0x12325ac20 - _napi_register_module_v1
9: 0x12325bee0 - _napi_register_module_v1
10: 0x12334f078 - _napi_register_module_v1
11: 0x12334f260 - _napi_register_module_v1
12: 0x1223e9c34 - <unknown>
13: 0x121ebb94c - <unknown>
14: 0x121ebc12c - <unknown>
fatal runtime error: failed to initiate panic, error 5
Should read the JSON string properly
What do you think polars should have done?
Should have read the JSON string properly
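Note that the first line of the sample is not valid JSON (it has commas where colons are expected), and InvalidToken(44) plausibly refers to byte 44, the ASCII comma. Until parse errors surface as catchable exceptions, a caller-side check can reject bad input before it reaches the native layer. A minimal sketch in plain JavaScript; findInvalidNdjsonLines is a hypothetical helper, not part of nodejs-polars:

```javascript
// Hypothetical helper: return the 1-based line numbers (and parser messages)
// of lines in a newline-delimited JSON string that fail to parse.
function findInvalidNdjsonLines(text) {
  const problems = [];
  text.split("\n").forEach((line, index) => {
    if (line.trim() === "") return; // ignore blank lines
    try {
      JSON.parse(line);
    } catch (err) {
      problems.push({ line: index + 1, message: err.message });
    }
  });
  return problems;
}

const jsonString = `
{"a", 1, "b", "foo", "c": 3}
{"a": 2, "b": "bar", "c": 6}
`;

const problems = findInvalidNdjsonLines(jsonString);
// The template literal starts with a newline, so the bad record is line 2.
console.log(problems.map((p) => p.line)); // [ 2 ]
```

If the list is empty, the string is at least line-by-line valid JSON and can be handed to pl.readJSON.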
This is just something I threw into the node console to see what happens:
const pl = require('nodejs-polars');
df = pl.DataFrame()
try { df.withColumn(pl.arange(0, 5).alias('foo').cast('asdf')) } catch(e) { console.error(e) }
thread '<unnamed>' panicked at 'not yet implemented', src/conversion.rs:596:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
[1] 27374 IOT instruction (core dumped) node
This just crashes the complete node runtime, I can't catch errors.
I don't know anything about rust, but I guess there is some sort of exception handling there as well? Can errors be made to bubble up into node?
I would like to have the ability to perform a custom operation on a Series/Expr as such:
import pl from 'nodejs-polars';
const data = [
{ a: 'A', b: 10},
{ a: 'B', b: 20},
{ a: 'B', b: 13},
{ a: 'C', b: 40},
];
const mapping = { B: 'OtherB' };
const df = pl.DataFrame(data);
const dfWithOtherA = df.withColumns([pl.col("a").map((val) => mapping[val] ?? val).alias('otherA')]);
/* expected result:
┌─────┬──────┬────────┐
│ a ┆ b ┆ otherA │
│ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ str │
╞═════╪══════╪════════╡
│ A ┆ 10.0 ┆ A │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ B ┆ 20.0 ┆ OtherB │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ B ┆ 13.0 ┆ OtherB │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ C ┆ 40.0 ┆ C │
└─────┴──────┴────────┘
*/
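Until such a map is available on expressions, one workaround is to derive the extra column in plain JavaScript before the records reach pl.DataFrame. A minimal sketch, assuming the mapping is meant to be { B: 'OtherB' } (as the expected table suggests) and that unmapped values fall back to the original value; withMappedColumn is a hypothetical helper:

```javascript
// Hypothetical helper: add `target` to each record by looking up `source`
// in `mapping`, keeping the original value when the mapping has no entry.
function withMappedColumn(records, source, target, mapping) {
  return records.map((row) => {
    const key = row[source];
    const hasEntry = Object.prototype.hasOwnProperty.call(mapping, key);
    return { ...row, [target]: hasEntry ? mapping[key] : key };
  });
}

const data = [
  { a: "A", b: 10 },
  { a: "B", b: 20 },
  { a: "B", b: 13 },
  { a: "C", b: 40 },
];
const mapping = { B: "OtherB" }; // assumed from the expected table

const mapped = withMappedColumn(data, "a", "otherA", mapping);
console.log(mapped.map((r) => r.otherA)); // [ 'A', 'OtherB', 'OtherB', 'C' ]
```

The resulting array of records can then be passed to pl.DataFrame as in the snippet above; the trade-off is that the mapping runs in JS over every row rather than inside the polars engine.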
The expected behavior should follow what is described in the polars docs.
I would be happy to help with some guidance, but I still am not aware of polars internals to go for a PR.
Thank you for the awesome work in all other APIs and functions, keep the great work! :)
0.6.0
Linux 5.4.0-132-generic #148-Ubuntu
node --version
v16.13.2
Trying to create a dataframe with a value that is null results in an unwrap on a None value in Rust and a panic.
Try to create a dataframe with a null value
node
Welcome to Node.js v16.13.2.
Type ".help" for more information.
> pl = require('nodejs-polars')
> pl.DataFrame([{a:1, b:2, c:null}])
thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', src/dataframe.rs:1548:29
Polars should have created the dataframe with a null value for column c
> pl.DataFrame([{a:1, b:2, c:null}])
Proxy [
shape: (1, 3)
┌─────┬─────┬─────┐
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞═════╪═════╪═════╡
│ 1.0 ┆ 2.0 ┆ null │
└─────┴─────┴─────┘,
{
get: [Function: get],
set: [Function: set],
has: [Function: has],
ownKeys: [Function: ownKeys],
getOwnPropertyDescriptor: [Function: getOwnPropertyDescriptor]
}
]
In datatypes/datatype.ts there is an export of the namespace DataType with all of the previously declared datatypes; but time is missing, meaning that we cannot cast to time - to a format that will measure HH:MM:SS.ssss for durations or timestamps.
Furthermore - why not initialize time with a time unit - make it ms by default but access TimeUnit and use that as a unit? (This might be a feature request)
Early in development, the decision to use camelCase was made to align the library more closely with JS standards. However, many of the methods are not 1-1, or don't translate well to camelCase. I'm wondering if users would prefer the methods be renamed to exactly match the Python/Rust equivalents, which would minimize context switching between languages.
ex: readCSV -> read_csv
Please upvote with one of the following emojis
👍 for snake_case
👎 for camelCase
If the problem was resolved, please update polars. :)
0.7.4
macOS
20.3.0
Running fold to perform a row sum gives an error code DateExpected.
import('nodejs-polars').then(mod => {pl = mod})
df = pl.DataFrame({
"a": [1, 2, 3],
"b": [1, 2, 3],
"c": [1, 2, 3]
});
df.fold((s1, s2) => s1.plus(s2))
Uncaught Error
at dtypeWrap (xxx/node_modules/nodejs-polars/bin/series/index.js:29:65)
at Proxy.plus (xxx/node_modules/nodejs-polars/bin/series/index.js:428:20)
at REPL3:1:24
at xxx/node_modules/nodejs-polars/bin/dataframe.js:166:60
at Array.reduce (<anonymous>)
at Proxy.fold (xxx/node_modules/nodejs-polars/bin/dataframe.js:166:38) {
code: 'DateExpected'
}
Series: 'a' [f64]
[
3
6
9
]
0.8.0
Debian
v18.13.0
Using the str.contains function of a column doesn't respect the case insensitive flag of a provided regex object.
Use str.contains on a column, providing a regex object created with the "i" flag for case insensitivity. Test it on a sample with different casing than the original regex pattern.
Example
import pl from "nodejs-polars"
let df = pl.DataFrame({
"text": ["foo", "FOO", "FoO"],
})
const regex = new RegExp("foo", "i")
df = df.withColumn(pl.col("text").str.contains(regex).alias("result"))
console.log(df.toString())
The contains function does not match all case variations.
shape: (3, 2)
┌──────┬────────┐
│ text ┆ result │
│ --- ┆ --- │
│ str ┆ bool │
╞══════╪════════╡
│ foo ┆ true │
│ FOO ┆ false │
│ FoO ┆ false │
└──────┴────────┘
The contains function should match all variations. Probably by injecting the appropriate (?i) and (?-i) flags for polars to interpret.
shape: (3, 2)
┌──────┬────────┐
│ text ┆ result │
│ --- ┆ --- │
│ str ┆ bool │
╞══════╪════════╡
│ foo ┆ true │
│ FOO ┆ true │
│ FoO ┆ true │
└──────┴────────┘
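The flag injection suggested in this report could be done on the JavaScript side before the pattern is handed to the native layer. A minimal sketch, assuming the Rust regex engine's inline-flag prefix syntax such as (?i); toInlineFlagPattern is a hypothetical helper, and it only translates the i, m and s flags, which are assumed to carry the same meaning in both engines:

```javascript
// Hypothetical helper: turn a JS RegExp (or plain string) into a pattern
// string with inline flags, e.g. /foo/i -> "(?i)foo".
function toInlineFlagPattern(pattern) {
  if (typeof pattern === "string") return pattern;
  const supported = ["i", "m", "s"]; // other JS flags (g, y, u, d) are dropped
  const flags = supported.filter((f) => pattern.flags.includes(f)).join("");
  return flags ? `(?${flags})${pattern.source}` : pattern.source;
}

console.log(toInlineFlagPattern(new RegExp("foo", "i"))); // (?i)foo
console.log(toInlineFlagPattern(/a.b/ms)); // (?ms)a.b
console.log(toInlineFlagPattern("foo")); // foo
```

The returned string could then be passed to str.contains in place of the RegExp object. Differences in regex syntax between the two engines beyond flags are not handled here.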
I want to read data from MongoDB and analyze it using polars, but I don't know how to do it
thanks
When I add nodejs-polars to my package.json, it for some reason asks:
? Please choose a version of "nodejs-polars-android-arm64" from this list: (Use arrow keys)
❯ 0.5.4
0.5.3
0.5.2
0.5.1
0.5.0
This looks really strange, as I am on a regular Linux Mint 21 machine. I have no issues like this with version 0.5.4.
Dear @universalmind303
I work on implementing polars for R and very much see nodejs-polars as the example to follow. Especially when using rust-polars core features that are not in the public polars crate, I found the solutions in nodejs-polars.
My unit tests will likely catch implementation bugs, but whenever py-polars makes a behavior change, I have to notice it manually and update or add that behavior too. I was asked what the lifecycle policy of rpolars is, and I'm very interested to learn about your thoughts and experiences of maintaining nodejs-polars alongside the main projects rust-polars and py-polars.
best
If the problem was resolved, please update polars. :)
0.7.4
macOS
20.3.0
sum will add bools when axis is 0 (i.e., by column), but not when axis is 1 (i.e., by row).
import('nodejs-polars').then(mod => {pl = mod})
let df = pl.DataFrame({
"a": [false, true, false],
"b": [true, true, false],
"c": [false, false, false]
});
// Column sum works correctly
df.sum(0);
// Proxy [
// shape: (1, 3)
// ┌─────┬─────┬─────┐
// │ a ┆ b ┆ c │
// │ --- ┆ --- ┆ --- │
// │ u32 ┆ u32 ┆ u32 │
// ╞═════╪═════╪═════╡
// │ 1 ┆ 2 ┆ 0 │
// └─────┴─────┴─────┘,
// {
// get: [Function: get],
// set: [Function: set],
// has: [Function: has],
// ownKeys: [Function: ownKeys],
// getOwnPropertyDescriptor: [Function: getOwnPropertyDescriptor]
// }
// ]
// Row sum throws an error
df.sum(1)
// thread '<unnamed>' panicked at '`add` operation not supported for dtype `bool`', /Users/runner/.cargo/git/checkouts/polars-b0d90607192fd414/af2948a/polars/polars-core/src/series/series_trait.rs:149:13
// Uncaught Error: `add` operation not supported for dtype `bool`
// at Proxy.sum (xxx/node_modules/nodejs-polars/bin/dataframe.js:395:50) {
// code: 'GenericFailure'
// }
Proxy [
shape: (3,)
Series: 'a' [u32]
[
1
2
0
],
{ get: [Function: get], set: [Function: set] }
]
0.7.1
MacOS
node 16.14.0
Getting a typescript "cannot find module" error when trying to use the latest version of Polars.
Install latest version, try to import it.
npm install nodejs-polars --save
--
import * as pl from 'nodejs-polars';
node_modules/nodejs-polars/bin/series/series.d.ts:9:37 - error TS2307: Cannot find module '@polars/lazy/expr' or its corresponding type declarations.
9 import { InterpolationMethod } from "@polars/lazy/expr";
~~~~~~~~~~~~~~~~~~~
Found 1 error in node_modules/nodejs-polars/bin/series/series.d.ts:9
It should import.
What do you think polars should have done?
I think this line is the culprit: https://github.com/pola-rs/nodejs-polars/pull/22/files#diff-593c508a021e0588e493c0b6207a578a7237471bca1892b20eaee7a4f0736930R13
If I've searched correctly, it looks like the only place in the repo which uses that namespaced import, so maybe just revert it to the regular syntax? Happy to open a PR if you'd like me to.
Hello. How can I print the full DataFrame object data in the console?
import pl from 'nodejs-polars';
import * as util from "util";
let df: pl.DataFrame;
df = pl.DataFrame({
"row": [
"A", "B", "C", "D", "E", "F",
"A2", "B2", "C2", "D2", "E2", "F2",
"G", "H", "J", "K", "L", "M"
],
});
console.log(util.inspect(df, true, null, true))
ts-node -r tsconfig-paths/register apps/ms-analytics/src/df.ts
Result:
yes
Calling dataframe.toJson() fails with "Failed to create napi buffer",
but when I set the space string param, it works well.
0.7.3
Macos 13.0
node 18.6.0
Passing data to nodejs-polars seems unreasonably slow and uses an unreasonably large amount of memory.
With the example code below, it takes about 20 seconds and balloons the memory usage to around 3.5GB.
When doing a simple copy of the data in JS only, the operation takes 10-20ms and peak memory usage is around 325MB:
const { DataFrame, Series } = require("nodejs-polars")
function runBuggyCode(entries) {
let df = new DataFrame(entries)
const timestampSeries = new Series("created_at", new Array(df.height).fill(Date.now()))
df = df.withColumn(timestampSeries)
df.writeParquet()
}
async function main() {
const data = Array(50000)
.fill(null)
.map((_, i) =>
Array(100)
.fill(0)
.map((_, ii) => i * 100 + ii)
)
let count = 0
let peakMemUsage = 0
while (true) {
const start = Date.now()
runBuggyCode(data)
console.log("process time:", Date.now() - start, "ms")
peakMemUsage = Math.max(process.memoryUsage().rss, peakMemUsage)
console.log("peak rss mem usage:", Math.round(peakMemUsage / 1024 ** 2), "MB")
console.log("run:", count++)
await new Promise((res) => setTimeout(res, 100))
}
}
if (require.main === module) {
void main().catch((err) => console.error(err))
}
Example JS replacement for comparison:
function runBuggyCode(entries) {
const copy = entries.map(row => row.slice())
}
Code runs as expected, it just uses what seems to be an unreasonable amount of memory.
Memory usage should be roughly comparable to, say, 2x the usage for the same data when not using the library.
Hi - would it be possible to add a LICENSE file to this repo?
I'd like to use this within my current workplace but there is a scan that is done before the library can be internally mirrored. This import process currently fails because it can't find a valid license for this library.
Thanks!
I'm having some issues with categoricals 'on the border':
filtered.select(pl.col(pl.Categorical).cast(pl.Utf8));
gets me 'Failed to convert JavaScript value Object {"DataType":"Categorical"} into rust type String' (possibly I'm holding this wrong).
Slightly related: with_columns seems to be missing?
Could you provide some insights? Thank you!
I also asked this on stackoverflow here, but as primarily I'm asking for help finding a feature that definitely does exist through the python bindings I think this may be a more appropriate place.
I'm trying to use polars to calculate the results of ranked choice elections in a hypothetical space. I was able to get this working through the python bindings here without too much trouble, but it depends on list context features, specifically pl.element(), in order to do some of the calculations. The problem I'm running into is that there does not appear to be an equivalent feature in the nodejs bindings. Is there a different way to talk about the element of a list? Is there a way to do something like this:
lambda loser: pl.col("vote").arr.eval(pl.element().filter(pl.element() != loser)).alias("vote")
within nodejs-polars, or without using pl.element()?
Using Node.JS
"nodejs-polars": "^0.2.0"
MacOS Big Sur 11.1
Reading in a buffer from an .ipc (ArrowStream) file using readIPC fails with Error: Arrow file does not contain correct header. At the same time the file is not corrupt, since it can be loaded using apache-arrow's Table.from method.
See code example below. I'll post both the .arrow file (works) and .ipc file (doesn't work) as attachments.
const pl = require('nodejs-polars');
const { Table } = require('apache-arrow')
const { readFileSync } = require('fs');
const fromArrow = readFileSync('hits.arrow');
const fromIPC = readFileSync('hits.ipc');
// Read Arrow file by Arrow.js -> works
const df = Table.from([fromArrow])
console.log("df", df.count()) // 10
// Read Arrow file by polars -> works
const dfPolars = pl.readIPC(fromArrow)
console.log("dfPolars", dfPolars) // prints nice table with 10 entries
// Read IPC (ArrowStream) file by Arrow.js -> works
const dfIpc = Table.from([fromIPC])
console.log("dfIpc", dfIpc.count()) // 10
// Read IPC (ArrowStream) by polars -> Fails
const dfIpcPolars = pl.readIPC(fromIPC)
console.log("dfIpcPolars", dfIpcPolars) // Error: Arrow file does not contain correct header
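The error is consistent with the two Arrow IPC encodings: the file (random-access) format begins with the magic bytes ARROW1, while the streaming format does not, so a reader that expects the file format will reject a stream. A minimal sketch for telling the two apart from a buffer; isArrowFileFormat is a hypothetical helper, it checks only the leading magic bytes, and the buffers below are synthetic stand-ins rather than real Arrow payloads:

```javascript
const ARROW_MAGIC = Buffer.from("ARROW1", "ascii");

// Hypothetical helper: true when the buffer starts with the Arrow
// file-format magic bytes. It does not validate the rest of the payload.
function isArrowFileFormat(buffer) {
  return (
    buffer.length >= ARROW_MAGIC.length &&
    buffer.subarray(0, ARROW_MAGIC.length).equals(ARROW_MAGIC)
  );
}

// Synthetic buffers for illustration only.
const fileLike = Buffer.concat([ARROW_MAGIC, Buffer.alloc(10)]);
const streamLike = Buffer.from([0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00]);

console.log(isArrowFileFormat(fileLike)); // true
console.log(isArrowFileFormat(streamLike)); // false
```

Applied to the buffers in the report, a check like this on fromArrow and fromIPC would show which encoding each file uses before it is passed to pl.readIPC.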
I'm using deno with nodejs-polars; it works great except for a minor bug.
I like to check permissions when I use deno, so with nodejs-polars it looks like this:
Permissions:
{
read: [
"/home/mrcool/.deno/bin/deno",
"/usr/bin/ldd",
"/home/mrcool/.cache/deno/npm/registry.npmjs.org/nodejs-polars/0.8.0/bin/nodejs-polars.linux-x64-gnu."... 4 more characters
],
write: [],
net: [],
env: "all",
run: [ "/bin/sh" ],
ffi: [
"/home/mrcool/.cache/deno/npm/registry.npmjs.org/nodejs-polars-linux-x64-gnu/0.8.0/nodejs-polars.linu"... 14 more characters
]
}
It all seems reasonable except for run: sh; this is invoked because https://github.com/pola-rs/nodejs-polars/blob/main/polars/native-polars.js#L14 shells out to call which.
I believe a better way is to use https://www.npmjs.com/package/which; it seems to be a popular library with a total of one dependency.
I think it's a better change overall (a small performance optimization, and more robust, since which may not be installed on the system).
If this sounds good I can make a PR.
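As an alternative to both the shell call and an extra dependency, the lookup can be sketched with Node built-ins only. This is a minimal sketch under stated assumptions, not the project's implementation: findOnPath is a hypothetical helper, it targets POSIX systems (it does not handle Windows PATHEXT), and it returns the first executable regular file found on PATH:

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: look up an executable on PATH without spawning a shell.
// Returns the first matching path, or null when nothing is found.
function findOnPath(command, envPath = process.env.PATH || "") {
  for (const dir of envPath.split(path.delimiter)) {
    if (!dir) continue; // skip empty PATH entries
    const candidate = path.join(dir, command);
    try {
      fs.accessSync(candidate, fs.constants.X_OK); // throws if missing or not executable
      if (fs.statSync(candidate).isFile()) return candidate;
    } catch {
      // not present or not executable in this directory; keep looking
    }
  }
  return null;
}

console.log(findOnPath("ldd")); // a path on systems that ship ldd, otherwise null
```

Because nothing is spawned, a lookup like this would need only read permission under deno rather than run: sh.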
Linux 5.4.0-132-generic #148-Ubuntu
node --version
v16.13.2
Cannot filter for string values
> df = pl.DataFrame({"foo": ["a", "b", "c"]})
Proxy [
shape: (3, 1)
┌─────┐
│ foo │
│ --- │
│ str │
╞═════╡
│ a │
├╌╌╌╌╌┤
│ b │
├╌╌╌╌╌┤
│ c │
└─────┘,
{
get: [Function: get],
set: [Function: set],
has: [Function: has],
ownKeys: [Function: ownKeys],
getOwnPropertyDescriptor: [Function: getOwnPropertyDescriptor]
}
]
> df.filter(pl.col("foo").eq("b"))
Uncaught Error: Not found: b
at Object.collectSync (/home/user/dev/workspaces/node_modules/nodejs-polars/bin/lazy/dataframe.js:62:53)
at Proxy.filter (/home/user/dev/workspaces/node_modules/nodejs-polars/bin/dataframe.js:155:18) {
code: 'GenericFailure'
}
The filter fails
The filter should succeed
You can get around this error by placing the string value in a series
> df = pl.DataFrame({"foo": ["a", "b", "c"]})
Proxy [
shape: (3, 1)
┌─────┐
│ foo │
│ --- │
│ str │
╞═════╡
│ a │
├╌╌╌╌╌┤
│ b │
├╌╌╌╌╌┤
│ c │
└─────┘,
{
get: [Function: get],
set: [Function: set],
has: [Function: has],
ownKeys: [Function: ownKeys],
getOwnPropertyDescriptor: [Function: getOwnPropertyDescriptor]
}
]
> df.filter(pl.col("foo").eq(pl.Series(["b"])))
Proxy [
shape: (1, 1)
┌─────┐
│ foo │
│ --- │
│ str │
╞═════╡
│ b │
└─────┘,
{
get: [Function: get],
set: [Function: set],
has: [Function: has],
ownKeys: [Function: ownKeys],
getOwnPropertyDescriptor: [Function: getOwnPropertyDescriptor]
}
]
Hey there,
I'm trying to filter on multiple columns, any of which matching should satisfy my filter.
Expanding from one to multiple columns gave me the awesome error message
'This is ambiguous. Try to combine the predicates with the 'all_horizontal' or `any_horizontal' expression.'
Alas I can't find either any_horizontal, all_horizontal, or any for that matter in nodejs-polars.
How I would have expected this to work:
let filtered = uq.filter(
pl.any_horizontal(
pl.col(index_columns).cast(pl.Utf8).str.toLowerCase().str.contains(q),
),
);
Is it not implemented (yet), or am I missing it in the docs?
0.7.2
Ubuntu 22.04.1 LTS (GNU/Linux 5.15.0-1027 aarch64)
node 16.19.0
Getting a Cannot find module 'nodejs-polars-linux-arm64-musl' error.
I'm trying to migrate an Express API to use nodejs-polars. However, I'm struggling to read data from a MySQL database. In Python there is the read_sql function, but it does not exist in the nodejs version.
I tried to use the mysql library to query the db and then transform the result to polars using the readRecords function, but it is not working. It throws this error:
thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', src\dataframe.rs:1527:29
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
If I reduce the query to a single column, it works, so I guess it's a data format problem. How can I figure out where the problem is, apart from checking column by column and possibly row by row?
Thank you!
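The same Option::unwrap() panic is reported elsewhere on this page for records that contain null, so null or missing values are a plausible suspect (an assumption; nothing here confirms the cause). Rather than bisecting by hand, the query result can be scanned in plain JavaScript before calling readRecords. findNullishColumns is a hypothetical helper, and the rows below are made-up sample data:

```javascript
// Hypothetical helper: for each column, count the rows where the value is
// null, undefined, or the key is missing altogether.
function findNullishColumns(rows) {
  const columns = new Set(rows.flatMap((row) => Object.keys(row)));
  const counts = {};
  for (const column of columns) {
    const n = rows.filter((row) => row[column] == null).length;
    if (n > 0) counts[column] = n;
  }
  return counts;
}

// Made-up rows standing in for a database result set.
const rows = [
  { id: 1, name: "a", note: null },
  { id: 2, name: "b" },
  { id: 3, name: undefined, note: "x" },
];

console.log(findNullishColumns(rows)); // { name: 1, note: 2 }
```

Columns that show up in the result are the first candidates to drop from the query or fill with a default value when narrowing down the failure.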
Most functions in io.ts have no typings in their options parameter; even though there are options in their jsdoc. Just add an options type/interface so that we can have intellisense.
If the problem was resolved, please update polars. :)
nodejs-polars version 0.6.0
Windows 11
node v16.16.0
The compilation with npm produces the following log
log.txt
Install polars with npm i -s nodejs-polars in a project with webpack and run it.
The actual behaviour is reported in the log.
The compilation runs smoothly.
const pl = require("nodejs-polars");
const df = pl.readCSV("https://j.mp/iriscsv")
console.log(df)
It doesn't download the link, but instead just uses the string in a table.
Another issue: pl.all() seems not to be re-exported.
nodejs-polars/polars/lazy/functions.ts
Line 170 in da804fc
I'm just following the book https://pola-rs.github.io/polars-book/user-guide/quickstart/intro.html and writing the issues I find
Is there a version of concat available for LazyDataFrames? E.g., is it possible to do something like pl.concat([lazy1, lazy2], {how: "vertical"})? If not, what would it take to add it?
0.7.3
Linux amd x64 (Debian 12)
node 18.13.0
Awesome to see a new release of this (as I've been working around some of the open issues and missing features). It looks like from 0.7.2 to 0.7.3 there was a regression (perhaps intentional) and the when function no longer exists as a property of the default export.
This was caught by my CI trying to compile my typescript on the upgrade PR.
Example
import pl, { LazyDataFrame } from "nodejs-polars";
function transform(data: LazyDataFrame): LazyDataFrame {
return data.withColumn(
pl
.when(pl.lit(true))
.then(pl.lit("Positive"))
.otherwise(pl.lit("Negative"))
.alias("someCol")
);
}
error TS2339: Property 'when' does not exist on type 'typeof pl'.
Can be fixed by importing when directly.
Works as it did previously.
Either include this change in the changelog (if intentional) or do not introduce a regression.
If the problem was resolved, please update polars. :)
0.7.4
Mac OSX 13.2.1 and Amazon Linux (AWS Lambda)
node v16.19.1
The list of datatypes is incomplete. I see the python version has a binary type.
import * as pl from 'nodejs-polars';
pl.Series('bytes', Uint8Array.from([]), pl.DataType.Binary) // Does not compile
Uint8Array is cast to List(Uint8), which is not the same in Parquet and Arrow. We're trying to read the Parquet files in Snowflake but the binary columns end up as empty objects as Snowflake is confused by this custom List type.
A Binary type that adheres to Arrow and Parquet specs.
0.7.2
Mac M1
v16.18.0
but this is a request/question regarding bun.js compatibility. My bun.js version is v0.5.7
Using bun.js instead of Node.js, when I try to use any I/O functions like pl.readCSV() or pl.scanCSV(), bun.js crashes giving this error:
bun index.js
125 | parseDates: false,
126 | skipRowsAfterHeader: 0
127 | };
128 | function scanCSV(path, options) {
129 | options = { ...scanCsvDefaultOptions, ...options };
130 | return (0, dataframe_2._LazyDataFrame)(polars_internal_1.default.scanCsv(path, options));
^
error: InvalidArg
at scanCSV (/Users/home/Documents/js_backtest_engine/node_modules/nodejs-polars/bin/io.js:130:43)
at /Users/home/Documents/js_backtest_engine/index.js:32:9
import pl from "nodejs-polars"
let df = pl.readCSV('data.csv')
I understand the library is called nodejs-polars and bun.js is not even on the map (nor should it be, I guess), and in that sense the issue is more related to bun's compatibility with node than to the polars bindings themselves. That being said, the package is aimed at extracting the most performance from JS in general, and I can see bun is consistently 2x-10x faster than Node in almost any situation, which also shows when manipulating DataFrames with nodejs-polars. Hence I thought I'd give it a shot and ask if anyone has any idea why this error appears and whether anything could make the I/O functions compatible with bun too.
Note that, as I mentioned, DataFrame and Series do work like a charm with bun, so this is specifically something going on with I/O. I reckon that to create a stream in bun I do const stream = Bun.file(this.filePath).stream(); instead of const stream = fs.createReadStream(this.filePath); with Node. So I thought I'd allow myself to post this bug in case some quick fix/workaround could be found.
The nodejs-polars contributing guideline currently refers to the main contributing guideline, which in turn refers back to the nodejs-polars one. How do we actually contribute to the codebase?
^0.7.4
Windows 11
v18.8.0
Cannot run the library due to esbuild
No loader is configured for ".node" files: node_modules/nodejs-polars-win32-x64-msvc/nodejs-polars.win32-x64-msvc.node
node_modules/nodejs-polars/bin/native-polars.js:68:48:
68 │ nativeBinding = require('nodejs-polars-win32-x64-msvc');
Another point I want to add is that the project I'm building is based on vite + react.
[yes]
0.7.3
Mac M1 on Ventura 13.2.1
node v19.2.0
The .readRecords() method crashes when called with a Schema.
const pl = require('nodejs-polars');
const rows = [
{ num: 1, date: "foo", string: "foo1" },
{ num: 1, date: "foo" },
];
const schema = {
num: pl.Int32,
date: pl.Utf8,
string: pl.Utf8,
};
const df = pl.readRecords(rows, { schema });
thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', src/conversion.rs:613:37
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
Since the provided code is a copy-paste from the tests, here is the entire test set:
test("from row objects, with schema", () => {
const rows = [
{ num: 1, date: "foo", string: "foo1" },
{ num: 1, date: "foo" },
];
const expected = [
{ num: 1, date: rows[0].date.toString(), string: "foo1" },
{ num: 1, date: rows[1].date.toString(), string: null },
];
const schema = {
num: pl.Int32,
date: pl.Utf8,
string: pl.Utf8,
};
const df = pl.readRecords(rows, { schema });
expect(df.toRecords()).toEqual(expected);
expect(df.schema).toEqual(schema);
});
Hi,
I would love to use the library in the frontend as well, but I receive module not found messages for non-browser libs like fs, path and so on.
Is it possible to configure/import the library into a web application and run it in the browser?
Kind Regards,
Oliver
0.7.3
5.15.94-1-MANJARO
node 18
If the package.json says type: module
{
"name": "fresh-ts-project",
"version": "1.0.0",
"main": "index.js",
"license": "MIT",
"type": "module",
"dependencies": {
"@types/node": "^18.15.11",
"nodejs-polars": "^0.7.3",
"typescript": "^5.0.2"
}
}
tsconfig
{
"compilerOptions": {
"module": "es6",
"moduleResolution": "node"
},
"include": [
"main.ts"
]
}
and main.ts
import pl from 'nodejs-polars'
console.log('a', pl.Datetime)
// @ts-ignore
console.log('b', pl.DataType.Datetime)
it prints
tsc && node main.js
a undefined
b [Function: Datetime]
I don't remember the specifics of why we have those two values in our tsconfig and "type": "module", but it's usually because imports of other packages are broken without it.
Furthermore, typescript will think that pl.DataType.Datetime is an error, unless it is imported via import * as pl from 'nodejs-polars';.
Hi, this looks interesting.
I wonder how feature-complete this is (are things missing in the Node API compared to Rust/Python?), whether it is production-ready, and how committed you are to supporting Node.js going forward (or whether this was more of a "proof of concept") (reading this) and the following:
"Releases happen quite often (weekly / every few days) at the moment, so updating polars regularly to get the latest bugfixes / features might not be a bad idea."
In the README.md there is a link to the Node documentation that gives a 404. In the general user guide I see some references to JavaScript here and there, but the examples are in Rust/Python.
I totally get that this is a new endeavour and things take time. I just want to assess if and/or when this might be ready for use and what I can expect going forward.
I hope this question is OK, I think I am not the only one who has these questions in mind when discovering this repository for the first time. Maybe add a few answers to these types of questions in the readme?
As mentioned in #6, there is a broken link in README.md left over from the polars -> nodejs-polars migration.
https://pola-rs.github.io/polars/nodejs-polars/html/index.html
@universalmind303 could you push the code to this new repo? You could also run git filter-branch to keep only the Node.js-related code in the history.
If the problem was resolved, please update polars. :)
0.7.2
Windows
v18.12.1
When using either anti or semi for the how options, I'm getting a typescript error:
Overload 1 of 3, '(other: LazyDataFrame, joinOptions: { on: ValueOrArray<string | Expr>; } & LazyJoinOptions): LazyDataFrame', gave the following error.
Type '"semi"' is not assignable to type '"left" | "inner" | "outer" | "cross" | undefined'.
Overload 2 of 3, '(other: LazyDataFrame, joinOptions: { leftOn: ValueOrArray<string | Expr>; rightOn: ValueOrArray<string | Expr>; } & LazyJoinOptions): LazyDataFrame', gave the following error.
Type '"semi"' is not assignable to type '"left" | "inner" | "outer" | "cross" | undefined'.
Overload 3 of 3, '(other: LazyDataFrame, options: { how: "cross"; suffix?: string | undefined; allowParallel?: boolean | undefined; forceParallel?: boolean | undefined; }): LazyDataFrame', gave the following error.
Type '"semi"' is not assignable to type '"cross"'.
18 const result = await df.join(otherDF, { leftOn: "ham", rightOn: 'ham2', how: "semi" }).collect();
import pl from "nodejs-polars";
const df = pl
.DataFrame({
foo: [1, 2, 3],
bar: [6.0, 7.0, 8.0],
ham: ["a", "b", "c"],
})
.lazy();
const otherDF = pl
.DataFrame({
apple: ["x", "y", "z"],
ham2: ["a", "b", "d"],
})
.lazy();
const result = await df.join(otherDF, { leftOn: "ham", rightOn: 'ham2', how: "semi" }).collect();
A successful join
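For reference, the rows a semi or anti join is expected to keep for the data in the example can be sketched in plain TypeScript. This only illustrates the join semantics; it is not how nodejs-polars implements them, and the helper name filterJoin is hypothetical.

```typescript
type Row = Record<string, unknown>;

// Keep left rows that have ("semi") or lack ("anti") a matching key on the right.
// Only columns from the left side are returned, as with a semi/anti join.
function filterJoin(
  leftRows: Row[],
  rightRows: Row[],
  leftOn: string,
  rightOn: string,
  how: "semi" | "anti",
): Row[] {
  const rightKeys = new Set(rightRows.map((row) => row[rightOn]));
  return leftRows.filter((row) => rightKeys.has(row[leftOn]) === (how === "semi"));
}

// Same data as in the report, written as records.
const leftRows: Row[] = [
  { foo: 1, bar: 6.0, ham: "a" },
  { foo: 2, bar: 7.0, ham: "b" },
  { foo: 3, bar: 8.0, ham: "c" },
];
const rightRows: Row[] = [
  { apple: "x", ham2: "a" },
  { apple: "y", ham2: "b" },
  { apple: "z", ham2: "d" },
];

const semi = filterJoin(leftRows, rightRows, "ham", "ham2", "semi");
const anti = filterJoin(leftRows, rightRows, "ham", "ham2", "anti");

console.log(semi.map((row) => row.ham)); // rows with ham "a" and "b"
console.log(anti.map((row) => row.ham)); // row with ham "c"
```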