
pola-rs / nodejs-polars

nodejs front-end of polars

Home Page: https://pola-rs.github.io/nodejs-polars/

License: MIT License

TypeScript 70.60% JavaScript 1.57% Rust 27.83%
polars

nodejs-polars's People

Contributors

alex-patow, bidek56, brooooooklyn, cnpryer, cojmeister, controversial, denbezrukov, dhruv-1001, gustaferiksson, icequeen3333, invakid404, jly36963, johanroelofsen, littledian, rgbkrk, ritchie46, ryanrussell, sezanzeb, stinodego, universalmind303, vreyespue

nodejs-polars's Issues

Fail to install on macOS ARM64

Polars version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Issue description

$ yarn add nodejs-polars

➤ YN0000: ┌ Resolution step
➤ YN0001: │ Error: nodejs-polars-android-arm64@npm:0.6.0: No candidates found
    at ce (/Users/chenzili/.cache/node/corepack/yarn/3.2.2/yarn.js:439:7864)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Promise.allSettled (index 0)
    at async go (/Users/chenzili/.cache/node/corepack/yarn/3.2.2/yarn.js:390:10446)
➤ YN0000: └ Completed in 1s 159ms
➤ YN0000: Failed with errors in 1s 161ms

Reproducible example

yarn add nodejs-polars

Expected behavior

The package installs successfully.

withRowCount() seems to reset its output

Have you tried latest version of polars?

  • [yes]

What version of polars are you using?

tested this with 0.6.0 and 0.5.4

What operating system are you using polars on?

Windows 10

What node version are you using

node 16.17.0

Describe your bug.

When you chain withRowCount to scanCSV, the row count column seems to reset after a while.

What are the steps to reproduce the behavior?

Use a CSV with enough lines (at least about 1500 seem to be needed), then call scanCSV followed by withRowCount.
When the LazyDataFrame is built manually from a DataFrame instead, withRowCount seems to work as intended.

import pl from "nodejs-polars";
import * as fs from "node:fs";

const data = [...Array(5000).keys()];
fs.writeFileSync("data.csv", data.join("\n"));

const lf1 = await pl.DataFrame(data).lazy().withRowCount().collect();
const lf2 = await pl.scanCSV("data.csv").withRowCount().collect();

console.log(lf1);
console.log(lf2);

What is the actual behavior?

lf1:
┌────────┬──────────┐
│ row_nr ┆ column_0 │
│ ---    ┆ ---      │
│ u32    ┆ f64      │
╞════════╪══════════╡
│ 0      ┆ 0.0      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 1      ┆ 1.0      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 2      ┆ 2.0      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 3      ┆ 3.0      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ ...    ┆ ...      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 4996   ┆ 4996.0   │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 4997   ┆ 4997.0   │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 4998   ┆ 4998.0   │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 4999   ┆ 4999.0   │
└────────┴──────────┘

lf2:
shape: (4999, 2)
┌────────┬──────┐
│ row_nr ┆ 0    │
│ ---    ┆ ---  │
│ u32    ┆ i64  │
╞════════╪══════╡
│ 0      ┆ 1    │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 1      ┆ 2    │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2      ┆ 3    │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 3      ┆ 4    │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ ...    ┆ ...  │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 1186   ┆ 4996 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 1187   ┆ 4997 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 1188   ┆ 4998 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 1189   ┆ 4999 │
└────────┴──────┘

What is the expected behavior?

I would expect withRowCount not to reset partway through.

What do you think polars should have done?

`to_datetime` function to convert strings to dates

Describe your feature request

Convert a string column into a Datetime column, with the ability to supply a format argument and other options.

export const to_datetime = (
  format: string,
  time_unit?: TimeUnit,
  time_zone?: string,
  strict: boolean = true,
  exact: boolean = true,
  cache: boolean = true,
  utc?: boolean,
): Expr => {
  ...
}

Python/Rust API: https://pola-rs.github.io/polars/py-polars/html/reference/expressions/api/polars.Expr.str.to_datetime.html#polars-expr-str-to-datetime

Python Implementation: https://github.com/pola-rs/polars/blob/1313e59009edd1d6e6f85ef9be32c4706cc4d0b8/py-polars/polars/expr/string.py#L80-L159

I saw this function, but I don't think it is exposed as an entry point in the public API:

pub fn str_to_datetime(

`pl.readJSON()` parses newline `\n` incorrectly

Have you tried latest version of polars?

  • [yes]

What version of polars are you using?

0.8.1

What operating system are you using polars on?

macOS 13.4 intel

What node version are you using

node 18.16.0

Describe your bug.

pl.readJSON() parses the newline escape \n incorrectly:

\n => \\

What are the steps to reproduce the behavior?

test.json file

[
  {
    "key": "\n"
  }
]

Example
index.js

const pl = require('nodejs-polars'); 

const df = pl.readJSON("./test.json");
console.log(df.select(pl.col("*")).toRecords());

What is the actual behavior?

node index.js output

[ { key: '\\n�' } ]

What is the expected behavior?

[ { key: '\n' } ]
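For reference, decoding the same file text with the built-in JSON.parse yields a string containing exactly one newline character (U+000A), which is what toRecords() should return for this file. This is a plain TypeScript comparison baseline, not a nodejs-polars API:

```typescript
// Contents of test.json from the report; "\\n" in this TS literal is the
// two-character JSON escape sequence (backslash followed by n).
const fileText = '[\n  {\n    "key": "\\n"\n  }\n]';

// JSON.parse turns the escape into a single newline character.
const records: Array<{ key: string }> = JSON.parse(fileText);
const key = records[0].key;

console.log(key === "\n", key.length); // true 1
```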

`col.str` is missing `jsonExtract` function

I want to convert nested json columns in my csv files with col('key').str.jsonExtract like python with pl.col('key').str.json_extract.
Unfortunately, it looks like the method is not implemented.

`toRecords` fails to convert `pl.Date` values

Have you tried latest version of polars?

  • yes

What version of polars are you using?

  • 0.7.2

What operating system are you using polars on?

  • Debian 12

What node version are you using

  • node 18.13.0

Describe your bug.

  • With a pl.Date column in the data frame, using the toRecords function produces objects which have dates at the unix epoch.

What are the steps to reproduce the behavior?

import pl from "nodejs-polars";

let df = pl.DataFrame({
  date: [new Date()],
});

df = df.withColumn(pl.col("date").cast(pl.Date).alias("date"));

console.log(df.toString());
console.log(df.toRecords());

What is the actual behavior?

shape: (1, 1)
┌────────────┐
│ date       │
│ ---        │
│ date       │
╞════════════╡
│ 2023-03-09 │
└────────────┘

{date: Thu Jan 01 1970 01:00:19 GMT+0100 (Greenwich Mean Time)}

What is the expected behavior?

shape: (1, 1)
┌────────────┐
│ date       │
│ ---        │
│ date       │
╞════════════╡
│ 2023-03-09 │
└────────────┘

{date: Thu Mar 09 2023 00:00:00 GMT+0100 (Greenwich Mean Time)}
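Polars stores a `Date` as an integer count of days since 1970-01-01. The wrong record above fits that count being read as milliseconds: 2023-03-09 is day 19425, and 19425 ms after the epoch is 00:00:19.425 UTC, which is displayed as 01:00:19 at GMT+0100. A minimal sketch of the intended conversion (polarsDateToJsDate is a hypothetical helper name):

```typescript
// Milliseconds per day: polars' Date dtype counts whole days since 1970-01-01.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: map a polars Date value (days since the Unix epoch)
// to a JS Date at UTC midnight of that day.
export function polarsDateToJsDate(daysSinceEpoch: number): Date {
  return new Date(daysSinceEpoch * MS_PER_DAY);
}

// Reading the day count as milliseconds reproduces the reported output.
const buggy = new Date(19425);
const fixed = polarsDateToJsDate(19425);

console.log(buggy.toISOString()); // 1970-01-01T00:00:19.425Z
console.log(fixed.toISOString()); // 2023-03-09T00:00:00.000Z
```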

`readJson` throws fatal error

Have you tried latest version of polars?

  • [yes]

What version of polars are you using?

0.7.2

What operating system are you using polars on?

macOS Ventura Version 13.1 (22C65)

What node version are you using

node 18.12.1

Describe your bug.

Using readJson results in a fatal error.

What are the steps to reproduce the behavior?

import pl from "nodejs-polars"

  const jsonString = `
{"a", 1, "b", "foo", "c": 3}
{"a": 2, "b": "bar", "c": 6}
`
const df = pl.readJSON(jsonString)

What is the actual behavior?

Full trace:

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: ExternalFormat("InvalidToken(44)")', /Users/runner/.cargo/git/checkouts/polars-b0d90607192fd414/43598c3/polars/polars-io/src/ndjson_core/ndjson.rs:161:90
stack backtrace:
   0:        0x12325a7b8 - _napi_register_module_v1
   1:        0x12327688c - _napi_register_module_v1
   2:        0x123257750 - _napi_register_module_v1
   3:        0x12325a5cc - _napi_register_module_v1
   4:        0x12325bd40 - _napi_register_module_v1
   5:        0x12325ba98 - _napi_register_module_v1
   6:        0x12325c364 - _napi_register_module_v1
   7:        0x12325c184 - _napi_register_module_v1
   8:        0x12325ac20 - _napi_register_module_v1
   9:        0x12325bee0 - _napi_register_module_v1
  10:        0x12334f078 - _napi_register_module_v1
  11:        0x12334f260 - _napi_register_module_v1
  12:        0x1223e9c34 - <unknown>
  13:        0x121ebb94c - <unknown>
  14:        0x121ebc12c - <unknown>
fatal runtime error: failed to initiate panic, error 5

What is the expected behavior?

Should read the JSON string properly

What do you think polars should have done?
Should have read the JSON string properly
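The panic reports InvalidToken(44), and 44 is the character code of ','. The first line of the reproducer uses commas where colons belong ({"a", 1, "b", "foo", ...}), so the input itself is malformed; the bug is that this becomes a fatal panic instead of a catchable error. Until that is fixed, one workaround is to validate each line in JS before handing the string to readJSON. A sketch, with a hypothetical helper name:

```typescript
interface NdjsonProblem {
  line: number; // 1-based line number within the input string
  message: string; // JSON.parse error message for that line
}

// Hypothetical pre-flight check: parse every non-blank line with JSON.parse
// and report the first one that is not valid JSON, so malformed input becomes
// an ordinary, catchable JS error before it reaches the native reader.
export function findInvalidNdjsonLine(text: string): NdjsonProblem | null {
  const lines = text.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const trimmed = lines[i].trim();
    if (trimmed === "") continue; // blank lines hold no record
    try {
      JSON.parse(trimmed);
    } catch (err) {
      return { line: i + 1, message: err instanceof Error ? err.message : String(err) };
    }
  }
  return null;
}

// The reproducer from the report: its first record uses ',' instead of ':'.
const jsonString = `
{"a", 1, "b", "foo", "c": 3}
{"a": 2, "b": "bar", "c": 6}
`;

const problem = findInvalidNdjsonLine(jsonString);
console.log(problem?.line); // 2 (line 1 of the template literal is empty)
```

If the helper returns null, the string can be passed to pl.readJSON as before; otherwise the line number and message can be thrown as a normal Error.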

Catching errors

This is just something I threw into the node console to see what happens:

const pl = require('nodejs-polars');
df = pl.DataFrame()
try { df.withColumn(pl.arange(0, 5).alias('foo').cast('asdf')) } catch(e) { console.error(e) }
thread '<unnamed>' panicked at 'not yet implemented', src/conversion.rs:596:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
[1]    27374 IOT instruction (core dumped)  node

This crashes the entire Node runtime, so I can't catch the error.

I don't know anything about Rust, but I guess there is some sort of exception handling there as well? Can errors be made to bubble up into Node?

Map and apply function to perform custom operations

Ability to perform custom operations by using the map and apply functions

Feature demonstration with expected result

I would like to have the ability to perform a custom operation on a Series/Expr as such:

import pl from 'nodejs-polars';

const data = [
	{ a: 'A', b: 10},
	{ a: 'B', b: 20},
	{ a: 'B', b: 13},
	{ a: 'C', b: 40},
];

const mapping = { B: 'OtherB' };

const df = pl.DataFrame(data);
const dfWithOtherA = df.withColumns([pl.col("a").map((val) => mapping[val] ?? val).alias('otherA')]);

/* expected result:
  ┌─────┬──────┬────────┐
  │ a   ┆ b    ┆ otherA │
  │ --- ┆ ---  ┆ ---    │
  │ str ┆ f64  ┆ str    │
  ╞═════╪══════╪════════╡
  │ A   ┆ 10.0 ┆ A      │
  ├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
  │ B   ┆ 20.0 ┆ OtherB │
  ├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
  │ B   ┆ 13.0 ┆ OtherB │
  ├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
  │ C   ┆ 40.0 ┆ C      │
  └─────┴──────┴────────┘
*/

The expected behavior should follow what is described in the polars docs.
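To pin down the intended per-element semantics (look each value up in the mapping and keep the original when there is no entry, as in the expected table), here is a plain TypeScript sketch; mapWithFallback is a hypothetical name, not a polars API:

```typescript
// Hypothetical helper: look each value up in `mapping` and keep the original
// value when the mapping has no own entry for it.
export function mapWithFallback(values: string[], mapping: Record<string, string>): string[] {
  return values.map((val) =>
    Object.prototype.hasOwnProperty.call(mapping, val) ? mapping[val] : val,
  );
}

const otherA = mapWithFallback(["A", "B", "B", "C"], { B: "OtherB" });
console.log(otherA); // [ 'A', 'OtherB', 'OtherB', 'C' ]
```

Until map/apply exists, an array computed this way could presumably be attached as a column with pl.Series("otherA", values) and df.withColumn(...), assuming the source values are first read into JS (for example via getColumn("a").toArray()).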

Other comments

I would be happy to help with some guidance, but I'm not yet familiar enough with the polars internals to open a PR.
Thank you for the awesome work on all the other APIs and functions, keep up the great work! :)

Cannot create dataframe containing null values

Have you tried latest version of polars?

  • yes
  • no

What version of polars are you using?

0.6.0

What operating system are you using polars on?

Linux 5.4.0-132-generic #148-Ubuntu

What node version are you using

node --version
v16.13.2

Describe your bug.

Trying to create a DataFrame containing a null value results in an unwrap on a None value in Rust, which panics.

What are the steps to reproduce the behavior?

Try to create a dataframe with a null value

What is the actual behavior?

node                              
Welcome to Node.js v16.13.2.
Type ".help" for more information.
> pl = require('nodejs-polars')

> pl.DataFrame([{a:1, b:2, c:null}])
thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', src/dataframe.rs:1548:29

What is the expected behavior?

Polars should have created the dataframe with a null value for column c

> pl.DataFrame([{a:1, b:2, c:null}])
Proxy [
  shape: (1, 3)
  ┌─────┬─────┬──────┐
  │ a   ┆ b   ┆ c    │
  │ --- ┆ --- ┆ ---  │
  │ f64 ┆ f64 ┆ f64  │
  ╞═════╪═════╪══════╡
  │ 1.0 ┆ 2.0 ┆ null │
  └─────┴─────┴──────┘,
  {
    get: [Function: get],
    set: [Function: set],
    has: [Function: has],
    ownKeys: [Function: ownKeys],
    getOwnPropertyDescriptor: [Function: getOwnPropertyDescriptor]
  }
]

Why doesn't datatype.ts export Time?

In datatypes/datatype.ts, the DataType namespace is exported with all of the previously declared data types, but Time is missing. This means we cannot cast to a time type, i.e. a format measuring HH:MM:SS.ssss, for durations or timestamps.
Furthermore, why not initialize Time with a time unit? It could default to ms but accept a TimeUnit. (This might be a feature request.)

RFC: Should nodejs-polars use `snake_case` method names?

Early in development, the decision was made to use camelCase to align the library more closely with JS conventions. However, many methods don't map 1:1 or don't translate well to camelCase. I'm wondering whether users would prefer the methods to be renamed to exactly match their Python/Rust equivalents, which would minimize context switching between languages.

ex:
readCSV -> read_csv

Please upvote with one of the following emojis
👍 for snake_case
👎 for camelCase

Running `fold` to perform a row sum gives an error code `DateExpected`.

Have you tried latest version of polars?

  • [yes]✓
  • [no]

What version of polars are you using?

0.7.4

What operating system are you using polars on?

macOS

What node version are you using

20.3.0

Describe your bug.

Running fold to perform a row sum gives an error code DateExpected.

What are the steps to reproduce the behavior?

import('nodejs-polars').then(mod => {pl = mod})
df = pl.DataFrame({
    "a": [1, 2, 3],
    "b": [1, 2, 3],
    "c": [1, 2, 3]
});
df.fold((s1, s2) => s1.plus(s2))

What is the actual behavior?

Uncaught Error
    at dtypeWrap (xxx/node_modules/nodejs-polars/bin/series/index.js:29:65)
    at Proxy.plus (xxx/node_modules/nodejs-polars/bin/series/index.js:428:20)
    at REPL3:1:24
    at xxx/node_modules/nodejs-polars/bin/dataframe.js:166:60
    at Array.reduce (<anonymous>)
    at Proxy.fold (xxx/node_modules/nodejs-polars/bin/dataframe.js:166:38) {
  code: 'DateExpected'
}

What is the expected behavior?

Series: 'a' [f64]
[
    3
    6
    9
]
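The stack trace shows DataFrame.fold using Array.reduce over the columns (dataframe.js:166). The expected result corresponds to a left fold with element-wise addition, sketched here on plain number arrays (foldColumns and plus are hypothetical helpers, not nodejs-polars APIs):

```typescript
// Hypothetical sketch: a left fold over the columns, where each step combines
// the accumulator with the next column.
export function foldColumns(
  columns: number[][],
  op: (acc: number[], next: number[]) => number[],
): number[] {
  if (columns.length === 0) throw new Error("fold needs at least one column");
  // Start from the first column and fold the remaining ones into it.
  return columns.slice(1).reduce((acc, next) => op(acc, next), columns[0]);
}

// Element-wise addition, the role Series.plus plays in the report.
const plus = (a: number[], b: number[]): number[] => a.map((x, i) => x + b[i]);

const rowSums = foldColumns(
  [
    [1, 2, 3], // a
    [1, 2, 3], // b
    [1, 2, 3], // c
  ],
  plus,
);
console.log(rowSums); // [ 3, 6, 9 ]
```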

`col.str.contains` does not respect case insensitive regex object

Have you tried latest version of polars?

  • yes

What version of polars are you using?

0.8.0

What operating system are you using polars on?

Debian

What node version are you using

v18.13.0

Describe your bug.

Using the str.contains function of a column doesn't respect the case-insensitive flag of a provided regex object.

What are the steps to reproduce the behavior?

Use str.contains on a column, providing a regex object created with the "i" flag for case insensitivity. Test it on a sample with different casing than the original regex pattern.

Example

import pl from "nodejs-polars"

let df = pl.DataFrame({
    "text": ["foo", "FOO", "FoO"],
})

const regex = new RegExp("foo", "i")

df = df.withColumn(pl.col("text").str.contains(regex).alias("result"))

console.log(df.toString())

What is the actual behavior?

The contains function does not match all case variations.

shape: (3, 2)
┌──────┬────────┐
│ text ┆ result │
│ ---  ┆ ---    │
│ str  ┆ bool   │
╞══════╪════════╡
│ foo  ┆ true   │
│ FOO  ┆ false  │
│ FoO  ┆ false  │
└──────┴────────┘

What is the expected behavior?

The contains function should match all variations, probably by injecting the appropriate (?i) and (?-i) inline flags for polars to interpret.

shape: (3, 2)
┌──────┬────────┐
│ text ┆ result │
│ ---  ┆ ---    │
│ str  ┆ bool   │
╞══════╪════════╡
│ foo  ┆ true   │
│ FOO  ┆ true   │
│ FoO  ┆ true   │
└──────┴────────┘
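The suggestion above can be sketched as a small conversion that carries the JS RegExp flags with inline equivalents in the Rust regex syntax polars uses ((?i), (?m), (?s)) into the pattern string. The helper name is hypothetical, and the sketch assumes str.contains also accepts a plain pattern string:

```typescript
// Hypothetical helper: convert a JS RegExp into a pattern string for the Rust
// `regex` crate used by polars, carrying over the flags that have an inline
// equivalent there: i (case-insensitive), m (multi-line), s (dot matches newline).
export function toRustRegexPattern(re: RegExp): string {
  const inline = ["i", "m", "s"].filter((flag) => re.flags.indexOf(flag) !== -1).join("");
  // Other JS flags (such as g, y, u, d) have no inline counterpart here and are dropped.
  return inline === "" ? re.source : `(?${inline})${re.source}`;
}

console.log(toRustRegexPattern(new RegExp("foo", "i"))); // (?i)foo
console.log(toRustRegexPattern(new RegExp("a.b", "ms"))); // (?ms)a.b
console.log(toRustRegexPattern(new RegExp("foo", "g"))); // foo
```

With the report's example this would become pl.col("text").str.contains(toRustRegexPattern(regex)). JS-only syntax such as lookaround and backreferences is not supported by the Rust regex crate, so the pattern itself must stay within the shared subset.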

nodejs-polars 0.6 npm package is broken

When I add nodejs-polars to my package.json, for some reason it asks:

? Please choose a version of "nodejs-polars-android-arm64" from this list: (Use arrow keys)
❯ 0.5.4 
  0.5.3 
  0.5.2 
  0.5.1 
  0.5.0 

This looks really strange, as I am running regular Linux Mint 21. I have no issues like this with version 0.5.4.

How do nodejs-polars track/maintain feature-parity with py-polars?

Dear @universalmind303

I am working on implementing polars for R and very much see nodejs-polars as the example to follow. In particular, when using rust-polars core features that are not in the public polars crate, I found the solutions in nodejs-polars.

My unit tests will likely catch implementation bugs, but whenever py-polars changes behavior, I have to notice this manually and update or add that behavior too. I was asked what the lifecycle policy of rpolars is, and I'm very interested to learn about your thoughts and experiences maintaining nodejs-polars alongside the main projects, rust-polars and py-polars.

Best,

`sum` will add bools by column but not by row

Have you tried latest version of polars?

  • [yes]✓
  • [no]

What version of polars are you using?

0.7.4

What operating system are you using polars on?

macOS

What node version are you using

20.3.0

Describe your bug.

sum will add bools when axis is 0 (i.e., by column), but not when axis is 1 (i.e., by row).

What are the steps to reproduce the behavior?

import('nodejs-polars').then(mod => {pl = mod})
let df = pl.DataFrame({
    "a": [false, true, false],
    "b": [true, true, false],
    "c": [false, false, false]
});

// Column sum works correctly
df.sum(0);

// Proxy [
//   shape: (1, 3)
//   ┌─────┬─────┬─────┐
//   │ a   ┆ b   ┆ c   │
//   │ --- ┆ --- ┆ --- │
//   │ u32 ┆ u32 ┆ u32 │
//   ╞═════╪═════╪═════╡
//   │ 1   ┆ 2   ┆ 0   │
//   └─────┴─────┴─────┘,
//   {
//     get: [Function: get],
//     set: [Function: set],
//     has: [Function: has],
//     ownKeys: [Function: ownKeys],
//     getOwnPropertyDescriptor: [Function: getOwnPropertyDescriptor]
//   }
// ]

// Row sum throws an error
df.sum(1)

What is the actual behavior?

// thread '<unnamed>' panicked at '`add` operation not supported for dtype `bool`', /Users/runner/.cargo/git/checkouts/polars-b0d90607192fd414/af2948a/polars/polars-core/src/series/series_trait.rs:149:13
// Uncaught Error: `add` operation not supported for dtype `bool`
//     at Proxy.sum (xxx/node_modules/nodejs-polars/bin/dataframe.js:395:50) {
//   code: 'GenericFailure'
// }

What is the expected behavior?

Proxy [
  shape: (3,)
  Series: 'a' [u32]
  [
        1
        2
        0
  ],
  { get: [Function: get], set: [Function: set] }
]
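The expected result treats each boolean as 0 or 1, the same way the column-wise df.sum(0) already does. As a plain TypeScript sketch of that row-wise semantics (rowSumBooleans is a hypothetical helper, not part of nodejs-polars):

```typescript
// Hypothetical helper: row-wise sum of boolean columns, counting true as 1
// and false as 0, the same coercion the column-wise df.sum(0) applies.
export function rowSumBooleans(columns: boolean[][]): number[] {
  const height = columns.length > 0 ? columns[0].length : 0;
  const sums: number[] = [];
  for (let row = 0; row < height; row++) {
    let total = 0;
    for (const column of columns) {
      if (column[row]) total += 1;
    }
    sums.push(total);
  }
  return sums;
}

const rowTotals = rowSumBooleans([
  [false, true, false], // a
  [true, true, false], // b
  [false, false, false], // c
]);
console.log(rowTotals); // [ 1, 2, 0 ]
```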

Issue with scoped package

Have you tried latest version of polars?

  • [yes]

What version of polars are you using?

0.7.1

What operating system are you using polars on?

MacOS

What node version are you using

node 16.14.0

Describe your bug.

Getting a TypeScript "cannot find module" error when trying to use the latest version of Polars.

What are the steps to reproduce the behavior?

Install latest version, try to import it.

npm install nodejs-polars --save 

--

import * as pl from 'nodejs-polars';

What is the actual behavior?

node_modules/nodejs-polars/bin/series/series.d.ts:9:37 - error TS2307: Cannot find module '@polars/lazy/expr' or its corresponding type declarations.

9 import { InterpolationMethod } from "@polars/lazy/expr";
                                      ~~~~~~~~~~~~~~~~~~~


Found 1 error in node_modules/nodejs-polars/bin/series/series.d.ts:9

What is the expected behavior?

It should import.

What do you think polars should have done?

I think this line is the culprit: https://github.com/pola-rs/nodejs-polars/pull/22/files#diff-593c508a021e0588e493c0b6207a578a7237471bca1892b20eaee7a4f0736930R13

If I've searched correctly, this is the only place in the repo that uses that namespaced import, so maybe just revert it to the regular syntax? Happy to open a PR if you'd like me to.

Print full DataFrame object data in console

Hello. How can I print the full data of a DataFrame object to the console?

import pl from 'nodejs-polars';
import * as util from "util";

let df: pl.DataFrame;

df = pl.DataFrame({
    "row": [
        "A", "B", "C", "D", "E", "F",
        "A2", "B2", "C2", "D2", "E2", "F2",
        "G", "H", "J", "K", "L", "M"
    ],
   });

console.log(util.inspect(df, true, null, true))

ts-node -r tsconfig-paths/register apps/ms-analytics/src/df.ts

Result:

[screenshot: 2022-10-11_102551]

Failed to create napi buffer when using `dataframe.toJson()`

Have you tried latest version of polars?

yes

Describe your bug.

Calling dataframe.toJson() fails with "Failed to create napi buffer".
[screenshot]
However, when I pass a space string as a parameter, it works well.
[screenshot]

What are the steps to reproduce the behavior?

What is the actual behavior?

What is the expected behavior?

What do you think polars should have done?

Slow and seemingly unreasonably large memory usage

Have you tried latest version of polars?

  • [yes]

What version of polars are you using?

0.7.3

What operating system are you using polars on?

Macos 13.0

What node version are you using

node 18.6.0

Describe your bug.

Passing data to nodejs-polars is slow and uses a seemingly unreasonable amount of memory.

With the example code below, it takes about 20 seconds and memory usage balloons to around 3.5 GB.
[screenshot]

When doing a simple copy of the data in JS only, the operation takes 10-20 ms and peak memory usage is around 325 MB:
[screenshot]

What are the steps to reproduce the behavior?

const { DataFrame, Series } = require("nodejs-polars")

function runBuggyCode(entries) {
  let df = new DataFrame(entries)
  const timestampSeries = new Series("created_at", new Array(df.height).fill(Date.now()))
  df = df.withColumn(timestampSeries)
  df.writeParquet()
}

async function main() {
  const data = Array(50000)
    .fill(null)
    .map((_, i) =>
      Array(100)
        .fill(0)
        .map((_, ii) => i * 100 + ii)
    )

  let count = 0
  let peakMemUsage = 0
  while (true) {
    const start = Date.now()
    runBuggyCode(data)
    console.log("process time:", Date.now() - start, "ms")
    peakMemUsage = Math.max(process.memoryUsage().rss, peakMemUsage)
    console.log("peak rss mem usage:", Math.round(peakMemUsage / 1024 ** 2), "MB")
    console.log("run:", count++)
    await new Promise((res) => setTimeout(res, 100))
  }
}

if (require.main === module) {
  void main().catch((err) => console.error(err))
}

Example JS replacement for comparison:

function runBuggyCode(entries) {
  const copy = entries.map(row => row.slice())
}

What is the actual behavior?

The code runs as expected; it just uses what seems to be an unreasonable amount of memory.

What is the expected behavior?

Memory usage should be roughly comparable to what the data needs without the library, perhaps up to 2x.
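For scale, a back-of-the-envelope estimate of the raw input, assuming every value is held as an 8-byte f64 (the dtype nodejs-polars shows for JS numbers in other reports on this page):

```typescript
// Back-of-the-envelope size of the benchmark input, assuming each value is
// held as an 8-byte f64 once it is inside polars.
const rowCount = 50000; // Array(50000) in the reproducer
const valuesPerRow = 100; // Array(100) per entry
const rawBytes = rowCount * valuesPerRow * Float64Array.BYTES_PER_ELEMENT;

console.log(rawBytes / 1e6, "MB"); // 40 MB
```

Even with a 2x allowance that is about 80 MB, while the ~3.5 GB peak is nearly 90 times the raw 40 MB.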

Please add LICENSE file to this project

Hi - would it be possible to add a LICENSE file to this repo?

I'd like to use this within my current workplace but there is a scan that is done before the library can be internally mirrored. This import process currently fails because it can't find a valid license for this library.

Thanks!

Categorical support

I'm having some issues with categoricals 'on the border':

  • you can't convert them to JSON ("not yet implemented" in src/conversion.rs:147),
  • I can't easily cast all of them to string columns:
    filtered.select(pl.col(pl.Categorical).cast(pl.Utf8)) gets me 'Failed to convert JavaScript value Object {"DataType":"Categorical"} into rust type String' (possibly I'm holding this wrong)

and, slightly related, with_columns seems to be missing?

Could you provide some insight? Thank you!

list context and row wise compute through the javascript API

I also asked this on Stack Overflow here, but since I'm primarily asking for help finding a feature that definitely exists in the Python bindings, I think this may be a more appropriate place.

I'm trying to use polars to calculate the results of ranked-choice elections in a hypothetical space. I was able to get this working through the Python bindings here without too much trouble, but it depends on list context features, specifically pl.element(), for some of the calculations. The problem I'm running into is that there does not appear to be an equivalent feature in the Node.js bindings. Is there a different way to refer to the elements of a list? Is there a way to do something like this:

lambda loser: pl.col("vote").arr.eval(pl.element().filter(pl.element() != loser)).alias("vote")

within nodejs-polars, or without using pl.element()?
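For reference, the Python expression above keeps every candidate on each ballot except the eliminated one. In plain TypeScript over arrays (removeLoser is a hypothetical helper, not a nodejs-polars API), that per-row operation is:

```typescript
// A ranked ballot: candidate names in order of preference.
type Ballot = string[];

// Hypothetical helper mirroring the Python list-context expression
// pl.col("vote").arr.eval(pl.element().filter(pl.element() != loser)):
// within each row's list, keep every element that is not the loser.
export function removeLoser(ballots: Ballot[], loser: string): Ballot[] {
  return ballots.map((ballot) => ballot.filter((candidate) => candidate !== loser));
}

const nextRound = removeLoser([["a", "b", "c"], ["b", "c"], ["c", "a"]], "b");
console.log(JSON.stringify(nextRound)); // [["a","c"],["c"],["c","a"]]
```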

[NodeJS]: readIPC from buffer fails with 'Arrow file does not contain correct header', while it works in ArrowJS

Using Node.js

What version of polars are you using?

"nodejs-polars": "^0.2.0"

What operating system are you using polars on?

MacOS Big Sur 11.1

Describe your bug.

Reading a buffer from an .ipc (ArrowStream) file with readIPC fails with "Error: Arrow file does not contain correct header". The file is not corrupt, however, since it can be loaded with apache-arrow's Table.from method.

What are the steps to reproduce the behavior?

See the code example below. I'll post both the .arrow file (works) and the .ipc file (doesn't work) as attachments.

const pl = require('nodejs-polars'); 
const { Table } = require('apache-arrow')
const { readFileSync } = require('fs');

const fromArrow = readFileSync('hits.arrow'); 
const fromIPC = readFileSync('hits.ipc'); 

// Read Arrow file by Arrow.js -> works
const df = Table.from([fromArrow])
console.log("df", df.count()) // 10

// Read Arrow file by polars -> works
const dfPolars = pl.readIPC(fromArrow)
console.log("dfPolars", dfPolars) // prints nice table with 10 entries

// Read IPC (ArrowStream) file by Arrow.js -> works
const dfIpc = Table.from([fromIPC])
console.log("dfIpc", dfIpc.count()) // 10

// Read IPC (ArrowStream) by polars -> Fails
const dfIpcPolars = pl.readIPC(fromIPC)
console.log("dfIpcPolars", dfIpcPolars) // Error: Arrow file does not contain correct header
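A plausible explanation, not confirmed here: Arrow has two IPC formats. The file format begins (and ends) with the 6-byte magic string "ARROW1", while the streaming format has no magic, so a reader expecting the file format would reject a stream with a header error. A quick check of which format a buffer uses (hasArrowFileMagic is a hypothetical helper):

```typescript
// "ARROW1" as bytes: the magic that opens (and closes) an Arrow IPC *file*.
const ARROW_FILE_MAGIC = [0x41, 0x52, 0x52, 0x4f, 0x57, 0x31];

// Hypothetical helper: true if the buffer starts with the Arrow file magic,
// false otherwise (for example for an Arrow IPC *stream*, which has none).
export function hasArrowFileMagic(bytes: Uint8Array): boolean {
  if (bytes.length < ARROW_FILE_MAGIC.length) return false;
  return ARROW_FILE_MAGIC.every((byte, i) => bytes[i] === byte);
}

// Start of a file-format buffer: magic plus 2 padding bytes.
console.log(hasArrowFileMagic(new Uint8Array([0x41, 0x52, 0x52, 0x4f, 0x57, 0x31, 0, 0]))); // true
// Start of a stream-format buffer: a 0xFFFFFFFF continuation marker.
console.log(hasArrowFileMagic(new Uint8Array([0xff, 0xff, 0xff, 0xff, 0x10, 0, 0, 0]))); // false
```

Because a Node Buffer is a Uint8Array, the result of readFileSync('hits.ipc') can be passed directly.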


Use the npm `which` package instead of shelling out

I'm using Deno with nodejs-polars, and it works great except for one minor bug.

I like to check permissions when I use Deno; with nodejs-polars, they look like this:

Permissions:
{
  read: [
    "/home/mrcool/.deno/bin/deno",
    "/usr/bin/ldd",
    "/home/mrcool/.cache/deno/npm/registry.npmjs.org/nodejs-polars/0.8.0/bin/nodejs-polars.linux-x64-gnu."... 4 more characters
  ],
  write: [],
  net: [],
  env: "all",
  run: [ "/bin/sh" ],
  ffi: [
    "/home/mrcool/.cache/deno/npm/registry.npmjs.org/nodejs-polars-linux-x64-gnu/0.8.0/nodejs-polars.linu"... 14 more characters
  ]
}

It all seems reasonable except for the run permission for /bin/sh, which is requested because https://github.com/pola-rs/nodejs-polars/blob/main/polars/native-polars.js#L14 shells out to call `which`.

I believe a better approach is to use https://www.npmjs.com/package/which; it seems to be a popular library with only one dependency.

I think it's a better change overall: a small performance optimization, and more robust, since `which` may not be installed on the system.

If this sounds good, I can make a PR.
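What a which-style lookup does can be sketched without spawning a shell: walk the PATH entries and return the first candidate that is executable. This sketch is POSIX-only (':' separator, no Windows PATHEXT handling), skips empty PATH segments, and takes the executability test as a parameter; findOnPath is a hypothetical name, not the package's API:

```typescript
// Hypothetical helper: resolve a command by walking the PATH entries in
// order, without spawning /bin/sh. POSIX-only sketch.
export function findOnPath(
  command: string,
  pathEnv: string,
  isExecutable: (candidate: string) => boolean,
): string | null {
  for (const dir of pathEnv.split(":")) {
    // POSIX treats an empty entry as the current directory; skip it here.
    if (dir === "") continue;
    // Strip trailing slashes so "/usr/bin/" and "/usr/bin" behave the same.
    const candidate = `${dir.replace(/\/+$/, "")}/${command}`;
    if (isExecutable(candidate)) return candidate;
  }
  return null;
}

// Example with a stubbed executability check.
const onDisk = ["/usr/bin/ldd"];
const found = findOnPath("ldd", "/usr/local/bin:/usr/bin/:/bin", (p) => onDisk.indexOf(p) !== -1);
console.log(found); // /usr/bin/ldd
```

In Node, the predicate could wrap fs.accessSync(candidate, fs.constants.X_OK) in a try/catch, which under Deno should presumably need only read access rather than a run permission.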

Cannot filter for strings

Have you tried latest version of polars?

  • yes
  • no

What version of polars are you using?

[email protected]

What operating system are you using polars on?

Linux 5.4.0-132-generic #148-Ubuntu

What node version are you using

node --version
v16.13.2

Describe your bug.

Cannot filter for string values

What are the steps to reproduce the behavior?

> df = pl.DataFrame({"foo": ["a", "b", "c"]})
Proxy [
  shape: (3, 1)
  ┌─────┐
  │ foo │
  │ --- │
  │ str │
  ╞═════╡
  │ a   │
  ├╌╌╌╌╌┤
  │ b   │
  ├╌╌╌╌╌┤
  │ c   │
  └─────┘,
  {
    get: [Function: get],
    set: [Function: set],
    has: [Function: has],
    ownKeys: [Function: ownKeys],
    getOwnPropertyDescriptor: [Function: getOwnPropertyDescriptor]
  }
]
> df.filter(pl.col("foo").eq("b"))
Uncaught Error: Not found: b
    at Object.collectSync (/home/user/dev/workspaces/node_modules/nodejs-polars/bin/lazy/dataframe.js:62:53)
    at Proxy.filter (/home/user/dev/workspaces/node_modules/nodejs-polars/bin/dataframe.js:155:18) {
  code: 'GenericFailure'
}

What is the actual behavior?

The filter fails

What is the expected behavior?

The filter should succeed

Additional Info

You can get around this error by placing the string value in a series

> df = pl.DataFrame({"foo": ["a", "b", "c"]})
Proxy [
  shape: (3, 1)
  ┌─────┐
  │ foo │
  │ --- │
  │ str │
  ╞═════╡
  │ a   │
  ├╌╌╌╌╌┤
  │ b   │
  ├╌╌╌╌╌┤
  │ c   │
  └─────┘,
  {
    get: [Function: get],
    set: [Function: set],
    has: [Function: has],
    ownKeys: [Function: ownKeys],
    getOwnPropertyDescriptor: [Function: getOwnPropertyDescriptor]
  }
]
> df.filter(pl.col("foo").eq(pl.Series(["b"])))
Proxy [
  shape: (1, 1)
  ┌─────┐
  │ foo │
  │ --- │
  │ str │
  ╞═════╡
  │ b   │
  └─────┘,
  {
    get: [Function: get],
    set: [Function: set],
    has: [Function: has],
    ownKeys: [Function: ownKeys],
    getOwnPropertyDescriptor: [Function: getOwnPropertyDescriptor]
  }
]

any/any_horizontal

Hey there,

I'm trying to filter on multiple columns, any of which matching should satisfy my filter.
Expanding from one to multiple columns gave me the awesome error message
'This is ambiguous. Try to combine the predicates with the 'all_horizontal' or `any_horizontal' expression.'

Alas, I can't find any_horizontal, all_horizontal, or even any in nodejs-polars.

How I would have expected this to work:

let filtered = uq.filter(
  pl.any_horizontal(
    pl.col(index_columns).cast(pl.Utf8).str.toLowerCase().str.contains(q),
  ),
);

Is it not implemented (yet), or am I missing it in the docs?
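The requested any_horizontal combines per-column predicates with a row-wise OR: a row is kept if at least one column matches. A plain TypeScript sketch of that semantics (anyHorizontal here is a hypothetical helper operating on boolean arrays, not a nodejs-polars function):

```typescript
// Hypothetical helper: row-wise OR over per-column boolean results, i.e. a
// row passes when at least one column's predicate is true for it.
export function anyHorizontal(predicates: boolean[][]): boolean[] {
  const height = predicates.length > 0 ? predicates[0].length : 0;
  const result: boolean[] = [];
  for (let row = 0; row < height; row++) {
    result.push(predicates.some((column) => column[row]));
  }
  return result;
}

// One boolean array per searched column, e.g. "does column X contain q?".
const matches = anyHorizontal([
  [true, false, false],
  [false, false, true],
]);
console.log(matches); // [ true, false, true ]
```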

Cannot find module 'nodejs-polars-linux-arm64-musl'

Have you tried latest version of polars?

  • [yes]

What version of polars are you using?

0.7.2

What operating system are you using polars on?

Ubuntu 22.04.1 LTS (GNU/Linux 5.15.0-1027 aarch64)

What node version are you using

node 16.19.0

Describe your bug.

I get a "Cannot find module 'nodejs-polars-linux-arm64-musl'" error.

Error using readRecords to read a database query.

I'm trying to migrate an Express API to nodejs-polars. However, I'm struggling to read data from a MySQL database. Python has the read_sql function, but it does not exist in the Node.js version.

I tried using the mysql library to query the database and then converting the result to polars with readRecords, but it is not working. It throws this error:

thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', src\dataframe.rs:1527:29
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

If I reduce the query to a single column, it works, so I guess it's a data-format problem. How can I figure out where the problem is, short of checking column by column and possibly row by row?

Thank you!
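One way to narrow this down without bisecting by hand is to profile the rows returned by the mysql driver before handing them to readRecords. The profile lists the runtime kinds of value seen in each column and flags columns that mix kinds or are missing from some rows. This is a plain TypeScript sketch. profileColumns, suspiciousColumns, and kindOf are hypothetical helpers, and it is only an assumption that mixed or missing values are what trigger the panic. Columns holding Date objects or Buffers are also worth eyeballing in the printed profile, since the mysql driver returns those for DATETIME and BLOB columns by default.

```typescript
type Row = Record<string, unknown>;

// Name a value's runtime kind more precisely than `typeof` does.
function kindOf(value: unknown): string {
  if (value === null) return "null";
  if (value === undefined) return "undefined";
  if (value instanceof Date) return "Date";
  if (value instanceof Uint8Array) return "Uint8Array"; // Node Buffers are Uint8Arrays
  if (Array.isArray(value)) return "array";
  return typeof value;
}

// For every column seen in any row, collect the kinds of value it holds;
// a column absent from a row is recorded as "missing".
function profileColumns(rows: Row[]): Map<string, Set<string>> {
  const kinds = new Map<string, Set<string>>();
  // First pass: register every column name that appears in any row.
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (!kinds.has(key)) kinds.set(key, new Set<string>());
    }
  }
  // Second pass: record what each row holds for each column.
  for (const row of rows) {
    kinds.forEach((seen, key) => {
      seen.add(key in row ? kindOf(row[key]) : "missing");
    });
  }
  return kinds;
}

// Flag columns with more than one non-null kind (e.g. number and string,
// or string and "missing"), or any explicit undefined. Nulls alone are fine.
function suspiciousColumns(rows: Row[]): string[] {
  const flagged: string[] = [];
  profileColumns(rows).forEach((seen, key) => {
    const nonNull = Array.from(seen).filter((k) => k !== "null");
    if (nonNull.length > 1 || seen.has("undefined")) flagged.push(key);
  });
  return flagged;
}

const sample: Row[] = [
  { id: 1, price: 9.5, note: "a" },
  { id: 2, price: "10.00", note: null },
  { id: 3, price: 7 },
];
console.log(profileColumns(sample));
console.log(suspiciousColumns(sample)); // [ 'price', 'note' ]
```

Running this on the full query result, then on progressively smaller column sets, should point at the offending column much faster than manual inspection.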

Add types to the options in io.ts

Most functions in io.ts have no typings for their options parameter, even though the options are documented in their JSDoc. Adding an options type/interface for each would give us IntelliSense.

Node Webpack "Module not found" error

Have you tried latest version of polars?

  • [yes]

What version of polars are you using?

nodejs-polars version 0.6.0

What operating system are you using polars on?

Windows 11

What node version are you using

node v16.16.0

Describe your bug.

Compiling with npm produces the log attached as log.txt.

What are the steps to reproduce the behavior?

Install polars with npm i -s nodejs-polars in a project with webpack and run it.

What is the actual behavior?

The actual behaviour is reported in the log.

What is the expected behavior?

The compilation runs smoothly.

concat LazyDataFrames?

Describe your feature request

Is there a version of concat available for LazyDataFrames? E.g., is it possible to do something like pl.concat([lazy1, lazy2], {how: "vertical"})? If not, what would it take to add it?

Regression: Property 'when' does not exist on type 'typeof pl'

Have you tried latest version of polars?

  • yes

What version of polars are you using?

0.7.3

What operating system are you using polars on?

Linux amd x64 (Debian 12)

What node version are you using

node 18.13.0

Describe your bug.

Awesome to see a new release (I've been working around some of the open issues and missing features). It looks like 0.7.3 introduced a regression compared with 0.7.2 (perhaps intentional): the when function no longer exists as a property of the default export.

What are the steps to reproduce the behavior?

This was caught by my CI trying to compile my typescript on the upgrade PR.

Example

import pl, { LazyDataFrame } from "nodejs-polars";

function transform(data: LazyDataFrame): LazyDataFrame {
  return data.withColumn(
    pl
      .when(pl.lit(true))
      .then(pl.lit("Positive"))
      .otherwise(pl.lit("Negative"))
      .alias("someCol")
  );
}

What is the actual behavior?

error TS2339: Property 'when' does not exist on type 'typeof pl'.

It can be worked around by importing when directly.

What is the expected behavior?

It works as it did previously.

What do you think polars should have done?

Either listed this change in the changelog (if intentional) or not introduced the regression.

feature: add `Binary` dtype

Have you tried latest version of polars?

  • yes

What version of polars are you using?

0.7.4

What operating system are you using polars on?

Mac OSX 13.2.1 and Amazon Linux (AWS Lambda)

What node version are you using

node v16.19.1

Describe your bug.

The list of data types is incomplete: the Python version has a Binary type.

What are the steps to reproduce the behavior?

import * as pl from 'nodejs-polars';

pl.Series('bytes', Uint8Array.from([]), pl.DataType.Binary) // Does not compile

What is the actual behavior?

A Uint8Array is cast to List(Uint8), which is not the same thing as a binary column in Parquet and Arrow. We're trying to read the Parquet files in Snowflake, but the binary columns end up as empty objects because Snowflake is confused by this List type.

What is the expected behavior?

A Binary type that adheres to Arrow and Parquet specs.

I/O compatibility with Bun.js

What version of polars are you using?

0.7.2

What operating system are you using polars on?

Mac M1

What node version are you using

v16.18.0

However, this is a request/question about bun.js compatibility. My bun.js version is v0.5.7.

Describe your bug.

When I use bun.js instead of Node.js, any I/O function such as pl.readCSV() or pl.scanCSV() makes bun.js crash with this error:

bun index.js
125 |     parseDates: false,
126 |     skipRowsAfterHeader: 0
127 | };
128 | function scanCSV(path, options) {
129 |     options = { ...scanCsvDefaultOptions, ...options };
130 |     return (0, dataframe_2._LazyDataFrame)(polars_internal_1.default.scanCsv(path, options));
                                              ^
error: InvalidArg
      at scanCSV (/Users/home/Documents/js_backtest_engine/node_modules/nodejs-polars/bin/io.js:130:43)
      at /Users/home/Documents/js_backtest_engine/index.js:32:9

What are the steps to reproduce the behavior?

import pl from "nodejs-polars"

let df = pl.readCSV('data.csv')

I understand the library is called nodejs-polars and that bun.js isn't even on the roadmap (nor, I guess, should it be), so this is more a bun/Node compatibility issue than a problem with the Polars bindings themselves. That said, the package aims to extract the most performance from JS in general, and I can see bun is consistently 2x-10x faster than Node in almost every situation, including when manipulating DataFrames with nodejs-polars. So I thought I'd ask whether anyone knows why this error appears and whether anything could make the I/O functions work with bun too.

Note that, as I mentioned, DataFrame and Series work like a charm with bun, so the problem is specific to I/O. For reference, in bun I create a stream with const stream = Bun.file(this.filePath).stream(); instead of const stream = fs.createReadStream(this.filePath); as in Node. I'm posting this in case a quick fix or workaround can be found.

Better Contribution Guideline

The nodejs-polars contributing guideline currently refers to the main Polars contributing guideline, which in turn refers back to the nodejs-polars one. How do we actually contribute to the codebase?

An issue running pola-rs

Have you tried latest version of polars?

  • [yes]

What version of polars are you using?

^0.7.4

What operating system are you using polars on?

Windows 11

What node version are you using

v18.8.0

Describe your bug.

I cannot run the library because of an esbuild error.

What is the actual behavior?

No loader is configured for ".node" files: node_modules/nodejs-polars-win32-x64-msvc/nodejs-polars.win32-x64-msvc.node

    node_modules/nodejs-polars/bin/native-polars.js:68:48:
      68 │                         nativeBinding = require('nodejs-polars-win32-x64-msvc');

I should add that the project I'm building is based on Vite + React.

pl.readRecords(rows, {schema}) crashes when a schema is specified

Have you tried latest version of polars?

[yes]

What version of polars are you using?

0.7.3

What operating system are you using polars on?

Mac M1 on Ventura 13.2.1

What node version are you using

node v19.2.0

Describe your bug.

The .readRecords() method crashes when called with a schema.

What are the steps to reproduce the behavior?

const pl = require('nodejs-polars');

const rows = [
  { num: 1, date: "foo", string: "foo1" },
  { num: 1, date: "foo" },
];

const schema = {
  num: pl.Int32,
  date: pl.Utf8,
  string: pl.Utf8,
};

const df = pl.readRecords(rows, { schema });

What is the actual behavior?

thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', src/conversion.rs:613:37
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5

What is the expected behavior?

The code above is copied from the test suite, so the expected behavior is what the full test asserts:

test("from row objects, with schema", () => {
  const rows = [
    { num: 1, date: "foo", string: "foo1" },
    { num: 1, date: "foo" },
  ];

  const expected = [
    { num: 1, date: rows[0].date.toString(), string: "foo1" },
    { num: 1, date: rows[1].date.toString(), string: null },
  ];

  const schema = {
    num: pl.Int32,
    date: pl.Utf8,
    string: pl.Utf8,
  };
  const df = pl.readRecords(rows, { schema });
  expect(df.toRecords()).toEqual(expected);
  expect(df.schema).toEqual(schema);
});
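The only difference between the two rows is that the second one has no string key. One thing worth trying, as an unverified workaround, is to make every record carry every schema key before calling readRecords, with absent values set to null. The assumption here is that the panic at conversion.rs:613 comes from looking up a key that is missing from a record. fillMissingKeys is a hypothetical helper, not part of nodejs-polars.

```typescript
type Row = Record<string, unknown>;

// Hypothetical helper: return copies of `rows` in which every key in `keys`
// is present, using null where a row lacks the key (or holds undefined).
// Keys not listed in `keys` are dropped from the copies.
function fillMissingKeys(rows: Row[], keys: string[]): Row[] {
  return rows.map((row) => {
    const filled: Row = {};
    for (const key of keys) {
      const value = row[key];
      filled[key] = value === undefined ? null : value;
    }
    return filled;
  });
}

const rows: Row[] = [
  { num: 1, date: "foo", string: "foo1" },
  { num: 1, date: "foo" },
];

// The second record now has string: null instead of no string key at all.
console.log(fillMissingKeys(rows, ["num", "date", "string"]));
```

With that in place, the call would become pl.readRecords(fillMissingKeys(rows, Object.keys(schema)), { schema }). Whether 0.7.3 then returns the expected frame has not been verified here.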

Can I use nodejs-polars in the browser too?

Hi,

I would love to use the library in the frontend as well, but I get "module not found" messages for non-browser modules like fs, path, and so on.

Is it possible to configure/import the library in a web application and run it in the browser?

Kind Regards,
Oliver

pl.Datetime only accessible via pl.DataType.Datetime with "module": "es6" and "type": "module"

Have you tried latest version of polars?

  • [yes]

What version of polars are you using?

0.7.3

What operating system are you using polars on?

5.15.94-1-MANJARO

What node version are you using

node 18

Describe your bug.

If package.json sets "type": "module", as here:

{
  "name": "fresh-ts-project",
  "version": "1.0.0",
  "main": "index.js",
  "license": "MIT",
  "type": "module",
  "dependencies": {
    "@types/node": "^18.15.11",
    "nodejs-polars": "^0.7.3",
    "typescript": "^5.0.2"
  }
}

with this tsconfig.json:

{
  "compilerOptions": {
    "module": "es6",
    "moduleResolution": "node"
  },
  "include": [
    "main.ts"
  ]
}

and this main.ts:

import pl from 'nodejs-polars'
console.log('a', pl.Datetime)
// @ts-ignore
console.log('b', pl.DataType.Datetime)

it prints

tsc && node main.js 
a undefined
b [Function: Datetime]

I don't remember exactly why we have those two tsconfig values and "type": "module", but it's usually because imports of other packages break without them.

Furthermore, TypeScript reports pl.DataType.Datetime as an error unless the package is imported via import * as pl from 'nodejs-polars';.

Add information in docs regarding status/maturity/future?

Hi, this looks interesting!

I wonder how feature-complete this is (are things missing in the Node API compared with Rust/Python?), whether it is production-ready, and how committed you are to supporting Node.js going forward (or whether this was more of a "proof of concept"). I'm asking after reading this and the following:

"Releases happen quite often (weekly / every few days) at the moment, so updating polars regularly to get the latest bugfixes / features might not be a bad idea."

The README.md links to Node documentation that returns a 404. The general user guide mentions JavaScript here and there, but the examples are in Rust/Python.

I totally get that this is a new endeavour and things take time; I just want to assess whether and/or when this might be ready for use and what I can expect going forward.

I hope this question is OK; I doubt I'm the only one with these questions when discovering this repository for the first time. Maybe add a few answers to questions like these to the README?

Join options semi and anti are not accepted for lazy dataframes

Have you tried latest version of polars?

  • [yes]

What version of polars are you using?

0.7.2

What operating system are you using polars on?

Windows

What node version are you using

v18.12.1

Describe your bug.

When I use either "anti" or "semi" for the how option, I get a TypeScript error; the full compiler output is reproduced under "What is the actual behavior?".

What are the steps to reproduce the behavior?

import pl from "nodejs-polars";

const df = pl
  .DataFrame({
    foo: [1, 2, 3],
    bar: [6.0, 7.0, 8.0],
    ham: ["a", "b", "c"],
  })
  .lazy();

const otherDF = pl
  .DataFrame({
    apple: ["x", "y", "z"],
    ham2: ["a", "b", "d"],
  })
  .lazy();

const result = await df.join(otherDF, { leftOn: "ham", rightOn: 'ham2', how: "semi" }).collect();

What is the actual behavior?

Overload 1 of 3, '(other: LazyDataFrame, joinOptions: { on: ValueOrArray<string | Expr>; } & LazyJoinOptions): LazyDataFrame', gave the following error.
Type '"semi"' is not assignable to type '"left" | "inner" | "outer" | "cross" | undefined'.
Overload 2 of 3, '(other: LazyDataFrame, joinOptions: { leftOn: ValueOrArray<string | Expr>; rightOn: ValueOrArray<string | Expr>; } & LazyJoinOptions): LazyDataFrame', gave the following error.
Type '"semi"' is not assignable to type '"left" | "inner" | "outer" | "cross" | undefined'.
Overload 3 of 3, '(other: LazyDataFrame, options: { how: "cross"; suffix?: string | undefined; allowParallel?: boolean | undefined; forceParallel?: boolean | undefined; }): LazyDataFrame', gave the following error.
Type '"semi"' is not assignable to type '"cross"'.
18 const result = await df.join(otherDF, { leftOn: "ham", rightOn: 'ham2', how: "semi" }).collect();

What is the expected behavior?

A successful join
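For reference, it helps to spell out what the semi and anti joins in the example should produce. The sketch below states the semantics in plain TypeScript over row objects; it is not how Polars implements joins. A semi join keeps the left rows whose key has a match on the right, without adding any right-hand columns, and an anti join keeps the left rows without a match. semiJoin, antiJoin, and rightKeys are hypothetical helpers. Null keys are treated as never matching, which we believe is Polars' default but have not checked for 0.7.2.

```typescript
type Row = Record<string, unknown>;

// Collect the non-null join keys present on the right-hand side.
function rightKeys(right: Row[], rightOn: string): Set<unknown> {
  const keys = new Set<unknown>();
  for (const row of right) {
    const key = row[rightOn];
    if (key !== null && key !== undefined) keys.add(key);
  }
  return keys;
}

// Semi join: left rows whose key appears on the right (left columns only).
function semiJoin(left: Row[], right: Row[], leftOn: string, rightOn: string): Row[] {
  const keys = rightKeys(right, rightOn);
  return left.filter((row) => keys.has(row[leftOn]));
}

// Anti join: left rows whose key does NOT appear on the right.
function antiJoin(left: Row[], right: Row[], leftOn: string, rightOn: string): Row[] {
  const keys = rightKeys(right, rightOn);
  return left.filter((row) => !keys.has(row[leftOn]));
}

// Same data as the reproduction above, as row objects.
const df: Row[] = [
  { foo: 1, bar: 6.0, ham: "a" },
  { foo: 2, bar: 7.0, ham: "b" },
  { foo: 3, bar: 8.0, ham: "c" },
];
const otherDF: Row[] = [
  { apple: "x", ham2: "a" },
  { apple: "y", ham2: "b" },
  { apple: "z", ham2: "d" },
];

console.log(semiJoin(df, otherDF, "ham", "ham2").map((r) => r.foo)); // [ 1, 2 ]
console.log(antiJoin(df, otherDF, "ham", "ham2").map((r) => r.foo)); // [ 3 ]
```

Together the two results partition the left frame: every left row lands in exactly one of them, which is a handy sanity check once the typings accept "semi" and "anti".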
