
qv's Introduction

qv — quickly view your data.

qv's People

Contributors

alippai · chapeupreto · github-actions[bot] · renovate-bot · renovate[bot] · timvw


qv's Issues

Support partitioned parquet dataset

File location patterns like mydata/*.parquet and mydata/partition=01/*.parquet are common storage layouts. Would it be possible to support reading mydata directly in these cases? In theory DataFusion should already have some support for this.
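
DataFusion does expose this through its listing tables. Below is a minimal sketch of what reading mydata/ as a single partitioned table could look like; the partition column name "partition" and its Utf8 type are assumptions about the layout, not anything qv does today.

use std::sync::Arc;

use datafusion::arrow::datatypes::DataType;
use datafusion::datasource::file_format::parquet::ParquetFormat;
use datafusion::datasource::listing::ListingOptions;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new();

    // Treat mydata/ as a listing table: every *.parquet file below it is part
    // of the table, and the hive-style "partition=01" directory level becomes
    // a regular column named "partition".
    let options = ListingOptions::new(Arc::new(ParquetFormat::default()))
        .with_file_extension(".parquet")
        .with_table_partition_cols(vec![("partition".to_string(), DataType::Utf8)]);

    ctx.register_listing_table("mydata", "mydata/", options, None, None)
        .await?;

    ctx.sql("SELECT * FROM mydata LIMIT 10").await?.show().await?;
    Ok(())
}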

Will this work on single files without deltalake?

Hi,
When I try to view the schema of a CSV file, I get an error about a _delta_log path. Is this normal?

∴ qv -s  ./fixtures/good/usage_data.csv 
Error: ObjectStore(Generic { store: "LocalFileSystem", source: UnableToWalkDir { source: Error { depth: 0, inner: Io { path: Some("/home/guda/projects/toki/invoicing/fixtures/good/usage_data.csv/_delta_log"), err: Os { code: 20, kind: NotADirectory, message: "Not a directory" } } } } })

 ∴ qv -V
qv 0.3.1
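
For what it's worth, the error suggests the CSV path is being probed as a Delta table (hence the walk into usage_data.csv/_delta_log). A hypothetical guard that only takes the Delta code path when a _delta_log directory actually exists might look like the sketch below; is_delta_table is an illustrative helper, not qv's actual logic.

use std::path::Path;

/// Hypothetical helper: decide whether a local path should be opened as a
/// Delta table (it contains a _delta_log directory) or as a plain file.
fn is_delta_table(path: &Path) -> bool {
    path.is_dir() && path.join("_delta_log").is_dir()
}

fn main() {
    let path = Path::new("./fixtures/good/usage_data.csv");
    if is_delta_table(path) {
        println!("open {} as a Delta table", path.display());
    } else {
        println!("open {} as a plain CSV/Parquet file", path.display());
    }
}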

Problem with special characters in file path: "No such file or directory"

Hi, I tried your tool but it does not work with special characters in the path:

# qv "/tmp/test@dir#with_special characters.parquet"
Error: ObjectStore(NotFound { path: "/tmp/test@dir%23with_special%20characters.parquet", source: Os { code: 2, kind: NotFound, message: "No such file or directory" } })
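
The %23/%20 escapes in the NotFound path hint that the file path is turned into a file:// URL somewhere, and the percent-encoded form is then used as the on-disk name. The sketch below uses the url crate to show where that encoding appears and how to_file_path() recovers the original name; this is a guess at the cause, not a confirmed trace through qv.

use url::Url;

fn main() {
    let raw = "/tmp/test@dir#with_special characters.parquet";

    // Converting the path into a file:// URL percent-encodes '#' and ' ':
    let url = Url::from_file_path(raw).expect("absolute path");
    assert_eq!(
        url.as_str(),
        "file:///tmp/test@dir%23with_special%20characters.parquet"
    );

    // Using url.path() directly as a filesystem path keeps the %23/%20
    // escapes and no longer matches the file on disk (the NotFound above).
    // to_file_path() decodes the escapes back to the original name:
    assert_eq!(url.to_file_path().unwrap().to_str().unwrap(), raw);
}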

Latest version published to crates.io is out of date

I noticed the --profile flag wasn't working, and then realized I was using an out-of-date version of qv installed via cargo install qv.

❯ cargo search qv
qv = "0.1.22"               # quickly view your data
❯ cargo install qv
    Updating crates.io index
     Ignored package `qv v0.1.22` is already installed, use --force to override

It looks like the latest version at https://crates.io/crates/qv/versions is 0.1.22. Would it be possible to publish the more recent versions to crates.io as well?

Support for Google Cloud Storage blobs

Can we extend support to Google Cloud Storage as well?

It would be very nice to have this API generalised across cloud storage providers, with a simple interface for common use cases like fetch, get, etc.

It's really a great tool with a clear use case (PS: data engineering).

Thanks very much for putting in the effort.
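
Since the object_store crate qv already depends on ships a GCS backend (behind its "gcp" feature), a rough sketch of wiring it into a DataFusion session could look like the following. The bucket name, credentials path and object key are placeholders, and this is not how qv is actually structured.

use std::sync::Arc;

use datafusion::prelude::*;
use object_store::gcp::GoogleCloudStorageBuilder;
use url::Url;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // Build a GCS object store for one bucket (placeholder names).
    let gcs = GoogleCloudStorageBuilder::new()
        .with_bucket_name("my-bucket")
        .with_service_account_path("/path/to/key.json")
        .build()
        .expect("failed to build GCS store");

    let ctx = SessionContext::new();
    // Route gs://my-bucket/... requests to the store we just built.
    let url = Url::parse("gs://my-bucket").unwrap();
    ctx.register_object_store(&url, Arc::new(gcs));

    let df = ctx
        .read_parquet("gs://my-bucket/data/file.parquet", ParquetReadOptions::default())
        .await?;
    df.show_limit(10).await?;
    Ok(())
}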

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Detected dependencies

cargo
Cargo.toml
  • aws-config 1.2.1
  • aws-sdk-glue 1.27
  • aws-types 1.2
  • aws-credential-types 1.2
  • chrono 0.4.38
  • clap 4.5.4
  • datafusion 35
  • deltalake 0.17
  • futures 0.3
  • glob 0.3
  • object_store 0.9
  • regex 1.10
  • tokio 1
  • url 2.5
  • assert_cmd 2.0.14
  • predicates 3.1
dockerfile
Dockerfile
  • rust 1.77
github-actions
.github/workflows/binaries.yml
  • actions/checkout v4
  • taiki-e/setup-cross-toolchain-action v1
  • taiki-e/upload-rust-binary-action v1
.github/workflows/release-plz.yml
  • actions/checkout v4
  • MarcoIeni/release-plz-action v0.5
.github/workflows/test_suite.yml
  • actions/checkout v4
  • actions-rust-lang/setup-rust-toolchain v1
  • taiki-e/install-action v2
  • mikepenz/action-junit-report v4
  • codecov/codecov-action v4
  • actions/checkout v4
  • actions-rust-lang/setup-rust-toolchain v1
  • actions-rust-lang/rustfmt v1
  • actions/checkout v4
  • actions-rust-lang/setup-rust-toolchain v1

  • Check this box to trigger a request for Renovate to run again on this repository

Support for saving dataset

Is it in scope for this tool to support saving datasets, or do you want to keep it purely as a viewing tool? It would be useful, for instance, for converting formats, quickly filtering data out of a CSV, and so on. I know the DataFusion CLI exists for that, but a simple tool like AWK with a friendlier syntax would be welcome.
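
For context, DataFusion's DataFrame API can already write results back out, so a saving feature could be a thin layer over something like the sketch below. The usage table, the amount column and the output path are made-up examples, and the exact type of write_parquet's options argument varies between DataFusion versions.

use datafusion::dataframe::DataFrameWriteOptions;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new();

    // Read a CSV, filter it with SQL, and write the result back out as Parquet.
    // "usage_data.csv" and the "amount" column are illustrative only.
    ctx.register_csv("usage", "usage_data.csv", CsvReadOptions::new())
        .await?;
    let df = ctx.sql("SELECT * FROM usage WHERE amount > 0").await?;

    df.write_parquet("usage_filtered.parquet", DataFrameWriteOptions::new(), None)
        .await?;
    Ok(())
}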

Slow startup time

It might be because of the high number of CPUs (if Arrow starts a thread-per-core thread pool), but reading a 1 MB Parquet file (with limits) takes 3 s.
When running qv table.parquet, likely only one thread/CPU is needed (or one per column at most), since in theory we are only reading a single batch (a few rows)?!
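
If the thread-pool theory is right, the fix would be along the lines of capping both the Tokio worker threads and DataFusion's target partitions. A hedged sketch, untested against the 3 s startup reported here:

use datafusion::prelude::*;

fn main() -> datafusion::error::Result<()> {
    // A single-threaded Tokio runtime instead of one worker per core.
    let runtime = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(1)
        .enable_all()
        .build()
        .expect("failed to build runtime");

    runtime.block_on(async {
        // Ask DataFusion for a single partition as well, since a LIMIT over a
        // small Parquet file does not benefit from per-core parallelism.
        let config = SessionConfig::new().with_target_partitions(1);
        let ctx = SessionContext::new_with_config(config);

        let df = ctx
            .read_parquet("table.parquet", ParquetReadOptions::default())
            .await?
            .limit(0, Some(10))?;
        df.show().await?;
        Ok(())
    })
}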

Failed to map column projection: incompatible data types, list field element vs item

I have a table that reads correctly using Spark + the Delta Lake libraries, but I'm having trouble reading it via qv.

Do you know which downstream dependency could be giving me this error?

Error: ArrowError(ExternalError(Execution("Failed to map column projection for field mycolumn. Incompatible data types List(Field { name: "element", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }) and List(Field { name: "item", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None })")))

I checked the schema from the delta transaction log and didn't see a hardcoded item or element:

❯ aws s3 cp s3://mybucket/year=2022/month=6/day=9/myprefix/_delta_log/00000000000000000000.json - | head -n 3 | tail -n 1 | jq '.metaData.schemaString | fromjson | .fields[] | select(.name == "mycolumn")'
{
  "name": "mycolumn",
  "type": {
    "type": "array",
    "elementType": "string",
    "containsNull": true
  },
  "nullable": true,
  "metadata": {}
}

When I look at the schema of a sample Parquet file on S3, I do indeed see that the item in the list is called element:

pqrs schema =(s5cmd cat s3://mybucket/year=2022/month=6/day=9/myprefix/_partition=00001/part-00037-cb2e71c3-4f26-4de0-9e9a-18298489ccdc.c000.snappy.parquet)

...
message spark_schema {
  ...
  OPTIONAL group mycolumn (LIST) {
    REPEATED group list {
      OPTIONAL BYTE_ARRAY element (UTF8);
    }
  }
  ...
}

I see this exact error is from here: https://github.com/apache/arrow-datafusion/blob/aad82fbb32dc1bb4d03e8b36297f8c9a3148df89/datafusion/core/src/physical_plan/file_format/mod.rs#L253

And I also see that element is hardcoded in delta-rs here:

https://github.com/delta-io/delta-rs/blob/83b8296fa5d55ebe050b022ed583dc57152221fe/rust/src/delta_arrow.rs#L38-L48 (pr: delta-io/delta-rs#228)

But I can't seem to find where the schema mismatch is coming from.
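
To make the mismatch concrete: the two Arrow types in the error differ only in the name of the inner list field, and that alone is enough for a strict schema equality check to reject the projection. A tiny illustration, not taken from either codebase:

use std::sync::Arc;

use datafusion::arrow::datatypes::{DataType, Field};

fn main() {
    // One side of the comparison names the inner list field "element",
    // matching the Spark-written Parquet schema shown above ...
    let list_with_element =
        DataType::List(Arc::new(Field::new("element", DataType::Utf8, true)));

    // ... while the other side ends up with Arrow's default name "item".
    let list_with_item =
        DataType::List(Arc::new(Field::new("item", DataType::Utf8, true)));

    // The element types and nullability are identical; only the field name
    // differs, yet the two DataTypes compare as unequal.
    assert_ne!(list_with_element, list_with_item);
    println!("{list_with_element:?} != {list_with_item:?}");
}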
