
skale-me / node-parquet


NodeJS module to access apache parquet format files

License: Apache License 2.0

Python 4.51% Shell 1.80% JavaScript 40.86% C++ 52.83%
node-parquet nodejs parquet skale-engine

node-parquet's People

Contributors: danielsan, mvertes, spinningarrow, terebentina


node-parquet's Issues

Does not build on macOS Sierra

I'm having trouble building node-parquet on my MacOS system:

Darwin Apollo.local 16.7.0 Darwin Kernel Version 16.7.0: Thu Jun 15 17:36:27 PDT 2017; root:xnu-3789.70.16~2/RELEASE_X86_64 x86_64

Everything appears to build fine until the node-gyp rebuild step:

> node-gyp rebuild

  CXX(target) Release/obj.target/parquet/src/parquet_binding.o
In file included from ../src/parquet_binding.cc:3:
In file included from ../src/parquet_reader.h:8:
In file included from ../deps/parquet-cpp/src/parquet/api/reader.h:22:
../deps/parquet-cpp/src/parquet/column_reader.h:22:10: fatal error: 'cstdint' file not found
#include <cstdint>
         ^
1 error generated.
make: *** [Release/obj.target/parquet/src/parquet_binding.o] Error 1
gyp ERR! build error 
gyp ERR! stack Error: `make` failed with exit code: 2
gyp ERR! stack     at ChildProcess.onExit (~/.nvm/versions/node/v6.11.0/lib/node_modules/npm/node_modules/node-gyp/lib/build.js:285:23)
gyp ERR! stack     at emitTwo (events.js:106:13)
gyp ERR! stack     at ChildProcess.emit (events.js:191:7)
gyp ERR! stack     at Process.ChildProcess._handle.onexit (internal/child_process.js:215:12)
gyp ERR! System Darwin 16.7.0
gyp ERR! command "~/.nvm/versions/node/v6.11.0/bin/node" "~/.nvm/versions/node/v6.11.0/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild"
gyp ERR! cwd node-parquet
gyp ERR! node -v v6.11.0
gyp ERR! node-gyp -v v3.6.0
gyp ERR! not ok 
npm ERR! code ELIFECYCLE
npm ERR! errno 1

I have g++, cmake, boost, and thrift all installed. I've even tried upgrading to newer versions of cmake and boost by building them from source, reinstalling packages, and everything else I could think of.

Stack Overflow suggests that the "cstdint" header lives in a "tr1" folder and proposes a solution: https://stackoverflow.com/questions/10116724/clang-os-x-lion-cannot-find-cstdint

However, the proposed solution doesn't work for me either.

Any help getting this to build would be greatly appreciated.

arrow/util/bit-util.h: No such file or directory

Getting this on Fedora, with all requirements listed in the README installed:

  CXX(target) Release/obj.target/parquet/src/parquet_binding.o
In file included from ../deps/parquet-cpp/src/parquet/api/reader.h:22:0,
                 from ../src/parquet_reader.h:8,
                 from ../src/parquet_binding.cc:3:
../deps/parquet-cpp/src/parquet/column_reader.h:29:33: fatal error: arrow/util/bit-util.h: No such file or directory
 #include <arrow/util/bit-util.h>

Cannot install node-parquet

I ran "sudo apt-get install -y bison flex libssl-dev libboost-dev libboost-system-dev libboost-filesystem-dev libboost-regex-dev" before installing node-parquet, but the installation still fails at the preinstall step.
Please check the error logs.

npm install --save node-parquet

[email protected] preinstall /usr/local/globalcdn/playground/nodeParquet/node_modules/node-parquet
./build_parquet-cpp.sh

CMake Error at CMakeLists.txt:19 (cmake_minimum_required):
CMake 3.2.0 or higher is required. You are running version 2.8.12.2

-- Configuring incomplete, errors occurred!
npm WARN [email protected] No description
npm WARN [email protected] No repository field.

npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] preinstall: ./build_parquet-cpp.sh
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] preinstall script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR! /root/.npm/_logs/2018-05-30T06_53_50_947Z-debug.log

0 info it worked if it ends with ok
1 verbose cli [ '/usr/local/bin/node',
1 verbose cli '/usr/local/bin/npm',
1 verbose cli 'install',
1 verbose cli '--save',
1 verbose cli 'node-parquet' ]
2 info using [email protected]
3 info using [email protected]
4 verbose npm-session 008f3bb4f7d551b2
5 silly install loadCurrentTree
6 silly install readLocalPackageData
7 http fetch GET 304 https://registry.npmjs.org/node-parquet 817ms (from cache)
8 silly pacote tag manifest for node-parquet@latest fetched in 854ms
9 silly install loadIdealTree
10 silly install cloneCurrentTreeToIdealTree
11 silly install loadShrinkwrap
12 silly install loadAllDepsIntoIdealTree
13 silly resolveWithNewModule [email protected] checking installable status
14 http fetch GET 304 https://registry.npmjs.org/minimist 126ms (from cache)
15 http fetch GET 304 https://registry.npmjs.org/nan 127ms (from cache)
16 silly pacote range manifest for minimist@^1.2.0 fetched in 130ms
17 silly resolveWithNewModule [email protected] checking installable status
18 silly pacote range manifest for nan@^2.10.0 fetched in 131ms
19 silly resolveWithNewModule [email protected] checking installable status
20 http fetch GET 304 https://registry.npmjs.org/hexdump-nodejs 697ms (from cache)
21 silly pacote range manifest for hexdump-nodejs@^0.1.0 fetched in 700ms
22 silly resolveWithNewModule [email protected] checking installable status
23 silly currentTree [email protected]
24 silly idealTree [email protected]
24 silly idealTree ├── [email protected]
24 silly idealTree ├── [email protected]
24 silly idealTree ├─┬ [email protected]
24 silly idealTree │ └── [email protected]
24 silly idealTree └── [email protected]
25 silly install generateActionsToTake
26 silly diffTrees action count 5
27 silly diffTrees add [email protected]
28 silly diffTrees add [email protected]
29 silly diffTrees add [email protected]
30 silly diffTrees add [email protected]
31 silly diffTrees add [email protected]
32 silly decomposeActions action count 40
33 silly decomposeActions fetch [email protected]
34 silly decomposeActions extract [email protected]
35 silly decomposeActions preinstall [email protected]
36 silly decomposeActions build [email protected]
37 silly decomposeActions install [email protected]
38 silly decomposeActions postinstall [email protected]
39 silly decomposeActions finalize [email protected]
40 silly decomposeActions refresh-package-json [email protected]
41 silly decomposeActions fetch [email protected]
42 silly decomposeActions extract [email protected]
43 silly decomposeActions preinstall [email protected]
44 silly decomposeActions build [email protected]
45 silly decomposeActions install [email protected]
46 silly decomposeActions postinstall [email protected]
47 silly decomposeActions finalize [email protected]
48 silly decomposeActions refresh-package-json [email protected]
49 silly decomposeActions fetch [email protected]
50 silly decomposeActions extract [email protected]
51 silly decomposeActions preinstall [email protected]
52 silly decomposeActions build [email protected]
53 silly decomposeActions install [email protected]
54 silly decomposeActions postinstall [email protected]
55 silly decomposeActions finalize [email protected]
56 silly decomposeActions refresh-package-json [email protected]
57 silly decomposeActions fetch [email protected]
58 silly decomposeActions extract [email protected]
59 silly decomposeActions preinstall [email protected]
60 silly decomposeActions build [email protected]
61 silly decomposeActions install [email protected]
62 silly decomposeActions postinstall [email protected]
63 silly decomposeActions finalize [email protected]
64 silly decomposeActions refresh-package-json [email protected]
65 silly decomposeActions fetch [email protected]
66 silly decomposeActions extract [email protected]
67 silly decomposeActions preinstall [email protected]
68 silly decomposeActions build [email protected]
69 silly decomposeActions install [email protected]
70 silly decomposeActions postinstall [email protected]
71 silly decomposeActions finalize [email protected]
72 silly decomposeActions refresh-package-json [email protected]
73 silly install executeActions
74 silly doSerial global-install 40
75 verbose correctMkdir /root/.npm/_locks correctMkdir not in flight; initializing
76 verbose lock using /root/.npm/_locks/staging-846bdfdb6908b49a.lock for /usr/local/globalcdn/playground/nodeParquet/node_modules/.staging
77 silly doParallel extract 40
78 silly extract [email protected]
79 silly pacote trying hexdump-nodejs@https://registry.npmjs.org/hexdump-nodejs/-/hexdump-nodejs-0.1.0.tgz by hash: sha1-W2KB2R3YjHnfpRtC8I2sTML5rpI=
80 silly extract [email protected]
81 silly pacote trying minimist@https://registry.npmjs.org/minimist/-/minimist-1.2.0.tgz by hash: sha1-o1AIsg9BOD7sH7kU9M1d95omQoQ=
82 silly extract [email protected]
83 silly pacote trying nan@https://registry.npmjs.org/nan/-/nan-2.10.0.tgz by hash: sha512-bAdJv7fBLhWC+/Bls0Oza+mvTaNQtP+1RyhhhvD95pgUJz6XM5IzgmxOkItJ9tkoCiplvAnXI1tNmmUD/eScyA==
84 silly extract [email protected]
85 silly pacote trying node-parquet@https://registry.npmjs.org/node-parquet/-/node-parquet-0.2.7.tgz by hash: sha512-m9OySE3WfBgkTQ+lH8SC9cbrmBPgBSbGSG9hhrQACaqnyQFXJXuutqEeCIxo/2We5iuguCFsfpqqnjfCvPxGMg==
86 silly extract [email protected]
87 silly pacote trying varint@https://registry.npmjs.org/varint/-/varint-5.0.0.tgz by hash: sha1-2Ca4n3SQcy+rwMDtaT7Uddyynr8=
88 silly pacote hexdump-nodejs@https://registry.npmjs.org/hexdump-nodejs/-/hexdump-nodejs-0.1.0.tgz extracted to /usr/local/globalcdn/playground/nodeParquet/node_modules/.staging/hexdump-nodejs-1072ae2d by content address 113ms
89 silly pacote varint@https://registry.npmjs.org/varint/-/varint-5.0.0.tgz extracted to /usr/local/globalcdn/playground/nodeParquet/node_modules/.staging/varint-4786d7ab by content address 116ms
90 silly pacote minimist@https://registry.npmjs.org/minimist/-/minimist-1.2.0.tgz extracted to /usr/local/globalcdn/playground/nodeParquet/node_modules/.staging/minimist-1906643f by content address 128ms
91 silly pacote nan@https://registry.npmjs.org/nan/-/nan-2.10.0.tgz extracted to /usr/local/globalcdn/playground/nodeParquet/node_modules/.staging/nan-85e1df4c by content address 142ms
92 silly pacote node-parquet@https://registry.npmjs.org/node-parquet/-/node-parquet-0.2.7.tgz extracted to /usr/local/globalcdn/playground/nodeParquet/node_modules/.staging/node-parquet-76a6ccb4 by content address 232ms
93 silly doReverseSerial unbuild 40
94 silly doSerial remove 40
95 silly doSerial move 40
96 silly doSerial finalize 40
97 silly finalize /usr/local/globalcdn/playground/nodeParquet/node_modules/hexdump-nodejs
98 silly finalize /usr/local/globalcdn/playground/nodeParquet/node_modules/minimist
99 silly finalize /usr/local/globalcdn/playground/nodeParquet/node_modules/node-parquet/node_modules/nan
100 silly finalize /usr/local/globalcdn/playground/nodeParquet/node_modules/varint
101 silly finalize /usr/local/globalcdn/playground/nodeParquet/node_modules/node-parquet
102 silly doParallel refresh-package-json 40
103 silly refresh-package-json /usr/local/globalcdn/playground/nodeParquet/node_modules/hexdump-nodejs
104 silly refresh-package-json /usr/local/globalcdn/playground/nodeParquet/node_modules/minimist
105 silly refresh-package-json /usr/local/globalcdn/playground/nodeParquet/node_modules/node-parquet/node_modules/nan
106 silly refresh-package-json /usr/local/globalcdn/playground/nodeParquet/node_modules/varint
107 silly refresh-package-json /usr/local/globalcdn/playground/nodeParquet/node_modules/node-parquet
108 silly doParallel preinstall 40
109 silly preinstall [email protected]
110 info lifecycle [email protected]preinstall: [email protected]
111 silly preinstall [email protected]
112 info lifecycle [email protected]preinstall: [email protected]
113 silly preinstall [email protected]
114 info lifecycle [email protected]preinstall: [email protected]
115 silly preinstall [email protected]
116 info lifecycle [email protected]preinstall: [email protected]
117 silly preinstall [email protected]
118 info lifecycle [email protected]preinstall: [email protected]
119 verbose lifecycle [email protected]preinstall: unsafe-perm in lifecycle false
120 verbose lifecycle [email protected]preinstall: PATH: /usr/local/lib/node_modules/npm/node_modules/npm-lifecycle/node-gyp-bin:/usr/local/globalcdn/playground/nodeParquet/node_modules/node-parquet/node_modules/.bin:/usr/local/globalcdn/playground/nodeParquet/node_modules/.bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
121 verbose lifecycle [email protected]preinstall: CWD: /usr/local/globalcdn/playground/nodeParquet/node_modules/node-parquet
122 silly lifecycle [email protected]preinstall: Args: [ '-c', './build_parquet-cpp.sh' ]
123 silly lifecycle [email protected]preinstall: Returned: code: 1 signal: null
124 info lifecycle [email protected]~preinstall: Failed to exec preinstall script
125 verbose unlock done using /root/.npm/_locks/staging-846bdfdb6908b49a.lock for /usr/local/globalcdn/playground/nodeParquet/node_modules/.staging
126 silly saveTree [email protected]
126 silly saveTree └─┬ [email protected]
126 silly saveTree ├── [email protected]
126 silly saveTree ├── [email protected]
126 silly saveTree ├── [email protected]
126 silly saveTree └── [email protected]
127 warn [email protected] No description
128 warn [email protected] No repository field.
129 verbose stack Error: [email protected] preinstall: ./build_parquet-cpp.sh
129 verbose stack Exit status 1
129 verbose stack at EventEmitter.<anonymous> (/usr/local/lib/node_modules/npm/node_modules/npm-lifecycle/index.js:285:16)
129 verbose stack at emitTwo (events.js:126:13)
129 verbose stack at EventEmitter.emit (events.js:214:7)
129 verbose stack at ChildProcess.<anonymous> (/usr/local/lib/node_modules/npm/node_modules/npm-lifecycle/lib/spawn.js:55:14)
129 verbose stack at emitTwo (events.js:126:13)
129 verbose stack at ChildProcess.emit (events.js:214:7)
129 verbose stack at maybeClose (internal/child_process.js:925:16)
129 verbose stack at Process.ChildProcess._handle.onexit (internal/child_process.js:209:5)
130 verbose pkgid [email protected]
131 verbose cwd /usr/local/globalcdn/playground/nodeParquet
132 verbose Linux 3.13.0-74-generic
133 verbose argv "/usr/local/bin/node" "/usr/local/bin/npm" "install" "--save" "node-parquet"
134 verbose node v8.11.1
135 verbose npm v5.6.0
136 error code ELIFECYCLE
137 error errno 1
138 error [email protected] preinstall: ./build_parquet-cpp.sh
138 error Exit status 1
139 error Failed at the [email protected] preinstall script.
139 error This is probably not a problem with npm. There is likely additional logging output above.
140 verbose exit [ 1, true ]
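One way to fail fast on the CMake version error in this report is to check the installed version before the preinstall build starts. The sketch below is hypothetical and not part of node-parquet: the function names are made up, and the 3.2.0 minimum is copied from the "CMake 3.2.0 or higher is required" message above. It parses the output of cmake --version and compares it against that minimum.

```javascript
// Hypothetical pre-flight check (not part of node-parquet): verify that the
// installed CMake meets the minimum that parquet-cpp's CMakeLists.txt reported.
const { execFileSync } = require('child_process');

const REQUIRED_CMAKE = [3, 2, 0]; // from "CMake 3.2.0 or higher is required"

// Extract [major, minor, patch] from `cmake --version` output,
// e.g. "cmake version 2.8.12.2" -> [2, 8, 12]. Returns null if unparseable.
function parseCmakeVersion(output) {
  const match = /cmake version (\d+)\.(\d+)(?:\.(\d+))?/.exec(output);
  if (!match) return null;
  return [Number(match[1]), Number(match[2]), Number(match[3] || 0)];
}

// Compare version triples component by component.
function versionAtLeast(actual, required) {
  for (let i = 0; i < required.length; i++) {
    if (actual[i] !== required[i]) return actual[i] > required[i];
  }
  return true; // all components equal
}

// Run `cmake --version` and report whether it is new enough.
function checkCmake() {
  let output;
  try {
    output = execFileSync('cmake', ['--version'], { encoding: 'utf8' });
  } catch (err) {
    return { ok: false, reason: 'cmake not found on PATH' };
  }
  const version = parseCmakeVersion(output);
  if (!version) return { ok: false, reason: 'could not parse: ' + output.trim() };
  if (versionAtLeast(version, REQUIRED_CMAKE)) {
    return { ok: true, version: version.join('.') };
  }
  return {
    ok: false,
    reason: 'CMake ' + version.join('.') + ' is older than ' + REQUIRED_CMAKE.join('.'),
  };
}
```

Run before ./build_parquet-cpp.sh, this would flag the CMake 2.8.12.2 on the Ubuntu machine in this report as too old, instead of failing partway through npm install.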

Group type returns error

When using the group type, I get an error about the number of values in a column.

For example, the example given in the documentation returns this message:
Error: Column 2 had 7 while previous column had 2

It seems the nested values are treated as a flat, single-value column, so the writer thinks the column has 7 rows when in reality it has 2.
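The mismatch above (7 values in one column against 2 in the previous one) is what you would expect if nested leaf values were counted as rows. In the Parquet format, following the Dremel encoding, each leaf value of a repeated field carries a repetition level: 0 starts a new record, and a higher level continues the current list. The sketch below illustrates that idea for a single repeated field; it is not node-parquet's code, the rows are made-up data, and it assumes every list is non-empty (empty or null lists also need definition levels, which are omitted here).

```javascript
// Illustrative sketch of Parquet/Dremel repetition levels for one
// non-nullable repeated leaf field (max repetition level 1).
// Assumes every record's list is non-empty.

// Flatten one list per record into a flat value column plus levels:
// level 0 = first value of a new record, level 1 = continues the list.
function encodeRepeated(records) {
  const values = [];
  const repetitionLevels = [];
  for (const list of records) {
    list.forEach((value, i) => {
      values.push(value);
      repetitionLevels.push(i === 0 ? 0 : 1);
    });
  }
  return { values, repetitionLevels };
}

// The record count is the number of level-0 entries, not values.length.
function countRecords(repetitionLevels) {
  return repetitionLevels.filter((level) => level === 0).length;
}

// Rebuild the per-record lists from the flat column.
function decodeRepeated({ values, repetitionLevels }) {
  const records = [];
  values.forEach((value, i) => {
    if (repetitionLevels[i] === 0) records.push([]);
    records[records.length - 1].push(value);
  });
  return records;
}
```

With two records holding 3 and 4 values, the leaf column has 7 entries but only 2 of them have repetition level 0, so a row count taken from the levels gives 2.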

crash when using int96

Installed and compiled on Amazon Linux. Whenever I try to use int96 to store a timestamp, node crashes:

This happens with both Node 8.x.x and 6.x.x.

/home/ec2-user/.nvm/versions/node/v8.1.4/bin/node[15317]: ../src/node_buffer.cc:220:char* node::Buffer::Data(v8::Local<v8::Object>): Assertion `obj->IsArrayBufferView()' failed.
1: node::Abort() [/home/ec2-user/.nvm/versions/node/v8.1.4/bin/node]
2: node::Assert(char const* const (*) [4]) [/home/ec2-user/.nvm/versions/node/v8.1.4/bin/node]
3: node::Buffer::Length(v8::Local<v8::Value>) [/home/ec2-user/.nvm/versions/node/v8.1.4/bin/node]
4: 0x7f2e12e39d41 [/home/ec2-user/bidder/node_modules/node-parquet/build/Release/parquet.node]
5: ParquetWriter::Write(Nan::FunctionCallbackInfo<v8::Value> const&) [/home/ec2-user/bidder/node_modules/node-parquet/build/Release/parquet.node]
6: 0x7f2e12e39b57 [/home/ec2-user/bidder/node_modules/node-parquet/build/Release/parquet.node]
7: v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) [/home/ec2-user/.nvm/versions/node/v8.1.4/bin/node]
8: 0xb43f48 [/home/ec2-user/.nvm/versions/node/v8.1.4/bin/node]
9: v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) [/home/ec2-user/.nvm/versions/node/v8.1.4/bin/node]
10: 0x2efe9ab840bd
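The failed assertion obj->IsArrayBufferView() inside node::Buffer::Length, called from ParquetWriter::Write, suggests that the native writer expected a Buffer for the int96 value and received something else (for example a Date or a number). That is an inference from the stack trace, not documented behavior, and whether node-parquet accepts a pre-encoded Buffer here is unconfirmed. If it does, one workaround would be to encode timestamps yourself in the INT96 layout commonly used by Impala and Hive: 8 bytes of nanoseconds since midnight UTC followed by a 4-byte Julian day number, both little-endian. The helper names below are hypothetical.

```javascript
// Hypothetical helpers: encode/decode a JS Date as a 12-byte INT96 timestamp
// (Impala/Hive convention): bytes 0-7 = nanoseconds since midnight UTC,
// bytes 8-11 = Julian day number, both little-endian.
const MS_PER_DAY = 86400000;
const JULIAN_DAY_OF_UNIX_EPOCH = 2440588; // Julian day number of 1970-01-01
const TWO_POW_32 = 4294967296;

function dateToInt96(date) {
  const ms = date.getTime();
  const daysSinceEpoch = Math.floor(ms / MS_PER_DAY); // floor handles pre-1970
  const msOfDay = ms - daysSinceEpoch * MS_PER_DAY;
  // At most about 8.64e13, so still an exact integer in a JS number.
  const nanosOfDay = msOfDay * 1e6;

  const buf = Buffer.alloc(12);
  // Write the 64-bit nanosecond count as two 32-bit halves, so this also
  // runs on Node versions without BigInt (such as the 6.x/8.x in this report).
  buf.writeUInt32LE(nanosOfDay % TWO_POW_32, 0);
  buf.writeUInt32LE(Math.floor(nanosOfDay / TWO_POW_32), 4);
  buf.writeUInt32LE(JULIAN_DAY_OF_UNIX_EPOCH + daysSinceEpoch, 8);
  return buf;
}

function int96ToDate(buf) {
  const nanosOfDay = buf.readUInt32LE(0) + buf.readUInt32LE(4) * TWO_POW_32;
  const julianDay = buf.readUInt32LE(8);
  return new Date((julianDay - JULIAN_DAY_OF_UNIX_EPOCH) * MS_PER_DAY + nanosOfDay / 1e6);
}
```

JavaScript Dates only carry millisecond precision, so the sub-millisecond part of the nanosecond field is always zero with this encoding.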

fatal error: 'cstdint' file not found

Hi,

I'm trying to install node-parquet using "npm install node-parquet" on macOS Sierra.
I installed all the dependencies, but I'm getting this error:

[100%] Built target parquet_static

[email protected] install /Volumes/Data/Desenvolvimento/repositories/web/parquet_reader/node_modules/node-parquet
node-gyp rebuild

CXX(target) Release/obj.target/parquet/src/parquet_binding.o
In file included from ../src/parquet_binding.cc:3:
In file included from ../src/parquet_reader.h:8:
In file included from ../deps/parquet-cpp/src/parquet/api/reader.h:22:
../deps/parquet-cpp/src/parquet/column/reader.h:22:10: fatal error: 'cstdint' file not found
#include <cstdint>
         ^
1 error generated.
make: *** [Release/obj.target/parquet/src/parquet_binding.o] Error 1
gyp ERR! build error

memory leak in ParquetWriter

Hi,

I'm trying to use the ParquetWriter to write to a parquet file one line at a time. I noticed that node memory usage keeps growing with each call to writer.write(rows). I am processing a very large file, and the memory usage grows beyond my machine's limits. Since I am reading and writing one row at a time, I would expect memory usage to stay constant. Is there a workaround for this?

Thanks,
David

Cannot write more than once

Based on the example in the README:

var parquet = require('node-parquet');

var schema = {
small_int: {type: 'int32', optional: true},
big_int: {type: 'int64'},
my_boolean: {type: 'bool'},
name: {type: 'byte_array', optional: true},
};

var data = [
[ 1, 23234, true, 'hello world'],
[ , 1234, false, ],
];

var writer = new parquet.ParquetWriter('my_file.parquet', schema);
writer.write(data);
writer.close();

Here is the output:

$ ./node_modules/node-parquet/bin/parquet.js cat ./my_file.parquet
[1,23234,true,"hello world"]
[null,1234,false,null]

If I write it three times, like below:
writer.write(data);
writer.write(data);
writer.write(data);
writer.close();

The output is all nulls after the first write:
$ ./node_modules/node-parquet/bin/parquet.js cat ./my_file.parquet
[1,23234,true,"hello world"]
[null,1234,false,null]
[null,null,null,null]
[null,null,null,null]
[null,null,null,null]
[null,null,null,null]
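Until the multiple-write behavior is fixed, one workaround consistent with this report is to collect every row and call write() exactly once, just before close(). The sketch below wraps any object exposing write(rows) and close(), as ParquetWriter does in the README example; the class name SingleWriteBuffer is hypothetical. Because it keeps every row in memory until close(), it does not suit very large inputs.

```javascript
// Hypothetical wrapper: buffers rows in memory and forwards them to the
// underlying writer in a single write() call on close(), avoiding the
// "nulls after the first write" behavior reported above.
class SingleWriteBuffer {
  constructor(writer) {
    this.writer = writer; // e.g. new parquet.ParquetWriter(file, schema)
    this.rows = [];
    this.closed = false;
  }

  // Accept rows in any number of calls; nothing reaches the writer yet.
  // Row arrays are pushed by reference, so empty indexes inside them survive.
  write(rows) {
    if (this.closed) throw new Error('write() after close()');
    for (const row of rows) this.rows.push(row);
  }

  // Flush everything with one write(), then close the real writer.
  close() {
    if (this.closed) return;
    this.closed = true;
    if (this.rows.length > 0) this.writer.write(this.rows);
    this.writer.close();
  }
}
```

Usage would mirror the example above: wrap the ParquetWriter, call write(data) as often as needed, and finish with close().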

CLI: numeric filename converted to number, raises javascript TypeError

First: thank you for building this super-useful tool. Definitely comes in handy when needing to run quick diffs against two parquet files.

Issue

Numeric filenames cause JavaScript errors. Since my Parquet files are output by Hive, they have numeric names.

Details

If you run the following command:

parquet head 00000

You'll get the following error message:

cat 0
/home/youruser/.nodenv/versions/6.11.2/lib/node_modules/node-parquet/bin/parquet.js:54
  const reader = new parquet.ParquetReader(file);
                 ^

TypeError: wrong argument
    at TypeError (native)
    at cat (/home/sroeca/.nodenv/versions/6.11.2/lib/node_modules/node-parquet/bin/parquet.js:54:18)
    at Object.<anonymous> (/home/sroeca/.nodenv/versions/6.11.2/lib/node_modules/node-parquet/bin/parquet.js:43:5)
    at Module._compile (module.js:570:32)
    at Object.Module._extensions..js (module.js:579:10)
    at Module.load (module.js:487:32)
    at tryModuleLoad (module.js:446:12)
    at Function.Module._load (module.js:438:3)
    at Module.runMain (module.js:604:10)
    at run (bootstrap_node.js:389:7)

Workaround

At present, the simple workaround is to rename the files to non-numeric names. This is mildly cumbersome.
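The "cat 0" line in the output suggests that the argument "00000" reached ParquetReader as the number 0, which the native reader rejects with "TypeError: wrong argument". node-parquet depends on minimist (it appears in the install log above), and minimist converts numeric-looking positional arguments to numbers by default. Since that conversion also drops the leading zeros, calling String() afterwards would give "0", not "00000", so the fix belongs at parse time. The sketch below is a minimal hand-rolled parser that keeps positional arguments as strings; the function names are hypothetical. With minimist itself, passing the option string: ['_'] is intended to keep positional arguments as strings, which is worth verifying against the installed minimist version.

```javascript
// Hypothetical minimal parser for "parquet <command> <file...>" that keeps
// every positional argument as a string, so a Hive-style name like "00000"
// is passed through unchanged instead of becoming the number 0.
function parseCliArgs(argv) {
  const [command, ...files] = argv; // process.argv entries are always strings
  if (!command) throw new Error('usage: parquet <command> <file...>');
  return { command, files };
}

// Defensive check before handing a path to the native reader, which
// rejects non-string arguments with "TypeError: wrong argument".
function assertPath(file) {
  if (typeof file !== 'string') {
    throw new TypeError('expected a file path string, got ' + typeof file);
  }
  return file;
}
```

In a CLI entry point this would be called as parseCliArgs(process.argv.slice(2)), with assertPath applied to each file before constructing the reader.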

ParquetWriter - Segmentation Fault

In some cases my node process dies with the following message:
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

Reproducible on Ubuntu 16.04.4 LTS
Node.js v6.10.3 (downloaded as binaries)
node-parquet 0.2.6 (installed via npm with all dev-dependencies)

I can fetch more information if needed. This is my dev environment.

P.S. Thanks for parquet for node!!

npm install fails

Hi

I am trying to use Parquet with AWS Athena, but the installation fails and I need help.

I installed cmake with brew before running npm install:

  ~ brew install cmake
Warning: cmake 3.12.2 is already installed and up-to-date
➜  assistant git:(113-admin_006-post-s3-athena-parquet) ✗ npm i node-parquet

> [email protected] preinstall /Users/hongjinho/Documents/cosmee/assistant/node_modules/node-parquet
> ./build_parquet-cpp.sh

-- The C compiler identification is AppleClang 9.1.0.9020039
-- The CXX compiler identification is AppleClang 9.1.0.9020039
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PkgConfig: /usr/local/bin/pkg-config (found version "0.29.2")
clang-tidy not found
clang-format not found
-- Compiler id: AppleClang
Selected compiler clang 4.0
-- Performing Test CXX_SUPPORTS_SSE3
-- Performing Test CXX_SUPPORTS_SSE3 - Success
-- Performing Test CXX_SUPPORTS_ALTIVEC
-- Performing Test CXX_SUPPORTS_ALTIVEC - Success
Configured for RELEASE build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...})
-- Build Type: RELEASE
-- Boost version: 1.67.0
-- Found the following Boost libraries:
--   regex
-- Boost include dir: /usr/local/include
-- Boost libraries: /usr/local/lib/libboost_regex-mt.dylib
-- THRIFT_HOME:
-- Thrift compiler/libraries NOT found:  (THRIFT_INCLUDE_DIR-NOTFOUND, THRIFT_STATIC_LIB-NOTFOUND). Looked in system search paths.
-- Thrift include dir: /Users/hongjinho/Documents/cosmee/assistant/node_modules/node-parquet/build_deps/parquet-cpp/thrift_ep/src/thrift_ep-install/include
-- Thrift static library: /Users/hongjinho/Documents/cosmee/assistant/node_modules/node-parquet/build_deps/parquet-cpp/thrift_ep/src/thrift_ep-install/lib/libthrift.a
-- Thrift compiler: /Users/hongjinho/Documents/cosmee/assistant/node_modules/node-parquet/build_deps/parquet-cpp/thrift_ep/src/thrift_ep-install/bin/thrift
-- Thrift version:
-- Checking for module 'arrow'
--   No package 'arrow' found
-- Could not find the Arrow library. Looked for headers in , and for libs in
-- Building Apache Arrow from commit: 501d60e918bd4d10c429ab34e0b8e8a87dffb732
-- CMAKE_CXX_FLAGS:  -Qunused-arguments  -O3 -DNDEBUG  -Wall -std=c++11 -stdlib=libc++
-- Found cpplint executable at /Users/hongjinho/Documents/cosmee/assistant/node_modules/node-parquet/deps/parquet-cpp/build-support/cpplint.py
-- Configuring done
CMake Warning (dev):
  Policy CMP0068 is not set: RPATH settings on macOS do not affect
  install_name.  Run "cmake --help-policy CMP0068" for policy details.  Use
  the cmake_policy command to set the policy and suppress this warning.

  For compatibility with older versions of CMake, the install_name fields for
  the following targets are still affected by RPATH settings:

   parquet_shared

This warning is for project developers.  Use -Wno-dev to suppress it.

-- Generating done
-- Build files have been written to: /Users/hongjinho/Documents/cosmee/assistant/node_modules/node-parquet/build_deps/parquet-cpp
Scanning dependencies of target thrift_ep
[  1%] Creating directories for 'thrift_ep'
[  3%] Performing download step (download, verify and extract) for 'thrift_ep'
-- thrift_ep download command succeeded.  See also /Users/hongjinho/Documents/cosmee/assistant/node_modules/node-parquet/build_deps/parquet-cpp/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-download-*.log
[  5%] No patch step for 'thrift_ep'
[  7%] No update step for 'thrift_ep'
[  9%] Performing configure step for 'thrift_ep'
-- thrift_ep configure command succeeded.  See also /Users/hongjinho/Documents/cosmee/assistant/node_modules/node-parquet/build_deps/parquet-cpp/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-configure-*.log
[ 10%] Performing build step for 'thrift_ep'
CMake Error at /Users/hongjinho/Documents/cosmee/assistant/node_modules/node-parquet/build_deps/parquet-cpp/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-build-RELEASE.cmake:16 (message):
  Command failed: 2

   '/Applications/Xcode.app/Contents/Developer/usr/bin/make'

  See also

    /Users/hongjinho/Documents/cosmee/assistant/node_modules/node-parquet/build_deps/parquet-cpp/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-build-*.log


make[2]: *** [thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-build] Error 1
make[1]: *** [CMakeFiles/thrift_ep.dir/all] Error 2
make: *** [all] Error 2
npm WARN assistant No description
npm WARN assistant No repository field.
npm WARN assistant No license field.

npm ERR! code ELIFECYCLE
npm ERR! errno 2
npm ERR! [email protected] preinstall: `./build_parquet-cpp.sh`
npm ERR! Exit status 2
npm ERR!
npm ERR! Failed at the [email protected] preinstall script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     /Users/hongjinho/.npm/_logs/2018-09-15T18_38_47_846Z-debug.log

Feature request: read data from in-memory buffer

My use-case is that we have a bunch of Parquet files in S3 I'm operating over in batches. While it works fine to download things to the local file system before reading and then deleting them after I'm done, it would be nicer if I could cut out the file system completely and pass the reader object a Buffer instance.

(and apologies if this already exists but I just didn't spot it in the docs)
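Until the reader accepts a Buffer directly, one way to keep the rest of the code Buffer-oriented is to spill the Buffer to a temporary file, pass the path to ParquetReader, and delete the file afterwards. The sketch below is hypothetical (withTempParquetFile and looksLikeParquet are not part of node-parquet). It also checks the "PAR1" magic bytes that the Parquet format places at both the start and the end of every file, to catch truncated downloads early. It still touches the file system, so it hides that step rather than removing it.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

const MAGIC = Buffer.from('PAR1', 'ascii');

// A Parquet file starts and ends with the 4-byte magic "PAR1"; the 4 bytes
// before the trailing magic hold the footer length, so 12 bytes is the minimum.
function looksLikeParquet(buf) {
  return (
    buf.length >= 12 &&
    buf.slice(0, 4).equals(MAGIC) &&
    buf.slice(buf.length - 4).equals(MAGIC)
  );
}

// Hypothetical helper: write `buf` to a unique temp file, call fn(filePath),
// and always remove the file afterwards. fn must finish synchronously.
function withTempParquetFile(buf, fn) {
  if (!looksLikeParquet(buf)) throw new Error('buffer does not look like a Parquet file');
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'parquet-'));
  const file = path.join(dir, 'data.parquet');
  try {
    fs.writeFileSync(file, buf);
    return fn(file);
  } finally {
    if (fs.existsSync(file)) fs.unlinkSync(file);
    fs.rmdirSync(dir);
  }
}

// Usage, assuming the synchronous reader API shown elsewhere on this page:
// const rows = withTempParquetFile(s3ObjectBody, (file) => {
//   const reader = new parquet.ParquetReader(file);
//   try { return reader.rows(); } finally { reader.close(); }
// });
```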

Is it possible to stream-upload directly to AWS S3?

I have a very large amount of NoSQL data. I want to read the data as a stream, pass in just the schema and the stream, and upload the result to S3 as a Parquet file. Because of the data volume, I can't store it locally, so I don't want to hold the file in memory or on disk. Please advise.

run node-parquet in AWS Lambda

Hi,
I want to use this wonderful module in AWS Lambda. The key blocker is that once compiled, the node-parquet module is over 400 MB, while AWS Lambda allows only ~240 MB per function.
Is there any way to slim the output down, or is this what we get?
In any case, I'm looking through the makefiles to see whether I can do something on my own.
Thanks for your time!

Waiting for the API Release

Hi,
We urgently need to parse Parquet files from a Node server to extract meaningful information from them, and we believe this module is the best fit for our needs.
Could you tell us when we can expect the initial working version of this module? That would be very helpful.

Thanks,
Basil

Stream parquet output

Would it be possible for ParquetWriter to write to a node stream? It looks to me like parquet-cpp supports various streams.

Integers converted to undefined

Hi Mark,
I started getting strange results when converting to Parquet and back using your module and your example:

Here is the code I'm using:

var parquet = require('node-parquet');

var schema = {
    small_int: {type: 'int32'},
    big_int: {type: 'int64'},
    name: {type: 'byte_array'}
};

var data = [
    [ 13, 1111, 'hello world r'],
    [ 2, 2234, 'hello world 1'],
    [ 3, 2334, 'hello world 2'],
    [ 4, 1223, 'hello world 3']
];


var writer = new parquet.ParquetWriter('/tmp/my_file.parquet', schema);
writer.write(data);
writer.close();

And this is the code I'm reading the Parquet file:

var fs = require('fs');
var parquet = require('node-parquet');

var file = '/tmp/my_file.parquet';

var reader = new parquet.ParquetReader(file);
console.log(reader.info());
console.log(reader.rows());
reader.close();

And this is the result I'm getting:

{ version: 0,
createdBy: 'parquet-cpp version 1.0.0',
rowGroups: 1,
columns: 3,
rows: 4 }
[ [ undefined, 1111, 'hello world r' ],
[ 2, 2234, 'hello world 1' ],
[ 3, 2334, 'hello world 2' ],
[ 4, 1223, 'hello world 3' ] ]

As you can see, the number 13 is shown as undefined. With a more complex schema, more integers are shown as undefined.

I'm running AWS Linux, Node 8.2.1

Any idea?
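To narrow down problems like this one, it can help to compare what was written with what was read back, cell by cell. The sketch below is a hypothetical helper (diffRows is not part of node-parquet) that takes the array passed to writer.write() and the array returned by reader.rows() and lists every mismatching cell. It treats null, undefined, and empty indexes as equivalent "missing" values so optional fields do not produce false alarms; it only locates the problem and does not explain or fix it.

```javascript
// Hypothetical helper: compare rows passed to writer.write() with rows
// returned by reader.rows() and list every mismatching cell.
function isMissing(v) {
  return v === undefined || v === null; // also true for empty indexes
}

function diffRows(expected, actual) {
  const mismatches = [];
  const rowCount = Math.max(expected.length, actual.length);
  for (let r = 0; r < rowCount; r++) {
    const exp = expected[r] || [];
    const act = actual[r] || [];
    const colCount = Math.max(exp.length, act.length);
    for (let c = 0; c < colCount; c++) {
      const a = exp[c];
      const b = act[c];
      // Missing-vs-missing counts as equal; otherwise use Object.is.
      const same = (isMissing(a) && isMissing(b)) || Object.is(a, b);
      if (!same) mismatches.push({ row: r, column: c, expected: a, actual: b });
    }
  }
  return mismatches;
}
```

Applied to the data and output in this report, it would return a single mismatch: row 0, column 0, expected 13, actual undefined.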

Hangs on error conditions

First of all, thank you for creating a Node module that performs this very special task and for sharing it with us!

I've found two different conditions that cause the host program to hang indefinitely (in Node 8.9.0), never reporting an error or any hint of what the issue could be. I debugged these problems through a couple of arduous debug sessions:

  1. Writing data arrays that contain null/undefined instead of empty Array indexes for optional fields.
    What is an empty Array index? It is an index that holds no value at all, which is different from an index holding undefined. No value you can assign creates one; instead, you either leave a gap in an Array literal (as in [ , 1234, false, ] from the README example) or create an empty Array and set only the indexes that should contain a value. The remaining indexes are "empty", a little-known aspect of JavaScript Arrays.
  2. Writing a parquet file to a directory that does not exist.

Neither of these is necessarily a problem, especially if documented. The problem is that instead of throwing an error, this module hangs.
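Both conditions above can be guarded against before the writer is called. The sketch below uses hypothetical helper names: toSparseRow converts null/undefined cells into the empty Array indexes that, according to this report, the writer expects for optional fields, and ensureParentDir creates the output directory before the writer is constructed. The recursive option of fs.mkdirSync requires Node 10.12 or later. Neither helper changes node-parquet's own error handling; they only avoid the two inputs that were found to hang.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: copy a row, leaving an empty index (a "hole") wherever
// the source cell is null or undefined, instead of an explicit null/undefined.
function toSparseRow(row) {
  const out = new Array(row.length); // every index starts out empty
  for (let i = 0; i < row.length; i++) {
    if (row[i] !== null && row[i] !== undefined) out[i] = row[i];
  }
  return out;
}

// Hypothetical helper: make sure the directory that will hold `file` exists,
// so the writer is never pointed at a missing directory.
function ensureParentDir(file) {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  return file;
}

// Usage sketch, assuming the ParquetWriter API from the README example:
// const rows = data.map(toSparseRow);
// const writer = new parquet.ParquetWriter(ensureParentDir(outFile), schema);
// writer.write(rows);
// writer.close();
```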
