apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing

License: Apache License 2.0

Makefile 0.06% C++ 53.41% C 2.27% Shell 0.85% Ruby 3.34% Batchfile 0.06% CMake 1.43% Python 6.17% Java 15.43% FreeMarker 0.01% JavaScript 0.26% HTML 0.01% TypeScript 2.06% Lua 0.02% Go 11.01% Awk 0.01% Meson 0.09% Dockerfile 0.27% Thrift 0.07% R 3.17%

arrow

arrow's Introduction

Apache Arrow

Powering In-Memory Analytics

Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to process and move data fast.

Major components of the project include:

The Arrow Columnar In-Memory Format: a standard and efficient in-memory representation of various datatypes, plain or nested
The Arrow IPC Format: an efficient serialization of the Arrow format and associated metadata, for communication between processes and heterogeneous environments
The Arrow Flight RPC protocol: based on the Arrow IPC format, a building block for remote services exchanging Arrow data with application-defined semantics (for example a storage server or a database)
C++ libraries
C bindings using GLib
C# .NET libraries
Gandiva: an LLVM-based Arrow expression compiler, part of the C++ codebase
Go libraries
Java libraries
JavaScript libraries
Python libraries
R libraries
Ruby libraries
Rust libraries

Arrow is an Apache Software Foundation project. Learn more at arrow.apache.org.

What's in the Arrow libraries?

The reference Arrow libraries contain many distinct software components:

Columnar vector and table-like containers (similar to data frames) supporting flat or nested types
Fast, language agnostic metadata messaging layer (using Google's Flatbuffers library)
Reference-counted off-heap buffer memory management, for zero-copy memory sharing and handling memory-mapped files
IO interfaces to local and remote filesystems
Self-describing binary wire formats (streaming and batch/file-like) for remote procedure calls (RPC) and interprocess communication (IPC)
Integration tests for verifying binary compatibility between the implementations (e.g. sending data from Java to C++)
Conversions to and from other in-memory data structures
Readers and writers for various widely-used file formats (such as Parquet, CSV)

Implementation status

The official Arrow libraries in this repository are in different stages of implementing the Arrow format and related features. See our current feature matrix on git main.

How to Contribute

Please read our latest project contribution guide.

Getting involved

Even if you do not plan to contribute to Apache Arrow itself or Arrow integrations in other projects, we'd be happy to have you involved:

Join the mailing list: send an email to [email protected]. Share your ideas and use cases for the project.
Follow our activity on GitHub issues
Learn the format
Contribute code to one of the reference implementations

arrow's Issues

[C++][Parquet] minor compilation issue

I found some very minor issues when I tried to compile the reader in my environment, caused by namespace clashes.
For example, shared_ptr and unordered_map also exist in the C++11 std namespace, and some compilers reject the unqualified names as ambiguous.
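
The clash described here can be reproduced in a few lines. The sketch below is illustrative only: `other_lib` is a hypothetical stand-in for any library (Boost, for instance) that also defines `shared_ptr`, and `MakeLabel` is an invented helper. It shows that qualifying the name explicitly keeps the code compiling even when both namespaces are pulled into scope.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a second library that also defines shared_ptr.
namespace other_lib {
template <typename T>
class shared_ptr {};
}  // namespace other_lib

// Both namespaces are imported, so the unqualified name is ambiguous.
// Spelling out std:: resolves the ambiguity.
inline std::shared_ptr<std::string> MakeLabel() {
  using namespace std;
  using namespace other_lib;
  // shared_ptr<string> p;  // would not compile: reference to 'shared_ptr' is ambiguous
  return std::make_shared<std::string>("decoder");
}
```

Dropping the using-directives altogether and qualifying every use is the more robust choice for library code.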

I also found that, with the test files I am reading, a null pointer is dereferenced: if the field is required, definition_level_decoder_ is null.

I have made a fork with the changes: https://github.com/ffabbri4/incubator-parquet-cpp/tree/candidate

Reporter: Fabrizio Fabbri / @ffabbri4
Assignee: Fabrizio Fabbri / @ffabbri4

_{Note: This issue was originally created as PARQUET-232. Please see the migration documentation for further details.}

[C++][Parquet] Error handling: C++ exceptions or Status

This library currently throws C++ exceptions. I would very much prefer to use Google's convention of using Status objects to communicate errors and force explicit action to be taken on the part of the developer if an error occurs in a particular function call. It will also make it much easier to incorporate libparquet into other libraries that do not use C++ exceptions, and also to provide an ANSI C API wrapper.
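
To make the proposal concrete, here is a minimal sketch of the Status convention. It is not the library's actual API: the `Status` class and the `CheckFooterMagic` helper are hypothetical names invented for illustration. The only fact relied on is that a Parquet file ends with the 4-byte magic `PAR1`.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical minimal Status type: either OK or an error with a message.
class Status {
 public:
  static Status OK() { return Status(); }
  static Status IOError(std::string msg) { return Status(std::move(msg)); }

  bool ok() const { return ok_; }
  const std::string& message() const { return msg_; }

 private:
  Status() : ok_(true) {}
  explicit Status(std::string msg) : ok_(false), msg_(std::move(msg)) {}

  bool ok_;
  std::string msg_;
};

// Hypothetical helper: reports failure through the return value
// instead of throwing, so the caller has to inspect the result.
inline Status CheckFooterMagic(const std::string& bytes) {
  if (bytes.size() < 4 || bytes.compare(bytes.size() - 4, 4, "PAR1") != 0) {
    return Status::IOError("missing PAR1 magic");
  }
  return Status::OK();
}
```

A caller would write `Status s = CheckFooterMagic(data); if (!s.ok()) { /* handle s.message() */ }`, which also maps naturally onto an error-code style C wrapper.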

Reporter: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-440. Please see the migration documentation for further details.}

[C++][Parquet] Convert flat SchemaElement vector to implied nested schema data structure

To assist with conversion to in-memory nested data structures. Related: PARQUET-441
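
As a rough illustration of the conversion, the sketch below rebuilds a tree from a flat, depth-first list in which each entry records how many children it has. `FlatElement`, `Node`, and `Unflatten` are hypothetical names; the real SchemaElement carries more fields than the two assumed here.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Simplified, hypothetical stand-in for a flat schema entry.
struct FlatElement {
  std::string name;
  int num_children;
};

// Nested representation implied by the flat list.
struct Node {
  std::string name;
  std::vector<std::unique_ptr<Node>> children;
};

// Consumes entries starting at *pos (depth-first order) and returns the
// subtree rooted there. flat.at() throws std::out_of_range on truncated input.
inline std::unique_ptr<Node> Unflatten(const std::vector<FlatElement>& flat,
                                       std::size_t* pos) {
  const FlatElement& element = flat.at(*pos);
  ++*pos;
  std::unique_ptr<Node> node(new Node());
  node->name = element.name;
  for (int i = 0; i < element.num_children; ++i) {
    node->children.push_back(Unflatten(flat, pos));
  }
  return node;
}
```

Calling `Unflatten(flat, &pos)` with `pos` set to 0 returns the root, and `pos` ends up equal to the number of entries consumed.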

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

Related issues:

Add ParquetFilePrinter (is depended upon by)

_{Note: This issue was originally created as PARQUET-442. Please see the migration documentation for further details.}

[C++][Parquet] Schema resolution: map encoding

Related: PARQUET-441 and PARQUET-442

Reporter: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-443. Please see the migration documentation for further details.}

[C++][Parquet] Roll back Thrift bindings to 0.9.0

Thrift 0.9.3 conflicts with googletest in ugly ways on gcc 4.9. This is a stopgap until PARQUET-468

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-469. Please see the migration documentation for further details.}

[C++][Parquet] Add zlib codec support

See apache/parquet-cpp#11

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-456. Please see the migration documentation for further details.}

[C++][Parquet] Refactor parquet_reader.cc into a ParquetFileReader::DebugPrint method

This is follow up work per discussion in PARQUET-418 and apache/parquet-cpp#18

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-453. Please see the migration documentation for further details.}

[C++][Parquet] Run DebugPrint on all data files in the data/ directory

As a smoke test. Follow-up to PARQUET-453

Reporter: Wes McKinney / @wesm
Assignee: Aliaksei Sandryhaila / @asandryh

_{Note: This issue was originally created as PARQUET-475. Please see the migration documentation for further details.}

[C++][Parquet] Implement ParquetFileWriter class entry point for generating new Parquet files

Reporter: Wes McKinney / @wesm
Assignee: Uwe Korn / @xhochy

_{Note: This issue was originally created as PARQUET-436. Please see the migration documentation for further details.}

[C++][Parquet] Conform all copyright headers to ASF requirements

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-439. Please see the migration documentation for further details.}

[C++][Parquet] Schema resolution: one, two, and three-level array encoding

While the Parquet spec recommends the "three-level" array encoding, two other styles are possible in the wild, see for example:

https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/exec/hdfs-parquet-scanner.cc#L1986

Reporter: Wes McKinney / @wesm
Assignee: Micah Kornfield / @emkornfield

_{Note: This issue was originally created as PARQUET-441. Please see the migration documentation for further details.}

[C++][Parquet] Provide vectorized ColumnReader interface

Related to PARQUET-433. The library user should be able to retrieve a batch of column values, repetition levels, or definition levels with a particular size into a preallocated C array.
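
One possible shape for such an interface is sketched below. `Int32ColumnSketch` and `ReadBatch` are hypothetical names, and an in-memory vector stands in for decoded page data; the point is only that the caller owns the output array and the reader reports how many slots it filled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical batch reader over already-decoded int32 values.
class Int32ColumnSketch {
 public:
  explicit Int32ColumnSketch(std::vector<int32_t> values)
      : values_(std::move(values)), pos_(0) {}

  // Copies up to batch_size values into the caller's preallocated array
  // and returns the number actually copied (0 once the column is exhausted).
  std::size_t ReadBatch(std::size_t batch_size, int32_t* out) {
    const std::size_t n = std::min(batch_size, values_.size() - pos_);
    const std::ptrdiff_t begin = static_cast<std::ptrdiff_t>(pos_);
    const std::ptrdiff_t end = static_cast<std::ptrdiff_t>(pos_ + n);
    std::copy(values_.begin() + begin, values_.begin() + end, out);
    pos_ += n;
    return n;
  }

 private:
  std::vector<int32_t> values_;
  std::size_t pos_;
};
```

Repetition and definition levels could be exposed the same way, with one preallocated array per stream.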

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-435. Please see the migration documentation for further details.}

[C++][Parquet] Parquet-cpp: Implement support for bulk reading and writing repetition/definition levels.

Currently, parquet-cpp only supports reading definition levels.
Extend the code to read/write repetition/definition levels.

Reporter: Nong Li / @nongli
Assignee: Deepak Majeti / @majetideepak

_{Note: This issue was originally created as PARQUET-169. Please see the migration documentation for further details.}

[C++][Parquet] C++11, cpplint cleanup, package target and header installation

I'm planning to work on building out parquet-cpp with columnar data structures (see Arrow proposal) for materialized in-memory data and feature complete reader/writers so that native-code consumers like Python can finally read and write Parquet files at native speeds. It would be great to have all this officially a part of Apache Parquet.

This adds minimal support to be able to install the resulting libparquet.so and its various header files to support minimally viable development on downstream C++ and Python projects that will need to depend on this. It also builds in C++11 mode and passes Google's cpplint.

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

Related issues:

Detach thirdparty code from build configuration. (is related to)

Externally tracked issue: apache/parquet-cpp#14

_{Note: This issue was originally created as PARQUET-416. Please see the migration documentation for further details.}

[C++][Parquet] Make parquet-format a git submodule and add tool for updating generated Thrift code

As a follow up to PARQUET-449

Reporter: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-466. Please see the migration documentation for further details.}

[C++][Parquet] Add a cmake option to generate the Parquet thrift headers with the thriftc in the environment

Follow-up to PARQUET-449. This will help toolchains which are unable to upgrade to the latest version of Thrift.

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-468. Please see the migration documentation for further details.}

[C++][Parquet] Clean up InputStream ownership semantics in ColumnReader

Follow-up to PARQUET-418, PARQUET-433. The ColumnReader destructor uses delete on an InputStream*. The lifetime of this object should be managed by a std::unique_ptr.
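
The ownership change could look roughly like the sketch below. The class names are hypothetical simplifications, not the project's real declarations, and `TrackedStream` exists only to make the destruction observable.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical minimal stream interface.
class InputStream {
 public:
  virtual ~InputStream() {}
};

// Test helper: sets a flag when it is destroyed.
class TrackedStream : public InputStream {
 public:
  explicit TrackedStream(bool* destroyed) : destroyed_(destroyed) {}
  ~TrackedStream() override { *destroyed_ = true; }

 private:
  bool* destroyed_;
};

// The reader takes ownership through std::unique_ptr, so no manual
// delete is needed in its destructor.
class ColumnReaderSketch {
 public:
  explicit ColumnReaderSketch(std::unique_ptr<InputStream> stream)
      : stream_(std::move(stream)) {}

  bool has_stream() const { return stream_ != nullptr; }

 private:
  std::unique_ptr<InputStream> stream_;
};
```

Because the constructor accepts a `std::unique_ptr`, the transfer of ownership is visible at every call site.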

Reporter: Wes McKinney / @wesm
Assignee: Aliaksei Sandryhaila / @asandryh

_{Note: This issue was originally created as PARQUET-472. Please see the migration documentation for further details.}

[C++][Parquet] Add a RowGroup writer interface class

Per PARQUET-436; as soon as we are able to begin constructing new Parquet files, we can provide an interface class for writing data to a new row group, which will automatically set the appropriate Thrift metadata

Reporter: Wes McKinney / @wesm
Assignee: Uwe Korn / @xhochy

_{Note: This issue was originally created as PARQUET-452. Please see the migration documentation for further details.}

[C++][Parquet] Add compressed data page unit tests

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-457. Please see the migration documentation for further details.}

[C++][Parquet] Specialize ColumnReaders based on the column type

ColumnReader class is used to read columns of all types. This leads to a lot of type checking and 'switch' statements. ColumnReaders should be specialized to different types, while sharing the same interface.
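
A common way to get this kind of specialization is a small abstract base class plus a class template, as in the sketch below. All names (`PhysicalType`, `TypeTraits`, `TypedColumnReader`, `ReadValue`) are hypothetical, and an in-memory vector replaces real page decoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

enum class PhysicalType { INT32, DOUBLE };

// Shared interface: code that only needs metadata works on the base class.
class ColumnReader {
 public:
  virtual ~ColumnReader() {}
  virtual PhysicalType type() const = 0;
};

// Compile-time mapping from C++ type to physical type.
template <typename T>
struct TypeTraits;
template <>
struct TypeTraits<int32_t> {
  static PhysicalType type() { return PhysicalType::INT32; }
};
template <>
struct TypeTraits<double> {
  static PhysicalType type() { return PhysicalType::DOUBLE; }
};

// Typed reader: the value type is fixed at compile time, so the read
// path needs no switch statement on the column type.
template <typename T>
class TypedColumnReader : public ColumnReader {
 public:
  explicit TypedColumnReader(std::vector<T> values)
      : values_(std::move(values)), pos_(0) {}

  PhysicalType type() const override { return TypeTraits<T>::type(); }

  // Writes the next value to *out; returns false when exhausted.
  bool ReadValue(T* out) {
    if (pos_ >= values_.size()) return false;
    *out = values_[pos_++];
    return true;
  }

 private:
  std::vector<T> values_;
  std::size_t pos_;
};
```

The one remaining type dispatch happens once, when the concrete reader is constructed for a column.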

Reporter: Aliaksei Sandryhaila / @asandryh
Assignee: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-433. Please see the migration documentation for further details.}

[C++][Parquet] Add a RowGroup reader interface class

Currently the logic for interacting with row group metadata and constructing column decoders is embedded in the parquet_reader.cc executable here:

https://github.com/apache/parquet-cpp/blob/master/example/parquet_reader.cc

With PARQUET-434, we have a file reader container, which can then provide a row group reader container, something like

RowGroupReader* group_reader = file_reader->row_group(i);

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-451. Please see the migration documentation for further details.}

[C++][Parquet] Add DCHECK* macros for assertions in debug builds

Some of these macros are already defined in parquet/util/logging.h, but they are no-ops. This will assist in "can't fail" assertions. See https://www.chromium.org/developers/coding-style#TOC-CHECK-DCHECK-and-NOTREACHED-
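
For illustration, a debug-only assertion macro can be written along the following lines. This is a sketch, not the contents of parquet/util/logging.h; it assumes the conventional `NDEBUG` switch, and `CheckedDivide` is an invented example function.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>

#ifdef NDEBUG
// Release builds: the check compiles away and the condition is not evaluated.
#define DCHECK(condition) \
  do {                    \
  } while (false)
#else
// Debug builds: report the failing expression and abort.
#define DCHECK(condition)                                        \
  do {                                                           \
    if (!(condition)) {                                          \
      std::cerr << __FILE__ << ":" << __LINE__                   \
                << " Check failed: " #condition << std::endl;    \
      std::abort();                                              \
    }                                                            \
  } while (false)
#endif

#define DCHECK_EQ(a, b) DCHECK((a) == (b))
#define DCHECK_NE(a, b) DCHECK((a) != (b))

// Hypothetical use: the divisor "can't" be zero by contract.
inline int CheckedDivide(int numerator, int divisor) {
  DCHECK_NE(divisor, 0);
  return numerator / divisor;
}
```

Since the condition disappears in release builds, it must not have side effects that the program depends on.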

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-463. Please see the migration documentation for further details.}

[C++][Parquet] Hide thrift dependency in parquet-cpp

Pulling in Thrift-compiled headers tends to pull in a lot of other things. It would be nice not to expose them in the parquet library (the application should be able to use a different version of Thrift, etc.).

We can also see if it is practical to not depend on thrift at all and replicate the logic we need. Thrift is fairly stable at this point so this might be feasible. This would allow us to do things like not rely on boost.

Reporter: Nong Li / @nongli
Assignee: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-446. Please see the migration documentation for further details.}

[C++][Parquet] Add a utility function to print the raw repetition / definition levels to an std::ostream

This will facilitate development of nested data features and debugging

Reporter: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-476. Please see the migration documentation for further details.}

[C++][Parquet] Implement support for DataPageV2

Reporter: Wes McKinney / @wesm
Assignee: Hatem Helal / @hatemhelal

PRs and other links:

GitHub Pull Request #6481

_{Note: This issue was originally created as PARQUET-458. Please see the migration documentation for further details.}

[C++][Parquet] Metadata generation: Nested physical schema builder

The idea here is to define a simple API for creating logical schemas, which will be then automatically flattened in DFS order to a vector of SchemaElement structs. This will spare users from having to necessarily implement their own flattening / unflattening code
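
The flattening step might be sketched as a pre-order (depth-first) traversal, shown below. `Node`, `FlatElement`, and `Flatten` are hypothetical names, and the flat entry is reduced to a name and a child count rather than the full SchemaElement struct.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical logical schema node.
struct Node {
  std::string name;
  std::vector<Node> children;
};

// Simplified, hypothetical stand-in for a flat schema entry.
struct FlatElement {
  std::string name;
  int num_children;
};

// Appends the subtree rooted at `node` in depth-first pre-order:
// the parent first, then each child's subtree in turn.
inline void Flatten(const Node& node, std::vector<FlatElement>* out) {
  out->push_back(
      FlatElement{node.name, static_cast<int>(node.children.size())});
  for (const Node& child : node.children) {
    Flatten(child, out);
  }
}
```

Recording the child count on every entry is what lets a reader rebuild the nesting from the flat vector later.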

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-444. Please see the migration documentation for further details.}

[C++][Parquet] Implement and test BIT_PACKED level encoding / decoding

While RLE is the preferred encoding format (and BIT_PACKED is deprecated in Parquet 2.0), we will need to support this encoding format for legacy Parquet files that use it. As part of this JIRA we will verify round-tripping levels to this encoding format.
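
A round-trip check of this kind might look like the sketch below. It assumes the deprecated BIT_PACKED layout packs values back to back, from the most significant bit of each byte to the least significant, as the Parquet format specification describes; `PackMsbFirst` and `UnpackMsbFirst` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs each value into bit_width bits, most significant bit first,
// with no padding between values (assumed BIT_PACKED layout).
inline std::vector<uint8_t> PackMsbFirst(const std::vector<uint32_t>& values,
                                         int bit_width) {
  const std::size_t total_bits =
      values.size() * static_cast<std::size_t>(bit_width);
  std::vector<uint8_t> out((total_bits + 7) / 8, 0);
  std::size_t bit_pos = 0;
  for (uint32_t v : values) {
    for (int b = bit_width - 1; b >= 0; --b, ++bit_pos) {
      if ((v >> b) & 1u) {
        out[bit_pos / 8] |= static_cast<uint8_t>(0x80u >> (bit_pos % 8));
      }
    }
  }
  return out;
}

// Inverse of PackMsbFirst: reads `count` values of bit_width bits each.
// bytes.at() throws std::out_of_range if the input is too short.
inline std::vector<uint32_t> UnpackMsbFirst(const std::vector<uint8_t>& bytes,
                                            int bit_width, std::size_t count) {
  std::vector<uint32_t> out;
  out.reserve(count);
  std::size_t bit_pos = 0;
  for (std::size_t i = 0; i < count; ++i) {
    uint32_t v = 0;
    for (int b = 0; b < bit_width; ++b, ++bit_pos) {
      const uint32_t bit = (bytes.at(bit_pos / 8) >> (7 - bit_pos % 8)) & 1u;
      v = (v << 1) | bit;
    }
    out.push_back(v);
  }
  return out;
}
```

With a bit width of 3, the values 0 through 7 pack into the three bytes 0x05, 0x39, 0x77, and unpacking those bytes returns the original values.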

[C++][Parquet] Support Travis CI in parquet-cpp

Having a continuous build env helps ensure that pull requests compile and pass tests. It provides valuable feedback for ensuring various environments support desired changes.

Pull request that gets Travis CI - GitHub integration up and running for parquet-cpp:
apache/parquet-cpp#9

Reporter: Kalon Mills / @kalaxy
Assignee: Kalon Mills / @kalaxy

_{Note: This issue was originally created as PARQUET-259. Please see the migration documentation for further details.}

[C++][Parquet] Unable to Install C++ Driver - reference to 'shared_ptr' is ambiguous

The install commands worked up until the make command:

Aarons-MBP:parquet-cpp Aaron$ make
Scanning dependencies of target ThriftParquet
[ 12%] Building CXX object generated/gen-cpp/CMakeFiles/ThriftParquet.dir/parquet_constants.cpp.o
[ 25%] Building CXX object generated/gen-cpp/CMakeFiles/ThriftParquet.dir/parquet_types.cpp.o
Linking CXX static library ../../build/libThriftParquet.a
[ 25%] Built target ThriftParquet
Scanning dependencies of target Parquet
[ 37%] Building CXX object src/CMakeFiles/Parquet.dir/parquet.cc.o
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:79:5: warning: variable 'value_byte_size'
is used uninitialized whenever switch default is taken [-Wsometimes-uninitialized]
default:
^~~~~~~
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:94:46: note: uninitialized use occurs here
values_buffer_.resize(config_.batch_size * value_byte_size);
^~~~~~~~~~~~~~~
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:59:22: note: initialize the variable
'value_byte_size' to silence this warning
int value_byte_size;
^
= 0
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:167:37: error: reference to 'shared_ptr' is
ambiguous
unordered_map<Encoding::type, shared_ptr >::iterator it =
^
/usr/local/include/boost/smart_ptr/shared_ptr.hpp:717:36: note: candidate found by name
lookup is 'boost::shared_ptr'
template friend class shared_ptr;
^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/memory:3750:29: note:
candidate found by name lookup is 'std::__1::shared_ptr'
class LIBCPP_TYPE_VIS_ONLY shared_ptr
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:167:48: error: 'Decoder' does not refer to
a value
unordered_map<Encoding::type, shared_ptr >::iterator it =
^
/Users/Aaron/myProgs/parquet-cpp/src/encodings/encodings.h:27:7: note: declared here
class Decoder {
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:169:11: error: use of undeclared identifier
'it'
if (it != decoders.end()) {
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:176:7: error: reference to 'shared_ptr' is
ambiguous
shared_ptr decoder(new DictionaryDecoder(schema->type, &dictionary));
^
/usr/local/include/boost/smart_ptr/shared_ptr.hpp:717:36: note: candidate found by name
lookup is 'boost::shared_ptr'
template friend class shared_ptr;
^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/memory:3750:29: note:
candidate found by name lookup is 'std::__1::shared_ptr'
class LIBCPP_TYPE_VIS_ONLY shared_ptr
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:177:45: error: use of undeclared identifier
'decoder'; did you mean 'decoders'?
decoders[Encoding::RLE_DICTIONARY] = decoder;
^~~~~~~
decoders
/Users/Aaron/myProgs/parquet-cpp/src/parquet/parquet.h:152:78: note: 'decoders' declared
here
boost::unordered_map<parquet::Encoding::type, boost::shared_ptr > decoders_;
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:177:43: error: no viable overloaded '='
decoders_[Encoding::RLE_DICTIONARY] = decoder;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~
/usr/local/include/boost/smart_ptr/shared_ptr.hpp:500:18: note: candidate function not
viable: no known conversion from 'boost::unordered_map<parquet::Encoding::type,
boost::shared_ptr >' to 'const boost::shared_ptr<parquet_cpp::Decoder>' for
1st argument
shared_ptr & operator=( shared_ptr const & r ) BOOST_NOEXCEPT
^
/usr/local/include/boost/smart_ptr/shared_ptr.hpp:509:18: note: candidate template ignored:
could not match 'shared_ptr' against 'unordered_map'
shared_ptr & operator=(shared_ptr const & r) BOOST_NOEXCEPT
^
/usr/local/include/boost/smart_ptr/shared_ptr.hpp:520:18: note: candidate template ignored:
could not match 'auto_ptr' against 'unordered_map'
shared_ptr & operator=( std::auto_ptr & r )
^
/usr/local/include/boost/smart_ptr/shared_ptr.hpp:538:77: note: candidate template ignored:
substitution failure [with Ap =
boost::unordered::unordered_map<parquet::Encoding::type,
boost::shared_ptr<parquet_cpp::Decoder>, boost::hash<parquet::Encoding::type>,
std::__1::equal_to<parquet::Encoding::type>, std::__1::allocator<std::__1::pair<const
parquet::Encoding::type, boost::shared_ptr<parquet_cpp::Decoder> > > >]: no type
named 'type' in
'boost::detail::sp_enable_if_auto_ptr<boost::unordered::unordered_map<parquet::Encoding::type,
boost::shared_ptr<parquet_cpp::Decoder>, boost::hash<parquet::Encoding::type>,
std::__1::equal_to<parquet::Encoding::type>, std::__1::allocator<std::__1::pair<const
parquet::Encoding::type, boost::shared_ptr<parquet_cpp::Decoder> > > >,
boost::shared_ptr<parquet_cpp::Decoder> &>'
typename boost::detail::sp_enable_if_auto_ptr< Ap, shared_ptr & >::type operato...
~~~~ ^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:202:37: error: reference to 'shared_ptr' is
ambiguous
unordered_map<Encoding::type, shared_ptr >::iterator it =
^
/usr/local/include/boost/smart_ptr/shared_ptr.hpp:717:36: note: candidate found by name
lookup is 'boost::shared_ptr'
template friend class shared_ptr;
^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/memory:3750:29: note:
candidate found by name lookup is 'std::__1::shared_ptr'
class LIBCPP_TYPE_VIS_ONLY shared_ptr
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:202:48: error: 'Decoder' does not refer to
a value
unordered_map<Encoding::type, shared_ptr >::iterator it =
^
/Users/Aaron/myProgs/parquet-cpp/src/encodings/encodings.h:27:7: note: declared here
class Decoder {
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:204:11: error: use of undeclared identifier
'it'
if (it != decoders.end()) {
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:205:28: error: use of undeclared identifier
'it'
current_decoder = it->second.get();
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:209:13: error: reference to 'shared_ptr' is
ambiguous
shared_ptr decoder;
^
/usr/local/include/boost/smart_ptr/shared_ptr.hpp:717:36: note: candidate found by name
lookup is 'boost::shared_ptr'
template friend class shared_ptr;
^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/memory:3750:29: note:
candidate found by name lookup is 'std::__1::shared_ptr'
class LIBCPP_TYPE_VIS_ONLY shared_ptr
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:211:15: error: use of undeclared identifier
'decoder'; did you mean 'decoders'?
decoder.reset(new BoolDecoder());
^~~~~~~
decoders
/Users/Aaron/myProgs/parquet-cpp/src/parquet/parquet.h:152:78: note: 'decoders' declared
here
boost::unordered_map<parquet::Encoding::type, boost::shared_ptr > decoders;
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:211:23: error: no member named 'reset' in
'boost::unordered::unordered_map<parquet::Encoding::type,
boost::shared_ptr<parquet_cpp::Decoder>, boost::hash<parquet::Encoding::type>,
std::__1::equal_to<parquet::Encoding::type>, std::__1::allocator<std::__1::pair<const
parquet::Encoding::type, boost::shared_ptr<parquet_cpp::Decoder> > > >'
decoder.reset(new BoolDecoder());
~~~~~~~ ^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:213:15: error: use of undeclared identifier
'decoder'; did you mean 'decoders'?
decoder.reset(new PlainDecoder(schema->type));
^~~~~~~
decoders
/Users/Aaron/myProgs/parquet-cpp/src/parquet/parquet.h:152:78: note: 'decoders' declared
here
boost::unordered_map<parquet::Encoding::type, boost::shared_ptr > decoders;
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:213:23: error: no member named 'reset' in
'boost::unordered::unordered_map<parquet::Encoding::type,
boost::shared_ptr<parquet_cpp::Decoder>, boost::hash<parquet::Encoding::type>,
std::__1::equal_to<parquet::Encoding::type>, std::__1::allocator<std::__1::pair<const
parquet::Encoding::type, boost::shared_ptr<parquet_cpp::Decoder> > > >'
decoder.reset(new PlainDecoder(schema->type));
~~~~~~~ ^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:215:35: error: use of undeclared identifier
'decoder'; did you mean 'decoders'?
decoders[encoding] = decoder;
^~~~~~~
decoders
/Users/Aaron/myProgs/parquet-cpp/src/parquet/parquet.h:152:78: note: 'decoders' declared
here
boost::unordered_map<parquet::Encoding::type, boost::shared_ptr > decoders;
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:215:33: error: no viable overloaded '='
decoders[encoding] = decoder;
~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~
/usr/local/include/boost/smart_ptr/shared_ptr.hpp:500:18: note: candidate function not
viable: no known conversion from 'boost::unordered_map<parquet::Encoding::type,
boost::shared_ptr >' to 'const boost::shared_ptr<parquet_cpp::Decoder>' for
1st argument
shared_ptr & operator=( shared_ptr const & r ) BOOST_NOEXCEPT
^
/usr/local/include/boost/smart_ptr/shared_ptr.hpp:509:18: note: candidate template ignored:
could not match 'shared_ptr' against 'unordered_map'
shared_ptr & operator=(shared_ptr const & r) BOOST_NOEXCEPT
^
/usr/local/include/boost/smart_ptr/shared_ptr.hpp:520:18: note: candidate template ignored:
could not match 'auto_ptr' against 'unordered_map'
shared_ptr & operator=( std::auto_ptr & r )
^
/usr/local/include/boost/smart_ptr/shared_ptr.hpp:538:77: note: candidate template ignored:
substitution failure [with Ap =
boost::unordered::unordered_map<parquet::Encoding::type,
boost::shared_ptr<parquet_cpp::Decoder>, boost::hash<parquet::Encoding::type>,
std::__1::equal_to<parquet::Encoding::type>, std::__1::allocator<std::__1::pair<const
parquet::Encoding::type, boost::shared_ptr<parquet_cpp::Decoder> > > >]: no type
named 'type' in
'boost::detail::sp_enable_if_auto_ptr<boost::unordered::unordered_map<parquet::Encoding::type,
boost::shared_ptr<parquet_cpp::Decoder>, boost::hash<parquet::Encoding::type>,
std::__1::equal_to<parquet::Encoding::type>, std::__1::allocator<std::__1::pair<const
parquet::Encoding::type, boost::shared_ptr<parquet_cpp::Decoder> > > >,
boost::shared_ptr<parquet_cpp::Decoder> &>'
typename boost::detail::sp_enable_if_auto_ptr< Ap, shared_ptr & >::type operato...
~~~~ ^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:216:32: error: use of undeclared identifier
'decoder'; did you mean 'decoders'?
current_decoder = decoder.get();
^~~~~~~
decoders
/Users/Aaron/myProgs/parquet-cpp/src/parquet/parquet.h:152:78: note: 'decoders' declared
here
boost::unordered_map<parquet::Encoding::type, boost::shared_ptr > decoders;
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:216:40: error: no member named 'get' in
'boost::unordered::unordered_map<parquet::Encoding::type,
boost::shared_ptr<parquet_cpp::Decoder>, boost::hash<parquet::Encoding::type>,
std::__1::equal_to<parquet::Encoding::type>, std::__1::allocator<std::__1::pair<const
parquet::Encoding::type, boost::shared_ptr<parquet_cpp::Decoder> > > >'
current_decoder = decoder.get();
~~~~~~~ ^
1 warning and 19 errors generated.
make[2]: *** [src/CMakeFiles/Parquet.dir/parquet.cc.o] Error 1
make[1]: *** [src/CMakeFiles/Parquet.dir/all] Error 2
make: *** [all] Error 2
Aarons-MBP:parquet-cpp Aaron$
Aarons-MBP:parquet-cpp Aaron$ git pull
Already up-to-date.
Aarons-MBP:parquet-cpp Aaron$ make
[ 25%] Built target ThriftParquet
[ 37%] Building CXX object src/CMakeFiles/Parquet.dir/parquet.cc.o
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:79:5: warning: variable 'value_byte_size' is used uninitialized whenever switch default is
taken [-Wsometimes-uninitialized]
default:
^~~~~~~
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:94:46: note: uninitialized use occurs here
values_buffer.resize(config.batch_size * value_byte_size);
^~~~~~~~~~~~~~~
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:59:22: note: initialize the variable 'value_byte_size' to silence this warning
int value_byte_size;
^
= 0
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:167:37: error: reference to 'shared_ptr' is ambiguous
unordered_map<Encoding::type, shared_ptr >::iterator it =
^
/usr/local/include/boost/smart_ptr/shared_ptr.hpp:717:36: note: candidate found by name lookup is 'boost::shared_ptr'
template friend class shared_ptr;
^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/memory:3750:29: note: candidate
found by name lookup is 'std::__1::shared_ptr'
class LIBCPP_TYPE_VIS_ONLY shared_ptr
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:167:48: error: 'Decoder' does not refer to a value
unordered_map<Encoding::type, shared_ptr >::iterator it =
^
/Users/Aaron/myProgs/parquet-cpp/src/encodings/encodings.h:27:7: note: declared here
class Decoder {
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:169:11: error: use of undeclared identifier 'it'
if (it != decoders.end()) {
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:176:7: error: reference to 'shared_ptr' is ambiguous
shared_ptr decoder(new DictionaryDecoder(schema->type, &dictionary));
^
/usr/local/include/boost/smart_ptr/shared_ptr.hpp:717:36: note: candidate found by name lookup is 'boost::shared_ptr'
template friend class shared_ptr;
^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/memory:3750:29: note: candidate
found by name lookup is 'std::__1::shared_ptr'
class LIBCPP_TYPE_VIS_ONLY shared_ptr
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:177:45: error: use of undeclared identifier 'decoder'; did you mean 'decoders'?
decoders[Encoding::RLE_DICTIONARY] = decoder;
^~~~~~~
decoders
/Users/Aaron/myProgs/parquet-cpp/src/parquet/parquet.h:152:78: note: 'decoders' declared here
boost::unordered_map<parquet::Encoding::type, boost::shared_ptr > decoders;
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:177:43: error: no viable overloaded '='
decoders[Encoding::RLE_DICTIONARY] = decoder;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~
/usr/local/include/boost/smart_ptr/shared_ptr.hpp:500:18: note: candidate function not viable: no known conversion from
'boost::unordered_map<parquet::Encoding::type, boost::shared_ptr >' to 'const boost::shared_ptr<parquet_cpp::Decoder>' for 1st
argument
shared_ptr & operator=( shared_ptr const & r ) BOOST_NOEXCEPT
^
/usr/local/include/boost/smart_ptr/shared_ptr.hpp:509:18: note: candidate template ignored: could not match 'shared_ptr' against
'unordered_map'
shared_ptr & operator=(shared_ptr const & r) BOOST_NOEXCEPT
^
/usr/local/include/boost/smart_ptr/shared_ptr.hpp:520:18: note: candidate template ignored: could not match 'auto_ptr' against
'unordered_map'
shared_ptr & operator=( std::auto_ptr & r )
^
/usr/local/include/boost/smart_ptr/shared_ptr.hpp:538:77: note: candidate template ignored: substitution failure [with Ap =
boost::unordered::unordered_map<parquet::Encoding::type, boost::shared_ptr<parquet_cpp::Decoder>, boost::hash<parquet::Encoding::type>,
std::__1::equal_to<parquet::Encoding::type>, std::__1::allocator<std::__1::pair<const parquet::Encoding::type,
boost::shared_ptr<parquet_cpp::Decoder> > > >]: no type named 'type' in
'boost::detail::sp_enable_if_auto_ptr<boost::unordered::unordered_map<parquet::Encoding::type, boost::shared_ptr<parquet_cpp::Decoder>,
boost::hash<parquet::Encoding::type>, std::__1::equal_to<parquet::Encoding::type>, std::__1::allocator<std::__1::pair<const
parquet::Encoding::type, boost::shared_ptr<parquet_cpp::Decoder> > > >, boost::shared_ptr<parquet_cpp::Decoder> &>'
typename boost::detail::sp_enable_if_auto_ptr< Ap, shared_ptr & >::type operator=( Ap r )
~~~~ ^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:202:37: error: reference to 'shared_ptr' is ambiguous
unordered_map<Encoding::type, shared_ptr >::iterator it =
^
/usr/local/include/boost/smart_ptr/shared_ptr.hpp:717:36: note: candidate found by name lookup is 'boost::shared_ptr'
template friend class shared_ptr;
^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/memory:3750:29: note: candidate
found by name lookup is 'std::__1::shared_ptr'
class _LIBCPP_TYPE_VIS_ONLY shared_ptr
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:202:48: error: 'Decoder' does not refer to a value
unordered_map<Encoding::type, shared_ptr >::iterator it =
^
/Users/Aaron/myProgs/parquet-cpp/src/encodings/encodings.h:27:7: note: declared here
class Decoder {
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:204:11: error: use of undeclared identifier 'it'
if (it != decoders.end()) {
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:205:28: error: use of undeclared identifier 'it'
current_decoder = it->second.get();
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:209:13: error: reference to 'shared_ptr' is ambiguous
shared_ptr decoder;
^
/usr/local/include/boost/smart_ptr/shared_ptr.hpp:717:36: note: candidate found by name lookup is 'boost::shared_ptr'
template friend class shared_ptr;
^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/memory:3750:29: note: candidate
found by name lookup is 'std::__1::shared_ptr'
class _LIBCPP_TYPE_VIS_ONLY shared_ptr
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:211:15: error: use of undeclared identifier 'decoder'; did you mean 'decoders'?
decoder.reset(new BoolDecoder());
^~~~~~~
decoders
/Users/Aaron/myProgs/parquet-cpp/src/parquet/parquet.h:152:78: note: 'decoders' declared here
boost::unordered_map<parquet::Encoding::type, boost::shared_ptr > decoders;
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:211:23: error: no member named 'reset' in
'boost::unordered::unordered_map<parquet::Encoding::type, boost::shared_ptr<parquet_cpp::Decoder>,
boost::hash<parquet::Encoding::type>, std::__1::equal_to<parquet::Encoding::type>, std::__1::allocator<std::__1::pair<const
parquet::Encoding::type, boost::shared_ptr<parquet_cpp::Decoder> > > >'
decoder.reset(new BoolDecoder());
~~~~~~~ ^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:213:15: error: use of undeclared identifier 'decoder'; did you mean 'decoders'?
decoder.reset(new PlainDecoder(schema->type));
^~~~~~~
decoders
/Users/Aaron/myProgs/parquet-cpp/src/parquet/parquet.h:152:78: note: 'decoders' declared here
boost::unordered_map<parquet::Encoding::type, boost::shared_ptr > decoders;
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:213:23: error: no member named 'reset' in
'boost::unordered::unordered_map<parquet::Encoding::type, boost::shared_ptr<parquet_cpp::Decoder>,
boost::hash<parquet::Encoding::type>, std::__1::equal_to<parquet::Encoding::type>, std::__1::allocator<std::__1::pair<const
parquet::Encoding::type, boost::shared_ptr<parquet_cpp::Decoder> > > >'
decoder.reset(new PlainDecoder(schema->type));
~~~~~~~ ^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:215:35: error: use of undeclared identifier 'decoder'; did you mean 'decoders'?
decoders[encoding] = decoder;
^~~~~~~
decoders
/Users/Aaron/myProgs/parquet-cpp/src/parquet/parquet.h:152:78: note: 'decoders' declared here
boost::unordered_map<parquet::Encoding::type, boost::shared_ptr > decoders;
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:215:33: error: no viable overloaded '='
decoders[encoding] = decoder;
~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~
/usr/local/include/boost/smart_ptr/shared_ptr.hpp:500:18: note: candidate function not viable: no known conversion from
'boost::unordered_map<parquet::Encoding::type, boost::shared_ptr >' to 'const boost::shared_ptr<parquet_cpp::Decoder>' for 1st
argument
shared_ptr & operator=( shared_ptr const & r ) BOOST_NOEXCEPT
^
/usr/local/include/boost/smart_ptr/shared_ptr.hpp:509:18: note: candidate template ignored: could not match 'shared_ptr' against
'unordered_map'
shared_ptr & operator=(shared_ptr const & r) BOOST_NOEXCEPT
^
/usr/local/include/boost/smart_ptr/shared_ptr.hpp:520:18: note: candidate template ignored: could not match 'auto_ptr' against
'unordered_map'
shared_ptr & operator=( std::auto_ptr & r )
^
/usr/local/include/boost/smart_ptr/shared_ptr.hpp:538:77: note: candidate template ignored: substitution failure [with Ap =
boost::unordered::unordered_map<parquet::Encoding::type, boost::shared_ptr<parquet_cpp::Decoder>, boost::hash<parquet::Encoding::type>,
std::__1::equal_to<parquet::Encoding::type>, std::__1::allocator<std::__1::pair<const parquet::Encoding::type,
boost::shared_ptr<parquet_cpp::Decoder> > > >]: no type named 'type' in
'boost::detail::sp_enable_if_auto_ptr<boost::unordered::unordered_map<parquet::Encoding::type, boost::shared_ptr<parquet_cpp::Decoder>,
boost::hash<parquet::Encoding::type>, std::__1::equal_to<parquet::Encoding::type>, std::__1::allocator<std::__1::pair<const
parquet::Encoding::type, boost::shared_ptr<parquet_cpp::Decoder> > > >, boost::shared_ptr<parquet_cpp::Decoder> &>'
typename boost::detail::sp_enable_if_auto_ptr< Ap, shared_ptr & >::type operator=( Ap r )
~~~~ ^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:216:32: error: use of undeclared identifier 'decoder'; did you mean 'decoders'?
current_decoder = decoder.get();
^~~~~~~
decoders
/Users/Aaron/myProgs/parquet-cpp/src/parquet/parquet.h:152:78: note: 'decoders' declared here
boost::unordered_map<parquet::Encoding::type, boost::shared_ptr > decoders;
^
/Users/Aaron/myProgs/parquet-cpp/src/parquet.cc:216:40: error: no member named 'get' in
'boost::unordered::unordered_map<parquet::Encoding::type, boost::shared_ptr<parquet_cpp::Decoder>,
boost::hash<parquet::Encoding::type>, std::__1::equal_to<parquet::Encoding::type>, std::__1::allocator<std::__1::pair<const
parquet::Encoding::type, boost::shared_ptr<parquet_cpp::Decoder> > > >'
current_decoder = decoder.get();
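
The "reference to 'shared_ptr' is ambiguous" errors in this log are what Clang reports when both boost::shared_ptr and std::shared_ptr are visible to unqualified name lookup. Most of the later "undeclared identifier" errors likely cascade from the declarations that failed for that reason. A minimal sketch of the usual remedy, spelling out the namespace on every use, is shown below. The Encoding enum, the decoder classes, and GetOrCreateDecoder are hypothetical stand-ins rather than the parquet-cpp sources, and std::map replaces boost::unordered_map only to keep the example free of Boost.

```cpp
#include <map>
#include <memory>

// Hypothetical stand-ins for the parquet-cpp types named in the log.
enum class Encoding { PLAIN, PLAIN_DICTIONARY };

class Decoder {
 public:
  virtual ~Decoder() = default;
};

class PlainDecoder : public Decoder {};

// Every smart-pointer name is fully qualified, so unqualified lookup can
// never find both boost::shared_ptr and std::shared_ptr.
typedef std::map<Encoding, std::shared_ptr<Decoder>> DecoderMap;

// Returns the cached decoder for `encoding`, creating one on first use.
Decoder* GetOrCreateDecoder(DecoderMap& decoders, Encoding encoding) {
  DecoderMap::iterator it = decoders.find(encoding);
  if (it != decoders.end()) {
    return it->second.get();
  }
  std::shared_ptr<Decoder> decoder(new PlainDecoder());
  decoders[encoding] = decoder;
  return decoder.get();
}
```

Qualifying the names fixes the ambiguity without depending on which headers happen to inject using-directives.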

Environment: Mac Mavericks
Reporter: Aaron Benz

_{Note: This issue was originally created as PARQUET-238. Please see the migration documentation for further details.}

[C++][Parquet] Support INT96 and FIXED_LEN_BYTE_ARRAY types

I would like to add support for INT96 and FIXED_LEN_BYTE_ARRAY parquet types.
Hive data types DATE and TIMESTAMP get mapped to INT96 parquet type.
Hive DECIMAL gets mapped to parquet FIXED_LEN_BYTE_ARRAY type.
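
As a rough sketch of what these two types look like in C++: INT96 is a 12-byte value, and a FIXED_LEN_BYTE_ARRAY value is a byte string whose length comes from the schema. The example assumes the timestamp layout commonly written by Hive and Impala (low 8 bytes: nanoseconds within the day; high 4 bytes: Julian day number). The struct and function names are hypothetical, not the parquet-cpp API.

```cpp
#include <cstdint>

// Hypothetical 12-byte INT96 holder: three 32-bit words, lowest word first.
struct Int96 {
  uint32_t value[3];
};

// Converts an INT96 timestamp to nanoseconds since the Unix epoch,
// ASSUMING words 0-1 hold nanoseconds within the day and word 2 holds the
// Julian day number (the layout commonly written by Hive and Impala).
int64_t Int96ToUnixNanos(const Int96& ts) {
  const int64_t kJulianDayOfUnixEpoch = 2440588;  // 1970-01-01
  const int64_t kNanosPerDay = 86400LL * 1000000000LL;
  const uint64_t nanos_of_day =
      (static_cast<uint64_t>(ts.value[1]) << 32) | ts.value[0];
  const int64_t julian_day = static_cast<int64_t>(ts.value[2]);
  return (julian_day - kJulianDayOfUnixEpoch) * kNanosPerDay +
         static_cast<int64_t>(nanos_of_day);
}

// Hypothetical view of a FIXED_LEN_BYTE_ARRAY value. The length is not
// stored with the value; it comes from the column's schema.
struct FixedLenByteArray {
  const uint8_t* ptr;
};
```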

Reporter: Deepak Majeti / @majetideepak
Assignee: Deepak Majeti / @majetideepak

_{Note: This issue was originally created as PARQUET-428. Please see the migration documentation for further details.}

[C++][Parquet] Remove boost dependency

At a glance, parquet-cpp makes only light use of Boost. It seems possible to remove the Boost dependency by using C++11 features instead.

Removing the Boost dependency would make parquet-cpp more portable and lightweight. C++11 would also allow us to modernize the C++ code.

Reporter: Hyunsik Choi
Assignee: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-277. Please see the migration documentation for further details.}

[C++][Parquet] Thrift 0.9.3 cannot be used in conjunction with googletest and C++11 on Linux

Thrift 0.9.3 adds an #include <thrift/cxxfunctional.h> directive, which pulls in tr1/functional. This conflicts at compile time with googletest, which wraps its own use of std::tr1::tuple in portability macros. I spent a while adjusting compiler flags to try to resolve the conflict, but could not find a combination that works.

If this is a Thrift bug, we should report it to Thrift. If it can be fixed with compiler flags, we should work out which ones and track the issue here. Otherwise, users with the latest version of Thrift will be unable to compile the parquet-cpp test suite.

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-470. Please see the migration documentation for further details.}

[C++][Parquet] Batch/vectorized decoding of array sizes within each repetition level

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-445. Please see the migration documentation for further details.}

[C++][Parquet] Use the same environment setup script for Travis CI as local sandbox development

Currently the environment setups are slightly different, and so a passing Travis CI build might have a problem with the sandbox build and vice versa.

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-471. Please see the migration documentation for further details.}

[C++][Parquet] Update to latest parquet.thrift

Reporter: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-449. Please see the migration documentation for further details.}

[C++][Parquet] Incorporate googletest thirdparty dependency and add cmake tools (ADD_PARQUET_TEST) to simplify adding new unit tests

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-437. Please see the migration documentation for further details.}

[C++][Parquet] InputStream and RandomAccessSource classes are not threadsafe

We need to ensure that files can be processed safely in multithreaded applications.
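
One common way to make a shared source safe, sketched here under the assumption that the hazard is a cursor shared between threads, is to expose a positional read that performs the seek and the read under a single lock. LockedSource is a hypothetical in-memory class for illustration, not the existing InputStream or RandomAccessSource.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical in-memory source with a shared cursor, like a file handle.
class LockedSource {
 public:
  explicit LockedSource(std::string data) : data_(std::move(data)), pos_(0) {}

  // Positional read: the seek and the read happen under one lock, so
  // concurrent callers cannot observe each other's cursor movement.
  // Returns the number of bytes actually copied into `out`.
  std::size_t ReadAt(std::size_t offset, std::size_t nbytes, char* out) {
    std::lock_guard<std::mutex> guard(mutex_);
    pos_ = std::min(offset, data_.size());
    const std::size_t n = std::min(nbytes, data_.size() - pos_);
    std::memcpy(out, data_.data() + pos_, n);
    pos_ += n;
    return n;
  }

 private:
  std::string data_;
  std::size_t pos_;
  std::mutex mutex_;
};
```

A lock per source is the simplest option. Giving each thread its own handle avoids contention at the cost of more open files.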

Reporter: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-474. Please see the migration documentation for further details.}

[C++][Parquet] Address inconsistencies in boolean decoding

See patch apache/parquet-cpp#12

I suggest adding unit tests to verify the fix proposed in this patch.

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-454. Please see the migration documentation for further details.}

[C++][Parquet] Update RLE encoder/decoder modules from Impala upstream changes and adapt unit tests

Depends on PARQUET-437

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-438. Please see the migration documentation for further details.}

[C++][Parquet] Add a ParquetFileReader class to encapsulate some low-level details of interacting with Parquet files

This is also related to PARQUET-418. I'm beginning work on an adapter between Parquet and in-memory C++ data structures, and it would be helpful for the moment to encapsulate various details like metadata deserialization.

This class can be expanded to include other features (such as yielding column readers) in future patches.

I've inspected the patch in apache/parquet-cpp#18 and expect there to be little overlap. @nongli if you can have a look at that and let us know how to proceed, that would be great.
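
As an illustration of the kind of low-level detail such a class could hide: a Parquet file begins and ends with the 4-byte magic PAR1, and the 4 bytes before the trailing magic hold the little-endian length of the serialized file metadata. The sketch below validates that framing on an in-memory buffer. FooterMetadataLength is a hypothetical helper, not part of the proposed ParquetFileReader.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Returns the byte length of the serialized file metadata, read from the
// 8-byte footer (4-byte little-endian length followed by "PAR1").
uint32_t FooterMetadataLength(const std::string& file) {
  static const char kMagic[] = "PAR1";
  const std::size_t kMagicSize = 4;
  const std::size_t kFooterSize = 8;  // 4-byte length + 4-byte magic
  if (file.size() < kMagicSize + kFooterSize) {
    throw std::runtime_error("file too small to be Parquet");
  }
  if (file.compare(0, kMagicSize, kMagic) != 0 ||
      file.compare(file.size() - kMagicSize, kMagicSize, kMagic) != 0) {
    throw std::runtime_error("missing PAR1 magic");
  }
  const unsigned char* p =
      reinterpret_cast<const unsigned char*>(file.data()) + file.size() -
      kFooterSize;
  const uint32_t length = static_cast<uint32_t>(p[0]) |
                          (static_cast<uint32_t>(p[1]) << 8) |
                          (static_cast<uint32_t>(p[2]) << 16) |
                          (static_cast<uint32_t>(p[3]) << 24);
  // The metadata must fit between the leading magic and the footer.
  if (length > file.size() - kMagicSize - kFooterSize) {
    throw std::runtime_error("metadata length exceeds file size");
  }
  return length;
}
```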

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-434. Please see the migration documentation for further details.}

[C++][Parquet] Detach thirdparty code from build configuration.

The existing repo has source code for third party dependencies checked into the repo. The build system expects those dependencies in a certain place. This enforces that the built library conform to those exact dependencies without customization.

Managing third party dependencies is better handled through a build environment. It allows the library builder more flexibility over dependency versions and locations. It also cleans up the repo from this third party code.

Reporter: Kalon Mills / @kalaxy
Assignee: Kalon Mills / @kalaxy

Related issues:

C++11, cpplint cleanup, package target and header installation (relates to)

Externally tracked issue: apache/parquet-cpp#16

_{Note: This issue was originally created as PARQUET-267. Please see the migration documentation for further details.}

[C++][Parquet] Add cmake option and #defines to enable/disable struct packing

Follow-up to conversation on PARQUET-428. This will make it easier to run performance experiments without changing any code.
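
A minimal sketch of how a build-time switch could toggle packing from the C++ side. DEMO_ENABLE_PACKING and DEMO_PACKED are hypothetical macro names, and the attribute syntax shown is the GCC/Clang one; other compilers would need their own branch.

```cpp
#include <cstdint>

// DEMO_ENABLE_PACKING is a hypothetical macro that a cmake option could
// define, for example through add_definitions(-DDEMO_ENABLE_PACKING).
#if defined(DEMO_ENABLE_PACKING) && (defined(__GNUC__) || defined(__clang__))
#define DEMO_PACKED __attribute__((packed))
#else
#define DEMO_PACKED
#endif

// With packing enabled the struct has no padding between its members.
struct DEMO_PACKED Sample {
  uint8_t flag;
  uint32_t value;
};

// True when the compiler laid out Sample without padding.
constexpr bool kPackingEnabled =
    sizeof(Sample) == sizeof(uint8_t) + sizeof(uint32_t);
```

Because the switch lives in one macro, a performance experiment only needs a different configure flag rather than a source change.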

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-464. Please see the migration documentation for further details.}

[C++][Parquet] Improve ColumnReader API

I would like to add some more extensions to the ColumnReader API. These extensions will query certain fields from the corresponding schema element and the column metadata.

Reporter: Deepak Majeti / @majetideepak

_{Note: This issue was originally created as PARQUET-461. Please see the migration documentation for further details.}

[C++][Parquet] Implement a LevelDecoder class (like Impala) which dispatches to RLE or BIT_PACKED decoding as appropriate

This class extends the RleDecoder class.

Reporter: Deepak Majeti / @majetideepak
Assignee: Deepak Majeti / @majetideepak

_{Note: This issue was originally created as PARQUET-462. Please see the migration documentation for further details.}

[C++][Parquet] Add a utility to print contents of a Parquet file to stdout

To improve the usability and testability of parquet-cpp, the library needs a utility that prints the contents of a Parquet file. incubator-parquet-cpp used to have a parquet_reader utility, but (a) it was not ported to the Apache repository, and (b) it leaked memory, mismanaged file handles, and needed substantial improvement.

Using parquet_reader as a starting point, I will build a utility for printing the contents of a Parquet file.

Reporter: Aliaksei Sandryhaila / @asandryh

_{Note: This issue was originally created as PARQUET-418. Please see the migration documentation for further details.}

[C++][Parquet] Add cmake option to skip building the unit tests

This will speed up build-and-install in downstream applications.

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-448. Please see the migration documentation for further details.}

[C++][Parquet] Add Debug and Release build types and associated compiler flags

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-447. Please see the migration documentation for further details.}

[C++][Parquet] Develop external predicate pushdown API for column readers

This will happen significantly downstream of where we are at right now, but we should be planning ahead to facilitate scanning Parquet files with externally-defined predicates as a primary use case.

I suggest that the most general (and high performance) predicate will be batch-oriented; i.e. the predicate will be passed a batch of materialized values from one or more columns, and it returns an array of booleans indicating whether or not the predicate is true. We can also develop a row-by-row "scalar" predicate API if users need that.
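
The batch-oriented shape described above could look roughly like the following sketch. All names here (Int64BatchPredicate, FilterBatch, FromScalar) are hypothetical, and a single int64 column stands in for "one or more columns" to keep the example short.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical batch predicate: receives a batch of materialized values and
// returns one flag per value (1 = predicate true, 0 = false).
using Int64BatchPredicate =
    std::function<std::vector<uint8_t>(const std::vector<int64_t>&)>;

// Evaluates the predicate once per batch and keeps the selected values.
std::vector<int64_t> FilterBatch(const std::vector<int64_t>& batch,
                                 const Int64BatchPredicate& predicate) {
  const std::vector<uint8_t> mask = predicate(batch);
  std::vector<int64_t> selected;
  for (std::size_t i = 0; i < batch.size() && i < mask.size(); ++i) {
    if (mask[i]) {
      selected.push_back(batch[i]);
    }
  }
  return selected;
}

// Adapts a row-by-row "scalar" predicate to the batch interface.
Int64BatchPredicate FromScalar(std::function<bool(int64_t)> scalar) {
  return [scalar](const std::vector<int64_t>& batch) -> std::vector<uint8_t> {
    std::vector<uint8_t> mask(batch.size());
    for (std::size_t i = 0; i < batch.size(); ++i) {
      mask[i] = scalar(batch[i]) ? 1 : 0;
    }
    return mask;
  };
}
```

Making the batch form primary means the scalar API can be layered on top as an adapter, while the per-call overhead is paid once per batch rather than once per row.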

Reporter: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-473. Please see the migration documentation for further details.}

[C++][Parquet] Fix compiler warnings on OS X / Clang

There have been two patches addressing OS X-related issues:

apache/parquet-cpp#15
apache/parquet-cpp#6

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

_{Note: This issue was originally created as PARQUET-455. Please see the migration documentation for further details.}

[C++][Parquet] Improve handling of null values

Currently, NULL values are returned as the default value of the type, which is incorrect.
This JIRA will identify NULL values correctly by means of an additional variable that is set whenever a value is NULL.
This feature depends on reading the repetition level (PARQUET-169).
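
A sketch of the kind of interface described, where a separate flag reports NULL instead of overloading the type's default value. It assumes the Parquet convention that, in a flat optional column, a slot is null when its definition level is lower than the column's maximum definition level. ReadSlot and its parameters are hypothetical, not the parquet-cpp ColumnReader API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads logical slot `slot` of a flat, optional int32 column.
// `def_levels` has one entry per slot; `values` stores only the non-null
// values, in order. Sets *is_null and returns the value (0 when null).
int32_t ReadSlot(const std::vector<int16_t>& def_levels,
                 const std::vector<int32_t>& values, int16_t max_def_level,
                 std::size_t slot, bool* is_null) {
  // Count the non-null slots before `slot` to find the value's position.
  std::size_t value_index = 0;
  for (std::size_t i = 0; i < slot; ++i) {
    if (def_levels[i] == max_def_level) {
      ++value_index;
    }
  }
  *is_null = def_levels[slot] < max_def_level;
  return *is_null ? 0 : values[value_index];
}
```

With the flag, a stored 0 and a NULL are distinguishable even though both come back as 0 from the return value.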

Reporter: Deepak Majeti / @majetideepak
Assignee: Deepak Majeti / @majetideepak

_{Note: This issue was originally created as PARQUET-459. Please see the migration documentation for further details.}
