Comments (5)
A reproducer just with pyarrow as well (using your registered ext type example):
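import pyarrow as pa

class DummyArray:
    def __init__(self):
        # The field carries an extension name but no "ARROW:extension:metadata" key
        self._field = pa.field("", pa.null(), metadata={"ARROW:extension:name": "arrow.test"})
        self._array = pa.array([None, None], pa.null())

    def __arrow_c_array__(self, requested_schema=None):
        # Pair the annotated field's schema capsule with the null array's data capsule
        return self._field.__arrow_c_schema__(), self._array.__arrow_c_array__()[1]

pa.array(DummyArray())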
which crashes, and when adding "ARROW:extension:metadata": "" to the dict, it works.
Now, apart from your fix (thanks for that!), I realize from the example that it is right now actually impossible to register an extension type without metadata (at least from Python).
I don't know if we should fix that (e.g. allow __arrow_ext_serialize__ to return None in addition to bytes?), or whether this is OK as long as we ensure this still works when actually receiving data with extension types without metadata (i.e. your fix for this issue).
from arrow.
We should probably also check that this works in other places that import extension types, like reading IPC.
For IPC it seems to work:
table = pa.Table.from_arrays(
    [pa.array([None, None], pa.null())],
    schema=pa.schema([pa.field("a", pa.null(), metadata={"ARROW:extension:name": "arrow.test"})])
)
with pa.ipc.new_file("/tmp/test_ext_no_meta.arrow", table.schema) as f:
    f.write(table)

# Read back before the extension type is registered
with pa.ipc.open_file("/tmp/test_ext_no_meta.arrow") as f:
    result = f.read_all()
print(result.schema)

# Register the extension type (DummyExtType from the issue's original example) and read again
pa.register_extension_type(DummyExtType())
with pa.ipc.open_file("/tmp/test_ext_no_meta.arrow") as f:
    result = f.read_all()
print(result.schema)
And the same for write/read Parquet (where we get the extension type from the ARROW:schema)
Good call! It looks like the version of this logic used by IPC and Parquet (which I believe uses the IPC Schema encoding) is here, and it is more or less the same:
arrow/cpp/src/arrow/ipc/metadata_internal.cc
Lines 871 to 892 in 065a6da
(Also good call on the registration not mattering!)
Issue resolved by pull request #41763