Comments (6)
This seems like a good idea, but my experience is that some (many?) VTK readers are not happy with non-string based paths or direct data being passed in binary/string form. If PyVista can transform user input to what VTK expects, it makes sense to me, particularly if we do not have to add any dependencies.
from pyvista.
The intern package has an API which may be a helpful reference for implementing this feature in PyVista. The intern package is used for working with really big datasets. For example, this remote dataset from BossDB, https://bossdb.org/project/maher_briegel2023, is read with the following API:
    # Import intern (pip install intern)
    from intern import array

    # Save a cutout to a numpy array in ZYX order:
    channel = array("bossdb://MaherBriegel2023/Lgn200/sbem")
    data = channel[30:36, 1024:2048, 1024:2048]
See the implementation code for intern.array here:
https://github.com/jhuapl-boss/intern/blob/15073c6eed12e1372e2d0448ed1e874df827b3ba/intern/convenience/array.py#L936
from pyvista.
Hey @tkoyama010, I saw your 👍 on the issue and just wanted to check; can I take that as an endorsement of the idea/you being open to merging a PR that implements this?
Sorry for the direct tag; I just want to be sure before I spend any time working on making this happen.
Thanks!
from pyvista.
This is high up there on my wish list and I'm happy to help you make this happen in PyVista!
@MatthewFlamm makes a great point that we are mostly limited by what the upstream VTK readers can handle.
Some native VTK readers support the ReadFromInputStringOn option, specifically the XML VTK formats. Here is a routine that will read those files from S3 by fetching the file contents and passing them along to the reader directly:
    def read_xml_from_s3(uri):
        import pyvista as pv
        import fsspec, s3fs
        from vtkmodules import vtkIOXML

        readers = {
            "vti": vtkIOXML.vtkXMLImageDataReader,
            "vts": vtkIOXML.vtkXMLStructuredGridReader,
            "vtr": vtkIOXML.vtkXMLRectilinearGridReader,
            "vtu": vtkIOXML.vtkXMLUnstructuredGridReader,
            "vtp": vtkIOXML.vtkXMLPolyDataReader,
        }

        fs = fsspec.filesystem('s3')
        ext = uri.split('.')[-1]
        try:
            reader = readers[ext]()
        except KeyError:
            raise KeyError(f"Extension {ext} is not supported for reading from S3")
        reader.ReadFromInputStringOn()
        with fs.open(uri, 'rb') as f:
            reader.SetInputString(f.read())
        reader.Update()
        return pv.wrap(reader.GetOutput())

    import pyvista as pv
    mesh = read_xml_from_s3("s3://pyvista/examples/nefertiti.vtp")
However, we can't do this for any other VTK readers as far as I am aware, leaving us with needing to write to a temporary file for formats like OBJ. Generally in my experience this is fine (just maybe don't do this for massive datasets). So perhaps a full solution is just some sort of helper routine like the following if the data path/URI is an s3:// path or other non-local path:
    def read_from_s3(uri):
        """Read any mesh file from S3."""
        import os
        import pyvista as pv
        import fsspec, s3fs
        import tempfile

        fs = fsspec.filesystem('s3')
        basename = os.path.basename(uri)
        with tempfile.NamedTemporaryFile(suffix=basename) as tmpf:
            with fs.open(uri, 'rb') as rf, open(tmpf.name, 'wb') as wf:
                wf.write(rf.read())
            return pv.read(tmpf.name)

    import pyvista as pv
    mesh = read_from_s3("s3://pyvista/examples/nefertiti.obj")
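A variation on the temporary-file helper above, sketched with only the standard library so it can be tried without S3 access. The `read_via_tempfile` name and the `open_remote` and `reader` callables are hypothetical stand-ins (for something like `fs.open` and `pv.read`), not part of PyVista or fsspec. The sketch streams the data in chunks instead of calling `read()` once, which softens the concern about large files, and it writes into a temporary directory because a `NamedTemporaryFile` generally cannot be reopened by name on Windows while it is still open.

```python
import io
import posixpath
import shutil
import tempfile
from pathlib import Path


def read_via_tempfile(uri, open_remote, reader, chunk_size=1024 * 1024):
    """Copy ``uri`` to a local temporary file and hand its path to ``reader``.

    ``open_remote(uri)`` must return a binary file-like object (a hypothetical
    stand-in for ``fs.open(uri, 'rb')``); ``reader(path)`` stands in for a
    path-based reader such as ``pv.read``.
    """
    # Keep the original file name so extension-based reader lookup still works.
    name = posixpath.basename(uri) or "data"
    with tempfile.TemporaryDirectory() as tmpdir:
        local_path = Path(tmpdir) / name
        with open_remote(uri) as src, open(local_path, "wb") as dst:
            # Stream in chunks so the whole file is never held in memory.
            shutil.copyfileobj(src, dst, chunk_size)
        # Read before leaving the block: the directory is deleted on exit.
        return reader(str(local_path))


# Stand-ins so the sketch runs without any remote storage.
fake_store = {"s3://bucket/mesh.obj": b"v 0 0 0\nv 1 0 0\nv 0 1 0\nf 1 2 3\n"}
seen_paths = []


def fake_open(uri):
    return io.BytesIO(fake_store[uri])


def fake_reader(path):
    seen_paths.append(path)
    return Path(path).read_bytes()


result = read_via_tempfile("s3://bucket/mesh.obj", fake_open, fake_reader)
```

With real storage, `open_remote` could be something like `lambda u: fs.open(u, 'rb')` and `reader` could be `pv.read`; the reader has to load the data eagerly, since the local copy is gone once the function returns.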
from pyvista.
Hey @banesullivan, thank you for the detailed write-up!
I’m new to pyvista and 3D data like this in general, but given I had a need to read data from S3 I thought I’d use this as an opportunity to learn more about it.
I thought I’d write up a short summary of what I’ve found so far this morning, and if you have the capacity I’d love some guidance on what to look at next.
I'm not trying to put any obligation on you here, so please feel free to totally ignore this comment.
At the very least, writing this up will help clarify my own thoughts.
Naive summary of PyVista
PyVista is a Pythonic interface to VTK.
Under the hood it makes use of many readers written in the core VTK project. For example, PyVista's CGNSReader class is "just" a wrapper around the corresponding VTK class. Very few of these (as you listed) support being passed the file contents directly; most instead want a file path that they load from themselves.
PyVista also makes use of meshio to read formats that VTK doesn't natively support. Meshio does appear to support being passed a buffer, which could then make use of fsspec's OpenFile objects.
Approach for introducing fsspec/remote file reading
Based on the structure of fileio.py's read method, I took a look at whether read_meshio can take a file handle, as a first 'easy' step. As mentioned above, meshio contains a _read_buffer() method which in theory should support this.
When trying this diff:
     def read_meshio(filename, file_format=None):
         # ...
         try:
             import meshio
         except ImportError:  # pragma: no cover
             raise ImportError("To use this feature install meshio with:\n\npip install meshio")

    -    # Make sure relative paths will work
    -    filename = str(Path(str(filename)).expanduser().resolve())
    -    # Read mesh file
    -    mesh = meshio.read(filename, file_format)
    +    with fsspec.open(filename, 'rb') as f:
    +        mesh = meshio.read(f, filename.ext[1:] if file_format is None else file_format)
         return from_meshio(mesh)
Running tests/test_meshio.py::test_meshio fails with [Errno 2] No such file or directory: '<fsspec.implementations.local.LocalFileOpener object at 0x167bf3d90>'.
Investigating this shows that meshio's VTUReader in _vtu.py stringifies the filename before passing it to the XML tree reader, even though the tree reader is happy taking either a filename or a file object.
From my uninformed perspective this looks like a bug, but I'm aware of how little context I have of this domain and use case.
It also made me doubt the feasibility of me making a "simple" change that would facilitate transparent reading of s3:// and other remote URIs.
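The failure mode described above can be reproduced without meshio. This is a minimal sketch that assumes the XML tree reader in question behaves like the standard library's `xml.etree.ElementTree.parse`, which accepts either a path or a file object; the in-memory buffer stands in for an fsspec file handle, and the XML snippet is a made-up placeholder rather than a valid VTU file.

```python
import io
from xml.etree import ElementTree as ET

xml_bytes = b'<VTKFile type="UnstructuredGrid"></VTKFile>'

# Passing the file object straight through works.
tree = ET.parse(io.BytesIO(xml_bytes))
root_tag = tree.getroot().tag

# Stringifying it first turns the object into its repr, which is then
# treated as the name of a file that does not exist.
handle = io.BytesIO(xml_bytes)
as_text = str(handle)
try:
    ET.parse(as_text)
    error = None
except OSError as exc:
    error = exc
```

Here `as_text` is a string of the form `<_io.BytesIO object at 0x...>`, the same shape as the `LocalFileOpener` text in the error message quoted above.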
Thinking of how to continue
Given your comment about how only a subset of readers would support being passed through and your provided snippets, would you prefer:
- updating the read() method to handle this internally, entirely transparent to the user. This appears doable but would be non-trivial and potentially messy.
- introducing a new method to fileio.py similar to the one(s) you shared, which the user has to expressly call if the data is on a remote source, something like:
    def read_remote_data(remote_uri):
        if remote_uri.file_extension in LIST_OF_SUPPORTED_READERS:
            ...  # fsspec.open(), reader.SetInputString() etc.
        else:
            ...  # copy file to local tmpdir and read in from there
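The dispatch in the second option could be sketched as follows, using only the standard library. The function names and the strategy labels are hypothetical; the set of extensions is taken from the XML readers listed earlier in the thread, and anything else falls back to the temporary-file route. It only decides which route to take and leaves the actual reading to helpers like the ones shared above.

```python
import posixpath
from urllib.parse import urlparse

# Extensions whose VTK XML readers accept the file contents directly
# (via ReadFromInputStringOn), as listed earlier in this thread.
IN_MEMORY_EXTENSIONS = {"vti", "vts", "vtr", "vtu", "vtp"}


def is_remote(uri):
    """Return True for URIs with a non-local scheme such as ``s3://``."""
    scheme = urlparse(str(uri)).scheme
    # A single-letter "scheme" is treated as a Windows drive letter.
    return len(scheme) > 1 and scheme != "file"


def file_extension(uri):
    """Return the lower-cased extension of the URI path, without the dot."""
    path = urlparse(str(uri)).path
    return posixpath.splitext(path)[1].lstrip(".").lower()


def choose_strategy(uri):
    """Pick how a (hypothetical) ``read_remote_data`` would load ``uri``."""
    if not is_remote(uri):
        return "local"
    if file_extension(uri) in IN_MEMORY_EXTENSIONS:
        return "in_memory"  # fetch bytes, then reader.SetInputString(...)
    return "tempfile"  # copy to a local temporary file, then read by path
```

Parsing the URI rather than splitting on `.` keeps query strings and upper-case extensions from confusing the lookup.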
from pyvista.
This would be very cool!
from pyvista.