
Comments (5)

MatthewFlamm commented on June 16, 2024

This seems like a good idea, but my experience is that some (many?) VTK readers are not happy with non-string based paths or direct data being passed in binary/string form. If PyVista can transform user input to what VTK expects, it makes sense to me, particularly if we do not have to add any dependencies.
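As a rough illustration of transforming user input into what VTK expects, the sketch below normalizes path-like input to a plain string and rejects raw bytes and remote URIs so they can be handled by a separate code path. The name `normalize_source` is hypothetical and is not part of PyVista; only the standard library is used.

```python
import os
from pathlib import Path
from urllib.parse import urlsplit


def normalize_source(source):
    """Return an absolute ``str`` path for local, path-like input.

    Hypothetical helper, not PyVista API. Raw bytes and remote URIs are
    rejected so the caller can route them elsewhere.
    """
    if isinstance(source, (bytes, bytearray)):
        raise TypeError("raw file contents are not a filename")
    # A scheme longer than one character (e.g. "s3") marks a remote URI;
    # single letters are skipped so Windows drive letters are not misread.
    if isinstance(source, str) and len(urlsplit(source).scheme) > 1:
        raise ValueError(f"remote URI {source!r} must be fetched before reading")
    return str(Path(os.fspath(source)).expanduser().resolve())


local_name = normalize_source(Path("mesh.vtp"))
```

Readers that only accept filenames would then always receive a `str`, while anything remote is caught before it reaches VTK.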

from pyvista.

krishanbhasin-px commented on June 16, 2024

Hey @tkoyama010, I saw your 👍 on the issue and just wanted to check; can I take that as an endorsement of the idea/you being open to merging a PR that implements this?

Sorry for the direct tag; I just want to be sure before I spend any time working on making this happen.

Thanks!


banesullivan commented on June 16, 2024

This is high up there on my wish list and I'm happy to help you make this happen in PyVista!

@MatthewFlamm makes a great point that we are mostly limited by what the upstream VTK readers can handle.

Some native VTK readers support the ReadFromInputStringOn option, specifically the XML VTK formats. Here is a routine that will read those files from S3 by fetching the file contents and passing along to the reader directly:

def read_xml_from_s3(uri):
    """Read a VTK XML file from S3 without writing it to disk."""
    import pyvista as pv
    import fsspec, s3fs  # s3fs provides fsspec's 's3' backend
    from vtkmodules import vtkIOXML
    # VTK XML readers, which can read from an in-memory string
    readers = {
        "vti": vtkIOXML.vtkXMLImageDataReader,
        "vts": vtkIOXML.vtkXMLStructuredGridReader,
        "vtr": vtkIOXML.vtkXMLRectilinearGridReader,
        "vtu": vtkIOXML.vtkXMLUnstructuredGridReader,
        "vtp": vtkIOXML.vtkXMLPolyDataReader,
    }
    fs = fsspec.filesystem('s3')
    ext = uri.split('.')[-1]
    try:
        reader = readers[ext]()
    except KeyError:
        raise KeyError(f"Extension {ext} is not supported for reading from S3") from None
    reader.ReadFromInputStringOn()
    # Fetch the file contents and hand them to the reader directly
    with fs.open(uri, 'rb') as f:
        reader.SetInputString(f.read())
    reader.Update()
    return pv.wrap(reader.GetOutput())

# Example usage
import pyvista as pv
mesh = read_xml_from_s3("s3://pyvista/examples/nefertiti.vtp")

However, we can't do this for any other VTK readers as far as I am aware, leaving us with needing to write to a temporary file for formats like OBJ. Generally in my experience this is fine (just maybe don't do this for massive datasets). So perhaps a full solution is just some sort of helper routine like the following if the data path/URI is an s3:// path or non-local path:

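def read_from_s3(uri):
    """Read any mesh file from S3 by staging it in a temporary directory."""
    import os
    import tempfile
    import pyvista as pv
    import fsspec, s3fs  # s3fs provides fsspec's 's3' backend
    fs = fsspec.filesystem('s3')
    basename = os.path.basename(uri)
    # A temporary directory avoids reopening a NamedTemporaryFile by name,
    # which is not possible on Windows while the file is still open.
    with tempfile.TemporaryDirectory() as tmpdir:
        local_path = os.path.join(tmpdir, basename)
        with fs.open(uri, 'rb') as rf, open(local_path, 'wb') as wf:
            wf.write(rf.read())
        # Read while the temporary directory still exists
        return pv.read(local_path)

# Example usage
import pyvista as pv
mesh = read_from_s3("s3://pyvista/examples/nefertiti.obj")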


krishanbhasin-px commented on June 16, 2024

Hey @banesullivan, thank you for the detailed write up!

I’m new to pyvista and 3D data like this in general, but given I had a need to read data from S3 I thought I’d use this as an opportunity to learn more about it.

I thought I’d write up a short summary of what I’ve found so far this morning, and if you have the capacity I’d love some guidance on what to look at next.

I'm not trying to put any obligation on you here, so please feel free to totally ignore this comment.
At the very least, writing this up will help clarify my own thoughts.

Naive summary of Pyvista

Pyvista is a Pythonic interface to VTK.

Under the hood it makes use of many readers written in the core VTK project. e.g. this CGNSReader class is "just" a wrapper around this class. Very few of these (as you listed) support being passed the file contents directly, and instead want a filepath that they themselves load from.

Pyvista also makes use of meshio to read formats that VTK doesn’t natively support. Meshio does appear to support being passed a buffer, which could then make use of fsspec's OpenFile objects.

Approach for introducing fsspec/remote file reading

Based on the structure of fileio.py's read method, I took a look at whether read_meshio can take a file handle as a first 'easy' step. As mentioned above, meshio contains a _read_buffer() method which in theory should support this.

When trying this diff:

def read_meshio(filename, file_format=None):
# ...
    try:
        import meshio
    except ImportError:  # pragma: no cover
        raise ImportError("To use this feature install meshio with:\n\npip install meshio")

-    # Make sure relative paths will work
-    filename = str(Path(str(filename)).expanduser().resolve())
-    # Read mesh file
-    mesh = meshio.read(filename, file_format)
+    with fsspec.open(filename, 'rb') as f:
+        mesh = meshio.read(f, filename.ext[1:] if file_format is None else file_format)
    return from_meshio(mesh)

Running tests/test_meshio.py::test_meshio fails with: [Errno 2] No such file or directory: '<fsspec.implementations.local.LocalFileOpener object at 0x167bf3d90>'.

Investigating this shows that meshio's VTUReader in _vtu.py stringifies the filename passed in to the xml tree reader, despite it being happy taking a filename or file object.
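The failure mode can be reproduced with the standard library alone. This is a minimal sketch, assuming the XML tree reader in question behaves like `xml.etree.ElementTree.parse`, which accepts either a filename or a file object; the XML payload is a made-up stand-in, not a real VTU file.

```python
import io
import xml.etree.ElementTree as ET

# Made-up stand-in for the contents of a .vtu file
xml_bytes = b'<VTKFile type="UnstructuredGrid"><UnstructuredGrid/></VTKFile>'

# Passing the file object itself works: ElementTree reads from it directly.
tree = ET.parse(io.BytesIO(xml_bytes))
root_tag = tree.getroot().tag

# Stringifying the file object first yields its repr, which ElementTree
# then treats as a filename and tries (and fails) to open.
try:
    ET.parse(str(io.BytesIO(xml_bytes)))
    stringified_ok = True
except OSError as exc:
    stringified_ok = False
    error_message = str(exc)
```

Here `error_message` contains the repr of the buffer as the missing "filename", which matches the shape of the error reported by the test.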

From my uninformed perspective this looks like a bug, but I'm aware of how little context I have of this domain and use case.

It also made me doubt the feasibility of me making a "simple" change that would facilitate transparent reading of s3:// and other remote URIs.

Thinking of how to continue

Given your comment about how only a subset of readers would support being passed through and your provided snippets, would you prefer:

  • updating the read() method to handle this internally, entirely transparent to the user
    • this appears doable but would be non-trivial and potentially messy
  • introducing a new method to fileio.py similar to the one(s) you shared, which the user has to expressly call if the data is on a remote source, something like:
def read_remote_data(remote_uri):
    if remote_uri.file_extension in LIST_OF_SUPPORTED_READERS:
        ... # fsspec.open(), reader.SetInputString() etc.
    else:
        ... # copy file to local tmpdir and read in from there
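The second option could be fleshed out roughly as follows. This is only a sketch under stated assumptions: `uri_extension`, `pick_strategy`, and `XML_EXTENSIONS` are hypothetical names, `read_remote_data` is not an existing PyVista function, and the remote branches assume fsspec (plus a backend such as s3fs) and working credentials. The reader calls are the ones from the snippets earlier in the thread.

```python
import os
import posixpath
import tempfile
from urllib.parse import urlsplit

# Extensions whose VTK XML readers accept in-memory input (per the thread)
XML_EXTENSIONS = {"vti", "vts", "vtr", "vtu", "vtp"}


def uri_extension(uri):
    """Return the lowercase file extension of a URI, without the dot."""
    return posixpath.splitext(urlsplit(uri).path)[1].lstrip(".").lower()


def pick_strategy(uri):
    """Return 'input_string' for VTK XML formats, 'tempfile' otherwise."""
    return "input_string" if uri_extension(uri) in XML_EXTENSIONS else "tempfile"


def read_remote_data(uri):
    """Hypothetical helper: read a mesh from an fsspec-supported URI."""
    import fsspec  # third-party; imported lazily
    import pyvista as pv

    with fsspec.open(uri, "rb") as f:
        payload = f.read()

    if pick_strategy(uri) == "input_string":
        from vtkmodules import vtkIOXML
        readers = {
            "vti": vtkIOXML.vtkXMLImageDataReader,
            "vts": vtkIOXML.vtkXMLStructuredGridReader,
            "vtr": vtkIOXML.vtkXMLRectilinearGridReader,
            "vtu": vtkIOXML.vtkXMLUnstructuredGridReader,
            "vtp": vtkIOXML.vtkXMLPolyDataReader,
        }
        reader = readers[uri_extension(uri)]()
        reader.ReadFromInputStringOn()
        reader.SetInputString(payload)
        reader.Update()
        return pv.wrap(reader.GetOutput())

    # Fallback: stage the bytes in a temporary directory and read from disk
    with tempfile.TemporaryDirectory() as tmpdir:
        local_path = os.path.join(tmpdir, posixpath.basename(urlsplit(uri).path))
        with open(local_path, "wb") as wf:
            wf.write(payload)
        return pv.read(local_path)


strategies = [pick_strategy(u) for u in (
    "s3://pyvista/examples/nefertiti.vtp",
    "s3://pyvista/examples/nefertiti.obj",
)]
```

Keeping the strategy choice in a small pure function makes it testable without network access, and it could equally be called from inside read() if the transparent option is preferred.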


user27182 commented on June 16, 2024

The intern package has an API which may be a helpful reference for implementing this feature in PyVista. The intern package is used for working with really big datasets. For example, this remote dataset from bossdb
https://bossdb.org/project/maher_briegel2023, is read with the following API:

# Import intern (pip install intern)
from intern import array

# Save a cutout to a numpy array in ZYX order:
channel = array("bossdb://MaherBriegel2023/Lgn200/sbem")
data = channel[30:36, 1024:2048, 1024:2048]

See the implementation code for intern.array here:
https://github.com/jhuapl-boss/intern/blob/15073c6eed12e1372e2d0448ed1e874df827b3ba/intern/convenience/array.py#L936
