Comments (5)

belldandyxtq commented on June 13, 2024

Thank you for reporting and finding the root cause.

If we remove the wrapper, what is your plan on supporting pickle on HDFS?

There are a few cases where we need to replace the original file object to support some functionality, for example reading as text from HDFS and zip.

And the internal zip for Python < 3.7.

I am thinking about moving the logic in the file object's __init__, which currently determines the proper file object type when opening, into a separate function to be called by the open decorator.

This way, subsequent I/O calls would no longer go through the file object wrapper, while we would still be able to handle cases such as text reads on HDFS and the internal zip.

from pfio.

belldandyxtq commented on June 13, 2024

To make it clearer, I am thinking about creating a file_object_maker as a replacement for the __init__ in the current file_object.py. It would cover the cases where some functionality is not supported by the original file objects, such as text reads on HDFS and zip.

The current implementation of open_wrapper returns the FileObject defined in fileobject.py or one of its derived classes. The file object replacement takes place in the __init__ of FileObject, so the replacement is bound to the file object.

If we extract the __init__ from FileObject and make it a function (e.g. file_object_maker), we can still control which kind of file object is returned to the user, without always returning a wrapped FileObject, which works around the issues we have now.
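
A minimal sketch of the idea, not the actual pfio code: the signature of file_object_maker and the needs_text_wrapper flag are assumptions made up for illustration. The point is only that the factory hands back the original file object untouched unless a wrapper is really needed.

```python
import io


def file_object_maker(raw, mode="rb", needs_text_wrapper=False):
    """Hypothetical factory replacing the logic in FileObject.__init__.

    raw: the file object returned by the underlying filesystem.
    needs_text_wrapper: assumed flag, set by backends that can only
    open files in binary mode.
    """
    if "b" not in mode and needs_text_wrapper:
        # Text read on a binary-only backend: layer decoding on top.
        return io.TextIOWrapper(raw, encoding="utf-8")
    # Otherwise return the original object, so later read()/write()
    # calls do not go through any extra wrapper.
    return raw


# Stand-ins for backend file objects.
text_file = file_object_maker(io.BytesIO(b"hello\n"), "r", needs_text_wrapper=True)
plain = io.BytesIO(b"raw bytes")
same_file = file_object_maker(plain, "rb")
```

Here same_file is the very object that was passed in, so there is no per-call overhead for backends that do not need a wrapper.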


kuenishi commented on June 13, 2024

Wrapping where we need it is fine, but the FileObject in fileobject.py is unnecessary for now, especially for the "posix" filesystem and the zip container. Aligning the behaviour of other filesystems and file objects with "posix" is probably the best way to prevent potential performance issues like this one, at least until the profiler.


belldandyxtq commented on June 13, 2024

I think we need to wrap the zip container for text reading and the internal zip.
The only two that don't need a wrapper are POSIX and HTTP.

And how about pickle on HDFS? A simple plan would be to extract the content and put it into an io.BufferedReader in file_object_maker.
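
Roughly, that plan could look like the sketch below. ReadOnlyStub is a made-up stand-in for an HDFS file object that only offers read(), and to_buffered_reader is a hypothetical helper name. Neither is part of pfio.

```python
import io
import pickle


class ReadOnlyStub:
    """Stand-in for a backend file object exposing only read()."""

    def __init__(self, data):
        self._data = data

    def read(self):
        return self._data


def to_buffered_reader(backend_file):
    """Hypothetical helper: copy the content into memory and expose it
    through io.BufferedReader, which provides the methods that
    pickle.load expects (read, readline, and so on)."""
    content = backend_file.read()
    return io.BufferedReader(io.BytesIO(content))


payload = pickle.dumps({"epoch": 3, "loss": 0.25})
restored = pickle.load(to_buffered_reader(ReadOnlyStub(payload)))
```

The trade-off is that the whole file is held in memory, which may matter for large pickles.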


kuenishi commented on June 13, 2024

This issue is addressed for POSIX filesystems by #38.

