set_root() argument about pfio (CLOSED)

pfnet commented on June 3, 2024
set_root() argument

from pfio.

Comments (8)

belldandyxtq commented on June 3, 2024

Argument '' is not a path, but no error is reported.

This is not documented, but '' works as a reset to the current working directory.

Is argument 'foo' supposed to mean a local directory relative to the "root" path, or relative to the current working directory?

Argument 'foo' is supposed to be relative to the current working directory.

Argument 'hdfs' is supposed to mean a relative path to a local directory, but it seems ChainerIO interprets it as the hdfs:/// URI.

Because set_root accepts a scheme (https://chainerio.readthedocs.io/en/latest/reference.html#chainerio.set_root), "hdfs" is recognized as a scheme. If hdfs refers to a local directory, file://hdfs is the correct argument.
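
The three cases above can be modelled with a small sketch. This is not ChainerIO's actual implementation: `interpret_root_argument` and `KNOWN_SCHEMES` are hypothetical names, and the set of scheme names is an assumption. It only illustrates why 'hdfs' and 'foo' end up being treated differently.

```python
import os

# Hypothetical set of scheme names; an assumption for illustration only.
KNOWN_SCHEMES = {"hdfs", "file", "posix"}


def interpret_root_argument(arg, cwd=None):
    """Model the behaviour described in this thread (not the real code)."""
    cwd = os.getcwd() if cwd is None else cwd
    if arg == "":
        # Undocumented: the empty string resets root to the working directory.
        return ("reset", cwd)
    if arg in KNOWN_SCHEMES:
        # A bare scheme name wins over a local directory of the same name.
        return ("scheme", arg)
    # Anything else is taken relative to the current working directory.
    return ("relative", os.path.join(cwd, arg))
```

Under this model a local directory named `hdfs` can never be selected by its bare name, which is the surprise reported in the issue.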

kuenishi commented on June 3, 2024

With respect to the URI specification, it is wrong that 'foo' is treated as a relative path while 'hdfs' is treated as a scheme. I would say this is a bug. Also, '' should not mean the current directory. No path specification defines such behaviour, so no user would expect it.

kuenishi commented on June 3, 2024

I may have reviewed the documented spec of the set_root() argument before, but I have changed my mind: I think we can make it more natural.

belldandyxtq commented on June 3, 2024

set_root() is currently a combination of create_handler, root = xxx, and set_default_handler, and I think this combination is what causes the confusion here. If recognizing "hdfs" as a scheme in set_root feels unnatural to you, then I suggest that we split set_root into its parts: create_handler (which exists), handler.root = (which exists), and set_default_handler.

The latter seems more natural to me as well. What do you think, @kuenishi?

kuenishi commented on June 3, 2024

I don't think that is a good idea, because it forces users to replace one line of Python code with three. My suggestion is simply to stop accepting a string that represents a scheme and to accept only a URI (or URL, whatever). That removes the confusion for users, and it also removes the ambiguity about what .root should be when a scheme is given. By ambiguity I mean that an empty string gives users no hint that it stands for a home directory on HDFS, because no path standard defines an empty string as a path. A URI such as file:///, by contrast, states explicitly what .root should be.
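
A minimal sketch of the "URI only" rule, using the standard library's `urllib.parse.urlsplit`. The function name `parse_root_uri` is hypothetical and not part of ChainerIO. Requiring an absolute path is an extra assumption made here so that the root is never left implicit.

```python
from urllib.parse import urlsplit


def parse_root_uri(uri):
    """Accept only full URIs; reject bare scheme names and empty strings."""
    parts = urlsplit(uri)
    if not parts.scheme:
        # '', 'foo' and 'hdfs' all land here: none of them is a URI.
        raise ValueError(f"expected a URI with a scheme, got {uri!r}")
    if not parts.path.startswith("/"):
        # Assumption: the root must be given as an absolute path.
        raise ValueError(f"expected an absolute path in {uri!r}")
    # The authority may be empty (file:///...) or name a host or cluster.
    return parts.scheme, parts.netloc, parts.path
```

With this rule, `parse_root_uri("file:///")` yields an explicit root of `/`, while `''`, `'foo'`, and `'hdfs'` are all rejected in the same way.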

kuenishi commented on June 3, 2024

I just noticed that I have already had the same discussion in #86 (comment).

belldandyxtq commented on June 3, 2024

I am trying to remove scheme support from root, but I noticed that this might force users to specify the CWD explicitly, e.g.:

chainerio.set_root("file:///user/tianqi/some_dir")
# or hdfs
chainerio.set_root("hdfs:///user/tianqi/some_dir")

In the case of posix, this can be avoided by supporting relative URIs, as you suggested in #86:

chainerio.set_root("file://")
# or 
chainerio.set_root("file://some_dir")

However, in the case of hdfs, the cluster name comes at the front of the URI, so things can get complex:

# relative directory
chainerio.set_root("hdfs://some_dir")
#  cluster name
chainerio.set_root("hdfs://some_cluster")

One solution would be to check the available cluster names to distinguish between the two.
@kuenishi, what do you think?
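
A rough sketch of that check, assuming the available cluster names can be obtained as a set. `classify_hdfs_root` and `known_clusters` are hypothetical names, and how the names would actually be discovered is left open.

```python
from urllib.parse import urlsplit


def classify_hdfs_root(uri, known_clusters):
    """Guess whether the authority part is a cluster name or a directory."""
    parts = urlsplit(uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"not an hdfs URI: {uri!r}")
    if not parts.netloc:
        # hdfs:///some/path has no authority: plain absolute path.
        return ("absolute", None, parts.path)
    if parts.netloc in known_clusters:
        return ("cluster", parts.netloc, parts.path or "/")
    # Unknown authority: fall back to reading it as a relative directory.
    return ("relative", None, parts.netloc + parts.path)
```

The guess stays ambiguous whenever a directory has the same name as a cluster, so the result depends on which clusters happen to be configured.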

kuenishi commented on June 3, 2024

"file://some_dir" does not stand for a relative path. Please refer to the URI standard or try it in your browser. In the same sense, "hdfs://some_dir" does not stand for a relative path.
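
This can also be checked with Python's standard `urllib.parse.urlsplit`, which follows the generic URI syntax: whatever comes directly after `//` is parsed as the authority (host), not as a path. The helper name `split_root` is only for illustration.

```python
from urllib.parse import urlsplit


def split_root(uri):
    """Return the (authority, path) pair that the URI syntax assigns."""
    parts = urlsplit(uri)
    return parts.netloc, parts.path


for uri in ("file://some_dir", "hdfs://some_dir", "file:///user/tianqi/some_dir"):
    authority, path = split_root(uri)
    print(f"{uri}: authority={authority!r}, path={path!r}")
```

For the first two URIs, `some_dir` ends up in the authority and the path is empty. Only the third form, with three slashes, carries a path.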

However, it is not good practice for users to set the root by full path in either the file:// or the hdfs:// scheme. Setting the root has turned out to raise an important question about how we define the root, or current working directory. I think we made a mistake somewhere in designing the concept of root. The difficulty stems from the impedance mismatch between the single namespace of the local filesystem and multiple remote network filesystems. "Mounting" or "connecting to" a remote network filesystem would resolve this mismatch, but it would need another line of Python code. We have to choose one or the other: an extra line of Python code, or having users specify a full path instead of a path relative to the home directory.
