Comments (8)
Argument '' is not a path but no error reported
This is not documented, but '' works as a reset to the current working directory.
Is argument 'foo' supposed to mean a local directory relative to the "root" path, or relative to the current working directory?
Argument 'foo' is supposed to be relative to the current working directory.
Argument 'hdfs' is supposed to mean a relative path to a local directory, but ChainerIO seems to interpret it as the hdfs:/// URI.
Because set_root accepts a scheme (https://chainerio.readthedocs.io/en/latest/reference.html#chainerio.set_root), "hdfs" is recognized as a scheme. If hdfs refers to a local directory, file://hdfs is the correct argument.
from pfio.
With respect to the URI specification, it is wrong that 'foo' is a relative path while 'hdfs' is understood as a scheme. I would say this is a bug. Also, '' should not be the current directory. No path specification defines such behaviour, so no user would expect it.
from pfio.
I may have reviewed the documented spec of the set_root() argument before, but I have changed my mind: I think we can make it more natural.
from pfio.
set_root() is currently a combination of create_handler, root = xxx, and set_default_handler, and I think that combination is what causes the confusion here. If recognizing "hdfs" as a scheme in set_root seems unnatural to you, then I would suggest splitting set_root into create_handler (which exists), handler.root = (which exists), and set_default_handler.
The latter seems more natural to me as well. What do you think, @kuenishi?
from pfio.
I don't think that is a good idea, because it forces users to replace one line of existing Python code with three. My suggestion is simply to stop accepting a string that represents a scheme and to accept only a URI (or URL, whatever). That removes the confusion for users, and also removes the ambiguity about what .root should be when a scheme is given. By ambiguity I mean that an empty string does not suggest to users that it is the home directory on HDFS, because no path standard defines an empty string as a path. With a URI, on the other hand, file:/// for example gives an explicit notion of what .root should be.
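A minimal sketch of the "URI only" rule suggested above, using only the Python standard library. `require_uri` and `ALLOWED_SCHEMES` are hypothetical names for illustration and are not part of ChainerIO or pfio; the set of accepted schemes is an assumption.

```python
from urllib.parse import urlsplit

# Assumption: the schemes a set_root()-like function might accept.
ALLOWED_SCHEMES = {"file", "hdfs"}


def require_uri(arg: str):
    """Return (scheme, authority, path), or raise ValueError for non-URIs."""
    parts = urlsplit(arg)
    if parts.scheme not in ALLOWED_SCHEMES:
        # '', 'foo' and the bare word 'hdfs' all end up here:
        # none of them contains a scheme, so none of them is accepted.
        raise ValueError(f"not a URI with a supported scheme: {arg!r}")
    return parts.scheme, parts.netloc, parts.path
```

Under this rule 'hdfs' and 'foo' are treated the same way (both rejected), and file:/// yields an explicit path of "/".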
from pfio.
I just noticed that I have repeated the same discussion as in #86 (comment).
from pfio.
I am trying to remove the scheme support from root, but I noticed that this might force users to specify the CWD explicitly, e.g.:
chainerio.set_root("file:///user/tianqi/some_dir")
# or hdfs
chainerio.set_root("hdfs:///user/tianqi/some_dir")
In the POSIX case, this can be avoided by supporting relative URIs, as you suggested in #86.
chainerio.set_root("file://")
# or
chainerio.set_root("file://some_dir")
However, in the HDFS case the cluster name comes at the front of the URI, so things can get complicated:
# relative directory
chainerio.set_root("hdfs://some_dir")
# cluster name
chainerio.set_root("hdfs://some_cluster")
One solution would be to check the available cluster names to distinguish between them.
@kuenishi what do you think?
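A rough sketch of the cluster-name check proposed above, assuming the set of available cluster names is already known (for example, read from the Hadoop configuration; how it is obtained is not shown here). `classify_hdfs_root` and its return labels are hypothetical and not part of ChainerIO.

```python
from urllib.parse import urlsplit


def classify_hdfs_root(uri: str, known_clusters: set) -> str:
    """Guess what the authority part of an hdfs:// URI is meant to be."""
    parts = urlsplit(uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"not an hdfs URI: {uri!r}")
    if parts.netloc == "":
        # hdfs:///user/... has an empty authority and an absolute path.
        return "absolute-path"
    if parts.netloc in known_clusters:
        return "cluster"
    # Anything else would have to be read as a relative directory.
    return "relative-directory"
```

One limitation of this approach: a directory that happens to share its name with a cluster would always be classified as the cluster.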
from pfio.
"file://some_dir" doesn't stand for a relative path. Please refer to the URI standard or try it in your browser. In the same sense, "hdfs://some_dir" does not stand for a relative path.
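This can be checked with Python's standard `urllib.parse`, which splits a URI into its generic components. The helper name `split_root` is only for illustration.

```python
from urllib.parse import urlsplit


def split_root(uri: str):
    """Return the (authority, path) components of a URI."""
    parts = urlsplit(uri)
    return parts.netloc, parts.path


# Whatever follows "//" is parsed as the authority (host), not as a path.
print(split_root("file://some_dir"))               # ('some_dir', '')
print(split_root("hdfs://some_dir"))               # ('some_dir', '')
# An empty authority followed by an absolute path:
print(split_root("file:///user/tianqi/some_dir"))  # ('', '/user/tianqi/some_dir')
```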
However, it is not good practice for users to set the root by full path in either the file:// or hdfs:// scheme. Setting the root has turned out to raise an important question about how we define the root, or current working directory. I think we made a mistake somewhere in designing the concept of root. The difficulty stems from the impedance mismatch between the single namespace of the local filesystem and multiple remote network filesystems. "Mount"ing, or "connect"ing to, a remote network filesystem would solve this mismatch, but it would need another line of Python code. We have to choose one or the other: an extra line of Python code, or having users specify a full path instead of a path relative to the home directory.
from pfio.