remoteStorage Protocol Specification
Home Page: https://tools.ietf.org/html/draft-dejong-remotestorage
Can we define that the server MAY (MUST?) interpret %-encoded names as UTF-8 (which is what RFC 3986, https://tools.ietf.org/html/rfc3986, recommends), and that it MAY (MUST?) then return these strings in the JSON as UTF-8 characters (and that the JSON MUST be UTF-8 or ASCII, and not another encoding like UTF-16)?
Also, that the client MUST NOT provide a %-encoded URL that is invalid UTF-8, or it will get a 400 error.
We cannot create a system that rules out the item names most people would naturally want to use as filenames. We could just say everything round-trips %-encoded, but then for most people the file names on their server would be unreadable in their language, as would the listings they get from the client unless they specifically decode them in the client (and since JSON supports Unicode, this seems the worse solution).
This also means that you can have files with spaces etc without % signs.
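As a quick illustration of the round trip under discussion, using Python's standard library (the document name here is made up):

```python
from urllib.parse import quote, unquote

# A non-ASCII document name is %-encoded as UTF-8 for the request URL
# (the interpretation RFC 3986 recommends), then decoded back so the
# JSON listing shows a readable name.
name = "日記.txt"           # hypothetical document name ("diary" in Japanese)
encoded = quote(name)       # percent-encoding of the UTF-8 bytes
decoded = unquote(encoded)  # decoding as UTF-8 restores the original

# encoded is "%E6%97%A5%E8%A8%98.txt", and decoded == name
```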
A friend of mine had a conversation with Dropbox about their JS lib and if they'd consider adding remoteStorage support. Here's part of the response:
Subject: Re: dropbox.js Pull Request #51
Date: ############################
From: Victor Costan ################[...]
Regarding remotestorage-00, I wouldn't expect it to be implemented any
time soon. Dropbox API development is limited by engineering manpower,
so for this to be implemented, it would have to bring more benefits
than other features in the pipeline. The only convincing argument in
favor of remotestorage-00 would be "there are a bunch of apps that can
use this today". I know some of the stuff in the pipeline and, believe
me, they have much better stories :)

When the time comes to implement a standard, I hope Dropbox will have
someone involved with the standard process, and I hope that person
will have some decent experience with building apps. I read
remotestorage-00, and I think it's not based on real-world experience.
It only supports one conflict resolution strategy (most recent write
wins), the access model doesn't seem amenable to extensions, and it
doesn't include any support for staying up-to-date with server
changes, such as /delta. I think these place serious limitations on
the kinds of apps that can be built based on this API, so I hope it
will be improved before/as it gets adopted.

I'll be sure to keep an eye on this though. Thank you very much for
telling me about it!

Victor
Like (optionally) supporting SPDY it also makes sense to mention support for HTTP range requests/responses for GET requests.
If this is supported, then for private files the ?access_token=foo query parameter also needs to be supported when doing GET requests.
Use case: video and audio playback.
http://tools.ietf.org/html/rfc2616
http://tools.ietf.org/html/draft-ietf-httpbis-p5-range-25
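To make the range mechanics concrete, here is a hedged sketch (not spec text; the helper and its behaviour are my own) of how a server might slice a stored document for a single bytes=start-end range, as a media player would request while seeking:

```python
def serve_range(body: bytes, range_header: str):
    """Return (status, partial_body) for a single "bytes=start-end" range."""
    unit, _, spec = range_header.partition("=")
    if unit != "bytes":
        return 416, b""  # Range Not Satisfiable for units we don't support
    start_s, _, end_s = spec.partition("-")
    start = int(start_s)
    # an open-ended range like "bytes=6-" means "to the end of the document"
    end = int(end_s) if end_s else len(body) - 1
    if start >= len(body):
        return 416, b""
    # 206 Partial Content with the requested slice (end index is inclusive)
    return 206, body[start:min(end, len(body) - 1) + 1]
```

For example, `serve_range(b"hello world", "bytes=6-")` yields `(206, b"world")`.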
What is the rationale for having versions in directory listings? I can't see a use case off hand.
The issues seem to be:
this draft is in its third iteration now http://tools.ietf.org/html/draft-daboo-aggregated-service-discovery-03
i'm not sure whether it adds anything that webfinger doesn't already provide itself. the main goal of this spec seems to be to add, for instance, icons for service providers. it doesn't really go into the actual end-point descriptors a lot. it just says uri / host, ?port, and ?ssl and ?auth for that. also, it introduces a ttl field, which seems odd.
anyway, just wanted to make sure we had a ticket about this
The server SHOULD NOT expire bearer tokens unless they are revoked, and MAY require the user to register applications as OAuth clients before first use; if no client registration is required, then the server MAY ignore the client_id parameter in favour of relying on the redirect_uri parameter for client identification.
The OAuth 2.0 specification says:
expires_in RECOMMENDED. The lifetime in seconds of the access token. For example, the value "3600" denotes that the access token will expire in one hour from the time the response was generated. If omitted, the authorization server SHOULD provide the expiration time via other means or document the default value.
So my suggestion is to not go against the spec's recommendations. It may make sense to say a token may be valid for 8 hours, or 24 hours, or a week.
I wonder why GET requests "SHOULD be responded to with the full document contents", instead of MUST. The same for directory listings (here).
i think there is a way, something like "If-None-Match: *" or similar, to say that a request should fail if there is anything there. this is useful to avoid race conditions when creating new documents (the first time you PUT a document).
CouchDB requests always fail if a document exists and you do not provide its currently existing version in your PUT. i think we should allow non-conditional requests, but at the same time provide a way to properly cover all possible race conditions, and right now 'virgin PUTs' are still impossible to get right (unless you use long random item names that are unlikely to clash)
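A minimal sketch, assuming a hypothetical in-memory store, of how a server honouring If-None-Match: * would close this race for 'virgin PUTs':

```python
store = {}  # path -> (etag, body); stand-in for the server's storage

def handle_put(path, body, headers):
    """Return (status, etag-or-None) for a PUT request."""
    # "If-None-Match: *" asks the server to fail if anything exists at path,
    # so two clients racing to create the same document can't clobber each other
    if headers.get("If-None-Match") == "*" and path in store:
        return 412, None  # precondition failed: document already exists
    etag = str(hash(body))  # stand-in for generating a real strong ETag
    store[path] = (etag, body)
    return 200, etag

# the first conditional PUT succeeds...
status, _ = handle_put("/notes/1", "hello", {"If-None-Match": "*"})
# ...a second conditional PUT to the same path is rejected
status2, _ = handle_put("/notes/1", "bye", {"If-None-Match": "*"})
```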
"and an empty folder MUST NOT be listed as an item in its parent folder."
This can never happen, right? If you delete a document from a folder and it was the last document, the folder should also be deleted.
Do you also want to support the case where there are empty folders and then just not list them?
There is no context in directory listings, so if you download a directory listing you cannot retrieve the documents in it unless you also have written down its full URL. Either requiring a baseurl in the listing, or providing full URLs for the contents would work eg
{
  "baseurl": "http://example.com/public",
  "listing": {
    "file": "abcdef"
  }
}
or
{
  "file": {"version": "abcd", "url": "http://example.com/public/file"}
}
(also how about a link to the parent folder)
The RemoteStorage 01 spec describes sessions with a strict dependency on one particular authorisation mechanism, OAuth2's Implicit Grants. Such a directed choice should either be loosened or explained, I think. There may be solid reasons to make a deliberate choice based on the problem domain at hand -- but none I can think of.
One specific form of authorisation that is not supported (but ought to be, I think) is authorisation in the backend, away from the web sphere. If we want RemoteStorage provided by various parties across the Internet, as well as do it ourselves and see network administrators run services for it, then we should give them a way to circumvent the security hazards and added delays of hopping from one secure website to another.
To spell it out: The OAuth model centralises computation stress on server-side TLS, it decentralises credentials, decentralises credential caching and introduces multiple parties who need to do a lot of talking and verifying each other. It is not ideal ;-) and specifically in many situations where we would like to see RemoteStorage, it is likely that administrators don't want this overhead on top of relatively light storage services.
An obvious alternative approach would be to stick to filesystem permissions, or ACLs which may or may not be centrally managed in a directory. This gives more reliable policy enforcement, it adds no overhead relative to file systems, and the only added responsibilities for supporting RemoteStorage is authenticated HTTP. Depending on the local situation and personal preference, this may not even require TLS.
Imagine how simple it then becomes to serve RemoteStorage on a trusted local network: Simply layer the HTTP protocols over a local filesystem or Samba share, incorporate something as lightweight as Kerberos authentication over HTTP, and after registering it for webfinger your RemoteStorage is ready to go! This would even be possible in an embedded/router environment, which already offer Samba-mounts for USB-sticks. Current RemoteStorage is not nearly as likely in that sort of environment.
In short, I think the strict dependency on OAuth and the zooming in on Implicit Grants is too strict, and could easily hold back the success of the RemoteStorage concept. Both simpler and more complex schemes are likely to be useful to someone -- and should be supported in the spec and, hopefully at some point, in the JavaScript toolkit.
for clarification, especially also of the URI encoding, the CORS headers, the ETags, and everything together
because parameter can imply that we're talking about the query parameter instead of the fragment here.
would be good to add an example of storage-first, showing also how the fragment is made up of parameters and how OAuth scopes are joined as a space-separated list
A friend of mine wrote:
i was wondering if remoteStorage should not have a provision for 'soft delete'. Currently you only have a 'hard delete' option, which is good for people that don't break into a sweat using rm / on the command line. Normal users want to be able to do an undelete at the very minimum, if not have zfs-like snapshots where they can go back to a state at a certain time. If remoteStorage wants to replace clouddrives, and clouddrives want to replace my regular drive, that is pretty much the expectation to live up to.
If I am working on something that matters to me and I am using more or less experimental tools or tools from different sources (say todo and timetracker), it would at the very least make sense to me to have a sort of 'garbage bin' where data is pushed to for possible undeleting - so I can see where things went wrong. Losing data is one of the reasons people want cloud services, so remoteStorage should be sensitive to that fact...
I don't think we should add any undo verb or trashbin to the protocol, but i do think he has a point, and we should remark on this in the spec; for instance, servers MAY offer an interface where a user can roll back the contents of their account to a previous version.
i myself would probably do this by committing snapshots to git (should work out how that works if the data itself contains .git/ dirs), and then you know you can always go into the server and recover any and all previous versions with git log
and git checkout
i'm against Access Control Lists! sharing in remoteStorage should work like git: you publish your version, and send pull requests to the people you collaborate with.
having said that, i think it's fair to say that this decision is what is currently costing us most market share. people will say "oh, but i need sharing/collaboration", and develop their unhosted web app for the Dropbox platform or the GoogleDrive platform instead (which both do have sharing/collaboration built in). see http://community.remotestorage.io/t/collaboration-through-public-sharing/84/8?u=michielbdejong for a discussion of how this affects app development.
it's worth noting that this would clash with #39 (comment) because it would involve giving people outside your Kerberos domain access to documents on your account
Currently, my implementation has for Access-Control-Allow-Headers
: "Authorization, If-None-Match, Content-Type, Origin, ETag"
And for Access-Control-Allow-Methods
: "GET, PUT, DELETE, HEAD, OPTIONS"
Is that complete? Redundant?
Hi,
As root:rw and root:r are confusing ways to request full access to the storage, and could also be read as indicating the name of a category/module/folder, it makes sense to use something else here. We propose to use *:rw and *:r instead. The * is a valid scope token, and not a valid category/module/folder name, so this solves the issue of possibly falling back to :rw and :r by mistake.
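To illustrate why * avoids the ambiguity, here is a hypothetical scope-parsing sketch (the function name and error handling are mine, not from the spec):

```python
def parse_scope(scope: str):
    """Split a scope string like "contacts:rw" or "*:r" into (name, mode)."""
    name, _, mode = scope.rpartition(":")
    # "*" is a valid scope token for full access; an empty name (as in a
    # stray ":rw") is rejected instead of silently granting anything
    if not name or mode not in ("r", "rw"):
        raise ValueError("invalid scope: " + scope)
    return name, mode

# parse_scope("*:rw") -> ("*", "rw"); parse_scope(":rw") raises ValueError
```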
Hello,
After thinking about the mechanism of authorisation, I've turned to the conceptual authorisation model (with :r, :w, :rw privileges and a distinction into public space) and I was wondering if it isn't too strongly focussed on personal applications. It is open for discussion if business applications of remoteStorage make any sense, of course. It might be possible to simplify the spec (!) and leave more freedom to the remoteStorage implementer/hoster.
It is common to distinguish CRUD (create, read, update, delete) privileges, so a user administrator is able to create a resource that users can read and/or edit, but not delete. This is not possible in the current model. The implicit creation (and deletion?!?) of folders may not be equally suitable at every roll-out, and might be a default policy (resource can create/delete its own folders) that can be adapted by administrators. I suspect you want the ability to silently add an application, with data folders underneath that could be helpful for users to edit -- distinctions like these could be configured by the authorisation policy administrator.
Another authorisation practice in business that is not foreseen in remoteStorage is group-wise access. This is basically a generalisation of the public URIs that you specified -- you could define a public group to do just what you describe for public URIs. And another group to define specwriters. And another to describe... This degree of flexibility enables groups to co-operate on a document through remoteStorage, and that is a useful application of the protocol in business use!
Interestingly, the best way to enable these facilities might not be to expand the specification, but rather to remove overly specific parts. I've also argued from another angle that authorisation is too specific -- in fact, I would propose not to mention the {r,w} rights or the public paths in the spec, unless a JS-app needs to be able to rely on them and has no REST-mechanism to find this information. I also wonder if explicitly creating and deleting folders is too much effort for a JS-app; doing this explicitly means that it can also be offered to users in the interface they use. For an implementation it is a big duh that choices like these are needed, but it sounds to me like an issue shared between the resource server, the authorisation setup and the customer. The remoteStorage spec is mainly of interest to the relation between the JS-app and the REST-store, and I wonder if the possible authorisation privileges and their enforcement should be part of that spec.
Just hoping to stir up improvements, in the hope that it is useful.
Schrijven is schrappen :-)
Cheers,
-Rick
Can we
Clearly there needs to be an optional public mechanism, but some servers will be all or mostly public and some not at all, so it should be set by policy, not convention.
Will think of some ways in which it can be discovered.
Whenever a "new" user logs in to the remoteStorage server, their user directory may not exist yet and will only be created on the first PUT request. So all requests prior to a PUT will fail on the remoteStorage server.
The spec says that only PUT (and DELETE) requests are able to create/delete folders.
Is it really the intention that folders do not need to exist at all, not even the "root" folder of the user? That is what my implementation does right now. It also only creates the "root" folder on the first PUT request.
we use 'folder' 48 times and 'directory' 5 times to mean the same thing.
in 2012-04 they were still called directories, but starting with the -00 draft we are using the term folder, so let's use it consistently.
There was a discussion about this in #6 that more or less ended with the decision to
brush this under the carpet until -02.
Now, the spec states
Whether or not a document and a directory with matching names can
exist side-by-side in the same directory, is left undefined. This
means a server MAY fail with a 412 response code if a PUT request
would result in such a document-with-directory name clash. Clients
are therefore encouraged to choose directory names that avoid
clashing with document names and vice versa.
Leaving it undecided like this is not very satisfactory. We should make a decision and stick to that.
I understand that Linux FS based servers have problems allowing matching names. So my suggestion would be that the spec disallow it.
Also I don't understand why a 412 response code is used here. There is no precondition that is failing. I think we should use 409 instead.
it still says 'December 2012'
http://community.remotestorage.io/t/rs-desktop-data-browser/167/9
see https://github.com/settings/applications for an example
this is pretty trivial for storage providers to implement (especially if they already implement storage-first), and very useful for debugging, scripting, headless hosted apps, and desktop apps.
this would also resolve remotestorage/remotestorage.js#444 / litewrite/litewrite#195 i think
Upon reading the spec, while preparing to implement remoteStorage, I was somewhat unpleasantly surprised to find that apparently all files stored in a remoteStorage endpoint are publicly accessible by default.
The spec suggests to "use unguessable file names", which seems to be a textbook example of security through obscurity, and has the following issues:
The solution to this seems simple to me: a 'public' flag. The Content-Type should already be stored separately from the data itself (according to the spec anyway), so it should be trivial to have a separate 'public' flag be stored alongside the file in a similar manner. Whether the storage gateway would serve a file over the unauthenticated /public/ interface, would then depend on whether the 'public' flag for the file were enabled.
Is there something I'm missing here, or was this simply an oversight in the spec?
EDIT: Briefly after posting this, somebody else pointed out that they interpreted the spec as saying that there is a separate /public/ endpoint, and that you'd have to explicitly store files there to make them publicly accessible. I'm not sure myself, so could you confirm this? If that is indeed the case, ignore all of the above :)
I think there's a typo in the spec on line 274. It includes a question mark in a storage URL example:
https://example.com/some?path/to/storage
The spec speaks of "credentials" but has more been tried than the username / password model that the entire authentication world wants to get away from?
The strict reliance on bearer tokens gives me the feeling that there is no desire to join this useful movement. Note that these tokens are so incredibly weakly protected that they can only run over TLS -- and they leave your security at the mercy of the browser's HTML environment, which has a track record of playing tricks on its users as a result of its flexible/pluggable nature. The problem being that these tokens grant permissions by just being shown -- they act like passwords.
There are more potent authentication mechanisms that would simplify the use of RemoteStorage and at the same time improve security, by performing authentication in the context outside the webpages' HTML and JavaScript:
Please, do not constrain the specification to just a single form of authentication, and test if it can do the other mechanisms as well. If pragmatic choices have to be made to get work done, then please keep the specification as general as possible -- and provide a versioned JavaScript library to implement the subset that you can currently handle, and prepare for adding these other mechanisms.
The web is held back by people sticking to the lowest common denominator, and RemoteStorage is such a powerful idea that it could make a really loud call for proper authentication integration with browsers / users.
The storage provider needs a way of notifying the client that it cannot store any more data, because the user isn't allowed to store more data (usually referred to as hitting the "quota").
WebDAV defines the status code 507 for that, which we could reuse.
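A minimal sketch of such a check (the quota size, state, and helper are made up for illustration), answering over-quota PUTs with 507:

```python
QUOTA_BYTES = 10          # tiny quota, purely for illustration
used = {"alice": 8}       # hypothetical bytes already stored per user

def check_quota(user: str, body: str) -> int:
    """Return the HTTP status a PUT should get under the quota policy."""
    if used.get(user, 0) + len(body) > QUOTA_BYTES:
        return 507  # Insufficient Storage, as defined by WebDAV (RFC 4918)
    used[user] = used.get(user, 0) + len(body)  # accept and account the write
    return 200
```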
Julian Reschke suggested we should include the quotes on the ETag in the folder listing.
i guess then we should also quote the Content-Length header as a string, not as a number.
i think since we only use strong ETags, we can go either way with this. it's only a syntax issue, doesn't affect how things work, we just have to agree on one way of doing it :)
current:
"items": {
  "abc": {
    "ETag": "DEADBEEFDEADBEEFDEADBEEF",
    "Content-Type": "image/jpeg",
    "Content-Length": 82352
  },
  "def/": {
    "ETag": "1337ABCD1337ABCD1337ABCD"
  }
}
alternative:
"items": {
  "abc": {
    "ETag": "\"DEADBEEFDEADBEEFDEADBEEF\"",
    "Content-Type": "image/jpeg",
    "Content-Length": "82352"
  },
  "def/": {
    "ETag": "\"1337ABCD1337ABCD1337ABCD\""
  }
}
this would be more compatible with the W/"abcdabcdabcd" syntax of weak ETags (even though we don't support those)
I guess it should be http://www.w3.org/TR/cors/ instead?
Consider using BNF to specify the allowed characters in path names:
%x25 / %x2D / %x2E / %x30-39 / %x41-5A / %x5F / %x61-7A
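For reference, those ranges translate into the following character class (a sketch; it assumes the BNF above lists the complete allowed alphabet):

```python
import re

# %x25 "%", %x2D "-", %x2E ".", %x30-39 digits,
# %x41-5A "A"-"Z", %x5F "_", %x61-7A "a"-"z"
PATH_CHAR = re.compile(r"^[%\-.0-9A-Z_a-z]+$")

# "my_file-1.jpg" and "%E6%97%A5" match; "a b" does not (space excluded)
```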
see http://www.rfc-editor.org/rfc/rfc7033.txt it's now an rfc. should refer to that in -02 and follow the final property format.
there is a practical limit of about 2000 characters (http://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-a-url-in-different-browsers ) for module names and other strings that the spec deals with, but we're not saying anything about this.
in practice this limit will probably never come into play, but we could say something like servers may reply with a 500 error if a scope uses more than 10 modules, or if any of the module names in there exceeds 50 characters.
we should either explicitly allow this or explicitly forbid this. i thought we were explicitly forbidding it, but i searched through the spec and couldn't find anything about this
we introduced these as SHOULDs now, we can make them MUSTs in -03
https://github.com/remotestorage/spec/blob/master/draft-dejong-remotestorage-02.txt#L168-L169
i'm not sure if this is useful, it's extra text and extra code that serves no purpose other than "playing by the book". still, it's nice to be seen to play by the book...
postponing this to version -02
in line https://github.com/remotestorage/spec/blob/master/draft-dejong-remotestorage-02.txt#L231 the {} should now be
{ "@context": "http://remotestorage.io/spec/folder-description", "items": {} }
with the new folder description format.
https://www.ietf.org/mail-archive/web/apps-discuss/current/msg10931.html
the commenters in that thread are obviously coming at it from a different angle, but their point is interesting.
maybe we should say that we are defining an API rather than a protocol?
Instead of only allowing "Authorization: Bearer xyz", it should also be made possible to use an ?access_token=xyz query parameter. This is needed when embedding images, audio, or video in a page without needing to use XMLHttpRequest to fetch the blob first before displaying it.
App that needs this: music player.
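For example (hypothetical URL and token), a music player could then embed a private track directly:

```html
<!-- the token travels in the URL instead of an Authorization header,
     so the browser can stream the file without any XMLHttpRequest -->
<audio controls
       src="https://storage.example.com/bob/music/track.ogg?access_token=xyz">
</audio>
```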
remoteStorage defines version numbers as "The current version of a document is the 13-digit decimal number representing the number of milliseconds between 00:00 UCT, 1 January 1970, and the last time its content or content type were set or changed successfully." Is there any reason why a version number has these requirements? The specification only uses the version number as an opaque identifier, so requiring that it be a strong ETag seems to be enough.
There should be a reference pointing to https://tools.ietf.org/html/rfc4627.
as suggested by @mnot (personal communication), we should discuss, from a Computer Science point of view, how people can use the ETag, If-Match and If-None-Match headers to implement distributed versioning, and how this works when clients have been making changes while they were offline, and run into conflicts when they come online again. i'll draft a paragraph about this.
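A sketch of the conflict case such a paragraph would cover (hypothetical server-side logic; the ETag values are stand-ins):

```python
store = {"/notes/1": ("v1", "original")}  # path -> (etag, body)

def conditional_put(path, body, if_match, new_etag):
    """Apply a PUT only if the client still holds the current version."""
    current = store.get(path)
    if current is not None and if_match != current[0]:
        return 412  # document changed since this client last synced
    store[path] = (new_etag, body)
    return 200

# client A replays its offline change against the version it last saw
status_a = conditional_put("/notes/1", "A's edit", "v1", "v2")
# client B also held "v1", loses the race, and must fetch, merge, and retry
status_b = conditional_put("/notes/1", "B's edit", "v1", "v3")
```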
this causes the problem described in remotestorage/rs-serve#1
we should see if this bug is serious enough to fix it and publish -02 now, instead of leaving it broken until December.
for remotestorage.js it's probably not a problem, but it sure is ugly.
Right now I have no way to find out the size or MIME type of a resource, without retrieving the entire resource itself. Similarly the only way to find out the most recent version of a resource is to retrieve the parent directory.
So I'd like to propose that we require support for the HEAD verb:
A successful HEAD request to any valid resource MUST result in either a "200 OK"
or a "404 Not Found" response (based on whether it exists or not), including at least
the "ETag" and "Content-Type" headers. For documents the response MUST also
include the correct "Content-Length" header. For folders the "Content-Length" MAY
be omitted.
path or unrecognized http verb, etcetera), as well as for
all PUT and DELETE requests to folders,'''
For unrecognized verbs I think a 405 should be returned ("method not allowed")
'''401 for all requests that don't have a bearer token with
sufficient permissions,'''
This is "out of scope" here, it can be listed but is actually part of the Bearer token spec. That one states that for missing tokens or invalid tokens a 401 should be returned, and a 403 when there is not sufficient permission.
in https://github.com/remotestorage/spec/blob/master/draft-dejong-remotestorage-02.txt#L478-L487 we say an open web app manifest may contain a remotestorage field listing the data scopes an app will be using.
given recent work on https://wiki.mozilla.org/WebAPI/DataStore by @bakulf and others, we should change this to match their format:
{
  ...
  "datastores-access": {
    "contacts": {
      "access": "readonly",
      "description": ...
    }
  },
  ...
}
I think the JSON format of directory listings can (and should) be improved. From the current spec, a directory is listed like this:
{
"abc": "DEADBEEFDEADBEEFDEADBEEF",
"def/": "1337ABCD1337ABCD1337ABCD"
}
The problems I see with this are, in no particular order:
Therefore I'd like to suggest a response format like this:
{ "contents" :
[ { "name" : "abc", "tag" : "DEADBEEFDEADBEEFDEADBEEF" }
, { "name" : "def/", "tag" : "1337ABCD1337ABCD1337ABCD" }
]
}
The proposed format
HEAD
requests for determining the file size. While I strongly support this proposal, I think issuing numerous HEAD requests purely for determining the file sizes would, because of the necessary server roundtrips, be slower than it should be.

I don't consider my proposal "final"; e.g. whether it's called contents or items or whatever is not the scope of this post, it's just about the general structure. Maybe it's a good idea to add a version property to the top-level object, I don't really have an opinion about this.