remotestorage / spec

remoteStorage Protocol Specification

Home Page: https://tools.ietf.org/html/draft-dejong-remotestorage

JavaScript 11.80% HTML 88.20%


spec's Issues

i18n issues

Can we define that the server MAY (MUST?) interpret %-encoded names as UTF-8 (which is what RFC 3986, https://tools.ietf.org/html/rfc3986, recommends), and that it MAY (MUST?) then return these strings in the JSON as UTF-8 characters (the JSON itself MUST be UTF-8 or ASCII, not another encoding such as UTF-16)?

Also, the client must not provide a %-encoded URL that is invalid UTF-8, or it will get a 400 error.

We should not create a system whose item names are unusable as filenames for most people. We could just say everything round-trips %-encoded, but then for most people the file names on their server would be unreadable in their language, as would the listings they get from the client, unless the client specifically decodes them (and since JSON supports Unicode, that seems the worse solution).

This also means that you can have file names with spaces and so on, without % signs.
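As a concrete sketch of the intended round trip (a minimal Python example; the file name is made up), the client percent-encodes the UTF-8 bytes of a name, and a server that interprets the escapes as UTF-8 can present the readable form in its JSON listing:

```python
from urllib.parse import quote, unquote

# Hypothetical non-ASCII document name.
name = "привет мир.txt"

# Percent-encode the UTF-8 bytes for use in a request path (RFC 3986 style).
encoded = quote(name, safe="")

# A server that interprets the escapes as UTF-8 recovers the readable name,
# which it can then emit directly in a UTF-8 JSON listing.
decoded = unquote(encoded)
```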

Relayed thoughts of a Dropbox.js author

A friend of mine had a conversation with Dropbox about their JS lib and if they'd consider adding remoteStorage support. Here's part of the response:

Subject: Re: dropbox.js Pull Request #51
Date: ############################
From: Victor Costan ################

[...]

Regarding remotestorage-00, I wouldn't expect it to be implemented any
time soon. Dropbox API development is limited by engineering manpower,
so for this to be implemented, it would have to bring more benefits
than other features in the pipeline. The only convincing argument in
favor of remotestorage-00 would be "there are a bunch of apps that can
use this today". I know some of the stuff in the pipeline and, believe
me, they have much better stories :)

When the time comes to implement a standard, I hope Dropbox will have
someone involved with the standard process, and I hope that person
will have some decent experience with building apps. I read
remotestorage-00, and I think it's not based on real-world experience.
It only supports one conflict resolution strategy (most recent write
wins), the access model doesn't seem amenable to extensions, and it
doesn't include any support for staying up-to-date with server
changes, such as /delta. I think these place serious limitations on
the kinds of apps that can be built based on this API, so I hope it
will be improved before/as it gets adopted.

I'll be sure to keep an eye on this though. Thank you very much for
telling me about it!

Victor

versions in directory listings

What is the rationale for having versions in directory listings? I can't see a use case off hand.

The issues seem to be:

  1. What are you going to do with the version number? Unless you have a copy of the document, it's not much use for conditional GETs and the like; and if you do have a copy, you can do a HEAD request to get the version.
  2. It could be expensive to compute. E.g. a server that does not cache ETags in the file system (as xattrs or whatever) might have to compute them just to provide a directory listing.
  3. It requires directories themselves to have version numbers, which may be hard to generate (e.g. in file-system storage) and are not that useful (as you can't PUT a directory anyway). It would be nice to allow a server to not expose directory versions, so that a simpler implementation can work.
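To illustrate point 1: the only obvious use for a version string is a conditional request, which presupposes a cached copy of the document. A minimal server-side sketch (simplified, not spec text) of that interaction:

```python
def respond_to_get(current_etag, request_headers):
    """If the client sends the version it already has in If-None-Match,
    answer 304 with no body; otherwise send the document."""
    if request_headers.get("If-None-Match") == current_etag:
        return 304, None
    return 200, "document body"
```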

consider adopting draft-daboo-aggregated-service-discovery

this draft is in its third iteration now: http://tools.ietf.org/html/draft-daboo-aggregated-service-discovery-03

i'm not sure whether it adds anything that webfinger doesn't already provide itself. the main goal of this spec seems to be to add, for instance, icons for service providers. it doesn't really go into the actual end-point descriptors a lot; it just specifies uri / host, and optional port, ssl and auth fields. also, it introduces a ttl field, which seems odd.

anyway, just wanted to make sure we had a ticket about this

Do not "SHOULD NOT" expiring tokens

The server SHOULD NOT expire bearer tokens unless they are revoked, and MAY require the user to register applications as OAuth clients before first use; if no client registration is required, then the server MAY ignore the client_id parameter in favour of relying on the redirect_uri parameter for client identification.

The OAuth 2.0 specification says:

expires_in RECOMMENDED. The lifetime in seconds of the access token. For example, the value "3600" denotes that the access token will expire in one hour from the time the response was generated. If omitted, the authorization server SHOULD provide the expiration time via other means or document the default value.

So my suggestion is to not go against the spec's recommendations. It may make sense to say that a token may be valid for 8 hours, or 24 hours, or a week.

conditional first PUTs

there is a way, "If-None-Match: *", to say that a request should fail if anything is already there. this is useful to avoid race conditions when creating new documents (the first time you PUT a document).

CouchDB requests always fail if a document exists and you do not provide its current version in your PUT. i think we should allow non-conditional requests, but at the same time provide a way to properly cover all possible race conditions; right now 'virgin PUTs' are still impossible to get right (unless you use long random item names that are unlikely to clash).
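A minimal sketch (simplified server logic, not spec text) of how "If-None-Match: *" would close the race for first PUTs:

```python
def handle_put(existing_etag, request_headers):
    """Create-only ('virgin') PUT sketch. existing_etag is None when
    nothing exists yet at the target path."""
    if request_headers.get("If-None-Match") == "*" and existing_etag is not None:
        return 412  # Precondition Failed: something is already there
    return 201 if existing_etag is None else 200
```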

Empty folder in directory listing

"and an empty folder MUST NOT be listed as an item in its parent folder."

This can never happen, right? If you delete the last document from a folder, the folder should also be deleted.

Do you also want to support the case where there are empty folders and then just not list them?

base urls in directory listing/hypertext

There is no context in directory listings, so if you download a directory listing you cannot retrieve the documents in it unless you have also written down its full URL. Either requiring a baseurl in the listing or providing full URLs for the contents would work, e.g.:

{
"baseurl": "http://example.com/public",
"listing":{
"file": "abcdef"
}
}

or

{
"file": {"version": "abcd", "url": "http://example.com/public/file"}
}

(also, how about a link to the parent folder?)
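With the first (baseurl) shape, a client could resolve the entries with standard URL joining; a minimal Python sketch (the listing content is the hypothetical example above, with a trailing slash added so the base resolves as a folder):

```python
from urllib.parse import urljoin

# Hypothetical listing in the proposed baseurl shape; note the trailing
# slash, which urljoin needs in order to treat the base as a folder.
listing = {
    "baseurl": "http://example.com/public/",
    "listing": {"file": "abcdef"},
}

# Resolve each item name against the base URL.
urls = {name: urljoin(listing["baseurl"], name)
        for name in listing["listing"]}
```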

Specify for a variety of authorisation mechanisms

The RemoteStorage 01 spec describes sessions with a strict dependency on one particular authorisation mechanism, OAuth2's Implicit Grants. Such a directed choice should either be loosened or explained, I think. There may be solid reasons to make a deliberate choice based on the problem domain at hand -- but none I can think of.

One specific form of authorisation that is not supported (but ought to be, I think) is authorisation in the backend, away from the web sphere. If we want RemoteStorage provided by various parties across the Internet, as well as do it ourselves and see network administrators run services for it, then we should give them a way to circumvent the security hazards and added delays of hopping from one secure website to another.

To spell it out: The OAuth model centralises computation stress on server-side TLS, it decentralises credentials, decentralises credential caching and introduces multiple parties who need to do a lot of talking and verifying each other. It is not ideal ;-) and specifically in many situations where we would like to see RemoteStorage, it is likely that administrators don't want this overhead on top of relatively light storage services.

An obvious alternative approach would be to stick to filesystem permissions, or ACLs which may or may not be centrally managed in a directory. This gives more reliable policy enforcement, it adds no overhead relative to file systems, and the only added responsibilities for supporting RemoteStorage is authenticated HTTP. Depending on the local situation and personal preference, this may not even require TLS.

Imagine how simple it then becomes to serve RemoteStorage on a trusted local network: Simply layer the HTTP protocols over a local filesystem or Samba share, incorporate something as lightweight as Kerberos authentication over HTTP, and after registering it for webfinger your RemoteStorage is ready to go! This would even be possible in an embedded/router environment, which already offer Samba-mounts for USB-sticks. Current RemoteStorage is not nearly as likely in that sort of environment.

In short, I think the strict dependency on OAuth and zooming in on Implicit Grants is too strict, and could easily hold back the success of the RemoteStorage concept. Both simpler and more complex schemes are likely to be useful to someone, and should be supported in the spec and, hopefully at some point, in the JavaScript toolkit.

add remark about server-side backups

A friend of mine wrote:

i was wondering if remoteStorage should not have a provision for 'soft delete'. Currently you only have a 'hard delete' option, which is good for people that don't break into a sweat using rm / on the command line. Normal users want to be able to do an undelete at the very minimum, if not have zfs-like snapshots where they can go back to a state at a certain time. If remoteStorage wants to replace clouddrives, and clouddrives want to replace my regular drive, that is pretty much the expectation to live up to.

If I am working on something that matters to me and I am using more or less experimental tools or tools from different sources (say todo and timetracker), it would at the very least make sense to me to have a sort of 'garbage bin' where data is pushed to for possible undeleting - so I can see where things went wrong. Losing data is one of the reasons people want cloud services, so remoteStorage should be sensitive to that fact...

I don't think we should add any undo verb or trash bin to the protocol, but i do think he has a point, and we should remark on this in the spec; for instance, servers MAY offer an interface where a user can roll back the content of their account to a previous version.

i myself would probably do this by committing snapshots to git (we should work out how that works if the data itself contains .git/ dirs); then you know you can always go into the server and recover any and all previous versions with git log and git checkout.

consider adding support for sharing/collaboration

i'm against Access Control Lists! sharing in remoteStorage should work like git: you publish your version, and send pull requests to the people you collaborate with.

having said that, i think it's fair to say that this decision is what is currently costing us most market share. people will say "oh, but i need sharing/collaboration", and develop their unhosted web app for the Dropbox platform or the GoogleDrive platform instead (which both do have sharing/collaboration built in). see http://community.remotestorage.io/t/collaboration-through-public-sharing/84/8?u=michielbdejong for a discussion of how this affects app development.

it's worth noting that this would clash with #39 (comment) because it would involve giving people outside your Kerberos domain access to documents on your account

Make explicit which CORS headers/methods are needed

Currently, my implementation has for Access-Control-Allow-Headers: "Authorization, If-None-Match, Content-Type, Origin, ETag"

And for Access-Control-Allow-Methods: "GET, PUT, DELETE, HEAD, OPTIONS"

Is that complete? Redundant?
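For comparison, one plausible configuration (a sketch, not an authoritative answer): If-Match is a request header needed for conditional writes, while ETag is a response header, so clients can only read it if it appears in Access-Control-Expose-Headers rather than Allow-Headers:

```python
# One plausible CORS header set (assumption, not spec text).
CORS_HEADERS = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Methods": "GET, PUT, DELETE, HEAD, OPTIONS",
    "Access-Control-Allow-Headers":
        "Authorization, If-Match, If-None-Match, Content-Type, Origin",
    # ETag is a response header: expose it so scripts can read it.
    "Access-Control-Expose-Headers": "ETag, Content-Length",
}
```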

do not use root:rw and root:r as scope, use *:rw and *:r instead

Hi,

As root:rw and root:r are a confusing way to request full access to the storage, and root could also be the name of a category/module/folder, it makes sense to use something else here.

We propose to use *:rw and *:r instead. The * is a valid scope token but not a valid category/module/folder name, so this resolves the ambiguity between "access to everything" and "access to a module that happens to be called root".

See: https://tools.ietf.org/html/rfc6749#appendix-A.4

Biased towards personal use?

Hello,

After thinking about the mechanism of authorisation, I've turned to the conceptual authorisation model (with :r, :w, :rw privileges and a distinction into public space), and I was wondering if it isn't too strongly focussed on personal applications. It is open for discussion whether business applications of remoteStorage make sense, of course. It might be possible to simplify the spec (!) and leave more freedom to the remoteStorage implementer/hoster.

It is common to distinguish CRUD (create, read, update, delete) privileges, so a user administrator is able to create a resource that users can read and/or edit, but not delete. This is not possible in the current model. The implicit creation (and deletion?!?) of folders may not be equally suitable at every roll-out, and might be a default policy (resource can create/delete its own folders) that can be adapted by administrators. I suspect you want the ability to silently add an application, with data folders underneath that could be helpful for users to edit -- distinctions like these could be configured by the authorisation policy administrator.

Another authorisation practice in business that is not foreseen in remoteStorage is group-wise access. This is basically a generalisation of the public URIs that you specified -- you could define a public group to do just what you describe for public URIs. And another group to define specwriters. And another to describe... This degree of flexibility enables groups to co-operate on a document through remoteStorage, and that is a useful application of the protocol in business use!

Interestingly, the best way to enable these facilities might not be to expand the specification, but rather to remove overly specific parts. I've also argued from another angle that authorisation is too specific -- in fact, I would propose not to mention the {r,w} rights or the public paths in the spec, unless a JS-app needs to be able to rely on them and has no REST mechanism to find this information. I also wonder if explicitly creating and deleting folders is too much effort for a JS-app; doing this explicitly means that it can also be offered to users in the interface they use. For an implementation it is a big duh that choices like these are needed, but it sounds to me like an issue shared between the resource server, the authorisation setup and the customer. The remoteStorage spec is mainly of interest to the relation between the JS-app and the REST store, and I wonder if the possible authorisation privileges and their enforcement should be part of that spec?

Just hoping to stir up improvements, in the hope that it is useful.

Schrijven is schrappen (Dutch for "writing is cutting") :-)

Cheers,
-Rick

/public

Can we

  1. not hardcode /public
  2. make it optional instead, as some servers will have no use for public resources and the user might accidentally put something there not understanding this English word

Clearly there needs to be an optional public mechanism, but some servers will be all or mostly public and some not at all, so it should be set by policy, not convention.

Will think of some ways in which it can be discovered.

Folders do not exist on first login...

Whenever a "new" user logs in to the remoteStorage server, their user directory may not exist yet and will only be created on the first PUT request. So all requests prior to a PUT will fail on the remoteStorage server.

The spec says that only PUT (and DELETE) requests are able to create/delete folders.

Is it really the intention that folders do not need to exist at all, not even the "root" folder of the user? That is what my implementation does right now. It also only creates the "root" folder on the first PUT request.

consistently use 'folder', not 'directory'

we use 'folder' 48 times and 'directory' 5 times to mean the same thing.

in 2012-04 they were still called directories, but starting with the -00 draft we use the term folder, so let's use it consistently.

Decide whether to allow or disallow documents and directories with same name

There was a discussion about this in #6 that more or less ended with the decision to

brush this under the carpet until -02.

Now, the spec states

Whether or not a document and a directory with matching names can
exist side-by-side in the same directory, is left undefined. This
means a server MAY fail with a 412 response code if a PUT request
would result in such a document-with-directory name clash. Clients
are therefore encouraged to choose directory names that avoid
clashing with document names and vice versa.

Leaving it undecided like this is not very satisfactory. We should make a decision and stick to that.
I understand that Linux-filesystem-based servers have problems allowing matching names, so my suggestion would be for the spec to disallow it.

Also I don't understand why a 412 response code is used here. There is no precondition that is failing. I think we should use 409 instead.

Publicly accessible URLs by default

Upon reading the spec, while preparing to implement remoteStorage, I was somewhat unpleasantly surprised to find that apparently all files stored in a remoteStorage endpoint are publicly accessible by default.

The spec suggests to "use unguessable file names", which seems to be a textbook example of security through obscurity, and has the following issues:

  1. Many developers who start out with remoteStorage will not be aware that unguessable filenames are a security requirement, and will develop insecure software as a result. Non-security is the default.
  2. Even if they are aware of this requirement, they will have to implement cryptographically secure random filename generation to make the filenames truly unguessable - this is bound to go wrong. Not only is there a notable lack of reliable cryptographic libraries and documentation for JavaScript, most developers will also be unable to judge whether something is cryptographically secure.
  3. Even if the developer were aware of the guessability requirement, were capable of judging cryptographic security, and implemented everything perfectly... there would still be a trade-off between ease of development and security. If the application needs to retain filenames, it would require an entirely separate metadata store to map unguessable random filenames to the original, real filenames. As an additional concern, this would cause interoperability problems, as another application may not necessarily be able to read this metadata store, and could therefore not recover the filenames.

The solution to this seems simple to me: a 'public' flag. The Content-Type should already be stored separately from the data itself (according to the spec anyway), so it should be trivial to have a separate 'public' flag be stored alongside the file in a similar manner. Whether the storage gateway would serve a file over the unauthenticated /public/ interface, would then depend on whether the 'public' flag for the file were enabled.

Is there something I'm missing here, or was this simply an oversight in the spec?

EDIT: Briefly after posting this, somebody else pointed out that they interpreted the spec as saying that there is a separate /public/ endpoint, and that you'd have to explicitly store files there to make them publicly accessible. I'm not sure myself, so could you confirm this? If that is indeed the case, ignore all of the above :)

Typo?

I think there's a typo in the spec on line 274. It includes a question mark in a storage URL example:

https://example.com/some?path/to/storage

Specify for a variety of authentication mechanisms

The spec speaks of "credentials" but has more been tried than the username / password model that the entire authentication world wants to get away from?

The strict reliance on bearer tokens gives me the feeling that no desires exist to join this useful movement. Note that these tokens are so incredibly weakly protected that they can only run over TLS -- and leave your security at the mercy of the browser's HTML environment, which has a track record of playing tricks on its users as a result of its flexible/pluggable nature. The problem being that these tokens grant permissions by just being shown -- they act like passwords.

There are more potent authentication mechanisms that would simplify the use of RemoteStorage and at the same time improve security, by performing authentication in the context outside the webpages' HTML and JavaScript:

  • X.509 client certificates (and some day, thanks to RFC 6091, OpenPGP keys)
  • Kerberos Single Sign-on through the HTTP Negotiate method (RFC 4559) wraps GSSAPI exchanges into base64 encoded headers -- which is almost omnipresent.
  • Secure Remote Passwords (RFC 5054 puts it in TLS, but it is available in more places)

Please, do not constrain the specification to just a single form of authentication, and test if it can do the other mechanisms as well. If pragmatic choices have to be made to get work done, then please keep the specification as general as possible -- and provide a versioned JavaScript library to implement the subset that you can currently handle, and prepare for adding these other mechanisms.

The web is held back by people sticking to the lowest common denominator, and RemoteStorage is such a powerful idea that it could make a really loud call for proper authentication integration with browsers / users.

Support for quota imposed by the storage provider

The storage provider needs a way of notifying the client that it cannot store any more data, because the user isn't allowed to store more (usually referred to as hitting the "quota").

WebDAV defines the status code 507 for that, which we could reuse.

transport header values literally in folder listings

Julian Reschke suggested we should include the quotes on the ETag in the folder listing.

i guess then we should also quote the Content-Length header as a string, not as a number.

i think since we only use strong ETags, we can go either way with this. it's only a syntax issue, doesn't affect how things work, we just have to agree on one way of doing it :)

current:

     "items": {
       "abc": {
         "ETag": "DEADBEEFDEADBEEFDEADBEEF",
         "Content-Type": "image/jpeg",
         "Content-Length": 82352
       },
       "def/": {
         "ETag": "1337ABCD1337ABCD1337ABCD"
       }
     }

alternative:

     "items": {
       "abc": {
         "ETag": "\"DEADBEEFDEADBEEFDEADBEEF\"",
         "Content-Type": "image/jpeg",
         "Content-Length": "82352"
       },
       "def/": {
         "ETag": "\"1337ABCD1337ABCD1337ABCD\""
       }
     }

this would be more compatible with the W/"abcdabcdabcd" syntax of weak ETags (even though we don't support those)
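Whichever syntax is chosen, clients can stay liberal in what they accept; a small Python sketch of normalising either form to the opaque value:

```python
def etag_value(raw):
    """Return the opaque ETag value whether or not the listing transports
    the surrounding quotes (weak-ETag prefix handled for completeness,
    even though the spec only uses strong ETags)."""
    if raw.startswith("W/"):
        raw = raw[2:]
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        raw = raw[1:-1]
    return raw
```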

allowed characters syntax

Consider using BNF to specify the allowed characters in path names:

%x25 / %x2D / %x2E / %x30-39 / %x41-5A / %x5F / %x61-7A
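That character set (%, -, ., digits, uppercase and lowercase ASCII letters, _) translates directly into a validation routine; a Python sketch:

```python
import re

# Character class matching the BNF above: % - . 0-9 A-Z _ a-z
ALLOWED = re.compile(r"[%\-.0-9A-Z_a-z]+")

def valid_name(name):
    """True if the path segment is non-empty and every character is allowed."""
    return ALLOWED.fullmatch(name) is not None
```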

limit length of module names, base URLs, auth URLs, redirect URLs, etc?

there is a practical limit of about 2000 characters for URLs (http://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-a-url-in-different-browsers ), which affects module names and other strings the spec deals with, but we're not saying anything about this.

in practice this limit will probably never come into play, but we could say something like: servers may reply with a 500 error if a scope uses more than 10 modules, or if any of the module names exceeds 50 characters.

Support "access_token" query parameter to retrieve documents

Instead of only allowing "Authorization: Bearer xyz", it should also be possible to use an ?access_token=xyz query parameter. This is needed when embedding images, audio, or video in a page, without needing to use XMLHttpRequest to fetch the blob before displaying it.

App that needs this: music player.

Why are version numbers time stamps?

Remote storage defines version numbers as "The current version of a document is the 13-digit decimal number representing the number of milliseconds between 00:00 UTC, 1 January 1970, and the last time its content or content type were set or changed successfully." Is there any reason why a version number has these requirements? The specification only uses the version number as an opaque identifier, so requiring that it be a strong ETag seems to be enough.
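For reference, the quoted definition amounts to the following one-liner (a sketch); any string that works as a strong ETag would serve the spec's purposes just as well:

```python
import time

def timestamp_version():
    """Milliseconds since the Unix epoch, per the quoted definition;
    this happens to be 13 decimal digits for any date between
    September 2001 and the year 2286."""
    return str(int(time.time() * 1000))
```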

discuss how distributed versioning would work from a CS point of view

as suggested by @mnot (personal communication), we should discuss, from a Computer Science point of view, how people can use the ETag, If-Match and If-None-Match headers to implement distributed versioning, and how this works when clients have been making changes while they were offline, and run into conflicts when they come online again. i'll draft a paragraph about this.
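As a starting point for that paragraph, the client-side decision can be sketched like this (simplified; the names are made up): an offline edit was made against `base_etag`, and a fresh HEAD request reports `server_etag`. A conditional PUT with If-Match would make the server enforce the same check atomically:

```python
def plan_upload(base_etag, server_etag):
    """Decide whether an offline edit can be pushed safely."""
    if server_etag != base_etag:
        return "conflict"  # the document changed while we were offline
    return "put"           # safe: send PUT with If-Match: base_etag
```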

Add support for HEAD requests

Right now I have no way to find out the size or MIME type of a resource without retrieving the entire resource itself. Similarly, the only way to find out the most recent version of a resource is to retrieve the parent directory.

So I'd like to propose that we require support for the HEAD verb:

A successful HEAD request to any valid resource MUST result in either a "200 OK"
or a "404 Not Found" response (based on whether it exists or not), including at least
the "ETag" and "Content-Type" headers. For documents the response MUST also
include the correct "Content-Length" header. For folders the "Content-Length" MAY
be omitted.
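A minimal sketch of the proposed behaviour (the data shapes are made up for illustration):

```python
def handle_head(resource):
    """resource is None for unknown paths, otherwise a dict with 'etag',
    'content_type' and, for documents, 'length' (absent for folders).
    Returns (status, headers); HEAD responses carry no body."""
    if resource is None:
        return 404, {}
    headers = {"ETag": resource["etag"],
               "Content-Type": resource["content_type"]}
    if resource.get("length") is not None:  # folders may omit Content-Length
        headers["Content-Length"] = str(resource["length"])
    return 200, headers
```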

HTTP return codes

             path or unrecognized http verb, etcetera), as well as for
             all PUT and DELETE requests to folders,

For unrecognized verbs I think a 405 ("Method Not Allowed") should be returned.

             * 401 for all requests that don't have a bearer token with
               sufficient permissions,

This is "out of scope" here; it can be listed, but it is actually part of the Bearer token spec. That one states that for missing or invalid tokens a 401 should be returned, and a 403 when there is not sufficient permission.

Improve Format of Directory Listings

I think the JSON format of directory listings can (and should) be improved. From the current spec, a directory is listed like this:

{
  "abc": "DEADBEEFDEADBEEFDEADBEEF",
  "def/": "1337ABCD1337ABCD1337ABCD"
}

The problems I see with this are, in no particular order:

  • It is impossible to specify a schema for this object. While this is not a problem per se, it feels strange that in other places remoteStorage encourages the usage of schemas, but uses such an anomaly in a central place.
  • It cannot be easily extended in future versions, because there is no obvious way to add new properties to this object which would not look like files or folders. And even if we can come up with a convention to do so, it would likely be error-prone to implement client-side. The lack of extension points has already been criticized in issue #4.
  • It is cumbersome to parse using any language / JSON library I'm aware of (except JavaScript, where using objects as associative arrays is quite standard).

Therefore I'd like to suggest a response format like this:

{ "contents" :
  [ { "name" : "abc",  "tag" : "DEADBEEFDEADBEEFDEADBEEF" }
  , { "name" : "def/", "tag" : "1337ABCD1337ABCD1337ABCD" }
  ]
}

The proposed format

  • is easy enough to translate to the current format client-side in JavaScript (where it's likely beneficial to have an assoc. array in the "old" format),
  • offers obvious extension points for the directory listing as a whole (the top-level object),
  • offers obvious extension points for the listed items (the objects in the list); see also issue #21 which opts for supporting HEAD requests for determining the file size. While I strongly support this proposal, I think issuing numerous HEAD requests purely for determining the file sizes would, because of the necessary server roundtrips, be slower than it should be.
  • there can be a schema for this format, and because of this
  • highly efficient parsers can be built using standard tools.

I don't consider my proposal "final"; e.g. whether it's called contents or items or whatever is not the scope of this post, it's just about the general structure. Maybe it's a good idea to add a version property to the top-level object; I don't really have an opinion about this.
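For what it's worth, translating the proposed shape back into the current mapping is a one-liner client-side (a sketch using the key names proposed above):

```python
def to_mapping(listing):
    """Convert the proposed list-of-objects format into the current
    name -> tag mapping."""
    return {item["name"]: item["tag"] for item in listing["contents"]}
```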
