cs3org / cs3apis
:arrows_clockwise: Connect Storage and Application Providers
Home Page: https://buf.build/cs3org-buf/cs3apis
License: Apache License 2.0
The usershare resource should use cs3.types.UserId instead of string for owner and creator.
When asking a gateway for quota, we need to query all available storage providers and return the quota for every path segment; the response does not currently contain this information.
Consider whether a UserId (the identifier of a user in the system) should have a message of its own instead of just a string.
This is to foresee an external scope (as was done for ResourceId, breaking it into an opaque id and a storage id).
By design, a transfer tool is used for the data transfers. Depending on the tool or configuration, transfers can be executed using different methods (streaming, third-party copy push/pull). Currently, in our CS3 reference implementation reva, we use rclone, which supports both streaming and third-party push transfers (cs3org/reva#3491), and this is configurable.
I propose to change the PullTransfer method in the datatx module to CreateTransfer.
The CS3APIs implementation should allow disabling the creation of a version when saving a file.
At the same time, it should be possible to manually create a version, so a new method needs to be added.
This will be useful for apps that want to do auto-save, which should not overwrite other versions (which might happen if only a limited number of versions is allowed).
The storage implementation should be able to handle these cases. EOS already does: https://its.cern.ch/jira/browse/EOS-4503
Broken after v1beta1 change due to new depths in the tree
The key is a string and does not contain enough information to be routable; it needs the storage_id.
When using the usual CS3API calls to delete a file, that file is not physically deleted but moved into a recycle bin. This is extremely problematic when performing health checks: these checks upload about 100 KB per run and delete the files afterwards. However, since these files are only moved to some other place, they quickly pile up, resulting in full volumes.
As uploading files is an essential check, we either need (a) an option to truly delete files, (b) a way of clearing the recycle bin of the current user (might already be there), (c) another workaround.
To perform redirections on operations we need a redirection status code.
To know where to redirect we need a storage resource reference AND a protocol type.
A storage reference can point to other storage providers inside the CS3 cloud, which we can reach using the CS3APIs, but the reference can also point to an OCM share or to a remote storage like an FTP server. We therefore need a protocol type so the client knows how to fetch the information from the referenced resource; in the case of OCM, that protocol is WebDAV.
The main one is the Go one: https://github.com/cs3org/go-cs3apis
But we should also publish stubs for mainstream languages.
Add tus feature and version to https://cs3org.github.io/cs3apis/#cs3.storage.registry.v1beta1.ProviderInfo
To properly handle crappy proxies we also need to be able to return the chunk size: tus/tus-resumable-upload-protocol#93
https://cs3org.github.io/cs3apis/#cs3.storage.provider.v1beta1.ResourcePermissions should have:
...This should return details such as the tagged and git versions, build date and platform, language version, and maybe some capabilities such as third party integrations and supported protocols (for uploads?)
This is related to the ListContainerStreamRequest and the Opaque map on the review branch.
For the time being we can implement searching and filtering by using the Opaque property of the ListContainerStreamRequest. In order to collect candidates for new ListContainerStreamRequest properties we should collect the different property keys somewhere.
@moscicki @labkode do you already have keys in mind? I can see: search with a JSON-encoded pattern, a prefix, or a Lucene query payload; pagination with limit and offset, or a cursor.
More details in owncloud/web#116.

While the use case of limiting a user to only add grants seems valid, it already comes with the requirement to let them manage the grants they created, which requires UpdateGrant permissions.
Having a single permission for SetGrant would be sufficient. Whether users should be limited to managing their own grants, all grants, or only the grants for groups they are a member of is implementation-specific, and would require the share owner to be part of the Grant, which it currently is not. Currently, a resource can only be owned by a single user.
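Coming back to the Opaque keys discussed earlier (search, limit, offset, cursor): a minimal sketch of how a client might assemble them, using a plain dict as a stand-in for the Opaque message. The key names are from the discussion above; the JSON encoding of the search pattern and the helper itself are assumptions, not settled conventions.

```python
import json

def build_list_container_opaque(pattern=None, limit=None, offset=None, cursor=None):
    """Assemble candidate Opaque keys for a ListContainerStreamRequest (sketch)."""
    opaque = {}
    if pattern is not None:
        opaque["search"] = json.dumps(pattern)  # JSON-encoded search pattern (assumed)
    if limit is not None:
        opaque["limit"] = str(limit)            # Opaque values are strings/bytes
    if offset is not None:
        opaque["offset"] = str(offset)
    if cursor is not None:
        opaque["cursor"] = cursor
    return opaque

# Example: first page of 30 results matching a prefix
print(build_list_container_opaque(pattern={"prefix": "repor"}, limit=30, offset=0))
```

Collecting the keys in one helper like this would also make it easy to list the candidates for future first-class request properties.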
We should allow sending MD Metadata with
Furthermore, we need to be able to manipulate metadata, in order to
I see the WhoAmIRequest lets you specify the token you received from AuthenticateResponse, but none of the other method Request objects do.
I see revad throws "core access token not found" when you try to do, for instance, a ListReceivedOCMShares call.
Is there another way to pass the token in a gRPC request?
Currently, only the share manager knows who created a share. That means that both the share manager and the storage provider are necessary to reconstruct all share information. The share manager currently cannot be rebuilt using the storage provider alone.
The favorite flag is a user specific property for a file that cannot be mapped to extended attributes without leaking who has marked a file as a favorite.
It is a specific case of a tag, which is likewise individual per user. I see these types of tags
This can be solved using different namespaces or scopes for tags
Obviously this is only secure when the u/s/g/a namespaces are not accessible by users in the filesystem. Public tags can be mapped to extended attributes, e.g. Dublin Core metadata.
Ref #99 (comment)
A resource info can contain in the metadata a pointer to a share id.
The CI for PRs triggers two steps: one on PR and another on push. For some reason one never starts, and I can only merge based on my admin status.
The storage provider's ResourceInfo should be able to return more than one checksum.
We need a way to let clients specify which checksums they are requesting. In cs3org/reva#1400 I used a checksum metadata key in Stat and ListContainer to indicate that I want to read all checksums. Do we want to be able to specify which algorithm? Using something like checksum:sha1, checksum:md5, checksum:sha1,md5,adler32, similar to HTTP headers? Or use a format similar to what TUS is using ([algo] [hash])?
InitiateFileUploadRequest needs a way to pass the expected checksum, both for implementing OC-style checksums and for the TUS checksumming extension. In cs3org/reva#1400 I am using an "Upload-Checksum" key in the Opaque properties of the request: https://github.com/cs3org/reva/pull/1400/files#diff-198f1004a921b3627f7572a239452974429a7de6e4fa47f445c2ad35d2cd9026R320
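As an illustration of the [algo] [hash] format, a small Python sketch of parsing and verifying a tus-style Upload-Checksum value (the tus checksum extension base64-encodes the hash; the helper names here are made up):

```python
import base64
import hashlib

def parse_upload_checksum(header_value):
    """Parse a tus-style "Upload-Checksum" value: "<algorithm> <base64 hash>"."""
    algo, _, encoded = header_value.partition(" ")
    if not algo or not encoded:
        raise ValueError("expected '[algo] [hash]'")
    return algo, base64.b64decode(encoded)

def verify(data, header_value):
    """Recompute the named digest over data and compare it to the header."""
    algo, expected = parse_upload_checksum(header_value)
    return hashlib.new(algo, data).digest() == expected

data = b"hello"
header = "sha1 " + base64.b64encode(hashlib.sha1(data).digest()).decode()
print(verify(data, header))  # True
```

One advantage of this format over checksum:sha1,md5 key names is that the algorithm travels with the value, so a single key can carry any supported digest.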
Currently, there is no support for additional file / folder metadata like extended attributes. Webdav has properties, which I would like to map onto extended attributes. The cs3 apis should allow doing that.
While investigating cs3org/reva#2148 I ran into a question:
Why is there on the one hand a cs3/ocm/core package, and on the other hand a cs3/sharing/collaboration package?
I think that ocm/core is used by the sender and sharing/collaboration is used by the receiver of a share, but if that's really the explanation, then we should probably rename them to something like ocm/sender and ocm/receiver, respectively?
Currently in Reva the permissions of a user are stored closely to the file as grants.
Now we want to introduce global permissions which do not belong to a certain resource but allow the user to do things like "create a new space" or "user has role admin" (roles are just a collection of permissions).
I want to propose a new simple API to add and list (global) permissions of a user. Although this API could be used to also store resource specific permissions, we would only use it for global permissions for now.
The service would store role and permission assignments and could answer queries like "does user x have the permission create-space".
@ishank011 @labkode
Did you already think of something like that? Are you against such a service? If not I will create a PR.
/cc @refs
I saw that we are using ctime in a confusing way. ctime is actually the change time; the difference to mtime is that mtime indicates when the content changed, while ctime indicates when the metadata changed. For a good explanation of atime, mtime and ctime see https://www.unixtutorial.org/atime-ctime-mtime-in-unix-filesystems
Some filesystems do track the creation time, but it is better identified as btime, or birthtime. AFAICT the Linux kernel supports reading the btime with the statx call. I don't know how widely that is used. There is a golang module that sums it up: https://github.com/djherbis/times#supported-times
While there is no easy way to get the birth time everywhere, we should use btime in addition to ctime and document properly which is used for what.
see https://github.com/cs3org/cs3apis/blob/master/cs3/storageprovider/v0alpha/resources.proto#L101-L104
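To illustrate the mtime/ctime distinction (and the limited availability of btime), a small Python sketch; the file_times helper is made up for illustration:

```python
import os
import stat
import tempfile

def file_times(path):
    """Return (atime, mtime, ctime, btime) for a path.

    Note: on Linux, st_ctime is the inode *change* time, not the creation
    time; the birth time (btime) needs statx(2) and is exposed by Python
    as st_birthtime on macOS/BSD only.
    """
    st = os.stat(path)
    btime = getattr(st, "st_birthtime", None)  # None where not supported
    return st.st_atime, st.st_mtime, st.st_ctime, btime

# A pure metadata change (chmod) must not touch mtime, while ctime may move:
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x")
    path = f.name
mtime_before = os.stat(path).st_mtime
os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
mtime_after = os.stat(path).st_mtime
print(mtime_before == mtime_after)  # True: content unchanged
os.unlink(path)
```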
In order to transparently handle symlinks the cs3api should have a dedicated call to create symlinks. We plan to use it so that our settings, accounts and store service can use the cs3 api to persist data.
When implementing initial checksum support I stumbled over a difference with the WebDAV API that may need clarification, e.g. in:
cs3apis/cs3/storage/provider/v1beta1/provider_api.proto
Lines 337 to 340 in 8a19a7f
We currently translate an allprops PROPFIND to a * key, but allprops should only return some default properties. Some live properties might be expensive to compute, so they are not calculated in an allprops PROPFIND. See https://tools.ietf.org/html/rfc4918#section-9.1
An account management system, like LDAP, can contain different types of accounts with different connotations.
For example, a primary, secondary, service, or guest account.
I suggest adding a free-text field for the time being.
java stubs should also be made available: https://github.com/grpc/grpc-java/tree/master/compiler
For ownCloud Web we needed a different way to authenticate downloads of password-protected link shares. The current design of the share manager doesn't allow us to implement this alternative authentication method, namely pre-signed URLs.
We are using the share password hash as the signing key, but in the current implementation the share password hash is not accessible outside of the share manager.
To allow this alternative authentication method I propose three options:
PublicShare struct
GetPublicShare and GetPublicShareByToken
PublicShare is included in a response

This http://localhost:8080/index.php/s/2LfFihmw6cD7qxU is an example public link. The last part, 2LfFihmw6cD7qxU, is the random token identifying the share.
For password protected shares the credentials can be included in the 'Authorization' header using basic auth where 'public' is the username and the share password is the password.
When we want to download a file using an anchor tag (<a href="....">) we can't supply the Authorization header with the request.
In ownCloud 10 this worked by using cookies, but in ownCloud Web we don't have cookies.
Because of that we have decided to use signed URLs. The advantage of signed URLs is that, just like with unprotected public links, possession of the URL is proof of authorization, meaning anyone who has the signed URL can download the resource.
The general idea is this:
The client sends a PROPFIND request to the backend (including the credentials in the auth header)
The client has to request the `downloadURL` attribute to receive the signed URL
The backend does the normal credential check
The backend generates the PROPFIND response
When the client requested the `downloadURL` attribute the signed URL is generated and returned as the `downloadURL`
The client can now issue GET requests to the `downloadURL`
The detailed signing process looks as follows:
Let's assume we have a shared tree like this:
Share Root
some-file.txt
Folder/
another-file.txt
To create the signature we need:
The resource path relative to the share root: `/Folder/another-file.txt`
A timestamp of the expiration date for the signature. Right now it's 30 minutes from the moment of signing
The share token
The share password (hash)
And then we calculate the HMAC like this:
HMAC_SHA512/256(SHA256(share_password_hash), share_token + resource_path + timestamp)
The HMAC_SHA512/256 just means that we are using SHA512 as the hash function but want to receive a code with a length of 256 bits.
Then to create the signed URL we take the resource path and add the signature and the expiration timestamp as query parameters.
To verify the signature the backend will repeat the process above but this time it will take the timestamp from the requested URL.
If the current time is earlier than the expiration time, continue with step 2; otherwise return an authentication failed response
Calculate the HMAC
Compare the signature passed via the query parameter with the calculated one
If it doesn't match return an authentication failed response
Continue serving the resource
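A minimal Python sketch of the signing and verification steps just described. Function names are hypothetical; per the text above, "HMAC_SHA512/256" means HMAC with SHA-512 truncated to 256 bits, and the key is SHA256 of the share password hash:

```python
import hashlib
import hmac
import time

def make_signature(password_hash, token, path, expires):
    """HMAC_SHA512/256(SHA256(share_password_hash), token + path + timestamp)."""
    key = hashlib.sha256(password_hash.encode()).digest()
    msg = (token + path + str(expires)).encode()
    # SHA-512 HMAC truncated to 256 bits (32 bytes), as described above
    return hmac.new(key, msg, hashlib.sha512).digest()[:32]

def verify_signature(password_hash, token, path, expires, signature, now=None):
    now = now if now is not None else time.time()
    if now >= expires:                                              # step 1: expiry check
        return False
    expected = make_signature(password_hash, token, path, expires)  # step 2: recompute
    return hmac.compare_digest(expected, signature)                 # step 3: compare

expires = int(time.time()) + 30 * 60  # 30 minutes from the moment of signing
sig = make_signature("pw-hash", "2LfFihmw6cD7qxU", "/Folder/another-file.txt", expires)
print(verify_signature("pw-hash", "2LfFihmw6cD7qxU", "/Folder/another-file.txt", expires, sig))  # True
```

The signed URL would then carry the hex-encoded signature and the expiration timestamp as query parameters on the resource path.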
Frequently Asked Questions
Let's assume we would use hashes like this:
Hash(share_token + resource_path) = signature
Then all the information needed to create the signature would be public and the security of our share would be reduced to a public link without password.
Okay then let's include a secret like this:
Hash(share_token + resource_path + secret) = signature
Cool, we just reinvented HMAC, so let's use that instead. ;)
If we wanted to do client side signing we would need a shared secret between the backend and the client.
The client knows the share password; can't we use that?
The backend shouldn't store the password in plain text but in hashed format. Also the password should be hashed with a random salt.
The client could in fact also hash the password before using it as a HMAC key but the client doesn't have access to the random salt from the backend. Therefore it could never generate the same HMAC.
We could generate a new secret just for signing but this would require more changes to the API because the client would need to be able to request that secret.
Also we would need to store that in addition to the other share data.
Technically yes, but the share token is public, and therefore anyone with knowledge of the share token can create the signature. This again reduces the security of the share to a public link without password.
I hope this document clarified the design of signed URLs for password protected shares.
Do we expose them to end users via CanonicalMetadata?
This issue attempts to give a compact overview of protocol issues I ran into while implementing the ios-sdk.
I found the number of endpoints to hit before being able to start a session in the iOS app unnecessarily long:
.well-known/openid-configuration to detect OIDC
status.php to detect redirections, version and maintenance mode
user.php to retrieve the current user
capabilities.php to retrieve the current set of capabilities

A single endpoint for fetching all of this information in one request would simplify this greatly. Such an endpoint could be passed a list of info segments to include, like f.ex. info.php?include=status,user,capabilities (or an extended status.php with that functionality).
Clients currently have to request a thumbnail for a file to determine if a thumbnail is available. In a directory with many files for whose file types no thumbnails are available, this generates a lot of unnecessary requests.
What would help:
a list of file types (f.ex. in capabilities) for which the server supports thumbnail generation
a per-item property indicating thumbnail availability (f.ex. oc:thumbnail-available)
The latter approach would have the benefit of keeping the logic for which item thumbnails are available in the server.
Related issue: owncloud/core#31267
PROPFIND responses
PROPFINDs typically return two d:propstat tags for every item.
First, the part with information on the item:
<d:response>
<d:href>/remote.php/dav/files/admin/</d:href>
<d:propstat>
<d:prop>
<d:resourcetype>
<d:collection/>
</d:resourcetype>
<d:getlastmodified>Fri, 23 Feb 2018 11:52:05 GMT</d:getlastmodified>
<d:getetag>"5a9000658388d"</d:getetag>
<d:quota-available-bytes>-3</d:quota-available-bytes>
<d:quota-used-bytes>5812174</d:quota-used-bytes>
<oc:size>5812174</oc:size>
<oc:id>00000009ocre5kavbk8j</oc:id>
<oc:permissions>RDNVCK</oc:permissions>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
Then, second, the part with requested information not available for the item:
<d:propstat>
<d:prop>
<d:creationdate/>
<d:getcontentlength/>
<d:displayname/>
<d:getcontenttype/>
</d:prop>
<d:status>HTTP/1.1 404 Not Found</d:status>
</d:propstat>
… and finally the closing tag:
</d:response>
The second part (on information not available) consumes bandwidth, CPU cycles, memory and power - all of which are very valuable on a mobile device - but provides no benefit at all.
It'd therefore be great to be able to omit the second/useless part. And if it needs to be there to conform with the WebDAV standard, provide an option to omit it.
Related to the aforementioned issue: I like and respect WebDAV (a lot!), but where it really falls short is in the expression of information in a compact format. Above XML, after all, just contains this information:
location: /remote.php/dav/files/admin/
type: collection
last-modified: Fri, 23 Feb 2018 11:52:05 GMT
Etag: "5a9000658388d"
quota-available: -3
quota-used: 5812174
size: 5812174
fileID: 00000009ocre5kavbk8j
permissions: RDNVCK
Even in HTTP-header-esque notation, it's already a lot fewer bytes. Now imagine what a binary format could achieve ([VID] is a VarInt encoding block type + length, [VIN] is a VarInt encoding a number):
[VID]/remote.php/dav/files/admin/[VID][VIN][VID][VIN][VID]"5a9000658388d"[VID][VIN][VID][VIN][VID][VIN][VID]00000009ocre5kavbk8j[VID][VIN]
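For illustration, the VarInt ([VIN]) encoding referred to above is the standard 7-bits-per-byte scheme (as used by Protocol Buffers, for example); a minimal sketch:

```python
def encode_varint(n):
    """Encode a non-negative integer as a VarInt: 7 payload bits per byte,
    high bit set on every byte except the last (continuation marker)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(buf, pos=0):
    """Decode a VarInt from buf starting at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

# The quota-used value from the example above fits in 4 bytes instead of 7 digits:
print(encode_varint(5812174).hex())
```

Small numbers (like the -3 quota sentinel, suitably mapped to an unsigned value) take a single byte, which is where the compactness over textual formats comes from.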
Therefore, when working on a new architecture, I believe it could be good future-proofing to move away from WebDAV internally - and isolate everything related to WebDAV into a backend that performs the conversion between the native internals and WebDAV requests and responses.
With protocol-agnostic internals, a new backend providing access via new, more compact request and response formats would not be far away.
There's currently no way to detect changes to shares without requesting and comparing them in entirety.
It'd be great to have an ETag returned for Sharing API responses that only changes if the requested shares in the response change. And then, also support for If-None-Match.
In order to find changed items, it is currently necessary to check the ETag of the root folder, if it changed, fetch its contents, then find items with changed ETags among it … and repeat this until all changes have been discovered.
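The traversal just described can be sketched like this; get_etag and list_folder are hypothetical stand-ins for depth-0 and depth-1 PROPFINDs, and cached holds the ETags from the previous sync:

```python
def discover_changes(get_etag, list_folder, cached, path="/"):
    """Recursively collect changed paths, descending only into folders
    whose ETag differs from the cached one (sketch, not a real client)."""
    if cached.get(path) == get_etag(path):
        return []                      # ETag unchanged: skip the whole subtree
    changed = [path]
    for name, is_folder in list_folder(path).items():
        child = path.rstrip("/") + "/" + name
        if is_folder:
            changed += discover_changes(get_etag, list_folder, cached, child)
        elif cached.get(child) != get_etag(child):
            changed.append(child)
    return changed

# Toy server state: only /docs/a.txt changed since the last sync.
server = {"/": "2", "/docs": "2", "/docs/a.txt": "2", "/music": "1", "/music/b.mp3": "1"}
tree = {"/": {"docs": True, "music": True}, "/docs": {"a.txt": False}, "/music": {"b.mp3": False}}
cached = {"/": "1", "/docs": "1", "/docs/a.txt": "1", "/music": "1", "/music/b.mp3": "1"}
print(discover_changes(server.get, tree.get, cached))  # ['/', '/docs', '/docs/a.txt']
```

Note that even with the subtree-skipping, each changed folder still costs one request per level, which is the round-trip overhead a SyncAnchor or event mechanism would remove.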
Some ideas on how this could be improved:
Delivered as part of the PROPFIND
response. That SyncAnchor
could then be used to make a subsequent request for a list of changes that occurred since that PROPFIND
response. If a SyncAnchor
is too old, a respective error would be returned.
Clients could subscribe to change events. Every change on the server would then generate an event and put it in the client's event queue.
While the client is connected, events could be delivered immediately (push).
If the client can't consume the events as fast as they are generated (f.ex. because it's not connected or only through a slow connection), and the number of events in the queue surpasses a limit (f.ex. 1000 events - or a time threshold), they get dropped and replaced with a single reload event, which would tell the client to perform old-style change detection by traversing the tree.
Alternatively, the subscription could expire at that point, and the client receive an error response when trying to resume the connection to the event subscription endpoint.
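The queue-overflow behaviour described above might look like the following sketch; the class name, event shapes, and limit handling are all made up for illustration:

```python
from collections import deque

class ChangeEventQueue:
    """Per-client event queue: once the backlog exceeds `limit`, the queued
    events are dropped and replaced by a single 'reload' event, telling the
    client to fall back to old-style tree traversal."""

    def __init__(self, limit=1000):
        self.limit = limit
        self.events = deque()
        self.overflowed = False

    def push(self, event):
        if self.overflowed:
            return                      # already collapsed to 'reload'
        self.events.append(event)
        if len(self.events) > self.limit:
            self.events.clear()
            self.events.append({"type": "reload"})
            self.overflowed = True

    def drain(self):
        """Deliver and clear all pending events (push while connected)."""
        out = list(self.events)
        self.events.clear()
        self.overflowed = False
        return out
```

A time threshold could be added the same way by stamping events on push and collapsing when the oldest one exceeds the allowed age.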
Currently, not all changes to an item that are relevant to clients lead to a new ETag. Examples include a change in sharing or oc:favorite status.
For this reason, the new iOS app currently has to perform a PROPFIND (depth 1) when the user navigates to a directory, to make sure sharing and favorite status of the items are up-to-date when presented to the user. The toll on bandwidth/CPU/memory/battery can be significant for folders with a lot of items. If all relevant changes were propagated to the ETag, a depth 0 PROPFIND would be sufficient to detect changes.
I understand that the ETag may, by definition and meaning, not be the right vehicle for propagating these changes. An MTag (or any other name) for tracking and propagating these metadata changes could also solve this issue.
Implementing offline support in the new iOS app and SDK required finding a way to efficiently track an item from local generation to upload to file on the server.
Since the oc:id (File ID) of an item is generated by the server, it can't be known beforehand, and therefore can't be used at the beginning of the item's lifecycle.
The solution implemented in the new iOS SDK was to generate a unique ID locally and track the item by Local ID. This works well, but involves quite a bit of complexity to ensure that, once a file is on the server, it always gets the same Local ID attached.
A great simplification would be if it was possible for clients to provide a Client ID, which gets stored alongside the File ID and doesn't change for the lifetime of the item on the server.
And then, the option to have all APIs (especially PROPFIND and sharing) also include that Client ID in addition to the existing File ID in responses.
Client-generated IDs would be prefixed with an app-specific prefix, f.ex. ios:[UUID goes here].
In cases where no Client ID was provided for an item, its Item ID would be returned instead.
Moved files present a challenge as they often first get noticed by clients as missing from the folder they were previously in. Only to reappear in a different folder while changes are discovered.
In consequence, the client needs to keep the missing item around until it has finished change discovery and can't f.ex. remove local copies immediately.
A way to perform a PROPFIND on a File ID to determine its current whereabouts and status would be great to make this more efficient.
Item operations – like uploads or folder creation – only return rudimentary information on the item operated upon.
While that information is sufficient to form subsequent requests, it's not sufficient for clients to generate a full-fledged local representation of that item with all metadata.
In consequence, after performing item operations, it's often necessary to perform a subsequent PROPFIND on the item to retrieve the entire set of metadata.
This is not a big issue, but costs (some) performance and there's - in theory - a small window between the operation finishing and the PROPFIND during which another client could replace, move or delete the item again.
If the response contained more metadata (requested f.ex. through a new HTTP request header listing the tags requested), the operation would be entirely atomic.
Where is the Dockerfile for https://hub.docker.com/r/cs3org/cs3apis?
Currently there is no locking support. The CS3 APIs should allow implementing locking functionality, e.g. for WebDAV or S3.
allows the server to allocate files or check available quota.
Needed for the tus creation extension, which always expects an Upload-Length.
then replace Opaque data in
We should make this optional to allow the clients to initialize an upload using tus and the creation-defer-length extension. Using 0 to indicate creation-defer-length does not work, because you might want to create an empty file.
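The distinction can be sketched with the tus creation request headers; the helper function is hypothetical, while the header names (Upload-Length, Upload-Defer-Length, Tus-Resumable) come from the tus protocol:

```python
def tus_creation_headers(length=None):
    """Headers for a tus creation request (sketch).

    A known size, including 0 for an empty file, goes into Upload-Length;
    an unknown size uses the creation-defer-length extension instead of
    abusing the value 0."""
    headers = {"Tus-Resumable": "1.0.0"}
    if length is None:
        headers["Upload-Defer-Length"] = "1"    # size supplied later via PATCH
    else:
        headers["Upload-Length"] = str(length)  # 0 is a valid, empty file
    return headers

print(tus_creation_headers(0))     # empty file: explicit Upload-Length of 0
print(tus_creation_headers(None))  # unknown size: defer the length
```

This is why an optional size field in InitiateFileUploadRequest is needed: "0" and "unknown" must remain distinguishable.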