cs3org / cs3apis Goto Github PK

View Code? Open in Web Editor NEW

51.0 11.0 29.0 1.01 MB

:arrows_clockwise: Connect Storage and Application Providers

Home Page: https://buf.build/cs3org-buf/cs3apis

License: Apache License 2.0

Makefile 5.92% Go 94.08%

cs3-apis

cs3apis's Introduction

The CS3APIS connect Storage and Applications Providers.

API Documentation

https://buf.build/cs3org-buf/cs3apis

Officialy compiled libraries

The libraries for different languages are compiled from the protobuf definitions in this repo. When a commit to master is made the CI takes care to create a new version of the library in the following languages. Please note that the versioning used in the libraries below differs from language to language, however they point to the same source commit. This is due to the way the different package managers handle package versions.

Go: https://github.com/cs3org/go-cs3apis
Python: https://github.com/cs3org/python-cs3apis
Javascript: https://github.com/cs3org/js-cs3apis (to be used from Web applications - frontend usage only)
NodeJS: https://github.com/cs3org/node-cs3apis (to be used from NodeJS applications - backend)

Repository packages

Go: https://pkg.go.dev/github.com/cs3org/go-cs3apis
Python: https://pypi.org/project/cs3apis/
Javascript: https://www.npmjs.com/package/@cs3org/cs3apis
NodeJS: https://www.npmjs.com/package/@cs3org/node-cs3apis

Local compilation

You need to have Docker installed. The artifacts will be available under the build directory.

$ git clone https://github.com/cs3org/cs3apis
$ cd cs3apis
$ make

The build folder will be generated. It will be owned by root.

See the Makefile to find all the available build options.

Overview

This repository contains the interface definitions of public CS3APIS that support the gRPC protocol. You can also use these definitions with open source tools to generate client libraries, documentation, and other artifacts.

CS3 APIs use Protocol Buffers version 3 (proto3) as their Interface Definition Language (IDL) to define the API interface and the structure of the payload messages.

Repository Structure

This repository uses a directory hierarchy that reflects the CS3 feature set. In general, every API has its own root directory, and each major version of the API has its own subdirectory. The proto package names exactly match the directory: this makes it easy to locate the proto definitions and ensures that the generated client libraries have idiomatic namespaces in most programming languages.

NOTE: The major version of an API is used to indicate breaking change to the API.

cs3apis's People

Contributors

Stargazers

Watchers

cs3apis's Issues

storage: allow sending metadata with CreateDir and Upload requests

We should allow sending MD Metadata with

FS.CreateDir and
FS.Upload to send along the original ctime and mtime as well as extended attributes.

Furthermore, we need to be able to manipulate metadata, in order to

implement touch()
set extended attributes

Automatize generation of CS3APIs

The main one is the Go one: https://github.com/cs3org/go-cs3apis

But we should publish for mainline languages.

Run language-specific builds in PRs without pushing to the corresponding repos

Ref #99 (comment)

Flesh out CanonicalMetadata

see https://github.com/cs3org/cs3apis/blob/master/cs3/storageprovider/v0alpha/resources.proto#L101-L104

executable bit
macos lables, some discussion in owncloud/client#4511
acls?

New service: UserProvider and User Registry

Add CreateSymlink to storage provider

In order to transparently handle symlinks the cs3api should have a dedicated call to create symlinks. We plan to use it so that our settings, accounts and store service can use the cs3 api to persist data.

Set shareID into resource information

A resource info can contain in the metadata a pointer to a share id.

How do `ocm/core` and `sharing/collaboration` relate?

While investigating cs3org/reva#2148 I ran into a question:

Why is there on the one hand a cs3/ocm/core package, with:

CreateOCMShareRequest

And on the other hand a cs3/sharing/collaboration package, with:

CreateShareRequest

I think that ocm/core is used by the sender and sharing/collaboration is used by the receiver of a share, but if that's really the explanation then we should probably rename them to something like ocm/sender and ocm/receiver, respectively?

Add an API to get server info

This should return details such as the tagged and git versions, build date and platform, language version, and maybe some capabilities such as third party integrations and supported protocols (for uploads?)

Create admin API like Google does it for its services

Account transfers
Integration with other deployed system for user lifecycle management

storage: allow searching and filtering

This is related to the ListContainerStreamRequest and the Opaque map on the review branch.

For the time being we can implement searching and filtering by using the Opaque property of the ListContainerStreamRequest. In order to collect candidates for new ListContainerStreamRequest properties we should collect the different property keys somewhere.

@moscicki @labkode do you already have keys in mind? I can see

search with a json encoded pattern, a prefix or a lucene query payload
pagination with limit and offset or cursor, more details in owncloud/web#116

Protocol issues experienced while developing the new iOS SDK

This issue attempts to give a compact overview of protocol issues I ran into while implementing the ios-sdk.

Info endpoints

I found the number of endpoints to hit before being able to start a session in the iOS app unnecessarily long:

.well-known/openid-configuration to detect OIDC
status.php to detect redirections, version and maintenance mode
user.php to retrieve the current user
capabilities.php to retrieve the current set of capabilities

A single endpoint for fetching all of this information in one request would simplify this greatly. such an endpoint could be passed a list of info segments to include, like f.ex.: info.php?include=status,user,capabilities (or an extended status.php with that functionality).

Thumbnails

Clients currently have to request a thumbnail for a file to determine if a thumbnail is available. In a directory with many files for whose file types no thumbnails are available, this generates a lot of unnecessary requests.

What would help:

a list of mime types (provided through f.ex. capabilities) for which the server supports thumbnail generation
the DAV endpoint providing this info for every item with a custom tag (f.ex. oc:thumbnail-available)

The latter approach would have the benefit of keeping the logic for which item thumbnails are available in the server.

Related issue: owncloud/core#31267

Unnecessary large `PROPFIND` responses

PROPFINDs typically return two d:propstat tags for every item.

First, the part with information on the item:

<d:response>
    <d:href>/remote.php/dav/files/admin/</d:href>
    <d:propstat>
        <d:prop>
            <d:resourcetype>
                <d:collection/>
            </d:resourcetype>
            <d:getlastmodified>Fri, 23 Feb 2018 11:52:05 GMT</d:getlastmodified>
            <d:getetag>"5a9000658388d"</d:getetag>
            <d:quota-available-bytes>-3</d:quota-available-bytes>
            <d:quota-used-bytes>5812174</d:quota-used-bytes>
            <oc:size>5812174</oc:size>
            <oc:id>00000009ocre5kavbk8j</oc:id>
            <oc:permissions>RDNVCK</oc:permissions>
        </d:prop>
        <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>

Then, second, the part with requested information not available for the item:

    <d:propstat>
        <d:prop>
            <d:creationdate/>
            <d:getcontentlength/>
            <d:displayname/>
            <d:getcontenttype/>
        </d:prop>
        <d:status>HTTP/1.1 404 Not Found</d:status>
    </d:propstat>

… and finally the closing tag:

</d:response>

The second part (on information not available) consumes bandwidth, CPU cycles, memory and power - all of which are very valuable on a mobile device - but provides no benefit at all.

It'd therefore be great to be able to omit the second/useless part. And if it needs to be there to conform with the WebDAV standard, provide an option to omit it.

WebDAV

Related to the aforementioned issue: I like and respect WebDAV (a lot!), but where it really falls short is in the expression of information in a compact format. Above XML, after all, just contains this information:

location: /remote.php/dav/files/admin/
type: collection
last-modified: Fri, 23 Feb 2018 11:52:05 GMT
Etag: "5a9000658388d"
quota-available: -3
quota-used: 5812174
size: 5812174
fileID: 00000009ocre5kavbk8j
permissions: RDNVCK

Even in HTTP-header-esque notation, it's already a lot less bytes. Now imagine what a binary format could achieve ([VID] is a VarInt encoding block type + length, [VIN] is a VarInt encoding a number):

[VID]/remote.php/dav/files/admin/[VID][VIN][VID][VIN][VID]"5a9000658388d"[VID][VIN][VID][VIN][VID][VIN][VID]00000009ocre5kavbk8j[VID][VIN]

Therefore, when working on a new architecture, I believe it could be good future-proofing to move away from WebDAV internally - and isolate everything related to WebDAV into a backend that performs the conversion between the native internals and WebDAV requests and responses.

With protocol-agnostic internals, a new backend providing access via new, more compact request and response formats would not be far away.

Sharing API

There's currently no way to detect changes to shares without requesting and comparing them in entirety.

It'd be great to have an ETag returned for Sharing API responses that only changes if the requested shares in the response change. And then, also support forIf-None-Match.

Change detection

In order to find changed items, it is currently necessary to check the ETag of the root folder, if it changed, fetch its contents, then find items with changed ETags among it … and repeat this until all changes have been discovered.

Some ideas on how this could be improved:

Sync Anchor

Delivered as part of the PROPFIND response. That SyncAnchor could then be used to make a subsequent request for a list of changes that occurred since that PROPFIND response. If a SyncAnchor is too old, a respective error would be returned.

Event subscription

Clients could subscribe to change events. Every change on the server would then generate an event and put it in the client's event queue.

While the client is connected, events could be delivered immediately (push).

If the client can't consume the events as fast as they are generated (f.ex. because it's not connected or only through a slow connection), and the number of events in the queue surpasses a limit (f.ex. 1000 events - or a time threshold), they get dropped and replaced with a single reload event, which would tell the client to perform old-style change detection by traversing the tree.

Alternatively, the subscription could expire at that point, and the client receive an error response when trying to resume the connection to the event subscription endpoint.

Metadata change propagation

Currently, not all changes to an item that are relevant to clients lead to a new ETag. Examples include a change in sharing of oc:favorite status.

For this reason, the new iOS app currently has to:

always retrieve a full PROPFIND (depth 1) when the user navigates to a directory, to make sure sharing and favorite status of the items are up-to-date when presented to the user. The toll on bandwidth/CPU/memory/battery can be significant for folders with a lot of items. If all relevant changes were propagated to the ETag, a depth 0 PROPFIND would be sufficient to detect changes.
poll the server for a list of favorites when the user enters a view presenting a list of all of them.
poll the server for a full (=> also see Sharing API above) list of shares on a regular basis to keep file.

I understand that the ETag may, by definition and meaning, not be the right vehicle for propagating these changes. A MTag (or any other name) for tracking and propagating these metadata changes could also solve this issue.

File IDs / Local IDs

Implementing offline support in the new iOS app and SDK required finding a way to efficiently track an item from local generation to upload to file on the server.

Since the oc:id (File ID) of an item is generated by the server, it can't be known beforehand - and therefore can't be used at the beginning of the item's lifecycle.

The solution implemented in the new iOS SDK was to generate a unique ID locally and track the item by Local ID. This works well, but involves quite a bit of complexity to ensure that, once a file is on the server, it always get's the same Local ID attached.

A great simplification would be if it was possible for clients to provide a Client ID, which gets stored alongside the File ID and doesn't change for the lifetime of the item on the server.

And then, the option to have all APIs (especially PROPFIND and sharing) also include that Client ID in addition to the existing File ID in responses.

Client-generated IDs would be prefixed with an app-specific prefix, f.ex. ios:[UUID goes here].

In cases, where no Client ID was provided for an item, its Item ID would be returned instead.

Moved files

Moved files present a challenge as they often first get noticed by clients as missing from the folder they were previously in. Only to reappear in a different folder while changes are discovered.

In consequence, the client needs to keep the missing item around until it has finished change discovery and can't f.ex. remove local copies immediately.

A way to perform a PROPFIND on a File ID to determine its current whereabouts and status would be great to make this more efficient.

Atomic operations

Item operations – like uploads or folder creation – only return rudimentary information on the item operated upon.

While that information is sufficient to form subsequent requests, it's not sufficient for clients to generate a full-fledged local representation of that item with all metadata.

In consequence, after performing item operations, it's often necessary to perform a subsequent PROPFIND on the item to retrieve the entire set of metadata.

This is not a big issue, but costs (some) performance and there's - in theory - a small window between the operation finishing and the PROPFIND during which another client could replace, move or delete the item again.

If the response contained more metadata (requested f.ex. through a new HTTP request header listing the tags requested), the operation would be entirely atomic.

Permissions and Roles API

Currently in Reva the permissions of a user are stored closely to the file as grants.
Now we want to introduce global permissions which do not belong to a certain resource but allow the user to do things like Create a new space or user has role "admin" (roles are just a collection of permissions).
I want to propose a new simple API to add and list (global) permissions of a user. Although this API could be used to also store resource specific permissions, we would only use it for global permissions for now.

The service would store role and permissions assignments and could answer queries like has user x the permissions 'create-space'.

@ishank011 @labkode
Did you already think of something like that? Are you against such a service? If not I will create a PR.

/cc @refs

Switch to protobuf golang bindings V2

protolock: add to make deps

storage: extended attributes / properties

Currently, there is no support for additional file / folder metadata like extended attributes. Webdav has properties, which I would like to map onto extended attributes. The cs3 apis should allow doing that.

How do we represent retention periods in CS3?

Do we expose them to end users via CanonicalMetadata?

Allow user individual tags (and favorites)

The favorite flag is a user specific property for a file that cannot be mapped to extended attributes without leaking who has marked a file as a favorite.

It is a specific case of a tag, which is user individual as well. I see these types of tags

public that are maganed by everyone
private (or user individual) tags that are only visible to the user
system tags that are only visible to the system
group tags that are only visible to a group ...

This can be solved using different namespaces or scopes for tags

public = p:
private = u:: for user specific
system = s: for system
group = g::
app? = a:: for apps?

Obviously this only is secure when the u/s/g/a namespaces are not accessible by users in the filesystem. public tags can be mapped to extended attributes, eg. dublin core metadata.

UserId

Consider if a UserId (identified of a user in the system) should not have a message on its own (instead of just string).

This is to foresee an external scope (as done for ResourceId breaking it into opaque id and storage id).

Change key type for recyle restore

The key is a string and does not contain information enough to be routable, it needs the storage_id

ctime != creation time => ctime = change time, btime = birth time

I saw that we are using ctime in a confusing way. ctime actually is the change time. the difference to mtime is that mtime indicates when the content changed. ctime indicates when the metadata changed. For a good explanation of atime, mtime and ctime see https://www.unixtutorial.org/atime-ctime-mtime-in-unix-filesystems

Some filesystems do track the creation time, but it is better identified as btime, or birthime. AFAICT the linux kernel supports reading the btime with the statx call. I don't know how widely that is used. There is a golang module that sums it up: https://github.com/djherbis/times#supported-times

While there is no easy way to get the birth time we should use btime in addition to ctime and document properly which is used for what.

Generate C++ Apis

Document that OpaqueId for user should be non-reassignable

Allow handling multiple checksums

The storage providers ResourceInfo should be able to return more than one checksum.
We need a way to let clients specify which checksums they are requesting. In cs3org/reva#1400 I used a checksum metadata key in the Stat and ListContainer to indicate I want to read all checksums. Do we want to be able to specify which algorithm? Using sth like checksum:sha1, checksum:md5, checksum:sha1,md5,adler32 similar to HTTP headers? ure use a format similar to what TUS is using ([algo] [hash])?
InitiateFileUploadRequest needs a way to pass the expected checksum. For both implementing OC 1- style checksums as well as the TUS checksumming extension. In cs3org/reva#1400 I am using an "Upload-Checksum" in the Opaque properties of the request: https://github.com/cs3org/reva/pull/1400/files#diff-198f1004a921b3627f7572a239452974429a7de6e4fa47f445c2ad35d2cd9026R320

ResourcePermissions are missing arbitrary metadata flags

https://cs3org.github.io/cs3apis/#cs3.storage.provider.v1beta1.ResourcePermissions should have:

GetArbitraryMetadata, if false arbitrary metadata will not be listed in stat
SetArbitraryMetadata, also affects UnsetArbitraryMetadata, because if you have Set you can affectively set metadata to ""...

add upload length to InitiateFileUpload

allows the server to allocate files or check available quota.
Needed for tus creation extension, which always expects a Upload-Length.

then replace Opaque data in

We should make this optional to allow the clients to initialize an upload using tus and the creation-defer-length extension. Using 0 to indicate creation-defer-length does not work, because you might want to create an empty file.

Why don't all logged-in Request objects have a `token` field?

I see the WhoAmIRequest lets you specify the token which you received from AuthenticateResponse, but none of the other method Request objects do.

I see revad throws "core access token not found" when you try to do for instance a ListReceivedOCMShared call.
Is there another way to pass the token in a grpc request?

Allow disabling versions on upload and add CreateFileVersion

The CS3APIs implementation should allow disabling the creation of a version when saving a file.
At the same time, it should be possible to manually create a version, so a new method needs to be added.

This will be useful for apps that want to do auto save, which should not overwrite other versions (which might happen if a limited number of versions is allowed).

The storage implementation should be able to handle these cases. EOS already does: https://its.cern.ch/jira/browse/EOS-4503

Add GetHome to CS3 APIS Gateway

Document point releases across bindings

Change Quota response

When asking a gateway for quota, we need to query all available storage providers and return the quota for every path segment, the response does not contain this information

GetProvider is not routable through gateway

Fix documentation generation

Broken after v1beta1 change due to new depths in the tree

Add Deny ACL

Change `datatx.PullTransfer` to `datatx.CreateTransfer`

By design a transfer tool is used for the data transfers. Depending on the tool or configuration transfers can be executed using different methods (streaming, third-party copy push/pull). Currently in our cs3 reference implementation reva we use rclone which supports both streaming and third-party push transfer (cs3org/reva#3491) and we can configure this.

I propose to change PullTransfer method from the datatx module to CreateTransfer.

Extend UserID/GroupID definition with type support

An account management system, like LDAP, can contain different type of accounts with different connotations.

For example, a primary, secondary, service, or guest account.

I suggest to add an free text format field for time being.

`*` in arbitrary _metadata_keys may need clarification

When implementing initial checksum support I stumbled over a difference with the webdav api that may need clarification in eg.:

cs3apis/cs3/storage/provider/v1beta1/provider_api.proto

Lines 337 to 340 in 8a19a7f

 // OPTIONAL. 

 // Arbitrary metadata be included with the resource. 

 // A key with the name '*' means to return all available arbitrary metadata. 

 repeated string arbitrary_metadata_keys = 3;

We currently translate an allprops PROPFIND to a * key, but allprops should only return some default properties. Some live properties might be expensive to compute so they are not calculated in an allprops PROPFIND. See https://tools.ietf.org/html/rfc4918#section-9.1

Authentication methods for password protected link shares.

tl;dr

For ownCloud Web we needed a different way to authenticate downloads of password protected link shares. The current design of the share manager doesn't allow us to implement this alternative authentication method namely pre-signed urls.
We are using the share password hash as the signing key but in the current implementation the share password hash is not accessible outside of the share manager.

To allow this alternative authentication method I propose three options:

Include the password hash in the PublicShare struct
- The simplest solution
- May leak the password hash when the PublicShare struct is included in a response
Add the password hash as a return value of GetPublicShare and GetPublicShareByToken
- Second simplest solution
- Doesn't leak the hash if the PublicShare is included in a response
Implement different authentication strategies which can be used in the share manager
- Complex to implement
- Mixes transport specific auth stuff with the share manager

Long explanation

This http://localhost:8080/index.php/s/2LfFihmw6cD7qxU is an example public link. The last part 2LfFihmw6cD7qxU is the random token identifying the share.
For password protected shares the credentials can be included in the 'Authorization' header using basic auth where 'public' is the username and the share password is the password.

When we want to download a file using the anchor tag <a href="...." we can't supply the Authorization header to the request.
In ownCloud 10 this worked by using cookies but in ownCloud Web we don't have cookies.
Because of that we have decided to use signed URLs. The advantage of signed-URLs is that, just like with unprotected public links, the possession of the URL is proof of authorization. Meaning anyone who has the signed URL can download the resource.

The general idea is this:

The client sends a PROPFIND request to the backend (including the credentials in the auth header)
    The client has to request the `downloadURL` attribute to receive the signed URL
The backend does the normal credential check
The backend generates the PROPFIND response
    When the client requested the `downloadURL` attribute the signed URL is generated and returned as the `downloadURL`
The client can now issue GET requests the `downloadURL`

The detailed signing process looks as follows:

Let's assume we have a shared tree like this:

Share Root
    some-file.txt
    Folder/
        another-file.txt

To create the signature we need:

The resource path relative to the share root: `/Folder/another-file.txt`
A timestamp of the expiration date for the signature. Right now it's 30 minutes from the moment of signing
The share token
The share password (hash)

And then we calculate the HMAC like this:

HMAC_SHA512/256(SHA256(share_password_hash), share_token + resource_path + timestamp)

The HMAC_SHA512/256 just means that we are using SHA512 as the hash function but want to receive a code with a length of 256 bits.
Then to create the signed URL we take the resource path and add the signature and the expiration timestamp as query parameters.

To verify the signature the backend will repeat the process above but this time it will take the timestamp from the requested URL.

If the current time is earlier than the expiration time continue with step 2. otherwise return an authentication failed response
Calculate the HMAC
Compare the signature passed via the query parameter with the calculated one
    If it doesn't match return an authentication failed response
Continue serving the resource

Frequently Asked Questions

Why do we need HMAC instead of just hashes?

Let's assume we would use hashes like this:

Hash(share_token + resource_path) = signature

Then all the information needed to create the signature would be public and the security of our share would be reduced to a public link without password.

Okay then let's include a secret like this:

Hash(share_token + resource_path + secret) = signature

Cool we just reinvented HMAC so let's use that instead. ;)

Can we do the signing on the client side?

If we wanted to do client side signing we would need a shared secret between the backend and the client.
The client knows the share password can't we use that?

The backend shouldn't store the password in plain text but in hashed format. Also the password should be hashed with a random salt.
The client could in fact also hash the password before using it as a HMAC key but the client doesn't have access to the random salt from the backend. Therefore it could never generate the same HMAC.

Okay what about another secret?

We could generate a new secret just for signing but this would require more changes to the API because the client would need to be able to request that secret.
Also we would need to store that in addition to the other share data.

Can we use the share token as a signing key?

Technically yes but the share token is public and therefore anyone with the knowledge of the share token can create the signature and this again reduces the security of the share to a public link without password.

I hope this document clarified the design of signed URLs for password protected shares.

storageprovider should persist share creator

currently, only the share manager knows who created a share. That means that both: the share manager and the storage provider are necessary to reconstruct all share information. The share manager can currently not be rebuild using the storage provider alone.

CI does not start - fix Drone build

The CI for PRs triggers to steps: one on PR and other on push. For some reason one never starts and I can only merge based on my admin status.

generate java stubs

java stubs should also be made available: https://github.com/grpc/grpc-java/tree/master/compiler

Possibility to truly (physically) delete files

When using the usual CS3API calls to delete a file, that file is not physically deleted but moved into a recycle bin. This is extremely problematic when performing health checks: These checks upload about 100kb per run and delete the files afterwards. However, since these files are only moved to some other place, they quickly pile up, resulting in full volumes.

As uploading files is an essential check, we either need (a) an option to truly delete files, (b) a way of clearing the recycle bin of the current user (might already be there), (c) another workaround.

announce tus feature in Providerinfo

Add tus feature and version to https://cs3org.github.io/cs3apis/#cs3.storage.registry.v1beta1.ProviderInfo

To properly handle [crappy proxies] we also need to be able to return the chunk size: tus/tus-resumable-upload-protocol#93

New Service: RolService

Where is the Dockerfile for cs3org/cs3apis?

Where is the Dockerfile for https://hub.docker.com/r/cs3org/cs3apis?

Redirection status code

To perform redirections on operations we need a redirection status code.

To know where to redirect we need a storage resource reference AND a protocol type.

A storage reference can point to other storage providers inside the CS3 cloud and we can connect to them using the CS3APIs but the reference can point to an OCM share or to a remote storage like a FTP server, therefore we need a protocol type so the client know how to fetch the information from the referenced resource, in case of OCM, webdav.

AddGrant and UpdateGrant should be SetGrant

While the usecase of limiting a user to only add grants seems valid, it already comes with the requirement to let him manage the grants he created, which requires UpdateGrant permissions.

Having a single permissions for a SetGrant would be sufficient. If users should be limited to manage their own, all or only the permissions for the groups they are a member of is implementation specific and would require the share owner to be part of the Grant, which it currently is not. Currently, a resource can only be owned by a single User

	// OPTIONAL.
	// Arbitrary metadata be included with the resource.
	// A key with the name '*' means to return all available arbitrary metadata.
	repeated string arbitrary_metadata_keys = 3;