connormanning / arbiter
Uniform access to the filesystem, HTTP, S3, GCS, Dropbox, etc.
License: MIT License
Hi,
I was using this code from Entwine and realized that the time calculation coming from the asUnix method was wrong. It is one hour off due to the difference between summer and winter time (I am in the UK). I managed to nail down where it goes wrong, but I am not sure what would be the best way to fix it.
Basically, the Time(std::string) constructor returns -3600 instead of 0 for "1970-01-01T00:00:00Z".
Any suggestions on how to fix that?
Thanks a lot!
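For what it's worth, a -3600 offset is the classic symptom of converting the parsed struct tm with mktime, which interprets it in local time (BST during UK summer). A minimal sketch of a UTC-safe conversion, assuming timegm is available (glibc/BSD; _mkgmtime on Windows) — not necessarily how arbiter should implement it:

```cpp
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Parse an ISO-8601 UTC timestamp like "1970-01-01T00:00:00Z" to Unix time.
// Using timegm() instead of mktime() keeps the conversion in UTC, so the
// result does not shift by an hour when the local zone is on summer time.
inline std::time_t parseUtc(const std::string& s)
{
    std::tm tm{};
    std::istringstream ss(s);
    ss >> std::get_time(&tm, "%Y-%m-%dT%H:%M:%SZ");
    return timegm(&tm);   // use _mkgmtime(&tm) on Windows
}
```

With this, parseUtc("1970-01-01T00:00:00Z") yields 0 regardless of the local time zone.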
Creating a local copy of a remote file, for example, should use a streaming interface. We could probably generically use an interface like that throughout the project to simplify current overloads and provide more flexibility to the API.
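As a sketch of what such an interface might sit on top of, a generic chunked copy between std::istream and std::ostream (copyStream is a hypothetical helper, not current arbiter API):

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical chunked copy: pull from any source stream, push to any
// sink, using a fixed-size buffer so the whole resource never has to
// live in memory at once. Returns the number of bytes copied.
inline std::size_t copyStream(std::istream& in, std::ostream& out,
                              std::size_t chunk = 1 << 16)
{
    std::vector<char> buf(chunk);
    std::size_t total = 0;
    while (in)
    {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::streamsize got = in.gcount();
        if (got <= 0) break;
        out.write(buf.data(), got);
        total += static_cast<std::size_t>(got);
    }
    return total;
}
```

A driver could then expose an std::istream over its resource and reuse one copy path for local files, HTTP bodies, and S3 objects alike.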
Currently, those functions throw an exception.
Implementing them would allow using Entwine (github.com/connormanning/entwine) on Windows.
Hi @connormanning,
I'm wondering why I'm getting these errors/warnings from libcurl ⬇️ on almost every GET. Any ideas?
* Mark bundle as not supporting multiuse
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
* Mark bundle as not supporting multiuse
< HTTP/1.1 201 Created
< Content-Length: 0
< Content-MD5: 11s/OdJPxbPSF++CSFatuQ==
< Last-Modified: Fri, 13 May 2022 15:06:24 GMT
< ETag: "0x8DA34F225B455A7"
< Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
< x-ms-request-id: 7cd10aa7-301e-0045-69db-663f4d000000
< x-ms-version: 2021-06-08
< x-ms-content-crc64: R0F/zL8cQ7c=
< x-ms-request-server-encrypted: true
< Date: Fri, 13 May 2022 15:06:24 GMT
<
* Connection #0 to host mynicestorageaccount.blob.core.windows.net left intact
* Protocol "az" not supported or disabled in libcurl
* Closing connection -1
* Protocol "az" not supported or disabled in libcurl
* Closing connection -1
* Protocol "az" not supported or disabled in libcurl
* Closing connection -1
* Protocol "az" not supported or disabled in libcurl
* Closing connection -1
* Protocol "az" not supported or disabled in libcurl
* Closing connection -1
* Protocol "az" not supported or disabled in libcurl
* Closing connection -1
* Protocol "az" not supported or disabled in libcurl
* Closing connection -1
* Protocol "az" not supported or disabled in libcurl
* Closing connection -1
* Protocol "az" not supported or disabled in libcurl
* Closing connection -1
* Found bundle for host: 0x7f3d344c3940 [serially]
* Can not multiplex, even if we wanted to
* Re-using existing connection #0 with host mynicestorageaccount.blob.core.windows.net
* Connected to mynicestorageaccount.blob.core.windows.net (20.150.74.100) port 443 (#0)
> GET /mynicestorageaccount/ept/00000023/ept-data/8-122-11-128.laz?anicesaskey HTTP/1.1
Host: mynicestorageaccount.blob.core.windows.net
Accept: */*
Accept-Encoding: deflate, gzip
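The repeated `Protocol "az" not supported or disabled in libcurl` lines suggest that at some point an az:// URI is handed to libcurl verbatim, and libcurl only speaks the schemes it was built with. Purely as an illustration (azToHttps is hypothetical, and assumes an az://account/container/blob layout), the rewrite would look something like:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical rewrite of "az://<account>/<container>/<blob>" to the
// corresponding Azure Blob HTTPS URL. libcurl has no "az" scheme, so a
// URL in this form must be translated before the request is issued.
inline std::string azToHttps(const std::string& url)
{
    const std::string scheme = "az://";
    if (url.compare(0, scheme.size(), scheme) != 0)
        throw std::runtime_error("not an az:// URL: " + url);

    const std::string rest = url.substr(scheme.size());
    const std::size_t slash = rest.find('/');
    if (slash == std::string::npos)
        throw std::runtime_error("missing container/blob path: " + url);

    const std::string account = rest.substr(0, slash);
    const std::string path = rest.substr(slash); // "/container/blob"
    return "https://" + account + ".blob.core.windows.net" + path;
}
```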
Looking at line 353 in 76fa252, it seems like the backslashes are supposed to be forward slashes (as per the manual).
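If that's the case, a separator normalization pass before the path is used in a request might be enough — a minimal sketch (toForwardSlashes is a hypothetical helper):

```cpp
#include <algorithm>
#include <string>

// Replace Windows-style backslashes with forward slashes so the path is
// valid in a URL. A minimal sketch; real code may also need to collapse
// duplicate slashes or special-case a drive-letter prefix.
inline std::string toForwardSlashes(std::string path)
{
    std::replace(path.begin(), path.end(), '\\', '/');
    return path;
}
```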
I had an issue in Entwine where it could not successfully read/write or sign a request to S3 when the path contained spaces; changing to a different S3 path without spaces resolved it.
I'm planning on investigating it further.
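One plausible culprit: AWS Signature V4 canonicalizes the URI, so a space in the key must appear percent-encoded in the canonical request or the computed signature won't match what the server derives. A sketch of that kind of encoding (encodeKey is hypothetical, not arbiter's actual code):

```cpp
#include <cctype>
#include <string>

// Percent-encode characters outside the RFC 3986 unreserved set, leaving
// '/' intact so the key's path structure survives. This is the flavor of
// encoding an S3 canonical request expects for keys containing spaces;
// a sketch, not arbiter's real implementation.
inline std::string encodeKey(const std::string& key)
{
    static const char* hex = "0123456789ABCDEF";
    std::string out;
    for (const char c : key)
    {
        const unsigned char u = static_cast<unsigned char>(c);
        if (std::isalnum(u) || c == '-' || c == '_' || c == '.' ||
            c == '~' || c == '/')
        {
            out += c;
        }
        else
        {
            out += '%';
            out += hex[u >> 4];
            out += hex[u & 0xF];
        }
    }
    return out;
}
```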
In theory this should be easy: GCS provides an S3-compatible API, so the Google driver could be identical to the Amazon one with just the endpoint updated. In practice, code like this makes me suspect that it might not be so simple.
Write tests against the arbiter OneDrive driver in order to keep it up to date. This test requires browser interaction, so it might not be doable with pure scripting.
You will need these variables satisfied: client_id, tenant_id, redirect_uri, base_url, client_secret, grant_type, and scope="offline_access%20files.read.all%20user.read.all".
The query grant_type will be authorization_code in the first step, and refresh_token in the refresh step.
This test will need to hit https://login.microsoftonline.com/common/oauth2/v2.0/authorize with the variables above in the query parameters. This will allow a user to authenticate via OAuth with Microsoft. Once you've logged in, you will be redirected to the URL ${redirect_uri}?code=xxxxyyyy&state=12345. Take the code from this URL and use it in the next step.
Next you will use that code as the query param code and hit https://login.microsoftonline.com/common/oauth2/v2.0/token. All other params stay the same. This will return a JSON response with refresh_token and access_token keys. Parse these two into their own variables and continue.
To refresh, you will hit https://login.microsoftonline.com/common/oauth2/v2.0/token again, but with the addition of access_token, refresh_token, and grant_type=refresh_token. This will provide the same JSON response, with access_token and refresh_token keys.
If at any point you have access_token and refresh_token, you can use this config for the onedrive driver. The config must be:
{
    "onedrive": {
        "access_token": "xxxx",
        "refresh_token": "yyyy",
        "client_id": "zzzz",
        "redirect_uri": "aaaa",
        "client_secret": "bbbb",
        "tenant_id": "cccc"
    }
}
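For a scripted test, the first step above amounts to constructing the authorize URL from those variables. A sketch under the assumptions above (authorizeUrl is a hypothetical helper; the redirect URI must already be percent-encoded):

```cpp
#include <string>

// Build the Microsoft OAuth authorize URL from the variables described
// above. All argument values are placeholders; real ones come from the
// Azure app registration.
inline std::string authorizeUrl(
    const std::string& tenantId,
    const std::string& clientId,
    const std::string& redirectUri,
    const std::string& scope)
{
    return "https://login.microsoftonline.com/" + tenantId +
        "/oauth2/v2.0/authorize"
        "?client_id=" + clientId +
        "&response_type=code"
        "&redirect_uri=" + redirectUri +
        "&scope=" + scope +
        "&state=12345";
}
```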
As described in PDAL/PDAL#1448
There are lots of side effects when creating an Arbiter object. A simple construction of an Arbiter object should do little to nothing. One should be able to find out whether arbiter supports various files, for example, without creating a thread pool or attempting to create various drivers.
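One way to get there is to make construction only register factories and defer driver creation to first use — a hypothetical sketch, not arbiter's current design:

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical lazy registry: construction only records factories; a
// driver (and anything expensive it needs, like a thread pool or HTTP
// handle) is created on first use of its protocol.
struct Driver { virtual ~Driver() = default; };

class LazyArbiter
{
public:
    using Factory = std::function<std::unique_ptr<Driver>()>;

    void registerDriver(std::string protocol, Factory f)
    {
        m_factories.emplace(std::move(protocol), std::move(f));
    }

    bool supports(const std::string& protocol) const
    {
        return m_factories.count(protocol) != 0;  // no construction needed
    }

    Driver& get(const std::string& protocol)
    {
        auto& slot = m_drivers[protocol];
        if (!slot) slot = m_factories.at(protocol)();  // create on demand
        return *slot;
    }

private:
    std::map<std::string, Factory> m_factories;
    std::map<std::string, std::unique_ptr<Driver>> m_drivers;
};
```

With this shape, supports() answers without touching the network or spawning threads, and an Arbiter that is constructed but never used costs almost nothing.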
Windows globbing is not yet working. We need to do a couple of things to make it work:
\\
Currently, the S3 driver only supports accessing role credentials by querying the instance metadata service with the v1 method. The v2 of IMDS adds a token component that gets added to the headers of subsequent requests. Many organizations are moving towards v2 after various security incidents.
If you try to run a pipeline on an instance with IMDSv2 enforced, the response (CURL_VERBOSE=1) looks roughly like:
> GET /latest/meta-data/iam/security-credentials/ HTTP/1.1
Host: 169.254.169.254
Accept: */*
* Mark bundle as not supporting multiuse
* HTTP 1.0, assume close after body
< HTTP/1.0 401 Unauthorized
< Content-Length: 123
< Content-Type: text/html
< Date: ...
< Connection: close
< Server: EC2ws
In the meantime, I believe you should be able to generate the token and manually embed it in a config.json, something like:
{
    "arbiter": {
        "s3": {
            "headers": {
                "X-aws-ec2-metadata-token": "ABCDEFGHI"
            }
        }
    }
}
I don't yet have a fully working example of the above logic, but will continue to tinker and update later.
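For reference, the v2 flow is: PUT to /latest/api/token with a TTL header to obtain a session token, then send that token as X-aws-ec2-metadata-token on the credentials GET. A sketch with the HTTP verbs injected as callables, so the sequence can be exercised without a live instance (fetchCredentialsV2 is hypothetical, not arbiter code):

```cpp
#include <functional>
#include <map>
#include <string>

using Headers = std::map<std::string, std::string>;

// IMDSv2 credential fetch with HTTP verbs injected: first PUT for a
// session token (with a TTL header), then GET the role credentials with
// that token in the X-aws-ec2-metadata-token header.
inline std::string fetchCredentialsV2(
    const std::function<std::string(const std::string&, const Headers&)>& put,
    const std::function<std::string(const std::string&, const Headers&)>& get,
    const std::string& role)
{
    const std::string base = "http://169.254.169.254";

    const std::string token = put(
        base + "/latest/api/token",
        {{"X-aws-ec2-metadata-token-ttl-seconds", "21600"}});

    return get(
        base + "/latest/meta-data/iam/security-credentials/" + role,
        {{"X-aws-ec2-metadata-token", token}});
}
```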
PDAL/PDAL#1300 is compiling with an older clang (AppleClang 6.0.0.6000057) and getting an error with one of the lambda expressions:
vendor/arbiter/arbiter.cpp:1361:25: error: return type 'basic_string<[3 * ...]>'
must match previous return type 'const basic_string<[3 * ...]>' when
lambda expression has unspecified explicit return type
return out + c;
^
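The two return statements deduce different types under that older AppleClang (const basic_string from the reference parameter vs. basic_string from out + c); giving the lambda an explicit -> std::string return type sidesteps the deduction. A minimal reproduction-style sketch of the pattern, not the exact arbiter code:

```cpp
#include <numeric>
#include <string>

// Sketch of the pattern at issue: an accumulate lambda whose two branches
// would deduce different return types on older clang. The explicit
// "-> std::string" return type avoids the deduction entirely.
inline std::string stripSpaces(const std::string& in)
{
    return std::accumulate(
        in.begin(), in.end(), std::string(),
        [](const std::string& out, char c) -> std::string
        {
            if (c == ' ') return out;   // one branch returns the parameter
            return out + c;             // the other returns a new string
        });
}
```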
Continuing the discussion of connormanning/entwine#99 (comment) here, since it seems to be an Arbiter issue: Arbiter announces non-existent headers in its Authorization header, which some servers (notably Minio) reject. Thanks for the investigation, @harshavardhana.
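For AWS V4-style auth the fix would be to derive the SignedHeaders component from the headers actually sent, rather than a fixed list. A hypothetical sketch (signedHeaders is not arbiter's real function):

```cpp
#include <cctype>
#include <map>
#include <string>

// Build the SignedHeaders component of an AWS V4 Authorization header
// from only the headers actually present on the request, so no absent
// header is ever announced. NB: real V4 code must lowercase names
// *before* sorting; fine for a sketch with ASCII-ordered keys.
inline std::string signedHeaders(const std::map<std::string, std::string>& headers)
{
    std::string out;
    for (const auto& h : headers)  // std::map iterates keys in sorted order
    {
        if (!out.empty()) out += ';';
        for (char c : h.first)
            out += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return out;
}
```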
In some cases, S3 traffic through the standard endpoint URL goes through really slow pipes at NGA, but pointing at the CloudFront URL is significantly faster. By setting a custom AWS_ENDPOINT_URL environment variable, we can route the S3 calls to the CloudFront URL without affecting the general endpoints, while still using S3 for listing.
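A sketch of the proposed behavior (resolveEndpoint and the default endpoint format are illustrative assumptions, not arbiter's current code):

```cpp
#include <cstdlib>
#include <string>

// Pick the S3 endpoint: an AWS_ENDPOINT_URL environment variable (for
// example, a CloudFront distribution fronting the bucket) overrides the
// default regional endpoint; everything else about the driver stays as-is.
inline std::string resolveEndpoint(const std::string& region)
{
    if (const char* custom = std::getenv("AWS_ENDPOINT_URL"))
        return custom;
    return "https://s3." + region + ".amazonaws.com";
}
```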