connormanning / arbiter

Uniform access to the filesystem, HTTP, S3, GCS, Dropbox, etc.

License: MIT License


arbiter's Introduction

Arbiter

Arbiter provides simple, fast, thread-safe C++ access to filesystem, HTTP, S3, and Dropbox resources in a uniform way. It is designed to be extensible, so other resource types may also be abstracted.

API sample

Full docs live here. The core API is intended to be as simple as possible.

using namespace arbiter;

Arbiter a;

std::string fsPath, httpPath, s3Path;
std::string fsData, httpData, s3Data;
std::vector<std::string> fsGlob, s3Glob;

// Read and write data.
fsPath = "~/fs.txt";  // Tilde expansion is supported on both Unix and Windows.
a.put(fsPath, "Filesystem contents!");
fsData = a.get(fsPath);

httpPath = "http://some-server.com/http.txt";
a.put(httpPath, "HTTP contents!");
httpData = a.get(httpPath);

// S3 credentials can be inferred from the environment or well-known FS paths.
s3Path = "s3://some-bucket/s3.txt";
a.put(s3Path, "S3 contents!");
s3Data = a.get(s3Path);

// Resolve globbed directory paths.
fsGlob = a.resolve("~/data/*");
s3Glob = a.resolve("s3://some-bucket/some-dir/*");

Some drivers accept, or may require, explicit configuration values.

Arbiter's constructor takes a JSON-formatted std::string. Here is an example using nlohmann::json, which Arbiter also uses internally.

#include <arbiter/util/json.hpp>
using namespace arbiter;

json config = {
    { "dropbox", {
        {"token", "My dropbox token"}
    }},
    {"s3", {
        {"region", "ap-southeast-2"},
        {"access", "My access key"},
        {"secret", "My secret key"}
    }}
};

std::string configStr = config.dump();
Arbiter a(configStr);

// Now dropbox and S3 paths are accessible.
auto data = a.get("dropbox://my-file.txt");

Using Arbiter in your project

Installation

Arbiter uses CMake for its build process. To build and install, run from the top level:

mkdir build && cd build
cmake -G "<CMake generator type>" ..    # For example: cmake -G "Unix Makefiles" ..
make
make install

Then simply include the header in your project:

#include <arbiter/arbiter.hpp>

...and link against the library with -larbiter.

Amalgamation

The amalgamation method lets you integrate Arbiter by adding a single source file and a single header to your project. Create the amalgamation by running, from the top level:

python amalgamate.py

Then copy dist/arbiter.hpp and dist/arbiter.cpp into your project tree and include them in your build system like any other source files. With this method you'll need to link the Curl dependency into your project manually.

Once the amalgamated files are integrated with your source tree, simply #include "arbiter.hpp" and get to work.

Dependencies

Arbiter depends on Curl, which comes preinstalled on most UNIX-based machines. To manually link (for amalgamated usage) on Unix-based operating systems, link with -lcurl. Arbiter also works on Windows, but you'll have to obtain Curl yourself there.

Arbiter requires C++11.

arbiter's People

Contributors

abellgithub, andrensairr, connormanning, gui2dev, harshavardhana, hobu, mccarthyryanc, nicolas-chaulet, olsen232, paul-thompson-helix, pravinshinde825, rnijveld, valgur


arbiter's Issues

Time calculation is off

Hi,
I was using this code from Entwine and realized that the time calculation coming from the asUnix method was wrong: it is 1 hour off due to the difference between summer and winter time (I am in the UK). I managed to nail down where it goes wrong, but I am not sure what the best way to fix it would be.

Basically, the Time(std::string) constructor returns -3600 instead of 0 for "1970-01-01T00:00:00Z".
Any suggestion about how to fix that?
Thanks a lot!

explicit return type on lambda expression

PDAL/PDAL#1300 is compiling with an older clang (AppleClang 6.0.0.6000057) and getting an error on one of the lambda expressions:

vendor/arbiter/arbiter.cpp:1361:25: error: return type 'basic_string<[3 * ...]>'
must match previous return type 'const basic_string<[3 * ...]>' when
lambda expression has unspecified explicit return type
                        return out + c;
                        ^

Streaming copy interface

Creating a local copy of a remote file, for example, should use a streaming interface. We could probably generically use an interface like that throughout the project to simplify current overloads and provide more flexibility to the API.

Support for Instance Metadata Service Version 2 (IMDSv2) in S3 driver

Currently, the S3 driver only supports accessing role credentials by querying the instance metadata service using the v1 method.

IMDSv2 adds a token component that must be included in the headers of subsequent requests. Many organizations are moving towards v2 after various security incidents.

If you try to run a pipeline on an instance with IMDSv2 enforced, the response (CURL_VERBOSE=1) looks roughly like:

> GET /latest/meta-data/iam/security-credentials/ HTTP/1.1
Host: 169.254.169.254
Accept: */*

* Mark bundle as not supporting multiuse
* HTTP 1.0, assume close after body
< HTTP/1.0 401 Unauthorized
< Content-Length: 123
< Content-Type: text/html
< Date: ...
< Connection: close
< Server: EC2ws

In the meantime, I believe you should be able to generate the token and manually embed it in a config.json, something like:

{
    "arbiter": {
        "s3": {
            "headers": {
                "X-aws-ec2-metadata-token": "ABCDEFGHI"
            }
        }
    }
}

I don't yet have a fully working example of the above logic, but will continue to tinker and update later.

google cloud storage

Now, this should theoretically be easy: GCS offers a compatibility API, so the Google driver could be identical to the Amazon one with just the endpoint updated. In practice, code like this makes me suspect it might not be so simple.

Protocol errors in libcurl

Hi @connormanning,

I'm wondering why I'm getting these kinds of errors/warnings from libcurl ⬇️ on almost every GET? Any ideas?

* Mark bundle as not supporting multiuse
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
* Mark bundle as not supporting multiuse
< HTTP/1.1 201 Created
< Content-Length: 0
< Content-MD5: 11s/OdJPxbPSF++CSFatuQ==
< Last-Modified: Fri, 13 May 2022 15:06:24 GMT
< ETag: "0x8DA34F225B455A7"
< Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
< x-ms-request-id: 7cd10aa7-301e-0045-69db-663f4d000000
< x-ms-version: 2021-06-08
< x-ms-content-crc64: R0F/zL8cQ7c=
< x-ms-request-server-encrypted: true
< Date: Fri, 13 May 2022 15:06:24 GMT
< 
* Connection #0 to host mynicestorageaccount.blob.core.windows.net left intact
* Protocol "az" not supported or disabled in libcurl
* Closing connection -1
(the two lines above are repeated several more times)
* Found bundle for host: 0x7f3d344c3940 [serially]
* Can not multiplex, even if we wanted to
* Re-using existing connection #0 with host mynicestorageaccount.blob.core.windows.net
* Connected to mynicestorageaccount.blob.core.windows.net (20.150.74.100) port 443 (#0)
> GET /mynicestorageaccount/ept/00000023/ept-data/8-122-11-128.laz?anicesaskey HTTP/1.1
Host: mynicestorageaccount.blob.core.windows.net
Accept: */*
Accept-Encoding: deflate, gzip

add AWS_ENDPOINT_URL environment support

In some cases, the default S3 endpoint goes through really slow pipes at NGA, but pointing at the CloudFront URL is significantly faster. By setting a custom AWS_ENDPOINT_URL environment variable, we can route S3 calls to the CloudFront URL without affecting the general endpoints, while still using S3 for listing.

Windows globbing

Windows globbing is not yet working. We need to do a couple of things to make it work:

  • the separator on Windows needs to be \\
  • we need to use Unicode-aware functions and wstrings when passing filenames to Windows
  • we should link shlwapi.lib to use the Windows pathname manipulation functions
  • the FindFirstFileW method does not return full paths, so they will need to be built up as we recurse through the glob

OneDrive driver CI Loop

Write tests against arbiter onedrive driver in order to keep it up to date. This test requires browser interaction, so might not be doable with pure scripting.

You will need these variables satisfied:
client_id, tenant_id, redirect_uri, base_url, client_secret, grant_type, and scope="offline_access%20files.read.all%20user.read.all".
The query param grant_type will be authorization_code in the first step, and refresh_token in the refresh step.

This test will need to hit https://login.microsoftonline.com/common/oauth2/v2.0/authorize with the variables above in the query parameters. This allows a user to authenticate with Microsoft via OAuth. Once you've logged in you will be redirected to the URL ${redirect_uri}?code=xxxxyyyy&state=12345. Take the code from this URL and use it in the next step.

Next, use that code as the query param code and hit https://login.microsoftonline.com/common/oauth2/v2.0/token. All other params stay the same. This returns a JSON response with refresh_token and access_token keys. Parse these two into their own variables and continue.

To refresh you will hit https://login.microsoftonline.com/common/oauth2/v2.0/token, but with the addition of access_token, refresh_token, and grant_type=refresh_token. This will provide the same json response, with access_token and refresh_token keys.

If at any point you have access_token and refresh_token, you can use this config for the onedrive driver. The config must be:

{
  "onedrive": {
    "access_token": "xxxx",
    "refresh_token": "yyyy",
    "client_id": "zzzz",
    "redirect_uri": "aaaa",
    "client_secret": "bbbb",
    "tenant_id": "cccc"
  }
}

Arbiter Ctor Does Too Much

There are lots of side-effects when creating an Arbiter object. A simple construction of an arbiter object should do little to nothing. One should be able to find out if arbiter supports various files, for example, without creating a thread pool or attempting to create various drivers.
