briantist / galactory

An Ansible Galaxy proxy for Artifactory

License: GNU General Public License v3.0

Languages: Python 99.43%, Dockerfile 0.57%
Topics: ansible, ansible-galaxy, ansible-galaxy-collections, artifactory, hacktoberfest

galactory's Introduction


galactory

galactory is an Ansible Galaxy proxy for Artifactory.

Using an Artifactory Generic repository as its backend, galactory implements a limited subset of the Galaxy API (v2 & v3) to allow for installing and publishing collections.

It can also be set up to transparently proxy an upstream Galaxy server, storing the pulled artifacts in Artifactory, to be served as local artifacts from then on. This helps avoid throttling errors on busy CI systems, and allows internal/private collections to declare dependencies on upstream collections (dependencies will only be installed from the same Galaxy server a collection was installed from).

Acknowledgements

This project is heavily inspired by amanda.

Artifactory compatibility

All features of galactory should work with the free-of-cost Artifactory OSS. Please report any usage that appears to require a Pro license.

How to use

There isn't any proper documentation yet. The help output is below.

Pulling out this bit about configuration for emphasis:

Args that start with -- (eg. --listen-addr) can also be set in a config file (/etc/galactory.d/*.conf or ~/.galactory/*.conf or specified via -c). Config file syntax allows: key=value, flag=true, stuff=[a,b,c] (for details, see the syntax at https://goo.gl/R74nmi).

If an arg is specified in more than one place, then commandline values override environment variables which override config file values which override defaults.

defaults < config < environment variables < command line (last one found wins)
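
For example, a minimal config file might look like this (the values are placeholders; each long option from the help below maps to a config key by dropping the leading --):

# /etc/galactory.d/galactory.conf -- illustrative values only
listen-addr=0.0.0.0
listen-port=8080
artifactory-path=https://artifactory.example.com/artifactory/ansible-collections/
artifactory-api-key=REDACTED
proxy-upstream=https://galaxy.ansible.com
log-level=INFO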

usage: python -m galactory [-h] [-c CONFIG] [--listen-addr LISTEN_ADDR]
                           [--listen-port LISTEN_PORT] [--server-name SERVER_NAME]
                           [--preferred-url-scheme PREFERRED_URL_SCHEME]
                           --artifactory-path ARTIFACTORY_PATH
                           [--artifactory-api-key ARTIFACTORY_API_KEY]
                           [--artifactory-access-token ARTIFACTORY_ACCESS_TOKEN]
                           [--use-galaxy-key] [--use-galaxy-auth]
                           [--galaxy-auth-type {api_key,access_token}] [--prefer-configured-key]
                           [--prefer-configured-auth] [--publish-skip-configured-key]
                           [--publish-skip-configured-auth] [--log-file LOG_FILE]
                           [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--log-headers]
                           [--log-body] [--proxy-upstream PROXY_UPSTREAM]
                           [-npns NO_PROXY_NAMESPACE] [--cache-minutes CACHE_MINUTES]
                           [--cache-read CACHE_READ] [--cache-write CACHE_WRITE]
                           [--use-property-fallback]
                           [--health-check-custom-text HEALTH_CHECK_CUSTOM_TEXT]
                           [--api-version {v2,v3}] [--upload-format {base64,raw,auto}]

galactory is a partial Ansible Galaxy proxy that uploads and downloads collections, using an
Artifactory generic repository as its backend.

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        The path to a config file. [env var: GALACTORY_CONFIG]
  --listen-addr LISTEN_ADDR
                        The IP address to listen on. [env var: GALACTORY_LISTEN_ADDR]
  --listen-port LISTEN_PORT
                        The TCP port to listen on. [env var: GALACTORY_LISTEN_PORT]
  --server-name SERVER_NAME
                        The host name and port of the server, as seen from clients. Used for
                        generating links. [env var: GALACTORY_SERVER_NAME]
  --preferred-url-scheme PREFERRED_URL_SCHEME
                        Sets the preferred scheme to use when constructing URLs. Defaults to
                        the request scheme, but is unaware of reverse proxies.
                        [env var: GALACTORY_PREFERRED_URL_SCHEME]
  --artifactory-path ARTIFACTORY_PATH
                        The URL of the path in Artifactory where collections are stored.
                        [env var: GALACTORY_ARTIFACTORY_PATH]
  --artifactory-api-key ARTIFACTORY_API_KEY
                        If set, is the API key used to access Artifactory. If set with artifactory-access-token, this
                        value will not be used.
                        [env var: GALACTORY_ARTIFACTORY_API_KEY]
  --artifactory-access-token ARTIFACTORY_ACCESS_TOKEN
                        If set, is the Access Token used to access Artifactory. If set with artifactory-api-key, this
                        value will be used and the API key will be ignored.
                        [env var: GALACTORY_ARTIFACTORY_ACCESS_TOKEN]
  --use-galaxy-key      If set, uses the Galaxy token sent in the request as the Artifactory auth. DEPRECATED: This
                        option will be removed in v0.11.0. Please use --use-galaxy-auth going forward.
                        [env var: GALACTORY_USE_GALAXY_KEY]
  --use-galaxy-auth     If set, uses the Galaxy token sent in the request as the Artifactory auth.
                        [env var: GALACTORY_USE_GALAXY_AUTH]
  --galaxy-auth-type {api_key,access_token}
                        Auth received via a Galaxy request should be interpreted as this type of auth.
                        [env var: GALACTORY_GALAXY_AUTH_TYPE]
  --prefer-configured-key
                        If set, prefer the configured Artifactory auth over the Galaxy token.
                        DEPRECATED: This option will be removed in v0.11.0.
                        Please use --prefer-configured-auth going forward.
                        [env var: GALACTORY_PREFER_CONFIGURED_KEY]
  --prefer-configured-auth
                        If set, prefer the configured Artifactory auth over the Galaxy token.
                        [env var: GALACTORY_PREFER_CONFIGURED_AUTH]
  --publish-skip-configured-key
                        If set, publish endpoint will not use configured auth, only auth included in a Galaxy
                        request.
                        DEPRECATED: This option will be removed in v0.11.0.
                        Please use --publish-skip-configured-auth going forward.
                        [env var: GALACTORY_PUBLISH_SKIP_CONFIGURED_KEY]
  --publish-skip-configured-auth
                        If set, publish endpoint will not use configured auth, only auth included in a Galaxy
                        request.
                        [env var: GALACTORY_PUBLISH_SKIP_CONFIGURED_AUTH]
  --log-file LOG_FILE   If set, logging will go to this file instead of the console.
                        [env var: GALACTORY_LOG_FILE]
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        The desired logging level. [env var: GALACTORY_LOG_LEVEL]
  --log-headers         Log the headers of every request (DEBUG level only).
                        [env var: GALACTORY_LOG_HEADERS]
  --log-body            Log the body of every request (DEBUG level only).
                        [env var: GALACTORY_LOG_BODY]
  --proxy-upstream PROXY_UPSTREAM
                        If set, then find, pull and cache results from the specified galaxy server
                        in addition to local. [env var: GALACTORY_PROXY_UPSTREAM]
  -npns NO_PROXY_NAMESPACE, --no-proxy-namespace NO_PROXY_NAMESPACE
                        Requests for this namespace should never be proxied. Can be specified
                        multiple times. [env var: GALACTORY_NO_PROXY_NAMESPACE]
  --cache-minutes CACHE_MINUTES
                        The time period that a cache entry should be considered valid.
                        [env var: GALACTORY_CACHE_MINUTES]
  --cache-read CACHE_READ
                        Look for upstream caches and use their values.
                        [env var: GALACTORY_CACHE_READ]
  --cache-write CACHE_WRITE
                        Populate the upstream cache in Artifactory. Should be false when no auth is
                        provided or the auth has no permission to write.
                        [env var: GALACTORY_CACHE_WRITE]
  --use-property-fallback
                        Set properties of an uploaded collection in a separate request after publishing.
                        Requires a Pro license of Artifactory. This feature is a workaround for an
                        Artifactory proxy configuration error and may be removed in a future version.
                        [env var: GALACTORY_USE_PROPERTY_FALLBACK]
  --health-check-custom-text HEALTH_CHECK_CUSTOM_TEXT
                        Sets custom_text field for health check endpoint responses.
                        [env var: GALACTORY_HEALTH_CHECK_CUSTOM_TEXT]
  --api-version {v2,v3}
                        The API versions to serve. Can be set to limit functionality to specific versions only.
                        Defaults to all supported versions.
                        [env var: GALACTORY_API_VERSION]
  --upload-format {base64,raw,auto}
                        Galaxy accepts the uploaded collection tarball as either raw bytes or base64 encoded.
                        Ansible 2.9 uploads raw bytes, later versions upload base64. By default galactory will
                        try to auto-detect. Use this option to turn off auto-detection and force a specific format.
                        [env var: GALACTORY_UPLOAD_FORMAT]

Args that start with '--' (eg. --listen-addr) can also be set in a config file
(/etc/galactory.d/*.conf or ~/.galactory/*.conf or specified via -c). Config file syntax allows:
key=value, flag=true, stuff=[a,b,c] (for details, see syntax at https://goo.gl/R74nmi). If an arg
is specified in more than one place, then commandline values override environment variables which
override config file values which override defaults.

Install

python3 -m pip install galactory

Container

Latest tagged release:

docker run --rm ghcr.io/briantist/galactory:latest --help

Latest commit on main:

docker run --rm ghcr.io/briantist/galactory:main --help

galactory's People

Contributors

briantist, debben, dependabot[bot], jcox10, mamercad


galactory's Issues

REQUESTS_CA_BUNDLE not working

In our corporate environment, we are behind a proxy that does SSL interception (man in the middle), and it has a custom CA cert that needs to be used. Normally I just import this CA into the system CA bundle, then tell python requests to use it with the REQUESTS_CA_BUNDLE environment variable and everything works fine. However, it looks like there is an issue in upstream.py with the prepared requests; specifically, see this section in the docs for prepared requests:

When you are using the prepared request flow, keep in mind that it does not take into account the environment. This can cause problems if you are using environment variables to change the behaviour of requests. For example: Self-signed SSL certificates specified in REQUESTS_CA_BUNDLE will not be taken into account. As a result an SSL: CERTIFICATE_VERIFY_FAILED is thrown. You can get around this behaviour by explicitly merging the environment settings into your session.

I have tested this out by replacing the two s.send calls in upstream.py with the environment merging as shown in the documentation, and it works fine. Fixing that will help a lot of us who are stuck behind terrible corporate proxies.
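
For reference, a minimal sketch of that fix, following the requests documentation (the url variable is illustrative, not galactory's actual code):

import requests

session = requests.Session()
prepared = session.prepare_request(requests.Request('GET', url))

# Merge environment settings (REQUESTS_CA_BUNDLE, proxies, etc.) that prepared requests skip by default
settings = session.merge_environment_settings(prepared.url, {}, None, None, None)
response = session.send(prepared, **settings)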

Single-source the version

See if we can do some version single-sourcing thing in the package so we can have the version in one place, and then also output it in various places (health check, etc.)
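
A minimal sketch of one common approach, reading the version from the installed package metadata (assuming the distribution name is galactory):

from importlib.metadata import PackageNotFoundError, version

try:
    __version__ = version("galactory")  # single source: the installed package's metadata
except PackageNotFoundError:
    __version__ = "unknown"  # e.g. running from a source checkout that isn't installed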

Allow for disabling the upstream proxy cache, and/or caching locally

Right now, using an upstream always caches the upstream results in Artifactory. But this turns what could have been purely read-only usage (available without Artifactory authentication, if so configured) into usage that usually needs an API key so that the cache can be written.

This fits with one of the main design goals I had, which was to avoid 429 throttling errors from an upstream Galaxy server when hit from multiple places, but there are valid uses (like limited command line use on local dev machines) that would benefit from being able to run without an API key, where throttling is not really a concern, but having a single upstream for both local and remote collections is important (dependency resolution, for example).

Also of note is that ansible-galaxy caches request responses too, so the local workstation use case needs a cache even less than very ephemeral use cases like CI runs.


So to that end I have two ideas:

  • Make caching upstream responses optional
    • This is easier to implement: a user can set this when they don't have an API key and would otherwise need one to write, and the proxying will still work; it just won't cache
    • sort of ironically, the hardest part of this is probably the downloads themselves, only because of the way it's written now: the file is downloaded from the upstream, uploaded to Artifactory, and then streamed from Artifactory to the client. Shouldn't be hard to work around, just need to give it a little thought
  • The other idea is to allow caching locally, like in temp files or something. Either instead of or in addition to the Artifactory cache.
    • "Instead of" is probably easier to implement
      • it means more requests to the upstream when usage of galactory scales horizontally (lots of CI jobs, local workstation usage)
      • it's faster for long-running processes (don't need additional requests to artifactory)
    • "In addition" is a little more difficult because I need to determine precedence.
      • local access is faster, but might be more out of date (not sure that's a concern actually...)
      • should precedence be configurable?
      • should we have a mode that defaults to remote cache but allows graceful fallback to local cache when authentication fails? that sounds like a bad idea

The options are not mutually exclusive, so I think being able to disable the cache is a better first step.

Add properties to help denote proxied collections

When a collection download is proxied, it becomes local in artifactory and is then indistinguishable from collections you published directly. That allows for finding and downloading those collections even if the upstream is unavailable.

But in terms of repository management, it means everything is mixed together.

If you only have one or a few namespaces for your locally published collections, like everything is company_name.collection, you can filter that way (anything not in the company_name namespace is proxied, theoretically), but it might be nice to have explicit properties for those.

For example, adding a boolean proxied property with a true or false value.
Maybe also adding a source property which, if blank, means a direct publish, and otherwise means it was proxied from somewhere, giving the URL of its original location.

These could be very helpful in keeping your artifactory repository clean: you may want to never delete old versions of your internal collections, while regularly culling older versions of proxied ones (if they are needed again, they will be proxied again).

Bad request on collection publish

Hello,

I'm trying to get up and running with galactory. I'm able to get the code running locally. When I try the following test though:

ansible-galaxy collection init debben.helloworld
ansible-galaxy collection build debben/helloworld
ansible-galaxy collection publish -s http://127.0.0.1:8888 debben-helloworld-1.0.0.tar.gz

I get a 400 back from galactory. When I set a breakpoint within galactory source, the failure seems to be coming from here:
https://github.com/briantist/galactory/blob/main/galactory/api/v2/collections.py#L176

Where request.files['file'] is empty. Are there any limitations on the ansible-galaxy version, or is there a better way to inspect how the request is being parsed? I've not had much experience with python+flask+werkzeug but I'm hoping this issue is just some bug resolved by updating a dependency.

Remove reliance on Artifactory archive file download

Right now we're relying on a feature of Artifactory where you can directly download an individual file out of an archive. We use this to read the MANIFEST.json out of the collection to get the collection_info. This feature is only available with a Pro license or higher in Artifactory, so I'd like to avoid it.

I've already started on this path by adding the collection_info contents as a property on the uploaded collection, but am not consuming it yet. Also, we're still using the archive file download to get the collection_info in the first place, to avoid parsing the archive in the code.

So to solve this, we need to start parsing the archive to extract the manifest before we upload it, both for explicit uploads and for upstream implicit uploads.

We can once again borrow from amanda and the parsing done there to implement this.

This would also get us one step closer to being able to run integration tests with the artifactory-oss container.
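
A rough sketch of that parsing step, reading MANIFEST.json straight out of the collection tarball before upload (function and variable names are illustrative):

import json
import tarfile

def read_collection_info(tarball_path):
    # A collection artifact is a gzipped tar with MANIFEST.json at its root
    with tarfile.open(tarball_path, mode="r:gz") as tar:
        with tar.extractfile("MANIFEST.json") as f:
            manifest = json.load(f)
    return manifest["collection_info"]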

Run integration tests against Artifactory

I started implementing the necessary stuff in pytest to launch an artifactory-oss container and get it set up so we can run integration tests against it.

I was hamstrung in a number of ways, since jfrog makes a point to put basic features behind a paywall and require pro or enterprise licensing.

For one thing, we need to first address #5 since that is used in basically every API call.

But we have other issues too: it turns out the APIs to create a repository are also behind paid licenses... so we actually cannot programmatically set up our test environment with a free license. They expect you to manually connect to the web UI and click through to create it.

This is really demoralizing. I am trying to think through ways of creating a test image container or something, with the repository already set up. It's going to suck, because every time we want to change something, at least part of the process will have to be done manually.

I'm also worried that I'll keep uncovering more and more things as we go along that make it difficult or impossible to get a good test environment.

But I feel like I have to try because I don't like the current situation and the lack of tests.

Proxied requests need to control what information they send upstream

For example, if you access galactory in a web browser, and it makes an upstream request to galaxy.ansible.com, the user agent and Accept headers are passed along, resulting in galaxy responding with HTML instead of JSON. Since we need to parse the JSON, this results in a 500 error.
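
A minimal sketch of the idea: build the upstream request with an explicit, fixed set of headers rather than forwarding the client's (the values shown are assumptions, not galactory's actual defaults):

import requests

UPSTREAM_HEADERS = {
    "Accept": "application/json",     # always ask the upstream for JSON, even if the client wanted HTML
    "User-Agent": "galactory-proxy",  # illustrative value
}

response = requests.get(upstream_url, headers=UPSTREAM_HEADERS, timeout=30)  # upstream_url: assumed variable
data = response.json()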

Some errors in requests are not retried

Since the retries we use don't set status_forcelist, they won't retry status-code errors like 500/504/etc., which can lead to failed requests that would otherwise have been fine on retry.

Need to think over which codes make for good defaults, and a good way to allow configuration of codes.
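
For illustration, a Retry configured with a status_forcelist might look like this (the specific codes and counts are just plausible defaults, not a decision):

from urllib3.util.retry import Retry

retry = Retry(
    total=5,
    backoff_factor=0.5,
    status_forcelist=[429, 500, 502, 503, 504],  # also retry on these status codes
    allowed_methods=["GET", "HEAD"],             # only retry idempotent requests
)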

Support pagination

We currently don't support pagination at all. That isn't too much of a problem for content that's already local to Artifactory (unless you have a huge number of collections), because we return all the results, but it is a problem for upstream proxying, since we only ever get the first set of results from an upstream. The query string parameters are passed through, so for example if page_size=100 is passed on the original request, we do pass that along, but older galaxy clients don't send one, and the upstream server's default page size is 10. Any results that aren't in that first set, whatever its size, will not be returned.

I guess realistically, it hasn't come up much since in most cases clients are looking for either the latest version, or a specific version, or a range that probably falls within the first set of results.

But this is quite a deficiency.
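
A rough sketch of what following upstream pagination could look like, assuming v2-style responses that return items under results and the next page URL under next (an illustration, not galactory's current code):

import requests

def fetch_all_results(url, session=None):
    # Follow "next" links until the upstream has no more pages (v2-style pagination assumed)
    s = session or requests.Session()
    results = []
    while url:
        page = s.get(url, headers={"Accept": "application/json"}).json()
        results.extend(page.get("results", []))
        url = page.get("next")
    return results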

Permission denied

Hi,

there's a great chance I'm missing something obvious,
but I can't get galactory to work.
My config file at ~/.galactory/galactory.conf looks like this:
listen-addr=127.0.0.1
listen-port=80
server-name=galactory.lan.example.com:80
artifactory-path=https://artifactory.lan.example.com/artifactory/backsys-galactory/
artifactory-api-key=<myArtifactoryAccessToken>
proxy-upstream=https://galaxy.ansible.com
cache-write=true
log-level=DEBUG
log-file=galactory.log

galactory does read the conf file, because the log file I specified was created.
However, all I get is:
[galactory@myserver ~]$ python3 -m galactory -c .galactory/galactory.conf
Serving Flask app 'galactory'
Debug mode: off
Permission denied

Any advice on how to solve this?

Many thanks.

Traceback when publishing `amazon.aws`

When trying to publish amazon.aws:==5.2.0, I'm seeing:

ERROR:galactory:Exception on /api/v2/collections/amazon/aws/ [GET]
Traceback (most recent call last):
  File "/venv/lib/python3.11/site-packages/flask/app.py", line 2525, in wsgi_app
    response = self.full_dispatch_request()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/flask/app.py", line 1822, in full_dispatch_request
    rv = self.handle_user_exception(e)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/galactory/api/v2/collections.py", line 47, in collection
    results = _collection_listing(repository, namespace, collection, scheme=scheme)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/galactory/utilities.py", line 162, in _collection_listing
    collections = collected_collections(repo, namespace, collection)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/galactory/utilities.py", line 143, in collected_collections
    for c in discover_collections(repo, namespace=namespace, name=name, scheme=scheme):
  File "/venv/lib/python3.11/site-packages/galactory/utilities.py", line 107, in discover_collections
    collection_info = json.loads(props['collection_info'][0])
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
               ^^^^^^^^^^^^^^^^^^^^^^
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 482 (char 481)

500 error on collections/{namespace}/{collection} when a collection only contains pre-release versions

We usually want to return the latest final version in the latest_version key returned by this API, so at the moment we filter out all non-final versions when we build the list, with the thought that eventually a final version will populate the field.

But in a collection where there are only pre-release versions, that field will never be populated. A different part of the code assumes it always will be populated, and crashes.

The question then is what should be in that key in that case.

I think that the logic should be that it contains the latest pre-release. If even a single final version exists, it will take precedence over any pre-release.
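
A tiny sketch of that precedence rule (treating any version that contains a hyphen as a pre-release, per semver; the version-sorting key is an assumed helper, not an existing galactory function):

def pick_latest(versions, sort_key):
    # sort_key: an assumed semver-aware comparison key
    finals = [v for v in versions if '-' not in v]  # semver pre-releases contain a hyphen
    candidates = finals or versions                 # fall back to pre-releases only if no finals exist
    return max(candidates, key=sort_key)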

That seems to be supported by:

internals: "fast detection" for collection iteration doesn't work for non-stable versions

f_namespace, f_name, f_version = p.name.replace('.tar.gz', '').split('-')

Pretty minor, but the fast detection will never work for a version that contains a hyphen like 1.2.3-alpha because the split will have more than 3 results.

This should be fixable by setting the second argument of split (maxsplit) to 2, so that everything after the second hyphen stays together in the version field.

This "failure" really just means falling back to "slow" detection that requires some additional requests.

theforeman.foreman collection could not be installed when using Galactory as a proxy.

Hi there again!

We've found another fancy issue in our setup of Galactory as a proxy / inner source for Ansible collections.

Fast forward to the root cause: storing collection metadata in the Artifactory file properties was a nice idea; however, for some collections, specifically theforeman.foreman, the metadata is so large that it hits the Artifactory property value limit and then fails to deserialize properly.

Reproducer:

docker run -it --rm -p8080:8080 briantist/galactory:0.11.2 -- --listen-addr 0.0.0.0 --listen-port 8080 --server-name galaxy.local:8080 --artifactory-path https://artifactory.local/artifactory/$REPO/ansible_collections/ --artifactory-access-token $TOKEN  --proxy-upstream https://galaxy.ansible.com --galaxy-auth-type access_token --use-galaxy-auth --prefer-configured-auth --publish-skip-configured-auth --api-version v3
  • ansible.cfg is configured to use galactory as default galaxy server.
[defaults]
collections_path=.collections
[galaxy]
server=http://galaxy.local:8080/
  • Installing some collection works:
$ ansible-galaxy collection install cisco.nxos:==5.2.1

Starting galaxy collection install process
Process install dependency map
Starting collection install process
Downloading http://galaxy.local/download/cisco-nxos-5.2.1.tar.gz to /Users/m.shonichev/.ansible/tmp/ansible-local-54036f2c1i_ng/tmpi7sor7ba/cisco-nxos-5.2.1-vjgrss63
Installing 'cisco.nxos:5.2.1' to '/home/m.shonichev/.collections/ansible_collections/cisco/nxos'
cisco.nxos:5.2.1 was installed successfully

and some collections don't:

$ ansible-galaxy collection install theforeman.foreman:==3.15.0
Starting galaxy collection install process
Process install dependency map
ERROR! Error when getting available collection versions for theforeman.foreman from cmd_arg (http://galaxy.local/api) (HTTP Code: 500, Message: INTERNAL SERVER ERROR Code: Unknown)

container logs reveal stack trace:

INFO:galactory:{'X-Request-Id': '7dbd5b2cc0c40aeda98dd1f83fad97b9', 'X-Real-Ip': '10.0.0.2', 'X-Forwarded-For': '10.0.0.2', 'X-Forwarded-Host': 'galaxy.local', 'X-Forwarded-Port': '8080', 'X-Forwarded-Proto': 'http', 'X-Forwarded-Scheme': 'http', 'X-Scheme': 'http', 'Accept-Encoding': 'identity', 'User-Agent': 'ansible-galaxy/2.13.6 (Linux; python:3.10.13)', 'Accept': 'application/json, */*'}
Mon, Dec 11 2023 1:59:33 pm
INFO:galactory:Cache miss: http://galaxy.local/api/v3/collections/theforeman/foreman/
Mon, Dec 11 2023 1:59:33 pm
ERROR:galactory:Exception on /api/v3/collections/theforeman/foreman/ [GET]
Mon, Dec 11 2023 1:59:33 pm
Traceback (most recent call last):
Mon, Dec 11 2023 1:59:33 pm
  File "/venv/lib/python3.11/site-packages/flask/app.py", line 2190, in wsgi_app
Mon, Dec 11 2023 1:59:33 pm
    response = self.full_dispatch_request()
Mon, Dec 11 2023 1:59:33 pm
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mon, Dec 11 2023 1:59:33 pm
  File "/venv/lib/python3.11/site-packages/flask/app.py", line 1486, in full_dispatch_request
Mon, Dec 11 2023 1:59:33 pm
    rv = self.handle_user_exception(e)
Mon, Dec 11 2023 1:59:33 pm
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mon, Dec 11 2023 1:59:33 pm
  File "/venv/lib/python3.11/site-packages/flask/app.py", line 1484, in full_dispatch_request
Mon, Dec 11 2023 1:59:33 pm
    rv = self.dispatch_request()
Mon, Dec 11 2023 1:59:33 pm
         ^^^^^^^^^^^^^^^^^^^^^^^
Mon, Dec 11 2023 1:59:33 pm
  File "/venv/lib/python3.11/site-packages/flask/app.py", line 1469, in dispatch_request
Mon, Dec 11 2023 1:59:33 pm
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
Mon, Dec 11 2023 1:59:33 pm
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mon, Dec 11 2023 1:59:33 pm
  File "/venv/lib/python3.11/site-packages/galactory/api/v3/collections.py", line 103, in collection
Mon, Dec 11 2023 1:59:33 pm
    colcol = CollectionCollection.from_collections(discover_collections(repo=repository, namespace=namespace, name=collection))
Mon, Dec 11 2023 1:59:33 pm
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mon, Dec 11 2023 1:59:33 pm
  File "/venv/lib/python3.11/site-packages/galactory/models.py", line 185, in from_collections
Mon, Dec 11 2023 1:59:33 pm
    for collection in collections:
Mon, Dec 11 2023 1:59:33 pm
  File "/venv/lib/python3.11/site-packages/galactory/utilities.py", line 120, in discover_collections
Mon, Dec 11 2023 1:59:33 pm
    coldata = CollectionData.from_artifactory_path(path=p, properties=props, stat=info)
Mon, Dec 11 2023 1:59:33 pm
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mon, Dec 11 2023 1:59:33 pm
  File "/venv/lib/python3.11/site-packages/galactory/models.py", line 34, in from_artifactory_path
Mon, Dec 11 2023 1:59:33 pm
    collection_info = json.loads(properties['collection_info'][0])
Mon, Dec 11 2023 1:59:33 pm
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mon, Dec 11 2023 1:59:33 pm
  File "/usr/local/lib/python3.11/json/__init__.py", line 346, in loads
Mon, Dec 11 2023 1:59:33 pm
    return _default_decoder.decode(s)
Mon, Dec 11 2023 1:59:33 pm
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
Mon, Dec 11 2023 1:59:33 pm
  File "/usr/local/lib/python3.11/json/decoder.py", line 337, in decode
Mon, Dec 11 2023 1:59:33 pm
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
Mon, Dec 11 2023 1:59:33 pm
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mon, Dec 11 2023 1:59:33 pm
  File "/usr/local/lib/python3.11/json/decoder.py", line 353, in raw_decode
Mon, Dec 11 2023 1:59:33 pm
    obj, end = self.scan_once(s, idx)
Mon, Dec 11 2023 1:59:33 pm
               ^^^^^^^^^^^^^^^^^^^^^^
Mon, Dec 11 2023 1:59:33 pm
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 2398 (char 2397)

The upstream collection was successfully downloaded from https://galaxy.ansible.com and stored in the Artifactory repository.

However, the following are the properties of the file (actually the tail of them):

[screenshot: tail of the file's properties in Artifactory]

Apparently, The Foreman team decided to write down all seven tribes of relatives (cats included!) in the collection's authors list, and when Galactory tried to put collection_info into the file properties, the 2,400-character limit was hit and the JSON was corrupted.

Collection meta:
https://github.com/theforeman/foreman-ansible-modules/blob/a69d83fb8681ca80878d8143d27594eb72ee295d/galaxy.yml#L4

From the Artifactory docs:

Property keys are limited up to 255 characters and property values are limited up to 2,400 characters. Using properties with values over this limit might cause backend issues.

I'm not sure that limit can be raised easily; at least, the documentation is obscure about it. I'm also not sure whether removing the metadata from properties would break the Galactory codebase to pieces.

So, for now, we have added the upstream galaxy as a backup server as a workaround:

[ansible]
collections_path = .collections

[galaxy]
server_list = galactory, upstream
ignore_certs=true

[galaxy_server.galactory]
url=http://galaxy.local:8080


[galaxy_server.upstream]
url=https://galaxy.ansible.com

Allow for configured API key to be used to write cache / proxied collections only -- or, allow for upload endpoint to require request-based key

Right now, you can supply the API key for Artifactory through a parameter/env var/config file, and/or by setting it in ansible-galaxy, where it will be sent along with the request.

However, you can't control which uses to apply the key to.

The scenario I am thinking about is this:

  • A central instance of Galactory
  • You want to set the API key on this running instance, so that it can populate the cache, and so that when proxying upstream, it can write the upstream collections into artifactory (making them local)
  • Doing this however, means that anyone hitting this instance, anonymously, can directly upload collections too

What I think might be a good idea is being able to have the configured key used for cache and proxy writing, but disallowed for direct uploads, instead requiring a key to be passed in with the request.

Also, the local caching described in #4 is another possible (partial) side-workaround: if the central instance uses local storage for caching instead of Artifactory, it does not need to be configured with a key at all; however, that would prevent storing proxied collections, so it would only account for caching of API responses.

500 error on any collection that doesn't exist locally

This is a pretty bad bug in the v0.11.0 release I just put out.
If you ask for a collection and there are no versions of that collection locally (in Artifactory), then the request will crash.

This affects both v2 and v3 protocols.

Will fix shortly.

JSON error getting ansible.windows collection versions

I'm getting a JSON parse error when galaxy tries to get the ansible.windows collection. I can't figure out the source JSON to see the character number it's complaining about. This is the only collection that has an issue; all the other collections download just fine. (We don't use ansible on windows, but one of the roles requires the collection).

ERROR:galactory:Exception on /api/v2/collections/ansible/windows/versions/ [GET]
galactory_1  | Traceback (most recent call last):
galactory_1  |   File "/venv/lib/python3.11/site-packages/flask/app.py", line 2525, in wsgi_app
galactory_1  |     response = self.full_dispatch_request()
galactory_1  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
galactory_1  |   File "/venv/lib/python3.11/site-packages/flask/app.py", line 1822, in full_dispatch_request
galactory_1  |     rv = self.handle_user_exception(e)
galactory_1  |          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
galactory_1  |   File "/venv/lib/python3.11/site-packages/flask/app.py", line 1820, in full_dispatch_request
galactory_1  |     rv = self.dispatch_request()
galactory_1  |          ^^^^^^^^^^^^^^^^^^^^^^^
galactory_1  |   File "/venv/lib/python3.11/site-packages/flask/app.py", line 1796, in dispatch_request
galactory_1  |     return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
galactory_1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
galactory_1  |   File "/venv/lib/python3.11/site-packages/galactory/api/v2/collections.py", line 84, in versions
galactory_1  |     collections = collected_collections(repository, namespace=namespace, name=collection, scheme=scheme)
galactory_1  |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
galactory_1  |   File "/venv/lib/python3.11/site-packages/galactory/utilities.py", line 143, in collected_collections
galactory_1  |     for c in discover_collections(repo, namespace=namespace, name=name, scheme=scheme):
galactory_1  |   File "/venv/lib/python3.11/site-packages/galactory/utilities.py", line 107, in discover_collections
galactory_1  |     collection_info = json.loads(props['collection_info'][0])
galactory_1  |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
galactory_1  |   File "/usr/local/lib/python3.11/json/__init__.py", line 346, in loads
galactory_1  |     return _default_decoder.decode(s)
galactory_1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^
galactory_1  |   File "/usr/local/lib/python3.11/json/decoder.py", line 337, in decode
galactory_1  |     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
galactory_1  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
galactory_1  |   File "/usr/local/lib/python3.11/json/decoder.py", line 353, in raw_decode
galactory_1  |     obj, end = self.scan_once(s, idx)
galactory_1  |                ^^^^^^^^^^^^^^^^^^^^^^
galactory_1  | json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 543 (char 542)
galactory_1  | 172.18.0.1 - - [09/Mar/2023:21:59:17 +0000] "GET /api/v2/collections/ansible/windows/versions/?page_size=100 HTTP/1.0" 500 265 "-" "ansible-galaxy/2.13.3 (Linux; python:3.8.10)"

The arm64 container build doesn't run the container

I think I just couldn't figure out how to actually execute the container in the GHA runners, since we need to build using QEMU and docker buildx. There's probably a way I just haven't figured out yet.

Right now, the only "test" we do of the container builds is running them to get --help output, which is basic, but it is something.

For example, it catches the lack of Python 3.12 support, but with the ARM build not running the code, that condition isn't caught there:

Add user-contributed examples to the repository

For now thinking mostly about Dockerfiles or other configs for using galactory with various production WSGI servers.

Could also be config patterns and example scripts for other operations, like if someone has a cleanup script for Artifactory.

Open to suggestions on other types of examples that might be helpful.

Galactory no longer supports proxying to galaxy.ansible.com out of the box

Hello, Brian!

First of all thank you for the great software package you've delivered!

I'm trying to use Galactory in the 'proxy/cache the upstream Ansible Galaxy' scenario, so as to lower outgoing bandwidth for CI.

Recently, Ansible Galaxy NG has come out of beta and they bumped their main site to the 'v3' API version.

If we browse directly to https://galaxy.ansible.com/api/ we can see that the 'v2' API is no longer supported.

Which leads me to the problem: Galactory can no longer proxy requests to the upstream Galaxy, because the ansible-galaxy client seems to ultimately use the 'v2' API when negotiating with Galactory, no matter which --api-version option value I use.

Scenario:

The Galactory galactory:0.11.1 image is running with the following options:

--listen-addr 0.0.0.0 --listen-port 80 --server-name galaxy.local --artifactory-path https://artifactory.local:443/artifactory/ansible_collections/ --artifactory-access-token $(ART_TOKEN) --proxy-upstream https://galaxy.ansible.com --galaxy-auth-type access_token --use-galaxy-auth --prefer-configured-auth --publish-skip-configured-auth

client version & test command:

$ ansible --version
ansible [core 2.15.4]
  config file = /tmp/aaa/ansible.cfg
  configured module search path = ['/Users/m.shonichev/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.11/site-packages/ansible
  ansible collection location = /Users/m.shonichev/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.11.4 (main, Jun 20 2023, 16:52:35) [Clang 13.0.0 (clang-1300.0.29.30)] (/usr/local/opt/[email protected]/bin/python3.11)
  jinja version = 3.1.2
  libyaml = True

$ ansible-galaxy collection install --server http://galaxy.local  cisco.nxos

Starting galaxy collection install process
Process install dependency map
ERROR! Failed to resolve the requested dependencies map. Could not satisfy the following requirements:
* cisco.nxos:* (direct request)

Log:

Tue, Oct 10 2023 5:52:46 pm INFO:galactory:Rewriting 'http://galaxy.local/api/v2/collections/cisco/nxos/' to 'https://galaxy.ansible.com/api/v2/collections/cisco/nxos/'
Tue, Oct 10 2023 5:52:46 pm INFO:galactory:https://galaxy.ansible.com/api/v2/collections/cisco/nxos/?page_size=100
Tue, Oct 10 2023 5:52:46 pm INFO:galactory:None
Tue, Oct 10 2023 5:52:46 pm INFO:galactory:{'X-Request-Id': '5cdd2b7897d98e98c3d41f7aa9e73146', 'X-Real-Ip': '10.0.0.1', 'X-Forwarded-For': '10.0.0.1', 'X-Forwarded-Host': 'galaxy.local', 'X-Forwarded-Port': '443', 'X-Forwarded-Proto': 'http', 'X-Forwarded-Scheme': 'http', 'X-Scheme': 'http', 'Accept-Encoding': 'identity', 'User-Agent': 'ansible-galaxy/2.15.4 (Darwin; python:3.11.4)', 'Accept': 'application/json, */*'}
Tue, Oct 10 2023 5:52:46 pm INFO:werkzeug:10.0.0.2- - [10/Oct/2023 14:52:46] "GET /api/v2/collections/cisco/nxos/ HTTP/1.1" 404 -

However, if I use galaxy.ansible.com directly, the collection is installed successfully:

$ ansible-galaxy collection install cisco.nxos --server https://galaxy.ansible.com

Starting galaxy collection install process
Process install dependency map
Starting collection install process
Downloading https://galaxy.ansible.com/api/v3/plugin/ansible/content/published/collections/artifacts/cisco-nxos-5.2.1.tar.gz to /Users/m.shonichev/.ansible/tmp/ansible-local-69069ixz4ij5e/tmp0vgzpo5b/cisco-nxos-5.2.1-rpenpawh
Installing 'cisco.nxos:5.2.1' to '/Users/m.shonichev/.ansible/collections/ansible_collections/cisco/nxos'
Downloading https://galaxy.ansible.com/api/v3/plugin/ansible/content/published/collections/artifacts/ansible-netcommon-5.2.0.tar.gz to /Users/m.shonichev/.ansible/tmp/ansible-local-69069ixz4ij5e/tmp0vgzpo5b/ansible-netcommon-5.2.0-pisa6k3g
cisco.nxos:5.2.1 was installed successfully

Is there any workaround or quick fix for this situation?

Look at using flask-caching

Somewhat related:

https://flask-caching.readthedocs.io/en/latest/

The current caching mechanism for upstream requests (which caches in Artifactory) is custom written. It might be better written as a custom caching backend for flask-caching instead.

By separating out the backend that way, we could also unlock the ability to use any supported cache backend, like the local filesystem or whatever.

We can also look to start using this to cache other data (combined listings, etc.).
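
For reference, wiring flask-caching into a Flask app is roughly this (SimpleCache is the built-in in-memory backend; an Artifactory-backed cache would be a separate custom backend referenced through CACHE_TYPE):

from flask import Flask
from flask_caching import Cache

app = Flask(__name__)
cache = Cache(config={"CACHE_TYPE": "SimpleCache", "CACHE_DEFAULT_TIMEOUT": 300})
cache.init_app(app)

@app.route("/expensive")
@cache.cached()
def expensive_view():
    # Computed once per timeout window, served from the cache afterwards
    return {"data": "expensive result"}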

Provide better error handling for tight permissions in Artifactory

See also:

With only write permissions on a repository, but not update/delete, re-uploading a collection can result in a 404 (if a certain Artifactory setting is enabled), or possibly a 403 otherwise (needs confirmation).

This will cause a stacktrace and an unhelpful response.

We should try to determine when this is the case and return a more helpful error message.

The configuration of being able to upload a collection once but not overwrite it should definitely be supported.

The above would also probably wreak havoc on caching and would likely need caching to be disabled. Possible solutions for that:

  • #4 (this possibility adds another vote for a local storage caching option)
  • allow for specifying cache responses to go to another artifactory repo entirely, that way the permissions can be separate
    • (check whether artifactory supports permissions at a folder level or only repository level, could be used instead of above)
    • a sub-idea of this: allow an entirely separate repository for all upstream content, including the collections it pulls down
      • this could be nice because it totally separates internal collections from proxied content, which at the moment, are indistinguishable programmatically
      • it would allow for say, more aggressive cleanup of proxied collections
      • would not work for users who populate upstream content separately, unless we have some way to indicate multiple source artifactory repositories
      • implementation could get complicated

Direct collection URL (`href` field) is wrong

In most (all) of the API responses, the href field for a collection points back at the URL you just hit. This is only accurate on the collections/<namespace>/<collection>/ endpoint, because that endpoint is where the field should be pointing in the other responses too.

This field isn't used by the galaxy client, so it did not impact those operations.

Add healthcheck endpoints

It would be nice to have healthcheck endpoints for use with monitoring and orchestration.
I'm thinking of adding a few (a rough sketch of the basic one follows the list):

  • basic (to know that the server is working and serving requests)
  • artifactory read (a read against artifactory is successful)
  • artifactory write (a write against artifactory is successful)
  • upstream?? (not sure about this one)
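
A very rough sketch of the basic endpoint (the path and payload are illustrative, not the final design):

from flask import Blueprint, jsonify

health = Blueprint("health", __name__, url_prefix="/health")

@health.route("/basic")
def basic():
    # Only proves the app is up and serving requests; no Artifactory access involved
    return jsonify(status="ok")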

Collection publishing doesn't work if the tarball is not base64 encoded

The publish endpoint assumes that the uploaded file was base64 encoded.

As ansible-galaxy clients go, 2.10 and higher do base64 encode the tarball, but 2.9 does not; it sends the raw bytes.

This causes a traceback:

Traceback (most recent call last):
  File "/home/briantist/code/galactory/.venv/lib/python3.8/site-packages/flask/app.py", line 2190, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/briantist/code/galactory/.venv/lib/python3.8/site-packages/flask/app.py", line 1486, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/briantist/code/galactory/.venv/lib/python3.8/site-packages/flask/app.py", line 1484, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/briantist/code/galactory/.venv/lib/python3.8/site-packages/flask/app.py", line 1469, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/home/briantist/code/galactory/galactory/api/v3/collections.py", line 338, in publish
    with _chunk_to_temp(Base64IO(file)) as tmp:
  File "/home/briantist/code/galactory/galactory/utilities.py", line 177, in _chunk_to_temp
    for chunk in it:
  File "/home/briantist/code/galactory/galactory/utilities.py", line 168, in <lambda>
    it = iter(lambda: fsrc.read(chunk_size), b'') if iterator is None else iterator(chunk_size)
  File "/home/briantist/code/galactory/.venv/lib/python3.8/site-packages/base64io/__init__.py", line 298, in read
    results.write(base64.b64decode(data))
  File "/usr/lib/python3.8/base64.py", line 87, in b64decode
    return binascii.a2b_base64(s)
binascii.Error: Incorrect padding

I don't have a particular interest in supporting a client as old as 2.9, especially since you can still use a newer version to publish your collections even if you are still using 2.9 for some reason.

However, both the v2 and v3 galaxy protocols seem to accept either form, even if that's not documented, so it seems like galactory should support it too.

Another reason to support this is for easier usage with other clients, like curl. Even the docs for the new v3 galaxy show a curl example exclusively, without base64 encoding: https://ansible.readthedocs.io/projects/galaxy-ng/en/latest/community/api_v3/#upload-a-collection
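
One way to support both forms is to sniff the first bytes of the upload: a collection tarball is gzip-compressed, so a raw upload starts with the gzip magic bytes, while base64 text cannot contain them. A sketch of that idea (not necessarily how galactory's --upload-format auto mode is implemented):

import base64

GZIP_MAGIC = b"\x1f\x8b"

def decode_upload(file_obj):
    head = file_obj.read(2)
    file_obj.seek(0)
    if head == GZIP_MAGIC:
        return file_obj.read()                # raw tarball bytes (ansible-core 2.9 style)
    return base64.b64decode(file_obj.read())  # base64-encoded tarball (2.10+ style)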

Galaxy v2 - upstream response href fields are not rewritten

Hitting an endpoint like /api/v2/collections/community/general/versions/ or /api/v2/collections/community/general/versions/4.8.6/ or something, shows the upstream URL in the href field, not the galactory URL.

Luckily this field is not used in ansible-galaxy operations but it should probably be fixed. I probably introduced this in #104 either as a side-effect or while fixing #103.

v3 is not affected simply by way of it not returning absolute URLs, so the relative URL fragments it returns are accidentally correct.

Forcing page counts onto upstream URLs broke v2 proxying partially

In #104 I had all upstream requests add query string parameters to set the page size to 100, which helps reduce the number of roundtrips with old clients, and slightly mitigates #99.

The page size parameters are different between v2 and v3 (page_size and limit, respectively), and I lazily just added both to all URLs.

But it turns out that old galaxy throws a 400 error on the parameter it doesn't recognize, so that's causing failures on some paginated requests.

Implement collection deletion

The v3 API has an endpoint for deleting collections: https://ansible.readthedocs.io/projects/galaxy-ng/en/latest/community/api_v3/#delete-a-collection-or-a-specific-version

We could implement this and support deleting.

I'm a little on the fence about it. I do not want to make galactory a full-fledged collection management system. We should rely on Artifactory for that.

Although deletion would necessarily rely on Artifactory's permissions and your existing auth, I could see folks wanting to disable deletion or treat this endpoint differently (like we do with publishing). I suppose we could do the exact same thing as we do for that: an option where the configured auth isn't used and you must pass in auth with the request.

For now, I consider this very low priority, since I'm primarily concerned with supporting calls used by the ansible-galaxy client for installing/downloading, and publishing.

Add retries to Artifactory calls

I have been starting to see cases where a sudden flood of connections (like in a CI with a big parallel matrix) causes calls to Artifactory to fail with ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')).

One way we could do this is with urllib3's Retry class, by ensuring we create a Session ourselves and passing it to ArtifactoryPath: https://github.com/devopshq/artifactory#session

We might need to be careful with this to ensure that everywhere we instantiate such an object we also use a proper session.

To that end, we could maybe subclass ArtifactoryPath with something like GalactoryPath, or whatever, that handles retryability.


Another option is to use a library like backoff to wrap calls we make. But I think that might not be the best option.
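
A sketch of the first option, following the session example from the dohq-artifactory README (retry values are placeholders):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
from artifactory import ArtifactoryPath

session = requests.Session()
adapter = HTTPAdapter(max_retries=Retry(total=3, backoff_factor=0.5))
session.mount("https://", adapter)
session.mount("http://", adapter)

# dohq-artifactory accepts an externally managed session
path = ArtifactoryPath("https://artifactory.example.com/artifactory/repo/", session=session)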

Unit tests for Python >= 3.10 don't actually test much of anything

In #35 I got the tests working for Python 3.10 and 3.11.

This introduced some branching code in the definition of a MockArtifactoryPath class due to changes in pathlib (the underlying base class):

if sys.version_info.major == 3 and sys.version_info.minor >= 10:
    _accessor = _artifactory_accessor
    __new__ = _new__new__
    _make_child = _new_make_child
    _make_child_relpath = _new_make_child_relpath
else:
    # in 3.9 and below Pathlib limits what members can be present in 'Path' class
    __slots__ = ("auth", "verify", "cert", "session", "timeout", "_galactory_mocked_path")

Although I got the tests to "pass", it turns out that it's not working properly.

I haven't been able to completely figure out why yet, but it seems like is_dir() is always returning True, which means our tests end up testing nothing:

if repo.is_dir() and not repo.name.endswith('.tar.gz'):

Galactory still works fine with Python 3.10 and 3.11 and I expect that will continue, but it means there's even less testing than I thought 😞

I've still not made enough headway on #21; I really need to step up testing in this project.

Issues with setting artifact properties

Because we previously relied on the collection being uploaded first before we gathered the collection info, we always updated the properties after uploading. It turns out this requires a pro license: https://www.jfrog.com/confluence/display/JFROG/Artifactory+REST+API#ArtifactoryRESTAPI-SetItemProperties

In #5 , since we removed that requirement, I attempted to use matrix parameters to set the artifact properties on upload, in a single request. Not having integration tests working yet, I usually test this against an older test version of Artifactory I have around that's still licensed, and this failed for the collection_info property which is JSON (Artifactory returned a 400 Bad Request error).

Testing revealed that this was due to quote characters (" and '), and no manner of escaping seemed to help, so I kept the post-upload property setting.

As I work on implementing integration tests with Artifactory OSS, I discovered the Pro license requirement, but I also discovered that the matrix property method has no issues with the JSON property or quote characters in general 🤯

I cannot find any documented acknowledgement of the problem in the 6.x series, nor a documented fix in the 7.x series; further testing has confirmed that even with OSS container 6.x (same version as my test environment) the issue is not present.

Further digging reveals that the issue is likely related to some kind of reverse proxy arrangement in my environment, but I've seen some evidence that this might be due to the "default" reverse proxy configurations generated by Artifactory itself, so I am likely not the only person affected, and the most likely environments to be affected are in production (where it's more likely that reverse proxies are used). Some more info:

"When specifying '_scheme', '_external' must be True."

When using galactory with the preferred-url-scheme option, I was consistently getting the following exception after upgrading to 0.11.1:

raise ValueError("When specifying '_scheme', '_external' must be True.")

coming from invocations of url_for in galactory/api/v3/collections.py. I noticed that in v3 these are hardcoded to _external=False, but in the v2 implementation of collections.py all of the _external parameters of url_for are set to True. Setting these to True in a local development environment fixed the behavior. I'm making this issue to raise awareness and make sure I'm not just misconfiguring galactory when using v3.

Provide more proxying control/filtering

For organizations who want to more tightly control which collections are allowed from upstream, it would be great to have blocklist/allowlist support.

I'm unsure of the exact implementation yet; I could see wanting to limit by whole namespaces, or by full collection names, or something in between, etc.

At the moment, orgs who want to do this can achieve it somewhat with a separate process that pulls collections from upstream, and uploads them through galactory (or otherwise uploads them to artifactory with the properties galactory needs); building it into the proxying support should simplify things for those doing it this way.

Allow for skipping logging of requests that meet some criteria

If you have periodic health checks hitting galactory but you still want to log requests, it can be quite annoying that the log is filled with health check requests, masking actual usages.

For example in an AWS environment with an ALB, the log will be filled with requests from ELB-HealthChecker/2.0 (user agent).

I'm thinking it would be nice to exclude certain UAs from logging, or maybe do it by endpoint (can exclude /health/* for example).
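
A small sketch of the endpoint-based variant, using a standard logging filter on werkzeug's access logger (the /health prefix is just the example from above):

import logging

class SkipHealthChecks(logging.Filter):
    def filter(self, record):
        # werkzeug access-log records include the request line; drop health check hits
        return "/health/" not in record.getMessage()

logging.getLogger("werkzeug").addFilter(SkipHealthChecks())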

Proto not set when running behind reverse proxy

In our environment, we run most everything behind an NGINX reverse proxy that handles all the SSL termination (we are required to have everything on HTTPS). Most everything works properly, except that when flask.url_for generates an external URL, it defaults to plain HTTP. I found this ProxyFix page that has a solution, and I tested it by setting app.wsgi_app = ProxyFix(app.wsgi_app, x_proto=1) in the main __init__.py.

However, I'm not sure if that is the best option, or if it would be better to add an arg similar to SERVER_NAME, something like SERVER_PROTO as seen from the client, and then pass that in when generating the url_for?
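
For context, the full ProxyFix wiring mentioned above is only a couple of lines (x_proto=1 trusts the X-Forwarded-Proto header from one proxy hop):

from werkzeug.middleware.proxy_fix import ProxyFix

app.wsgi_app = ProxyFix(app.wsgi_app, x_proto=1)  # trust X-Forwarded-Proto from one proxy hop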

Consider an option that disables the root response

If you hit the "root" of the galactory server, you get a small text response:

Galactory is running

This could be used as a health check of sorts, but we now have a dedicated health endpoint:

When ansible-galaxy is configured to look at the root of galactory, it tries the root (presumably ignoring the response? or checking that it's a 200?) and then tries again to hit /api.

That roundtrip can be avoided by configuring ansible-galaxy to point at the /api endpoint to begin with.

The impact is pretty small; a single extra hit, and since by default ansible-galaxy itself caches galaxy responses, it doesn't do it again for a while. But it will do it for every new client that hits, and when caches expire, so if your workloads are heavily distributed and/or ephemeral (like CI runs), it's just a lot of unnecessary calls.

The proposed option would return a 404 and hopefully that would be enough to tell the galaxy client to stop and not try /api. The purpose of this intentional breakage would be so that you have a reason to update your config to add /api to your galaxy server path, otherwise it's easy to miss since it will still work without it.

This is a pretty minor thing so it's low priority to implement, but it would also be low effort. It would be useful in production, but probably annoying in dev/testing scenarios, which is why I probably wouldn't enable it by default...

Consider converting codebase to async

Probably a big project but not sure.

Might be a good time to look at converting from Flask to FastAPI.
We could also look more seriously at using antsibull-core as an upstream galaxy client.

Since we rely so much on dohq_artifactory, though, it may end up not being that useful, because that library is not async, and our slowest calls are probably all in that library.

We could implement our own artifactory client instead. I don't really want to do that, but the APIs we need are quite limited. Not needing to follow pathlib semantics would also be nice, much easier to do testing, gains in efficiency...

Not really planning to do this any time soon but recording some thoughts.
