
colin-b / httpx_auth


Authentication classes to be used with httpx

License: MIT License

Python 100.00%
active-directory api-key auth aws azure hacktoberfest httpx oauth2 okta python

httpx_auth's People

Contributors

blag, colin-b, k900, kianmeng, martinka, miikka, nymous, rafalkrupinski, sweeneytr


httpx_auth's Issues

Would it be possible to add support for AWS?

I am looking at using httpx and I like what you have done to add support for authentication to it here. I was wondering if you would be open to me working on and submitting a PR to add aws4auth?

Creating AWS4Auth instances with STS tokens leaks memory, slows down over time

If you continuously create AWS4Auth instances with the security_token argument set, the process slowly leaks memory and request signing gets slower.

Our production service creates a new AWS4Auth instance for every request to AWS S3 (possibly we should just re-use them) and we noticed that after tens of thousands of requests, the requests were getting slower and slower. Restarting the service makes them fast again. Looks like the code below is causing the issue:

# TODO Check if we really need to be able to override this default ?
if self.security_token:
    # TODO Avoid modifying shared variable
    self.default_include_headers.append("x-amz-security-token")

Every time you create a new AWS4Auth instance, one more copy of x-amz-security-token gets appended to default_include_headers. Here's a Python REPL example demonstrating the problem:

>>> from httpx_auth.aws import AWS4Auth
>>> AWS4Auth("test", "test", "us-east-1", "s3", security_token="token").default_include_headers
['host', 'content-type', 'date', 'x-amz-*', 'x-amz-security-token']
>>> AWS4Auth("test", "test", "us-east-1", "s3", security_token="token").default_include_headers
['host', 'content-type', 'date', 'x-amz-*', 'x-amz-security-token', 'x-amz-security-token']
>>> AWS4Auth("test", "test", "us-east-1", "s3", security_token="token").default_include_headers
['host', 'content-type', 'date', 'x-amz-*', 'x-amz-security-token', 'x-amz-security-token', 'x-amz-security-token']
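
The growth shown in the REPL session is what you get when a list defined at class level is mutated through self. Below is a minimal sketch of the pattern and of one possible fix (copying the defaults per instance). SharedListAuth and CopiedListAuth are hypothetical stand-ins written for illustration, not the library's actual code.

```python
DEFAULT_HEADERS = ["host", "content-type", "date", "x-amz-*"]


class SharedListAuth:
    """Hypothetical stand-in that reproduces the reported pattern."""

    # Class attribute: one list object shared by every instance.
    default_include_headers = list(DEFAULT_HEADERS)

    def __init__(self, security_token=None):
        if security_token:
            # Mutates the shared class-level list, so it grows on every call.
            self.default_include_headers.append("x-amz-security-token")


class CopiedListAuth:
    """Hypothetical variant that gives each instance its own list."""

    default_include_headers = list(DEFAULT_HEADERS)

    def __init__(self, security_token=None):
        # Copy first, then extend only the per-instance copy.
        self.include_headers = list(self.default_include_headers)
        if security_token:
            self.include_headers.append("x-amz-security-token")


for _ in range(3):
    leaky = SharedListAuth(security_token="token")
    fixed = CopiedListAuth(security_token="token")

leaky_count = leaky.default_include_headers.count("x-amz-security-token")
fixed_count = fixed.include_headers.count("x-amz-security-token")
print(leaky_count, fixed_count)
```

With the copy in place the class-level defaults never change, so creating one instance per request no longer makes the header list (and the signing work that iterates over it) grow.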

JSONDecodeError due to Improper Handling of Nested JSON Strings in JWT Payloads

Description

There is an issue in the httpx-auth library where the decoding of base64-encoded JSON within JWT tokens corrupts JSON strings that contain nested JSON. This happens because the double quotes inside the nested JSON string are not correctly handled during the decoding process, leading to a failure when attempting to load the string back into a JSON object.

Steps to Reproduce

The issue can be reproduced with the following test case:

import jwt
import json
from httpx_auth._oauth2.tokens import decode_base64

def test_decode_base64_with_nested_json_string():
    # Encode a JSON inside the JWT
    dummy_token = jwt.encode({"data": json.dumps({"something": ["else"]})}, key="")
    header, body, signature = dummy_token.split(".")
    
    # Decode the body
    decoded_bytes = decode_base64(body)
    
    # Attempt to load JSON
    result = json.loads(decoded_bytes)
    assert result == {"data": '{"something": ["else"]}'}

Running this test results in a json.decoder.JSONDecodeError due to incorrect handling of the nested JSON string.

Expected Behavior

The decoded JSON string should be handled correctly, allowing for proper loading into a Python dictionary without JSON parsing errors.

Actual Behavior

The test raises the following error due to malformed JSON:

json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 12 (char 11)

This error is caused by the way double quotes inside the nested JSON are handled, which corrupts the JSON string during the base64 decoding step.

Environment

Python Version: 3.10.11
httpx-auth version: 0.22.0 (2024-03-02)

Additional Context

This issue impacts scenarios where JWT tokens contain nested JSON strings as part of their payload. A fix would likely involve adjusting the base64 decoding function to correctly handle nested JSON strings without corrupting them.
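
For comparison, here is a minimal sketch of base64url decoding that only restores the stripped padding and otherwise leaves the bytes untouched, so escaped quotes inside a nested JSON string survive. decode_segment is a hypothetical helper written for this illustration; it is not the library's decode_base64, and the payload is built by hand with the standard library so that no JWT package is needed.

```python
import base64
import json


def decode_segment(segment: str) -> bytes:
    """Decode one base64url JWT segment, restoring the stripped padding."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


# A payload whose "data" value is itself a JSON document stored as a string.
payload = {"data": json.dumps({"something": ["else"]})}
raw = json.dumps(payload).encode("utf-8")

# JWT segments are base64url encoded without trailing "=" padding.
segment = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

decoded = json.loads(decode_segment(segment))
nested = json.loads(decoded["data"])
print(decoded, nested)
```

The point of the sketch is that no string manipulation happens between the base64 step and json.loads; any rewriting of quote characters in between is what would break the nested string.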

Question: Why define headers to sign

From reading the code of botocore it seems that they sign all headers except those in a blacklist:

SIGNED_HEADERS_BLACKLIST = [
    'expect',
    'user-agent',
    'x-amzn-trace-id',
]

httpx_auth on the other hand works with a include list approach.

Why?
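
To make the difference concrete, here is a small sketch of both selection strategies. The blacklist is the one quoted from botocore above, and the include list is the default shown in the AWS4Auth REPL output earlier on this page; the two helper functions and the sample request headers are hypothetical and only illustrate the idea.

```python
from fnmatch import fnmatchcase

# Copied from the botocore snippet quoted above.
SIGNED_HEADERS_BLACKLIST = ["expect", "user-agent", "x-amzn-trace-id"]

# Default include list as printed in the AWS4Auth REPL example on this page.
INCLUDE_HEADERS = ["host", "content-type", "date", "x-amz-*"]


def headers_by_blacklist(headers):
    """Sign every header except the blacklisted ones."""
    names = (name.lower() for name in headers)
    return sorted(n for n in names if n not in SIGNED_HEADERS_BLACKLIST)


def headers_by_include_list(headers):
    """Sign only headers matching one of the include patterns."""
    names = (name.lower() for name in headers)
    return sorted(
        n for n in names
        if any(fnmatchcase(n, pattern) for pattern in INCLUDE_HEADERS)
    )


request_headers = {
    "Host": "example.com",
    "User-Agent": "demo",
    "X-Amz-Date": "20240412T155102Z",
    "Accept": "application/json",
}

by_blacklist = headers_by_blacklist(request_headers)
by_include = headers_by_include_list(request_headers)
print(by_blacklist)
print(by_include)
```

In this example the only difference is the Accept header: the blacklist approach signs it, the include-list approach does not.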

Basic auth should not be enforced for resource owner password flow

I do not believe that the token request for resource owner password flow requires that the server accept basic auth. See the spec: https://datatracker.ietf.org/doc/html/rfc6749#section-4.3

The spec does, however, require that the server support basic auth for the client credentials grant: https://datatracker.ietf.org/doc/html/rfc6749#section-2.3.1

I could not authenticate with the Drupal Simple OAuth server implementation with Resource Owner Password creds until I removed the basic auth in OAuth2ResourceOwnerPasswordCredentials::_configure_client:

client.auth = (self.username, self.password)

I think we could remove this altogether. If a server does require/support basic auth for this then a httpx client can be provided to OAuth2ResourceOwnerPasswordCredentials with this configured. As it is currently implemented basic auth will always be used. If this approach makes sense I'm happy to provide a PR.

AWS4Auth produces incomplete canonical query string

Hi!

I'm experiencing a problem that AWS4 authentication does not seem to work correctly when communicating with the Ceph RADOS Gateway Admin Operations API (https://docs.ceph.com/en/latest/radosgw/adminops/#admin-operations) which uses AWS Signature Version 4 to authenticate. Given the following snippets:

import requests
from requests_aws4auth import AWS4Auth

auth = AWS4Auth("access_key", "secret",
                "default", "s3")

with requests.Session() as client:
    response = client.put("https://ceph-host/admin/user", auth=auth,
                          params={"format": "json", "uid": "testtenant$testuser",
                                  "display-name": "Some Test Display name."})

import httpx
from httpx_auth import AWS4Auth

auth = AWS4Auth(access_id="access_key", secret_key="secret",
                region="default", service="s3")

with httpx.Client() as client:
    response = client.put("https://ceph-host/admin/user", auth=auth,
                                 params={"format": "json", "uid": "testtenant$testuser",
                                         "display-name": "Some Test Display name."})

The first one works like a charm, but the second one using httpx and httpx_auth does not (RADOS Gateway responds with an HTTP status code of 403 and a "SignatureDoesNotMatch" error).

(I chose requests_aws4auth as a counter example here, because it's getting used internally by one of the python libraries the Ceph Admin Ops documentation suggests using: https://github.com/UMIACS/rgwadmin)

It seems that the reason behind this behaviour is that the implementation of httpx_auth does not expect query parameters with arbitrary spaces in them:

    def _amz_cano_querystring(qs: str) -> str:
        """
        Parse and format querystring as per AWS4 auth requirements.
        Perform percent quoting as needed.
        qs -- querystring
        """
        safe_qs_amz_chars = "&=+"
        safe_qs_unresvd = "-_.~"
        qs = unquote(qs)   # 'Some%20Test%20Display%20name.' gets unquoted here, so split() produces an incorrect result
        space = " "
        qs = qs.split(space)[0]
        qs = quote(qs, safe=safe_qs_amz_chars)
        qs_items = {}
        for name, vals in parse_qs(qs, keep_blank_values=True).items():
            name = quote(name, safe=safe_qs_unresvd)
            vals = [quote(val, safe=safe_qs_unresvd) for val in vals]
            qs_items[name] = vals
        qs_strings = []
        for name, vals in qs_items.items():
            for val in vals:
                qs_strings.append("=".join([name, val]))
        qs = "&".join(sorted(qs_strings))
        return qs

I'm not very familiar with the AWS Signature Spec, so I can't tell whether this is intended behaviour or not.

If it is, maybe the function should raise something if it detects more than one space in the query string. If it isn't, and it should work with other object storage provider APIs such as Ceph, maybe one could change the implementation to act a bit more forgiving as in requests_aws4auth.
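
As a point of comparison, here is a minimal sketch of a canonical query string builder that does not truncate at the first space. It decodes each name and value, re-encodes them leaving only unreserved characters unescaped, and sorts the pairs. canonical_querystring is a hypothetical function written for this illustration; whether its output matches what Ceph or AWS computes for every input is an assumption that would need testing against the real services.

```python
from urllib.parse import parse_qsl, quote

# Characters left unescaped besides letters and digits.
UNRESERVED = "-_.~"


def canonical_querystring(qs: str) -> str:
    """Percent-encode and sort query parameters, keeping spaces as %20."""
    # parse_qsl percent-decodes names and values and keeps blank values.
    pairs = parse_qsl(qs, keep_blank_values=True)
    encoded = sorted(
        (quote(name, safe=UNRESERVED), quote(value, safe=UNRESERVED))
        for name, value in pairs
    )
    return "&".join(f"{name}={value}" for name, value in encoded)


query = (
    "format=json&uid=testtenant%24testuser"
    "&display-name=Some%20Test%20Display%20name."
)
canonical = canonical_querystring(query)
print(canonical)
```

The full display name survives because the value is handled per parameter instead of the whole query string being split on the first space.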

Indicate that the project is typed

Hello!

Thank you for this project, it greatly simplifies my httpx calls against OAuth protected APIs, without having to deal with secrets exchange and response parsing and everything ^^

While running mypy I saw Skipping analyzing "httpx_auth": found module but no type hints or library stubs. While looking at the code I think everything is typed, so you could indicate it to type checkers to improve everyone's code safety!
It should be as simple as adding an empty py.typed file, like in Pydantic or in FastAPI.

You can also add the PyPI classifier Typing :: Typed (see in FastAPI).

I can open a PR with this change if you want! 😉

httpx_auth above v0.20.0 Results in a 403 Forbidden error on AWS API Gateway `execute-api` service with AWS Signature Version 4 auth

I am having an issue with my Python client making requests to an AWS API Gateway endpoint through httpx with httpx_auth AWS Signature Version 4. While version 0.19.0 of httpx_auth works correctly, any version above that results in a 403 Forbidden error.

The error message indicates that the AWS signature you are providing does not match what AWS is expecting. The message also helpfully provides the canonical string and string to sign that AWS generated based on your request.

The changelog for httpx_auth shows that between version 0.19.0 and 0.20.0 there was a significant overhaul of the AWS4Auth implementation to adhere more closely to the AWS documentation. This change may be the cause of the incompatibility you are experiencing.

HTTPx version: v0.27.0, HTTPx_AUTH version: ^v0.20.0 - Results in a 403 Forbidden error
HTTPx version: v0.26.0, HTTPx_AUTH version: v0.19.0 - Works just fine

Snippet of the error 2024-04-12 16:51:02.768 | ERROR | Client error '403 Forbidden' for url 'https://xxxx' Response: {'message': "The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.\n\nThe Canonical String for this request should have been\n'GET\n/xxx\n\nhost:xxx\nx-amz-content-sha256:xxxxx\nx-amz-date:20240412T155102Z\nx-amz-security-token:xxxx\n\nhost;x-amz-content-sha256;x-amz-date;x-amz-security-token\nexxxxx'\n\nThe String-to-Sign should have been\n'AWS4-HMAC-SHA256\nxxxxxxxxx/<region>/execute-api/aws4_request\n262xxx'\n"} For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403

I've tried including headers when making requests with httpx_auth v0.22.0 resulting into a similar error. I am calling the service execute-api on AWS API Gateway.

Feature request: Google Cloud Auth

Sample code below provided under MIT and Apache 2.0 licenses. This runs and works on your computer if you install the gcloud cli and run gcloud auth login

import asyncio
import threading

import google.auth.transport.requests
import google.auth
import httpx


class GoogleAuth(httpx.Auth):
    """Adds required authorization for requests to Google Cloud Platform.

    This gets the default credentials for the running user, and uses them to
    generate valid tokens to attach to requests.
    """

    def __init__(self, scopes=("https://www.googleapis.com/auth/cloud-platform",)):
        self._sync_lock = threading.RLock()
        self._async_lock = asyncio.Lock()
        self.scopes = scopes
        self.creds = None

    def _refresh_creds(self):
        # Must only be called with a lock.
        if self.creds is None:
            self.creds, _ = google.auth.default(scopes=self.scopes)
        auth_req = google.auth.transport.requests.Request()
        self.creds.refresh(auth_req)

    def sync_auth_flow(self, request: httpx.Request):
        if self.creds is None or self.creds.expired:
            with self._sync_lock:
                self._refresh_creds()
        request.headers["Authorization"] = "Bearer " + self.creds.token
        yield request

    async def async_auth_flow(self, request: httpx.Request):
        if self.creds is None or self.creds.expired:
            async with self._async_lock:
                await asyncio.to_thread(self._refresh_creds)
        request.headers["Authorization"] = "Bearer " + self.creds.token
        yield request


client = httpx.Client(auth=GoogleAuth())
response = client.get("https://cloudresourcemanager.googleapis.com/v1/projects")
print(response.json())

[BUG] Token renewal is not working

Hi!

I have a daemon application that needs to call an API endpoint every X seconds, and whenever the token expires the application stops with the following error:

File "/usr/local/lib/python3.8/site-packages/httpx/_client.py", line 787, in request
    return self.send(
  File "/usr/local/lib/python3.8/site-packages/httpx/_client.py", line 878, in send
    response = self._send_handling_auth(
  File "/usr/local/lib/python3.8/site-packages/httpx/_client.py", line 905, in _send_handling_auth
    request = next(auth_flow)
  File "/usr/local/lib/python3.8/site-packages/httpx/_auth.py", line 67, in sync_auth_flow
    request = next(flow)
  File "/usr/local/lib/python3.8/site-packages/httpx_auth/authentication.py", line 284, in auth_flow
    token = OAuth2.token_cache.get_token(
  File "/usr/local/lib/python3.8/site-packages/httpx_auth/oauth2_tokens.py", line 132, in get_token
    new_token = on_missing_token(**on_missing_token_kwargs)
  File "/usr/local/lib/python3.8/site-packages/httpx_auth/authentication.py", line 294, in request_new_token
    token, expires_in = request_new_grant_with_post(
  File "/usr/local/lib/python3.8/site-packages/httpx_auth/authentication.py", line 65, in request_new_grant_with_post
    with client:
  File "/usr/local/lib/python3.8/site-packages/httpx/_client.py", line 1239, in __enter__
    raise RuntimeError(msg)
RuntimeError: Cannot reopen a client instance, once it has been closed.

Looking at the implementation, it seems you are using a context manager to manage the client. This works correctly only once: if the same httpx client enters this function again, the context manager has already closed it, and that is why the error occurs.

To reproduce the issue just create an httpx client without a context manager and wait till the renewal needs to be done.

implicit doesn't support scope

Other OAuth2 flows accept scope and, if it is a list, convert it to a string. In the case of Implicit, scope is passed directly to the request, and if it is a list, it gets encoded with brackets.
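
A minimal sketch of the conversion described above: RFC 6749 defines scope as a space-delimited string, so a list has to be joined before it is put into the query. normalize_scope is a hypothetical helper, and urllib.parse.urlencode is used here only as a stand-in to show what happens when a list is stringified; how the library actually builds the request is not shown in this issue.

```python
from urllib.parse import urlencode


def normalize_scope(scope):
    """Return a space-delimited scope string for a str or a list of str."""
    if isinstance(scope, (list, tuple)):
        return " ".join(scope)
    return scope


# Passing the list straight through stringifies it, brackets included.
raw_query = urlencode({"scope": ["read", "write"]})

# Joining first yields the space-delimited form.
fixed_query = urlencode({"scope": normalize_scope(["read", "write"])})

print(raw_query)
print(fixed_query)
```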

thoughts on async implementation

Thank you @Colin-b and all contributors for this awesome package!

Just wanted to share some thoughts on possibility of async OAuth2 flow. I don't know all the corner cases this package handles, so please let me know whenever I write something naive.

  1. OAuth2 flows use a global cache with a (threading) lock. It's a problem for async code (#48 (comment)), but also doesn't seem necessary. The token and lock could simply be instance variables. The only problem that I can think of is when a user creates multiple instances with the same auth server and key. Is there a valid case for such usage?

  2. Actually the cache holds two locks, one for accessing the cache, the other for refreshing the token. Again this seems unnecessary:

    • when token is valid, the first concurrent call can acquire lock, check expiry and release it, while others wait for a short time.
    • when token is expired, the first concurrent call can acquire the lock, check expiry and refresh the token, while others have to wait until it finishes
  3. The OAuth2 flows use locks within .auth_flow() method, against the Auth documentation.

    If the authentication scheme does I/O such as disk access or network calls, or uses
    synchronization primitives such as locks, you should override .sync_auth_flow()
    and/or .async_auth_flow() instead of .auth_flow() to provide specialized
    implementations that will be used by Client and AsyncClient respectively.

    This is addressed in #48 patch.
    Having two locks makes sense when storage and refresh were split. In this case acquiring the refresh lock could be pulled to .a/sync_auth_flow().

  4. .auth_flow(), via .request_new_token() uses own instance of httpx.Client. Again, it doesn't seem necessary. Httpx already provides a/sync-portable protocol for making HTTP requests from Auth instances: response = yield request.

    If the auth server needs different transport options than the target server, mounts can be used. And if authentication is needed by the authentication server (meta-auth, client_auth param), the yielded requests need to be processed by meta-auth first.

  5. Since all OAuth2 implementations in this package follow the pattern:

    1. try getting cached token
    2. if expired, fallback to self.request_new_grant()
    3. set header

    steps 1+2 run with a lock, and 3 is very lightweight, the entire flow can run with a single lock.
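
The three-step pattern from point 5 can be sketched with a single per-instance lock as follows. SingleLockTokenSource and the request_new_grant callable (assumed to return a token and its lifetime in seconds) are hypothetical names used only for illustration; this is a synchronous sketch and does not cover the async side or the library's global cache.

```python
import threading
import time


class SingleLockTokenSource:
    """Hypothetical per-instance token holder guarded by one lock."""

    def __init__(self, request_new_grant):
        # request_new_grant is an assumed callable returning (token, expires_in).
        self._request_new_grant = request_new_grant
        self._lock = threading.Lock()
        self._token = None
        self._expiry = 0.0

    def get_token(self) -> str:
        with self._lock:
            # Steps 1 and 2: reuse the cached token, or refresh it if expired.
            if self._token is None or time.monotonic() >= self._expiry:
                token, expires_in = self._request_new_grant()
                self._token = token
                self._expiry = time.monotonic() + expires_in
            return self._token

    def apply(self, headers: dict) -> dict:
        # Step 3: setting the header is cheap and needs no extra lock.
        headers["Authorization"] = f"Bearer {self.get_token()}"
        return headers


calls = []


def fake_grant():
    # Stand-in for a real token request; counts how often it is invoked.
    calls.append(1)
    return f"token-{len(calls)}", 3600


source = SingleLockTokenSource(fake_grant)
first = source.apply({})
second = source.apply({})
print(first, second, len(calls))
```

Concurrent callers wait briefly on the same lock while the token is valid, and wait for the refresh to finish when it is not, which is the behaviour described in point 2.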

When the above are addressed, supporting both sync and async should be as simple as implementing this class:

class LockingRefreshingAuth(Auth):
    def __init__(self):
        self._sync_lock = threading.Lock()
        self._async_lock = anyio.Lock()

    def sync_auth_flow(self, request: Request) -> Generator[Request, Response, None]:
        if self.requires_request_body:
            request.read()

        with self._sync_lock:
            # apply the loop over `self.auth_flow()`
            flow = self.auth_flow(request)
            ...

    async def async_auth_flow(self, request: Request) -> AsyncGenerator[Request, Response]:
        # like above, but async
        ...

Implementations would still yield their auth requests instead of directly using Client, and yield the user request before returning.


I've forked the repo to work on a POC. If it makes sense, and it's welcome, I'll make a PR. Meanwhile, comments are more than welcome.


edit 1: I've just noticed that auth token refresh requests also need to have authentication headers, and I guess that's why a client is used. But I still think it's unnecessary, and simply another Auth instance can be used

edit 2: It makes sense to have two locks if one lock protects a single Auth instance and the other the global cache.

AWS auth fails for temporary credentials

I tested with the latest versions and the AWS auth is failing for temporary credentials (those that require a session token). The version from the original PR I pushed still works, so it's not a change in AWS. I will take a look soon (in the next couple of days) to see if I can figure out what broke it.

Authentication for Java Server Pages or Spring Basic Login

Hello Team,
I find your library very helpful, but I often have to automate scripts for Java Server Pages or Spring Basic Login as well.
These use a separate mechanism:
a form action attribute like action="j_security_check",
an entry field for the username like name="j_username" and a field for the password like name="j_password".

I often use the mechanize package, but I think it would be better to implement this in your package, because I normally use the httpx library.

Best regards

Expiry problem, sporadic 401s

The current code seems to use tokens exactly up to their expiry date (see def is_expired(expiry: float) -> bool:). There can be two problems:

  1. Let's say the token expires on 11:59:59. On the client machine sending the request it is 11:59:58, so the token is still valid. The request is sent to the server. This takes 2 seconds due to a slow connection. The request reaches the server at 12:00:00 which will respond with a 401 Unauthenticated, because the token is expired.
  2. For whatever reason the issued token was revoked or is invalid now. In this case you will also get a 401 Unauthenticated.

There could be two possible solutions:

  1. Don't cache the token exactly until expiry is reached, but expire it a little earlier. E.g. If the token says to expire in 60 seconds, let it expire at 40 seconds and get a new one already. This leaves 20 seconds of leeway. The downside with this approach is that 20 seconds is just an arbitrary value and could still not be sufficient to prevent the problem. It will also not solve issue 2).
  2. Whenever a 401 is returned, acquire a new token and retry the request. This is also hinted at in the official httpx docs (c.f. https://www.python-httpx.org/advanced/#customizing-authentication ):
class MyCustomAuth(httpx.Auth):
    def __init__(self, token):
        self.token = token

    def auth_flow(self, request):
        response = yield request
        if response.status_code == 401:
            # If the server issues a 401 response then resend the request,
            # with a custom `X-Authentication` header.
            request.headers['X-Authentication'] = self.token
            yield request

Of course there should be some kind of limit to the number of retries, so you don't get stuck indefinitely.

Unfortunately I don't have the time to prepare a proper PR, so I am using this as a workaround now (implementation of solution 1):

class _TokenCache(TokenMemoryCache):
    premature_expiration_seconds = 30

    def _add_token(self, key: str, token: str, expiry: float):
        """ If we use tokens exactly up to their expiry date, we will run into problems and race conditions,
            so we make the token expire a little earlier in the hope to prevent this problem, see
            https://github.com/Colin-b/httpx_auth/issues/23
        """
        return super()._add_token(key, token, expiry - self.premature_expiration_seconds)


OAuth2.token_cache = _TokenCache()
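
The same idea can be expressed directly in the expiry check. Here is a minimal sketch, assuming a configurable leeway in seconds; is_expired_with_leeway and its leeway and now parameters are hypothetical and not part of the library, whose is_expired takes only the expiry.

```python
import time


def is_expired_with_leeway(expiry: float, leeway: float = 30.0, now=None) -> bool:
    """Treat a token as expired `leeway` seconds before its real expiry."""
    # `now` can be injected for testing; defaults to the current time.
    current = time.time() if now is None else now
    return current >= expiry - leeway


# A token expiring at t=100 with 30 seconds of leeway.
still_valid = is_expired_with_leeway(100.0, leeway=30.0, now=69.0)
treated_as_expired = is_expired_with_leeway(100.0, leeway=30.0, now=70.0)
print(still_valid, treated_as_expired)
```

As noted above, any leeway value is arbitrary and does not help with revoked tokens, so this only reduces the race described in problem 1.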

Relax or update httpx dependency version.

If you want to install httpx_auth with the current version of httpx (0.27) with poetry you get this error:

Using version ^0.21.0 for httpx-auth

Updating dependencies

Resolving dependencies... (0.2s)
Because no versions of httpx-auth match >0.21.0,<0.22.0
 and httpx-auth (0.21.0) depends on httpx (==0.26.*), httpx-auth (>=0.21.0,<0.22.0) requires httpx (==0.26.*).
So, because http-scripts depends on both httpx (^0.27.0) and httpx-auth (^0.21.0), version solving failed.
