obendidi / httpx-cache

Simple caching transport for httpx

Home Page: https://obendidi.github.io/httpx-cache/

License: BSD 3-Clause "New" or "Revised" License

httpx-cache's Introduction

Archived: This project is no longer maintained; please migrate to hishel.


HTTPX-CACHE

httpx-cache is an implementation of the caching algorithms in httplib2 and CacheControl for use with the httpx transport object.

It is heavily inspired by:

Documentation

Full documentation is available at https://obendidi.github.io/httpx-cache/

Installation

Using pip:

pip install httpx-cache

Please pin the exact httpx-cache version in your project to make sure everything keeps working.

Contributors

Feel free to contribute!

httpx-cache's Issues

Hope to use redis cache

Great job giving httpx the ability to cache; it is very helpful for web crawlers. With distributed crawlers becoming more common, however, a local file cache does little to speed them up. I'd recommend implementing a Redis cache.

Errors in concurrent access to cached files

Describe the bug
Concurrent access to a cached file raises an exception.

To Reproduce

import asyncio
import httpx_cache
import shutil

cache_folder = "./httpx-cache-tests/"
try:
    shutil.rmtree(cache_folder)
except FileNotFoundError:
    pass

async def worker(i):
    print(f"Entering {i}")
    async with httpx_cache.AsyncClient(cache=httpx_cache.FileCache(cache_folder)) as client:
        result = await client.get("http://www.google.com")
        print(f"Exiting  {i}")
        return result

# works; if my mental model is correct, this writes to the file 10x in a row
tasks = [worker(i) for i in range(10)]
foo = await asyncio.gather(*tasks, return_exceptions=False) 

#  raises an error, though it works if the number of tasks is small ??
tasks = [worker(i) for i in range(100)]
foo = await asyncio.gather(*tasks, return_exceptions=False)  

On the first run, the above raises a FileNotFoundError:

FileNotFoundError                         Traceback (most recent call last)
Cell In [1], line 24
     22 #  raises an error, though it works if the number of tasks is small ??
     23 tasks = [worker(i) for i in range(100)]
---> 24 foo = await asyncio.gather(*tasks, return_exceptions=False)  

Cell In [1], line 14, in worker(i)
     12 print(f"Entering {i}")
     13 async with httpx_cache.AsyncClient(cache=httpx_cache.FileCache(cache_folder)) as client:
---> 14     result = await client.get("http://www.google.com")
     15     print(f"Exiting  {i}")
     16     return result

File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx/_client.py:1751, in AsyncClient.get(self, url, params, headers, cookies, auth, follow_redirects, timeout, extensions)
   1734 async def get(
   1735     self,
   1736     url: URLTypes,
   (...)
   1744     extensions: typing.Optional[dict] = None,
   1745 ) -> Response:
   1746     """
   1747     Send a `GET` request.
   1748 
   1749     **Parameters**: See `httpx.request`.
   1750     """
-> 1751     return await self.request(
   1752         "GET",
   1753         url,
   1754         params=params,
   1755         headers=headers,
   1756         cookies=cookies,
   1757         auth=auth,
   1758         follow_redirects=follow_redirects,
   1759         timeout=timeout,
   1760         extensions=extensions,
   1761     )

File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx/_client.py:1527, in AsyncClient.request(self, method, url, content, data, files, json, params, headers, cookies, auth, follow_redirects, timeout, extensions)
   1498 """
   1499 Build and send a request.
   1500 
   (...)
   1512 [0]: /advanced/#merging-of-configuration
   1513 """
   1514 request = self.build_request(
   1515     method=method,
   1516     url=url,
   (...)
   1525     extensions=extensions,
   1526 )
-> 1527 return await self.send(request, auth=auth, follow_redirects=follow_redirects)

File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx/_client.py:1614, in AsyncClient.send(self, request, stream, auth, follow_redirects)
   1606 follow_redirects = (
   1607     self.follow_redirects
   1608     if isinstance(follow_redirects, UseClientDefault)
   1609     else follow_redirects
   1610 )
   1612 auth = self._build_request_auth(request, auth)
-> 1614 response = await self._send_handling_auth(
   1615     request,
   1616     auth=auth,
   1617     follow_redirects=follow_redirects,
   1618     history=[],
   1619 )
   1620 try:
   1621     if not stream:

File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx/_client.py:1642, in AsyncClient._send_handling_auth(self, request, auth, follow_redirects, history)
   1639 request = await auth_flow.__anext__()
   1641 while True:
-> 1642     response = await self._send_handling_redirects(
   1643         request,
   1644         follow_redirects=follow_redirects,
   1645         history=history,
   1646     )
   1647     try:
   1648         try:

File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx/_client.py:1679, in AsyncClient._send_handling_redirects(self, request, follow_redirects, history)
   1676 for hook in self._event_hooks["request"]:
   1677     await hook(request)
-> 1679 response = await self._send_single_request(request)
   1680 try:
   1681     for hook in self._event_hooks["response"]:

File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx/_client.py:1716, in AsyncClient._send_single_request(self, request)
   1711     raise RuntimeError(
   1712         "Attempted to send an sync request with an AsyncClient instance."
   1713     )
   1715 with request_context(request=request):
-> 1716     response = await transport.handle_async_request(request)
   1718 assert isinstance(response.stream, AsyncByteStream)
   1719 response.request = request

File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx_cache/transport.py:118, in AsyncCacheControlTransport.handle_async_request(self, request)
    116 if self.controller.is_request_cacheable(request):
    117     logger.debug(f"Checking cache for: {request}")
--> 118     cached_response = await self.cache.aget(request)
    119     if cached_response is not None:
    120         logger.debug(f"Found cached response for: {request}")

File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx_cache/cache/file.py:65, in FileCache.aget(self, request)
     63 if await filepath.is_file():
     64     async with RWLock().reader:
---> 65         cached = await filepath.read_bytes()
     66     return self.serializer.loads(request=request, cached=cached)
     67 return None

File ~/miniconda3/envs/dev/lib/python3.9/site-packages/anyio/_core/_fileio.py:505, in Path.read_bytes(self)
    504 async def read_bytes(self) -> bytes:
--> 505     return await to_thread.run_sync(self._path.read_bytes)

File ~/miniconda3/envs/dev/lib/python3.9/site-packages/anyio/to_thread.py:31, in run_sync(func, cancellable, limiter, *args)
     10 async def run_sync(
     11     func: Callable[..., T_Retval],
     12     *args: object,
     13     cancellable: bool = False,
     14     limiter: Optional[CapacityLimiter] = None
     15 ) -> T_Retval:
     16     """
     17     Call the given function with the given arguments in a worker thread.
     18 
   (...)
     29 
     30     """
---> 31     return await get_asynclib().run_sync_in_worker_thread(
     32         func, *args, cancellable=cancellable, limiter=limiter
     33     )

File ~/miniconda3/envs/dev/lib/python3.9/site-packages/anyio/_backends/_asyncio.py:937, in run_sync_in_worker_thread(func, cancellable, limiter, *args)
    935 context.run(sniffio.current_async_library_cvar.set, None)
    936 worker.queue.put_nowait((context, func, args, future))
--> 937 return await future

File ~/miniconda3/envs/dev/lib/python3.9/site-packages/anyio/_backends/_asyncio.py:867, in WorkerThread.run(self)
    865 exception: Optional[BaseException] = None
    866 try:
--> 867     result = context.run(func, *args)
    868 except BaseException as exc:
    869     exception = exc

File ~/miniconda3/envs/dev/lib/python3.9/pathlib.py:1249, in Path.read_bytes(self)
   1245 def read_bytes(self):
   1246     """
   1247     Open the file in bytes mode, read it, and close the file.
   1248     """
-> 1249     with self.open(mode='rb') as f:
   1250         return f.read()

File ~/miniconda3/envs/dev/lib/python3.9/pathlib.py:1242, in Path.open(self, mode, buffering, encoding, errors, newline)
   1236 def open(self, mode='r', buffering=-1, encoding=None,
   1237          errors=None, newline=None):
   1238     """
   1239     Open the file pointed by this path and return a file object, as
   1240     the built-in open() function does.
   1241     """
-> 1242     return io.open(self, mode, buffering, encoding, errors, newline,
   1243                    opener=self._opener)

File ~/miniconda3/envs/dev/lib/python3.9/pathlib.py:1110, in Path._opener(self, name, flags, mode)
   1108 def _opener(self, name, flags, mode=0o666):
   1109     # A stub for the opener argument to built-in open()
-> 1110     return self._accessor.open(self, flags, mode)

FileNotFoundError: [Errno 2] No such file or directory: 'httpx-cache-tests/9ebb612c68d6ddc348478732369ddd52e58ab325704a3beda73e90b8'

On subsequent runs in the same Jupyter notebook session, the same code raises a msgpack ExtraData error:

---------------------------------------------------------------------------
ExtraData                                 Traceback (most recent call last)
Cell In [3], line 23
     19 foo = await asyncio.gather(*tasks, return_exceptions=False)
     22 tasks = [worker(i) for i in range(100)]
---> 23 foo = await asyncio.gather(*tasks, return_exceptions=False)

Cell In [3], line 14, in worker(i)
     12 print(f"Entering {i}")
     13 async with httpx_cache.AsyncClient(cache=httpx_cache.FileCache(cache_folder)) as client:
---> 14     result = await client.get("http://www.google.com")
     15     print(f"Exiting  {i}")
     16     return result

File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx/_client.py:1751, in AsyncClient.get(self, url, params, headers, cookies, auth, follow_redirects, timeout, extensions)
   1734 async def get(
   1735     self,
   1736     url: URLTypes,
   (...)
   1744     extensions: typing.Optional[dict] = None,
   1745 ) -> Response:
   1746     """
   1747     Send a `GET` request.
   1748 
   1749     **Parameters**: See `httpx.request`.
   1750     """
-> 1751     return await self.request(
   1752         "GET",
   1753         url,
   1754         params=params,
   1755         headers=headers,
   1756         cookies=cookies,
   1757         auth=auth,
   1758         follow_redirects=follow_redirects,
   1759         timeout=timeout,
   1760         extensions=extensions,
   1761     )

File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx/_client.py:1527, in AsyncClient.request(self, method, url, content, data, files, json, params, headers, cookies, auth, follow_redirects, timeout, extensions)
   1498 """
   1499 Build and send a request.
   1500 
   (...)
   1512 [0]: /advanced/#merging-of-configuration
   1513 """
   1514 request = self.build_request(
   1515     method=method,
   1516     url=url,
   (...)
   1525     extensions=extensions,
   1526 )
-> 1527 return await self.send(request, auth=auth, follow_redirects=follow_redirects)

File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx/_client.py:1614, in AsyncClient.send(self, request, stream, auth, follow_redirects)
   1606 follow_redirects = (
   1607     self.follow_redirects
   1608     if isinstance(follow_redirects, UseClientDefault)
   1609     else follow_redirects
   1610 )
   1612 auth = self._build_request_auth(request, auth)
-> 1614 response = await self._send_handling_auth(
   1615     request,
   1616     auth=auth,
   1617     follow_redirects=follow_redirects,
   1618     history=[],
   1619 )
   1620 try:
   1621     if not stream:

File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx/_client.py:1642, in AsyncClient._send_handling_auth(self, request, auth, follow_redirects, history)
   1639 request = await auth_flow.__anext__()
   1641 while True:
-> 1642     response = await self._send_handling_redirects(
   1643         request,
   1644         follow_redirects=follow_redirects,
   1645         history=history,
   1646     )
   1647     try:
   1648         try:

File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx/_client.py:1679, in AsyncClient._send_handling_redirects(self, request, follow_redirects, history)
   1676 for hook in self._event_hooks["request"]:
   1677     await hook(request)
-> 1679 response = await self._send_single_request(request)
   1680 try:
   1681     for hook in self._event_hooks["response"]:

File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx/_client.py:1716, in AsyncClient._send_single_request(self, request)
   1711     raise RuntimeError(
   1712         "Attempted to send an sync request with an AsyncClient instance."
   1713     )
   1715 with request_context(request=request):
-> 1716     response = await transport.handle_async_request(request)
   1718 assert isinstance(response.stream, AsyncByteStream)
   1719 response.request = request

File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx_cache/transport.py:118, in AsyncCacheControlTransport.handle_async_request(self, request)
    116 if self.controller.is_request_cacheable(request):
    117     logger.debug(f"Checking cache for: {request}")
--> 118     cached_response = await self.cache.aget(request)
    119     if cached_response is not None:
    120         logger.debug(f"Found cached response for: {request}")

File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx_cache/cache/file.py:66, in FileCache.aget(self, request)
     64     async with RWLock().reader:
     65         cached = await filepath.read_bytes()
---> 66     return self.serializer.loads(request=request, cached=cached)
     67 return None

File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx_cache/serializer/common.py:172, in MsgPackSerializer.loads(self, cached, request)
    168 def loads(  # type: ignore
    169     self, *, cached: bytes, request: tp.Optional[httpx.Request] = None
    170 ) -> httpx.Response:
    171     """Load an httpx.Response from a msgapck bytes."""
--> 172     return super().loads(cached=msgpack.loads(cached, raw=False), request=request)

File msgpack/_unpacker.pyx:201, in msgpack._cmsgpack.unpackb()

ExtraData: unpack(b) received extra data.

Expected behavior
Concurrent access to an already-cached file should be fine?

Additional context
I probably brought this on myself
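
Both tracebacks are consistent with a reader racing a writer: the first fails between the `is_file()` check and the read, and the second looks like a read of a file that was still being written. A minimal sketch of one common mitigation follows, assuming nothing about httpx-cache internals beyond what the tracebacks show. The helper names `atomic_write_bytes` and `read_bytes_or_none` are hypothetical and are not part of the library.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def atomic_write_bytes(path: Path, data: bytes) -> None:
    """Write data so readers see either the old file or the complete new one."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_name, path)  # swaps the finished file into place in one step
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


def read_bytes_or_none(path: Path) -> Optional[bytes]:
    """Read a cache entry, treating a missing file as a cache miss."""
    try:
        return path.read_bytes()
    except FileNotFoundError:
        return None
```

Reading directly and catching `FileNotFoundError` avoids the gap between checking that the file exists and opening it. Whether this alone would resolve the reported errors has not been verified.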

Conda

Hi,

Would you consider making your library available on conda?
Thanks!

`FileCache` on an `AsyncClient` doesn't work with `s3fs`

Is your feature request related to a problem? Please describe.
I'd like to use the FileCache backend to store the cache to object storage (S3), instead of the local disk. Since s3fs, which is built on top of fsspec, provides an abstract filesystem for S3 and universal_pathlib's UPath is a subclass of pathlib.Path and provides a similar interface to any fsspec-backed file system, I was hoping to simply pass a UPath as the cache_dir for a FileCache and have it work automatically.

While this does work for the synchronous client:

import httpx_cache
from upath import UPath

bucket = UPath("s3://my-bucket")
cache_dir = bucket / "cache_test"
cache = httpx_cache.FileCache(cache_dir=cache_dir)

def main():
    with httpx_cache.Client(cache=cache) as client:
        response1 = client.get("https://httpbin.org/get")
        response2 = client.get("https://httpbin.org/get")
    return response2

main()

# Shows a new file created for the cache
print([f for f in cache_dir.iterdir()])

It doesn't work with the AsyncClient:

import asyncio

import httpx_cache
from upath import UPath

bucket = UPath("s3://my_bucket")
cache_dir = bucket / "async_cache"
cache = httpx_cache.FileCache(cache_dir=cache_dir)

async def async_main():
    async with httpx_cache.AsyncClient(cache=cache) as client:
        response1 = await client.get("https://httpbin.org/get")
        response2 = await client.get("https://httpbin.org/get")
    return response2


asyncio.run(async_main())

# No cache files are created
print([f for f in cache_dir.iterdir()])

If the async_cache prefix doesn't exist in the bucket already, I'm getting a FileNotFoundError: my_bucket/async_cache.

To circumvent this, I created a dummy file to ensure the prefix existed. The script now runs. However, no files are created for the cache.
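
A possible workaround sketch, not a fix inside httpx-cache: since the synchronous example above works with `UPath`, the path object's own synchronous methods could be offloaded to a worker thread from async code with `asyncio.to_thread` (Python 3.9+). The helpers `awrite_bytes` and `aread_bytes` are hypothetical names. The demo uses a local `pathlib.Path`; that the same calls behave correctly with `UPath` on S3 is an assumption and has not been verified here.

```python
import asyncio
import tempfile
from pathlib import Path
from typing import Optional


async def awrite_bytes(path: Path, data: bytes) -> None:
    """Call the path's own mkdir and write_bytes in a worker thread."""
    await asyncio.to_thread(path.parent.mkdir, parents=True, exist_ok=True)
    await asyncio.to_thread(path.write_bytes, data)


async def aread_bytes(path: Path) -> Optional[bytes]:
    """Call the path's own read_bytes in a worker thread; None means a cache miss."""
    try:
        return await asyncio.to_thread(path.read_bytes)
    except FileNotFoundError:
        return None


async def demo() -> Optional[bytes]:
    # Local temporary directory standing in for the bucket prefix.
    with tempfile.TemporaryDirectory() as tmp:
        entry = Path(tmp) / "async_cache" / "entry"
        await awrite_bytes(entry, b"cached response")
        return await aread_bytes(entry)
```

Because the methods are called on the object exactly as passed in, any path type with a `pathlib`-like interface keeps its own filesystem behavior.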

Cache not working

PROC_SHOPIFY_CACHE = 12 * 60 * 60

async def proc_shopify(order_no):
    url = f"https://alfuuktopkh.myshopify.com/admin/api/2023-07/orders.json?name={order_no}&status=any"
    headers = {
        'Content-Type': 'application/json',
        'X-Shopify-Access-Token': ADMIN_token,
        "cache-control": f"max-age={PROC_SHOPIFY_CACHE}"
    }
    async with httpx_cache.AsyncClient() as client:
        response = await client.get(url, headers=headers)

I don't think caching is working on the request, as the response is similar in multiple runs.
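
One thing worth checking in the snippet above is that a new `AsyncClient` is created on every call, with no `cache` argument. If the default cache is an in-memory one owned by each client (an assumption, not confirmed anywhere on this page), nothing would survive from one call to the next. The toy model below is not httpx-cache code. `ToyClient` and `count_fetches` are hypothetical names that only illustrate the difference between a per-call cache and a shared one.

```python
from typing import Callable, Dict, Optional


class ToyClient:
    """Stand-in for a caching client: consults its cache before calling fetch."""

    def __init__(self, fetch: Callable[[str], str], cache: Optional[Dict[str, str]] = None):
        self.fetch = fetch
        # Without an explicit cache, each client gets its own empty dict.
        self.cache = {} if cache is None else cache

    def get(self, url: str) -> str:
        if url not in self.cache:
            self.cache[url] = self.fetch(url)
        return self.cache[url]


def count_fetches(share_cache: bool, calls: int = 3) -> int:
    """Return how many times the simulated network is hit across several calls."""
    hits = 0

    def fetch(url: str) -> str:
        nonlocal hits
        hits += 1
        return f"body of {url}"

    shared: Dict[str, str] = {}
    for _ in range(calls):
        # A new client per call, as in the issue's function.
        client = ToyClient(fetch, cache=shared if share_cache else None)
        client.get("https://example.org/orders")
    return hits
```

In this model, `count_fetches(False)` hits the network on every call, while `count_fetches(True)` hits it once. The equivalent in real code would be creating one cache object (as the `FileCache` examples in the other issues do) or one client, and reusing it.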

Overwrite is_response_cacheable / ignore `no-store`

It would be nice to have an easy way to override is_response_cacheable so that it ignores no-store in the response.

if "no-store" in request_cc or "no-store" in response_cc:
    logger.debug(
        "Request/Response cache-control headers has a 'no-store' directive. "
        "Response is not cacheable!"
    )
    return False

Maybe by allowing users to pass their own instance of CacheControl?

self.controller = CacheControl(
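
The requested behavior can be sketched as a standalone function with an opt-out flag. This is only a model of the `no-store` check quoted above, not the library's API: `parse_cache_control`, `is_cacheable`, and the `ignore_no_store` parameter are hypothetical, and the parser is simplified (it does not handle commas inside quoted values).

```python
from typing import Dict, Optional


def parse_cache_control(header: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header into lowercase directives and optional values."""
    directives: Dict[str, Optional[str]] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


def is_cacheable(request_cc: str, response_cc: str, ignore_no_store: bool = False) -> bool:
    """Model only the 'no-store' check; other cacheability rules are out of scope."""
    if ignore_no_store:
        return True
    req = parse_cache_control(request_cc)
    resp = parse_cache_control(response_cc)
    return "no-store" not in req and "no-store" not in resp
```

A flag like this would leave the default behavior unchanged while letting callers opt in, which is the same effect as passing a customized controller instance.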

You have an undeclared dependency on attrs

Describe the bug

Hey there! I was using your project and couldn't figure out why I was getting weird errors that didn't show up in my dev environment.

It seems you use attrs but don't declare it as a dependency in pyproject.toml. My dev environment already had it installed through another project, which is why I only saw the errors once I installed my project outside the dev environment.

Imported here:

Used here:

But it's not listed in dependencies in pyproject.toml here:

[tool.poetry.dependencies]

When I ran my project I got ModuleNotFoundError and traced it back here.

Expected behavior

The dependency should be declared in pyproject.toml
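
A quick way to check whether an installed distribution declares a given dependency is to inspect its metadata with the standard library's `importlib.metadata`. The sketch below is illustrative only; the helper names `requirement_name` and `declares_dependency` are hypothetical.

```python
import re
from importlib import metadata
from typing import Optional


def requirement_name(requirement: str) -> str:
    """Extract the normalized project name from a requirement string."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    if match is None:
        raise ValueError(f"not a requirement string: {requirement!r}")
    # Normalize runs of '-', '_' and '.' to a single '-', and lowercase the name.
    return re.sub(r"[-_.]+", "-", match.group(0)).lower()


def declares_dependency(distribution: str, dependency: str) -> Optional[bool]:
    """Report whether an installed distribution declares the dependency.

    Returns None when the distribution itself is not installed.
    """
    try:
        requirements = metadata.requires(distribution) or []
    except metadata.PackageNotFoundError:
        return None
    wanted = requirement_name(dependency)
    return any(requirement_name(req) == wanted for req in requirements)
```

For example, `declares_dependency("httpx-cache", "attrs")` would show whether the installed release lists attrs among its requirements.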

Screenshots

I forgot to grab any screenshots but I hope the problem is clear

Desktop:

  • OS: macOS
  • Version: Python 3.10.1

Comparison to httpx-caching

Hi there! Nice looking library.

Is your feature request related to a problem? Please describe.
The README mentions this project is inspired in part by httpx-caching -- but doesn't say what differences it has.

Describe the solution you'd like
A small comparison or even sentence describing why this exists as distinct from httpx-caching.

I know nothing about what came first, nor do I know (yet) what the answer is to how they differ, but regardless thanks for making the library!
