obendidi / httpx-cache
Simple caching transport for httpx
Home Page: https://obendidi.github.io/httpx-cache/
License: BSD 3-Clause "New" or "Revised" License
Describe the bug
Hey there! I was using your project and couldn't figure out why I was getting weird errors that didn't show up in my dev environment.
It turns out you use attrs but don't declare it as a dependency in pyproject.toml. My dev environment happened to have it installed from another project, which is why I only saw the errors once I installed my project outside that environment.
Imported here:
httpx-cache/httpx_cache/utils.py
Line 8 in 7789306
Used here:
httpx-cache/httpx_cache/utils.py
Line 94 in 7789306
But it's not listed in the dependencies in pyproject.toml here:
Line 30 in 7789306
When I ran my project I got a ModuleNotFoundError and traced it back here.
Expected behavior
The dependency should be declared in pyproject.toml
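For illustration, the fix is a one-line addition to the dependency list in pyproject.toml. The fragment below assumes a PEP 621 `[project]` table with placeholder entries; the project's actual table layout and version pins may differ:

```toml
[project]
dependencies = [
    "httpx",
    "attrs",  # currently missing, but imported in httpx_cache/utils.py
]
```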
Screenshots
I forgot to grab any screenshots, but I hope the problem is clear.
How about adding a header to the response to indicate whether it was served from the cache?
Hi there! Nice looking library.
Is your feature request related to a problem? Please describe.
The README mentions this project is inspired in part by httpx-caching, but doesn't say how the two differ.
Describe the solution you'd like
A small comparison, or even a sentence describing why this exists as distinct from httpx-caching.
I know nothing about what came first, nor do I know (yet) what the answer is to how they differ, but regardless thanks for making the library!
Great job giving httpx the ability to cache; it is very helpful for web crawling. As distributed crawlers become more common, though, a local file cache does little to speed them up, because workers don't share a filesystem. I recommend implementing a Redis cache backend.
PROC_SHOPIFY_CACHE = 12 * 60 * 60

async def proc_shopify(order_no):
    url = f"https://alfuuktopkh.myshopify.com/admin/api/2023-07/orders.json?name={order_no}&status=any"
    headers = {
        'Content-Type': 'application/json',
        'X-Shopify-Access-Token': ADMIN_token,
        "cache-control": f"max-age={PROC_SHOPIFY_CACHE}",
    }
    async with httpx_cache.AsyncClient() as client:
        response = await client.get(url, headers=headers)
I don't think caching is working on the request, as the response is similar in multiple runs.
Could you please update the dependencies to allow httpx version 0.23? It looks like it patches a critical vulnerability.
Thanks!
It would be nice to have an easy way to override is_response_cacheable so that it ignores no-store in the response.
httpx-cache/httpx_cache/cache_control.py
Lines 215 to 220 in 7478478
Maybe by being able to pass one's own instance of CacheControl?
httpx-cache/httpx_cache/transport.py
Line 103 in 7478478
Describe the bug
Concurrent access to cached file throws an exception
To Reproduce
import asyncio
import httpx_cache
import shutil
cache_folder = "./httpx-cache-tests/"
try:
shutil.rmtree(cache_folder)
except FileNotFoundError:
pass
async def worker(i):
print(f"Entering {i}")
async with httpx_cache.AsyncClient(cache=httpx_cache.FileCache(cache_folder)) as client:
result = await client.get("http://www.google.com")
print(f"Exiting {i}")
return result
# works; if my mental model is correct, this writes to the file 10x in a row
tasks = [worker(i) for i in range(10)]
foo = await asyncio.gather(*tasks, return_exceptions=False)
# raises an error, though it works if the number of tasks is small ??
tasks = [worker(i) for i in range(100)]
foo = await asyncio.gather(*tasks, return_exceptions=False)
On the first run, the above raises a FileNotFoundError:
FileNotFoundError Traceback (most recent call last)
Cell In [1], line 24
22 # raises an error, though it works if the number of tasks is small ??
23 tasks = [worker(i) for i in range(100)]
---> 24 foo = await asyncio.gather(*tasks, return_exceptions=False)
Cell In [1], line 14, in worker(i)
12 print(f"Entering {i}")
13 async with httpx_cache.AsyncClient(cache=httpx_cache.FileCache(cache_folder)) as client:
---> 14 result = await client.get("http://www.google.com")
15 print(f"Exiting {i}")
16 return result
File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx/_client.py:1751, in AsyncClient.get(self, url, params, headers, cookies, auth, follow_redirects, timeout, extensions)
1734 async def get(
1735 self,
1736 url: URLTypes,
(...)
1744 extensions: typing.Optional[dict] = None,
1745 ) -> Response:
1746 """
1747 Send a `GET` request.
1748
1749 **Parameters**: See `httpx.request`.
1750 """
-> 1751 return await self.request(
1752 "GET",
1753 url,
1754 params=params,
1755 headers=headers,
1756 cookies=cookies,
1757 auth=auth,
1758 follow_redirects=follow_redirects,
1759 timeout=timeout,
1760 extensions=extensions,
1761 )
File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx/_client.py:1527, in AsyncClient.request(self, method, url, content, data, files, json, params, headers, cookies, auth, follow_redirects, timeout, extensions)
1498 """
1499 Build and send a request.
1500
(...)
1512 [0]: /advanced/#merging-of-configuration
1513 """
1514 request = self.build_request(
1515 method=method,
1516 url=url,
(...)
1525 extensions=extensions,
1526 )
-> 1527 return await self.send(request, auth=auth, follow_redirects=follow_redirects)
File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx/_client.py:1614, in AsyncClient.send(self, request, stream, auth, follow_redirects)
1606 follow_redirects = (
1607 self.follow_redirects
1608 if isinstance(follow_redirects, UseClientDefault)
1609 else follow_redirects
1610 )
1612 auth = self._build_request_auth(request, auth)
-> 1614 response = await self._send_handling_auth(
1615 request,
1616 auth=auth,
1617 follow_redirects=follow_redirects,
1618 history=[],
1619 )
1620 try:
1621 if not stream:
File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx/_client.py:1642, in AsyncClient._send_handling_auth(self, request, auth, follow_redirects, history)
1639 request = await auth_flow.__anext__()
1641 while True:
-> 1642 response = await self._send_handling_redirects(
1643 request,
1644 follow_redirects=follow_redirects,
1645 history=history,
1646 )
1647 try:
1648 try:
File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx/_client.py:1679, in AsyncClient._send_handling_redirects(self, request, follow_redirects, history)
1676 for hook in self._event_hooks["request"]:
1677 await hook(request)
-> 1679 response = await self._send_single_request(request)
1680 try:
1681 for hook in self._event_hooks["response"]:
File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx/_client.py:1716, in AsyncClient._send_single_request(self, request)
1711 raise RuntimeError(
1712 "Attempted to send an sync request with an AsyncClient instance."
1713 )
1715 with request_context(request=request):
-> 1716 response = await transport.handle_async_request(request)
1718 assert isinstance(response.stream, AsyncByteStream)
1719 response.request = request
File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx_cache/transport.py:118, in AsyncCacheControlTransport.handle_async_request(self, request)
116 if self.controller.is_request_cacheable(request):
117 logger.debug(f"Checking cache for: {request}")
--> 118 cached_response = await self.cache.aget(request)
119 if cached_response is not None:
120 logger.debug(f"Found cached response for: {request}")
File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx_cache/cache/file.py:65, in FileCache.aget(self, request)
63 if await filepath.is_file():
64 async with RWLock().reader:
---> 65 cached = await filepath.read_bytes()
66 return self.serializer.loads(request=request, cached=cached)
67 return None
File ~/miniconda3/envs/dev/lib/python3.9/site-packages/anyio/_core/_fileio.py:505, in Path.read_bytes(self)
504 async def read_bytes(self) -> bytes:
--> 505 return await to_thread.run_sync(self._path.read_bytes)
File ~/miniconda3/envs/dev/lib/python3.9/site-packages/anyio/to_thread.py:31, in run_sync(func, cancellable, limiter, *args)
10 async def run_sync(
11 func: Callable[..., T_Retval],
12 *args: object,
13 cancellable: bool = False,
14 limiter: Optional[CapacityLimiter] = None
15 ) -> T_Retval:
16 """
17 Call the given function with the given arguments in a worker thread.
18
(...)
29
30 """
---> 31 return await get_asynclib().run_sync_in_worker_thread(
32 func, *args, cancellable=cancellable, limiter=limiter
33 )
File ~/miniconda3/envs/dev/lib/python3.9/site-packages/anyio/_backends/_asyncio.py:937, in run_sync_in_worker_thread(func, cancellable, limiter, *args)
935 context.run(sniffio.current_async_library_cvar.set, None)
936 worker.queue.put_nowait((context, func, args, future))
--> 937 return await future
File ~/miniconda3/envs/dev/lib/python3.9/site-packages/anyio/_backends/_asyncio.py:867, in WorkerThread.run(self)
865 exception: Optional[BaseException] = None
866 try:
--> 867 result = context.run(func, *args)
868 except BaseException as exc:
869 exception = exc
File ~/miniconda3/envs/dev/lib/python3.9/pathlib.py:1249, in Path.read_bytes(self)
1245 def read_bytes(self):
1246 """
1247 Open the file in bytes mode, read it, and close the file.
1248 """
-> 1249 with self.open(mode='rb') as f:
1250 return f.read()
File ~/miniconda3/envs/dev/lib/python3.9/pathlib.py:1242, in Path.open(self, mode, buffering, encoding, errors, newline)
1236 def open(self, mode='r', buffering=-1, encoding=None,
1237 errors=None, newline=None):
1238 """
1239 Open the file pointed by this path and return a file object, as
1240 the built-in open() function does.
1241 """
-> 1242 return io.open(self, mode, buffering, encoding, errors, newline,
1243 opener=self._opener)
File ~/miniconda3/envs/dev/lib/python3.9/pathlib.py:1110, in Path._opener(self, name, flags, mode)
1108 def _opener(self, name, flags, mode=0o666):
1109 # A stub for the opener argument to built-in open()
-> 1110 return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'httpx-cache-tests/9ebb612c68d6ddc348478732369ddd52e58ab325704a3beda73e90b8'
On subsequent runs in the same Jupyter notebook session, the same code raises an msgpack ExtraData error:
---------------------------------------------------------------------------
ExtraData Traceback (most recent call last)
Cell In [3], line 23
19 foo = await asyncio.gather(*tasks, return_exceptions=False)
22 tasks = [worker(i) for i in range(100)]
---> 23 foo = await asyncio.gather(*tasks, return_exceptions=False)
Cell In [3], line 14, in worker(i)
12 print(f"Entering {i}")
13 async with httpx_cache.AsyncClient(cache=httpx_cache.FileCache(cache_folder)) as client:
---> 14 result = await client.get("http://www.google.com")
15 print(f"Exiting {i}")
16 return result
File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx/_client.py:1751, in AsyncClient.get(self, url, params, headers, cookies, auth, follow_redirects, timeout, extensions)
1734 async def get(
1735 self,
1736 url: URLTypes,
(...)
1744 extensions: typing.Optional[dict] = None,
1745 ) -> Response:
1746 """
1747 Send a `GET` request.
1748
1749 **Parameters**: See `httpx.request`.
1750 """
-> 1751 return await self.request(
1752 "GET",
1753 url,
1754 params=params,
1755 headers=headers,
1756 cookies=cookies,
1757 auth=auth,
1758 follow_redirects=follow_redirects,
1759 timeout=timeout,
1760 extensions=extensions,
1761 )
File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx/_client.py:1527, in AsyncClient.request(self, method, url, content, data, files, json, params, headers, cookies, auth, follow_redirects, timeout, extensions)
1498 """
1499 Build and send a request.
1500
(...)
1512 [0]: /advanced/#merging-of-configuration
1513 """
1514 request = self.build_request(
1515 method=method,
1516 url=url,
(...)
1525 extensions=extensions,
1526 )
-> 1527 return await self.send(request, auth=auth, follow_redirects=follow_redirects)
File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx/_client.py:1614, in AsyncClient.send(self, request, stream, auth, follow_redirects)
1606 follow_redirects = (
1607 self.follow_redirects
1608 if isinstance(follow_redirects, UseClientDefault)
1609 else follow_redirects
1610 )
1612 auth = self._build_request_auth(request, auth)
-> 1614 response = await self._send_handling_auth(
1615 request,
1616 auth=auth,
1617 follow_redirects=follow_redirects,
1618 history=[],
1619 )
1620 try:
1621 if not stream:
File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx/_client.py:1642, in AsyncClient._send_handling_auth(self, request, auth, follow_redirects, history)
1639 request = await auth_flow.__anext__()
1641 while True:
-> 1642 response = await self._send_handling_redirects(
1643 request,
1644 follow_redirects=follow_redirects,
1645 history=history,
1646 )
1647 try:
1648 try:
File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx/_client.py:1679, in AsyncClient._send_handling_redirects(self, request, follow_redirects, history)
1676 for hook in self._event_hooks["request"]:
1677 await hook(request)
-> 1679 response = await self._send_single_request(request)
1680 try:
1681 for hook in self._event_hooks["response"]:
File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx/_client.py:1716, in AsyncClient._send_single_request(self, request)
1711 raise RuntimeError(
1712 "Attempted to send an sync request with an AsyncClient instance."
1713 )
1715 with request_context(request=request):
-> 1716 response = await transport.handle_async_request(request)
1718 assert isinstance(response.stream, AsyncByteStream)
1719 response.request = request
File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx_cache/transport.py:118, in AsyncCacheControlTransport.handle_async_request(self, request)
116 if self.controller.is_request_cacheable(request):
117 logger.debug(f"Checking cache for: {request}")
--> 118 cached_response = await self.cache.aget(request)
119 if cached_response is not None:
120 logger.debug(f"Found cached response for: {request}")
File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx_cache/cache/file.py:66, in FileCache.aget(self, request)
64 async with RWLock().reader:
65 cached = await filepath.read_bytes()
---> 66 return self.serializer.loads(request=request, cached=cached)
67 return None
File ~/miniconda3/envs/dev/lib/python3.9/site-packages/httpx_cache/serializer/common.py:172, in MsgPackSerializer.loads(self, cached, request)
168 def loads( # type: ignore
169 self, *, cached: bytes, request: tp.Optional[httpx.Request] = None
170 ) -> httpx.Response:
171 """Load an httpx.Response from a msgapck bytes."""
--> 172 return super().loads(cached=msgpack.loads(cached, raw=False), request=request)
File msgpack/_unpacker.pyx:201, in msgpack._cmsgpack.unpackb()
ExtraData: unpack(b) received extra data.
Expected behavior
Concurrent access to an already-cached file should be fine ?
Additional context
I probably brought this on myself
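For what it's worth, the traceback shows FileCache.aget entering `async with RWLock().reader:`, i.e. a fresh lock object constructed on every call, which cannot actually coordinate concurrent readers and writers. Independently of locking, a write-to-temp-then-rename pattern would keep readers from ever observing a partial or missing file; a minimal stdlib sketch (a mitigation idea, not the library's code):

```python
import os
import tempfile


def atomic_write_bytes(path, data):
    """Write to a temp file in the target's directory, then rename over
    the target. os.replace is atomic on POSIX and Windows, so concurrent
    readers see either the old file or the new one, never a torn write."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)
    except BaseException:
        # Best-effort cleanup of the temp file on failure.
        try:
            os.remove(tmp)
        except FileNotFoundError:
            pass
        raise
```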
Is your feature request related to a problem? Please describe.
I'd like to use the FileCache backend to store the cache in object storage (S3) instead of on the local disk. Since s3fs, which is built on top of fsspec, provides an abstract filesystem for S3, and universal_pathlib's UPath is a subclass of pathlib.Path that offers the same interface for any fsspec-backed file system, I was hoping to simply pass a UPath as the cache_dir of a FileCache and have it work automatically.
While this does work for the synchronous client:
import httpx_cache
from upath import UPath
bucket = UPath("s3://my-bucket")
cache_dir = bucket / "cache_test"
cache = httpx_cache.FileCache(cache_dir=cache_dir)
def main():
with httpx_cache.Client(cache=cache) as client:
response1 = client.get("https://httpbin.org/get")
response2 = client.get("https://httpbin.org/get")
return response2
main()
# Shows a new file created for the cache
print([f for f in cache_dir.iterdir()])
It doesn't work with the AsyncClient:
import asyncio
import httpx_cache
from upath import UPath
bucket = UPath("s3://my_bucket")
cache_dir = bucket / "async_cache"
cache = httpx_cache.FileCache(cache_dir=cache_dir)
async def async_main():
async with httpx_cache.AsyncClient(cache=cache) as client:
response1 = await client.get("https://httpbin.org/get")
response2 = await client.get("https://httpbin.org/get")
return response2
asyncio.run(async_main())
# No cache files are created
print([f for f in cache_dir.iterdir()])
If the async_cache prefix doesn't exist in the bucket already, I get a FileNotFoundError: my_bucket/async_cache.
To circumvent this, I created a dummy file to ensure the prefix existed. The script now runs; however, no cache files are created.
Hi,
Would you consider making your library available on conda?
Thanks!