Comments (6)
@aaraney Increasing max_sites_per_request to 50 fixed the issue.
Of note: I haven't been able to get this error to appear on all systems. I expect this is related to the performance of sqlite on different systems? Do you recall how we settled on the default of 20 sites per request?
from hydrotools.
Barring submission of another issue, it's probably safe to close this one. The default of 20 sites per request likely covers most users, so changing it might do more harm than good.
Note that the machine on which I got this error was a virtual machine running an ancient OS, stored on an HDD, and frequently subject to scans that eat up a significant portion of system resources. So, quite an extraordinary case, remedied by a simple solution.
Thanks for the help!
Hmmm, thanks for reporting this @jarq6c! This is a little surprising given that we are not using multiprocessing. This smells like an upstream bug, but I'll need to do a little digging to verify that.
For a little context, sqlite, the db used by nwis_client to cache HTTP requests, uses a multiple-reader, single-writer model: in its default (rollback-journal) mode, when a writer starts a write, the whole db is locked. Consequently, any readers or writers that try to access the db while the lock is held error out. Libraries that interact with sqlite often use a spin lock (a busy-wait retry loop) to get around this.
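To make the locking behaviour concrete, here is a minimal sketch using Python's standard-library sqlite3 module. It is not the nwis_client cache code; the file name `cache.sqlite`, the `responses` table, and the `demonstrate_lock` helper are made up for illustration. It assumes sqlite's default rollback-journal mode, where an exclusive lock blocks other connections' reads as well as writes.

```python
import os
import sqlite3
import tempfile


def demonstrate_lock() -> list:
    """Hold an exclusive lock on one connection, then try to read and write
    through a second connection that is told not to wait for the lock."""
    errors = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cache.sqlite")  # hypothetical cache file
        # isolation_level=None: autocommit mode, so BEGIN/COMMIT below are
        # the only explicit transaction.
        writer = sqlite3.connect(path, isolation_level=None)
        # timeout=0 disables sqlite3's built-in busy wait, so a held lock
        # fails immediately instead of being retried.
        other = sqlite3.connect(path, isolation_level=None, timeout=0)
        try:
            writer.execute("CREATE TABLE responses (key TEXT PRIMARY KEY, body BLOB)")
            # In rollback-journal mode an EXCLUSIVE lock shuts out every
            # other connection, readers included.
            writer.execute("BEGIN EXCLUSIVE")
            writer.execute("INSERT INTO responses VALUES ('a', x'00')")
            for sql in (
                "SELECT count(*) FROM responses",           # a reader
                "INSERT INTO responses VALUES ('b', x'01')",  # a second writer
            ):
                try:
                    other.execute(sql).fetchall()
                except sqlite3.OperationalError as exc:
                    errors.append(str(exc))
            writer.execute("COMMIT")
        finally:
            other.close()
            writer.close()
    return errors


if __name__ == "__main__":
    print(demonstrate_lock())
```

Raising `timeout` above zero makes sqlite3 keep retrying for up to that many seconds before raising, which is the busy-wait style workaround described above.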
This error is surprising to me because we are using an async approach to retrieving data instead of threads or multiprocessing, and I would expect writing to the cache to be a blocking operation. I suspect what is happening is many async writes to the cache (db), some of which may be large and take more time, with no logic to enforce sqlite's single-writer constraint. That is, you need some way of tracking that a write to the db is in progress. Example solutions are a read-write lock on the db connection object or a retry-with-backoff strategy.
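A minimal sketch of the retry-with-backoff option, assuming each cache write is wrapped in an async callable that raises the standard `sqlite3.OperationalError` when the db is locked. `retry_on_locked` and its parameters are hypothetical names, not part of hydrotools or aiosqlite.

```python
import asyncio
import random
import sqlite3
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_on_locked(
    operation: Callable[[], Awaitable[T]],
    max_attempts: int = 5,
    base_delay: float = 0.05,
) -> T:
    """Await operation(), retrying with exponential backoff while sqlite
    reports that the database is locked. Other errors propagate at once."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return await operation()
        except sqlite3.OperationalError as exc:
            # Give up on anything that is not lock contention, or on the last try.
            if "locked" not in str(exc) or attempt == max_attempts - 1:
                raise
            # Wait base_delay * 2**attempt plus random jitter so competing
            # writers do not all retry at the same moment.
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay))
    raise AssertionError("unreachable")
```

The read-write-lock option would instead have every writer acquire one shared lock before touching the db; within a single event loop, `asyncio.Lock` (plain mutual exclusion rather than a true read-write lock) is the closest standard-library primitive.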
I'll do some digging later this morning, try to find a way around this, and report back. Thanks again, @jarq6c!
In omnilib/aiosqlite#234 (the issue I previously linked), a provided solution was to limit the concurrency of asyncio.gather.
aiosqlite does not use a connection pool. This is likely a limit of the underlying sqlite3 implementation. I'd suggest using a concurrency-limited gather, like aioitertools.asyncio.gather(..., limit=X), especially with many concurrent writes, where only one connection would ever be able to write to a shared database anyways.
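The suggestion relies on aioitertools; as a dependency-free alternative, the same idea can be sketched with a standard-library `asyncio.Semaphore`. `limited_gather` is a hypothetical helper for illustration, not the aioitertools implementation and not part of _restclient.

```python
import asyncio
from typing import Awaitable, Iterable, List, TypeVar

T = TypeVar("T")


async def limited_gather(aws: Iterable[Awaitable[T]], limit: int) -> List[T]:
    """Like asyncio.gather, but at most `limit` awaitables run at once.
    Results come back in input order. Pass coroutine objects, not Tasks:
    a Task starts running as soon as it is created, before any slot is free."""
    if limit < 1:
        raise ValueError("limit must be >= 1")
    semaphore = asyncio.Semaphore(limit)

    async def run(aw: Awaitable[T]) -> T:
        # Wait for a free slot before awaiting the wrapped coroutine.
        async with semaphore:
            return await aw

    return await asyncio.gather(*(run(aw) for aw in aws))
```

With `limit=1` every write is serialized, which matches sqlite's single-writer constraint at the cost of losing concurrency for the network side as well.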
asyncio.gather is used in _restclient's mget method, which nwis_client uses, to retrieve requests concurrently. I am not keen on placing a concurrency limit at the library level like was suggested above because it could hinder performance for some users depending on their hardware/software. I could be talked into adding an optional keyword argument that exposes this low-level detail if we keep running into it. However, I think the most viable solution is to increase the number of sites returned in each request. You can pass the max_sites_per_request keyword argument (current default is 20 sites) to IVDataService.get to change the request chunking size. I think increasing this to something like 50 or 75 will reduce the likelihood of running into this problem. @jarq6c, can you try this and see if you continue to run into the same issue?
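To see why a larger chunk size helps, a rough sketch of the arithmetic: if sites are split into groups of at most max_sites_per_request, the number of requests (and so of cache writes that can pile up concurrently) is ceil(n_sites / max_sites_per_request). `chunk_sites` is a hypothetical helper; it assumes the chunking behaves like this and is not _restclient's actual code. In hydrotools itself the change is only the keyword, e.g. `IVDataService.get(..., max_sites_per_request=50)`.

```python
import math
from typing import List, Sequence


def chunk_sites(sites: Sequence[str], max_sites_per_request: int) -> List[List[str]]:
    """Split site codes into groups of at most max_sites_per_request;
    each group would become one HTTP request (and one cache write)."""
    if max_sites_per_request < 1:
        raise ValueError("max_sites_per_request must be >= 1")
    return [
        list(sites[i : i + max_sites_per_request])
        for i in range(0, len(sites), max_sites_per_request)
    ]


if __name__ == "__main__":
    sites = [f"{n:08d}" for n in range(1000)]  # 1,000 placeholder site codes
    for size in (20, 50, 75):
        chunks = chunk_sites(sites, size)
        assert len(chunks) == math.ceil(len(sites) / size)
        print(size, len(chunks))
```

For 1,000 sites this gives 50, 20, and 14 requests for chunk sizes of 20, 50, and 75, so moving from 20 to 50 cuts the number of concurrent cache writes by more than half.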
Great!
Of note: I haven't been able to get this error to appear on all systems. I expect this is related to the performance of sqlite on different systems?
I think it's less about sqlite's performance per se and more about I/O speed: for example, an SSD vs. an HDD at the hardware level, and the OS user-space/kernel-space switching done during reads and writes at the software level.
Do you recall how we settled on default number of 20 sites per request?
I'll have to check whether I still have the numbers somewhere, but it was a back-of-the-napkin calculation. I requested data for something like 10, 20, 40, 60, 80, 100, and 500 sites and compared full request-response round-trip times; 20 was, on average, the fastest.
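A back-of-the-napkin comparison like this can be sketched as a small timing harness. `fetch` is a hypothetical callable supplied by the caller (for example, retrieving a fixed set of sites in chunks of `size`); neither function is part of hydrotools, and the harness measures only wall-clock time.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, List


def time_chunk_sizes(
    fetch: Callable[[int], object], sizes: Iterable[int], repeats: int = 3
) -> Dict[int, List[float]]:
    """Call fetch(size) `repeats` times for each size, recording seconds."""
    timings: Dict[int, List[float]] = {}
    for size in sizes:
        timings[size] = []
        for _ in range(repeats):
            start = time.perf_counter()
            fetch(size)
            timings[size].append(time.perf_counter() - start)
    return timings


def fastest_on_average(timings: Dict[int, List[float]]) -> int:
    """Return the size with the lowest mean time."""
    return min(timings, key=lambda size: statistics.mean(timings[size]))
```

Repeating each size a few times and comparing means smooths out one-off network hiccups, though results will still vary with the machine and connection they are run on.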
Starting to dig now. It looks like others have run into similar problems when using aiosqlite. See omnilib/aiosqlite#234.
Related Issues
- NWIS IV Client `FutureWarning`
- NWM Client New Test Failure: AttributeError: 'EntryPoints' object has no attribute 'get'
- Pandas >= 2.0.0 package compliance audit
- Move `hydrotools` namespace packages to separate repositories
- "Run Slow Unit Tests" Action has been failing for some time
- 3.7 Tests failing: xarray EntryPoints has no attribute get
- DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace
- AWS Retrospective
- SVI Client slow unit tests failing
- nwm_client_new documentation is incomplete for private servers.
- nwm_client_new `get` methods fails with custom Parquet Store
- Consider supporting MS Azure (`nwm_client_new`)
- Determine feasibility of _restclient's continued dependence on `aiohttp_cache_client`
- SVI Client get method failing due to Pydantic>2 issue
- New version of `_restclient` cannot be pushed to PyPI b.c. namespace packages with leading `_` in package name cannot be uploaded
- Add some basic information about the NWM operational configuration to the `nwm_client_new` package.
- Event Detection methods are raising `FutureWarning`
- question about update cycle for hydrotools
- NWPS API Available