Comments (3)
@AlSaeed proposed the following:
1. Retrieve the primary id of each row along with the time series.
2. Compute four lists of row ids (across time series), one for each value the direction will change to: -1, 0, 1, or NULL.
3. Run four batched queries that update only the direction field, one per list of ids from step 2.
4. When retrieving the stale time series, store them in a temporary table.
5. Update timestamp2 in a single query, using the temporary table from step 4.
6. For retrieval, if the data for all time series is expected to fit in memory at once, use the temporary table from step 4 to retrieve all the data, then use pandas to separate it into independent series.
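A minimal sketch of the batched update described in the list above, to make the query pattern concrete. It uses Python's built-in `sqlite3` as a stand-in for the production database, and the table name `signal_rows`, the temporary table `stale_ids`, the integer timestamps, and the function `apply_directions` are all hypothetical names chosen for illustration; only `direction` and `timestamp2` come from the proposal.

```python
import sqlite3
from collections import defaultdict


def apply_directions(conn, new_directions, now):
    """Write new direction values and refresh timestamp2 for the same rows.

    new_directions maps a row's primary id to -1, 0, 1, or None (NULL).
    """
    cur = conn.cursor()

    # Remember which rows were stale in a temporary table (hypothetical name).
    cur.execute("CREATE TEMP TABLE stale_ids (id INTEGER PRIMARY KEY)")
    cur.executemany(
        "INSERT INTO stale_ids (id) VALUES (?)",
        [(row_id,) for row_id in new_directions],
    )

    # Group row ids by the direction value they will receive.
    ids_by_direction = defaultdict(list)
    for row_id, direction in new_directions.items():
        ids_by_direction[direction].append(row_id)

    # One UPDATE per direction value, touching only the direction column.
    for direction, ids in ids_by_direction.items():
        placeholders = ",".join("?" * len(ids))
        cur.execute(
            f"UPDATE signal_rows SET direction = ? WHERE id IN ({placeholders})",
            [direction, *ids],
        )

    # A single query refreshes timestamp2 for every row in the temporary table.
    cur.execute(
        "UPDATE signal_rows SET timestamp2 = ? "
        "WHERE id IN (SELECT id FROM stale_ids)",
        (now,),
    )
    cur.execute("DROP TABLE stale_ids")
    conn.commit()


# Usage with an in-memory database and five made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE signal_rows ("
    "id INTEGER PRIMARY KEY, time_value INTEGER, value REAL, "
    "direction INTEGER, timestamp2 INTEGER)"
)
conn.executemany(
    "INSERT INTO signal_rows VALUES (?, ?, ?, ?, ?)",
    [
        (1, 20200601, 4.1, 0, 0),
        (2, 20200602, 4.0, 0, 0),
        (3, 20200603, 4.1, 0, 0),
        (4, 20200604, 4.2, 0, 0),
        (5, 20200605, 5.0, None, 0),
    ],
)
apply_directions(conn, {1: 1, 2: 1, 3: 1, 4: 1, 5: 1}, now=1591315200)
```

With very large id lists, each `IN (...)` query would likely need to be split into chunks to stay under the database's parameter limits; that is omitted here for brevity.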
This proposal will work, with some modifications to support the upcoming shift to include issue dates. I've consulted with @jacobbien, who suggests that direction is really a computed product rather than raw data, since it is relatively complex to compute: it's the slope of a line fit to all values from the previous 7 days, thresholded based on the variance of historical data. Therefore, the direction column should be updated in place, without creating a new issue for the affected time point.
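To illustrate why direction counts as a computed product, here is a rough sketch of a slope-and-threshold calculation. It is not the project's actual implementation: the function names are hypothetical, the values are assumed to be evenly spaced daily observations with no gaps, the window is assumed to be the last 7 entries of the list, and `threshold` is simply passed in, because how it is derived from the variance of historical data is not specified here.

```python
def fit_slope(values):
    """Least-squares slope of values against their positions 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        return None  # a line cannot be fit to fewer than two points
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    return sxy / sxx


def direction(values, threshold, window=7):
    """Return 1, -1, 0, or None (not enough data) for the latest time point.

    threshold is a hypothetical stand-in for the variance-based cutoff.
    """
    slope = fit_slope(values[-window:])
    if slope is None:
        return None
    if slope > threshold:
        return 1
    if slope < -threshold:
        return -1
    return 0
```

Because each result depends on a window of neighbouring values, one new observation can change the direction of several nearby time points at once.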
As an example, consider the following input to the direction updater:
geo value | value | time value | issue | direction | direction timestamp |
---|---|---|---|---|---|
ca | 4.1 | 20200601 | 20200601 | 0 | stale |
ca | 4.0 | 20200602 | 20200602 | 0 | stale |
ca | 4.1 | 20200603 | 20200603 | 0 | stale |
ca | 4.2 | 20200604 | 20200604 | 0 | stale |
ca | 5.0 | 20200605 | 20200605 | null | stale |
The proposed solution updates the direction of the existing rows in place:
geo value | value | time value | issue | direction | direction timestamp |
---|---|---|---|---|---|
ca | 4.1 | 20200601 | 20200601 | 1 | fresh |
ca | 4.0 | 20200602 | 20200602 | 1 | fresh |
ca | 4.1 | 20200603 | 20200603 | 1 | fresh |
ca | 4.2 | 20200604 | 20200604 | 1 | fresh |
ca | 5.0 | 20200605 | 20200605 | 1 | fresh |
The alternative creates a new issue for each affected time point, leaving the stale rows untouched:
geo value | value | time value | issue | direction | direction timestamp |
---|---|---|---|---|---|
ca | 4.1 | 20200601 | 20200601 | 0 | stale |
ca | 4.1 | 20200601 | 20200605 | 1 | fresh |
ca | 4.0 | 20200602 | 20200602 | 0 | stale |
ca | 4.0 | 20200602 | 20200605 | 1 | fresh |
ca | 4.1 | 20200603 | 20200603 | 0 | stale |
ca | 4.1 | 20200603 | 20200605 | 1 | fresh |
ca | 4.2 | 20200604 | 20200604 | 0 | stale |
ca | 4.2 | 20200604 | 20200605 | 1 | fresh |
ca | 5.0 | 20200605 | 20200605 | 1 | fresh |
from delphi-epidata.
@melange396, barring unforeseen complications we're expecting a PR for query optimizations on this tomorrow -- if database calls turn out to be the top contributor in your profiling efforts, ignore them for now.
First pass of implementation is in #133