Comments (6)
wrong repo
from elementary.
elementary-data/dbt-data-reliability#711
Hi @dmitrii-khr - we actually prefer Elementary issues to be concentrated in this repo, so it's good that you opened it here. I'll close the other one.
Currently, this is expected behavior, though I understand the confusion.
When you use a detection period of 1 day, we actually include the last full bucket as the detection - in that sense the test will work only on yesterday and not on today.
So it's actually not a bug - however daily buckets are currently always from midnight to midnight, and not the "last 24 hours".
If you'd like to make the test more real-time, you may want to consider decreasing the time bucket from daily to less than that. For example, hourly buckets can be set like this:
time_bucket:
  period: hour
  count: 1
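To illustrate the difference between daily and hourly buckets described above, here is a minimal Python sketch. It is not Elementary's implementation: `last_full_bucket` is a hypothetical helper, and it only assumes that buckets are aligned to midnight (daily) or to the top of the hour (hourly) and that only complete buckets are used.

```python
from datetime import datetime, timedelta


def last_full_bucket(now, period):
    """Return (start, end) of the most recent complete bucket before `now`.

    Hypothetical helper for illustration only: daily buckets are assumed to
    run midnight to midnight, hourly buckets from the top of one hour to the next.
    """
    if period == "day":
        end = now.replace(hour=0, minute=0, second=0, microsecond=0)
        size = timedelta(days=1)
    elif period == "hour":
        end = now.replace(minute=0, second=0, microsecond=0)
        size = timedelta(hours=1)
    else:
        raise ValueError(f"unsupported period: {period}")
    return end - size, end


now = datetime(2024, 5, 21, 15, 30)
daily = last_full_bucket(now, "day")    # covers all of 2024-05-20
hourly = last_full_bucket(now, "hour")  # covers 14:00 to 15:00 on 2024-05-21
```

With a daily bucket, a run at 15:30 evaluates yesterday as a whole; with an hourly bucket, the same run evaluates the hour that ended at 15:00 today.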
Please let me know if this makes sense.
Thanks!
Hi!
Thank you for the response!
"When you use a detection period of 1 day, we actually include the last full bucket as the detection - in that sense the test will work only on yesterday and not on today."
What I can see from the logs works differently.
Let's consider an example.
Column anomaly test. Timestamp column with time. No detection delay. Data is updated many times a day including the current day (2024-05-21).
Test considers complete buckets only. The first query determines the following boundaries for buckets:
Notice that today's values are not going to be in the buckets at all.
Going forward with logs and intermediate results we can get the following picture:
The bucket 2024-05-19 to 2024-05-20 is marked as is_anomalous. It is not the last full bucket. It is not yesterday, it is 2 days ago.
It happens because of the non-strict condition in the anomaly_scores_with_is_anomalous CTE:
and bucket_end >= dateadd(day, cast('-1' as integer), cast(max_bucket_end as timestamp))
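The effect of the non-strict comparison can be reproduced with a small Python sketch. This is only an illustration of the condition quoted above, not Elementary's code: `detection_buckets` is a hypothetical function, the bucket end dates are taken from the 2024-05-21 example, and the strict variant is just one possible fix.

```python
from datetime import datetime, timedelta


def detection_buckets(bucket_ends, detection_days=1, strict=False):
    """Return the bucket ends that fall inside the detection period.

    Mirrors the quoted condition: bucket_end >= max_bucket_end - detection_days.
    With strict=True the comparison becomes > instead of >=.
    """
    max_bucket_end = max(bucket_ends)
    cutoff = max_bucket_end - timedelta(days=detection_days)
    if strict:
        return [end for end in bucket_ends if end > cutoff]
    return [end for end in bucket_ends if end >= cutoff]


# Ends of the complete daily buckets when the test runs on 2024-05-21.
bucket_ends = [datetime(2024, 5, day) for day in (19, 20, 21)]

non_strict = detection_buckets(bucket_ends)            # two buckets selected
strict = detection_buckets(bucket_ends, strict=True)   # only the last full bucket
```

With `>=`, the bucket ending on 2024-05-20 (that is, 2024-05-19 to 2024-05-20) also passes the filter, so a 1-day detection period covers two buckets. With `>`, only the bucket ending on 2024-05-21 remains.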
Any thoughts?
Hi @dmitrii-khr ,
Thanks for sharing and for the details.
I agree that this is a bug.
Would you like to contribute the fix? (I imagine this is just changing the condition as you wrote + testing that it produces the required results)