Comments (6)

dmitrii-khr commented on June 7, 2024

wrong repo

from elementary.

dmitrii-khr commented on June 7, 2024

elementary-data/dbt-data-reliability#711

haritamar commented on June 7, 2024

Hi @dmitrii-khr - we actually prefer to keep Elementary issues concentrated in this repo, so it's good that you opened it here. I'll close the other one.

Currently, this is expected behavior, though I understand the confusion.
When you use a detection period of 1 day, we actually include the last full bucket as the detection - in that sense the test will work only on yesterday and not on today.
So it's not actually a bug. However, daily buckets currently always run from midnight to midnight, not over the "last 24 hours".

If you'd like to make the test more real-time, you may want to consider shortening the time bucket to less than a day. For example, hourly buckets can be set like this:

time_bucket:
    period: hour
    count: 1
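
To illustrate why a daily bucket never covers today while an hourly bucket gets much closer to the present, here is a minimal Python sketch. It is not Elementary's implementation: the helper name `last_full_bucket` is hypothetical, and it assumes buckets are aligned to midnight and that only complete buckets are considered, as described above.

```python
from datetime import datetime, timedelta

def last_full_bucket(now, bucket):
    """Return (start, end) of the most recent complete bucket.

    Hypothetical helper: assumes buckets are aligned to midnight
    and that the bucket size divides a day evenly.
    """
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    # Whole buckets that have fully elapsed since midnight today.
    elapsed = (now - midnight) // bucket
    end = midnight + elapsed * bucket
    return end - bucket, end

now = datetime(2024, 5, 21, 14, 30)
daily = last_full_bucket(now, timedelta(days=1))
hourly = last_full_bucket(now, timedelta(hours=1))
print(daily)   # yesterday, midnight to midnight
print(hourly)  # the hour that ended most recently today
```

With a daily bucket the most recent complete bucket ends at midnight today, so data from today is excluded; with an hourly bucket it ends at the top of the current hour.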

Please let me know if this makes sense.
Thanks!

dmitrii-khr commented on June 7, 2024

Hi!
Thank you for the response!

> When you use a detection period of 1 day, we actually include the last full bucket as the detection - in that sense the test will work only on yesterday and not on today.

What I see in the logs works differently.

Let's consider an example.
Column anomaly test. Timestamp column with time. No detection delay. Data is updated many times a day including the current day (2024-05-21).
The test considers complete buckets only. The first query determines the following bucket boundaries:
[image: screenshot of the bucket boundaries]
Notice that today's values will not be in any bucket at all.

Following the logs and intermediate results further, we get the following picture:
[image: screenshot of the intermediate results]
The bucket from 2024-05-19 to 2024-05-20 is marked as is_anomalous. It is not the last full bucket. It is not yesterday; it is 2 days ago.

This happens because of the non-strict (>=) condition in the anomaly_scores_with_is_anomalous CTE:
and bucket_end >= dateadd(day, cast('-1' as integer), cast(max_bucket_end as timestamp))
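
The effect of the non-strict comparison can be reproduced outside SQL. The sketch below is an illustration, not Elementary's code: the bucket end values are assumed from the example above (today is 2024-05-21, so the last complete daily bucket ends at midnight on 2024-05-21), and the variable names are made up for this example.

```python
from datetime import datetime, timedelta

# Assumed ends of the complete daily buckets in the example.
bucket_ends = [datetime(2024, 5, day) for day in (19, 20, 21)]
max_bucket_end = max(bucket_ends)
detection_period = timedelta(days=1)

# Mirrors dateadd(day, -1, max_bucket_end) in the CTE.
cutoff = max_bucket_end - detection_period

# Non-strict comparison, as in the current condition.
non_strict = [end for end in bucket_ends if end >= cutoff]
# Strict comparison, as the proposed alternative.
strict = [end for end in bucket_ends if end > cutoff]

print(non_strict)  # also keeps the bucket ending 2024-05-20
print(strict)      # keeps only the bucket ending 2024-05-21
```

Under these assumptions, `>=` lets the bucket ending exactly at the cutoff (2024-05-19 to 2024-05-20) into the detection period, while `>` keeps only the last full bucket.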

Overall, the test fails:
[image: screenshot of the failed test result]

dmitrii-khr commented on June 7, 2024

Any thoughts?

haritamar commented on June 7, 2024

Hi @dmitrii-khr,
Thanks for sharing and for the details.
I agree that this is a bug.

Would you like to contribute the fix? (I imagine this is just changing the condition as you wrote and testing that it produces the required results.)
