GithubHelp home page GithubHelp logo

gfn-cis / bq-adet Goto Github PK

View Code? Open in Web Editor NEW
0.0 0.0 0.0 37 KB

Automatic Anomaly Decetor

Home Page: https://gfn.ru

HCL 81.82% PLpgSQL 12.87% Python 5.31%
machinelearning bigquery bigqueryml statistics anomaly-detection

bq-adet's Introduction

bq-adet Automatic Anomaly Detection

Habr post in Russian explaining the purpose of Adet for https://habr.com/ru/post/599645/

Deployment

gcloud auth application-default login --scopes=https://www.googleapis.com/auth/drive,https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/bigquery

Above might fail, so you can just use the

gcloud auth  login --update-adc --enable-gdrive-access  
gcloud config set project your_project
gcloud iam service-accounts create adet-deployment --description="adet-deployment" --display-name="adet-deployment"

Upload Demo Adet config.xls to the Google Drive and convert it to the Google Sheets format

Grant adet-deployment service account viewers access to the Google Sheets

copy tf/variables.tf.example to the variables.tf and write your infrastructure parameters

deploy the infrastructure

cd tf 
terraform init
terraform apply 

During the apply, some API might not be enabled yet : BigQuery DataTransfer Service. Just follow the links from terraform error report and enable required APIs and re-run apply

Gaps in data and High-Order data

Sometimes you have to analyze the high-order data (averages and other), which are relying on other data. In this case, when data is absent for some date, Adet has nothing to analyze - the is no subject for analysis. For example, if there are no sales of particular good for some day, but in present day there was a bunch of sales, Adet will not react on this because there is no record in timeseries. To solve this problem, you need to fill such data with zeroes (e.g. fill the gaps). The function adet_no_gaps_view_ddl is to help with this task.

The call example:

select adet.adet_no_gaps_view_ddl("GroupingColumn1, GroupingColumn2", "metric_column", "date_column", min_metric, 'data_source').ddl

Will return the DDL for the view where data are grouped by GroupingColumn1 and GroupingColumn2, with gaps in metric_column, filled with zeros in each group.

Lower-order metric in some groups might not be worthy to analyze. When for some good you run 1K sales/daily, and for others - 1-3 in week, it could be good choice to remove these low-sale groups from the analysis for the matter of saving money on ML charges and reduce type 1 errors amount. But, when some outsider became high-seller, it should be analyzed. For such case, the min_metric argument is used. Specify the minimum metric to include in the analysis. When metric in some group reaches the minimum threshold, all past metrics in timeline for this group will be included.

Using as Terraform module

module "adet" {
  source     = "git::https://github.com/GFNRussia/bq-adet.git//tf"
  dataset_id = "adet"
  location = "EU" #or another location
  config_file_id        = "https://docs.google.com/spreadsheets/d/rest of the sheets url"
  project_id            = "google project id"
  region                = "europe-west1"
  service_account_email = "[email protected]"
}

bq-adet's People

Contributors

akmgfn avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.