snowflake-labs / dbt_constraints
This package generates database constraints based on the tests in a dbt project
License: Apache License 2.0
It would be nice to be able to override the generated name of a constraint when desired.
Since PostgreSQL has a predefined limit on identifier length (63 bytes), there is a chance that we run into duplicate constraint names.
A possible way to handle it: generate a unique constraint name - #18
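A hedged sketch of what an override could look like in schema YAML, reusing the constraint_name argument that already appears in other issues on this page (the model and column names here are hypothetical):

```yaml
models:
  - name: my_model        # hypothetical model name
    columns:
      - name: id
        tests:
          - dbt_constraints.primary_key:
              # explicit name instead of a generated MY_MODEL_ID_PK
              constraint_name: my_model_pkey
```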
As a developer, I would like dbt_constraints to add constraints for my tests on seeds, so that I can benefit from join elimination in queries on my seed tables.
Because NOT NULL is enforced by Snowflake, I can rely on it. So if I have this constraint, running the test is pointless.
On the other hand in some cases we might want to NOT set any constraint, but still run the test, just to asses quality of the upstream data, but not interrupt our transformations in case of nulls.
To sum up, I would like to be able to choose any of combinations:
You can now define Foreign and Primary keys on table columns in BigQuery. Would love to see this implemented 🙏 ❤️
Google docs regarding adding FK and PK constraints.
https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#alter_table_add_constraint_statement
Queries to pull FK and PK constraint information out of the schema:
https://cloud.google.com/bigquery/docs/information-schema-constraint-column-usage
SELECT *
FROM <PROJECT_ID>.<DATASET>.INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE;
https://cloud.google.com/bigquery/docs/information-schema-key-column-usage
SELECT *
FROM <PROJECT_ID>.<DATASET>.INFORMATION_SCHEMA.KEY_COLUMN_USAGE;
https://cloud.google.com/bigquery/docs/information-schema-table-constraints
SELECT *
FROM <PROJECT_ID>.<DATASET>.INFORMATION_SCHEMA.TABLE_CONSTRAINTS;
General spot for BQ schema information: https://cloud.google.com/bigquery/docs/information-schema-intro
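For reference, a minimal sketch of the BigQuery DDL from the documentation linked above (dataset, table, and column names are hypothetical; BigQuery constraints must be declared NOT ENFORCED):

```sql
-- Unenforced primary key on the parent table
ALTER TABLE my_dataset.orders
  ADD PRIMARY KEY (order_id) NOT ENFORCED;

-- Unenforced foreign key on the child table
ALTER TABLE my_dataset.order_items
  ADD CONSTRAINT fk_order FOREIGN KEY (order_id)
  REFERENCES my_dataset.orders (order_id) NOT ENFORCED;
```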
The macro create_constraints_by_type contains a check for, among other things, the type of materialization (table, incremental, snapshot, and seed) when determining if the table resulting from a model will have constraints created for it. This prevents dbt_constraints from creating constraints on tables created by custom materialization types.
Please add the ability to specify additional materialization types for which constraints can be created (or remove the check entirely?).
Thanks.
The always_create_constraint config works only with thresholds (warn_if and error_if). It does not force the constraint when only severity: warn is set.
-- Not setting the FK
-- Setting the FK
It appears that the Not Null constraints are done last:
https://github.com/Snowflake-Labs/dbt_constraints/blob/main/macros/create_constraints.sql#L146
Some databases (SQL Server at least) do not allow ALTER COLUMN on a column that another constraint (PK, UK, FK) is dependent on. You'll get an error like "The object 'SOME_TABLE_SOME_COLUMN_UK' is dependent on column 'SOME_COLUMN'".
Additionally, some databases will not allow Primary Keys on a Nullable column.
Does it make sense to have Not Null done first?
The Please Note section of the dbt_constraints README.md explains that "When you add this package, dbt will automatically begin to create ... not null constraints for not_null tests." The Disabling automatic constraint generation subsection details how not null constraints can be disabled, but only for sources.
There is no functionality to disable constraint creation for other upstream model types such as snapshots. Currently, my organization's dbt project has not_null tests defined for most snapshots. Constraints should be created from development through production, with the option to enable or disable constraint creation for raw/source and snapshot data.
Hey
Thanks for publishing this!
I noticed that it doesn't seem to work when a model is published with an alias.
I think that
create_constraints.sql:203
{%- set table_relation = adapter.get_relation(
database=table_models[0].database,
schema=table_models[0].schema,
identifier=table_models[0].name) -%}
should become
...
identifier=table_models[0].alias) -%}
with similar changes on lines 243 and 248 in the same file.
I wonder if there could be a better way of getting a relation from a node/model than adapter.get_relation? I couldn't find one, but surely it must exist...
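For context, a hedged sketch of building a relation directly from a node's fields with dbt's built-in api.Relation.create, which respects the alias (whether this covers quoting and caching as well as adapter.get_relation is an open question):

```jinja
{%- set table_relation = api.Relation.create(
        database=table_models[0].database,
        schema=table_models[0].schema,
        identifier=table_models[0].alias) -%}
```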
As Philipp Leufke pointed out to me in the #db-snowflake Slack channel:
I wonder how this would work with models that have soft-deleted rows.
Our incremental pattern heavily makes use of soft deletes and any uniqueness or relationship test is set to ignore the soft deleted rows.
So, the FK is only true if these rows are excluded.
This is, in our case, achieved by mart models which have such a filter set and which are materialized as views. However, as I understand, this dbt package will ignore such views...
I am strongly considering excluding constraint creation for any tests with a where config property. I can still allow such filters in my tests, but relational database constraints can't set conditions. FK and UK constraints should exclude NULL values, but that is directly in the ANSI SQL standards and is part of the SQL I use for my unique_key and foreign_key tests. dbt also adds such logic in its default unique and relationships tests:
{% macro default__test_unique(model, column_name) %}

select
    {{ column_name }} as unique_field,
    count(*) as n_records
from {{ model }}
where {{ column_name }} is not null
group by {{ column_name }}
having count(*) > 1

{% endmacro %}
{% macro default__test_relationships(model, column_name, to, field) %}

with child as (
    select {{ column_name }} as from_field
    from {{ model }}
    where {{ column_name }} is not null
),

parent as (
    select {{ field }} as to_field
    from {{ to }}
)

select from_field
from child
left join parent
    on child.from_field = parent.to_field
where parent.to_field is null

{% endmacro %}
We only verify privileges for sources, and there is a test for whether you have the REFERENCES privilege on the parent table of a FK. Unfortunately, when you have OWNERSHIP, the Snowflake metadata view doesn't also include REFERENCES.
I'm not sure if this was a change in Snowflake behavior, but the fix will simply be to update the privilege query to allow both OWNERSHIP and REFERENCES.
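A rough illustration of the shape such a privilege check could take against Snowflake's standard INFORMATION_SCHEMA.TABLE_PRIVILEGES view (this is a sketch, not the package's actual query; the schema and table names are hypothetical):

```sql
select count(*) > 0 as has_required_privilege
from information_schema.table_privileges
where table_schema = 'MY_SCHEMA'
  and table_name = 'PARENT_TABLE'
  -- accept either privilege, since OWNERSHIP implies REFERENCES in practice
  and privilege_type in ('REFERENCES', 'OWNERSHIP');
```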
Foreign key constraints are inherently double-ended: they have the source model (originating-end) where the FK is declared, and there is the target model (receiving-end) to the PK of which the relationship refers.
When defining a test constraint using the dbt_constraints.foreign_key package between models A (fk) -> (pk) B, running all tests on the receiving-end model (i.e. dbt test -m B) triggers the testing of the foreign key constraint as well.
For an orchestration scenario that I am prototyping, I would however prefer to avoid running the receiving-end FK test. Thus when testing model A I want the FK test to be evaluated, yet when testing model B I do not want the FK test to be evaluated again.
Is this feature covered? If so, how?
I have the workaround to parse the DBT manifest and explicitly define which tests to run (on model B, in the example above, after I exclude the receiving-ends of FK tests) yet I would be interested in a more elegant solution.
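One possible direction, offered as an assumption about this workflow rather than a confirmed feature: dbt's test_name selector method can exclude generic tests by name, so a YAML selector could carve the FK tests out of the receiving-end run (the selector name is hypothetical; B is the receiving-end model from the example above):

```yaml
# selectors.yml
selectors:
  - name: test_b_without_fk
    definition:
      union:
        - method: fqn
          value: B
        - exclude:
            - method: test_name
              value: foreign_key
```

The one-off CLI equivalent would be dbt test -s B --exclude "test_name:foreign_key".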
Minor enhancement: accept conditional filtering on the child side of the foreign key test (similar to the dbt_utils.relationships_where test).
The specific scenario is testing referential integrity between Data Vault Hub and Satellite tables, where ghost records are added to Satellite tables on initial creation but never to Hubs. All other surrogate keys present in a Satellite table must also be present in the parent Hub.
I appreciate the desire to adhere to "referential constraint" as specified in ANSI SQL-92; however, I am keen to support this scenario.
Two approaches I can see:
#SATELLITES
  - name: sat_reliability_engineering_functional_location_eca
    description: SATELLITE of Functional Location - attributes related to ECA data
    tests:
      - dbt_constraints.foreign_key:
          fk_column_names:
            - DV_SK_HUB_FUNCTIONAL_LOCATION
          child_predicate: DV_SK_HUB_FUNCTIONAL_LOCATION != repeat('0', 64)::BINARY
          pk_table_name: ref('hub_functional_location')
          pk_column_names:
            - DV_SK_HUB_FUNCTIONAL_LOCATION
    tags: ["data_test"]
#SATELLITES
  - name: sat_reliability_engineering_functional_location_eca
    description: SATELLITE of Functional Location - attributes related to ECA data
    tests:
      - dbt_constraints.foreign_key:
          fk_column_names:
            - DV_SK_HUB_FUNCTIONAL_LOCATION
          fk_nullif: repeat('0', 64)::BINARY
          pk_table_name: ref('hub_functional_location')
          pk_column_names:
            - DV_SK_HUB_FUNCTIONAL_LOCATION
    tags: ["data_test"]
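To make the intent of the proposed fk_nullif option concrete, the child side of the generated test might filter the ghost-record key like this (a sketch under the assumption that nullif-style exclusion is acceptable; the table and column names follow the example above):

```sql
select DV_SK_HUB_FUNCTIONAL_LOCATION as from_field
from sat_reliability_engineering_functional_location_eca
-- nullif() turns the all-zeros ghost-record key into NULL,
-- so the standard "is not null" FK filter excludes it
where nullif(DV_SK_HUB_FUNCTIONAL_LOCATION, repeat('0', 64)::BINARY) is not null
```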
Happy to contribute if agreeable on the request
I am trying to set up a singular test in dbt (it's a test for one specific table, TableA), so I wrote a SQL query which I placed in the tests folder. It returns failing rows. However, when I run dbt test --select tableA and the test passes (no failing records), I get the following error:
14:20:57 Running dbt Constraints
14:20:58 Database error while running on-run-end
14:20:59 Encountered an error:
Compilation Error in operation dbt_constraints-on-run-end-0 (./dbt_project.yml)
'dbt.tableA.graph.compiled.CompiledSingularTestNode object' has no attribute 'test_metadata'
If the test fails, no error appears, I just get info about the failing rows.
It seems that dbt_constraints is causing this problem, specifically this script which runs in the on-run-end hook https://github.com/Snowflake-Labs/dbt_constraints/blob/main/macros/create_constraints.sql. There is no documented way to add test_metadata to a singular test in dbt.
The test is just a simple SQL file in the tests folder that looks something like this:
tests/table_a_test.sql
SELECT *
FROM {{ ref('TableA') }}
WHERE param_1 NOT IN (
    SELECT TableB_id FROM {{ ref('TableB') }}
    UNION
    SELECT TableC_id FROM {{ ref('TableC') }}
    UNION
    SELECT TableD_id FROM {{ ref('TableD') }}
    UNION
    SELECT TableE_id FROM {{ ref('TableE') }}
)
AND param_2 IS NULL
Thank you!
Some of our models have incremental materialization. One such model contains history we want to keep, so we disabled full_refresh for it.
We wanted to change the PK column after adding a synthetic (generated) column.
dbt_constraints doesn't check whether a PK constraint already exists and should be dropped first.
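A hedged manual workaround until the package handles this: drop the stale constraint yourself before the next run (the table and constraint names are hypothetical; exact DROP syntax varies by database):

```sql
-- Snowflake / ANSI-style syntax; adjust for your warehouse
ALTER TABLE my_schema.my_incremental_model
  DROP CONSTRAINT my_incremental_model_pk;
```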
Hey @sfc-gh-dflippo, a really useful library you've created here, thanks for all the hard work! This is not an issue, just a question regarding your thoughts on including optional foreign key mappings. Although not ideal, there are circumstances where a user may run into a legacy data mapping that involves a 0-1 or 0-* relationship between a child and parent. In these circumstances the child foreign key can be NULL when there is no parent relationship. The foreign key tests for this scenario appear to check for NULLs in the relationship between the child (foreign) and parent (primary) mapping. Under these circumstances the tests fail and the relationship is not created (as expected).
select
count(*) as failures,
count(*) != 0 as should_warn,
count(*) != 0 as should_error
from (
with child as (
select
CHILD_ID
from MY_SNOWFLAKE_SCHEMA.My_Child_Table
where 1=1
and CHILD_ID is not null
),
parent as (
select
PARENT_ID
from MY_SNOWFLAKE_SCHEMA.My_Parent_Table
),
validation_errors as (
select
child.*
from child
left join parent
on parent.PARENT_ID = child.CHILD_ID
where parent.PARENT_ID is null
)
select *
from validation_errors
) dbt_internal_test
Some data warehouses (like Snowflake) don't enforce these constraints (except NOT NULL), but for all the reasons mentioned in the README for creating the constraints (particularly reverse engineering data models), is there an opportunity to include an option to allow NULLs on specific foreign key relationships? Any thoughts on the matter from your experience would be appreciated.
We have added a dbt_constraints.foreign_key test in a model YAML file under the relevant column. dbt is executing an ALTER statement, but somehow it isn't working in Snowflake.
Here is the SQL command I see executed by dbt Core on the Snowflake database:
ALTER TABLE SUPPLY_TEAM ADD CONSTRAINT SUPPLY_TEAM_OFFICE_FK FOREIGN KEY ( OFFICE_ID ) REFERENCES OFFICE ( office_ID ) MATCH SIMPLE RELY
The above SQL runs without any errors, but I don't see the constraint created on the table. I checked the DDL and also ran SHOW IMPORTED KEYS on the child table; I don't see the FK on the table in the database.
hi there
Thanks for this handy package.
I am very new to dbt and currently using it to add some materialized tables to a database which already has tables in it (generated from another process, not dbt).
I want to use dbt_constraints to specify FK relationships to existing tables in the database. But these tables are not created as models with dbt. Is this possible?
At the moment, when I run dbt test and just specify the table name, it says it passes in creating the FK constraints, but they do not show up in the database or schema as actually having been generated.
As an example, I have this:
tests:
  - dbt_utils.unique_combination_of_columns:
      combination_of_columns:
        - forecast_period_startdate
        - period_fk
        - facility_fk
        - product_fk
  - dbt_constraints.primary_key:
      column_name: id
      constraint_name: mt_facility_pred_actual_price_pkey
  - dbt_constraints.foreign_key:
      fk_column_name: period_fk
      pk_table_name: period
      pk_column_name: id
So, period is a table already in the database, and I want the materialized table's period_fk column to have an FK constraint to the id column of that table. Is this possible?
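One direction worth trying, offered as an assumption rather than confirmed package behavior: declare the pre-existing table as a dbt source and point pk_table_name at it via source() (the source name and schema here are hypothetical):

```yaml
sources:
  - name: legacy          # hypothetical source name
    schema: public        # hypothetical schema
    tables:
      - name: period

models:
  - name: my_model
    columns:
      - name: period_fk
        tests:
          - dbt_constraints.foreign_key:
              pk_table_name: source('legacy', 'period')
              pk_column_name: id
```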
Hello guys,
I'm starting my journey with dbt, replacing an old GUI-based ETL process which uses an on-premise database as its destination.
My DW is a SQL Server, and I'm looking for the best way to implement PK / FK at my final models (facts and dims in a star-schema).
Then I reached at this nice package and at the same time, I also found that dbt 1.5 implements Model Contracts and Constraints Feature.
So as far as I understand, if my dbt is >= 1.5.0, I can implement constraints without needing this package.
Is that right, or is there something I'm missing?
BR!
I want to add an FK constraint in my dbt model, but the problem is that the FK column can be NULL, unlike the referenced column (the PK in the other table, which is NOT NULL).
I tested it in Snowflake and saw that I can create the FK in this situation only by using this script:
ALTER TABLE add constraint fk_something ( fk_key) references PK_table ( pk_key) MATCH SIMPLE RELY.
I checked the code but did not see anything for the MATCH options (MATCH FULL/SIMPLE/PARTIAL); I only saw that the generated code will be ALTER TABLE ... RELY.
Thank you!
We have a test as follows:
tests:
  - not_null:
      config:
        severity: error
        error_if: ">1000"
        warn_if: ">1000"
The purpose of this test is to ensure a certain data completion tolerance and flag where this tolerance is not being met for the business to take action.
Newer versions of dbt_constraints attempt to apply a not_null constraint in this situation and fail, because it is not a true NOT NULL in the database sense. There needs to be some means of skipping this constraint - automatic or by configuration.
Hello!
In attempting to bump my local project to the newest 0.9.x revisions of dbt_utils, I found that our dependency on dbt_constraints prevented that upgrade, because it pins a maximum of 0.8.x of that same package. After reviewing the changes to dbt_utils, as well as its use within this package, I believe it is safe to relax the existing version pins to the 0.9.x train. I was able to validate that change for Postgres in #20 but do not have access to Snowflake or the other databases this may affect.
I have a not_null test configured to only warn if the number of failures is >100.
  - name: mostly_not_null_column
    tests:
      - not_null:
          config:
            severity: warn
            warn_if: ">100"
It appears this is not triggering the dbt_constraints check for failed tests. I get the following error:
on-run-end failed, error:
column "mostly_not_null_column" of relation "my_relation" contains null values
I am unable to get Postgres to finish running the foreign key constraint test on two tables with <1M rows.
The current implementation uses a subquery to identify pk-fk mismatches. I assume that Snowflake handles these subqueries w/out a hitch; Postgres struggles to do so. Some brief testing shows me that it works fine for tables with 50k rows, but @ 100k+ it's not functional.
Could this be refactored to reflect the implementation of the relationships test built into dbt-core?
Looks like they started with the same method but eventually moved to a left join several years ago - see here and here.
It's not clear whether the left join approach is equally beneficial for Snowflake. I will make a pull request with this change - can the owners of this repo let me know if the change should be made in default__test_constraints.sql, or only for Postgres via the adapter dispatch (i.e. postgres__create_constraints.sql)?
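To illustrate the difference being proposed (a sketch with hypothetical table names, not the package's actual SQL): the subquery form can force Postgres to re-check the parent set per candidate row, while the left-join anti-join form lets the planner use a single hash join:

```sql
-- Subquery style (slow on large Postgres tables)
select count(*)
from child
where fk_id is not null
  and fk_id not in (select pk_id from parent);

-- Left-join anti-join style (what dbt-core's relationships test uses)
select count(*)
from child
left join parent on child.fk_id = parent.pk_id
where child.fk_id is not null
  and parent.pk_id is null;
```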
Hi! We have a scheduled job that runs dbt every hour, but we don't want to run the tests every hour. Is it possible to run dbt_constraints without running the tests? If so, how would I do that?
I'm trying to use dbt_constraints.primary_key together with not_null, but I get a "cannot make a nullable column a primary key" error when trying to create the primary key. As I understand it, dbt columns are always "nullable". Does the macro have to alter the column to be NOT NULL, or is there another way around this?
My test is defined as:
tests:
- dbt_constraints.primary_key
- not_null
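If altering the column is acceptable, a hedged sketch of the DDL that would have to run before the PK can be added (Snowflake-style syntax; the schema, table, and constraint names are hypothetical):

```sql
-- Make the column non-nullable first, then add the primary key
ALTER TABLE my_schema.my_model
  ALTER COLUMN id SET NOT NULL;

ALTER TABLE my_schema.my_model
  ADD CONSTRAINT my_model_pk PRIMARY KEY (id);
```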
Normally, when you run, say, the production build nightly, you'd like the constraints to be added as usual so you can generate diagrams with the relations, etc.
There might be the case, however, that you're developing some new test cases, or want to rerun for whatever reason; then you might have the test cases on your local machine and just want to run the tests against the production environment, where you don't have write access.
It would be great if it were possible to turn off the hook that runs at the end of the test suite and creates the constraints, so that it doesn't fail at that point with an access error that prevents the test-failure details from being displayed.
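Assuming the package's kill-switch variable applies here (worth verifying against the README), disabling constraint generation for an environment could look like this:

```yaml
# dbt_project.yml - disable constraint creation in this environment
vars:
  dbt_constraints_enabled: false
```

For a one-off run, the same override can go on the command line: dbt test --vars '{dbt_constraints_enabled: false}'.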
Thanks for creating this package, it is very useful!
I'm using datavault4dbt. Models look a little different than the standard.
In my model CUST_HUB_CUSTOMER_CUST.sql I call the post-hook
{% set CUST_HUB_CUSTOMER_CUST = api.Relation.create(schema='dwhstage', identifier='CUST_HUB_CUSTOMER_CUST', type='table') %}
{{
config({
"post-hook": [
"{{ dbt_constraints.create_primary_key(table_model='CUST_HUB_CUSTOMER_CUST', column_names=['CUSTKEY_HK'], verify_permissions=false, quote_columns=false, constraint_name=PK_CUST, lookup_cache=none)}}"
]
})
}}
{%- set yaml_metadata -%}
source_models:
  stg_customer:
    bk_columns: 'C_CUSTKEY'
    rsrc_static: 'Customer'
  stg_customer_taxlocation:
    bk_columns: 'C_CUSTKEY'
    rsrc_static: 'Daniel'
  stg_telephone:
    bk_columns: 'C_CUSTKEY'
    rsrc_static: 'Daniel'
hashkey: CUSTKEY_HK
business_keys:
  - 'C_CUSTKEY'
{%- endset -%}
{% set metadata_dict = fromyaml(yaml_metadata) %}
{{ datavault4dbt.hub(hashkey=metadata_dict.get("hashkey"),
business_keys=metadata_dict.get("business_keys"),
source_models=metadata_dict.get("source_models")) }}
... and receive this error
15:48:00 Encountered an error:
Compilation Error in model CUST_HUB_CUSTOMER_CUST (models\raw_vault\CUST_HUB_CUSTOMER_CUST.sql)
'None' has no attribute 'table'
in macro unique_constraint_exists (macros\create_constraints.sql)
called by macro oracle__create_primary_key (macros\oracle__create_constraints.sql)
called by macro create_primary_key (macros\create_constraints.sql)
called by model CUST_HUB_CUSTOMER_CUST (models\raw_vault\CUST_HUB_CUSTOMER_CUST.sql)
I tried creating a table_relation but the error stayed the same.
{% set CUST_HUB_CUSTOMER_CUST = api.Relation.create(schema='dwhstage', identifier='CUST_HUB_CUSTOMER_CUST', type='table') %}
{{
config({
"post-hook": [
"{{ dbt_constraints.create_primary_key(table_model='CUST_HUB_CUSTOMER_CUST', column_names=['CUSTKEY_HK'], verify_permissions=false, quote_columns=false, constraint_name=PK_CUST, lookup_cache=none)}}"
]
})
}}
The package is installed properly; I see the drop-constraints statements in the log.
I added name and columns parameters to this model, but it didn't help.
Any idea how I can create a relation object in order to run this?
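One sketch worth trying, offered as an assumption about the macro's expected input rather than a confirmed fix: in a post-hook, `this` is already a Relation object for the current model, so passing it instead of a string (and quoting the constraint name) may satisfy unique_constraint_exists:

```jinja
{{
    config({
        "post-hook": [
            "{{ dbt_constraints.create_primary_key(
                    table_model=this,
                    column_names=['CUSTKEY_HK'],
                    quote_columns=false,
                    constraint_name='PK_CUST') }}"
        ]
    })
}}
```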
Hi,
When running on version 0.6.0 with dbt-snowflake 1.4.1 (which is core version 1.4.5), I get the following error:
dbt build -s departure_source_market
12:35:16 Running with dbt=1.4.5
12:35:17 Encountered an error:
Compilation Error
dbt found two macros named "test_primary_key" in the project
"dbt_constraints".
To fix this error, rename or remove one of the following macros:
- macros\create_constraints.sql
- macros\create_constraints.sql
After downgrading to 0.5.3, it works again:
dbt deps
12:38:21 Running with dbt=1.4.5
12:38:22 Installing Snowflake-Labs/dbt_constraints
12:38:23 Installed from version 0.5.3
12:38:23 Updated version available: 0.6.0
12:38:23
12:38:23 Updates available for packages: ['Snowflake-Labs/dbt_constraints']
Update your versions in packages.yml, then run dbt deps
dbt build -s departure_source_market
12:38:34 Running with dbt=1.4.5
[...]
12:38:49 Running 1 on-run-end hook
12:38:49 Running dbt Constraints
12:38:49 Finished dbt Constraints
12:38:49 1 of 1 START hook: dbt_constraints.on-run-end.0 ................................ [RUN]
12:38:49 1 of 1 OK hook: dbt_constraints.on-run-end.0 ................................... [OK in 0.00s]
Could you take a look please?
It would be great if the creation of constraints could be disabled for incremental materializations, and enabled for materialized='table' and for full refreshes.
I'm using dbt_constraints 0.31 on dbt 1.1.1 against PostgreSQL 14.1.
On PostgreSQL the package runs all the right statements, but then the DDL gets rolled back (unlike Snowflake, PostgreSQL has fully transactional DDL), so we end up with no constraints after all:
02:10:08.238679 [debug] [MainThread]: No FK key: "shop_development"."dbt"."stores" ['zone_id']
02:10:08.239331 [info ] [MainThread]: Creating foreign key: STORES_ZONE_ID_FK referencing zones ['zone_id']
02:10:08.239958 [debug] [MainThread]: Using postgres connection "master"
02:10:08.240087 [debug] [MainThread]: On master: /* {"app": "dbt", "dbt_version": "1.1.1", "profile_name": "shop", "target_name": "dev", "connection_name": "master"} */
ALTER TABLE "shop_development"."dbt"."stores" ADD CONSTRAINT STORES_ZONE_ID_FK FOREIGN KEY ( zone_id ) REFERENCES "shop_development"."dbt"."zones" ( zone_id )
02:10:08.240851 [debug] [MainThread]: SQL status: ALTER TABLE in 0.0 seconds
02:10:08.241739 [info ] [MainThread]: Finished dbt Constraints
02:10:08.241975 [debug] [MainThread]: Writing injected SQL for node "operation.dbt_constraints.dbt_constraints-on-run-end-0"
02:10:08.242926 [info ] [MainThread]: 1 of 1 START hook: dbt_constraints.on-run-end.0 ................................ [RUN]
02:10:08.243174 [info ] [MainThread]: 1 of 1 OK hook: dbt_constraints.on-run-end.0 ................................... [OK in 0.00s]
02:10:08.243380 [info ] [MainThread]:
02:10:08.243572 [debug] [MainThread]: On master: ROLLBACK
Probably the easiest thing to do would be to commit explicitly after?
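A minimal sketch of that idea, assuming an explicit commit from the on-run-end logic is acceptable (run_query is a real dbt macro; the adapter check is an assumption about where this would live):

```jinja
{# At the end of the on-run-end constraint creation, persist the
   ALTER TABLE statements on adapters with transactional DDL #}
{% if target.type == 'postgres' %}
    {% do run_query('COMMIT') %}
{% endif %}
```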
Creation of constraints should be skipped if tests pass but have a warn_if / error_if != 0 configured.