snowflake-labs / dbt_constraints
This package generates database constraints based on the tests in a dbt project
License: Apache License 2.0
It would be nice to be able to override the generated name of a constraint when desired.
Since PostgreSQL has a predefined limit on identifier length (63 bytes), there is a chance that we run into duplicate constraint names.
A possible way to handle it: generate a unique constraint name - #18
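A hedged sketch of what an override could look like in schema YAML, reusing the constraint_name argument that already appears in other issues on this page (the model and column names here are hypothetical):

```yaml
models:
  - name: my_model        # hypothetical model name
    columns:
      - name: id
        tests:
          - dbt_constraints.primary_key:
              # explicit name instead of a generated MY_MODEL_ID_PK
              constraint_name: my_model_pkey
```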
As a developer, I would like dbt_constraints to add constraints for my tests on seeds, so that I can benefit from join elimination in queries on my seed tables.
Because NOT NULL is enforced by Snowflake, I can rely on it. So if I have this constraint, running the test is pointless.
On the other hand in some cases we might want to NOT set any constraint, but still run the test, just to asses quality of the upstream data, but not interrupt our transformations in case of nulls.
To sum up, I would like to be able to choose any of combinations:
You can now define Foreign and Primary keys on table columns in BigQuery. Would love to see this implemented 🙏 ❤️
Google docs regarding adding FK and PK constraints.
https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#alter_table_add_constraint_statement
Queries to pull FK and PK constraint information out of the schema:
https://cloud.google.com/bigquery/docs/information-schema-constraint-column-usage
SELECT *
FROM <PROJECT_ID>.<DATASET>.INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE;
https://cloud.google.com/bigquery/docs/information-schema-key-column-usage
SELECT *
FROM <PROJECT_ID>.<DATASET>.INFORMATION_SCHEMA.KEY_COLUMN_USAGE;
https://cloud.google.com/bigquery/docs/information-schema-table-constraints
SELECT *
FROM <PROJECT_ID>.<DATASET>.INFORMATION_SCHEMA.TABLE_CONSTRAINTS;
General spot for BQ schema information: https://cloud.google.com/bigquery/docs/information-schema-intro
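For reference, a minimal sketch of the BigQuery DDL from the documentation linked above (dataset, table, and column names are hypothetical; BigQuery constraints must be declared NOT ENFORCED):

```sql
-- Unenforced primary key on the parent table
ALTER TABLE my_dataset.orders
  ADD PRIMARY KEY (order_id) NOT ENFORCED;

-- Unenforced foreign key on the child table
ALTER TABLE my_dataset.order_items
  ADD CONSTRAINT fk_order FOREIGN KEY (order_id)
  REFERENCES my_dataset.orders (order_id) NOT ENFORCED;
```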
The macro create_constraints_by_type contains a check for, among other things, the type of materialization (table, incremental, snapshot, and seed) when determining if the table resulting from a model will have constraints created for it. This prevents dbt_constraints from creating constraints on tables created by custom materialization types.
Please add the ability to specify additional materialization types for which constraints can be created (or remove the check entirely?).
Thanks.
The always_create_constraint config works only with thresholds (warn_if and error_if). It does not force the constraint when only severity: warn is set.
-- Not setting the FK
-- Setting the FK
It appears that the Not Null constraints are done last:
https://github.com/Snowflake-Labs/dbt_constraints/blob/main/macros/create_constraints.sql#L146
Some databases (SQL Server at least) do not allow ALTER COLUMN on a column that another constraint (PK, UK, FK) is dependent on. You'll get an error like "The object 'SOME_TABLE_SOME_COLUMN_UK' is dependent on column 'SOME_COLUMN'".
Additionally, some databases will not allow Primary Keys on a Nullable column.
Does it make sense to have Not Null done first?
The Please Note section of the dbt_constraints README.md explains that "When you add this package, dbt will automatically begin to create ... not null constraints for not_null tests." The Disabling automatic constraint generation subsection details how not null constraints can be disabled, but only for sources.
There is no functionality to disable constraint creation for other upstream model types such as snapshots. Currently, my organization's dbt project has not_null tests defined for most snapshots. Constraints should be created from development through production, with the option to enable or disable constraint creation for raw/source and snapshot data.
Hey
Thanks for publishing this!
I noticed that it doesn't seem to work when a model is published with an alias.
I think that
create_constraints.sql:203
{%- set table_relation = adapter.get_relation(
database=table_models[0].database,
schema=table_models[0].schema,
identifier=table_models[0].name) -%}
should become
...
identifier=table_models[0].alias) -%}
with similar changes on lines 243 and 248 in the same file.
I wonder if there could be a better way of getting a relation from a node/model than adapter.get_relation? I couldn't find one, but surely it must exist...
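For context, a hedged sketch of building a relation directly from a node's fields with dbt's built-in api.Relation.create, which respects the alias (whether this covers quoting and caching as well as adapter.get_relation is an open question):

```jinja
{%- set table_relation = api.Relation.create(
        database=table_models[0].database,
        schema=table_models[0].schema,
        identifier=table_models[0].alias) -%}
```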
As Philipp Leufke pointed out to me in the #db-snowflake Slack channel:
I wonder how this would work with models that have soft-deleted rows.
Our incremental pattern heavily makes use of soft deletes and any uniqueness or relationship test is set to ignore the soft deleted rows.
So, the FK is only true if these rows are excluded.
This is, in our case, achieved by mart models which have such a filter set and which are materialized as views. However, as I understand, this dbt package will ignore such views...
I am strongly considering excluding constraint creation for any tests with a where config property. I can still allow such filters in my tests, but relational database constraints can't set conditions. FK and UK constraints should exclude NULL values, but that is directly in the ANSI SQL standards and is part of the SQL I use for my unique_key and foreign_key tests. dbt also adds such logic in its default unique and relationships tests:
{% macro default__test_unique(model, column_name) %}

select
    {{ column_name }} as unique_field,
    count(*) as n_records
from {{ model }}
where {{ column_name }} is not null
group by {{ column_name }}
having count(*) > 1

{% endmacro %}
{% macro default__test_relationships(model, column_name, to, field) %}

with child as (
    select {{ column_name }} as from_field
    from {{ model }}
    where {{ column_name }} is not null
),

parent as (
    select {{ field }} as to_field
    from {{ to }}
)

select from_field
from child
left join parent
    on child.from_field = parent.to_field
where parent.to_field is null

{% endmacro %}
We only verify privileges for sources, and there is a test for whether you have the REFERENCES privilege on the parent table of a FK. Unfortunately, when you have OWNERSHIP, the Snowflake metadata view doesn't also include REFERENCES.
I'm not sure if this was a change in Snowflake behavior, but the fix will simply be to update the privilege query to allow both OWNERSHIP and REFERENCES.
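A rough illustration of the shape such a privilege check could take against Snowflake's standard INFORMATION_SCHEMA.TABLE_PRIVILEGES view (this is a sketch, not the package's actual query; the schema and table names are hypothetical):

```sql
select count(*) > 0 as has_required_privilege
from information_schema.table_privileges
where table_schema = 'MY_SCHEMA'
  and table_name = 'PARENT_TABLE'
  -- accept either privilege, since OWNERSHIP implies REFERENCES in practice
  and privilege_type in ('REFERENCES', 'OWNERSHIP');
```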
Foreign key constraints are inherently double-ended: they have the source model (originating-end) where the FK is declared, and there is the target model (receiving-end) to the PK of which the relationship refers.
When defining a test constraint using the dbt_constraints.foreign_key package between models A (fk) -> (pk) B, running all tests on the receiving-end model (i.e. dbt test -m B) triggers the testing of the foreign key constraint as well.
For an orchestration scenario that I am prototyping, I would however prefer to avoid running the receiving-end FK test. Thus when testing model A I want the FK test to be evaluated, yet when testing model B I do not want the FK test to be evaluated again.
Is this feature covered? If so, how?
I have the workaround to parse the DBT manifest and explicitly define which tests to run (on model B, in the example above, after I exclude the receiving-ends of FK tests) yet I would be interested in a more elegant solution.
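One possible direction, offered as an assumption about this workflow rather than a confirmed feature: dbt's test_name selector method can exclude generic tests by name, so a YAML selector could carve the FK tests out of the receiving-end run (the selector name is hypothetical; B is the receiving-end model from the example above):

```yaml
# selectors.yml
selectors:
  - name: test_b_without_fk
    definition:
      union:
        - method: fqn
          value: B
        - exclude:
            - method: test_name
              value: foreign_key
```

The one-off CLI equivalent would be dbt test -s B --exclude "test_name:foreign_key".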
Minor enhancement: accept conditional filtering on the child side of the foreign key test (similar to the dbt_utils.relationships_where test).
The specific scenario is testing referential integrity between Data Vault Hub and Satellite tables, where ghost records are added to Satellite tables on initial creation but never to Hubs. All other surrogate keys present in a Satellite table must also be present in the parent Hub.
I appreciate the desire to adhere to "referential constraint" as specified in ANSI SQL-92; however, I am keen to support this scenario.
Two approaches I can see:
#SATELLITES
  - name: sat_reliability_engineering_functional_location_eca
    description: SATELLITE of Functional Location - attributes related to ECA data
    tests:
      - dbt_constraints.foreign_key:
          fk_column_names:
            - DV_SK_HUB_FUNCTIONAL_LOCATION
          child_predicate: DV_SK_HUB_FUNCTIONAL_LOCATION != repeat('0', 64)::BINARY
          pk_table_name: ref('hub_functional_location')
          pk_column_names:
            - DV_SK_HUB_FUNCTIONAL_LOCATION
    tags: ["data_test"]
#SATELLITES
  - name: sat_reliability_engineering_functional_location_eca
    description: SATELLITE of Functional Location - attributes related to ECA data
    tests:
      - dbt_constraints.foreign_key:
          fk_column_names:
            - DV_SK_HUB_FUNCTIONAL_LOCATION
          fk_nullif: repeat('0', 64)::BINARY
          pk_table_name: ref('hub_functional_location')
          pk_column_names:
            - DV_SK_HUB_FUNCTIONAL_LOCATION
    tags: ["data_test"]
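To make the intent of the proposed fk_nullif option concrete, the child side of the generated test might filter the ghost-record key like this (a sketch under the assumption that nullif-style exclusion is acceptable; the table and column names follow the example above):

```sql
select DV_SK_HUB_FUNCTIONAL_LOCATION as from_field
from sat_reliability_engineering_functional_location_eca
-- nullif() turns the all-zeros ghost-record key into NULL,
-- so the standard "is not null" FK filter excludes it
where nullif(DV_SK_HUB_FUNCTIONAL_LOCATION, repeat('0', 64)::BINARY) is not null
```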
Happy to contribute if agreeable on the request
I am trying to set up a singular test in dbt (it's a test for one specific table, TableA), so I wrote a SQL query which I placed in the tests folder. It returns failing rows. However, when I run dbt test --select tableA and the test passes (no failing records), I get the following error:
14:20:57 Running dbt Constraints
14:20:58 Database error while running on-run-end
14:20:59 Encountered an error:
Compilation Error in operation dbt_constraints-on-run-end-0 (./dbt_project.yml)
'dbt.tableA.graph.compiled.CompiledSingularTestNode object' has no attribute 'test_metadata'
If the test fails, no error appears, I just get info about the failing rows.
It seems that dbt_constraints is causing this problem, specifically this script which runs in the on-run-end hook https://github.com/Snowflake-Labs/dbt_constraints/blob/main/macros/create_constraints.sql. There is no documented way to add test_metadata to a singular test in dbt.
The test is just a simple SQL file in the tests folder that looks something like this:
tests/table_a_test.sql
SELECT *
FROM {{ ref('TableA') }}
WHERE param_1 NOT IN (
    SELECT TableB_id FROM {{ ref('TableB') }}
    UNION
    SELECT TableC_id FROM {{ ref('TableC') }}
    UNION
    SELECT TableD_id FROM {{ ref('TableD') }}
    UNION
    SELECT TableE_id FROM {{ ref('TableE') }}
)
AND param_2 IS NULL
Thank you!
Some of our models have incremental materialization. One such model contains history we want to keep, so we disabled full_refresh for it.
We wanted to change the PK column after adding a synthetic (generated) column.
dbt_constraints doesn't check whether a PK constraint already exists and should be dropped first.
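A hedged manual workaround until the package handles this: drop the stale constraint yourself before the next run (the table and constraint names are hypothetical; exact DROP syntax varies by database):

```sql
-- Snowflake / ANSI-style syntax; adjust for your warehouse
ALTER TABLE my_schema.my_incremental_model
  DROP CONSTRAINT my_incremental_model_pk;
```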
Hey @sfc-gh-dflippo, a really useful library you've created here, thanks for all the hard work! This is not an issue, just a question regarding your thoughts on including optional foreign key mappings. Although not ideal, there are circumstances where a user may run into a legacy data mapping that involves a 0-1 or 0-* relationship between a child and parent. In these circumstances the child foreign key can be NULL when there is no parent relationship. The foreign key tests for this scenario appear to check for NULLs in the relationship between the child (foreign) and parent (primary) mapping. Under these circumstances the tests fail and the relationship is not created (as expected).
select
count(*) as failures,
count(*) != 0 as should_warn,
count(*) != 0 as should_error
from (
with child as (
select
CHILD_ID
from MY_SNOWFLAKE_SCHEMA.My_Child_Table
where 1=1
and CHILD_ID is not null
),
parent as (
select
PARENT_ID
from MY_SNOWFLAKE_SCHEMA.My_Parent_Table
),
validation_errors as (
select
child.*
from child
left join parent
on parent.PARENT_ID = child.CHILD_ID
where parent.PARENT_ID is null
)
select *
from validation_errors
) dbt_internal_test
Some data warehouses (like Snowflake) don't enforce these constraints (except NOT NULL), but for all the reasons mentioned in the README for creating the constraints (particularly reverse engineering data models), is there an opportunity to include an option to allow NULLs on specific foreign key relationships? Any thoughts on the matter from your experience would be appreciated.
We have added a dbt_constraints.foreign_key test in a model YAML file under the relevant column. dbt is executing an ALTER statement, but somehow it isn't working in Snowflake.
Here is the SQL command I see executed by dbt Core on the Snowflake database:
ALTER TABLE SUPPLY_TEAM ADD CONSTRAINT SUPPLY_TEAM_OFFICE_FK FOREIGN KEY ( OFFICE_ID ) REFERENCES OFFICE ( office_ID ) MATCH SIMPLE RELY
The above SQL runs without any errors, but I don't see the constraint created on the table. I checked the DDL and also ran SHOW IMPORTED KEYS on the child table; I don't see the FK on the table in the database.
hi there
Thanks for this handy package.
I am very new to dbt and currently using it to add some materialized tables to a database which already has tables in it (generated from another process, not dbt).
I want to use dbt_constraints to specify FK relationships to existing tables in the database. But these tables are not created as models with dbt. Is this possible?
At the moment, when I run dbt test and just specify the table name, it says it passes in creating the FK constraints, but they do not show up in the database or schema as actually having been generated.
As an example, I have this:
tests:
  - dbt_utils.unique_combination_of_columns:
      combination_of_columns:
        - forecast_period_startdate
        - period_fk
        - facility_fk
        - product_fk
  - dbt_constraints.primary_key:
      column_name: id
      constraint_name: mt_facility_pred_actual_price_pkey
  - dbt_constraints.foreign_key:
      fk_column_name: period_fk
      pk_table_name: period
      pk_column_name: id
So, period is a table already in the database, and I want the materialized table's period_fk column to have an FK constraint to the id column of that table. Is this possible?
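One direction worth trying, offered as an assumption rather than confirmed package behavior: declare the pre-existing table as a dbt source and point pk_table_name at it via source() (the source name and schema here are hypothetical):

```yaml
sources:
  - name: legacy          # hypothetical source name
    schema: public        # hypothetical schema
    tables:
      - name: period

models:
  - name: my_model
    columns:
      - name: period_fk
        tests:
          - dbt_constraints.foreign_key:
              pk_table_name: source('legacy', 'period')
              pk_column_name: id
```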
Hello guys,
I'm starting my journey with dbt, replacing an old GUI-based ETL process which uses an on-premise database as its destination.
My DW is a SQL Server, and I'm looking for the best way to implement PK / FK at my final models (facts and dims in a star-schema).
Then I reached at this nice package and at the same time, I also found that dbt 1.5 implements Model Contracts and Constraints Feature.
So as far as I understand, if my dbt is >= 1.5.0, I can implement constraints without needing this package.
Is that right, or is there something I'm missing?
BR!
I want to add an FK constraint in my dbt model, but the problem is that the FK column can be NULL, unlike the referenced column (the PK in the other table, which is NOT NULL).
I tested it in Snowflake and saw that I can create the FK in this situation only by using this script:
ALTER TABLE add constraint fk_something ( fk_key) references PK_table ( pk_key) MATCH SIMPLE RELY.
I checked the code but did not see anything for the MATCH options (MATCH FULL/SIMPLE/PARTIAL); I only saw that the generated code will be ALTER TABLE ... RELY.
Thank you!
We have a test as follows:
tests:
  - not_null:
      config:
        severity: error
        error_if: ">1000"
        warn_if: ">1000"
The purpose of this test is to ensure a certain data completion tolerance and flag where this tolerance is not being met for the business to take action.
Newer versions of dbt_constraints attempt to apply a not_null constraint in this situation and fail, because it is not a true NOT NULL in the database sense. There needs to be some means of skipping this constraint - automatic or by configuration.
Hello!
In attempting to bump my local project to the newest 0.9.x revisions of dbt_utils, I found that our dependency on dbt_constraints prevented that upgrade, because it pins a maximum of 0.8.x of that same package. After reviewing the changes to dbt_utils, as well as its use within this package, I believe it is safe to relax the existing version pins to the 0.9.x train. I was able to validate that change for Postgres in #20 but do not have access to Snowflake or the other databases this may affect.
I have a not_null test configured to only warn if the number of failures is >100.
  - name: mostly_not_null_column
    tests:
      - not_null:
          config:
            severity: warn
            warn_if: ">100"
It appears this is not triggering the dbt_constraints check for failed tests. I get the following error:
on-run-end failed, error:
column "mostly_not_null_column" of relation "my_relation" contains null values
I am unable to get Postgres to finish running the foreign key constraint test on two tables with <1M rows.
The current implementation uses a subquery to identify pk-fk mismatches. I assume that Snowflake handles these subqueries w/out a hitch; Postgres struggles to do so. Some brief testing shows me that it works fine for tables with 50k rows, but @ 100k+ it's not functional.
Could this be refactored to reflect the implementation of the relationships test built into dbt-core?
Looks like they started with the same method but eventually moved to a left join several years ago - see here and here.
It's not clear whether the left join approach is equally beneficial for Snowflake. I will make a pull request with this change - can the owners of this repo let me know if the change should be made in default__test_constraints.sql, or only for Postgres via the adapter dispatch (i.e. postgres__create_constraints.sql)?
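To illustrate the difference being proposed (a sketch with hypothetical table names, not the package's actual SQL): the subquery form can force Postgres to re-check the parent set per candidate row, while the left-join anti-join form lets the planner use a single hash join:

```sql
-- Subquery style (slow on large Postgres tables)
select count(*)
from child
where fk_id is not null
  and fk_id not in (select pk_id from parent);

-- Left-join anti-join style (what dbt-core's relationships test uses)
select count(*)
from child
left join parent on child.fk_id = parent.pk_id
where child.fk_id is not null
  and parent.pk_id is null;
```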
Hi! We have a scheduled job that runs dbt every hour, but we don't want to run the tests every hour. Is it possible to run dbt_constraints without running the tests? If so, how would I do that?
I'm trying to use dbt_constraints.primary_key together with not_null, but I get a "cannot make a nullable column a primary key" error when trying to create the primary key. As I understand it, dbt columns are always "nullable". Does the macro have to alter the column to be NOT NULL, or is there another way around this?
My test is defined as:
tests:
- dbt_constraints.primary_key
- not_null
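If altering the column is acceptable, a hedged sketch of the DDL that would have to run before the PK can be added (Snowflake-style syntax; the schema, table, and constraint names are hypothetical):

```sql
-- Make the column non-nullable first, then add the primary key
ALTER TABLE my_schema.my_model
  ALTER COLUMN id SET NOT NULL;

ALTER TABLE my_schema.my_model
  ADD CONSTRAINT my_model_pk PRIMARY KEY (id);
```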
Normally, when you run, say, the production build nightly, you'd like the constraints to be added as usual so you can generate diagrams with the relations, etc.
There might be the case, however, that you're developing some new test cases, or want to rerun for whatever reason; then you might have the test cases on your local machine and just want to run the tests against the production environment, where you don't have write access.
It would be great if it were possible to turn off the hook that runs at the end of the test suite and creates the constraints, so that it doesn't fail at that point with an access error that prevents the test-failure details from being displayed.
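Assuming the package's kill-switch variable applies here (worth verifying against the README), disabling constraint generation for an environment could look like this:

```yaml
# dbt_project.yml - disable constraint creation in this environment
vars:
  dbt_constraints_enabled: false
```

For a one-off run, the same override can go on the command line: dbt test --vars '{dbt_constraints_enabled: false}'.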
Thanks for creating this package, it is very useful!
I'm using datavault4dbt. Models look a little different than the standard.
In my model CUST_HUB_CUSTOMER_CUST.sql I call the post-hook
{% set CUST_HUB_CUSTOMER_CUST = api.Relation.create(schema='dwhstage', identifier='CUST_HUB_CUSTOMER_CUST', type='table') %}
{{
config({
"post-hook": [
"{{ dbt_constraints.create_primary_key(table_model='CUST_HUB_CUSTOMER_CUST', column_names=['CUSTKEY_HK'], verify_permissions=false, quote_columns=false, constraint_name=PK_CUST, lookup_cache=none)}}"
]
})
}}
{%- set yaml_metadata -%}
source_models:
  stg_customer:
    bk_columns: 'C_CUSTKEY'
    rsrc_static: 'Customer'
  stg_customer_taxlocation:
    bk_columns: 'C_CUSTKEY'
    rsrc_static: 'Daniel'
  stg_telephone:
    bk_columns: 'C_CUSTKEY'
    rsrc_static: 'Daniel'
hashkey: CUSTKEY_HK
business_keys:
  - 'C_CUSTKEY'
{%- endset -%}
{% set metadata_dict = fromyaml(yaml_metadata) %}
{{ datavault4dbt.hub(hashkey=metadata_dict.get("hashkey"),
business_keys=metadata_dict.get("business_keys"),
source_models=metadata_dict.get("source_models")) }}
... and receive this error
15:48:00 Encountered an error:
Compilation Error in model CUST_HUB_CUSTOMER_CUST (models\raw_vault\CUST_HUB_CUSTOMER_CUST.sql)
'None' has no attribute 'table'
in macro unique_constraint_exists (macros\create_constraints.sql)
called by macro oracle__create_primary_key (macros\oracle__create_constraints.sql)
called by macro create_primary_key (macros\create_constraints.sql)
called by model CUST_HUB_CUSTOMER_CUST (models\raw_vault\CUST_HUB_CUSTOMER_CUST.sql)
I tried creating a table_relation but the error stayed the same.
{% set CUST_HUB_CUSTOMER_CUST = api.Relation.create(schema='dwhstage', identifier='CUST_HUB_CUSTOMER_CUST', type='table') %}
{{
config({
"post-hook": [
"{{ dbt_constraints.create_primary_key(table_model='CUST_HUB_CUSTOMER_CUST', column_names=['CUSTKEY_HK'], verify_permissions=false, quote_columns=false, constraint_name=PK_CUST, lookup_cache=none)}}"
]
})
}}
The package is installed properly; I see the drop-constraints statements in the log.
I added name and columns parameters to this model, but it didn't help.
Any idea how I can create a relation object in order to run this?
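One sketch worth trying, offered as an assumption about the macro's expected input rather than a confirmed fix: in a post-hook, `this` is already a Relation object for the current model, so passing it instead of a string (and quoting the constraint name) may satisfy unique_constraint_exists:

```jinja
{{
    config({
        "post-hook": [
            "{{ dbt_constraints.create_primary_key(
                    table_model=this,
                    column_names=['CUSTKEY_HK'],
                    quote_columns=false,
                    constraint_name='PK_CUST') }}"
        ]
    })
}}
```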
Hi,
When running on version 0.6.0 with dbt-snowflake 1.4.1 (which is core version 1.4.5), I get the following error:
dbt build -s departure_source_market
12:35:16 Running with dbt=1.4.5
12:35:17 Encountered an error:
Compilation Error
dbt found two macros named "test_primary_key" in the project
"dbt_constraints".
To fix this error, rename or remove one of the following macros:
- macros\create_constraints.sql
- macros\create_constraints.sql
After downgrading to 0.5.3, it works again:
dbt deps
12:38:21 Running with dbt=1.4.5
12:38:22 Installing Snowflake-Labs/dbt_constraints
12:38:23 Installed from version 0.5.3
12:38:23 Updated version available: 0.6.0
12:38:23
12:38:23 Updates available for packages: ['Snowflake-Labs/dbt_constraints']
Update your versions in packages.yml, then run dbt deps
dbt build -s departure_source_market
12:38:34 Running with dbt=1.4.5
[...]
12:38:49 Running 1 on-run-end hook
12:38:49 Running dbt Constraints
12:38:49 Finished dbt Constraints
12:38:49 1 of 1 START hook: dbt_constraints.on-run-end.0 ................................ [RUN]
12:38:49 1 of 1 OK hook: dbt_constraints.on-run-end.0 ................................... [OK in 0.00s]
Could you take a look please?
It would be great if the creation of constraints could be disabled for incremental materializations, and enabled for materialized='table' and for full refreshes.
I'm using dbt_constraints 0.31 on dbt 1.1.1 against PostgreSQL 14.1.
On PostgreSQL the package runs all the right statements, but then the DDL gets rolled back (unlike Snowflake, PostgreSQL has fully transactional DDL), so we end up with no constraints after all:
02:10:08.238679 [debug] [MainThread]: No FK key: "shop_development"."dbt"."stores" ['zone_id']
02:10:08.239331 [info ] [MainThread]: Creating foreign key: STORES_ZONE_ID_FK referencing zones ['zone_id']
02:10:08.239958 [debug] [MainThread]: Using postgres connection "master"
02:10:08.240087 [debug] [MainThread]: On master: /* {"app": "dbt", "dbt_version": "1.1.1", "profile_name": "shop", "target_name": "dev", "connection_name": "master"} */
ALTER TABLE "shop_development"."dbt"."stores" ADD CONSTRAINT STORES_ZONE_ID_FK FOREIGN KEY ( zone_id ) REFERENCES "shop_development"."dbt"."zones" ( zone_id )
02:10:08.240851 [debug] [MainThread]: SQL status: ALTER TABLE in 0.0 seconds
02:10:08.241739 [info ] [MainThread]: Finished dbt Constraints
02:10:08.241975 [debug] [MainThread]: Writing injected SQL for node "operation.dbt_constraints.dbt_constraints-on-run-end-0"
02:10:08.242926 [info ] [MainThread]: 1 of 1 START hook: dbt_constraints.on-run-end.0 ................................ [RUN]
02:10:08.243174 [info ] [MainThread]: 1 of 1 OK hook: dbt_constraints.on-run-end.0 ................................... [OK in 0.00s]
02:10:08.243380 [info ] [MainThread]:
02:10:08.243572 [debug] [MainThread]: On master: ROLLBACK
Probably the easiest thing to do would be to commit explicitly after?
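A minimal sketch of that idea, assuming an explicit commit from the on-run-end logic is acceptable (run_query is a real dbt macro; the adapter check is an assumption about where this would live):

```jinja
{# At the end of the on-run-end constraint creation, persist the
   ALTER TABLE statements on adapters with transactional DDL #}
{% if target.type == 'postgres' %}
    {% do run_query('COMMIT') %}
{% endif %}
```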
Creation of constraints should be skipped if tests pass but have a warn_if / error_if != 0 configured.