GithubHelp home page GithubHelp logo

dbt-checkpoint

CI black black

Sponsors

Datacoves

Hosted VS Code, dbt-core, SqlFluff, and Airflow, find out more at Datacoves.com.


Goal

dbt-checkpoint provides pre-commit hooks to ensure the quality of your dbt projects.

dbt is awesome, but when the number of models, sources, and macros in a project grows, it becomes challenging to maintain the same level of quality across developers.. Users forget to update columns in property(yml) files or add table and column add descriptions. Without automation the reviewer workload increases and unintentional errors may be missed. dbt-checkpoint allows organizations to add automated validations improving your code review and release process.

Telemetry

dbt-checkpoint has telemetry built into some of its hooks to help the maintainers from Datacoves understand which hooks are being used and which are not to prioritize future development of dbt-checkpoint. We do not track credentials nor details of your dbt execution such as model names. We also do not track any of the dbt hooks, such as for generating documentation. The one detail we do use related to dbt is the anonymous user_id generated by dbt to help us identify distinct projects.

By default this is turned on – you can opt out of event tracking at any time by adding the following to your .dbt-checkpoint.yaml file:

version: 1
disable-tracking: true

Setting dbt project root

You can specify a dbt project root directory for all hooks. This is particularly useful when your dbt project is not located at the root of your repository but in a sub-directory of it.

In that situation, you previously had to specify a --manifest flag in each hook.

Now, you can avoid repeating yourself by adding the dbt-project-dir key to your .dbt-checkpoint.yaml config file:

version: 1
dbt-project-dir: my_dbt_project

This way, we will automatically look for the required manifest/catalog inside your my_dbt_project project folder.

General exclude and per-hook excluding

Since dbt-checkpoint 1.1.0, certain hooks implement an implicit logic that "discover" their sql/yml equivalent for checking.

For a complete background please refer to #118.

Since the root-level exclude statement is handled by pre-commit, when those hooks discover their related sql/yml files, this root exclusion is ommitted (dbt-checkpoint re-includes files that may have been excluded). To exclude files from being discovered by this logic, the exclude path/regex must be provided in each hook (#119)

List of dbt-checkpoint hooks

💡 Click on hook name to view the details.

Model checks:

Script checks:

Source checks:

Macro checks:

Modifiers:

dbt commands:


If you have a suggestion for a new hook or you find a bug, let us know

Install

For detailed installation and usage, instructions see pre-commit.com site.

pip install pre-commit

Setup

  1. Create a file named .pre-commit-config.yaml in your project root folder.
  2. Add list of hooks you want to run befor every commit. E.g.:
repos:
- repo: https://github.com/dbt-checkpoint/dbt-checkpoint
  rev: v1.2.1
  hooks:
  - id: dbt-parse
  - id: dbt-docs-generate
    args: ["--cmd-flags", "++no-compile"]
  - id: check-script-semicolon
  - id: check-script-has-no-table-name
  - id: check-model-has-all-columns
    name: Check columns - core
    files: ^models/core
  - id: check-model-has-all-columns
    name: Check columns - mart
    files: ^models/mart
  - id: check-model-columns-have-desc
    files: ^models/mart
  1. Optionally, run pre-commit install to set up the git hook scripts. With this, pre-commit will run automatically on git commit! You can also manually run pre-commit run after you stage all files you want to run. Or pre-commit run --all-files to run the hooks against all of the files (not only staged).

Run in CI/CD

Unfortunately, you cannot natively use dbt-checkpoint if you are using dbt Cloud. But you can run checks after you push changes into Github.

dbt-checkpoint for the most of the hooks needs manifest.json (see requirements section in hook documentation), that is in the target folder. Since this target folder is usually in .gitignore, you need to generate it. For that you need to run the dbt-parse command. To be able to parse dbt, you also need profiles.yml file with your credentials. To provide passwords and secrets use Github Secrets (see example).

Say you want to check that a model contains at least two tests, you would use this configuration:

repos:
- repo: https://github.com/dbt-checkpoint/dbt-checkpoint
 rev: v1.2.1
 hooks:
 - id: check-model-has-tests
   args: ["--test-cnt", "2", "--"]

To be able to run this in Github CI you need to modified it to:

repos:
- repo: https://github.com/dbt-checkpoint/dbt-checkpoint
 rev: v1.2.1
 hooks:
 - id: dbt-parse
 - id: check-model-has-tests
   args: ["--test-cnt", "2", "--"]

Create profiles.yml

First step is to create profiles.yml. E.g.

# example profiles.yml file
jaffle_shop:
  target: dev
  outputs:
    dev:
      type: snowflake
      threads: 8
      client_session_keep_alive: true
      account: "{{ env_var('ACCOUNT') }}"
      database: "{{ env_var('DATABASE') }}"
      schema: "{{ env_var('SCHEMA') }}"
      user: "{{ env_var('USER') }}"
      password: "{{ env_var('PASSWORD') }}"
      role: "{{ env_var('ROLE') }}"
      warehouse: "{{ env_var('WAREHOUSE') }}"

and store this file in project root ./profiles.yml.

Create new workflow

  • inside your Github repository create folder .github/workflows (unless it already exists).
  • create new file e.g. pr.yml
  • specify your workflow e.g.:
name: dbt-checkpoint

on:
  push:
  pull_request:
    branches:
      - main

jobs:
  dbt-checkpoint:
    runs-on: ubuntu-latest
    env:
      ACCOUNT: ${{ vars.ACCOUNT }}
      DATABASE: ${{ vars.DATABASE }}
      SCHEMA: ${{ vars.SCHEMA }}
      USER: ${{ vars.USER }}
      PASSWORD: ${{ secrets.PASSWORD }}
      ROLE: ${{ vars.ROLE }}
      WAREHOUSE: ${{ vars.WAREHOUSE }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Setup Python
        uses: actions/setup-python@v2

      - id: Get file changes
        uses: trilom/[email protected]
        with:
          output: " "

      - name: Run dbt checkpoint
        uses: dbt-checkpoint/[email protected]
        with:
          extra_args: --files ${{ steps.get_file_changes.outputs.files}}
          dbt_version: 1.6.3
          dbt_adapter: dbt-snowflake

Acknowledgements

Thank you to Radek Tomšej for initial development and maintenance of this great package, and for sharing your work with the community!

dbt-checkpoint's Projects

action icon action

Github Action to run dbt-checkpoint hooks

dbt-checkpoint icon dbt-checkpoint

:fishing_pole_and_fish: List of `pre-commit` hooks to ensure the quality of your `dbt` projects.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.