
slidoapp / dbt-superset-lineage


Make dbt docs and Apache Superset talk to one another

License: MIT License

Python 100.00%
dbt superset data-lineage lineage cli tool

dbt-superset-lineage's Introduction

dbt-superset-lineage


Why do I need something like this?

Odds are rather high that you use dbt together with a visualisation tool. If so, these questions might have popped into your head from time to time:

  • "Could I get rid of this model? Does it get used for some dashboards? And in which ones, if yes?"
  • "It would be so handy to see all these well-maintained model and column descriptions when exploring and creating charts."

If your visualisation tool of choice is Superset, you are in luck!

Using dbt-superset-lineage, you can:

  • Add dependencies of Superset dashboards to your dbt sources and models
  • Sync model and column descriptions from dbt docs to Superset

This will help you:

  • Avoid broken dashboards because of deprecated or changed models
  • Choose the right attributes without navigating back and forth between chart and documentation

Demo

The package was presented during Coalesce, the annual dbt conference, as a part of the talk From 100 spreadsheets to 100 data analysts: the story of dbt at Slido. Watch a demo in the video below.

Demo video

Installation

pip install dbt-superset-lineage

Usage

dbt-superset-lineage comes with two basic commands: pull-dashboards and push-descriptions. The documentation for the individual commands can be shown by using the --help option.

It includes a wrapper for the Superset API; you only need to provide SUPERSET_ACCESS_TOKEN/SUPERSET_REFRESH_TOKEN (obtained via /security/login) as environment variables or through the --superset-access-token/--superset-refresh-token options.
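
As a rough illustration of where the two tokens come from, the sketch below builds a login request and reads both tokens from the response. It assumes that Superset's REST API is served under /api/v1 and that the instance uses the `db` auth provider; `build_login_request` and `fetch_tokens` are hypothetical helper names, not part of dbt-superset-lineage.

```python
import json
import urllib.request


def build_login_request(base_url, username, password):
    """Build (but do not send) a POST request to Superset's login endpoint."""
    payload = {
        "username": username,
        "password": password,
        "provider": "db",  # assumption: database-backed authentication
        "refresh": True,   # ask for a refresh token alongside the access token
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/security/login",  # assumed API prefix
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_tokens(base_url, username, password):
    """Send the login request and return (access_token, refresh_token)."""
    request = build_login_request(base_url, username, password)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["access_token"], body["refresh_token"]
```

The two returned values could then be exported as SUPERSET_ACCESS_TOKEN and SUPERSET_REFRESH_TOKEN before running the commands below.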

N.B.

  • Make sure to run dbt compile (or dbt run) against the production profile, not your development profile
  • If multiple databases are used within dbt and/or Superset and there are duplicate names (schema + table) across them, specify the database through the --dbt-db-name and/or --superset-db-id options
  • Currently, PUT requests are only supported if CSRF tokens are disabled in Superset (WTF_CSRF_ENABLED=False).
  • Tested on dbt v1.4.5 and Apache Superset v2.0.1. Other versions may produce errors because the underlying code and API differ.
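
For the CSRF note above, the setting would typically live in Superset's Python configuration. This is a minimal fragment, assuming your deployment loads a custom superset_config.py module; where that file lives depends on how Superset is installed.

```python
# superset_config.py (assumed location of your Superset overrides)

# Setting named in the note above: turns off CSRF token checks,
# which dbt-superset-lineage currently needs for its PUT requests.
WTF_CSRF_ENABLED = False
```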

Pull dashboards

Pull dashboards from Superset and add them as exposures to dbt docs with references to dbt sources and models, making them visible both separately and as dependencies.

N.B.

  • Only published dashboards are extracted.

$ cd jaffle_shop
$ dbt compile  # Compile project to create manifest.json
$ export SUPERSET_ACCESS_TOKEN=<TOKEN>
$ dbt-superset-lineage pull-dashboards https://mysuperset.mycompany.com  # Pull dashboards from Superset to /models/exposures/superset_dashboards.yml
$ dbt docs generate # Generate dbt docs
$ dbt docs serve # Serve dbt docs
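
To give a feel for what ends up in superset_dashboards.yml, here is a rough sketch of turning one dashboard into a dbt exposure entry. The input record, its field names, and the `dashboard_to_exposure` helper are hypothetical; the output keys (name, type, url, depends_on, owner) follow dbt's exposure schema, but the exact fields the tool writes may differ.

```python
def dashboard_to_exposure(dashboard):
    """Map a simplified dashboard record to a dbt exposure dictionary."""
    depends_on = []
    # dbt sources are referenced as source('<source>', '<table>')
    for source_name, table_name in dashboard.get("sources", []):
        depends_on.append(f"source('{source_name}', '{table_name}')")
    # dbt models are referenced as ref('<model>')
    for model_name in dashboard.get("models", []):
        depends_on.append(f"ref('{model_name}')")
    return {
        "name": dashboard["title"],
        "type": "dashboard",
        "url": dashboard["url"],
        "depends_on": depends_on,
        "owner": {"name": dashboard.get("owner", "")},
    }


# Hypothetical dashboard built on one source table and one model.
example = dashboard_to_exposure({
    "title": "orders_overview",
    "url": "https://mysuperset.mycompany.com/superset/dashboard/1/",
    "sources": [("jaffle_shop", "raw_orders")],
    "models": ["orders"],
})
print(example["depends_on"])
```

Because each entry in depends_on points at a source or model, the dashboard shows up in the dbt docs lineage graph downstream of those nodes.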

Separate exposure in dbt docs

Referenced exposure in dbt docs

Push descriptions

Push model and column descriptions from your dbt docs to Superset as plain text so that they can be viewed in Superset when creating charts.

N.B.

  • Run carefully as this rewrites your datasets using merged metadata from Superset and dbt docs.
  • Running with --superset-refresh-columns overrides columns.filterable and columns.groupby to true, because of this issue.
  • Descriptions are rendered as plain text, hence no markdown syntax, incl. links, will be displayed.
  • Avoid special characters and strings in your dbt docs, e.g. <null>.

$ cd jaffle_shop
$ dbt compile  # Compile project to create manifest.json
$ export SUPERSET_ACCESS_TOKEN=<TOKEN>
$ dbt-superset-lineage push-descriptions https://mysuperset.mycompany.com  # Push descriptions from dbt docs to Superset
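
The "merged metadata" mentioned in the notes above can be pictured roughly as follows. This is a simplified sketch, not the package's actual code: `merge_column_descriptions` is a hypothetical helper, and the dictionaries are reduced to the two fields needed to show the idea (column_name and description).

```python
def merge_column_descriptions(superset_columns, dbt_descriptions):
    """Return Superset columns with descriptions taken from dbt where available.

    superset_columns: list of dicts with at least "column_name"
    dbt_descriptions: dict mapping column name -> description text
    """
    merged = []
    for column in superset_columns:
        updated = dict(column)  # copy so the input is left untouched
        description = dbt_descriptions.get(column["column_name"])
        if description:
            # Choice made for this sketch: dbt wins when it has a description,
            # otherwise the existing Superset value is kept.
            updated["description"] = description
        merged.append(updated)
    return merged


columns = [
    {"column_name": "order_id", "description": None},
    {"column_name": "status", "description": "set in Superset"},
]
docs = {"order_id": "Primary key of the orders table"}
print(merge_column_descriptions(columns, docs))
```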

Column descriptions in Superset

License

Licensed under the MIT license (see LICENSE.md file for more details).



dbt-superset-lineage's Issues

Update needed on sqlfluff version

Because dbt-superset-lineage (0.2.1) depends on sqlfluff (>=0.8.2,<0.9.0)
and no versions of dbt-superset-lineage match >0.2.1,<0.3.0, dbt-superset-lineage (>=0.2.1,<0.3.0) requires sqlfluff (>=0.8.2,<0.9.0).
So, because seed depends on both dbt-superset-lineage (^0.2.1) and sqlfluff (^1.4.1), version solving failed.

Feature request: Append create Superset chart URL in dbt model description

In an effort to ease user flow from dbt docs into Superset, we could programmatically append a markdown hyperlink into a dbt model's description, pointing to the "create chart" page in Superset.

For example, consider the following file, orders.yml

# orders.yml
models:
  - name: orders
    description: |
      This table has basic information about orders, as well as some derived facts based on payments

      [Explore this table!](http://localhost:8088/explore/?datasource_type=table&datasource_id=97)
    columns: ...

dbt docs page:
Screenshot 2024-02-21 at 4 21 00 PM

Clicking "Explore this table!" takes you to this Superset page:

Screenshot 2024-02-21 at 4 44 54 PM
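
The link in the orders.yml example could be generated along these lines. This is only a sketch of the idea in the request: `explore_link` and `append_link` are hypothetical helpers, and the URL pattern is copied from the example above rather than taken from Superset's documentation.

```python
from urllib.parse import urlencode


def explore_link(superset_url, dataset_id, label="Explore this table!"):
    """Return a markdown link to Superset's chart-creation page for a dataset."""
    query = urlencode({"datasource_type": "table", "datasource_id": dataset_id})
    return f"[{label}]({superset_url.rstrip('/')}/explore/?{query})"


def append_link(description, link):
    """Append the link as its own paragraph at the end of a description."""
    return f"{description.rstrip()}\n\n{link}"


print(explore_link("http://localhost:8088", 97))
```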

Referenced SQLFluff Version is Still 1.4.5

#32 states that SQLFluff should have been updated to 2.1.2, but it seems like dbt-superset-lineage is still referencing 1.4.5.

Commands On a Clean Virtual Instance

pip list

Package    Version
---------- -------
pip        22.3.1
setuptools 68.0.0
wheel      0.38.4

pipenv install dbt-superset-lineage

Package              Version
-------------------- ---------
...
setuptools           68.0.0
soupsieve            2.4.1
sqlfluff             1.4.5
tblib                2.0.0
toml                 0.10.2
...

Pull dashboards has old API calls

Hey guys,
I adapted the pull_dashboards.py file because, when running the code out of the box, I was not getting the dashboards' table names.

As far as I can tell, GET /dashboard/id no longer returns "table_name"; a similar endpoint that I found, GET /dashboard/id/datasets, appears to have replaced it.

Let me know how this sounds to you, I am happy to contribute with my minor edits.
Best,
Agus
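
The change described above might look roughly like this. The sketch assumes that the response of GET /dashboard/id/datasets has a top-level "result" list whose items carry "schema" and "table_name" keys; that shape is an assumption to verify against your Superset version, and `tables_from_datasets_response` is a hypothetical helper rather than code from pull_dashboards.py.

```python
def tables_from_datasets_response(payload):
    """Collect "schema.table" names from a dashboard-datasets response body."""
    tables = set()
    for dataset in payload.get("result", []):
        table_name = dataset.get("table_name")
        if not table_name:
            continue  # skip entries without a table name
        schema = dataset.get("schema")
        tables.add(f"{schema}.{table_name}" if schema else table_name)
    return sorted(tables)


# Hand-written sample payload, not real API output.
sample = {"result": [
    {"schema": "analytics", "table_name": "orders"},
    {"schema": "analytics", "table_name": "customers"},
    {"schema": None, "table_name": "orders_virtual"},
]}
print(tables_from_datasets_response(sample))
```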

400 Bad Request: The CSRF token is missing

Hi, is this repo still actively maintained? I have been testing this on a Docker deployment of Superset and have been unable to disable CSRF tokens. There are a number of threads on this, which seem to indicate that this is a common issue. Is there a way to ensure this library works for POST commands with Superset, or are there any specific steps to disable CSRF?

Feature request: Create datasets from dbt models if not present in Superset

Hi! I want to start a conversation about having this library also support a feature to create a dataset if one does not exist in Superset, perhaps as part of the push_descriptions command or an entirely new command, say, create_datasets.

Automatically creating datasets would solve for two use-cases:

  1. Ensuring any new dbt models are synced into Superset without having to explicitly create them in Superset itself.
  2. Helping sync existing dbt models into a freshly provisioned Superset instance, which again reduces the effort of creating the corresponding datasets in Superset (via a separate script or manual actions in the Superset UI).

This PR (rohitsanj#1) against my own fork of this repo introduces a new flag create_dataset_if_not_exists to the push_descriptions command. I've also added in a new folder called dbt_schemas containing the dbt manifest JSON schema and the schema-generated pydantic models -- this is used to parse the dbt manifest.json file to provide helpful type hints when developing and automatic data validation at runtime.

Would love to know the community's thoughts on this and if others have come across the requirement for such a feature. Thanks!
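
The first step of such a feature, finding which dbt models have no dataset yet, could be sketched as below. It relies on the "nodes" section of manifest.json, where each node has resource_type, schema, name and alias fields; `models_missing_in_superset` and the shape of `superset_tables` are hypothetical and not taken from the linked PR.

```python
def models_missing_in_superset(manifest, superset_tables):
    """List (schema, table) pairs of dbt models with no matching Superset dataset.

    manifest: parsed manifest.json (a dict)
    superset_tables: set of (schema, table_name) pairs already in Superset
    """
    missing = []
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue  # ignore tests, seeds, snapshots, ...
        # dbt builds a model under its alias when one is set
        table = node.get("alias") or node["name"]
        key = (node["schema"], table)
        if key not in superset_tables:
            missing.append(key)
    return sorted(missing)


# Hand-written, heavily trimmed manifest for illustration.
manifest = {"nodes": {
    "model.jaffle_shop.orders": {
        "resource_type": "model", "schema": "analytics",
        "name": "orders", "alias": "orders"},
    "model.jaffle_shop.customers": {
        "resource_type": "model", "schema": "analytics",
        "name": "customers", "alias": None},
    "test.jaffle_shop.not_null_orders_id": {
        "resource_type": "test", "schema": "analytics",
        "name": "not_null_orders_id", "alias": None},
}}
print(models_missing_in_superset(manifest, {("analytics", "orders")}))
```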

Appears that Refresh Token expires?

Hi team. First off, thanks for the great project.

We generated a token via the API (/security/login) which includes an access + refresh token. We stored the output of these as secure credentials in our CI environment. Things have been working great with this script in CI for the past 2 weeks. Then, all of a sudden we're getting 401 Unauthorized responses for /security/refresh calls.

Nothing has changed in our environment / users-wise.

I'm curious if the refresh tokens only last for 2 weeks now? Is there a way to generate longer lived API tokens somehow?

This issue is likely outside the scope of this project per se; however, I couldn't find very detailed information on the API access tokens anywhere else and was curious whether others had experienced this or had a workaround to generate tokens that could persist in CI for some longer or indefinite period.

/ping @one-data-cookie

/cc @bkimjin
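
One way to investigate is to look at the expiry time embedded in the token itself. The sketch below assumes the tokens returned by /security/login are JWTs carrying a standard `exp` claim, which is worth confirming for your instance; it decodes the payload without verifying the signature, so it is only suitable for inspection. `token_expiry` is a hypothetical helper.

```python
import base64
import json
from datetime import datetime, timezone


def token_expiry(token):
    """Return the expiry of a JWT as an aware UTC datetime, or None if absent."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url-encoded with the padding stripped; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    exp = claims.get("exp")
    if exp is None:
        return None
    return datetime.fromtimestamp(exp, tz=timezone.utc)
```

Comparing the result for the stored refresh token with the date the 401 responses started would show whether expiry is the cause.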
