
Thoughts on dbt about prql (CLOSED)

prql commented on May 19, 2024
Thoughts on dbt

from prql.

Comments (5)

drewbanin commented on May 19, 2024

@max-sixty I love the way you're thinking about this!! We've certainly been considering what a language that extends SQL could look like. Internally, we're considering this analogous to what TypeScript did for JavaScript. We don't have much beyond some hack-day prototypes and brainstorming docs on our end, and this repo got a lot of 👀 emojis when I shared it with the team :D

Half-baked thoughts incoming:

  • We like how flexible Jinja is, but we dislike how challenging and slow it is to use as a parser. We ended up writing a static analysis tool that could extract ref(), source(), and config() function calls from Jinja-space a lot faster than Jinja can! Check that out here. Side note: I really, really like tree-sitter.
  • I always get stuck when considering whether a language built on top of SQL should have some native understanding of loops or loop-like constructs. It's fairly common to do something like this in dbt:
select
  country
  {% for product in ['shoes', 'shirts', 'pants']  %}
  , sum(case when product = '{{ product }}' then 1 else 0 end) as count_{{ product }}
  {% endfor %}
from {{ ref('orders') }}
group by 1

What would a language-based construct for this look like? Could it be:

select
  country,
  map sum(case when product = _iter then 1 else 0 end) across ['shoes', 'shirts', 'pants'] with column prefix count_

from orders

This feels kind of appealing: you could imagine really good syntax highlighting, linting, static validation, column-level lineage, etc. But I've never been able to find a syntax for this kind of thing that felt coherent and sensible.
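To make the lineage point concrete, here is a minimal Python sketch that assumes a compiler represents the hypothetical "map ... across ... with column prefix" construct as a node holding the compared column, the literal list of values, and the prefix. MapAcross, output_columns, and lineage are illustrative names, not part of dbt or PRQL. Because the list is a literal in the language rather than a Jinja loop over text, the output column names and their dependence on product can be read off without rendering any SQL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapAcross:
    """Hypothetical AST node for `map <agg> across [values] with column prefix <p>`."""
    source_column: str  # column compared against each value, e.g. "product"
    values: tuple       # the literal list, e.g. ("shoes", "shirts", "pants")
    prefix: str         # output column prefix, e.g. "count_"

    def output_columns(self):
        # Output names are known statically, before any SQL is generated.
        return [f"{self.prefix}{v}" for v in self.values]

    def lineage(self):
        # Column-level lineage: every generated column depends on source_column.
        return {col: [self.source_column] for col in self.output_columns()}

    def to_sql(self):
        # Expand into the same select-list items the Jinja loop produces.
        return [
            f"sum(case when {self.source_column} = '{v}' then 1 else 0 end) "
            f"as {self.prefix}{v}"
            for v in self.values
        ]


node = MapAcross("product", ("shoes", "shirts", "pants"), "count_")
print(node.output_columns())          # ['count_shoes', 'count_shirts', 'count_pants']
print(node.lineage()["count_shoes"])  # ['product']
```

A linter or lineage tool could work from output_columns() and lineage() alone; to_sql() is only needed when the query is actually emitted.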

I'd be sooo very excited to dig further into prql and learn more about the project! Loving what I'm seeing though @max-sixty :D


max-sixty commented on May 19, 2024

Thanks for the words of support, @drewbanin!

> What would a language-based construct for this look like?

For your example, I think the current impl would be:

func count_product product_name = (
  "count_{product_name}": (
    product = product_name ? 1 : 0
    sum
  )
)

from orders
aggregate [country] [
    count_product shoes
    count_product shirts
    count_product pants
]

...and adding a map function could work too, so the second statement would become the following (it's also fine to put the func inline, as in your example):

from orders
aggregate [country] [
    map count_product [shoes, shirts, pants]
]
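A rough Python sketch of what compiling the two forms above might produce, assuming shoes, shirts, and pants are meant as string values rather than column names, and that the function body compiles to a SUM over a CASE expression. count_product, prql_map, and aggregate are hypothetical helpers that mirror the PRQL above; they are not PRQL's compiler. The check at the end shows that map only saves typing: both versions yield identical SQL.

```python
def count_product(product_name):
    # Mirrors `func count_product product_name = (...)`: returns (alias, expression).
    # `product = product_name ? 1 : 0` followed by `sum` becomes SUM over a CASE.
    alias = f"count_{product_name}"
    expr = f"SUM(CASE WHEN product = '{product_name}' THEN 1 ELSE 0 END)"
    return alias, expr


def prql_map(func, items):
    # `map count_product [shoes, shirts, pants]`: one result per item,
    # spliced into the surrounding aggregate list.
    return [func(item) for item in items]


def aggregate(table, group_by, columns):
    # Renders `from <table>` + `aggregate [<group_by>] [<columns>]` as a SELECT.
    select_list = list(group_by) + [f"{expr} AS {alias}" for alias, expr in columns]
    return (
        "SELECT " + ", ".join(select_list)
        + f" FROM {table} GROUP BY " + ", ".join(group_by)
    )


explicit = aggregate("orders", ["country"], [
    count_product("shoes"),
    count_product("shirts"),
    count_product("pants"),
])
mapped = aggregate("orders", ["country"],
                   prql_map(count_product, ["shoes", "shirts", "pants"]))

print(explicit == mapped)  # True
```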

(I'm sure there are some corner cases we'd need to think through regarding how functions work in aggregate, similar to https://github.com/max-sixty/prql/issues/9#issuecomment-1020875475, but I don't think those are intractable.)

> This feels kind of appealing: you could imagine really good syntax highlighting, linting, static validation, column-level lineage, etc. But I've never been able to find a syntax for this kind of thing that felt coherent and sensible.

Yes, I've been thinking about this more over the past few days, and I very much agree this sort of thing would become possible and really good, even if it's quite a way off.

> We ended up writing a static analysis tool that could extract ref(), source(), and config() function calls from Jinja-space a lot faster than Jinja can! Check that out here.

That looks really cool. I'll keep an eye on it. Fully agree tree-sitter is great. I had been thinking of it for incremental syntax highlighting rather than as a full parser, but maybe I'd been underrating it? If anyone has views, please feel free to open an issue.
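For readers wondering why static extraction can beat rendering: when a model calls ref() and source() only with literal string arguments, a pattern match can find those calls without evaluating any Jinja. The Python sketch below illustrates that idea with the standard re module; it is not the dbt tool mentioned above. extract_refs is a hypothetical name, config() is left out because its keyword arguments need more than a regex, and the function returns None for dynamic calls such as ref(var(...)), where a real tool would presumably fall back to full Jinja rendering.

```python
import re

SINGLE = r"'[^']*'"                      # single-quoted literal
DOUBLE = r'"[^"]*"'                      # double-quoted literal
LITERAL = f"(?:{SINGLE}|{DOUBLE})"

CALL_START = re.compile(r"\b(ref|source)\s*\(")
# A comma-separated list of literals followed by the closing parenthesis.
STATIC_ARGS = re.compile(rf"\s*{LITERAL}\s*(?:,\s*{LITERAL}\s*)*\)")
STRING_VALUE = re.compile(r"'([^']*)'|" + r'"([^"]*)"')


def extract_refs(template):
    """Return [(function, args)] for static ref()/source() calls, or None if
    any call has non-literal arguments and would need real Jinja rendering."""
    calls = []
    for start in CALL_START.finditer(template):
        args = STATIC_ARGS.match(template, start.end())
        if args is None:
            return None  # e.g. ref(var('name')): give up on static extraction
        values = tuple(a or b for a, b in STRING_VALUE.findall(args.group(0)))
        calls.append((start.group(1), values))
    return calls


model = """
select o.*, c.name
from {{ ref('orders') }} as o
join {{ source("raw", "customers") }} as c on o.customer_id = c.id
"""
print(extract_refs(model))  # [('ref', ('orders',)), ('source', ('raw', 'customers'))]
```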


max-sixty commented on May 19, 2024

Right, building on what @tcholewik said — PRQL's goal is to make analytical work easier, so we should focus on solving a distinct problem really well, and I'm seeing that as the analytical query itself.

I love dbt and have introduced it to both companies that I've worked at over the past few years, to great internal success. If PRQL is successful, hopefully it could have an interface to dbt. I think dbt will always do the things you suggest better than PRQL could (i.e. testing, docs, DAGs).

I do think you're getting at something insightful: much of the Jinja and macros that people use in dbt exist because of shortcomings in SQL, and those could be displaced by a better language, like PRQL. dbt have done a great job building things like dbt-utils to make that even more modular, but there's only so much string interpolation can do.
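One concrete limit of string interpolation is that it substitutes text without knowing where the value lands. The Python sketch below uses plain str.replace as a stand-in for a template engine, which, like the '{{ product }}' pattern earlier in the thread, pastes the value between quotes the template supplies. sql_string_literal is a hypothetical helper standing in for a language that knows it is emitting a SQL string literal; it doubles embedded single quotes, as standard SQL requires.

```python
def interpolate(template, **values):
    # Plain text substitution: it has no idea the value sits inside a SQL string.
    for name, value in values.items():
        template = template.replace("{{ " + name + " }}", value)
    return template


def sql_string_literal(value):
    # Hypothetical helper: quote a value as a SQL string literal by doubling
    # embedded single quotes.
    return "'" + value.replace("'", "''") + "'"


product = "men's shoes"

# Text-level: the template supplies the quotes, and the value breaks them.
naive = interpolate("sum(case when product = '{{ product }}' then 1 else 0 end)",
                    product=product)

# Value-aware: the literal is built from the value, so quoting stays balanced.
safe = f"sum(case when product = {sql_string_literal(product)} then 1 else 0 end)"

print(naive)  # sum(case when product = 'men's shoes' then 1 else 0 end)
print(safe)   # sum(case when product = 'men''s shoes' then 1 else 0 end)
```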

I'll CC some of the dbt folks I've interacted with over the past few years to see whether they have any thoughts: @drewbanin @jtcohen6


tcholewik commented on May 19, 2024

Actually, I don't think this tool would have to displace dbt to be popular.
Instead, if it integrated well with dbt, the two could be used side by side, with PRQL compiling queries and dbt acting as the runner for those queries.

Just like you said, I also think this could be an upgrade over Jinja templates.
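A minimal sketch of the side-by-side setup described in this comment, where PRQL compiles queries and dbt runs them: a build step that turns each .prql file into a sibling .sql model for dbt to pick up. compile_models is a hypothetical helper, and the compiler is passed in as a function so the sketch doesn't assume any particular PRQL binding's API; the fake compiler in the usage just returns fixed SQL. This is not necessarily how dbt-prql works.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def compile_models(models_dir: Path, compile_prql: Callable[[str], str]) -> List[Path]:
    """Hypothetical pre-dbt build step: write <name>.sql next to each <name>.prql."""
    written = []
    for prql_path in sorted(models_dir.glob("*.prql")):
        sql = compile_prql(prql_path.read_text())  # PRQL -> SQL, compiler supplied by caller
        sql_path = prql_path.with_suffix(".sql")    # dbt treats .sql files in models/ as models
        sql_path.write_text(sql)
        written.append(sql_path)
    return written


def fake_compiler(source: str) -> str:
    # Stand-in for a real PRQL compiler; it ignores its input.
    return "select country, count(*) as n from orders group by country"


with tempfile.TemporaryDirectory() as tmp:
    models = Path(tmp)
    (models / "orders_by_country.prql").write_text("from orders")
    outputs = compile_models(models, fake_compiler)
    output_names = [p.name for p in outputs]
    output_sql = outputs[0].read_text()

print(output_names)  # ['orders_by_country.sql']
```

Keeping compilation as a separate step leaves dbt's testing, docs, and DAG handling untouched, which matches the division of labour suggested in the thread.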


max-sixty commented on May 19, 2024

Closed by https://github.com/prql/dbt-prql (which may move to another repo)

