sqlfluff / sqlfluff

A modular SQL linter and auto-formatter with support for multiple dialects and templated code.

Home Page: https://www.sqlfluff.com

License: MIT License

Python 81.90% Dockerfile 0.03% SQL 18.05% Shell 0.02%

hacktoberfest pypi sql sql-linter

sqlfluff's Introduction

The SQL Linter for Humans

SQLFluff is a dialect-flexible and configurable SQL linter. Designed with ELT applications in mind, SQLFluff also works with Jinja templating and dbt. SQLFluff will auto-fix most linting errors, allowing you to focus your time on what matters.

Dialects Supported

Although SQL is reasonably consistent in its implementations, there are several different dialects available with variations of syntax and grammar. SQLFluff currently supports the following SQL dialects (though perhaps not in full):

ANSI SQL - this is the base version and on occasion may not strictly follow the ANSI/ISO SQL definition
Athena
BigQuery
ClickHouse
Databricks (note: this extends the sparksql dialect with Unity Catalog syntax).
Db2
DuckDB
Exasol
Greenplum
Hive
Materialize
MySQL
Oracle
PostgreSQL (aka Postgres)
Redshift
Snowflake
SOQL
SparkSQL
SQLite
Teradata
Transact-SQL (aka T-SQL)
Trino
Vertica

We aim to make it easy to expand on the support of these dialects and also add other, currently unsupported, dialects. Please raise issues (or upvote any existing issues) to let us know of demand for missing support.

Pull requests from those who know the missing syntax or dialects are especially welcome and are the quickest way for you to get support added. We are happy to work with any potential contributors to help them add this support. Please raise an issue first for any large feature change to ensure it is a good fit for this project before spending time on the work.

Templates Supported

SQL itself does not lend itself well to modularity, so to introduce some flexibility and reusability it is often templated, as discussed further in our modularity documentation.

SQLFluff supports the following templates:

Jinja (aka Jinja2)
SQL placeholders (e.g. SQLAlchemy parameters)
Python format strings
dbt (requires plugin)

Again, please raise issues if you wish to support more templating languages/syntaxes.

VS Code Extension

We also have a VS Code extension:

Getting Started

To get started, install the package and run sqlfluff lint or sqlfluff fix.

$ pip install sqlfluff
$ echo "  SELECT a  +  b FROM tbl;  " > test.sql
$ sqlfluff lint test.sql --dialect ansi
== [test.sql] FAIL
L:   1 | P:   1 | LT01 | Expected only single space before 'SELECT' keyword.
                       | Found '  '. [layout.spacing]
L:   1 | P:   1 | LT02 | First line should not be indented.
                       | [layout.indent]
L:   1 | P:   1 | LT13 | Files must not begin with newlines or whitespace.
                       | [layout.start_of_file]
L:   1 | P:  11 | LT01 | Expected only single space before binary operator '+'.
                       | Found '  '. [layout.spacing]
L:   1 | P:  14 | LT01 | Expected only single space before naked identifier.
                       | Found '  '. [layout.spacing]
L:   1 | P:  27 | LT01 | Unnecessary trailing whitespace at end of file.
                       | [layout.spacing]
L:   1 | P:  27 | LT12 | Files must end with a single trailing newline.
                       | [layout.end_of_file]
All Finished 📜 🎉!

Alternatively, you can use the Official SQLFluff Docker Image or have a play using SQLFluff online.

For full CLI usage and rules reference, see the SQLFluff docs.

Documentation

For full documentation visit docs.sqlfluff.com. This documentation is generated from this repository so please raise issues or pull requests for any additions, corrections, or clarifications.

Releases

SQLFluff adheres to Semantic Versioning, so breaking changes should be restricted to major version releases. Some elements (such as the Python API) are in a less stable state and may see more significant changes more often. For details on breaking changes and how to migrate between versions, see our release notes. See the changelog for more details. If you would like to join in, please consider contributing.

New releases are made monthly. For more information, visit Releases.

SQLFluff on Slack

We have a fast-growing community on Slack. Come and join us!

SQLFluff on Twitter

Contributing

We are grateful to all our contributors. There is a lot to do in this project, and we are just getting started.

If you want to understand more about the architecture of SQLFluff, you can find more here.

If you would like to contribute, check out the open issues on GitHub. You can also see the guide to contributing.

Sponsors

The turnkey analytics stack, find out more at Datacoves.com.


sqlfluff's Issues

Bug: Specifying paths which don't exist fails messily

When specifying a path which doesn't exist, we get a long traceback response rather than a compact and concise message saying that the path doesn't exist. For example the command:

sqlfluff lint foo

We get the output:

$ sqlfluff lint foo
Traceback (most recent call last):
  File "C:\Users\usr\dev2\sqlfluff\env\Scripts\sqlfluff-script.py", line 11, in <module>
    load_entry_point('sqlfluff', 'console_scripts', 'sqlfluff')()
  File "C:\Users\usr\dev2\sqlfluff\env\lib\site-packages\click\core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\usr\dev2\sqlfluff\env\lib\site-packages\click\core.py", line 717, in main
    rv = self.invoke(ctx)
  File "C:\Users\usr\dev2\sqlfluff\env\lib\site-packages\click\core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Users\usr\dev2\sqlfluff\env\lib\site-packages\click\core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\alan\dev2\sqlfluff\env\lib\site-packages\click\core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "c:\users\usr\dev2\sqlfluff\src\sqlfluff\cli\commands.py", line 138, in lint
    result = lnt.lint_paths(paths, verbosity=verbose)
  File "c:\users\usr\dev2\sqlfluff\src\sqlfluff\linter.py", line 424, in lint_paths
    result.add(self.lint_path(path, verbosity=verbosity, fix=fix))
  File "c:\users\usr\dev2\sqlfluff\src\sqlfluff\linter.py", line 408, in lint_path
    for fname in self.paths_from_path(path):
  File "c:\users\usr\dev2\sqlfluff\src\sqlfluff\linter.py", line 379, in paths_from_path
    raise IOError("Specified path does not exist")
OSError: Specified path does not exist

Something like this would be better:

$ sqlfluff lint foo
Error: the path `foo` doesn't exist.
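One way to get the compact message requested above is to check the paths before linting and report any that are missing. This is only a minimal sketch: `check_paths` and `main` are hypothetical helpers, not part of SQLFluff's CLI, and the message wording is copied from the suggestion in this issue.

```python
import os
import sys


def check_paths(paths):
    """Return one compact error message per path that does not exist."""
    return [
        "Error: the path `{}` doesn't exist.".format(path)
        for path in paths
        if not os.path.exists(path)
    ]


def main(paths):
    """Print any errors to stderr and return a shell-style exit code."""
    errors = check_paths(paths)
    for message in errors:
        print(message, file=sys.stderr)
    return 1 if errors else 0
```

Checking up front keeps the traceback out of the user's terminal while still signalling failure through the exit code.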

Unexpected behavior of 'sqlfluff fix' for "Indentation length is not a multiple of 4"

Fixing indentation issues in this file ends up moving the incorrectly indented text left (unindenting it) rather than right (moving it to the indentation I expected).

Input:

SELECT
  user_id,
  list_id,
    gender_label,
    audience
FROM
  age_data
JOIN
    gender_data
USING
  (user_id, list_id)
JOIN
  audience_size
USING
  (user_id, list_id)
LEFT JOIN
  verts
USING
  (user_id)

Output:

SELECT
user_id,
list_id,
    gender_label,
    audience
FROM
age_data
JOIN
    gender_data
USING
(user_id, list_id)
JOIN
audience_size
USING
(user_id, list_id)
LEFT JOIN
verts
USING
(user_id)

Is there a way to have it deduce and apply the correct indentation based on the structure of the query?

Expected output:

SELECT
    user_id,
    list_id,
    gender_label,
    audience
FROM
    age_data
JOIN
    gender_data
USING
    (user_id, list_id)
JOIN
    audience_size
USING
    (user_id, list_id)
LEFT JOIN
    verts
USING
    (user_id)
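For this particular input, the expected output can be reproduced by rounding each indent up to the next multiple of 4 instead of truncating it. The sketch below illustrates that heuristic only. It does not deduce indentation from the query structure, and `snap_indent` and `fix_indentation` are hypothetical names rather than SQLFluff functions.

```python
def snap_indent(line, width=4):
    """Round a line's leading-space indent up to the next multiple of `width`."""
    stripped = line.lstrip(" ")
    indent = len(line) - len(stripped)
    # Ceiling division: 0 -> 0, 2 -> 4, 4 -> 4, 5 -> 8.
    snapped = -(-indent // width) * width
    return " " * snapped + stripped


def fix_indentation(sql, width=4):
    """Apply `snap_indent` to every line of a query."""
    return "\n".join(snap_indent(line, width) for line in sql.split("\n"))
```

Rounding up works here because every under-indented line was meant to sit one level deep. A structure-aware fix would need the parse tree to decide the intended depth.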

Bug: The `fix` command is not templating safe

Running the command below does fix the linting error but it also removes the templating tags and replaces them with the dummy commands.

sqlfluff fix test\fixtures\templater\jinja_b\jinja.sql --rules L010

To fix this, we need a way, during the fixing process, to compare the templated and un-templated versions and identify where substitutions have been made. We can then either ignore fixes in those places (probably raising a warning) or, even better, compensate for them and try to work around them.
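The comparison described above could be sketched with the standard library's `difflib`: diff the raw and rendered text, record the spans of the rendered text that came from a substitution, and skip any fix that overlaps one. This is an illustration under assumptions, not how SQLFluff tracks templated regions. `templated_spans` and `fix_is_safe` are hypothetical names, and a character-level diff can mislabel regions when the substituted text happens to resemble its surroundings.

```python
import difflib


def templated_spans(raw, rendered):
    """Return (start, end) spans of `rendered` that differ from `raw`."""
    matcher = difflib.SequenceMatcher(None, raw, rendered, autojunk=False)
    return [
        (j1, j2)
        for tag, _i1, _i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def fix_is_safe(fix_start, fix_end, spans):
    """Treat a fix as safe only if it overlaps no templated span."""
    return all(fix_end <= start or fix_start >= end for start, end in spans)
```

A fix rejected by `fix_is_safe` could be reported as a warning instead of being applied, which matches the more conservative of the two options above.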

Error parsing complex mathematical expression

Input:

SELECT
    COS(2*ACOS(-1)*2*y/53) AS c2
FROM
    t

Output:

$ sqlfluff lint test1.sql
== [test1.sql] FAIL
L:   2 | P:  10 | L006 | Operators should be preceded by a space.
L:   2 | P:  11 | L006 | Operators should be followed by a space.
L:   2 | P:  15 | ???? | Found unparsable segment @L002P015: '(-1)*2*y/53...'
L:   2 | P:  31 | L014 | Inconsistent capitalisation of unquoted identifiers.
L:   4 | P:   5 | L014 | Inconsistent capitalisation of unquoted identifiers.

Interestingly, adding whitespace around the operators also fixes the parsing error. I wonder if two adjacent tokens are being mistakenly parsed as a single token when spaces are missing.

Here's the same SQL with spaces (and no parsing error):

SELECT
    COS(2 * ACOS(-1) * 2 * y / 53) AS c2
FROM
    t

Output:

$ sqlfluff lint test1.sql
== [test1.sql] FAIL
L:   2 | P:  22 | L006 | Operators should be preceded by a space.

I don't know why there's still a warning about needing space around an operator. Can you look into this as well?
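To illustrate the suspicion above about adjacent tokens, here is a small regex tokenizer whose output does not depend on whitespace. It is only a sketch of the expected behaviour, not SQLFluff's lexer, and `TOKEN_RE` and `tokenize` are hypothetical names.

```python
import re

# Numbers first, then identifiers, then single-character operators and brackets.
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*/(),]")


def tokenize(expression):
    """Split an arithmetic expression into tokens, ignoring whitespace."""
    return TOKEN_RE.findall(expression)
```

With a lexer of this shape, `COS(2*ACOS(-1)*2*y/53)` and its spaced-out equivalent produce the same token list, so any difference in parsing would have to come from a later stage.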

Enhancement: Structure and Documentation of Dialects

Similar to #39 we should have good documentation of the various grammars. Ideally this should be auto-generated from the grammar itself.

Enhancement: Support BigQuery Dialect with table expressions.

This query uses some ARRAY operations. This is BigQuery, but I think the same or similar SQL is valid in Postgres, possibly other databases.

Input:

SELECT
    y AS woy
FROM
    UNNEST(GENERATE_ARRAY(1, 53)) AS y

Output:

$ sqlfluff lint test2.sql
== [test2.sql] FAIL
L:   3 | P:   1 | ???? | Found unparsable segment @L003P001: 'FROM\n    UNNEST(GENE...'

Bug: SQLFluff needs to include jinja2 in its "install_requires" list in setup.py

Without this, if I pip install sqlfluff and run it, I get an error, e.g.:

$ sqlfluff lint test.sql
SELECT
Traceback (most recent call last):
  File "/Users/bhart/.pyenv/versions/sqlfluff-prod-3.6.4/bin/sqlfluff", line 11, in <module>
SELECT
    load_entry_point('sqlfluff==0.2.3', 'console_scripts', 'sqlfluff')()
  File "/Users/bhart/.pyenv/versions/3.6.4/envs/sqlfluff-prod-3.6.4/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/Users/bhart/.pyenv/versions/3.6.4/envs/sqlfluff-prod-3.6.4/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/Users/bhart/.pyenv/versions/3.6.4/envs/sqlfluff-prod-3.6.4/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/bhart/.pyenv/versions/3.6.4/envs/sqlfluff-prod-3.6.4/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/bhart/.pyenv/versions/3.6.4/envs/sqlfluff-prod-3.6.4/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Users/bhart/.pyenv/versions/3.6.4/envs/sqlfluff-prod-3.6.4/lib/python3.6/site-packages/sqlfluff/cli/commands.py", line 139, in lint
    result = lnt.lint_paths(paths, verbosity=verbose)
  File "/Users/bhart/.pyenv/versions/3.6.4/envs/sqlfluff-prod-3.6.4/lib/python3.6/site-packages/sqlfluff/linter.py", line 589, in lint_paths
    result.add(self.lint_path(path, verbosity=verbosity, fix=fix))
  File "/Users/bhart/.pyenv/versions/3.6.4/envs/sqlfluff-prod-3.6.4/lib/python3.6/site-packages/sqlfluff/linter.py", line 576, in lint_path
    linted_path.add(self.lint_string(f.read(), fname=fname, verbosity=verbosity, fix=fix, config=config))
  File "/Users/bhart/.pyenv/versions/3.6.4/envs/sqlfluff-prod-3.6.4/lib/python3.6/site-packages/sqlfluff/linter.py", line 452, in lint_string
    parsed, vs, time_dict = self.parse_string(s=s, fname=fname, verbosity=verbosity, config=config)
  File "/Users/bhart/.pyenv/versions/3.6.4/envs/sqlfluff-prod-3.6.4/lib/python3.6/site-packages/sqlfluff/linter.py", line 392, in parse_string
    s = self.templater.process(s, fname=fname, config=config or self.config)
  File "/Users/bhart/.pyenv/versions/3.6.4/envs/sqlfluff-prod-3.6.4/lib/python3.6/site-packages/sqlfluff/templaters.py", line 178, in process
    from jinja2 import Environment, StrictUndefined  # noqa
ModuleNotFoundError: No module named 'jinja2'

Using with python raw sql

Hi! Thanks a lot for this awesome project!

Is it possible to use it with python code and sql written as strings inside?
It is pretty sad to know that these strings cannot be linted right now 😞

Enhancement: Merge segment.name and segment.type.

Quite a low level issue here. Segments have both a name and a type property. They're becoming increasingly synonymous and I'm no longer convinced it makes sense to have both. I think we should work out how to remove one.

Feature Request: File based configuration

First of all, thank you @alanmcruickshank. Really nice work authoring this project.

I wanted to know what your plans are for adding more rules or making it more configurable via a JSON config passed as a parameter. I would like to use this in our organisation and would hate to have multiple projects to maintain internally.

This is mainly about these points in your ToDos:

Configurable linting
- Command line options for config
- Ability to read from config files
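As a rough picture of what the two ToDo items could look like together, the sketch below reads rule codes from a JSON config string and lets command-line rules override them. Everything here is an assumption for illustration: the `rules` key, the JSON layout and the `load_rules` and `effective_rules` helpers are hypothetical and do not describe SQLFluff's actual configuration format.

```python
import json


def load_rules(config_text):
    """Parse a JSON config string and return its list of rule codes."""
    config = json.loads(config_text)
    rules = config.get("rules", [])  # "rules" is a hypothetical key
    if not isinstance(rules, list) or not all(isinstance(r, str) for r in rules):
        raise ValueError("`rules` must be a list of rule code strings")
    return rules


def effective_rules(config_text, cli_rules=None):
    """Command-line rules, when given, take precedence over the file."""
    return list(cli_rules) if cli_rules else load_rules(config_text)
```

Keeping the file as the default and the command line as the override means one shared config can serve a whole organisation while individual runs can still narrow the rule set.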

Bug: Running "sqlfluff version" on a fresh install throws an error

I installed sqlfluff and ran the version and lint commands. Both failed, probably due to a missing configuration file. It would be good to report this error in a nicer way.

Here's the result I saw:

$ sqlfluff version
Traceback (most recent call last):
  File "/usr/local/pyenv/versions/3.7.3/bin/sqlfluff", line 11, in <module>
    load_entry_point('sqlfluff==0.2.1', 'console_scripts', 'sqlfluff')()
  File "/usr/local/pyenv/versions/3.7.3/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/pyenv/versions/3.7.3/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/pyenv/versions/3.7.3/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/pyenv/versions/3.7.3/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/pyenv/versions/3.7.3/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/pyenv/versions/3.7.3/lib/python3.7/site-packages/sqlfluff/cli/commands.py", line 82, in version
    c = get_config(**kwargs)
  File "/usr/local/pyenv/versions/3.7.3/lib/python3.7/site-packages/sqlfluff/cli/commands.py", line 55, in get_config
    return FluffConfig.from_root(overrides=overrides)
  File "/usr/local/pyenv/versions/3.7.3/lib/python3.7/site-packages/sqlfluff/config.py", line 283, in from_root
    return cls(configs=c, overrides=overrides)
  File "/usr/local/pyenv/versions/3.7.3/lib/python3.7/site-packages/sqlfluff/config.py", line 263, in __init__
    if self._configs['core']['rules']:
KeyError: 'rules'

Rule: Detect incorrect usage / understanding of DISTINCT

I often run across code where someone is using DISTINCT as if it were a function, when it's actually an option that applies to the whole list of result columns, e.g.:

SELECT
  DISTINCT(a),
  b,
  c
FROM
  foo

In this case, they are going to get all distinct combinations of a, b, and c, not just the distinct values of a as they seem to think.

In this case, the suggested corrections are one of:

Switch to using GROUP BY a and apply an aggregation function to b and c
Remove the parentheses after DISTINCT
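A first approximation of this check could be a regular expression that looks for `DISTINCT` immediately followed by an opening parenthesis right after `SELECT`. This is a hedged sketch rather than a proposed rule implementation. `uses_distinct_as_function` is a hypothetical name, and the pattern will also flag legitimate expressions such as `SELECT DISTINCT (a + b) * 2`, so a real rule would need the parse tree.

```python
import re

# SELECT, whitespace, DISTINCT, optional whitespace, then "(".
DISTINCT_CALL_RE = re.compile(r"\bSELECT\s+DISTINCT\s*\(", re.IGNORECASE)


def uses_distinct_as_function(sql):
    """Return True if the query appears to call DISTINCT like a function."""
    return DISTINCT_CALL_RE.search(sql) is not None
```

Anchoring on `SELECT` keeps `COUNT(DISTINCT(a))` from being reported, since that form is not the misunderstanding described here.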

Related info:

Enhancement: Specifying nonexistent rules at the command line should fail loudly.

At the moment - if someone specifies a rule in the CLI which doesn't actually exist, there is no warning or error. We should at least give some feedback about this.
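The feedback asked for here could be as simple as validating the requested codes against the set of known rules before linting starts. In this sketch `KNOWN_RULES` is a made-up stand-in for whatever registry the linter keeps, and `validate_rules` is a hypothetical helper.

```python
KNOWN_RULES = {"L001", "L002", "L003"}  # hypothetical registry of rule codes


def validate_rules(requested, known=KNOWN_RULES):
    """Return the requested rules, or raise if any of them is unknown."""
    unknown = sorted(set(requested) - set(known))
    if unknown:
        raise ValueError("Unknown rule(s): " + ", ".join(unknown))
    return list(requested)
```

Raising an error is the loudest option. A warning that lists the unknown codes and continues with the valid ones would be a gentler alternative.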

Rule: Lint indentation within queries.

This could be a big one, but I think we need to start on it before too long. It will also be controversial, so it needs a bit of homework before anyone does any coding.

Want to help? Please post links and examples to good indentation here.

Output for subdirectories and files should be in alphabetical order

When reviewing output for a large directory structure, I noticed that the files are processed in a sequence that jumps across directories. It would be cool to process them in alphabetical order (and depth first, I think?).

Example output (with the actual lint warnings removed, leaving just the filenames):

== [sql/benchmark_summary_queries/constraints/_postcondition_no_nulls.sql] FAIL
== [sql/audience_size_queries/constraints/_postcondition_check_gdpr_compliance.sql] FAIL
== [sql/benchmark_summary_queries/constraints/_postcondition_anonymized_buckets.sql] FAIL

Notice a file from sql/audience_size_queries/ appears in between the output for two files in the same directory, sql/benchmark_summary_queries/.
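One way to keep each directory's files together is to sort the collected paths by their components before linting. The sketch below assumes forward-slash paths like those in the example output, and `sort_paths` is a hypothetical helper rather than part of SQLFluff.

```python
from pathlib import PurePosixPath


def sort_paths(paths):
    """Order paths by their components so each directory's files stay together."""
    return sorted(paths, key=lambda path: PurePosixPath(path).parts)
```

Sorting on the tuple of path components, rather than on the raw string, avoids surprises when a directory name is a prefix of a sibling's name.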

Warn on missing whitespace in CTE declaration

I'd like to see a warning if there is missing whitespace after the AS in a CTE declaration, e.g.:

WITH
    count_audience_size AS( # <--- No space between AS and (
        SELECT
            user_id
        FROM
            table
    )

SELECT * FROM count_audience_size
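A rough version of this warning can be sketched with a regular expression that finds `AS` directly followed by `(`. This is an approximation only: it flags every `AS(` rather than just CTE declarations, it does not skip string literals or comments, and `find_missing_space_after_as` is a hypothetical name.

```python
import re

AS_NO_SPACE_RE = re.compile(r"\bAS\(", re.IGNORECASE)


def find_missing_space_after_as(sql):
    """Return 1-indexed (line, column) positions of every `AS(`."""
    hits = []
    for line_no, line in enumerate(sql.splitlines(), start=1):
        for match in AS_NO_SPACE_RE.finditer(line):
            hits.append((line_no, match.start() + 1))
    return hits
```

Reporting line and column mirrors the `L: | P:` positions in the linter's existing output.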

Enhancement: Add the `NOT IN (SELECT *)` syntax

Query:

SELECT
    user_id AS non_GDPR_users
FROM
    p.d.t AS query
WHERE user_id NOT IN (SELECT user_id FROM p.d.t2)

Output:

$ sqlfluff lint test.sql
== [test.sql] FAIL
L:   2 | P:  16 | L014 | Inconsistent capitalisation of unquoted identifiers.
L:   5 | P:  15 | ???? | Found unparsable segment @L005P015: 'NOT IN (SELECT user_...'

Bug: fix ValueError: Unexpected capitalisation policy: 'inconsistent'

I don't even know what is happening here!

➜ sqlfluff version                                                                                   
0.2.4

➜ echo 'selECT * from table;' > test.sql

➜ sqlfluff fix test.sql --rules L001,L002,L003,L004,L005,L006,L007,L008,L009,L010,L011,L012,L013,L014
==== finding violations ====
Traceback (most recent call last):
  File "/Users/nolan/anaconda3/bin/sqlfluff", line 10, in <module>
    sys.exit(cli())
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/sqlfluff/cli/commands.py", line 174, in fix
    result = lnt.lint_paths(paths, verbosity=verbose)
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/sqlfluff/linter.py", line 605, in lint_paths
    result.add(self.lint_path(path, verbosity=verbosity, fix=fix))
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/sqlfluff/linter.py", line 592, in lint_path
    fix=fix, config=config))
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/sqlfluff/linter.py", line 536, in lint_string
    lerrs, _, _, _ = crawler.crawl(parsed)
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/sqlfluff/rules/base.py", line 192, in crawl
    raw_stack=raw_stack, fix=fix, memory=memory)
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/sqlfluff/rules/base.py", line 192, in crawl
    raw_stack=raw_stack, fix=fix, memory=memory)
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/sqlfluff/rules/base.py", line 192, in crawl
    raw_stack=raw_stack, fix=fix, memory=memory)
  [Previous line repeated 1 more time]
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/sqlfluff/rules/base.py", line 162, in crawl
    raw_stack=raw_stack, memory=memory)
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/sqlfluff/rules/std.py", line 488, in _eval
    segment, self.capitalisation_policy))
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/sqlfluff/rules/std.py", line 463, in make_replacement
    return make_replacement(seg, list(cases_seen)[0])
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/sqlfluff/rules/std.py", line 469, in make_replacement
    raise ValueError("Unexpected capitalisation policy: {0!r}".format(policy))
ValueError: Unexpected capitalisation policy: 'inconsistent'

Enhancement: Support "smart indent" where appropriate

This is something to think about, not a "do exactly this" issue.

This query produces a warning:

SELECT
    a
FROM
    t
JOIN
    u
USING
    (user_id,
     list_id)

Output:

$ sqlfluff lint test.sql
== [test.sql] FAIL
L:   9 | P:   1 | L003 | Indentation length is not a multiple of 4.

It might be cool to support something like the Python "hanging indent" described here. I.e. since the second column is indented to align with the first column name, this should be reported as okay (no warning).

There may be other SQL constructs which could/should support the same behavior.
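The hanging-indent idea above can be sketched as a check that accepts a line either when its indent is a multiple of 4 or when it lines up with the first character after an unclosed `(` on the previous line. This is an illustration under assumptions, not SQLFluff's L003 logic. `indent_of` and `indent_is_acceptable` are hypothetical names, and only the immediately preceding line is considered.

```python
def indent_of(line):
    """Count the leading spaces of a line."""
    return len(line) - len(line.lstrip(" "))


def indent_is_acceptable(previous_line, line, width=4):
    """Accept multiples of `width`, or alignment with an open bracket."""
    indent = indent_of(line)
    if indent % width == 0:
        return True
    # Scan the previous line right to left for the last unclosed "(".
    depth = 0
    for position in range(len(previous_line) - 1, -1, -1):
        char = previous_line[position]
        if char == ")":
            depth += 1
        elif char == "(":
            if depth == 0:
                # Hanging indent: align with the character after the bracket.
                return indent == position + 1
            depth -= 1
    return False
```

For the query in this issue, the `(` sits at column 5 of the previous line, so an indent of 5 spaces on the `list_id)` line would be accepted.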

Enhancement: Structure and Documentation of Rules

There should be a framework for auto-documentation of rules. Currently they're not documented at all.

Feature request: functions (`DATE()`) and proper `ORDER BY`

Consider the following query:

SELECT
    col_a,
    col_b,
    date_col_a,
    date_col_b
FROM "database"."sample_table"
WHERE
    DATE(date_col_b) >= current_date
    AND length(col_a) = 4
ORDER BY date_col_a DESC

When saved to a.sql, we can run sqlfluff on it, which results in the following:

$ sqlfluff lint --rules L001,L002,L003,L004,L005,L008 -n a.sql
== [a.sql] FAIL
L:   7 | P:   1 | ???? | Found unparsable segment @ 7,1: 'WHERE\n    DATE(date_...'
L:  10 | P:  21 | ???? | Found unparsable segment @ 10,21: 'DESC\n...'

This is unexpected, as the query has no trouble being executed in PrestoDB (that is AWS Athena) for which it was prepared.

Could it be that the quotes (") around column/database names may cause the problem? Or is it the DATE function?

Do let me know what I can do to help debug this even further.

Thanks!

Make the documentation less awful.

The documentation is now getting to a scale where it doesn't just work in one Markdown file. We need something better, and probably hosted elsewhere (readthedocs or similar).

This is an issue to track that.

Bug: "CROSS JOIN" is not parsed correctly

This query has a cross join. It looks fine to me, yet it produces two warnings. I think CROSS JOIN is not recognized, and it thinks CROSS is an alias for the correctly_substituted table rather than a keyword.

Query:

SELECT
    count_correctly_substituted
FROM
    correctly_substituted
CROSS JOIN
    needs_substitution

Output:

$ sqlfluff lint test.sql
== [test.sql] FAIL
L:   5 | P:   1 | L011 | Implicit aliasing of table not allowed. Use explicit `AS` clause.
L:   5 | P:   1 | L014 | Inconsistent capitalisation of unquoted identifiers.

I suspect

New rule: When defining a column or table alias, always include the AS keyword

This is more readable, as it makes the aliasing explicit.

Rule: No nested selects within join clauses.

This is a pretty general, high-level quality check. I often see this issue with SQL written by new data scientists or engineers: they'll deliver a working query, but it's monolithic and 50+ lines long. Even with proper indentation, it's hard to grasp what's going on.

When this occurs, SQLFluff should suggest breaking the query into two or more logical chunks, with some of these chunks written as (descriptively named) CTEs.

Bug: fix from stdin thinks "-" is a file

I was excited to find that stdin input is now accepted for fix! I tried it out and found that it is not really:

➜ sqlfluff version  
0.2.4

➜ echo 'select col from tbl' | sqlfluff fix - --rules L001
==== finding violations ====
The path(s) ('-',) could not be accessed. Check it/they exist(s).

I might just try to figure out what's going on here, but it's worth making an issue to keep track of the bug!
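The usual convention is to treat a path of `-` as "read from standard input" before any file-system checks run. A minimal sketch of that convention follows. `read_sql` is a hypothetical helper, not the function SQLFluff uses, and the optional `stdin` parameter exists only to make the behaviour easy to exercise.

```python
import sys


def read_sql(path, stdin=None):
    """Read SQL from `path`, treating "-" as standard input."""
    if path == "-":
        stream = sys.stdin if stdin is None else stdin
        return stream.read()
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

Handling `-` ahead of the path lookup avoids the "could not be accessed" message shown above, because the file system is never consulted for it.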

Enhancement: "--" comment does not parse correctly if following space is missing (mysql dialect)

This query generates a parse error. It should probably generate a warning about a missing space after the comment marker.

Query:

--Blah blah
SELECT
    'abc'

Output:

$ sqlfluff lint test.sql
== [test.sql] FAIL
L:   1 | P:   1 | ???? | Found unparsable segment @L001P001: '--Blah blah\nSELECT\n ...'

Enhancement: Inequality operator fails to parse

Both versions of the SQL inequality operator fail to parse:

<>
!=

E.g.

SELECT
    user_id
FROM
    p.d.t
WHERE
    p.d.t != 1

Output:

$ sqlfluff lint test.sql
== [test.sql] FAIL
L:   6 | P:  11 | ???? | Found unparsable segment @L006P011: '!= 1\n...'
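When lexing comparison operators with a regular expression, the two-character forms need to be tried before the single-character ones, otherwise `<>` is split into `<` and `>`. The sketch below shows that ordering. It is not SQLFluff's lexer, and `COMPARISON_RE` and `comparison_operators` are hypothetical names.

```python
import re

# Two-character operators come first so they win over "<", ">" and "=".
COMPARISON_RE = re.compile(r"<>|!=|<=|>=|<|>|=")


def comparison_operators(sql):
    """Return the comparison operators found in a SQL fragment, in order."""
    return COMPARISON_RE.findall(sql)
```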

Bug: Unexpected L003 fix behavior on python2

On python3, L003 handles this case as expected:

➜ echo '  select 1 from tbl;' | sqlfluff fix - --rules L003
select 1 from tbl;

But not on python2:

➜ echo '  select 1 from tbl;' | sqlfluff fix - --rules L003
    select 1 from tbl;

➜ python --version       
Python 2.7.16

➜ sqlfluff version  
0.2.4

I have noticed that python2 handles this correctly if there is only one preceding space. Any idea what could be going on?

Documentation: Auto document templaters, dialects and grammars.

Similar to #44 and #39 this is taken from the old TODO in the README.md.

We need to document templaters, dialects and grammars. This should, as far as possible, be done using Sphinx and autodoc so that it stays up to date.

Feature request: autofix lint errors (that can be fixed)

Similar to autopep8, eslint --fix, or rubocop --auto-correct, it would be amazing if sqlfluff was able to fix some of the lint errors it discovered automatically. This is good because it:

makes users using sqlfluff able to move faster by not having to fix all errors themselves
makes rolling out sqlfluff to a new codebase faster and less of a speedbump
makes the signal to noise ratio of the linter much higher

This is definitely a big one, but it makes a huge difference in usability of the linter. I find that a lot of folks like the consistency that a linter brings but aren't willing to put in the manual effort to manually wrap lines, edit indentation, etc etc, so they end up with a lot of disabled rules or using the linter mostly as an advisory thing instead of a thing that would run in CI and enforce good quality code.

The antidote to that is making it so that humans only have to deal with lint errors when they themselves have actually made an error that the computer can't fix on its own, like a very broken syntax error or a reference to an undefined variable. That way, "boring" lint fails (like whitespace) are just things you don't have to care about ever because the linter fixes them, and if you're feeling real frisky, you can set your editor to "Format on Save" and watch your code get better every time you save.

Rule: Inconsistent capitalisation of keywords

Keywords should be either uppercase or lowercase, but within a file they should be consistent.

An extension of this rule would be to allow a more specific version to be configured using the new config framework.
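The consistency check described above could be sketched by collecting the case style of each keyword in a file and passing only when a single style is present. This is a simplification for illustration: `KEYWORDS` is a small made-up subset, string literals and quoted identifiers are not excluded, and `keyword_cases` and `keywords_consistent` are hypothetical names.

```python
import re

KEYWORDS = {"select", "from", "where", "join", "as", "and", "or"}  # illustrative subset


def keyword_cases(sql):
    """Return the set of case styles ('upper', 'lower', 'mixed') used by keywords."""
    cases = set()
    for word in re.findall(r"[A-Za-z_]+", sql):
        if word.lower() in KEYWORDS:
            if word.isupper():
                cases.add("upper")
            elif word.islower():
                cases.add("lower")
            else:
                cases.add("mixed")
    return cases


def keywords_consistent(sql):
    """True when every keyword is uppercase, or every keyword is lowercase."""
    return keyword_cases(sql) in (set(), {"upper"}, {"lower"})
```

The configurable extension mentioned above would amount to comparing `keyword_cases` against one required style instead of accepting either.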

Bug: Issues with nested functions (e.g. SPLIT(LOWER(text), ' ') and so on)

Consider the following query:

SELECT
    SPLIT(LOWER(text), ' ') AS text
FROM "database"."sample_table"

Saving this query to b.sql and running sqlfluff lint -n b.sql results in the following:

Traceback (most recent call last):
  File "/home/user/sql/.venv/bin/sqlfluff", line 11, in <module>
    load_entry_point('sqlfluff==0.1.3', 'console_scripts', 'sqlfluff')()
  File "/home/user/sql/.venv/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/user/sql/.venv/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/cli/commands.py", line 85, in lint
    result = lnt.lint_paths(paths, verbosity=verbose)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/linter.py", line 324, in lint_paths
    result.add(self.lint_path(path, verbosity=verbosity, fix=fix))
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/linter.py", line 312, in lint_path
    linted_path.add(self.lint_file(f, fname=fname, verbosity=verbosity, fix=fix))
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/linter.py", line 212, in lint_file
    parsed, vs, time_dict = self.parse_file(f=f, fname=fname, verbosity=verbosity)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/linter.py", line 193, in parse_file
    parsed = fs.parse(recurse=recurse, verbosity=verbosity, dialect=self.dialect)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/segments_base.py", line 228, in parse
    verbosity=verbosity, dialect=dialect)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/segments_base.py", line 394, in expand
    res = stmt.parse(recurse=recurse, parse_depth=parse_depth, verbosity=verbosity, dialect=dialect)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/segments_base.py", line 228, in parse
    verbosity=verbosity, dialect=dialect)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/segments_base.py", line 394, in expand
    res = stmt.parse(recurse=recurse, parse_depth=parse_depth, verbosity=verbosity, dialect=dialect)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/segments_base.py", line 228, in parse
    verbosity=verbosity, dialect=dialect)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/segments_base.py", line 394, in expand
    res = stmt.parse(recurse=recurse, parse_depth=parse_depth, verbosity=verbosity, dialect=dialect)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/segments_base.py", line 228, in parse
    verbosity=verbosity, dialect=dialect)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/segments_base.py", line 394, in expand
    res = stmt.parse(recurse=recurse, parse_depth=parse_depth, verbosity=verbosity, dialect=dialect)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/segments_base.py", line 186, in parse
    dialect=dialect, match_segment=self.__class__.__name__)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/grammar.py", line 70, in _match
    verbosity=verbosity, dialect=dialect, match_segment=match_segment)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/grammar.py", line 273, in match
    verbosity=verbosity, dialect=dialect, match_segment=match_segment)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/grammar.py", line 70, in _match
    verbosity=verbosity, dialect=dialect, match_segment=match_segment)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/grammar.py", line 485, in match
    match_segment=match_segment)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/grammar.py", line 70, in _match
    verbosity=verbosity, dialect=dialect, match_segment=match_segment)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/grammar.py", line 273, in match
    verbosity=verbosity, dialect=dialect, match_segment=match_segment)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/grammar.py", line 70, in _match
    verbosity=verbosity, dialect=dialect, match_segment=match_segment)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/grammar.py", line 236, in match
    dialect=dialect, match_segment=match_segment)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/segments_base.py", line 361, in _match
    verbosity=verbosity, dialect=dialect, match_segment=match_segment)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/segments_base.py", line 320, in match
    dialect=dialect, match_segment=match_segment)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/grammar.py", line 70, in _match
    verbosity=verbosity, dialect=dialect, match_segment=match_segment)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/grammar.py", line 586, in match
    segment=segments[start_bracket_idx])
TypeError: tuple indices must be integers or slices, not NoneType

Rule: In a JOIN, flag overly complex logic, suggesting to move it to WHERE

This is a situation that arose recently in some production code.

We have the following join:

...
JOIN
  content
ON (s.user_id = content.user_id
      AND s.unique_opens <= c.emails_sent - (s.hard_bounce + s.soft_bounce)
      AND s.campaign_id = content.campaign_id)

The ON is a mixture of what looks like actual join criteria (s.user_id = content.user_id and s.campaign_id = content.campaign_id) and filtering logic (the <= condition).

It would be cool to have a check that detects non-trivial join conditions (anything other than =, maybe?) and suggests moving those to a WHERE clause.

Question: Can we guarantee that this change won't change the query results? I think it's a reasonable suggestion either way, but if it's risky, the message should be worded appropriately to avoid someone blindly making the change and introducing a bug.
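
On the equivalence question: for an inner join, moving a predicate from ON to WHERE should not change the rows returned, but for an outer join it can, because WHERE is applied after unmatched rows have been NULL-padded. A minimal sketch using Python's built-in sqlite3 module; the tables and rows below are made up for illustration and are not the production data:

```python
import sqlite3

# Illustrative tables only; names and rows are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE s (user_id INTEGER, unique_opens INTEGER);
    CREATE TABLE content (user_id INTEGER, emails_sent INTEGER);
    INSERT INTO s VALUES (1, 5), (2, 50);
    INSERT INTO content VALUES (1, 10), (2, 10);
    """
)


def user_ids(join_type, placement):
    """Run the join with the <= predicate placed either in ON or in WHERE."""
    predicate = "s.unique_opens <= c.emails_sent"
    if placement == "on":
        tail = f"ON s.user_id = c.user_id AND {predicate}"
    else:
        tail = f"ON s.user_id = c.user_id WHERE {predicate}"
    sql = f"SELECT s.user_id FROM s {join_type} JOIN content AS c {tail}"
    return sorted(row[0] for row in conn.execute(sql))


for join_type in ("INNER", "LEFT"):
    for placement in ("on", "where"):
        print(join_type, placement, user_ids(join_type, placement))
```

With these rows the two INNER variants agree, while the LEFT variants differ: the ON version keeps user 2 as an unmatched row and the WHERE version filters it out. A rule message would therefore probably need to distinguish inner from outer joins.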

Feature Request: CLI that updates while running, rather than afterward

The current CLI looks like it's frozen while linting large directories. It should spit out output as it goes rather than just waiting until the end. Potentially with some options for a progress bar or similar.

Enhancement: Logging for config diffs

When linting large file structures, nested configs can vary from file to file. This is really hard to debug at the moment. At some verbosity levels, the root config should be printed in full at the start, and then for each file whose config differs, only the diff (not the full new config) should be printed.
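
One possible shape for the diff step, as a minimal sketch: it assumes configs are plain nested dictionaries, and the function name `config_diff` and the keys in the example are hypothetical rather than anything from the sqlfluff codebase.

```python
def config_diff(root, override, prefix=""):
    """Return {dotted_key: (root_value, override_value)} for values that differ.

    Nested dictionaries are compared recursively. A missing key and an
    explicit None are treated as the same value in this sketch.
    """
    diff = {}
    for key in sorted(set(root) | set(override)):
        path = f"{prefix}{key}"
        old, new = root.get(key), override.get(key)
        if isinstance(old, dict) and isinstance(new, dict):
            diff.update(config_diff(old, new, prefix=path + "."))
        elif old != new:
            diff[path] = (old, new)
    return diff


# Hypothetical configs: the root one, and one picked up in a subdirectory.
root_cfg = {"core": {"dialect": "ansi", "verbose": 0}, "rules": {"max_line_length": 80}}
nested_cfg = {"core": {"dialect": "bigquery", "verbose": 0}, "rules": {"max_line_length": 80}}

for key, (old, new) in config_diff(root_cfg, nested_cfg).items():
    print(f"{key}: {old!r} -> {new!r}")
```

Printing only the differing keys keeps the per-file output short even when the full config is large.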

Looking for contributors?

Are you still developing this project, and are you looking for contributors? I am a data engineer at Mailchimp, and we make extensive use of BigQuery.

A tool like this could be helpful for our team, and we may be able to help with development if you're still planning to work on it.

Please let me know -- thanks!

Enhancement: "IS NULL" and "IS NOT NULL" in query fail to parse

Query:

SELECT
    user_id
FROM
    p.d.t
WHERE
    gender IS NOT NULL

Output:

$ sqlfluff lint test.sql
== [test.sql] FAIL
L:   6 | P:  12 | ???? | Found unparsable segment @L006P012: 'IS NOT NULL\n...'

Enhancement: Raise issue if table referred to in select is not present.

Quite frequently people write queries that fail because they reference tables incorrectly. sqlfluff won't know what columns are available (probably), but it does know which tables the query declares.

A query like the one below should fail because the select elements refer to tables a and b, which aren't in the FROM clause.

SELECT
    a.blah,
    b.foo
FROM bar
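
The check itself can be sketched independently of parsing. The example below assumes the column references and the table names (or aliases) from the FROM clause have already been extracted by a parser; the function name `undeclared_qualifiers` is hypothetical.

```python
def undeclared_qualifiers(column_refs, from_tables):
    """Return table qualifiers used in column_refs that are not in from_tables.

    column_refs: strings such as "a.blah" or "blah" (unqualified).
    from_tables: names or aliases declared in the FROM clause.
    """
    declared = set(from_tables)
    missing = []
    for ref in column_refs:
        # Everything before the last dot is the table qualifier.
        qualifier, dot, _column = ref.rpartition(".")
        if dot and qualifier not in declared and qualifier not in missing:
            missing.append(qualifier)
    return missing


# The query above: SELECT a.blah, b.foo FROM bar
print(undeclared_qualifiers(["a.blah", "b.foo"], ["bar"]))
```

Unqualified columns are skipped, since without schema information there is no way to tell which table they belong to.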

Enhancement: Support for BigQuery SQL

Our data science projects use Google BigQuery. It would be good if this were supported. It's pretty close to standard SQL, I think.

Feature Request: Optional Jinja2 pre-processor

Similar to #1, but for formatting rather than code style. It would be great if the linter could reformat the SQL to follow some common-sense formatting guidelines, primarily around indentation.

Our standard is to place SELECT, FROM, JOIN, and keywords like that at the left, then indent the column list, join conditions, etc. (or place them on the same line as the keyword if it easily fits).

Subqueries and CTEs are indented.

We use two-space indent.

We capitalize all SQL keywords and function names.

We include spaces around operators (comparison, arithmetic, etc.), also after commas in parameter lists, column lists, etc.

Adding docstring linting

Enabling docstring linting (Google style) to make the codebase better documented.

Rule: Warn if a query uses both DISTINCT and GROUP BY

Since these two features of SQL behave very similarly, to my knowledge there is no reason to use both in the same query (though I have not been able to 100% confirm this). Even if there are occasional reasons, it's still pretty unusual and worth flagging.

If someone does this, they may be misunderstanding the behavior. The tool should suggest they eliminate DISTINCT from the query and do what they need using GROUP BY.

"Whitespace around operators" check reports the same warning twice

Query:

SELECT
    a/b AS click_rate
FROM
    t

Output:

$ sqlfluff lint test.sql
== [test.sql] FAIL
L:   2 | P:   6 | L006 | Operators should be surrounded by a single whitespace.
L:   2 | P:   7 | L006 | Operators should be surrounded by a single whitespace.

Note the warning is correct, but it is reported twice -- apparently once on the operator and once on the second operand.

Enhancement: Have a way to run SQLFluff for a git branch, filtering the output to new or modified areas

There is a Python package called diff-cover which provides two command-line tools with similar functionality for Python code:

diff-cover for coverage information
diff-quality for code quality

In both cases, the tool runs the underlying command (e.g. pylint, flake8, or coverage) and filters the results to display only the output relating to new or modified parts of the code.

It would be great to have a way to do the same with SQLFluff. diff-cover is mentioned only as a similar tool; it may not be directly useful or relevant to addressing this need.
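
As a rough sketch of the filtering idea: collect the line numbers that a unified diff adds or modifies, then keep only the violations on those lines. This assumes the diff was produced with zero context lines (for example `git diff -U0`), so each hunk header covers exactly the changed lines. The function names and the `(line, code, description)` violation tuples are hypothetical, not sqlfluff's API.

```python
import re

# Matches hunk headers such as "@@ -2 +2 @@" or "@@ -10,0 +11,2 @@".
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")


def changed_lines(diff_text):
    """Return the set of new-file line numbers covered by the diff's hunks."""
    lines = set()
    for line in diff_text.splitlines():
        match = HUNK_RE.match(line)
        if match:
            start = int(match.group(1))
            # An omitted count means the hunk covers exactly one line.
            count = int(match.group(2)) if match.group(2) is not None else 1
            lines.update(range(start, start + count))
    return lines


def filter_violations(violations, lines):
    """Keep only violations whose line number is in the changed set."""
    return [v for v in violations if v[0] in lines]


diff_text = (
    "@@ -2 +2 @@\n"
    "-    a/b AS click_rate\n"
    "+    a / b AS click_rate\n"
    "@@ -10,0 +11,2 @@\n"
    "+WHERE\n"
    "+    x = 1\n"
)
violations = [(2, "L006", "Operators should be surrounded by a single whitespace."),
              (5, "L003", "Some pre-existing problem.")]
print(filter_violations(violations, changed_lines(diff_text)))
```

Only the violation on line 2 survives here; the one on line 5 is outside the diff and is dropped.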

Rule: If a JOIN uses "ON" but could use "USING", suggest changing it

In SQL, USING provides a more concise syntax for joins where the fields in the two tables have the same names and the fields are being compared for equality (=). It is more readable and therefore preferable for style reasons. If a JOIN uses ON and all the join conditions are like this, suggest switching to USING.

For example, a query like this:

SELECT
  c.user_id,
  c.campaign_id,
  s.num_clicks
FROM
  campaigns AS c
LEFT JOIN stats AS s
ON c.user_id = s.user_id AND c.campaign_id = s.campaign_id

Should suggest changing it to this:

SELECT
  c.user_id,
  c.campaign_id,
  s.num_clicks
FROM
  campaigns AS c
LEFT JOIN stats AS s
USING (user_id, campaign_id)
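
The decision behind such a rule can be sketched as follows. It assumes the ON clause has already been split into AND-ed comparisons of the form `(left, operator, right)`, with columns written as `table.column`; the function name `suggest_using` is hypothetical.

```python
def suggest_using(conditions):
    """Return the column list for USING, or None if ON cannot be replaced.

    conditions: list of (left, operator, right) tuples, AND-ed together,
    where left and right are qualified column names like "c.user_id".
    """
    columns = []
    for left, operator, right in conditions:
        if operator != "=":
            return None  # only equality comparisons qualify
        left_name = left.rpartition(".")[2]
        right_name = right.rpartition(".")[2]
        if left_name != right_name:
            return None  # column names differ between the two tables
        columns.append(left_name)
    return columns or None


on_clause = [("c.user_id", "=", "s.user_id"),
             ("c.campaign_id", "=", "s.campaign_id")]
columns = suggest_using(on_clause)
if columns:
    print(f"USING ({', '.join(columns)})")
```

For the example join above this prints the same USING clause as the suggested rewrite. Any non-equality comparison, or a pair of differently named columns, makes the function return None so that no suggestion is raised.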

Enhancement: Implement "UNION ALL" in parser.

Linting this file:

SELECT "Majority Other" AS gender_label UNION ALL
SELECT "Mixed Gender" AS gender_label

Fails as follows:

$ sqlfluff lint test.sql
== [test.sql] FAIL
L:   1 | P:  41 | ???? | Found unparsable segment @L001P041: 'UNION ALL\nSELECT "Mi...'

Feature request: some pre-set options for SQL style guides

Hi @alanmcruickshank,

Thanks a ton again for taking the time to build sqlfluff -- I believe it has the potential to be the linter that is sorely missing!

I am not sure what your future plans are, but if I may, I'd suggest trying to add rules from a few guidelines -- here are the ones I am familiar with:

If you would be open to exploring this direction, I would be happy to put together a few rules as well -- from looking at the code it seems fairly straightforward.

Thanks again!

sqlfluff fails to install in Python 2.7

I started looking at integrating sqlfluff into our application today. We use a mixture of Python 2.7 and 3 on our projects, with new projects using Python 3.

I noticed that the package won't install in Python 2.7. It looks like a very simple issue. This is the error:

Collecting sqlfluff
  Using cached https://artifactory.rsglab.com/artifactory/api/pypi/pypi/packages/f3/71/6ec9f43964cb994a3b2075da1f335c021115593d9a11b3cc02ace2581ef5/sqlfluff-0.1.5.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-Te5lWx/sqlfluff/setup.py", line 7, in <module>
        import configparser
    ImportError: No module named configparser

This module has been renamed in Python 3.

Python 2.7: import ConfigParser
Python 3: import configparser

I believe the following will work:

try:
    import configparser
except ImportError:
    import ConfigParser as configparser

If you use the six library, you could instead use: from six.moves import configparser

Rule: When a query includes a computed column in the SELECT list, warn if no column alias was defined

This makes the code and the query results more self-documenting. It's especially important in some data pipelines, where SQL queries are often stored in new tables even without an explicit CREATE TABLE AS statement.
