
sqlfluff / sqlfluff


A modular SQL linter and auto-formatter with support for multiple dialects and templated code.

Home Page: https://www.sqlfluff.com

License: MIT License

Python 81.90% Dockerfile 0.03% SQL 18.05% Shell 0.02%
hacktoberfest pypi sql sql-linter

sqlfluff's Introduction

SQLFluff

The SQL Linter for Humans

SQLFluff is a dialect-flexible and configurable SQL linter. Designed with ELT applications in mind, SQLFluff also works with Jinja templating and dbt. SQLFluff will auto-fix most linting errors, allowing you to focus your time on what matters.

Dialects Supported

Although SQL is reasonably consistent in its implementations, there are several different dialects available with variations of syntax and grammar. SQLFluff currently supports the following SQL dialects (though perhaps not in full):

We aim to make it easy to expand on the support of these dialects and also add other, currently unsupported, dialects. Please raise issues (or upvote any existing issues) to let us know of demand for missing support.

Pull requests from those who know the missing syntax or dialects are especially welcome and are the quickest way for you to get support added. We are happy to work with any potential contributors on this to help them add this support. Please raise an issue first for any large feature change to ensure it is a good fit for this project before spending time on this work.

Templates Supported

SQL itself does not lend itself well to modularity, so to introduce some flexibility and reusability it is often templated as discussed more in our modularity documentation.

SQLFluff supports the following templates:

Again, please raise issues if you wish to support more templating languages/syntaxes.

VS Code Extension

We also have a VS Code extension:

Getting Started

To get started, install the package and run sqlfluff lint or sqlfluff fix.

$ pip install sqlfluff
$ echo "  SELECT a  +  b FROM tbl;  " > test.sql
$ sqlfluff lint test.sql --dialect ansi
== [test.sql] FAIL
L:   1 | P:   1 | LT01 | Expected only single space before 'SELECT' keyword.
                       | Found '  '. [layout.spacing]
L:   1 | P:   1 | LT02 | First line should not be indented.
                       | [layout.indent]
L:   1 | P:   1 | LT13 | Files must not begin with newlines or whitespace.
                       | [layout.start_of_file]
L:   1 | P:  11 | LT01 | Expected only single space before binary operator '+'.
                       | Found '  '. [layout.spacing]
L:   1 | P:  14 | LT01 | Expected only single space before naked identifier.
                       | Found '  '. [layout.spacing]
L:   1 | P:  27 | LT01 | Unnecessary trailing whitespace at end of file.
                       | [layout.spacing]
L:   1 | P:  27 | LT12 | Files must end with a single trailing newline.
                       | [layout.end_of_file]
All Finished 📜 🎉!

Alternatively, you can use the Official SQLFluff Docker Image or have a play using SQLFluff online.

For full CLI usage and rules reference, see the SQLFluff docs.

Documentation

For full documentation visit docs.sqlfluff.com. This documentation is generated from this repository so please raise issues or pull requests for any additions, corrections, or clarifications.

Releases

SQLFluff adheres to Semantic Versioning, so breaking changes should be restricted to major version releases. Some elements (such as the Python API) are in a less stable state and may see more significant changes more often. For details on breaking changes and how to migrate between versions, see our release notes. See the changelog for more details. If you would like to join in, please consider contributing.

New releases are made monthly. For more information, visit Releases.

SQLFluff on Slack

We have a fast-growing community on Slack, so come and join us!

SQLFluff on Twitter

Follow us on Twitter @SQLFluff for announcements and other related posts.

Contributing

We are grateful to all our contributors. There is a lot to do in this project, and we are just getting started.

If you want to understand more about the architecture of SQLFluff, you can find more here.

If you would like to contribute, check out the open issues on GitHub. You can also see the guide to contributing.

Sponsors

Datacoves
The turnkey analytics stack, find out more at Datacoves.com.

sqlfluff's People

Contributors

adam-tokarski, aidanharveynelson, alanmcruickshank, barrywhart, borchero, dmohns, fdw, github-actions[bot], greg-finley, james-johnston-thumbtack, jmc-bbk, joaostorrer, jpers36, jpy-git, juhoautio, katzmann1983, keraion, kzosabe, mhaley-miovision, niallrees, nolanbconaway, otoolemichael, pwildenhain, r7l208, rpr-ableton, sti0, tunetheweb, wittierdinosaur, yoichi, zhongjiajie

sqlfluff's Issues

Bug: Specifying paths which don't exist fails messily

When specifying a path which doesn't exist, we get a long traceback rather than a concise message saying that the path doesn't exist. For example, the command:

sqlfluff lint foo

produces the output:

$ sqlfluff lint foo
Traceback (most recent call last):
  File "C:\Users\usr\dev2\sqlfluff\env\Scripts\sqlfluff-script.py", line 11, in <module>
    load_entry_point('sqlfluff', 'console_scripts', 'sqlfluff')()
  File "C:\Users\usr\dev2\sqlfluff\env\lib\site-packages\click\core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\usr\dev2\sqlfluff\env\lib\site-packages\click\core.py", line 717, in main
    rv = self.invoke(ctx)
  File "C:\Users\usr\dev2\sqlfluff\env\lib\site-packages\click\core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Users\usr\dev2\sqlfluff\env\lib\site-packages\click\core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\alan\dev2\sqlfluff\env\lib\site-packages\click\core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "c:\users\usr\dev2\sqlfluff\src\sqlfluff\cli\commands.py", line 138, in lint
    result = lnt.lint_paths(paths, verbosity=verbose)
  File "c:\users\usr\dev2\sqlfluff\src\sqlfluff\linter.py", line 424, in lint_paths
    result.add(self.lint_path(path, verbosity=verbosity, fix=fix))
  File "c:\users\usr\dev2\sqlfluff\src\sqlfluff\linter.py", line 408, in lint_path
    for fname in self.paths_from_path(path):
  File "c:\users\usr\dev2\sqlfluff\src\sqlfluff\linter.py", line 379, in paths_from_path
    raise IOError("Specified path does not exist")
OSError: Specified path does not exist

A better response would be something like:

$ sqlfluff lint foo
Error: the path `foo` doesn't exist.
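A minimal sketch of the suggested behavior, assuming the CLI checks every path up front before linting starts. `check_paths` and `main` are hypothetical names, not SQLFluff's actual functions:

```python
import os
import sys


def check_paths(paths):
    """Return a one-line error for the first missing path, or None if all exist."""
    for path in paths:
        if not os.path.exists(path):
            return f"Error: the path `{path}` doesn't exist."
    return None


def main(paths):
    """Hypothetical CLI body: report missing paths concisely instead of raising."""
    message = check_paths(paths)
    if message:
        # Print to stderr and signal failure through the exit code, no traceback.
        print(message, file=sys.stderr)
        return 1
    return 0
```

Wired into an entry point as `sys.exit(main(paths))`, a missing path would produce the single message above and exit with status 1.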

Unexpected behavior of 'sqlfluff fix' for "Indentation length is not a multiple of 4"

Fixing indentation issues in this file ends up moving the incorrectly indented text left (unindenting it) rather than right (moving it to the indentation I expected).

Input:

SELECT
  user_id,
  list_id,
    gender_label,
    audience
FROM
  age_data
JOIN
    gender_data
USING
  (user_id, list_id)
JOIN
  audience_size
USING
  (user_id, list_id)
LEFT JOIN
  verts
USING
  (user_id)

Output:

SELECT
user_id,
list_id,
    gender_label,
    audience
FROM
age_data
JOIN
    gender_data
USING
(user_id, list_id)
JOIN
audience_size
USING
(user_id, list_id)
LEFT JOIN
verts
USING
(user_id)

Is there a way to have it deduce and apply the correct indentation based on the structure of the query?

Expected output:

SELECT
    user_id,
    list_id,
    gender_label,
    audience
FROM
    age_data
JOIN
    gender_data
USING
    (user_id, list_id)
JOIN
    audience_size
USING
    (user_id, list_id)
LEFT JOIN
    verts
USING
    (user_id)
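For a flat query like this one, the expected output amounts to "clause keywords at column 0, their contents one level deeper". A toy sketch of that idea, assuming each clause keyword sits on its own line; the keyword set and `reindent_flat_query` are hypothetical, and a real fix would derive indentation from the parse tree rather than line by line:

```python
# Deliberately small, assumed set of clause keywords that start a new clause.
CLAUSE_KEYWORDS = {"SELECT", "FROM", "JOIN", "LEFT JOIN", "USING", "WHERE", "ON"}


def reindent_flat_query(sql, indent="    "):
    """Put clause keywords at column 0 and everything else one indent deeper."""
    out = []
    for line in sql.splitlines():
        stripped = line.strip()
        if not stripped:
            out.append("")
        elif stripped.upper() in CLAUSE_KEYWORDS:
            out.append(stripped)
        else:
            # Ignore the existing (possibly wrong) indent and apply one level.
            out.append(indent + stripped)
    return "\n".join(out)
```

Applied to the input above, this yields the expected output rather than the unindented one.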

Bug: The `fix` command is not templating safe

Running the command below does fix the linting error but it also removes the templating tags and replaces them with the dummy commands.

sqlfluff fix test\fixtures\templater\jinja_b\jinja.sql --rules L010

To fix this, we need a way, during the fixing process, to compare the templated and un-templated versions, identify where substitutions have been made, and either ignore those fixes (probably raising a warning) or, even better, compensate for them and try to work around them.
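One possible way to do the comparison, sketched under the assumption that fixes are expressed as `(start, end, replacement)` character spans on the rendered SQL. Diffing with Python's `difflib` and the names `safe_fixes` / `apply_fixes` are illustrative choices, not how SQLFluff actually implements this:

```python
import difflib


def safe_fixes(raw, rendered, fixes):
    """Translate fixes on the rendered SQL back to offsets in the raw template.

    Fixes lying entirely inside text that is identical in both versions are
    mapped to raw offsets; fixes that touch a templated substitution are
    returned separately so the caller can warn about them.
    """
    matcher = difflib.SequenceMatcher(None, raw, rendered, autojunk=False)
    # Only "equal" blocks are safe: that text exists verbatim in both versions.
    equal_blocks = [
        (i1, j1, j2)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag == "equal"
    ]
    applied, skipped = [], []
    for start, end, replacement in fixes:
        for i1, j1, j2 in equal_blocks:
            if j1 <= start and end <= j2:
                offset = i1 - j1
                applied.append((start + offset, end + offset, replacement))
                break
        else:
            skipped.append((start, end, replacement))
    return applied, skipped


def apply_fixes(text, fixes):
    """Apply span replacements from the end so earlier offsets stay valid."""
    for start, end, replacement in sorted(fixes, reverse=True):
        text = text[:start] + replacement + text[end:]
    return text
```

For example, with the raw template `select {{ col }} from tbl` rendered as `select a from tbl`, keyword fixes on `select` and `from` carry over to the template, while a fix on `a` is skipped because that text only exists after rendering.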

Error parsing complex mathematical expression

Input:

SELECT
    COS(2*ACOS(-1)*2*y/53) AS c2
FROM
    t

Output:

$ sqlfluff lint test1.sql
== [test1.sql] FAIL
L:   2 | P:  10 | L006 | Operators should be preceded by a space.
L:   2 | P:  11 | L006 | Operators should be followed by a space.
L:   2 | P:  15 | ???? | Found unparsable segment @L002P015: '(-1)*2*y/53...'
L:   2 | P:  31 | L014 | Inconsistent capitalisation of unquoted identifiers.
L:   4 | P:   5 | L014 | Inconsistent capitalisation of unquoted identifiers.

Interestingly, adding whitespace around the operators also fixes the parsing error. I wonder if two adjacent tokens are being mistakenly parsed as a single token when spaces are missing.

Here's the same SQL with spaces (and no parsing error):

SELECT
    COS(2 * ACOS(-1) * 2 * y / 53) AS c2
FROM
    t

Output:

$ sqlfluff lint test1.sql
== [test1.sql] FAIL
L:   2 | P:  22 | L006 | Operators should be preceded by a space.

I don't know why there's still a warning about needing space around an operator. Can you look into this as well?
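To illustrate the hypothesis above: in a lexer where operators are tokens in their own right, whitespace should not change how `2*ACOS(-1)` is split. A minimal regex lexer sketch (hypothetical, not SQLFluff's lexer) that produces identical tokens for both versions of the query:

```python
import re

# One alternative per token type; the unnamed whitespace branch is discarded.
TOKEN_RE = re.compile(
    r"\s+|(?P<number>\d+(?:\.\d+)?)|(?P<name>[A-Za-z_]\w*)|(?P<op>[-+*/(),])"
)


def tokenize(sql):
    """Split SQL into number, name and operator tokens, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if not match:
            raise ValueError(f"Unexpected character {sql[pos]!r} at {pos}")
        if match.lastgroup:  # None when only the whitespace branch matched
            tokens.append(match.group())
        pos = match.end()
    return tokens
```

Here the `-` in `(-1)` is emitted as a separate token, leaving it to the parser to decide whether it is unary or binary.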

Enhancement: Support BigQuery Dialect with table expressions.

This query uses some ARRAY operations. This is BigQuery, but I think the same or similar SQL is valid in Postgres, possibly other databases.

Input:

SELECT
    y AS woy
FROM
    UNNEST(GENERATE_ARRAY(1, 53)) AS y

Output:

$ sqlfluff lint test2.sql
== [test2.sql] FAIL
L:   3 | P:   1 | ???? | Found unparsable segment @L003P001: 'FROM\n    UNNEST(GENE...'

Bug: SQLFluff needs to include jinja2 in its "install_requires" list in setup.py

Without this, if I pip install sqlfluff and run it, I get an error, e.g.:

$ sqlfluff lint test.sql
SELECT
Traceback (most recent call last):
  File "/Users/bhart/.pyenv/versions/sqlfluff-prod-3.6.4/bin/sqlfluff", line 11, in <module>
SELECT
    load_entry_point('sqlfluff==0.2.3', 'console_scripts', 'sqlfluff')()
  File "/Users/bhart/.pyenv/versions/3.6.4/envs/sqlfluff-prod-3.6.4/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/Users/bhart/.pyenv/versions/3.6.4/envs/sqlfluff-prod-3.6.4/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/Users/bhart/.pyenv/versions/3.6.4/envs/sqlfluff-prod-3.6.4/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/bhart/.pyenv/versions/3.6.4/envs/sqlfluff-prod-3.6.4/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/bhart/.pyenv/versions/3.6.4/envs/sqlfluff-prod-3.6.4/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Users/bhart/.pyenv/versions/3.6.4/envs/sqlfluff-prod-3.6.4/lib/python3.6/site-packages/sqlfluff/cli/commands.py", line 139, in lint
    result = lnt.lint_paths(paths, verbosity=verbose)
  File "/Users/bhart/.pyenv/versions/3.6.4/envs/sqlfluff-prod-3.6.4/lib/python3.6/site-packages/sqlfluff/linter.py", line 589, in lint_paths
    result.add(self.lint_path(path, verbosity=verbosity, fix=fix))
  File "/Users/bhart/.pyenv/versions/3.6.4/envs/sqlfluff-prod-3.6.4/lib/python3.6/site-packages/sqlfluff/linter.py", line 576, in lint_path
    linted_path.add(self.lint_string(f.read(), fname=fname, verbosity=verbosity, fix=fix, config=config))
  File "/Users/bhart/.pyenv/versions/3.6.4/envs/sqlfluff-prod-3.6.4/lib/python3.6/site-packages/sqlfluff/linter.py", line 452, in lint_string
    parsed, vs, time_dict = self.parse_string(s=s, fname=fname, verbosity=verbosity, config=config)
  File "/Users/bhart/.pyenv/versions/3.6.4/envs/sqlfluff-prod-3.6.4/lib/python3.6/site-packages/sqlfluff/linter.py", line 392, in parse_string
    s = self.templater.process(s, fname=fname, config=config or self.config)
  File "/Users/bhart/.pyenv/versions/3.6.4/envs/sqlfluff-prod-3.6.4/lib/python3.6/site-packages/sqlfluff/templaters.py", line 178, in process
    from jinja2 import Environment, StrictUndefined  # noqa
ModuleNotFoundError: No module named 'jinja2'
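The fix requested here is to list `jinja2` in `install_requires` in setup.py. As a complementary safeguard, an optional import can be wrapped so a missing package produces an actionable message instead of a bare traceback. A minimal sketch; `require_module` is a hypothetical helper:

```python
import importlib


def require_module(name, package_hint=None):
    """Import `name`, turning ModuleNotFoundError into an actionable message."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as err:
        hint = package_hint or name
        raise RuntimeError(
            f"Missing dependency {name!r}. Install it with `pip install {hint}`, "
            "or reinstall a release that lists it in install_requires."
        ) from err
```

A templater would then call `require_module("jinja2")` where it currently does `from jinja2 import ...`.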

Using with python raw sql

Hi! Thanks a lot for this awesome project!

Is it possible to use it with python code and sql written as strings inside?
It is pretty sad to know that these strings cannot be linted right now 😞
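A minimal sketch of one approach, assuming the SQL appears as plain string literals (f-strings and string concatenation are not handled properly). Python's `ast` module can extract candidate strings, which could then be passed to a linter, for example SQLFluff's simple Python API (`sqlfluff.lint`) in later versions. `sql_strings` and its leading-keyword heuristic are hypothetical:

```python
import ast

# Assumed heuristic: a string counts as SQL if it starts with one of these keywords.
SQL_START = ("SELECT", "WITH", "INSERT", "UPDATE", "DELETE", "CREATE")


def sql_strings(source):
    """Yield (line_number, text) for string literals in `source` that look like SQL."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            if node.value.strip().upper().startswith(SQL_START):
                yield node.lineno, node.value
```

The line numbers make it possible to map any lint results back to the Python file.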

Enhancement: Merge segment.name and segment.type.

Quite a low level issue here. Segments have both a name and a type property. They're becoming increasingly synonymous and I'm no longer convinced it makes sense to have both. I think we should work out how to remove one.

Feature Request: File based configuration

First of all, thank you @alanmcruickshank. Really nice work on authoring this project.

I wanted to know what your plans are for adding more rules or making it more configurable via a JSON config passed as a parameter, as I would like to use this in our organisation and would hate to have multiple projects to maintain internally.

Mainly around these points in your TODOs:

  • Configurable linting
    • Command line options for config
    • Ability to read from config files
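SQLFluff ultimately settled on INI-style config files (for example a `.sqlfluff` file with a `[sqlfluff]` section) rather than JSON. A minimal sketch of reading such a section with Python's `configparser` and then applying command-line overrides; `DEFAULTS` and `load_config` are hypothetical, and the merge order is an assumption:

```python
import configparser

# Hypothetical built-in defaults; an empty rules string means "all rules".
DEFAULTS = {"dialect": "ansi", "rules": ""}


def load_config(text, overrides=None):
    """Read a [sqlfluff] INI section, then let command-line overrides win."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    config = dict(DEFAULTS)
    if parser.has_section("sqlfluff"):
        config.update(parser["sqlfluff"])
    config.update(overrides or {})
    # Turn "L001, L003" into ["L001", "L003"].
    config["rules"] = [r.strip() for r in config["rules"].split(",") if r.strip()]
    return config
```

Layering file values over defaults, and command-line values over both, lets one shared file serve a whole organisation while still allowing per-run changes.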

Bug: Running "sqlfluff version" on a fresh install throws an error

I installed sqlfluff and ran the version and lint commands. Both failed, probably due to a missing configuration file. It would be good to report this error in a nicer way.

Here's the result I saw:

$ sqlfluff version
Traceback (most recent call last):
  File "/usr/local/pyenv/versions/3.7.3/bin/sqlfluff", line 11, in <module>
    load_entry_point('sqlfluff==0.2.1', 'console_scripts', 'sqlfluff')()
  File "/usr/local/pyenv/versions/3.7.3/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/pyenv/versions/3.7.3/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/pyenv/versions/3.7.3/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/pyenv/versions/3.7.3/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/pyenv/versions/3.7.3/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/pyenv/versions/3.7.3/lib/python3.7/site-packages/sqlfluff/cli/commands.py", line 82, in version
    c = get_config(**kwargs)
  File "/usr/local/pyenv/versions/3.7.3/lib/python3.7/site-packages/sqlfluff/cli/commands.py", line 55, in get_config
    return FluffConfig.from_root(overrides=overrides)
  File "/usr/local/pyenv/versions/3.7.3/lib/python3.7/site-packages/sqlfluff/config.py", line 283, in from_root
    return cls(configs=c, overrides=overrides)
  File "/usr/local/pyenv/versions/3.7.3/lib/python3.7/site-packages/sqlfluff/config.py", line 263, in __init__
    if self._configs['core']['rules']:
KeyError: 'rules'
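The last frame looks up `self._configs['core']['rules']`, which fails when no config file defines `rules`. A minimal sketch of deep-merging user config over built-in defaults so such lookups always succeed; `DEFAULT_CONFIG` and `merge_configs` are hypothetical:

```python
import copy

# Hypothetical built-in defaults, nested by section as in the traceback.
DEFAULT_CONFIG = {"core": {"rules": None, "dialect": "ansi"}}


def merge_configs(base, *updates):
    """Deep-merge config dicts so keys missing from user files keep their defaults."""
    result = copy.deepcopy(base)  # never mutate the shared defaults
    for update in updates:
        for key, value in update.items():
            if isinstance(value, dict) and isinstance(result.get(key), dict):
                result[key] = merge_configs(result[key], value)
            else:
                result[key] = value
    return result
```

With this, a fresh install with no config file still ends up with `config["core"]["rules"]` set to the default.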

Rule: Detect incorrect usage / understanding of DISTINCT

I often run across code where someone is using DISTINCT as if it were a function, when it's actually an option that applies to the whole list of result columns, e.g.:

SELECT
  DISTINCT(a),
  b,
  c
FROM
  foo

In this case, they are going to get all distinct combinations of a, b, and c, not just the distinct values of a as they seem to think.

The suggested corrections are one of:

  • Switch to using GROUP BY a and apply an aggregation function to b and c
  • Remove the parentheses after DISTINCT
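A crude, regex-based sketch of the detection, assuming the pattern of interest is `SELECT DISTINCT(<expr>),` with more columns following. A single `DISTINCT(a)` column is left alone because it is harmless. A real rule would inspect the parse tree; `flags_distinct_as_function` is hypothetical:

```python
import re

# DISTINCT followed by a bracketed expression and then a comma: the brackets
# look like a function call, but DISTINCT applies to every selected column.
DISTINCT_AS_FUNCTION = re.compile(
    r"\bSELECT\s+DISTINCT\s*\(\s*[^()]*\)\s*,", re.IGNORECASE
)


def flags_distinct_as_function(sql):
    """Return True if the query uses DISTINCT as if it were a function."""
    return bool(DISTINCT_AS_FUNCTION.search(sql))
```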

Related info:

Rule: Lint indentation within queries.

This could be a big one, but I think we need to start on it before too long. It will also be controversial, so it needs a bit of homework before anyone does any coding.

Want to help? Please post links and examples to good indentation here.

Output for subdirectories and files should be in alphabetical order

When reviewing output for a large directory structure, I noticed that the files are processed in a sequence that jumps across directories. It would be cool to process them in alphabetical order (and depth first, I think?).

Example output (with the actual lint warnings removed, leaving just the filenames):

== [sql/benchmark_summary_queries/constraints/_postcondition_no_nulls.sql] FAIL
== [sql/audience_size_queries/constraints/_postcondition_check_gdpr_compliance.sql] FAIL
== [sql/benchmark_summary_queries/constraints/_postcondition_anonymized_buckets.sql] FAIL

Notice a file from sql/audience_size_queries/ appears in between the output for two files in the same directory, sql/benchmark_summary_queries/.
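A minimal sketch of deterministic, depth-first alphabetical ordering using `os.walk`. Sorting `dirnames` in place controls the order in which `os.walk` descends. `sql_files_sorted` is a hypothetical helper, not SQLFluff's path-discovery code:

```python
import os


def sql_files_sorted(root):
    """Yield .sql paths depth-first, with directories and files in alphabetical order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place: os.walk descends into subdirectories in this order
        for name in sorted(filenames):
            if name.endswith(".sql"):
                yield os.path.join(dirpath, name)
```

Because `os.walk` is top-down by default, each directory's own files are listed before the contents of its subdirectories.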

Warn on missing whitespace in CTE declaration

I'd like to see a warning if there is missing whitespace after the AS in a CTE declaration, e.g.:

WITH
    count_audience_size AS( -- <--- No space between AS and (
        SELECT
            user_id
        FROM
            table
    )

SELECT * FROM count_audience_size
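A line-based, heuristic sketch of the requested check (it could also match other `AS(` constructs, such as inside a `CAST`). A real rule would look for the CTE node in the parse tree; `missing_space_after_as` is hypothetical:

```python
import re

# An identifier, then AS immediately followed by "(" with no whitespace between.
CTE_AS_NO_SPACE = re.compile(r"\b\w+\s+AS\(", re.IGNORECASE)


def missing_space_after_as(sql):
    """Return 1-based line numbers where AS is directly followed by '('."""
    return [
        number
        for number, line in enumerate(sql.splitlines(), start=1)
        if CTE_AS_NO_SPACE.search(line)
    ]
```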

Enhancement: Add the `NOT IN (SELECT *)` syntax

Query:

SELECT
    user_id AS non_GDPR_users
FROM
    p.d.t AS query
WHERE user_id NOT IN (SELECT user_id FROM p.d.t2)

Output:

$ sqlfluff lint test.sql
== [test.sql] FAIL
L:   2 | P:  16 | L014 | Inconsistent capitalisation of unquoted identifiers.
L:   5 | P:  15 | ???? | Found unparsable segment @L005P015: 'NOT IN (SELECT user_...'

Bug: fix ValueError: Unexpected capitalisation policy: 'inconsistent'

I don't even know what is happening here!

➜ sqlfluff version
0.2.4

➜ echo 'selECT * from table;' > test.sql

โžœ sqlfluff fix test.sql --rules L001,L002,L003,L004,L005,L006,L007,L008,L009,L010,L011,L012,L013,L014
==== finding violations ====
Traceback (most recent call last):
  File "/Users/nolan/anaconda3/bin/sqlfluff", line 10, in <module>
    sys.exit(cli())
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/sqlfluff/cli/commands.py", line 174, in fix
    result = lnt.lint_paths(paths, verbosity=verbose)
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/sqlfluff/linter.py", line 605, in lint_paths
    result.add(self.lint_path(path, verbosity=verbosity, fix=fix))
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/sqlfluff/linter.py", line 592, in lint_path
    fix=fix, config=config))
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/sqlfluff/linter.py", line 536, in lint_string
    lerrs, _, _, _ = crawler.crawl(parsed)
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/sqlfluff/rules/base.py", line 192, in crawl
    raw_stack=raw_stack, fix=fix, memory=memory)
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/sqlfluff/rules/base.py", line 192, in crawl
    raw_stack=raw_stack, fix=fix, memory=memory)
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/sqlfluff/rules/base.py", line 192, in crawl
    raw_stack=raw_stack, fix=fix, memory=memory)
  [Previous line repeated 1 more time]
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/sqlfluff/rules/base.py", line 162, in crawl
    raw_stack=raw_stack, memory=memory)
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/sqlfluff/rules/std.py", line 488, in _eval
    segment, self.capitalisation_policy))
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/sqlfluff/rules/std.py", line 463, in make_replacement
    return make_replacement(seg, list(cases_seen)[0])
  File "/Users/nolan/anaconda3/lib/python3.7/site-packages/sqlfluff/rules/std.py", line 469, in make_replacement
    raise ValueError("Unexpected capitalisation policy: {0!r}".format(policy))
ValueError: Unexpected capitalisation policy: 'inconsistent'
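The traceback suggests that, under a "consistent" policy, `make_replacement` re-uses a case it has already seen, and that case can itself be `'inconsistent'` (a mixed-case word such as `selECT`). A hypothetical reimplementation sketch of a guard, not SQLFluff's actual code. It assumes `cases_seen` is an ordered sequence (the traceback's `list(cases_seen)[0]` suggests a set, which has no defined order) and falls back to `"upper"` as an assumed default:

```python
def case_of(word):
    """Classify a word's capitalisation."""
    if word.isupper():
        return "upper"
    if word.islower():
        return "lower"
    if word == word.capitalize():
        return "capitalise"
    return "inconsistent"


def make_replacement(word, policy, cases_seen=()):
    """Re-case `word` according to `policy`.

    For "consistent", follow the first concrete case seen so far, skipping
    "inconsistent" (the value that triggered the ValueError), and fall back
    to "upper" (an assumed default) if nothing usable has been seen.
    """
    if policy == "consistent":
        concrete = [c for c in cases_seen if c != "inconsistent"]
        policy = concrete[0] if concrete else "upper"
    if policy == "upper":
        return word.upper()
    if policy == "lower":
        return word.lower()
    if policy == "capitalise":
        return word.capitalize()
    raise ValueError(f"Unexpected capitalisation policy: {policy!r}")
```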

Enhancement: Support "smart indent" where appropriate

This is something to think about, not a "do exactly this" issue.

This query produces a warning:

SELECT
    a
FROM
    t
JOIN
    u
USING
    (user_id,
     list_id)

Output:

$ sqlfluff lint test.sql
== [test.sql] FAIL
L:   9 | P:   1 | L003 | Indentation length is not a multiple of 4.

It might be cool to support something like the Python "hanging indent" described here. I.e. since the second column is indented to align with the first column name, this should be reported as okay (no warning).

There may be other SQL constructs which could/should support the same behavior.
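A minimal sketch of the hanging-indent idea, assuming a line should be accepted if its indent is a multiple of 4 or if it aligns with the first character after the last unclosed `(` on the previous line. `indent_ok` is hypothetical:

```python
def indent_ok(prev_line, line, tab=4):
    """Accept a multiple-of-`tab` indent, or alignment with an open bracket above."""
    indent = len(line) - len(line.lstrip(" "))
    if indent % tab == 0:
        return True
    open_brackets = []  # positions of "(" on prev_line that are still unclosed
    for pos, char in enumerate(prev_line):
        if char == "(":
            open_brackets.append(pos)
        elif char == ")" and open_brackets:
            open_brackets.pop()
    return bool(open_brackets) and indent == open_brackets[-1] + 1
```

With the query above, `     list_id)` (5 spaces) aligns with `user_id` after the `(` on the previous line, so it would no longer be flagged.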

Feature request: functions (`DATE()`) and proper `ORDER BY`

Consider the following query:

SELECT
    col_a,
    col_b,
    date_col_a,
    date_col_b
FROM "database"."sample_table"
WHERE
    DATE(date_col_b) >= current_date
    AND length(col_a) = 4
ORDER BY date_col_a DESC

When this is saved to a.sql and linted with sqlfluff, the result is the following:

$ sqlfluff lint --rules L001,L002,L003,L004,L005,L008 -n a.sql
== [a.sql] FAIL
L:   7 | P:   1 | ???? | Found unparsable segment @ 7,1: 'WHERE\n    DATE(date_...'
L:  10 | P:  21 | ???? | Found unparsable segment @ 10,21: 'DESC\n...'

This is unexpected, as the query executes without trouble in PrestoDB (i.e. AWS Athena), for which it was written.

Could it be that the quotes (") around column/database names may cause the problem? Or is it the DATE function?

Do let me know what I can do to help debug this even further.

Thanks!

Make the documentation less awful.

The documentation is now getting to a scale where it doesn't just work in one Markdown file. We need something better, and probably hosted elsewhere (readthedocs or similar).

This is an issue to track that.

Bug: "CROSS JOIN" is not parsed correctly

This query has a cross join. It looks fine to me, yet it produces two warnings. I think CROSS JOIN is not recognized, and it thinks CROSS is an alias for the correctly_substituted table rather than a keyword.

Query:

SELECT
    count_correctly_substituted
FROM
    correctly_substituted
CROSS JOIN
    needs_substitution

Output:

$ sqlfluff lint test.sql
== [test.sql] FAIL
L:   5 | P:   1 | L011 | Implicit aliasing of table not allowed. Use explicit `AS` clause.
L:   5 | P:   1 | L014 | Inconsistent capitalisation of unquoted identifiers.

I suspect

Rule: No nested selects within join clauses.

This is a pretty general, high-level quality check. I often see this issue with SQL written by new data scientists or engineers: they'll deliver a working query, but it's monolithic and 50+ lines long. Even with proper indentation, it's hard to grasp what's going on.

When this occurs, SQLFluff should suggest breaking the query into two or more logical chunks, with some of these chunks written as (descriptively named) CTEs.

Bug: fix from stdin thinks "-" is a file

I was excited to find that stdin input is now accepted for fix! I tried it out and found that it doesn't actually work yet:

➜ sqlfluff version
0.2.4

➜ echo 'select col from tbl' | sqlfluff fix - --rules L001
==== finding violations ====
The path(s) ('-',) could not be accessed. Check it/they exist(s).

I might just try to figure out what's going on here, but it's worth making an issue to keep track of the bug!
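The message suggests `-` is being checked as a filesystem path. A minimal sketch of handling the common "`-` means stdin" convention before any path checks; `read_inputs` is hypothetical, and the optional `stdin` parameter exists only to make it testable:

```python
import sys


def read_inputs(paths, stdin=None):
    """Yield (name, text) pairs, treating "-" as standard input."""
    stdin = stdin if stdin is not None else sys.stdin
    for path in paths:
        if path == "-":
            yield "stdin", stdin.read()
        else:
            with open(path, encoding="utf-8") as f:
                yield path, f.read()
```

For `fix`, the stdin case would also need to write the fixed SQL to stdout rather than back to a file.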

Enhancement: Inequality operator fails to parse

Both versions of the SQL inequality operator fail to parse:

  • <>
  • !=

E.g.

SELECT
    user_id
FROM
    p.d.t
WHERE
    p.d.t != 1

Output:

$ sqlfluff lint test.sql
== [test.sql] FAIL
L:   6 | P:  11 | ???? | Found unparsable segment @L006P011: '!= 1\n...'
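One common cause of this kind of failure in a regex-based lexer is operator ordering: alternation tries alternatives left to right, so multi-character operators must come before their single-character prefixes, and `!` on its own is not a SQL operator. A minimal sketch; the operator list is an assumption, not SQLFluff's lexer configuration:

```python
import re

# Longest alternatives first: if "<" came before "<>", "<>" would lex as "<", ">".
COMPARISON_OPERATORS = ["<>", "!=", "<=", ">=", "=", "<", ">"]
COMPARISON_RE = re.compile("|".join(re.escape(op) for op in COMPARISON_OPERATORS))


def comparison_operators(sql):
    """Return the comparison operators found in `sql`, in order."""
    return COMPARISON_RE.findall(sql)
```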

Bug: Unexpected L003 fix behavior on python2

On python3, L003 handles this case as expected:

➜ echo '  select 1 from tbl;' | sqlfluff fix - --rules L003
select 1 from tbl;

But not on python2:

➜ echo '  select 1 from tbl;' | sqlfluff fix - --rules L003
    select 1 from tbl;

➜ python --version
Python 2.7.16

➜ sqlfluff version
0.2.4

I have noticed that python2 handles this correctly if there is only one preceding space. Any idea what could be going on?

Feature request: autofix lint errors (that can be fixed)

Similar to autopep8, eslint --fix, or rubocop --auto-correct, it would be amazing if sqlfluff was able to fix some of the lint errors it discovered automatically. This is good because it:

  • makes users using sqlfluff able to move faster by not having to fix all errors themselves
  • makes rolling out sqlfluff to a new codebase faster and less of a speedbump
  • makes the signal to noise ratio of the linter much higher

This is definitely a big one, but it makes a huge difference to the usability of the linter. I find that a lot of folks like the consistency that a linter brings but aren't willing to put in the effort to manually wrap lines, edit indentation, etc., so they end up with a lot of disabled rules, or they use the linter mostly as an advisory tool instead of something that runs in CI and enforces good-quality code.

The antidote to that is making it so that humans only have to deal with lint errors when they themselves have actually made an error that the computer can't fix on its own, like a badly broken syntax error or a reference to an undefined variable. That way, "boring" lint fails (like whitespace) are just things you never have to care about because the linter fixes them, and if you're feeling real frisky, you can set your editor to "Format on Save" and watch your code get better every time you save.
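A minimal sketch of the core mechanic, assuming each violation optionally carries a machine-applicable fix expressed as a character span plus replacement text. `LintError` and `autofix` are hypothetical, overlapping fixes are not handled, and the rule codes in the example are placeholders:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LintError:
    """A violation with an optional machine-applicable fix (hypothetical shape)."""
    code: str
    start: int
    end: int
    replacement: Optional[str] = None  # None means a human has to fix it


def autofix(sql: str, errors: List[LintError]):
    """Apply every fixable error; return the new SQL and the errors left for humans."""
    fixable = [e for e in errors if e.replacement is not None]
    remaining = [e for e in errors if e.replacement is None]
    # Apply right-to-left so earlier character offsets stay valid.
    for err in sorted(fixable, key=lambda e: e.start, reverse=True):
        sql = sql[: err.start] + err.replacement + sql[err.end :]
    return sql, remaining
```

Only the unfixable errors (such as unparsable segments) are left for the user, which is the signal-to-noise improvement described above.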

Rule: Inconsistent capitalisation of keywords

Keywords may be either uppercase or lowercase, but within a file they should be consistent.

An extension of this rule would be to allow a more specific version to be configured using the new config framework.

Bug: Issues with nested functions (e.g. SPLIT(LOWER(text), ' ') and so on)

Consider the following query:

SELECT
    SPLIT(LOWER(text), ' ') AS text
FROM "database"."sample_table"

Saving this query to b.sql and running sqlfluff lint -n b.sql results in the following:

Traceback (most recent call last):
  File "/home/user/sql/.venv/bin/sqlfluff", line 11, in <module>
    load_entry_point('sqlfluff==0.1.3', 'console_scripts', 'sqlfluff')()
  File "/home/user/sql/.venv/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/user/sql/.venv/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/cli/commands.py", line 85, in lint
    result = lnt.lint_paths(paths, verbosity=verbose)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/linter.py", line 324, in lint_paths
    result.add(self.lint_path(path, verbosity=verbosity, fix=fix))
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/linter.py", line 312, in lint_path
    linted_path.add(self.lint_file(f, fname=fname, verbosity=verbosity, fix=fix))
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/linter.py", line 212, in lint_file
    parsed, vs, time_dict = self.parse_file(f=f, fname=fname, verbosity=verbosity)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/linter.py", line 193, in parse_file
    parsed = fs.parse(recurse=recurse, verbosity=verbosity, dialect=self.dialect)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/segments_base.py", line 228, in parse
    verbosity=verbosity, dialect=dialect)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/segments_base.py", line 394, in expand
    res = stmt.parse(recurse=recurse, parse_depth=parse_depth, verbosity=verbosity, dialect=dialect)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/segments_base.py", line 228, in parse
    verbosity=verbosity, dialect=dialect)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/segments_base.py", line 394, in expand
    res = stmt.parse(recurse=recurse, parse_depth=parse_depth, verbosity=verbosity, dialect=dialect)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/segments_base.py", line 228, in parse
    verbosity=verbosity, dialect=dialect)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/segments_base.py", line 394, in expand
    res = stmt.parse(recurse=recurse, parse_depth=parse_depth, verbosity=verbosity, dialect=dialect)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/segments_base.py", line 228, in parse
    verbosity=verbosity, dialect=dialect)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/segments_base.py", line 394, in expand
    res = stmt.parse(recurse=recurse, parse_depth=parse_depth, verbosity=verbosity, dialect=dialect)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/segments_base.py", line 186, in parse
    dialect=dialect, match_segment=self.__class__.__name__)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/grammar.py", line 70, in _match
    verbosity=verbosity, dialect=dialect, match_segment=match_segment)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/grammar.py", line 273, in match
    verbosity=verbosity, dialect=dialect, match_segment=match_segment)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/grammar.py", line 70, in _match
    verbosity=verbosity, dialect=dialect, match_segment=match_segment)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/grammar.py", line 485, in match
    match_segment=match_segment)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/grammar.py", line 70, in _match
    verbosity=verbosity, dialect=dialect, match_segment=match_segment)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/grammar.py", line 273, in match
    verbosity=verbosity, dialect=dialect, match_segment=match_segment)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/grammar.py", line 70, in _match
    verbosity=verbosity, dialect=dialect, match_segment=match_segment)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/grammar.py", line 236, in match
    dialect=dialect, match_segment=match_segment)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/segments_base.py", line 361, in _match
    verbosity=verbosity, dialect=dialect, match_segment=match_segment)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/segments_base.py", line 320, in match
    dialect=dialect, match_segment=match_segment)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/grammar.py", line 70, in _match
    verbosity=verbosity, dialect=dialect, match_segment=match_segment)
  File "/home/user/sql/.venv/lib/python3.7/site-packages/sqlfluff/parser/grammar.py", line 586, in match
    segment=segments[start_bracket_idx])
TypeError: tuple indices must be integers or slices, not NoneType

Rule: In a JOIN, flag overly complex logic, suggesting to move it to WHERE

This is a situation that arose recently in some production code.

We have the following join:

...
JOIN
  content
ON (s.user_id = content.user_id
      AND s.unique_opens <= c.emails_sent - (s.hard_bounce + s.soft_bounce)
      AND s.campaign_id = content.campaign_id)

The ON clause is a mixture of what looks like actual join criteria (s.user_id = content.user_id and s.campaign_id = content.campaign_id) and filtering logic (the <= condition).

It would be cool to have a check that detects non-trivial join conditions (anything other than =, maybe?) and suggests moving those to a WHERE clause.

Question: Can we guarantee that this change won't change the query results? I think it's a reasonable suggestion either way, but if it's risky, the message should be worded appropriately to avoid someone blindly making the change and introducing a bug.
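On the safety question: for an INNER JOIN, moving a filter from ON to WHERE does not change the results. For a LEFT/RIGHT/FULL OUTER JOIN it can, because a condition in ON only controls which rows match, while the same condition in WHERE removes rows after the join. A message that mentions the join type seems prudent. Below is a rough sketch of how such a check might separate join keys from filters. It works on plain strings rather than SQLFluff's parse tree. The helper names `strip_outer_parens`, `split_top_level_and` and `classify_join_conditions` are hypothetical and not part of SQLFluff's API. Only `table.column = table.column` counts as a join key.

```python
import re

# A plain equality between two qualified columns, e.g. "s.user_id = content.user_id".
EQUI_JOIN = re.compile(r"\w+\.\w+\s*=\s*\w+\.\w+")


def strip_outer_parens(text: str) -> str:
    """Remove one pair of parentheses if it wraps the whole expression."""
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        return text
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if depth == 0 and i < len(text) - 1:
            # The opening parenthesis closes early, so it does not wrap everything.
            return text
    return text[1:-1].strip()


def split_top_level_and(expr: str) -> list:
    """Split on AND keywords that are not nested inside parentheses.

    Limitation: BETWEEN ... AND ... and AND inside string literals are not handled.
    """
    parts, depth, start = [], 0, 0
    for match in re.finditer(r"\(|\)|\bAND\b", expr, flags=re.IGNORECASE):
        token = match.group(0)
        if token == "(":
            depth += 1
        elif token == ")":
            depth -= 1
        elif depth == 0:
            parts.append(expr[start:match.start()].strip())
            start = match.end()
    parts.append(expr[start:].strip())
    return parts


def classify_join_conditions(on_clause: str):
    """Return (join keys, other conditions) for the body of a JOIN ... ON clause."""
    keys, filters = [], []
    for cond in split_top_level_and(strip_outer_parens(on_clause)):
        cond = " ".join(cond.split())  # collapse newlines and repeated spaces
        (keys if EQUI_JOIN.fullmatch(cond) else filters).append(cond)
    return keys, filters


on_clause = """(s.user_id = content.user_id
      AND s.unique_opens <= c.emails_sent - (s.hard_bounce + s.soft_bounce)
      AND s.campaign_id = content.campaign_id)"""
keys, filters = classify_join_conditions(on_clause)
for cond in filters:
    print(f"Consider moving to WHERE (check the join type first): {cond}")
```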

Enhancement: Logging for config diffs

When linting large file structures there's a possibility of nested configs which vary from file to file. This is really hard to debug at the moment. For some levels of verbosity, the root config should be printed at the start (in full) and then for each file if their config is any different, the diff (and not the full new config) should also be printed.
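A minimal sketch of the "print only the diff" idea. It assumes each file's effective config can be represented as a plain nested dict. SQLFluff's real config object is not modelled here, and `flatten` and `config_diff` are hypothetical helpers. Nested sections are flattened into dotted keys, so the log can show one line per changed setting.

```python
from __future__ import annotations

from typing import Any


def flatten(config: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested sections into dotted keys: {"core": {"dialect": "x"}} -> {"core.dialect": "x"}."""
    flat: dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def config_diff(root: dict[str, Any], file_config: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Map each differing key to (root value, file value); a missing key shows up as None."""
    base, other = flatten(root), flatten(file_config)
    return {
        key: (base.get(key), other.get(key))
        for key in sorted(base.keys() | other.keys())
        if base.get(key) != other.get(key)
    }


# Illustrative values only.
root = {"core": {"dialect": "ansi", "max_line_length": 80}}
per_file = {"core": {"dialect": "bigquery", "max_line_length": 80}}

print("Root config:", flatten(root))
for key, (old, new) in config_diff(root, per_file).items():
    print(f"  {key}: {old!r} -> {new!r}")
```

Flattening before comparing keeps the diff small when only one setting deep inside a section changes.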

Looking for contributors?

Are you still developing this project, and are you looking for contributors? I am a data engineer at Mailchimp, and we make extensive use of BigQuery.

A tool like this could be helpful for our team, and we may be able to help with development if you're still planning to work on it.

Please let me know -- thanks!

Enhancement: Raise issue if table referred to in select is not present.

Quite frequently people write queries that fail because they reference things incorrectly. sqlfluff (probably) won't know which columns are available, but it does know which tables are available.

A query like the one below should fail because the select elements refer to tables a and b, which aren't in the FROM clause.

SELECT
    a.blah,
    b.foo
FROM bar
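A rough, regex-based sketch of the check. It assumes a single flat SELECT with no subqueries or CTEs. A real rule would walk SQLFluff's parse tree instead, and `undefined_qualifiers` is a hypothetical name. The sketch collects qualifiers used in the select list, collects table names and aliases from FROM/JOIN, and reports any qualifier that matches neither.

```python
from __future__ import annotations

import re

# Words that may follow a table name without being an alias (deliberately incomplete).
NOT_ALIASES = {"where", "join", "left", "right", "inner", "outer", "full", "cross",
               "on", "using", "group", "order", "limit", "union", "having"}

# The alias is captured inside a lookahead so that a following JOIN is not consumed.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)(?=(?:\s+(?:AS\s+)?(\w+))?)", re.IGNORECASE)
COLUMN_QUALIFIER = re.compile(r"\b([A-Za-z_]\w*)\.\w+")


def undefined_qualifiers(query: str) -> set[str]:
    """Return SELECT-list qualifiers that match no table or alias in FROM/JOIN."""
    select_list = re.search(r"\bSELECT\b(.*?)\bFROM\b", query, re.IGNORECASE | re.DOTALL)
    if select_list is None:
        return set()
    used = {q.lower() for q in COLUMN_QUALIFIER.findall(select_list.group(1))}
    defined = set()
    for table, alias in TABLE_REF.findall(query):
        defined.add(table.lower())
        defined.add(table.split(".")[-1].lower())  # "schema.tbl" may be referenced as "tbl"
        if alias and alias.lower() not in NOT_ALIASES:
            defined.add(alias.lower())
    return used - defined


print(undefined_qualifiers("SELECT\n    a.blah,\n    b.foo\nFROM bar"))
```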

Feature Request: Optional Jinja2 pre-processor

Similar to #1, but for formatting rather than code style. It would be great if the linter could reformat the SQL to follow some common-sense formatting guidelines, primarily around indentation.

Our standard is to place SELECT, FROM, JOIN, and keywords like that at the left, then indent the column list, join conditions, etc. (or place them on the same line as the keyword if it easily fits).

Subqueries and CTEs are indented.

We use two-space indent.

We capitalize all SQL keywords and function names.

We include spaces around operators (comparison, arithmetic, etc.), also after commas in parameter lists, column lists, etc.
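One of these guidelines, keyword capitalization, can be sketched as a token-level rewrite. The sketch uses a small hypothetical keyword list and a regex tokenizer rather than SQLFluff's lexer. Function names are not handled, and `capitalise_keywords` is an illustrative name. String literals, double-quoted identifiers and `--` comments are matched first, so words inside them are left untouched.

```python
import re

# A small, deliberately incomplete keyword list for illustration.
KEYWORDS = {"select", "from", "where", "join", "left", "inner", "on", "and", "or",
            "as", "group", "by", "order", "with", "union", "all", "distinct"}

# Alternatives: single-quoted string, double-quoted identifier, line comment, bare word.
TOKEN = re.compile(r"'(?:[^']|'')*'|\"[^\"]*\"|--[^\n]*|\b[A-Za-z_]\w*\b")


def capitalise_keywords(sql: str) -> str:
    """Upper-case keywords while preserving literals, quoted identifiers and comments."""
    def fix(match):
        token = match.group(0)
        # Literals and comments never equal a bare keyword, so they pass through unchanged.
        return token.upper() if token.lower() in KEYWORDS else token

    return TOKEN.sub(fix, sql)


print(capitalise_keywords("select a, 'from here' as b from t -- join later"))
```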

Rule: Warn if a query uses both DISTINCT and GROUP BY

Since these two features of SQL behave very similarly, to my knowledge there is no reason to use both in the same query (though I have not been able to confirm this 100%). Even if there are occasional reasons, it's still pretty unusual and worth flagging.

If someone does this, they may be misunderstanding the behavior. The tool should suggest they eliminate DISTINCT from the query and do what they need using GROUP BY.
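A naive sketch of the detection, assuming plain query text. `uses_distinct_and_group_by` is a hypothetical helper, not an existing SQLFluff rule. Only `SELECT DISTINCT` is matched, so `COUNT(DISTINCT x)` combined with GROUP BY, which is a legitimate pattern, is not flagged. The sketch does not distinguish query levels, so a DISTINCT subquery inside a grouped outer query would be a false positive.

```python
import re


def uses_distinct_and_group_by(query: str) -> bool:
    """Return True if the query text contains both SELECT DISTINCT and GROUP BY."""
    # Drop single-quoted string literals and line comments so their contents can't match.
    text = re.sub(r"'(?:[^']|'')*'|--[^\n]*", "", query)
    has_distinct = re.search(r"\bSELECT\s+DISTINCT\b", text, re.IGNORECASE) is not None
    has_group_by = re.search(r"\bGROUP\s+BY\b", text, re.IGNORECASE) is not None
    return has_distinct and has_group_by


print(uses_distinct_and_group_by("SELECT DISTINCT a FROM t GROUP BY a"))
```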

"Whitespace around operators" check reports the same warning twice

Query:

SELECT
    a/b AS click_rate
FROM
    t

Output:

$ sqlfluff lint test.sql
== [test.sql] FAIL
L:   2 | P:   6 | L006 | Operators should be surrounded by a single whitespace.
L:   2 | P:   7 | L006 | Operators should be surrounded by a single whitespace.

Note the warning is correct, but it is reported twice -- apparently once on the operator and once on the second operand.

Enhancement: Have a way to run SQLFluff for a git branch, filtering the output to new or modified areas

There is a Python package called diff-cover which provides two command-line tools with similar functionality for Python code:

  • diff-cover for coverage information
  • diff-quality for code quality

In both cases, the tool runs the underlying command (e.g. pylint, flake8, or coverage) and filters the results to display only the output relating to new or modified parts of the code.

It would be great to have a way to do the same with SQLFluff. diff-cover is mentioned only as a similar tool; it may not be directly useful or relevant to addressing this need.
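A minimal sketch of the filtering step, independent of diff-cover. It reads the hunk headers of `git diff -U0`. With zero context lines, each `+start,count` range covers exactly the added or modified lines. The violation tuples `(line, rule code, message)` are an assumed shape, not SQLFluff's actual output format. The base branch name `main` and the helper names are hypothetical.

```python
from __future__ import annotations

import re
import subprocess

# "@@ -12,3 +14,5 @@" -> new-file start line 14, 5 lines; a missing count means 1.
HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@", re.MULTILINE)


def git_diff(path: str, base: str = "main") -> str:
    """Diff one file against a base branch with no context lines."""
    return subprocess.run(
        ["git", "diff", "-U0", f"{base}...HEAD", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout


def changed_lines(diff_text: str) -> set[int]:
    """New-file line numbers touched by a zero-context unified diff of one file."""
    lines: set[int] = set()
    for start, count in HUNK.findall(diff_text):
        n = int(count) if count else 1
        lines.update(range(int(start), int(start) + n))  # count 0 (pure deletion) adds nothing
    return lines


def filter_violations(violations, changed: set[int]):
    """Keep (line, rule code, message) tuples that fall on changed lines."""
    return [v for v in violations if v[0] in changed]
```

Usage would be roughly `filter_violations(violations, changed_lines(git_diff("test.sql")))`, where `violations` has been converted from whatever SQLFluff output format is chosen.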

Rule: If a JOIN uses "ON" but could use "USING", suggest changing it

In SQL, USING provides a more concise syntax for joins where the fields in the two tables have the same names and the fields are being compared for equality (=). It is more readable and therefore preferable for style reasons. If a JOIN uses ON and all the join conditions are like this, suggest switching to USING.

For example, a query like this:

SELECT
  c.user_id,
  c.campaign_id,
  s.num_clicks
FROM
  campaigns AS c
LEFT JOIN stats AS s
ON c.user_id = s.user_id AND c.campaign_id = s.campaign_id

Should suggest changing it to this:

SELECT
  c.user_id,
  c.campaign_id,
  s.num_clicks
FROM
  campaigns AS c
LEFT JOIN stats AS s
USING (user_id, campaign_id)
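A rough sketch of the suggestion logic, working on the ON clause as a string. `suggest_using` is a hypothetical helper, and parenthesized ON clauses are not handled. It only proposes USING when every condition is an equality between identically named columns of the same two tables. USING is not always a drop-in replacement: in many dialects the USING columns are merged into a single output column (for example, `SELECT *` returns `user_id` once), so the suggestion should still be reviewed.

```python
from __future__ import annotations

import re

# "left.col = right.col", capturing both qualifiers and both column names.
QUALIFIED_EQUALITY = re.compile(r"(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)")


def suggest_using(on_clause: str) -> str | None:
    """Return a USING (...) suggestion, or None if any condition doesn't qualify."""
    columns: list[str] = []
    table_pairs: set[frozenset[str]] = set()
    for cond in re.split(r"\bAND\b", on_clause, flags=re.IGNORECASE):
        match = QUALIFIED_EQUALITY.fullmatch(cond.strip())
        if match is None:
            return None  # not a plain column-to-column equality
        left_table, left_col, right_table, right_col = match.groups()
        if left_col != right_col or left_table == right_table:
            return None  # USING needs the same column name on two different tables
        columns.append(left_col)
        table_pairs.add(frozenset((left_table, right_table)))
    if len(table_pairs) != 1:
        return None  # conditions span more than one pair of tables
    return f"USING ({', '.join(columns)})"


print(suggest_using("c.user_id = s.user_id AND c.campaign_id = s.campaign_id"))
```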

Enhancement: Implement "UNION ALL" in parser.

Linting this file:

SELECT "Majority Other" AS gender_label UNION ALL
SELECT "Mixed Gender" AS gender_label

Fails as follows:

$ sqlfluff lint test.sql
== [test.sql] FAIL
L:   1 | P:  41 | ???? | Found unparsable segment @L001P041: 'UNION ALL\nSELECT "Mi...'

Feature request: some pre-set options for SQL style guides

Hi @alanmcruickshank,

Thanks a ton again for taking the time to build sqlfluff -- I believe it has the potential to be the linter that is sorely missing!

I am not sure what your future plans are, but if I may, I'd suggest trying to add rules from a few guidelines -- here are the ones I am familiar with:

If you would be open to exploring this direction, I would be happy to put together a few rules as well -- from looking at the code it seems fairly straightforward.

Thanks again!

sqlfluff fails to install in Python 2.7

I started looking at integrating sqlfluff into our application today. We use a mixture of Python 2.7 and 3 on our projects, with new projects using Python 3.

I noticed that the package won't install in Python 2.7. It looks like a very simple issue. This is the error:

Collecting sqlfluff
  Using cached https://artifactory.rsglab.com/artifactory/api/pypi/pypi/packages/f3/71/6ec9f43964cb994a3b2075da1f335c021115593d9a11b3cc02ace2581ef5/sqlfluff-0.1.5.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-Te5lWx/sqlfluff/setup.py", line 7, in <module>
        import configparser
    ImportError: No module named configparser

This module has been renamed in Python 3.

  • Python 2.7: import ConfigParser
  • Python 3: import configparser

I believe the following will work:

try:
    import configparser  # Python 3 module name
except ImportError:
    import ConfigParser as configparser  # Python 2.7 fallback under the Python 3 name

If you use the six library, you could instead: from six.moves import configparser
