sqltree's Introduction

sqltree

sqltree is an experimental parser for SQL, providing a syntax tree for SQL queries. Possible use cases include:

  • Static analysis (for example, to validate column names)
  • Translating queries to another SQL dialect
  • Autoformatting

sqltree can parse queries:

$ python -m sqltree "SELECT * FROM x WHERE x = 3"
Select(select_exprs=[SelectExpr(expr=Star(), alias=None)], table=Identifier(text='x'), conditions=BinOp(left=Identifier(text='x'), op=Punctuation(text='='), right=IntegerLiteral(value=3)))

And format them:

$ python -m sqltree.formatter "SELECT * from x where x=3"
SELECT *
FROM x
WHERE x = 3

SQL is a big language with a complicated grammar that varies significantly between database vendors. sqltree is designed to be flexible enough to parse the full syntax supported by different databases, but I am prioritizing the constructs that my own use cases need. So far, that has meant a focus on parsing MySQL 8 queries. Further syntax will be added as I have time.

Features

Useful features of sqltree include:

Placeholder support

sqltree supports placeholders such as %s or ? in various positions in the query, so that queries using such placeholders can be formatted and analyzed.

$ python -m sqltree.formatter 'select * from x where y = 3 %(limit)s'
SELECT *
FROM x
WHERE y = 3
%(limit)s
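
To illustrate the idea, here is a minimal sketch of a tokenizer that treats %s, ? and %(name)s as tokens of their own. It is not sqltree's actual tokenizer: the names TOKEN_RE and tokenize are hypothetical, and the pattern covers only a tiny slice of SQL.

```python
import re

# Hypothetical token pattern for illustration; not sqltree's real tokenizer.
TOKEN_RE = re.compile(
    r"""
    (?P<placeholder>%\(\w+\)s|%s|\?)   # pyformat, format and qmark placeholders
    |(?P<number>\d+)
    |(?P<word>[A-Za-z_]\w*)
    |(?P<punct>[*=<>(),.])
    |(?P<space>\s+)
    """,
    re.VERBOSE,
)


def tokenize(sql):
    """Return (kind, text) pairs for `sql`, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            raise ValueError(f"Unexpected character {sql[pos]!r} at offset {pos}")
        if match.lastgroup != "space":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for kind, text in tokenize("select * from x where y = 3 %(limit)s"):
    print(kind, text)
```

Because the placeholder is an ordinary token, a parser built on top of this could accept it wherever an expression or clause is allowed.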

Better error messages

sqltree's handwritten parser often produces better error messages than MySQL itself. For example:

$ mysql
mysql> show replicca status;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'replicca status' at line 1
$ python -m sqltree 'show replicca status'
Unexpected 'replicca' (expected one of REPLICA, SLAVE, REPLICAS, TABLES, TABLE, TRIGGERS, VARIABLES, STATUS, COUNT, WARNINGS, ERRORS, COLUMNS, FIELDS, INDEX, INDEXES, KEYS)
0: show replicca status
        ^^^^^^^^
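
The layout of such a message can be sketched in a few lines. This is an illustration under stated assumptions, not sqltree's implementation: unexpected_token_message is a hypothetical helper, and the token's offset and length are passed in by hand rather than taken from a real tokenizer.

```python
def unexpected_token_message(sql, start, length, expected):
    """Build an 'Unexpected ...' message with a caret underline.

    `start` and `length` locate the offending token inside `sql`.
    """
    token = sql[start : start + length]
    prefix = "0: "  # line-number prefix, mirroring the output shown above
    return "\n".join(
        [
            f"Unexpected {token!r} (expected one of {', '.join(expected)})",
            prefix + sql,
            # Shift the carets past the prefix so they sit under the token.
            " " * (len(prefix) + start) + "^" * length,
        ]
    )


print(unexpected_token_message("show replicca status", 5, 8, ["REPLICA", "STATUS"]))
```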

API

  • sqltree.sqltree: parse a SQL query and return the parse tree. See sqltree.parser for the possible parse nodes.
  • sqltree.formatter.format: reformat a SQL query.
  • sqltree.tools.get_tables: get the tables referenced in a SQL query.

More detailed documentation to follow.

Requirements

sqltree runs on Python 3.6 and up and has no dependencies.

Using the fixit rule

sqltree embeds a fixit rule for formatting SQL. Here is how to use it:

  • Install fixit if you don't have it yet
    • pip install fixit
    • python -m fixit.cli.init_config
  • Run python -m fixit.cli.apply_fix --rules sqltree.fixit.SqlFormatRule path/to/your/code

Changelog

Version 0.3.0 (July 12, 2022)

  • Add ANSI SQL as a dialect
  • Support escaping quotes by doubling them in string literals
  • Support scientific notation with a negative exponent
  • Fix formatting for quoted identifiers that contain non-alphanumeric characters
  • Support the unary NOT operator
  • Fix formatting of LEFT JOIN and similar queries

Version 0.2.0 (June 24, 2022)

  • Support SELECT ... INTO syntax
  • Support SET TRANSACTION syntax
  • Support a MOD b and a DIV b syntax
  • Support GROUP_CONCAT() syntax

Version 0.1.0 (June 13, 2022)

  • Initial release

sqltree's People

Contributors

jellezijlstra, pangyifish, pre-commit-ci[bot]

Forkers

zyv pangyifish

sqltree's Issues

support string escapes

print(format("select '''y' "))
Traceback (most recent call last):
  File "", line 1, in <module>
  File "/Users/toby_mao/dev/sqlglot/env/lib/python3.8/site-packages/sqltree/formatter.py", line 821, in format
    sqltree(sql, dialect), dialect=dialect, line_length=line_length, indent=indent
  File "/Users/toby_mao/dev/sqlglot/env/lib/python3.8/site-packages/sqltree/api.py", line 12, in sqltree
    return parse(tokens, dialect)
  File "/Users/toby_mao/dev/sqlglot/env/lib/python3.8/site-packages/sqltree/parser.py", line 964, in parse
    return _parse_statement(p)
  File "/Users/toby_mao/dev/sqlglot/env/lib/python3.8/site-packages/sqltree/parser.py", line 984, in _parse_statement
    _assert_done(p)
  File "/Users/toby_mao/dev/sqlglot/env/lib/python3.8/site-packages/sqltree/parser.py", line 970, in _assert_done
    raise ParseError.from_unexpected_token(remaining, expected)
sqltree.parser.ParseError: Unexpected "'y'" (expected EOF)
0: select '''y'

In SQL, the escape for a quote is another quote.
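
A minimal sketch of the doubled-quote rule, for illustration only. read_string_literal is a hypothetical helper, not part of sqltree, and it handles only single-quoted literals with no backslash escapes.

```python
def read_string_literal(sql, start):
    """Read a single-quoted literal beginning at sql[start].

    Returns (value, end), where `end` is the index just past the closing
    quote. A doubled quote ('') inside the literal stands for one quote.
    """
    if sql[start] != "'":
        raise ValueError("expected a single quote")
    chars = []
    i = start + 1
    while i < len(sql):
        if sql[i] == "'":
            if i + 1 < len(sql) and sql[i + 1] == "'":
                chars.append("'")  # doubled quote: keep one, skip both
                i += 2
                continue
            return "".join(chars), i + 1  # lone quote closes the literal
        chars.append(sql[i])
        i += 1
    raise ValueError("unterminated string literal")


value, end = read_string_literal("select '''y' ", 7)
print(repr(value), end)
```

With this rule the query above contains one literal whose value is a quote followed by y, rather than an empty string followed by a second literal.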

Fixit rule

We should provide a Fixit rule that formats SQL embedded in Python strings.

Ideas for how it would work:

  • Configure a list of functions that map to Dialects (presto.query -> Presto dialect object); use that dialect for parsing and formatting the SQL.
  • Also configure variable names: sql = "some string" -> format it with some default dialect
  • Should also handle f-strings by using placeholders.
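
The first two ideas could be prototyped with the standard ast module before involving Fixit at all. The sketch below is an assumption-laden illustration: FUNCTION_DIALECTS, SQL_VARIABLES, dotted_name and find_sql_strings are hypothetical names, dialects are plain strings, and f-strings are skipped rather than converted to placeholders. It needs Python 3.8 or later for ast.Constant.

```python
import ast

# Hypothetical configuration: call names mapped to dialect names, plus
# variable names whose string values are treated as SQL in a default dialect.
FUNCTION_DIALECTS = {"presto.query": "presto"}
SQL_VARIABLES = {"sql": "mysql"}


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, else None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_sql_strings(source):
    """Yield (line number, dialect, SQL text) for configured calls and variables."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and node.args:
            dialect = FUNCTION_DIALECTS.get(dotted_name(node.func))
            target = node.args[0]
        elif isinstance(node, ast.Assign) and len(node.targets) == 1:
            dialect = SQL_VARIABLES.get(dotted_name(node.targets[0]))
            target = node.value
        else:
            continue
        # Only plain string constants are handled; f-strings are ast.JoinedStr.
        if dialect and isinstance(target, ast.Constant) and isinstance(target.value, str):
            yield target.lineno, dialect, target.value


code = 'sql = "select 1"\npresto.query("select 2")\nprint("not sql")\n'
for found in sorted(find_sql_strings(code)):
    print(found)
```

Each string found this way could then be handed to the formatter with the matching dialect.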

SELECT COUNT(*) c FROM t

MySQL supports queries like SELECT COUNT(*) c FROM t, where the alias c is given without AS. We should support this, but first I need to figure out what exactly the allowed syntax is.

support cross join

print(format("select x cross join y "))
Traceback (most recent call last):
  File "", line 1, in <module>
  File "/Users/toby_mao/dev/sqlglot/env/lib/python3.8/site-packages/sqltree/formatter.py", line 821, in format
    sqltree(sql, dialect), dialect=dialect, line_length=line_length, indent=indent
  File "/Users/toby_mao/dev/sqlglot/env/lib/python3.8/site-packages/sqltree/api.py", line 12, in sqltree
    return parse(tokens, dialect)
  File "/Users/toby_mao/dev/sqlglot/env/lib/python3.8/site-packages/sqltree/parser.py", line 964, in parse
    return _parse_statement(p)
  File "/Users/toby_mao/dev/sqlglot/env/lib/python3.8/site-packages/sqltree/parser.py", line 984, in _parse_statement
    _assert_done(p)
  File "/Users/toby_mao/dev/sqlglot/env/lib/python3.8/site-packages/sqltree/parser.py", line 970, in _assert_done
    raise ParseError.from_unexpected_token(remaining, expected)
sqltree.parser.ParseError: Unexpected 'CROSS' (expected EOF)
0: select x cross join y

Transformer

Provide a Transformer class deriving from Visitor that produces a new AST. The default transformer should just return a copy of the original tree.
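
One possible shape for this, sketched under assumptions: the node classes below are simplified stand-ins for sqltree's real nodes (op is a plain string here), and for brevity the dispatch lives directly in Transformer instead of a separate Visitor base class.

```python
from dataclasses import dataclass, fields, is_dataclass, replace
from typing import Any


# Hypothetical, simplified node classes for illustration only.
@dataclass
class Identifier:
    text: str


@dataclass
class IntegerLiteral:
    value: int


@dataclass
class BinOp:
    left: Any
    op: str
    right: Any


class Transformer:
    """Rebuild a tree node by node; subclasses override visit_<ClassName>."""

    def visit(self, node):
        if isinstance(node, list):
            return [self.visit(item) for item in node]
        if not is_dataclass(node):
            return node  # plain values such as str or int are returned as-is
        method = getattr(self, f"visit_{type(node).__name__}", self.generic_visit)
        return method(node)

    def generic_visit(self, node):
        # Default behavior: copy the node with every field transformed.
        changes = {f.name: self.visit(getattr(node, f.name)) for f in fields(node)}
        return replace(node, **changes)


class UppercaseIdentifiers(Transformer):
    """Example subclass: upper-case every identifier, leave the rest alone."""

    def visit_Identifier(self, node):
        return Identifier(node.text.upper())


tree = BinOp(Identifier("x"), "=", IntegerLiteral(3))
print(Transformer().visit(tree))
print(UppercaseIdentifiers().visit(tree))
```

The base class returns an equal but distinct tree, so a subclass only has to override the node types it wants to change.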

Dialect support

sqltree should support multiple SQL dialects, as implemented by major databases. I'm interested initially in Redshift, Presto, and MySQL.

I added a Dialect enum, but that's probably not enough. Instead, there should be a Dialect class that gets passed along to the parser, tokenizer, formatter, etc., with methods for every syntactic detail that changes between dialects. It should also carry a version (--dialect=mysql --version=5.7.0), because syntax sometimes changes between versions.
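
A rough sketch of what such a class might look like. Everything here is hypothetical (the class layout, from_options, and the two example methods); it only illustrates the idea of answering syntax questions from a name plus a version.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Dialect:
    name: str
    version: Tuple[int, ...] = ()  # empty means "unspecified, assume latest"

    @classmethod
    def from_options(cls, name, version=None):
        """Build a Dialect from strings such as 'mysql' and '5.7.0'."""
        parts = tuple(int(p) for p in version.split(".")) if version else ()
        return cls(name.lower(), parts)

    def identifier_quote(self):
        """Quote character for identifiers: backtick in MySQL, else double quote."""
        return "`" if self.name == "mysql" else '"'

    def supports_window_functions(self):
        """MySQL gained window functions in the 8.0 series."""
        if self.name == "mysql":
            return not self.version or self.version >= (8, 0)
        return True  # assumed for the other dialects in this sketch


old_mysql = Dialect.from_options("mysql", "5.7.0")
print(old_mysql, old_mysql.supports_window_functions())
```

The parser, tokenizer and formatter would each take a Dialect instance and ask it these questions instead of comparing against enum members.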

need space with join and side

print(format("select x from foo left join bar on foo.a = bar.a right join baz on baz.a = bar.b"))
SELECT x
FROM
foo
LEFTJOIN
bar
ON foo.a = bar.a
RIGHTJOIN
baz
ON baz.a = bar.b

The output should contain LEFT JOIN and RIGHT JOIN, with a space between the two keywords.

Command-line interface

There should be a nicely usable command-line interface. Some ideas:

sqltree format "select *"  # prints formatted SQL
sqltree parse "select *"  # prints pretty-printed parse tree
sqltree translate --from=mysql --to=sqlite "select *"  # prints SQL changed to work with SQLite
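
The command layout above maps naturally onto argparse subcommands. The following is only a sketch of the argument parsing; build_parser is a hypothetical name, and the handlers that would actually format, parse or translate are left out.

```python
import argparse


def build_parser():
    """Argument parser for the proposed format/parse/translate subcommands."""
    parser = argparse.ArgumentParser(prog="sqltree")
    commands = parser.add_subparsers(dest="command", required=True)

    for name in ("format", "parse"):
        sub = commands.add_parser(name)
        sub.add_argument("sql")

    translate = commands.add_parser("translate")
    # `from` is a Python keyword, so store the values under other names.
    translate.add_argument("--from", dest="source", required=True)
    translate.add_argument("--to", dest="target", required=True)
    translate.add_argument("sql")
    return parser


args = build_parser().parse_args(["translate", "--from=mysql", "--to=sqlite", "select *"])
print(args.command, args.source, args.target, args.sql)
```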
