GithubHelp home page GithubHelp logo

hyrise / sql-parser Goto Github PK

View Code? Open in Web Editor NEW
728.0 48.0 242.0 2.92 MB

SQL Parser for C++. Building C++ object structure from SQL statements.

License: MIT License

Makefile 0.95% Python 0.15% Shell 0.65% C++ 88.52% C 2.56% Yacc 6.15% Lex 1.03%
sql c-plus-plus sql-parser hyrise

sql-parser's Introduction

C++ SQL Parser

Build Status

This is a SQL Parser for C++. It parses the given SQL query into C++ objects. It has been developed for integration in Hyrise, but can be used perfectly well in other environments as well.

In March 2015 we've also written a short paper outlining discussing some development details and the integration into our database Hyrise. You can find the paper here.

Usage

Note: You can also find a detailed usage description here.

To use the SQL parser in your own projects you simply have to follow these few steps.

  1. Download the latest release here
  2. Compile the library make to create libsqlparser.so
  3. (Optional, Recommended) Run make install to copy the library to /usr/local/lib/
  4. Run the tests make test to make sure everything worked
  5. Include the SQLParser.h from src/ (or from /usr/local/lib/hsql/ if you installed it) and link the library in your project
  6. Take a look at the example project here
#include "hsql/SQLParser.h"

/* ... */

{
    // Basic Usage Example

    const std::string query = "...";
    hsql::SQLParserResult result;
    hsql::SQLParser::parse(query, &result);

    if (result.isValid() && result.size() > 0) {
        const hsql::SQLStatement* statement = result.getStatement(0);

        if (statement->isType(hsql::kStmtSelect)) {
            const auto* select = static_cast<const hsql::SelectStatement*>(statement);
            /* ... */
        }
    }
}

Quick Links:

How to Contribute

Developer Documentation

We strongly encourage you to contribute to this project! If you want to contribute to this project there are several options. If you've noticed a bug or would like an improvement let us know by creating a new issue. If you want to develop a new feature yourself or just improve the quality of the system, feel free to fork the reposistory and implement your changes. Open a pull request as soon as your done and we will look over it. If we think it's good then your pull request will be merged into this repository.

License

HYRISE sql-parser is licensed as open source after the MIT License which is declared in the LICENSE file of this project.

sql-parser's People

Contributors

ankushrayabhari avatar bensk1 avatar bmwiedemann avatar bouncner avatar cclauss avatar cdmh avatar chrisxu333 avatar d-justen avatar dencrash avatar dey4ss avatar f4lco avatar j-tr avatar javrucebo avatar julianmenzler avatar klauck avatar lawben avatar mengke-mk avatar mgonnav avatar mrks avatar msiomkin avatar mweisgut avatar nanxiao avatar santazhang avatar schwald avatar simonstadlinger avatar tennyzhuang avatar till-later avatar timzimmermann avatar torpedro avatar valsight-foss avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

sql-parser's Issues

Join multiple tables

This typical OLAP query (join fact table with multiple dimensions) is not supported

SELECT City.name, Product.category, SUM(price) FROM fact
INNER JOIN City ON fact.city_id = City.id
INNER JOIN Product ON fact.product_id = Product.id
GROUP BY City.name, Product.category

It gives "syntax error, unexpected INNER, expecting $end", however it is a valid SQL syntax.

Will your parser cope with a specific kind of request?

For example, the query:

	|-- Найти КУТЫ у которых модель принадлежит другому производителю, нежели указано в карточке КУта(и товара) с доп информацией. по определенному производителю.
	|-- ВыбПроизводитель1=Справочник.Производители
	|SELECT 
	|	спрК1.ID as [Кут1 $Справочник.КлассификаторТоваров]
	|	, (SELECT COUNT(спрКО1.ID) FROM $Справочник.КлассифТоваров_КлючевыеХарактеристики as спрКО1 WHERE спрКО1.PARENTEXT = спрК1.ID) as КолвоКО1
	|	, $спрК1.КомментарийОптовогоПрайса as КомментарийОптовогоПрайса
	|	, спрК2.ID as [Кут2 $Справочник.КлассификаторТоваров]
	|	, (SELECT COUNT(спрКО2.ID) FROM $Справочник.КлассифТоваров_КлючевыеХарактеристики as спрКО2 WHERE спрКО2.PARENTEXT = спрК2.ID) as КолвоКО2
	|	, $спрК2.КомментарийОптовогоПрайса as КомментарийОптовогоПрайса
	|FROM 
	|	$Справочник.КлассификаторТоваров as спрК1(NOLOCK) 
	|	INNER JOIN $Справочник.КлассификаторТоваров спрК2(NOLOCK) on ($спрК1.Модель = $спрК2.Модель AND $спрК1.Производитель <> $спрК2.Производитель)
	|WHERE
	|	$спрК1.Производитель = :ВыбПроизводитель1

I need an output of the following form:
спрК1: $Справочник.КлассификаторТоваров
спрК2: $Справочник.КлассификаторТоваров

SQL keyword as alias

Some TPC-DS queries contain sql keywords as aliases, e.g., order. The parser interprets these aliases as sql keywords for the corresponding sql operations.

Example, TPC-DS query 94, netezza dialect:

select  
   count(distinct ws_order_number) as "order count"
  ,sum(ws_ext_ship_cost) as "total shipping cost"
  ,sum(ws_net_profit) as "total net profit"
from
   web_sales ws1
  ,date_dim
  ,customer_address
  ,web_site
where
    d_date between '2002-5-01' and 
           (cast('2002-5-01' as date) + 60 days)
and ws1.ws_ship_date_sk = d_date_sk
and ws1.ws_ship_addr_sk = ca_address_sk
and ca_state = 'OK'
and ws1.ws_web_site_sk = web_site_sk
and web_company_name = 'pri'
and exists (select *
            from web_sales ws2
            where ws1.ws_order_number = ws2.ws_order_number
              and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
and not exists(select *
               from web_returns wr1
               where ws1.ws_order_number = wr1.wr_order_number)
order by count(distinct ws_order_number)
limit 100;

how can the bison_paser.y to support the follow insert syntax ?

./example/example "insert into XT_CZRZB (CZRZBH,CZLB,CZTJ,CZCT,CZR,CZDWMC,CXDWBM,CZSJ,CZRIP) VALUES ('RZ320100221612' || to_char('5287','FM000000'),'111','gmsfzhm = ''1'' and xm = ''2'' and xb = ''1'' and csrq = ''19901202'' and hjqh like ''11____'' and bmch = ''3'' and zjzl = ''100'' and zjhm = ''4'' and hjzrq like ''320102______'' and sf = ''400'' and mz = ''01'' and rylb = ''1'' and ryxzzzrq like ''320100______'' and zc = ''02'' and fwcs = ''123'' and lxdh = ''777777'' and jgqx like ''1101__'' and rysx = ''111100000000000000000000000000000000000000000000000000000000''','101010','yaoshibin','kjxxhc','320100220000','20161221133856','10.33.2.68')"
Given string is not a valid SQL query.
syntax error, unexpected CONCAT, expecting ')' or ',' (L0:99)

WITH support

Many TPC-DS queries (1, 2, 4, 5, 11, 14, ...) contain the WITH clause which is currently not supported by the parser.

TPC-DS query 14, natezza dialect:

with  cross_items as
 (select i_item_sk ss_item_sk
 from item,
 (select iss.i_brand_id brand_id
     ,iss.i_class_id class_id
     ,iss.i_category_id category_id
 from store_sales
     ,item iss
     ,date_dim d1
 where ss_item_sk = iss.i_item_sk
   and ss_sold_date_sk = d1.d_date_sk
   and d1.d_year between 1999 AND 1999 + 2
 intersect 
 select ics.i_brand_id
     ,ics.i_class_id
     ,ics.i_category_id
 from catalog_sales
     ,item ics
     ,date_dim d2
 where cs_item_sk = ics.i_item_sk
   and cs_sold_date_sk = d2.d_date_sk
   and d2.d_year between 1999 AND 1999 + 2
 intersect
 select iws.i_brand_id
     ,iws.i_class_id
     ,iws.i_category_id
 from web_sales
     ,item iws
     ,date_dim d3
 where ws_item_sk = iws.i_item_sk
   and ws_sold_date_sk = d3.d_date_sk
   and d3.d_year between 1999 AND 1999 + 2)
 where i_brand_id = brand_id
      and i_class_id = class_id
      and i_category_id = category_id
),
 avg_sales as
 (select avg(quantity*list_price) average_sales
  from (select ss_quantity quantity
             ,ss_list_price list_price
       from store_sales
           ,date_dim
       where ss_sold_date_sk = d_date_sk
         and d_year between 1999 and 1999 + 2
       union all 
       select cs_quantity quantity 
             ,cs_list_price list_price
       from catalog_sales
           ,date_dim
       where cs_sold_date_sk = d_date_sk
         and d_year between 1999 and 1999 + 2 
       union all
       select ws_quantity quantity
             ,ws_list_price list_price
       from web_sales
           ,date_dim
       where ws_sold_date_sk = d_date_sk
         and d_year between 1999 and 1999 + 2) x)
  select  channel, i_brand_id,i_class_id,i_category_id,sum(sales), sum(number_sales)
 from(
       select 'store' channel, i_brand_id,i_class_id
             ,i_category_id,sum(ss_quantity*ss_list_price) sales
             , count(*) number_sales
       from store_sales
           ,item
           ,date_dim
       where ss_item_sk in (select ss_item_sk from cross_items)
         and ss_item_sk = i_item_sk
         and ss_sold_date_sk = d_date_sk
         and d_year = 1999+2 
         and d_moy = 11
       group by i_brand_id,i_class_id,i_category_id
       having sum(ss_quantity*ss_list_price) > (select average_sales from avg_sales)
       union all
       select 'catalog' channel, i_brand_id,i_class_id,i_category_id, sum(cs_quantity*cs_list_price) sales, count(*) number_sales
       from catalog_sales
           ,item
           ,date_dim
       where cs_item_sk in (select ss_item_sk from cross_items)
         and cs_item_sk = i_item_sk
         and cs_sold_date_sk = d_date_sk
         and d_year = 1999+2 
         and d_moy = 11
       group by i_brand_id,i_class_id,i_category_id
       having sum(cs_quantity*cs_list_price) > (select average_sales from avg_sales)
       union all
       select 'web' channel, i_brand_id,i_class_id,i_category_id, sum(ws_quantity*ws_list_price) sales , count(*) number_sales
       from web_sales
           ,item
           ,date_dim
       where ws_item_sk in (select ss_item_sk from cross_items)
         and ws_item_sk = i_item_sk
         and ws_sold_date_sk = d_date_sk
         and d_year = 1999+2
         and d_moy = 11
       group by i_brand_id,i_class_id,i_category_id
       having sum(ws_quantity*ws_list_price) > (select average_sales from avg_sales)
 ) y
 group by rollup (channel, i_brand_id,i_class_id,i_category_id)
 order by channel,i_brand_id,i_class_id,i_category_id
 limit 100;
with  cross_items as
 (select i_item_sk ss_item_sk
 from item,
 (select iss.i_brand_id brand_id
     ,iss.i_class_id class_id
     ,iss.i_category_id category_id
 from store_sales
     ,item iss
     ,date_dim d1
 where ss_item_sk = iss.i_item_sk
   and ss_sold_date_sk = d1.d_date_sk
   and d1.d_year between 1999 AND 1999 + 2
 intersect
 select ics.i_brand_id
     ,ics.i_class_id
     ,ics.i_category_id
 from catalog_sales
     ,item ics
     ,date_dim d2
 where cs_item_sk = ics.i_item_sk
   and cs_sold_date_sk = d2.d_date_sk
   and d2.d_year between 1999 AND 1999 + 2
 intersect
 select iws.i_brand_id
     ,iws.i_class_id
     ,iws.i_category_id
 from web_sales
     ,item iws
     ,date_dim d3
 where ws_item_sk = iws.i_item_sk
   and ws_sold_date_sk = d3.d_date_sk
   and d3.d_year between 1999 AND 1999 + 2) x
 where i_brand_id = brand_id
      and i_class_id = class_id
      and i_category_id = category_id
),
 avg_sales as
(select avg(quantity*list_price) average_sales
  from (select ss_quantity quantity
             ,ss_list_price list_price
       from store_sales
           ,date_dim
       where ss_sold_date_sk = d_date_sk
         and d_year between 1999 and 1999 + 2
       union all
       select cs_quantity quantity
             ,cs_list_price list_price
       from catalog_sales
           ,date_dim
       where cs_sold_date_sk = d_date_sk
         and d_year between 1999 and 1999 + 2
       union all
       select ws_quantity quantity
             ,ws_list_price list_price
       from web_sales
           ,date_dim
       where ws_sold_date_sk = d_date_sk
         and d_year between 1999 and 1999 + 2) x)
  select  this_year.channel ty_channel
                           ,this_year.i_brand_id ty_brand
                           ,this_year.i_class_id ty_class
                           ,this_year.i_category_id ty_category
                           ,this_year.sales ty_sales
                           ,this_year.number_sales ty_number_sales
                           ,last_year.channel ly_channel
                           ,last_year.i_brand_id ly_brand
                           ,last_year.i_class_id ly_class
                           ,last_year.i_category_id ly_category
                           ,last_year.sales ly_sales
                           ,last_year.number_sales ly_number_sales 
 from
 (select 'store' channel, i_brand_id,i_class_id,i_category_id
        ,sum(ss_quantity*ss_list_price) sales, count(*) number_sales
 from store_sales 
     ,item
     ,date_dim
 where ss_item_sk in (select ss_item_sk from cross_items)
   and ss_item_sk = i_item_sk
   and ss_sold_date_sk = d_date_sk
   and d_week_seq = (select d_week_seq
                     from date_dim
                     where d_year = 1999 + 1
                       and d_moy = 12
                       and d_dom = 3)
 group by i_brand_id,i_class_id,i_category_id
 having sum(ss_quantity*ss_list_price) > (select average_sales from avg_sales)) this_year,
 (select 'store' channel, i_brand_id,i_class_id
        ,i_category_id, sum(ss_quantity*ss_list_price) sales, count(*) number_sales
 from store_sales
     ,item
     ,date_dim
 where ss_item_sk in (select ss_item_sk from cross_items)
   and ss_item_sk = i_item_sk
   and ss_sold_date_sk = d_date_sk
   and d_week_seq = (select d_week_seq
                     from date_dim
                     where d_year = 1999
                       and d_moy = 12
                       and d_dom = 3)
 group by i_brand_id,i_class_id,i_category_id
 having sum(ss_quantity*ss_list_price) > (select average_sales from avg_sales)) last_year
 where this_year.i_brand_id= last_year.i_brand_id
   and this_year.i_class_id = last_year.i_class_id
   and this_year.i_category_id = last_year.i_category_id
 order by this_year.channel, this_year.i_brand_id, this_year.i_class_id, this_year.i_category_id
 limit 100;

make

the README says 'make build' should work, however there is no target for 'build'. Also:

[bray@localhost sql-parser]$ make
Compiling...
bison_parser.y:71.20-26: error: syntax error, unexpected {...}
make[2]: *** [bison_parser.cpp] Error 1
make[1]: *** [parser/bison_parser.cpp] Error 2
make: *** [test] Error 1

Segment fault with example while using |create table statement|

I built the code on Mac and tried executing the example with below statement

./example "CREATE TABLE Persons (PersonID int);"

And this results in a SEGFAULT. So I tried to debug and got the following stack

$ lldb example
(lldb) target create "example"
Current executable set to 'example' (x86_64).
(lldb) r "CREATE TABLE Persons (PersonID int);"
Process 45712 launched: '/Users/user/workspace/sql-parser/example/example' (x86_64)
Parsed successfully!
Number of statements: 1
CreateStatment
	Persons
Process 45712 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x00007fffdae6ab52 libsystem_c.dylib`strlen + 18
libsystem_c.dylib`strlen:
->  0x7fffdae6ab52 <+18>: pcmpeqb (%rdi), %xmm0
    0x7fffdae6ab56 <+22>: pmovmskb %xmm0, %esi
    0x7fffdae6ab5a <+26>: andq   $0xf, %rcx
    0x7fffdae6ab5e <+30>: orq    $-0x1, %rax
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00007fffdae6ab52 libsystem_c.dylib`strlen + 18
    frame #1: 0x0000000100092ccd libsqlparser.so`hsql::inprint(char const*, unsigned long) + 93
    frame #2: 0x0000000100000c19 example`main + 489
    frame #3: 0x00007fffdae34235 libdyld.dylib`start + 1
    frame #4: 0x00007fffdae34235 libdyld.dylib`start + 1

why the result is different in the whereclause between the select with count(*) and select *

[root@thomas-dev sql-parser-master]# ./example/example "select count(*) from t1 where doctor_dept_id = '0095' and doctor_name = 'jack'"
Parsed successfully!
Number of statements: 1
SelectStatement
Fields:
count
[root@thomas-dev sql-parser-master]#
[root@thomas-dev sql-parser-master]# ./example/example "select * from t1 where doctor_dept_id = '0095' and doctor_name = 'jack'"
Parsed successfully!
Number of statements: 1
SelectStatement
Fields:
*
Sources:
t1
Search Conditions:
AND
9
doctor_dept_id
0095
9
doctor_name
jack

Improve memory management

Currently the parse tree contains a large number of raw pointers to maintain the nodes. This means that we always need to explicitly delete all children upon deletion of a node. While we do have an automatic memory leak checker, the risk of memory leaks is still there and handling memory this way is cumbersome.

Possible improvements:

  • Use std::string instead of char*
  • Use smart pointers instead of raw pointers

We should measure the performance impact of using smart pointers. In general, I think the advantages of smart memory management will outweigh possible performance drawbacks.

Related Blog Post: Feeding a Bison with tasty C++11 grAST!

Clarify licsense.

Please make a statement with regard to the license this is released under.

Parse error in multiple table join

This is my query

select * from events join sensors on events.sid = sensors.sid join entities on entities.eid = sensors.eid where entities.name = "Ball 1" and events.v < 1000000

and here is the error message:
syntax error, unexpected JOIN, expecting $end (L0:62)

The library still performs well with the simple joining of two table.

Aliases starting with a digit are not supported

Aliases starting with a digit are evaluated as invalid.

Example 1:

SELECT x AS "2ndColumn"
FROM table1

Example 2 (TPC-DS query 50, netezza dialect):

select  
   s_store_name
  ,s_company_id
  ,s_street_number
  ,s_street_name
  ,s_street_type
  ,s_suite_number
  ,s_city
  ,s_county
  ,s_state
  ,s_zip
  ,sum(case when (sr_returned_date_sk - ss_sold_date_sk <= 30 ) then 1 else 0 end)  as "30 days" 
  ,sum(case when (sr_returned_date_sk - ss_sold_date_sk > 30) and 
                 (sr_returned_date_sk - ss_sold_date_sk <= 60) then 1 else 0 end )  as "31-60 days" 
  ,sum(case when (sr_returned_date_sk - ss_sold_date_sk > 60) and 
                 (sr_returned_date_sk - ss_sold_date_sk <= 90) then 1 else 0 end)  as "61-90 days" 
  ,sum(case when (sr_returned_date_sk - ss_sold_date_sk > 90) and
                 (sr_returned_date_sk - ss_sold_date_sk <= 120) then 1 else 0 end)  as "91-120 days" 
  ,sum(case when (sr_returned_date_sk - ss_sold_date_sk  > 120) then 1 else 0 end)  as ">120 days" 
from
   store_sales
  ,store_returns
  ,store
  ,date_dim d1
  ,date_dim d2
where
    d2.d_year = 2001
and d2.d_moy  = 8
and ss_ticket_number = sr_ticket_number
and ss_item_sk = sr_item_sk
and ss_sold_date_sk   = d1.d_date_sk
and sr_returned_date_sk   = d2.d_date_sk
and ss_customer_sk = sr_customer_sk
and ss_store_sk = s_store_sk
group by
   s_store_name
  ,s_company_id
  ,s_street_number
  ,s_street_name
  ,s_street_type
  ,s_suite_number
  ,s_city
  ,s_county
  ,s_state
  ,s_zip
order by s_store_name
        ,s_company_id
        ,s_street_number
        ,s_street_name
        ,s_street_type
        ,s_suite_number
        ,s_city
        ,s_county
        ,s_state
        ,s_zip
limit 100;

Unable to make after renaming the 'sql-parser' folder

I forked your repository and use it as a submodule.
Before I ran make I rename the folder to 'sqlParser' and then run the make command.
Results:

[user@localhost sqlParser]$ make
make -C src/parser/
make[1]: Entering directory `/home/user/general/core/external/sqlParser/src/parser'
bison bison_parser.y -v
bison_parser.y:73.20-26: error: syntax error, unexpected {...}
make[1]: *** [bison_parser.cpp] Error 1
make[1]: Leaving directory `/home/user/general/core/external/sqlParser/src/parser'
make: *** [parser] Error 2
[user@localhost sqlParser]$ 

how to support the sql syntax ?

the sql is following:

select * from (
SELECT o.OBJECT_NAME, o.OBJECT_ID ,'' short_name, decode(bitand(t.property, 32), 32, 'YES', 'NO') partitioned,
decode(bitand(t.property, 64), 64, 'IOT',
decode(bitand(t.property, 512), 512, 'IOT_OVERFLOW',
decode(bitand(t.flags, 536870912), 536870912, 'IOT_MAPPING', null))) iot_type,
o.OWNER OBJECT_OWNER, o.CREATED, o.LAST_DDL_TIME, O.GENERATED, O.TEMPORARY, case when xt.obj# is null then 'N' else 'Y' end EXTERNAL
FROM SYS.Dba_OBJECTS O ,sys.tab$ t, sys.external_tab$ xt
WHERE O.OWNER = :SCHEMA
and o.object_id = t.obj#(+)
and o.object_id = xt.obj#(+)
AND O.OBJECT_TYPE = 'TABLE'
union all
SELECT OBJECT_NAME, OBJECT_ID , syn.SYNONYM_NAME short_NAME, decode(bitand(t.property, 32), 32, 'YES', 'NO') partitioned,
decode(bitand(t.property, 64), 64, 'IOT',
decode(bitand(t.property, 512), 512, 'IOT_OVERFLOW',
decode(bitand(t.flags, 536870912), 536870912, 'IOT_MAPPING', null))) iot_type,
SYN.TABLE_OWNER OBJECT_OWNER, o.CREATED, o.LAST_DDL_TIME, O.GENERATED, O.TEMPORARY, case when xt.obj# is null then 'N' else 'Y' end EXTERNAL
FROM SYS.Dba_OBJECTS O, sys.user_synonyms syn,sys.tab$ t, sys.external_tab$ xt
WHERE syn.table_owner = o.owner
and syn.TABLE_NAME = o.object_NAME
and o.object_id = t.obj#
and o.object_id = xt.obj#(+)
and o.object_type = 'TABLE'
and :INCLUDE_SYNS = 1
)
WHERE OBJECT_NAME NOT IN (SELECT OBJECT_NAME FROM RECYCLEBIN)
AND not object_name like 'BIN$%'

when use the test :sql-parser-master/example "xxx", the xxx is the up sql , then run result is following

Given string is not a valid SQL query.
syntax error, unexpected $undefined (L0:507)

the current sql-parser-master can't parse the paramter field, the specail character as above "t.obj#", “xt.obj#(+)” ,"tab.$"

Table Alias Ambiguity

This statement is currently ambiguous:

SELECT * FROM table AS foo

Is the foo referring to "table" or to the Select query?
Find out how to resolve this...

UTF-8 support

It is not possible to pass non ascii identifiers right now. It would be great to support utf-8 queries.

Flex can process utf-8 characters with regex like this

%option 8bit

([A-Za-z0-9_]|[\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]|[\xf0-\xf4][\x80-\xbf][\x80-\xbf][\x80-\xbf])* {
    yylval->sval = strdup(yytext);
    return SQL_IDENTIFIER;
}

License mismatch

sql-parser/LICENSE currently states at the top:

This computer program can be used by anyone in accordance with the OpenSource
licence of the Hasso-Plattner Institute. Its terms of use can be downloaded
under the URL https://github.com/hyrise/hyrise/blob/master/LICENSE

and then goes on to spell out the terms of the Hasso-Plattner OpenSource license. However, the referenced hyrise/LICENSE is the standard MIT license.

Could you please clarify which license - Hasso-Plattner OpenSource or MIT - currently applies to sql-parser? Thanks!

SQL syntax caching

Does this parser (or flex/bison) support caching syntax? I realize it very take time when parsing the same syntax with different argument several times. It would be better if we cache them.

P/s: maybe the link about your paper already died(http://torpedro.com/paper/HyriseSQL-03-2015.pdf), it responsed code 404. Could you update it pls?

multiple UNION support

TPC-DS queries contain multiple, not nested UNION operations.

Example: TPC-DS query 71, netezza dialect:

select i_brand_id brand_id, i_brand brand,t_hour,t_minute,
 	sum(ext_price) ext_price
 from item, (select ws_ext_sales_price as ext_price, 
                        ws_sold_date_sk as sold_date_sk,
                        ws_item_sk as sold_item_sk,
                        ws_sold_time_sk as time_sk  
                 from web_sales,date_dim
                 where d_date_sk = ws_sold_date_sk
                   and d_moy=12
                   and d_year=2002
                 union all
                 select cs_ext_sales_price as ext_price,
                        cs_sold_date_sk as sold_date_sk,
                        cs_item_sk as sold_item_sk,
                        cs_sold_time_sk as time_sk
                 from catalog_sales,date_dim
                 where d_date_sk = cs_sold_date_sk
                   and d_moy=12
                   and d_year=2002
                 union all
                 select ss_ext_sales_price as ext_price,
                        ss_sold_date_sk as sold_date_sk,
                        ss_item_sk as sold_item_sk,
                        ss_sold_time_sk as time_sk
                 from store_sales,date_dim
                 where d_date_sk = ss_sold_date_sk
                   and d_moy=12
                   and d_year=2002
                 ) tmp,time_dim
 where
   sold_item_sk = i_item_sk
   and i_manager_id=1
   and time_sk = t_time_sk
   and (t_meal_time = 'breakfast' or t_meal_time = 'dinner')
 group by i_brand, i_brand_id,t_hour,t_minute
 order by ext_price desc, i_brand_id
 ;

DB support list

Could you please help to list what databases are supported in this parser?
Oracle/Mysql/DB2/etc.

Run benchmark error

make benchmark

g++ -std=c++11 -Wall -Isrc/ -L./ -O3 benchmark/parser_benchmark.cpp benchmark/benchmark.cpp benchmark/queries.cpp benchmark/benchmark_utils.cpp -o bin/benchmark -lbenchmark -lpthread -lsqlparser -lstdc++ -lstdc++fs
benchmark/benchmark.cpp: In function ‘int main(int, char**)’:
benchmark/benchmark.cpp:11:82: error: no matching function for call to ‘RegisterBenchmark(const char*, void ()(benchmark::State&, const string&), const std::basic_string&)’
benchmark::RegisterBenchmark(p_name.c_str(), &BM_ParseBenchmark, query.second);
^
benchmark/benchmark.cpp:11:82: note: candidates are:
In file included from benchmark/benchmark.cpp:1:0:
/usr/local/include/benchmark/benchmark.h:958:29: note: benchmark::internal::Benchmark
benchmark::RegisterBenchmark(const char*, void ()(benchmark::State&))
inline internal::Benchmark
RegisterBenchmark(const char* name,
^
/usr/local/include/benchmark/benchmark.h:958:29: note: candidate expects 2 arguments, 3 provided
/usr/local/include/benchmark/benchmark.h:966:22: note: template benchmark::internal::Benchmark* benchmark::RegisterBenchmark(const char*, Lambda&&)
internal::Benchmark* RegisterBenchmark(const char* name, Lambda&& fn) {
^
/usr/local/include/benchmark/benchmark.h:966:22: note: template argument deduction/substitution failed:
benchmark/benchmark.cpp:11:82: note: candidate expects 2 arguments, 3 provided
benchmark::RegisterBenchmark(p_name.c_str(), &BM_ParseBenchmark, query.second);
^
benchmark/benchmark.cpp:13:85: error: no matching function for call to ‘RegisterBenchmark(const char*, void ()(benchmark::State&, const string&), const std::basic_string&)’
benchmark::RegisterBenchmark(t_name.c_str(), &BM_TokenizeBenchmark, query.second);
^
benchmark/benchmark.cpp:13:85: note: candidates are:
In file included from benchmark/benchmark.cpp:1:0:
/usr/local/include/benchmark/benchmark.h:958:29: note: benchmark::internal::Benchmark
benchmark::RegisterBenchmark(const char*, void ()(benchmark::State&))
inline internal::Benchmark
RegisterBenchmark(const char* name,
^
/usr/local/include/benchmark/benchmark.h:958:29: note: candidate expects 2 arguments, 3 provided
/usr/local/include/benchmark/benchmark.h:966:22: note: template benchmark::internal::Benchmark* benchmark::RegisterBenchmark(const char*, Lambda&&)
internal::Benchmark* RegisterBenchmark(const char* name, Lambda&& fn) {
^
/usr/local/include/benchmark/benchmark.h:966:22: note: template argument deduction/substitution failed:
benchmark/benchmark.cpp:13:85: note: candidate expects 2 arguments, 3 provided
benchmark::RegisterBenchmark(t_name.c_str(), &BM_TokenizeBenchmark, query.second);
^
benchmark/benchmark.cpp:20:82: error: no matching function for call to ‘RegisterBenchmark(const char*, void ()(benchmark::State&, const string&), const std::basic_string&)’
benchmark::RegisterBenchmark(p_name.c_str(), &BM_ParseBenchmark, query.second);
^
benchmark/benchmark.cpp:20:82: note: candidates are:
In file included from benchmark/benchmark.cpp:1:0:
/usr/local/include/benchmark/benchmark.h:958:29: note: benchmark::internal::Benchmark
benchmark::RegisterBenchmark(const char*, void ()(benchmark::State&))
inline internal::Benchmark
RegisterBenchmark(const char* name,
^
/usr/local/include/benchmark/benchmark.h:958:29: note: candidate expects 2 arguments, 3 provided
/usr/local/include/benchmark/benchmark.h:966:22: note: template benchmark::internal::Benchmark* benchmark::RegisterBenchmark(const char*, Lambda&&)
internal::Benchmark* RegisterBenchmark(const char* name, Lambda&& fn) {
^
/usr/local/include/benchmark/benchmark.h:966:22: note: template argument deduction/substitution failed:
benchmark/benchmark.cpp:20:82: note: candidate expects 2 arguments, 3 provided
benchmark::RegisterBenchmark(p_name.c_str(), &BM_ParseBenchmark, query.second);
^
benchmark/benchmark.cpp:23:85: error: no matching function for call to ‘RegisterBenchmark(const char*, void ()(benchmark::State&, const string&), const std::basic_string&)’
benchmark::RegisterBenchmark(t_name.c_str(), &BM_TokenizeBenchmark, query.second);
^
benchmark/benchmark.cpp:23:85: note: candidates are:
In file included from benchmark/benchmark.cpp:1:0:
/usr/local/include/benchmark/benchmark.h:958:29: note: benchmark::internal::Benchmark
benchmark::RegisterBenchmark(const char*, void ()(benchmark::State&))
inline internal::Benchmark
RegisterBenchmark(const char* name,
^
/usr/local/include/benchmark/benchmark.h:958:29: note: candidate expects 2 arguments, 3 provided
/usr/local/include/benchmark/benchmark.h:966:22: note: template benchmark::internal::Benchmark* benchmark::RegisterBenchmark(const char*, Lambda&&)
internal::Benchmark* RegisterBenchmark(const char* name, Lambda&& fn) {
^
/usr/local/include/benchmark/benchmark.h:966:22: note: template argument deduction/substitution failed:
benchmark/benchmark.cpp:23:85: note: candidate expects 2 arguments, 3 provided
benchmark::RegisterBenchmark(t_name.c_str(), &BM_TokenizeBenchmark, query.second);
^
benchmark/queries.cpp:3:35: fatal error: experimental/filesystem: No such file or directory
#include <experimental/filesystem>
^
compilation terminated.
make: *** [bin/benchmark] Error 1

OVER (PARTITION BY …) support

TPC-DS queries contain the OVER() clause. Using this clause, PARTITION BY is often used additionally.

Example: TPC-DS query 12, natezza dialect:

select  i_item_id
      ,i_item_desc 
      ,i_category 
      ,i_class 
      ,i_current_price
      ,sum(ws_ext_sales_price) as itemrevenue 
      ,sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price)) over
          (partition by i_class) as revenueratio
from	
	web_sales
    	,item 
    	,date_dim
where 
	ws_item_sk = i_item_sk 
  	and i_category in ('Men', 'Books', 'Electronics')
  	and ws_sold_date_sk = d_date_sk
	and d_date between cast('2001-06-15' as date) 
				and (cast('2001-06-15' as date) + 30 days)
group by 
	i_item_id
        ,i_item_desc 
        ,i_category
        ,i_class
        ,i_current_price
order by 
	i_category
        ,i_class
        ,i_item_id
        ,i_item_desc
        ,revenueratio
limit 100;

Arbitrary text allowed behind SQL query

(debug)> SELECT * FROM t ICANDOMAGICHERE;
=== Columns
|       a|       b|
|     int|     int|
=== Chunk 0 ===
|   12345|       1|
|     123|       2|
|    1234|       3|
===
3 rows total (COMPILE: 13 µs, EXECUTE: 207 µs (wall time))
(debug)> SELECT * FROM t WHERE a=b ICANDOMAGICHERE;
SQL query not valid.
SQL query:
==========
SELECT * FROM t WHERE a=b ICANDOMAGICHERE;
                          ^=== ERROR HERE!
==========
Error line: 0
Error column: 26
Error message: syntax error, unexpected IDENTIFIER, expecting $end

Multiple "WHEN" conditions not supported in a "CASE" statement

The following statement has two "WHEN" conditions and an error ("syntax error, unexpected WHEN") is returned when trying to parse it by calling SQLParser::parseSQLString:

SELECT (CASE
WHEN ("Calcs"."int0" = "NULL") OR ("Calcs"."int1" = "NULL") THEN "NULL"
WHEN "Calcs"."int0" > "Calcs"."int1" THEN "Calcs"."int0"
ELSE "Calcs"."int1" END) AS "TEMP_Test__1523549003__0_"
FROM "Calcs"

If the statement is changed to only contain one "WHEN" condition, then the parsing succeeds:
SELECT (CASE
WHEN ("Calcs"."int0" = "NULL") OR ("Calcs"."int1" = "NULL") THEN "NULL"
ELSE "Calcs"."int1" END) AS "TEMP_Test__1523549003__0_"
FROM "Calcs"

Unable to use in Windows

I tried to use the parser in Windows with Visual studio 2012. I was finally somehow succeeded in compiling the basic project. But the flex_lexer.cpp file throws the fatal error that the limit of nested conditions exceeded. i.e. "flex_lexer.l(215): fatal error C1061: compiler limit : blocks nested too deeply"
Can you please help me out here what i am missing and how can i make this thing to work.

why i run the make in the src/parser , fail ?

/home/cxx/sql-parser-master/src/parser
[root@thomas-dev parser]# make clean;make
rm -f bison_parser.cpp flex_lexer.cpp bison_parser.h flex_lexer.h *.output
bison (GNU Bison) 2.4.1
bison bison_parser.y --output=bison_parser.cpp --defines=bison_parser.h --verbose
bison_parser.y:63.18-21: syntax error, unexpected identifier
make: *** [bison_parser.cpp] 错误 1
[root@thomas-dev parser]# pwd
/home/cxx/sql-parser-master/src/parser

CAST support

TPC-DS queries (5, 12, 18, 21, ...) contain the CAST function which is currently not supported by the parser.

TPC-DS query 12, natezza dialect:

select  i_item_id
      ,i_item_desc 
      ,i_category 
      ,i_class 
      ,i_current_price
      ,sum(ws_ext_sales_price) as itemrevenue 
      ,sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price)) over
          (partition by i_class) as revenueratio
from	
	web_sales
    	,item 
    	,date_dim
where 
	ws_item_sk = i_item_sk 
  	and i_category in ('Men', 'Books', 'Electronics')
  	and ws_sold_date_sk = d_date_sk
	and d_date between cast('2001-06-15' as date) 
				and (cast('2001-06-15' as date) + 30 days)
group by 
	i_item_id
        ,i_item_desc 
        ,i_category
        ,i_class
        ,i_current_price
order by 
	i_category
        ,i_class
        ,i_item_id
        ,i_item_desc
        ,revenueratio
limit 100;

MSVC build

It seems that it is possible to build sql-parser with msvc (I built it with msvc 2015), however there is no <unistd.h> on Windows.

Properly test the print methods

As seen in #111, the print* methods are entirely untested. Since we want this to be useful for people outside of Hyrise, too, we should make sure that we add tests for this (and any other untested parts).

Dose "SQLStatementTransformer" have been implemented or not?

We read the paper provided in README.md and want to follow this work. However, I fail to find the SQLStatementTransformer and SQLQueryTask (in codes) which are described in your paper. So I wonder if these modulars have not been implemented?

Use Makefile Generation Script

Use one of the following:

  • Cmake
  • automake
  • premake

Update 09 Dec 2019: cmake seems to have won the race in open-source software. Let's stick to that.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.