The dataframe_sql from zbrookle

Issue in join queries

Hello,

I tried to execute the three queries below using dataframe_sql, but only the first and third work. Ideally all three should work, since they are valid SQL. The error reported is: "Joined tables have overlapping names: ['pid', 'pdh', 'enddate', 'dep_head', 'serialNo']".

SELECT table1.pdh as pdh
FROM table1 as t1
inner join table2 as t2 on (t2.serialNo=t1.serialNo and t2.pdh=t1.pdh)

SELECT table1.pdh as pdh
FROM table1 as t1
inner join table2 as t2 on (t2.serialNo=t1.serialNo and t2.pdh=t1.pdh)
where s2 < '100'

SELECT *
FROM table1 as t1
inner join table2 as t2 on (t2.serialNo=t1.serialNo and t2.pdh=t1.pdh)
where s2 < '100'

Add support for delete and update statements

ENH: Add a query optimizer or factor in query optimization techniques

BUG: Look into issues with multiple joins

Cannot use < (less than) or > (greater than) in a CASE condition

mergequery = """
select  distinct * 
from mergeTable
where 
case when Status='52wkHigh' then strike < lastPrice
else strike > lastPrice end 
"""
rr = query(mergequery)

I get the error below:

module 'sqlalchemy' has no attribute 'Binary'

I'm trying to run a simple query.
I load my pandas DataFrame (600.000 rows) - OK
Then:
q = '''SELECT id_data FROM pd LIMIT 10;'''
dataframe_sql.register_temp_table(pdDataFrame, "my_table")
dataframe_sql.query(q)

And it fails:
AttributeError: module 'sqlalchemy' has no attribute 'Binary'

I'm not sure, but I found this:
https://stackoverflow.com/questions/65851741/attributeerror-module-sqlalchemy-has-no-attribute-binary

Am I making a mistake, or is the issue my SQLAlchemy version?

Package Version

astroid 2.4.2
colorama 0.4.4
dataframe-sql 0.4.0
greenlet 1.1.0
ibis-framework 1.4.0
isort 5.7.0
lark-parser 0.8.5
lazy-object-proxy 1.4.3
mccabe 0.6.1
multipledispatch 0.6.0
numpy 1.20.1
pandas 1.2.3
pandasql 0.7.3
pip 21.1.3
psycopg2 2.8.6
pylint 2.6.0
pyModbusTCP 0.1.8
pyserial 3.5
python-dateutil 2.8.1
pytz 2021.1
regex 2021.7.6
setuptools 49.2.1
six 1.15.0
sql-to-ibis 0.4.0
SQLAlchemy 1.4.21
tabulate 0.8.7
toml 0.10.2
toolz 0.11.1
uModbus 1.0.4
urllib3 1.26.3
wrapt 1.12.1

How to query a data frame with spaces in the column names and/or numbers as column names

Hi,

I love using this package to write SQL queries on pandas data frames. However, several of my data sets have spaces in the column names, and in some cases the column names are just numbers, such as a year. Could you please help me address this? Also, please point me to any documentation you have.
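
One possible workaround, independent of how dataframe_sql handles quoting, is to rename the columns to SQL-friendly identifiers before registering the table. The helper below is a hypothetical sketch (sql_safe_names is not part of dataframe_sql): it replaces runs of non-word characters with underscores and prefixes names that start with a digit.

```python
import re


def sql_safe_names(columns):
    """Map each original column label to a SQL-friendly identifier.

    Hypothetical helper for illustration; not part of dataframe_sql.
    """
    mapping = {}
    used = set()
    for col in columns:
        # Replace every run of non-word characters (spaces, dashes, ...) with "_".
        name = re.sub(r"\W+", "_", str(col)).strip("_")
        # Identifiers should not be empty or start with a digit.
        if not name or name[0].isdigit():
            name = "col_" + name
        # Keep the generated names unique.
        base, n = name, 1
        while name in used:
            n += 1
            name = f"{base}_{n}"
        used.add(name)
        mapping[col] = name
    return mapping


mapping = sql_safe_names(["unit price", 2019, "2020", "region"])
print(mapping)
```

With pandas, the mapping can be applied as df.rename(columns=sql_safe_names(df.columns)) before calling register_temp_table, and the queries then refer to the new names.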

Support for Py3.6

Hi Zach,
Thanks for this amazing tool!

Is it possible for this package to work in Python 3.6? If yes, are there any plans to support it?

AttributeError: module 'sqlalchemy' has no attribute 'Binary'

Currently we can't use this repository on Kaggle or Colab because of "AttributeError: module 'sqlalchemy' has no attribute 'Binary'". In the ibis repository, this issue seems to have been fixed a year ago. Could you provide a minimal environment setup example?

How to use Count(distinct ID)

Dear zbrookle,
thank you for this useful package.

I'm currently trying to find duplicates and want to apply a distinct count.
However, count(distinct ID_Nr) does not seem to work.

query("""SELECT COUNT(DISTINCT ID_NR) FROM data""")

Do you have any suggestions on how I can do this with your package?

Thank you,
Best,
Minh
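
Until COUNT(DISTINCT ...) is supported, one possible workaround is to compute the distinct count in plain pandas. This is a sketch with made-up sample data; the column name ID_NR is taken from the query above.

```python
import pandas as pd

# Made-up sample data; only the column name comes from the report above.
data = pd.DataFrame({"ID_NR": [101, 102, 102, 103, 103, 103]})

# Equivalent of SELECT COUNT(DISTINCT ID_NR) FROM data.
# Like SQL, nunique() ignores missing values by default.
distinct_count = data["ID_NR"].nunique()

# All rows whose ID_NR occurs more than once.
duplicates = data[data.duplicated("ID_NR", keep=False)]

print(distinct_count)
print(duplicates)
```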

ENH: Add versioning support with versioneer

AttributeError: module 'sqlalchemy' has no attribute 'Binary'

Version : dataframe-sql-0.4.0

Error when trying to import the package.

import dataframe_sql

Full trace:

Python 3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import dataframe_sql
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\siroh\PycharmProjects\anywhereData\venv\lib\site-packages\dataframe_sql\__init__.py", line 2, in <module>
    from dataframe_sql.sql_select_query import query, register_temp_table, remove_temp_table
  File "C:\Users\siroh\PycharmProjects\anywhereData\venv\lib\site-packages\dataframe_sql\sql_select_query.py", line 4, in <module>
    import ibis
  File "C:\Users\siroh\PycharmProjects\anywhereData\venv\lib\site-packages\ibis\__init__.py", line 36, in <module>
    from ibis.backends import sqlite  # noqa: F401
  File "C:\Users\siroh\PycharmProjects\anywhereData\venv\lib\site-packages\ibis\backends\sqlite\__init__.py", line 16, in <module>
    from .client import SQLiteClient
  File "C:\Users\siroh\PycharmProjects\anywhereData\venv\lib\site-packages\ibis\backends\sqlite\client.py", line 11, in <module>
    import ibis.backends.base_sqlalchemy.alchemy as alch
  File "C:\Users\siroh\PycharmProjects\anywhereData\venv\lib\site-packages\ibis\backends\base_sqlalchemy\alchemy.py", line 47, in <module>
    dt.Binary: sa.Binary,
AttributeError: module 'sqlalchemy' has no attribute 'Binary'

Binary has been removed from the sqlalchemy package:

https://docs.sqlalchemy.org/en/14/changelog/changelog_14.html

Can you please address this issue?
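
A quick way to tell whether an environment is affected is to compare the installed SQLAlchemy version against 1.4, the release whose changelog is linked above. The helper below is a hypothetical sketch (has_legacy_binary is not part of any library) that only parses a version string.

```python
import re


def has_legacy_binary(sqlalchemy_version: str) -> bool:
    """Return True if this SQLAlchemy version predates 1.4.

    Assumption: sqlalchemy.Binary is still available before 1.4, as the
    linked changelog suggests. Hypothetical helper for illustration.
    """
    match = re.match(r"(\d+)\.(\d+)", sqlalchemy_version)
    if match is None:
        raise ValueError(f"unrecognized version: {sqlalchemy_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < (1, 4)


print(has_legacy_binary("1.3.24"))
print(has_legacy_binary("1.4.21"))
```

On Python 3.8+, the installed version can be read with importlib.metadata.version("SQLAlchemy"). If the check returns False, pinning with pip install "SQLAlchemy<1.4" may work around the import error, although that has not been verified for every environment.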

pandas dependency

Hi! Thanks for making this very cool package.

I just noticed that install_requires has a hard dependency, "pandas == 1.0.1". Can it be relaxed to support 1.0.x?

Thanks again!
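
For illustration, a relaxed pin could use the compatible-release operator, which for "~=1.0.1" means ">=1.0.1, ==1.0.*". The fragment below is a hypothetical sketch, not the project's actual setup.py; the variable name is made up.

```python
# Hypothetical setup.py fragment; the project's real dependency list may differ.
INSTALL_REQUIRES = [
    # Accepts 1.0.1, 1.0.2, ... but not 1.1.0.
    "pandas~=1.0.1",
]

print(INSTALL_REQUIRES)
```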

Three-table joins are not supported

query("""select count(df1.id) as n from df1 left join df2 on df2.id = df1.id left join df on df.id=df1.id""")

error log:

VisitError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\sql_to_ibis\sql_select_query.py in parse_sql(self)
136 # ambiguous column references are not distorted
--> 137 ).transform(tree)
138 except UnexpectedToken as err:

~\Anaconda3\lib\site-packages\lark\visitors.py in transform(self, tree)
104 def transform(self, tree):
--> 105 return self._transform_tree(tree)
106

~\Anaconda3\lib\site-packages\lark\visitors.py in _transform_tree(self, tree)
100 def _transform_tree(self, tree):
--> 101 children = list(self._transform_children(tree.children))
102 return self._call_userfunc(tree, children)

~\Anaconda3\lib\site-packages\lark\visitors.py in _transform_children(self, children)
91 if isinstance(c, Tree):
---> 92 yield self._transform_tree(c)
93 elif self.visit_tokens and isinstance(c, Token):

~\Anaconda3\lib\site-packages\lark\visitors.py in _transform_tree(self, tree)
100 def _transform_tree(self, tree):
--> 101 children = list(self._transform_children(tree.children))
102 return self._call_userfunc(tree, children)

~\Anaconda3\lib\site-packages\lark\visitors.py in _transform_children(self, children)
91 if isinstance(c, Tree):
---> 92 yield self._transform_tree(c)
93 elif self.visit_tokens and isinstance(c, Token):

~\Anaconda3\lib\site-packages\lark\visitors.py in _transform_tree(self, tree)
100 def _transform_tree(self, tree):
--> 101 children = list(self._transform_children(tree.children))
102 return self._call_userfunc(tree, children)

~\Anaconda3\lib\site-packages\lark\visitors.py in _transform_children(self, children)
91 if isinstance(c, Tree):
---> 92 yield self._transform_tree(c)
93 elif self.visit_tokens and isinstance(c, Token):

~\Anaconda3\lib\site-packages\lark\visitors.py in _transform_tree(self, tree)
101 children = list(self._transform_children(tree.children))
--> 102 return self._call_userfunc(tree, children)
103

~\Anaconda3\lib\site-packages\lark\visitors.py in _call_userfunc(self, tree, new_children)
71 except Exception as e:
---> 72 raise VisitError(tree.data, tree, e)
73

VisitError: Error trying to process rule "select":

'Join' object has no attribute 'name'

During handling of the above exception, another exception occurred:

AttributeError Traceback (most recent call last)
in
----> 1 query("""select count(df1.id) as n from df1 left join df2 on df2.id = df1.id left join df on df.id=df1.id""")

~\Anaconda3\lib\site-packages\dataframe_sql\sql_select_query.py in query(sql)
94
95 """
---> 96 return ibis_query(sql).execute()

~\Anaconda3\lib\site-packages\sql_to_ibis\sql_select_query.py in query(sql)
111
112 """
--> 113 return SqlToTable(sql).ibis_expr
114
115

~\Anaconda3\lib\site-packages\sql_to_ibis\sql_select_query.py in init(self, sql)
120 self.sql = sql
121
--> 122 self.ast = self.parse_sql()
123 self.ibis_expr = self.ast
124

~\Anaconda3\lib\site-packages\sql_to_ibis\sql_select_query.py in parse_sql(self)
150 else:
151 break
--> 152 raise curr_err
153
154

~\Anaconda3\lib\site-packages\lark\visitors.py in _call_userfunc(self, tree, new_children)
64 wrapper = getattr(f, 'visit_wrapper', None)
65 if wrapper is not None:
---> 66 return f.visit_wrapper(f, tree.data, children, tree.meta)
67 else:
68 return f(children)

~\Anaconda3\lib\site-packages\lark\visitors.py in _vargs_inline(f, data, children, meta)
311
312 def _vargs_inline(f, data, children, meta):
--> 313 return f(*children)
314 def _vargs_meta_inline(f, data, children, meta):
315 return f(meta, *children)

~\Anaconda3\lib\site-packages\lark\visitors.py in f(self, *args, **kwargs)
295 if with_self:
296 def f(self, *args, **kwargs):
--> 297 return _f(self, *args, **kwargs)
298 else:
299 def f(self, *args, **kwargs):

~\Anaconda3\lib\site-packages\sql_to_ibis\parsing\sql_parser.py in select(self, *select_expressions)
356 self._column_to_table_name,
357 self._table_name_map,
--> 358 self._alias_registry,
359 )
360

~\Anaconda3\lib\site-packages\sql_to_ibis\parsing\transformers.py in init(self, tables, table_map, column_name_map, column_to_table_name, table_name_map, alias_registry)
119 )
120 self._tables = tables
--> 121 self._table_names_list = [table.name for table in tables]
122 self._column_to_table_name = column_to_table_name.copy() # This must be
123 # copied because when ambiguity is resolved in the following method,

~\Anaconda3\lib\site-packages\sql_to_ibis\parsing\transformers.py in (.0)
119 )
120 self._tables = tables
--> 121 self._table_names_list = [table.name for table in tables]
122 self._column_to_table_name = column_to_table_name.copy() # This must be
123 # copied because when ambiguity is resolved in the following method,

AttributeError: 'Join' object has no attribute 'name'

ENH: Add window function support

Related to discussion in yhat/pandasql#63

BUG: Not Standard/Expected SQL Syntax

Appreciate your work.

Issue

I am using dataframe_sql to perform some simple queries. However, I found that some of the query syntax dataframe_sql implements does not follow standard SQL.

Reproduce

Input

from pandas import DataFrame
from dataframe_sql import register_temp_table, query

demo_df = DataFrame({'A':[1,2,3], 'B':[4,5,6]})
register_temp_table(demo_df, "demo_df")
query("""SELECT A FROM demo_df WHERE B > 4 LIMIT 1""")

Output

KeyError                                  Traceback (most recent call last)
~\Miniconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2645             try:
-> 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'B'

Expected Output

+----+
| A  | 
+----+
| 2  | 
+----+

Possible Solution

Input

# NOTE: Change the Query
query("""SELECT A,B FROM demo_df WHERE B > 4 LIMIT 1""")

Output

+----+----+
| A  | B  | 
+----+----+
| 2  | 5  | 
+----+----+

Question

Although I can use the new query shown above, I get additional column(s) in the result and have to rewrite some of my queries, because the syntax differs slightly from standard SQL.
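
As a stopgap that avoids the extra column, the same filter and projection can be written directly in pandas. This is only a sketch of the equivalent pandas operation on the demo frame above, not a fix for the query itself.

```python
import pandas as pd

demo_df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Equivalent of: SELECT A FROM demo_df WHERE B > 4 LIMIT 1
# Filter on B first, keep only column A, then take the first row.
result = demo_df.loc[demo_df["B"] > 4, ["A"]].head(1)

print(result)
```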

Conda distribution is required

As of now, there is no conda distribution for this package; it is only available through pip. Our operations support team can only use conda distributions because of limited access privileges on our shared node, so a conda package would be much appreciated.

ENH: Add support for azure pipeline

DOC: Write a syntax guide for the SQL

ENH: Add support for dask and rapids

ENH: Create a release pipeline

error - [where variable is not null], [where variable != '']

Hi, thank you for the great package!
Could I ask if there is any way to achieve checks for empty/null fields?

Example:
"""SELECT variable FROM table WHERE variable != '' LIMIT 100"""

Returns:

Invalid query!
Expected one of the following input(s): {'__ANON_5'}
Unexpected input at line 3, column 56
lding_building_name_building_number != ''

How to include count(*)

Select count(*) from my_table
This doesn't work

AttributeError: module 'sqlalchemy' has no attribute 'Binary'

I was trying to run a SQL query on my pandas DataFrame, but got the following error:
AttributeError: module 'sqlalchemy' has no attribute 'Binary'
It looks like a bug.

import dataframe_sql as df_sql

df_sql.register_temp_table(df, "my_table")
df_sql.query("""select count() from my_table""")

My DataFrame

df.dtypes

LocationID int32
DateCreated_UTC datetime64[ns]
Latitude float32
Longitude float32
ServiceStatusID int8
year int64
month int64
dtype: object

Full trace:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-9-03f721a36d45> in <module>
----> 1 import dataframe_sql as df_sql
      2 
      3 df_sql.register_temp_table(df, "my_table")
      4 
      5 df_sql.query("""select count() from my_table""")

/opt/conda/lib/python3.7/site-packages/dataframe_sql/__init__.py in <module>
      1 # flake8: noqa
----> 2 from dataframe_sql.sql_select_query import query, register_temp_table, remove_temp_table
      3 
      4 from ._version import get_versions
      5 

/opt/conda/lib/python3.7/site-packages/dataframe_sql/sql_select_query.py in <module>
      2 Convert dataframe_sql statement to run on pandas dataframes
      3 """
----> 4 import ibis
      5 from pandas import DataFrame
      6 from sql_to_ibis import (

/opt/conda/lib/python3.7/site-packages/ibis/__init__.py in <module>
     34 with suppress(ImportError):
     35     # pip install ibis-framework[sqlite]
---> 36     from ibis.backends import sqlite  # noqa: F401
     37 
     38 with suppress(ImportError):

/opt/conda/lib/python3.7/site-packages/ibis/backends/sqlite/__init__.py in <module>
     14 
     15 
---> 16 from .client import SQLiteClient
     17 from .compiler import dialect, rewrites  # noqa: F401
     18 

/opt/conda/lib/python3.7/site-packages/ibis/backends/sqlite/client.py in <module>
      9 import sqlalchemy as sa
     10 
---> 11 import ibis.backends.base_sqlalchemy.alchemy as alch
     12 from ibis.client import Database
     13 

/opt/conda/lib/python3.7/site-packages/ibis/backends/base_sqlalchemy/alchemy.py in <module>
     45     dt.Time: sa.Time,
     46     dt.Boolean: sa.Boolean,
---> 47     dt.Binary: sa.Binary,
     48     dt.String: sa.Text,
     49     dt.Decimal: sa.NUMERIC,

AttributeError: module 'sqlalchemy' has no attribute 'Binary'

subqueries experience syntax errors

Subqueries using an IN or NOT IN statement are presently failing. Example:

SELECT variable_1, variable_2 FROM table_1 WHERE variable_2 IN (SELECT
variable_3 FROM table_2)

Error when joining on multiple columns

Joining on multiple columns raises "IndexError: list index out of range".

query(""" select a1.symbol, a1.date_count, a2.date_count_d
from
(select symbol, count(date) as date_count from zh_a_all group by symbol) a1
inner join
(select symbol, count(date) as date_count_d from (select distinct symbol, date from zh_a_all) a group by symbol)a2
on a1.symbol = a2.symbol and a1.date_count = a2.date_count_d
""")
[Column(final_name=symbol, value=IbisStringColumn(), name=symbol, table=<sql_to_ibis.sql_objects.Table object at 0x00000250AD580C88>), Aggregate(final_name=date_count, value=IbisIntegerScalar(), alias=date_count), Token(from_expression, <sql_to_ibis.sql_objects.Table object at 0x00000250AE94BCC8>), GroupByColumn(final_name=symbol, value=IbisStringColumn(), name=symbol, table=None)]
PandasTable[table]
name: zh_a_all
schema:
date : timestamp
open : float64
high : float64
low : float64
close : float64
volume : float64
outstanding_share : float64
turnover : float64
symbol : string
[Token(SELECT_CONSTRAINT, 'distinct'), Column(final_name=symbol, value=IbisStringColumn(), name=symbol, table=<sql_to_ibis.sql_objects.Table object at 0x00000250AD580C88>), Column(final_name=date, value=IbisTimestampColumn(), name=date, table=<sql_to_ibis.sql_objects.Table object at 0x00000250AD580C88>), Token(from_expression, <sql_to_ibis.sql_objects.Table object at 0x00000250AD4A3648>)]
PandasTable[table]
name: zh_a_all
schema:
date : timestamp
open : float64
high : float64
low : float64
close : float64
volume : float64
outstanding_share : float64
turnover : float64
symbol : string
[Column(final_name=symbol, value=IbisStringColumn(), name=symbol, table=Subquery(name=a, value=ref_0
PandasTable[table]
name: zh_a_all
schema:
date : timestamp
open : float64
high : float64
low : float64
close : float64
volume : float64
outstanding_share : float64
turnover : float64
symbol : string

ref_1
Selection[table]
table:
Table: ref_0
selections:
symbol = Column[string*] 'symbol' from table
ref_0
date = Column[timestamp*] 'date' from table
ref_0

Distinct[table]
table:
Table: ref_1)), Aggregate(final_name=date_count_d, value=IbisIntegerScalar(), alias=date_count_d), Token(from_expression, Subquery(name=a, value=ref_0
PandasTable[table]
name: zh_a_all
schema:
date : timestamp
open : float64
high : float64
low : float64
close : float64
volume : float64
outstanding_share : float64
turnover : float64
symbol : string

ref_1
Selection[table]
table:
Table: ref_0
selections:
symbol = Column[string*] 'symbol' from table
ref_0
date = Column[timestamp*] 'date' from table
ref_0

Distinct[table]
table:
Table: ref_1)), GroupByColumn(final_name=symbol, value=IbisStringColumn(), name=symbol, table=None)]
ref_0
PandasTable[table]
name: zh_a_all
schema:
date : timestamp
open : float64
high : float64
low : float64
close : float64
volume : float64
outstanding_share : float64
turnover : float64
symbol : string

ref_1
Selection[table]
table:
Table: ref_0
selections:
symbol = Column[string*] 'symbol' from table
ref_0
date = Column[timestamp*] 'date' from table
ref_0

Distinct[table]
table:
Table: ref_1

VisitError Traceback (most recent call last)
~\anaconda3\lib\site-packages\sql_to_ibis\sql_select_query.py in parse_sql(self)
135 # ambiguous column references are not distorted
--> 136 ).transform(tree)
137 except UnexpectedToken as err:

~\anaconda3\lib\site-packages\lark\visitors.py in transform(self, tree)
104 def transform(self, tree):
--> 105 return self._transform_tree(tree)
106

~\anaconda3\lib\site-packages\lark\visitors.py in _transform_tree(self, tree)
100 def _transform_tree(self, tree):
--> 101 children = list(self._transform_children(tree.children))
102 return self._call_userfunc(tree, children)