datafuselabs / databend-py
Databend Cloud Python Driver with native interface support
License: Apache License 2.0
upload a file to database.table
databend-py: 0.5.8
Follow this doc:
https://docs.databend.com/guides/sql-clients/developers/python
from databend_py import Client
client = Client.from_url(f"databend://{USER}:{PASSWORD}@{HOST}:443/{DATABASE}?warehouse={WAREHOUSE_NAME}")
client.execute('DROP TABLE IF EXISTS data')
client.execute('CREATE TABLE IF NOT EXISTS data (x Int32, y VARCHAR)')
client.execute('DESC data')
client.execute("INSERT INTO data (x, y) VALUES", [1, 'yy', 2, 'xx'])
_, res = client.execute('SELECT * FROM data')
print(res)
The databend:// DSN does not work; it returns errors:
/lib/python3.9/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
warnings.warn(
Please enter the DSN for the database connection: databend://bohu:***@***--small.gw.aws-us-east-2.default.databend.com:443/***?&warehouse=small
http error on http://***--askbend-small.gw.aws-us-east-2.default.databend.com:443/v1/query/
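One common cause of DSN failures is unescaped special characters in the credentials. A minimal sketch, assuming the DSN layout from the doc linked above, that URL-encodes the user and password with the standard library before handing the string to Client.from_url (the host and password here are placeholders):

from urllib.parse import quote

from databend_py import Client

# Hypothetical credentials; quote() escapes characters such as '@' or '/'
# that would otherwise break DSN parsing.
user = quote("bohu", safe="")
password = quote("p@ss/word", safe="")
host = "tenant--small.gw.aws-us-east-2.default.databend.com"

dsn = f"databend://{user}:{password}@{host}:443/default?warehouse=small"
client = Client.from_url(dsn)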
ischema_names = {
"int": INTEGER,
"int64": INTEGER,
"int32": INTEGER,
"int16": INTEGER,
"int8": INTEGER,
"uint64": INTEGER,
"uint32": INTEGER,
"uint16": INTEGER,
"uint8": INTEGER,
"decimal": DECIMAL,
"date": DATE,
"timestamp": DATETIME,
"float": FLOAT,
"double": FLOAT,
"float64": FLOAT,
"float32": FLOAT,
"string": VARCHAR,
"array": ARRAY,
"map": MAP,
"json": JSON,
"varchar": VARCHAR,
}
There is no bool mapping.
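A minimal sketch of the missing entry, assuming SQLAlchemy's BOOLEAN type and assuming Databend reports the type name as "boolean":

from sqlalchemy.types import BOOLEAN

# Hypothetical addition: map Databend's boolean type like the entries above.
ischema_names["boolean"] = BOOLEAN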
COPY INTO requires the column order to match the schema, which is inconvenient, so we use a stage attachment instead. It is also less likely to cause OOM, since the values themselves come from memory anyway.
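For reference, a hedged sketch of an insert via stage attachment over Databend's HTTP API, assuming the /v1/query endpoint accepts a stage_attachment field with a staged file location (the field names and stage path here are assumptions and may differ across server versions):

import requests

# Illustrative only: attach an already-staged CSV to an INSERT statement,
# letting the server map values by the column list instead of schema order.
payload = {
    "sql": "INSERT INTO data (x, y) VALUES",
    "stage_attachment": {
        "location": "@my_stage/rows.csv",
        "copy_options": {"purge": "true"},
    },
}
resp = requests.post("http://127.0.0.1:8000/v1/query/", json=payload,
                     auth=("root", ""))
print(resp.json())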
databend_py: 0.4.2
databend_query: 1.1.45-nightly
import databend_py
conn = databend_py.Client(host='127.0.0.1', port=8000)
columns, data = conn.execute('SELECT * FROM numbers(10001)', with_column_types=True)
print(columns)
columns:
[('number', 'UInt64'), ('number', 'UInt64')]
instead of
[('number', 'UInt64')]
When the result exceeds 10,000 rows, the data is returned in pages, and the store function appends the column metadata once per page.
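A minimal sketch of the intended behavior, using a hypothetical helper that aggregates paged responses: take the column metadata from the first page only, and append rows from every page.

def store_pages(pages):
    # Hypothetical aggregation helper. Each page is assumed to look like
    # {"columns": [(name, type), ...], "rows": [...]}.
    columns, rows = None, []
    for page in pages:
        if columns is None:
            columns = page["columns"]  # keep the schema from page 1 only
        rows.extend(page["rows"])      # data is appended from every page
    return columns, rows

pages = [
    {"columns": [("number", "UInt64")], "rows": [(0,), (1,)]},
    {"columns": [("number", "UInt64")], "rows": [(2,), (3,)]},
]
print(store_pages(pages))  # ([('number', 'UInt64')], [(0,), (1,), (2,), (3,)])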
Version:
Deploy method:
Kubernetes Helm charts
Description:
When executing a batch insert, the client uploads a CSV to object storage, executes COPY INTO, and then calls receive_result with the query id.
But the Kubernetes Service load-balances requests, so the receive_result request may be sent to a different query node and fail with an HTTP error: query id not found.
Temporary solution:
Set the Kubernetes Service sessionAffinity to ClientIP, so that all requests from the same Pod go to the same node; but if multiprocessing is used in one Pod, all requests will still be sent to a single query node.
I have no idea how to fix this issue completely; it seems the receive_result request must be sent to a specific query node.
It seems the latest version of this package relies on sdk_info.py reading a VERSION file that is included in the repo but not in the actual pip source. I think this is an issue with the setup.py file, but I'm not sure how to fix it.
Logs:
>>> from databend_py import Client
File "/usr/local/lib/python3.8/site-packages/databend_py/__init__.py", line 1, in <module>
from .client import Client
File "/usr/local/lib/python3.8/site-packages/databend_py/client.py", line 4, in <module>
from .connection import Connection
File "/usr/local/lib/python3.8/site-packages/databend_py/connection.py", line 16, in <module>
headers = {'Content-Type': 'application/json', 'User-Agent': sdk_info(), 'Accept': 'application/json',
File "/usr/local/lib/python3.8/site-packages/databend_py/sdk_info.py", line 18, in sdk_info
return f"{sdk_lan()}/{sdk_version()}"
File "/usr/local/lib/python3.8/site-packages/databend_py/sdk_info.py", line 8, in sdk_version
with open(version_py, encoding='utf-8') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.8/site-packages/databend_py/VERSION'
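One way to fix this on the packaging side, sketched under the assumption that the project uses setuptools: declare the VERSION file as package data so it ships in both sdists and wheels.

# setup.py (sketch)
from setuptools import setup, find_packages

setup(
    name="databend-py",
    packages=find_packages(),
    include_package_data=True,                   # pick up MANIFEST.in entries
    package_data={"databend_py": ["VERSION"]},   # bundle VERSION into wheels
)

# A matching MANIFEST.in line would be:
#   include databend_py/VERSION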
When executing a batch insert with params, the client uploads a CSV file to object storage.
I think the client should pass the PURGE option to the query node, or the CSV file will never be deleted from storage.
document:
https://databend.rs/doc/sql-commands/dml/dml-copy-into-table#file_format
example:
row1 = [1000, 'String Value 1000', 5.233]
row2 = [2000, 'String Value 2000', -107.04]
data = [row1, row2]
client.insert('new_table', data, column_names=['key', 'value', 'metric'])
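Per the COPY INTO documentation linked above, PURGE = TRUE removes the staged files after a successful load. A sketch of the statement the client could issue once the CSV upload finishes (the stage path here is a placeholder):

client.execute(
    "COPY INTO new_table FROM @~/upload_tmp.csv "
    "FILE_FORMAT = (TYPE = CSV) PURGE = TRUE"
)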
Currently, inserting data into tables takes several steps (upload a CSV to object storage, then COPY INTO, as described above). We could add some logging behind a debug option, which would help diagnose performance issues when writing huge amounts of data.
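A minimal sketch of what such a debug option could look like from the outside, using the standard logging module; insert_with_debug is a hypothetical wrapper, since the real timing hooks would live inside the driver's upload and COPY INTO steps:

import logging
import time

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("databend_py")

def insert_with_debug(client, table, data, column_names):
    # Hypothetical wrapper: times the whole batch insert; a real debug
    # option would log each internal step (CSV serialization, stage
    # upload, COPY INTO) separately.
    started = time.monotonic()
    client.insert(table, data, column_names=column_names)
    logger.debug("insert into %s: %d rows in %.3fs",
                 table, len(data), time.monotonic() - started)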
Currently, session settings are not supported in the driver yet. For example:
SET fast_parquet_read_bytes = 1024;
This statement executes successfully, but the session setting does not take effect at all.
This is because Databend uses a client-side session implementation: when a SET statement is executed, the server responds with a new session state (which contains the session settings), and the client is expected to send that state back with subsequent requests.
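A minimal sketch of that mechanism, assuming the HTTP response carries a session field that the client must echo back on the next request (the field names are illustrative, not the exact wire format):

import requests

class SessionAwareClient:
    # Illustrative only: keep the server-returned session state and send
    # it back with every query, so that SET actually takes effect.
    def __init__(self, url):
        self.url = url
        self.session_state = {}

    def execute(self, sql):
        payload = {"sql": sql, "session": self.session_state}
        resp = requests.post(self.url, json=payload).json()
        # After SET, the server responds with updated session settings;
        # carry them forward for later queries.
        self.session_state = resp.get("session", self.session_state)
        return resp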
Currently, databend-py is released when we add tags. However, there is a risk that the tag and the VERSION constant do not match. Maybe we can add a workflow to handle this automatically: when the VERSION constant changes, add a matching tag.