datafuselabs / databend-py
Databend Cloud Python Driver with native interface support
License: Apache License 2.0
upload a file to database.table
databend-py: 0.5.8
Follow this doc:
https://docs.databend.com/guides/sql-clients/developers/python
from databend_py import Client
client = Client.from_url(f"databend://{USER}:{PASSWORD}@{HOST}:443/{DATABASE}?warehouse={WAREHOUSE_NAME}")
client.execute('DROP TABLE IF EXISTS data')
client.execute('CREATE TABLE IF NOT EXISTS data (x Int32, y VARCHAR)')
client.execute('DESC data')
client.execute("INSERT INTO data (x, y) VALUES", [1, 'yy', 2, 'xx'])
_, res = client.execute('SELECT * FROM data')
print(res)
The databend:// DSN does not work; it returns errors:
/lib/python3.9/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
warnings.warn(
Please enter the DSN for the database connection: databend://bohu:***@***--small.gw.aws-us-east-2.default.databend.com:443/***?&warehouse=small
http error on http://***--askbend-small.gw.aws-us-east-2.default.databend.com:443/v1/query/
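One common cause of DSN failures is unescaped special characters in the credentials. A minimal sketch, assuming the DSN layout from the doc linked above, that URL-encodes the user and password with the standard library before handing the string to Client.from_url (the host and password here are placeholders):

from urllib.parse import quote

from databend_py import Client

# Hypothetical credentials; quote() escapes characters such as '@' or '/'
# that would otherwise break DSN parsing.
user = quote("bohu", safe="")
password = quote("p@ss/word", safe="")
host = "tenant--small.gw.aws-us-east-2.default.databend.com"

dsn = f"databend://{user}:{password}@{host}:443/default?warehouse=small"
client = Client.from_url(dsn)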
ischema_names = {
"int": INTEGER,
"int64": INTEGER,
"int32": INTEGER,
"int16": INTEGER,
"int8": INTEGER,
"uint64": INTEGER,
"uint32": INTEGER,
"uint16": INTEGER,
"uint8": INTEGER,
"decimal": DECIMAL,
"date": DATE,
"timestamp": DATETIME,
"float": FLOAT,
"double": FLOAT,
"float64": FLOAT,
"float32": FLOAT,
"string": VARCHAR,
"array": ARRAY,
"map": MAP,
"json": JSON,
"varchar": VARCHAR,
}
There is no bool mapping.
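A minimal sketch of the missing entry, assuming SQLAlchemy's BOOLEAN type and assuming Databend reports the type name as "boolean":

from sqlalchemy.types import BOOLEAN

# Hypothetical addition: map Databend's boolean type like the entries above.
ischema_names["boolean"] = BOOLEAN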
COPY INTO requires the column order to match the schema, which is inconvenient, so we use a stage attachment instead. It is also less likely to cause OOM, since the values themselves come from memory anyway.
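For reference, a hedged sketch of an insert via stage attachment over Databend's HTTP API, assuming the /v1/query endpoint accepts a stage_attachment field with a staged file location (the field names and stage path here are assumptions and may differ across server versions):

import requests

# Illustrative only: attach an already-staged CSV to an INSERT statement,
# letting the server map values by the column list instead of schema order.
payload = {
    "sql": "INSERT INTO data (x, y) VALUES",
    "stage_attachment": {
        "location": "@my_stage/rows.csv",
        "copy_options": {"purge": "true"},
    },
}
resp = requests.post("http://127.0.0.1:8000/v1/query/", json=payload,
                     auth=("root", ""))
print(resp.json())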
databend_py: 0.4.2
databend_query: 1.1.45-nightly
import databend_py
conn = databend_py.Client(host='127.0.0.1', port=8000)
columns, data = conn.execute('SELECT * FROM numbers(10001)', with_column_types=True)
print(columns)
columns:
[('number', 'UInt64'), ('number', 'UInt64')]
instead of
[('number', 'UInt64')]
When the result exceeds 10,000 rows, the data is returned in pages, and the store function appends the column metadata once per page.
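A minimal sketch of the intended behavior, using a hypothetical helper that aggregates paged responses: take the column metadata from the first page only, and append rows from every page.

def store_pages(pages):
    # Hypothetical aggregation helper. Each page is assumed to look like
    # {"columns": [(name, type), ...], "rows": [...]}.
    columns, rows = None, []
    for page in pages:
        if columns is None:
            columns = page["columns"]  # keep the schema from page 1 only
        rows.extend(page["rows"])      # data is appended from every page
    return columns, rows

pages = [
    {"columns": [("number", "UInt64")], "rows": [(0,), (1,)]},
    {"columns": [("number", "UInt64")], "rows": [(2,), (3,)]},
]
print(store_pages(pages))  # ([('number', 'UInt64')], [(0,), (1,), (2,), (3,)])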
Version:
Deploy method:
Kubernetes Helm charts
Description:
When executing a batch insert, the client uploads a CSV to object storage, executes COPY INTO, and then calls receive_result with the query id.
But the Kubernetes Service load-balances requests, so the receive_result request may be sent to a different query node and fail with an HTTP error: query id not found.
Temporary solution:
Set the Kubernetes Service sessionAffinity to ClientIP, so that all requests from the same Pod go to the same node; but if multiprocessing is used in one Pod, all requests will still be sent to a single query node.
I have no idea how to fix this issue completely; it seems the receive_result request must be sent to a specific query node.
It seems the latest version of this package relies on sdk_info.py reading a VERSION file that is included in the repo but not in the actual pip source. I think this is an issue with the setup.py file, but I'm not sure how to fix it.
Logs:
>>> from databend_py import Client
File "/usr/local/lib/python3.8/site-packages/databend_py/__init__.py", line 1, in <module>
from .client import Client
File "/usr/local/lib/python3.8/site-packages/databend_py/client.py", line 4, in <module>
from .connection import Connection
File "/usr/local/lib/python3.8/site-packages/databend_py/connection.py", line 16, in <module>
headers = {'Content-Type': 'application/json', 'User-Agent': sdk_info(), 'Accept': 'application/json',
File "/usr/local/lib/python3.8/site-packages/databend_py/sdk_info.py", line 18, in sdk_info
return f"{sdk_lan()}/{sdk_version()}"
File "/usr/local/lib/python3.8/site-packages/databend_py/sdk_info.py", line 8, in sdk_version
with open(version_py, encoding='utf-8') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.8/site-packages/databend_py/VERSION'
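One way to fix this on the packaging side, sketched under the assumption that the project uses setuptools: declare the VERSION file as package data so it ships in both sdists and wheels.

# setup.py (sketch)
from setuptools import setup, find_packages

setup(
    name="databend-py",
    packages=find_packages(),
    include_package_data=True,                   # pick up MANIFEST.in entries
    package_data={"databend_py": ["VERSION"]},   # bundle VERSION into wheels
)

# A matching MANIFEST.in line would be:
#   include databend_py/VERSION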
When executing a batch insert with params, the client uploads a CSV file to object storage.
I think the client should pass the PURGE option to the query node, or the CSV file will never be deleted from storage.
document:
https://databend.rs/doc/sql-commands/dml/dml-copy-into-table#file_format
example:
row1 = [1000, 'String Value 1000', 5.233]
row2 = [2000, 'String Value 2000', -107.04]
data = [row1, row2]
client.insert('new_table', data, column_names=['key', 'value', 'metric'])
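Per the COPY INTO documentation linked above, PURGE = TRUE removes the staged files after a successful load. A sketch of the statement the client could issue once the CSV upload finishes (the stage path here is a placeholder):

client.execute(
    "COPY INTO new_table FROM @~/upload_tmp.csv "
    "FILE_FORMAT = (TYPE = CSV) PURGE = TRUE"
)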
Currently, inserting data into tables takes several steps (upload a CSV to object storage, then COPY INTO, as described above). We could add some logging behind a debug option, which would help diagnose performance issues when writing huge amounts of data.
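A minimal sketch of what such a debug option could look like from the outside, using the standard logging module; insert_with_debug is a hypothetical wrapper, since the real timing hooks would live inside the driver's upload and COPY INTO steps:

import logging
import time

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("databend_py")

def insert_with_debug(client, table, data, column_names):
    # Hypothetical wrapper: times the whole batch insert; a real debug
    # option would log each internal step (CSV serialization, stage
    # upload, COPY INTO) separately.
    started = time.monotonic()
    client.insert(table, data, column_names=column_names)
    logger.debug("insert into %s: %d rows in %.3fs",
                 table, len(data), time.monotonic() - started)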
Currently, session settings are not supported in the driver yet. For example:
SET fast_parquet_read_bytes = 1024;
This statement executes successfully, but the session setting does not take effect at all.
This is because Databend uses a client-side session implementation: when a SET statement is executed, the server responds with a new session state (which contains the session settings), and the client is expected to send that state back with subsequent requests.
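A minimal sketch of that mechanism, assuming the HTTP response carries a session field that the client must echo back on the next request (the field names are illustrative, not the exact wire format):

import requests

class SessionAwareClient:
    # Illustrative only: keep the server-returned session state and send
    # it back with every query, so that SET actually takes effect.
    def __init__(self, url):
        self.url = url
        self.session_state = {}

    def execute(self, sql):
        payload = {"sql": sql, "session": self.session_state}
        resp = requests.post(self.url, json=payload).json()
        # After SET, the server responds with updated session settings;
        # carry them forward for later queries.
        self.session_state = resp.get("session", self.session_state)
        return resp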
Currently, databend-py is released when we add tags. However, there is a risk that the tag and the VERSION constant do not match. Maybe we can add a workflow to handle this automatically: when the VERSION constant changes, add a matching tag.