mymarilyn / clickhouse-driver
ClickHouse Python Driver with native interface support
Home Page: https://clickhouse-driver.readthedocs.io
License: Other
It seems the Progress packets are received and handled, but there is no way to get that information from the Client or Connection objects. Here is an API proposal with fetch* methods, which are common in database APIs.
# quick way
c = Client('server')
c.execute("select * from table")
result = c.fetchall()

# get progress (and discard the results, for a simple example)
c = Client('server')
c.execute("select * from verybigtable")
while c.fetchsome():
    print(c.progress)
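The proposed interface can be sketched offline with a plain iterator standing in for the driver's block stream. FetchCursor, fetchsome, and the progress tuple below are illustrative names for this proposal, not existing driver API:

```python
class FetchCursor:
    """Sketch of the proposed fetch* API.

    Wraps any iterable of row blocks; a real implementation would pull
    Data and Progress packets from the driver's connection instead.
    """

    def __init__(self, blocks):
        self._blocks = iter(blocks)  # each block is a list of row tuples
        self.progress = (0, 0)       # hypothetical (rows_read, total_rows)

    def fetchsome(self):
        """Return the next block of rows, or None when the stream ends."""
        block = next(self._blocks, None)
        if block is not None:
            read, total = self.progress
            self.progress = (read + len(block), total)
        return block

    def fetchall(self):
        """Drain the remaining blocks into a flat list of rows."""
        rows = []
        block = self.fetchsome()
        while block is not None:
            rows.extend(block)
            block = self.fetchsome()
        return rows
```

A real cursor would also surface the server-reported total row count in the progress tuple, which this stub leaves at zero.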
Hi, I have created a table with Nullable columns. How do I pass NULL values via bulk insert?
It seems there is no analogue value for NULL; None is not working at the moment.
Actually, None does work and it inserts NULL values :)
Meta/Column names... how do I get them?
I'm running ClickHouse in production, using the asyncio
wrapper. But sometimes I get an issue when inserting into the database.
Job processing failed
job: <Job coro=<<coroutine object user_events_producer at 0x7f39382c1fc0>>>
Traceback (most recent call last):
File "./app.py", line 189, in user_events_producer
[e.data for e in events_queue]
File "/usr/local/lib/python3.6/site-packages/aioch/client.py", line 92, in execute
**kwargs)
File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 56, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.6/site-packages/clickhouse_driver/client.py", line 159, in execute
self.connection.force_connect()
File "/usr/local/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 124, in force_connect
elif not self.ping():
File "/usr/local/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 303, in ping
packet_type = read_varint(self.fin)
File "/usr/local/lib/python3.6/site-packages/clickhouse_driver/reader.py", line 29, in read_varint
i = _read_one(f)
File "/usr/local/lib/python3.6/site-packages/clickhouse_driver/reader.py", line 14, in _read_one
c = f.read(1)
ValueError: PyMemoryView_FromBuffer(): info->buf must not be NULL
Job processing failed
job: <Job coro=<<coroutine object user_events_producer at 0x7f3939175308>>>
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/clickhouse_driver/client.py", line 169, in execute
types_check=types_check
File "/usr/local/lib/python3.6/site-packages/clickhouse_driver/client.py", line 237, in process_insert_query
sample_block = self.receive_sample_block()
File "/usr/local/lib/python3.6/site-packages/clickhouse_driver/client.py", line 251, in receive_sample_block
raise packet.exception
clickhouse_driver.errors.ServerException: Code: 101.
DB::NetException. DB::NetException: Unexpected packet Query received from client. Stack trace:
ingestion-server
0. /usr/bin/clickhouse-server(StackTrace::StackTrace()+0x15) [0x84eb065]
1. /usr/bin/clickhouse-server(DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x21) [0x2ccbe01]
2. /usr/bin/clickhouse-server(DB::TCPHandler::receivePacket()+0x22e) [0x2cdb7de]
3. /usr/bin/clickhouse-server(DB::TCPHandler::readData(DB::Settings const&)+0x1c2) [0x2cdbb42]
4. /usr/bin/clickhouse-server(DB::TCPHandler::processInsertQuery(DB::Settings const&)+0x205) [0x2cdbee5]
5. /usr/bin/clickhouse-server(DB::TCPHandler::runImpl()+0x476) [0x2cdc566]
6. /usr/bin/clickhouse-server(DB::TCPHandler::run()+0x2a) [0x2cdd3ba]
7. /usr/bin/clickhouse-server(Poco::Net::TCPServerConnection::start()+0xe) [0x86d53de]
8. /usr/bin/clickhouse-server(Poco::Net::TCPServerDispatcher::run()+0x169) [0x86d57b9]
9. /usr/bin/clickhouse-server(Poco::PooledThread::run()+0x76) [0x8776976]
10. /usr/bin/clickhouse-server(Poco::ThreadImpl::runnableEntry(void*)+0x37) [0x8772b87]
11. /usr/bin/clickhouse-server() [0x8dd397e]
12. /lib/x86_64-linux-gnu/libpthread.so.0(+0x76da) [0x7f650da8d6da]
13. /lib/x86_64-linux-gnu/libc.so.6(clone+0x3e) [0x7f650d21488e]
ingestion-server
I don't know if the issue is the server or the client.
Would it be possible to support https://clickhouse.yandex/docs/en/data_types/nested_data_structures/nested.html ?
clickhouse-driver==0.0.10
ClickHouse server version 1.1.54362
ClickHouse's JSONEachRow format supports omitting values when inserting records; it would be great if clickhouse-driver also supported this behavior: http://clickhouse-docs.readthedocs.io/en/latest/formats/jsoneachrow.html
It is also possible to omit values in which case the default value of the column is inserted.
However, clickhouse-driver instead throws an exception:
record = {'x': 1}
client.execute('DROP TABLE IF EXISTS test')
client.execute('CREATE TABLE test (x Int32, y Int32) ENGINE=Memory')
client.execute('INSERT INTO test FORMAT JSONEachRow', [record])
print(client.execute('SELECT * FROM test'))
Traceback (most recent call last):
File "test.py", line 89, in <module>
client.execute('INSERT INTO test FORMAT JSONEachRow', [record])
File "/anaconda3/envs/myenv/lib/python2.7/site-packages/clickhouse_driver/client.py", line 169, in execute
types_check=types_check
File "/anaconda3/envs/myenv/lib/python2.7/site-packages/clickhouse_driver/client.py", line 239, in process_insert_query
self.send_data(sample_block, data, types_check=types_check)
File "/anaconda3/envs/myenv/lib/python2.7/site-packages/clickhouse_driver/client.py", line 261, in send_data
types_check=types_check)
File "/anaconda3/envs/myenv/lib/python2.7/site-packages/clickhouse_driver/block.py", line 51, in __init__
self.dicts_to_rows(data)
File "/anaconda3/envs/myenv/lib/python2.7/site-packages/clickhouse_driver/block.py", line 70, in dicts_to_rows
self.data[i] = [row[name] for name in column_names]
KeyError: u'y'
Working example using clickhouse-client:
$ clickhouse-client -n --query='DROP TABLE IF EXISTS test; CREATE TABLE test (x Int32, y Int32) ENGINE=Memory; INSERT INTO test FORMAT JSONEachRow' <<< '{"x": 1}'
$ clickhouse-client -E --query='SELECT * FROM test'
Row 1:
──────
x: 1
y: 0
I was able to get past the KeyError in the traceback above by changing clickhouse-driver/src/block.py, line 70 (at 56395db):
self.data[i] = [row[name] for name in column_names if name in row]
I now get to the point of sending the data to the server, but hit the following error, which I unfortunately don't have time to dig into right now to come up with a possible PR/solution:
File "/anaconda3/envs/myenv/lib/python2.7/site-packages/clickhouse_driver/streams/native.py", line 37, in write
raise ValueError('Different rows length')
ValueError: Different rows length
This would be a great feature so that JSON objects don't always have to be fully specified in the client code!
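Until the driver fills in omitted keys itself, one client-side workaround is to complete each record with the column defaults before calling execute(); the column names and default values below are assumptions for illustration:

```python
def fill_defaults(records, defaults):
    """Return new dicts in which every column listed in `defaults` is
    present, so the driver's dicts_to_rows() finds all the keys it
    expects and every row ends up with the same length."""
    return [{col: rec.get(col, default) for col, default in defaults.items()}
            for rec in records]

# Hypothetical table: x Int32, y Int32 (Int32 defaults to 0 on the server).
rows = fill_defaults([{'x': 1}], {'x': 0, 'y': 0})
# Then: client.execute('INSERT INTO test (x, y) VALUES', rows)
```

This only mimics server-side defaults for values you can compute client-side; DEFAULT expressions evaluated by the server would still need real driver support.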
Hi!
Thanks for this great library. I was wondering whether it has any protection against SQL injection? https://github.com/mymarilyn/clickhouse-driver/blob/master/src/util/escape.py does not seem to have any checks for injected queries.
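For context, escaping a string parameter for a single-quoted ClickHouse literal roughly means backslash-escaping quotes and backslashes. The sketch below is illustrative of that idea only; it is not the library's actual escape.py code and makes no claim about its injection safety:

```python
def escape_param(value):
    """Render a Python value as a ClickHouse literal (illustrative only)."""
    if value is None:
        return 'NULL'
    if isinstance(value, str):
        # Escape backslashes first, then single quotes.
        escaped = value.replace('\\', '\\\\').replace("'", "\\'")
        return "'{}'".format(escaped)
    if isinstance(value, (int, float)):
        return str(value)
    raise TypeError('unsupported parameter type: {!r}'.format(type(value)))
```

Escaping of this kind is only safe for values substituted into string-literal positions; identifiers, type names, and raw SQL fragments need separate validation.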
Currently the client returns type names. It would be great to get the full type spec instead, including the nullable flag and/or inner type spec.
Does the current version of the driver ignore SELECT ... INTO OUTFILE? There is no difference between the queries
select * from log LIMIT 10000 INTO OUTFILE '/var/tmp/test123.csv' FORMAT TabSeparated
and
select * from log LIMIT 10000
clickhouse-driver version 0.0.16
Hi!
I'm working on clickhouse backend for ibis.
Ibis is installable from both pip and conda. I'd like to use clickhouse-driver, but currently clickhouse-driver and clickhouse-cityhash don't have conda packages.
I've already created the recipes, but conda-forge packages require maintainers. Would you please create the feedstocks?
Here are the recipes: https://github.com/kszucs/clickhouse-conda
Recent ClickHouse supports intervals/timedeltas, but they have special types.
Query:
SELECT INTERVAL `int_col` DAY AS `tmp`
FROM ibis_testing.`functional_alltypes`
LIMIT 10000
Traceback:
../../.pyenv/versions/miniconda3-latest/envs/ml3/lib/python3.6/site-packages/clickhouse_driver/client.py:176: in execute
types_check=types_check, columnar=columnar
../../.pyenv/versions/miniconda3-latest/envs/ml3/lib/python3.6/site-packages/clickhouse_driver/client.py:227: in process_ordinary_query
columnar=columnar)
../../.pyenv/versions/miniconda3-latest/envs/ml3/lib/python3.6/site-packages/clickhouse_driver/client.py:75: in receive_result
self.receive_no_progress_result(result, columnar)
../../.pyenv/versions/miniconda3-latest/envs/ml3/lib/python3.6/site-packages/clickhouse_driver/client.py:102: in receive_no_progress_result
packet = self.receive_packet()
../../.pyenv/versions/miniconda3-latest/envs/ml3/lib/python3.6/site-packages/clickhouse_driver/client.py:133: in receive_packet
packet = self.connection.receive_packet()
../../.pyenv/versions/miniconda3-latest/envs/ml3/lib/python3.6/site-packages/clickhouse_driver/connection.py:323: in receive_packet
packet.block = self.receive_data()
../../.pyenv/versions/miniconda3-latest/envs/ml3/lib/python3.6/site-packages/clickhouse_driver/connection.py:386: in receive_data
block = self.block_in.read()
../../.pyenv/versions/miniconda3-latest/envs/ml3/lib/python3.6/site-packages/clickhouse_driver/streams/native.py:77: in read
column = read_column(column_type, n_rows, self.fin)
../../.pyenv/versions/miniconda3-latest/envs/ml3/lib/python3.6/site-packages/clickhouse_driver/columns/service.py:55: in read_column
column = get_column_by_spec(column_spec)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
spec = 'IntervalDay', column_options = {}
def get_column_by_spec(spec, column_options=None):
    column_options = column_options or {}

    def create_column_with_options(x):
        return get_column_by_spec(x, column_options)

    if spec.startswith('FixedString'):
        length = int(spec[12:-1])
        return FixedString(length)

    elif spec.startswith('Enum'):
        return create_enum_column(spec)

    elif spec.startswith('Array'):
        return create_array_column(spec, create_column_with_options)

    elif spec.startswith('Nullable'):
        return create_nullable_column(spec, create_column_with_options)

    else:
        try:
            cls = column_by_type[spec]
            return cls(**column_options)
        except KeyError as e:
>           raise errors.UnknownTypeError('Unknown type {}'.format(e.args[0]))
E           clickhouse_driver.errors.UnknownTypeError: Code: 50. Unknown type IntervalDay
../../.pyenv/versions/miniconda3-latest/envs/ml3/lib/python3.6/site-packages/clickhouse_driver/columns/service.py:51: UnknownTypeError
------------------------------------------------- Captured log call --------------------------------------------------
connection.py 174 INFO Connecting. Database: ibis_testing. User: default
connection.py 275 INFO Connected to ClickHouse server version 1.1.54310
connection.py 439 INFO Query: SELECT INTERVAL `int_col` DAY AS `tmp`
FROM ibis_testing.`functional_alltypes`
LIMIT 10000
connection.py 414 DEBUG Block send time: 8.8e-05
The BlockStreamProfileInfo.rows_before_limit property is useful for getting the row count for pagination without running an extra query, but there does not seem to be any way to access it through the API (i.e. the client silently ignores the PROFILE_INFO packet).
As a workaround I hacked together a small change in Client.receive_packet that saves the last packet.profile_info on the Client instance, so it can be used like this:
client = Client()
rows = client.execute('SELECT ...')
total_count = client.profile_info.rows_before_limit
So it's not pretty, but it seems to work fine. Anyway, I think it would make sense to have this in the main API without additional hacks, but I'm not sure what the best place for it is... Do you have any thoughts on that? Or perhaps there are additional caveats that caused this to be left out of the API's scope?
Do you need to explicitly close the connection using clickhouse_driver?
date.fromtimestamp is supposed to take a local timestamp, not a UTC one. I am in a timezone behind UTC, so all dates come back one day behind.
>>> datetime.date.fromtimestamp(0)
datetime.date(1969, 12, 31)
To fix this you need to use datetime.datetime.utcfromtimestamp().date() instead.
return datetime.datetime.utcfromtimestamp(value * self.offset).date()
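The off-by-one day can be reproduced with the standard library alone; utcfromtimestamp interprets the epoch offset as UTC regardless of the local zone, which matches what a ClickHouse Date column (days since epoch) actually encodes:

```python
import datetime

# Local interpretation: in a timezone behind UTC this yields 1969-12-31.
local_date = datetime.date.fromtimestamp(0)

# UTC interpretation: always 1970-01-01, independent of the local zone.
utc_date = datetime.datetime.utcfromtimestamp(0).date()

print(local_date, utc_date)
```

Only the UTC conversion is deterministic; the local one shifts by a day for every reader west of Greenwich, which is exactly the reported bug.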
I often get the error TooLargeStringSize: Code: 131 when inserting data into a table. How can I prevent it? I have already tried inserting really small batches.
Hi,
I have an issue with a big insert query (> 2000 columns). It runs fine with the native clickhouse-client; after some config changes it runs in 0.5 seconds.
When I run the same query with clickhouse-driver, it times out. Before this I had the same script working with pymysql.
Any idea how to fix this?
Traceback (most recent call last):
File "Click_v01.py", line 154, in <module>
client.execute(sql)
File "/usr/lib/python3.4/site-packages/clickhouse_driver/client.py", line 73, in execute
query_id=query_id, settings=settings
File "/usr/lib/python3.4/site-packages/clickhouse_driver/client.py", line 88, in process_ordinary_query
return self.receive_result(with_column_types=with_column_types)
File "/usr/lib/python3.4/site-packages/clickhouse_driver/client.py", line 19, in receive_result
block = self.receive_block()
File "/usr/lib/python3.4/site-packages/clickhouse_driver/client.py", line 38, in receive_block
packet = self.connection.receive_packet()
File "/usr/lib/python3.4/site-packages/clickhouse_driver/connection.py", line 234, in receive_packet
packet.type = packet_type = read_varint(self.fin)
File "/usr/lib/python3.4/site-packages/clickhouse_driver/reader.py", line 46, in read_varint
i = _read_one(f)
File "/usr/lib/python3.4/site-packages/clickhouse_driver/reader.py", line 31, in _read_one
c = f.read(1)
File "/usr/lib64/python3.4/socket.py", line 378, in readinto
return self._sock.recv_into(b)
socket.timeout: timed out
Recent ClickHouse has a new type, Nothing, which is currently unhandled by clickhouse-driver. Selecting NULL raises:
/tmp/workspace/miniconda/envs/ibis_2.7/lib/python2.7/site-packages/clickhouse_driver/columns/nullablecolumn.py:5: in create_nullable_column
nested = column_by_spec_getter(inner)
/tmp/workspace/miniconda/envs/ibis_2.7/lib/python2.7/site-packages/clickhouse_driver/columns/service.py:30: in create_column_with_options
return get_column_by_spec(x, column_options)
/tmp/workspace/miniconda/envs/ibis_2.7/lib/python2.7/site-packages/clickhouse_driver/columns/service.py:51: in get_column_by_spec
raise errors.UnknownTypeError('Unknown type {}'.format(e.args[0]))
E UnknownTypeError: Code: 50. Unknown type Nothing
Hi,
I'm experiencing some strange issues with this module (there are no issues with any other modules). It installed from pip without problems:
# pip install clickhouse-driver[lz4]
Collecting clickhouse-driver[lz4]
Using cached clickhouse-driver-0.0.6.tar.gz
Requirement already satisfied (use --upgrade to upgrade): six in /usr/lib/python2.7/site-packages (from clickhouse-driver[lz4])
Requirement already satisfied (use --upgrade to upgrade): enum34 in /usr/lib/python2.7/site-packages (from clickhouse-driver[lz4])
Requirement already satisfied (use --upgrade to upgrade): lz4 in /usr/lib64/python2.7/site-packages (from clickhouse-driver[lz4])
Collecting clickhouse-cityhash==1.0.2 (from clickhouse-driver[lz4])
Using cached clickhouse-cityhash-1.0.2.tar.gz
Installing collected packages: clickhouse-cityhash, clickhouse-driver
Running setup.py install for clickhouse-cityhash ... done
Running setup.py install for clickhouse-driver ... done
Successfully installed clickhouse-cityhash-1.0.2 clickhouse-driver-0.0.6
# pip show clickhouse_driver
Name: clickhouse-driver
Version: 0.0.6
Summary: Python driver with native interface for ClickHouse
Home-page: https://github.com/mymarilyn/clickhouse-driver
Author: Konstantin Lebedev
Author-email: [email protected]
License: MIT
Location: /usr/lib/python2.7/site-packages
Requires: six, enum34
# ll /usr/lib/python2.7/site-packages/clickhouse_driver
total 164
-rw-r--r-- 1 root root 2155 Jun 30 18:36 block.py
-rw-r--r-- 1 root root 3109 Sep 28 17:42 block.pyc
-rw-r--r-- 1 root root 693 May 22 18:29 blockstreamprofileinfo.py
-rw-r--r-- 1 root root 1247 Sep 28 17:42 blockstreamprofileinfo.pyc
-rw-r--r-- 1 root root 2165 May 22 18:29 clientinfo.py
-rw-r--r-- 1 root root 2856 Sep 28 17:42 clientinfo.pyc
-rw-r--r-- 1 root root 7666 Sep 18 19:58 client.py
-rw-r--r-- 1 root root 7783 Sep 28 17:42 client.pyc
drwxr-xr-x 2 root root 4096 Sep 28 17:42 columns
drwxr-xr-x 2 root root 4096 Sep 28 17:42 compression
-rw-r--r-- 1 root root 11992 Sep 16 19:59 connection.py
-rw-r--r-- 1 root root 12551 Sep 28 17:42 connection.pyc
-rw-r--r-- 1 root root 596 Sep 16 20:24 defines.py
-rw-r--r-- 1 root root 885 Sep 28 17:42 defines.pyc
-rw-r--r-- 1 root root 2899 Sep 16 20:24 errors.py
-rw-r--r-- 1 root root 6561 Sep 28 17:42 errors.pyc
-rw-r--r-- 1 root root 68 Sep 19 18:00 __init__.py
-rw-r--r-- 1 root root 443 Sep 28 17:42 __init__.pyc
-rw-r--r-- 1 root root 522 May 22 18:29 progress.py
-rw-r--r-- 1 root root 1113 Sep 28 17:42 progress.pyc
-rw-r--r-- 1 root root 2002 Aug 31 18:09 protocol.py
-rw-r--r-- 1 root root 2645 Sep 28 17:42 protocol.pyc
-rw-r--r-- 1 root root 186 May 22 18:29 queryprocessingstage.py
-rw-r--r-- 1 root root 590 Sep 28 17:42 queryprocessingstage.pyc
-rw-r--r-- 1 root root 1686 Sep 4 20:49 reader.py
-rw-r--r-- 1 root root 3676 Sep 28 17:42 reader.pyc
-rw-r--r-- 1 root root 717 May 22 18:29 readhelpers.py
-rw-r--r-- 1 root root 945 Sep 28 17:42 readhelpers.pyc
drwxr-xr-x 2 root root 4096 Sep 28 17:42 settings
drwxr-xr-x 2 root root 4096 Sep 28 17:42 streams
drwxr-xr-x 2 root root 4096 Sep 28 17:42 util
-rw-r--r-- 1 root root 1770 Aug 31 18:09 writer.py
-rw-r--r-- 1 root root 3882 Sep 28 17:42 writer.pyc
Example code test.py
#! /usr/bin/env python
from clickhouse_driver.client import Client
ch_db = Client(host='clickhouse-host', database='test_db',
compression=True)
query = 'select * from test_table'
result = ch_db.execute(query)
print result
Output:
Traceback (most recent call last):
File "./test.py", line 3, in <module>
from clickhouse_driver.client import Client
ImportError: No module named client
Hello!
I'm getting a timeout error when trying to fetch results from ClickHouse using this package on Windows. Any ideas how I can fix it?
from clickhouse_driver.client import Client
print('Import Client - Ok', Client)
client = Client('***-clickhouse.***', 8123, , database='test')
print('Get instance of Client - Ok', client)
result = client.execute('select 1')
print('Get results - Ok', result)
Output:
Import Client - Ok <class 'clickhouse_driver.client.Client'>
Get instance of Client - Ok <clickhouse_driver.client.Client object at 0x0000000004B95C88>
---------------------------------------------------------------------------
timeout Traceback (most recent call last)
C:\Users\username\AppData\Local\Continuum\Anaconda3\lib\site-packages\clickhouse_driver\connection.py in connect(self)
126 self.send_hello()
--> 127 self.receive_hello()
128
C:\Users\username\AppData\Local\Continuum\Anaconda3\lib\site-packages\clickhouse_driver\connection.py in receive_hello(self)
175 def receive_hello(self):
--> 176 packet_type = read_varint(self.fin)
177
C:\Users\username\AppData\Local\Continuum\Anaconda3\lib\site-packages\clickhouse_driver\reader.py in read_varint(f)
45 while True:
---> 46 i = _read_one(f)
47 result |= (i & 0x7f) << shift
C:\Users\username\AppData\Local\Continuum\Anaconda3\lib\site-packages\clickhouse_driver\reader.py in _read_one(f)
30 def _read_one(f):
---> 31 c = f.read(1)
32 if c == '':
C:\Users\username\AppData\Local\Continuum\Anaconda3\lib\socket.py in readinto(self, b)
585 try:
--> 586 return self._sock.recv_into(b)
587 except timeout:
timeout: timed out
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
<ipython-input-7-73d5ad8d95e2> in <module>()
5 print('Get instance of Client - Ok', client)
6
----> 7 result = client.execute('select 1')
8 print('Get results - Ok', result)
C:\Users\username\AppData\Local\Continuum\Anaconda3\lib\site-packages\clickhouse_driver\client.py in execute(self, query, params, with_column_types, external_tables, query_id, settings)
58 def execute(self, query, params=None, with_column_types=False,
59 external_tables=None, query_id=None, settings=None):
---> 60 self.connection.force_connect()
61
62 try:
C:\Users\username\AppData\Local\Continuum\Anaconda3\lib\site-packages\clickhouse_driver\connection.py in force_connect(self)
97
98 if not self.connected:
---> 99 self.connect()
100
101 elif not self.ping():
C:\Users\username\AppData\Local\Continuum\Anaconda3\lib\site-packages\clickhouse_driver\connection.py in connect(self)
133 self.disconnect()
134 raise errors.SocketTimeoutError(
--> 135 '{} ({})'.format(e.message, self.get_description())
136 )
137
AttributeError: 'timeout' object has no attribute 'message'
P.S. I verified these connection credentials via the JDBC driver.
Exception:
unsupported operand type(s) for -: 'str' and 'datetime.date'
When I insert into a DateTime column, it succeeds. How do I fix this problem? Thanks.
Recent ClickHouse has an experimental Decimal type: https://github.com/yandex/ClickHouse/blob/master/CHANGELOG.md#clickhouse-release-181213-2018-09-10
I can't execute a query whose result has a column of type Array(Tuple(Float64, Float64)).
DROP TABLE IF EXISTS test_polygon;
CREATE TABLE test_polygon
(
id Int64,
polygon Nested(
lat Float64,
lon Float64
)
)
ENGINE = Log();
INSERT INTO test_polygon VALUES
(1111, [0., 8., 7., 0.], [0., 7., 8., 0.]),
(2222, [5., 5., 4., 3., 5.], [10., 12., 11., 10., 10.]);
SELECT
id,
arrayMap((x, y) -> (x, y), polygon.lat, polygon.lon) AS polygon
FROM test_polygon
┌───id─┬─polygon──────────────────────────────┐
│ 1111 │ [(0,0),(8,7),(7,8),(0,0)] │
│ 2222 │ [(5,10),(5,12),(4,11),(3,10),(5,10)] │
└──────┴──────────────────────────────────────┘
2 rows in set. Elapsed: 0.003 sec.
from clickhouse_driver import Client
client = Client('localhost')
print(client.execute('SELECT id, arrayMap((x, y)-> tuple(x, y), polygon_lat, polygon_lon) AS polygon FROM test_db.test_polygon'))
KeyError Traceback (most recent call last)
~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/columns/service.py in get_column_by_spec(spec, column_options)
57 try:
---> 58 cls = column_by_type[spec]
59 return cls(**column_options)
KeyError: 'Tuple(Float64, Float64)'
During handling of the above exception, another exception occurred:
UnknownTypeError Traceback (most recent call last)
<ipython-input-51-1fc000cc1814> in <module>()
----> 1 print(client.execute('SELECT id, arrayMap((x, y)-> tuple(x, y), polygon_lat, polygon_lon) AS polygon FROM test_db.test_polygon'))
~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/client.py in execute(self, query, params, with_column_types, external_tables, query_id, settings, types_check, columnar)
187 external_tables=external_tables,
188 query_id=query_id, settings=query_settings,
--> 189 types_check=types_check, columnar=columnar
190 )
191
~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/client.py in process_ordinary_query(self, query, params, with_column_types, external_tables, query_id, settings, types_check, columnar)
244 types_check=types_check)
245 return self.receive_result(with_column_types=with_column_types,
--> 246 columnar=columnar)
247
248 def process_insert_query(self, query_without_data, data,
~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/client.py in receive_result(self, with_column_types, progress, columnar)
80
81 else:
---> 82 self.receive_no_progress_result(result, columnar)
83 return result.get_result()
84
~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/client.py in receive_no_progress_result(self, result, columnar)
107 def receive_no_progress_result(self, result, columnar=False):
108 while True:
--> 109 packet = self.receive_packet()
110 if not packet:
111 break
~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/client.py in receive_packet(self)
138
139 def receive_packet(self):
--> 140 packet = self.connection.receive_packet()
141
142 if packet.type == ServerPacketTypes.EXCEPTION:
~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/connection.py in receive_packet(self)
332
333 if packet_type == ServerPacketTypes.DATA:
--> 334 packet.block = self.receive_data()
335
336 elif packet_type == ServerPacketTypes.EXCEPTION:
~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/connection.py in receive_data(self)
391 read_binary_str(self.fin)
392
--> 393 block = self.block_in.read()
394 self.block_in.reset()
395 return block
~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/streams/native.py in read(self)
78 if n_rows:
79 column = read_column(self.context, column_type, n_rows,
---> 80 self.fin)
81 data.append(column)
82
~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/columns/service.py in read_column(context, column_spec, n_items, buf)
65 def read_column(context, column_spec, n_items, buf):
66 column_options = {'context': context}
---> 67 column = get_column_by_spec(column_spec, column_options=column_options)
68 return column.read_data(n_items, buf)
69
~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/columns/service.py in get_column_by_spec(spec, column_options)
49
50 elif spec.startswith('Array'):
---> 51 return create_array_column(spec, create_column_with_options)
52
53 elif spec.startswith('Nullable'):
~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/columns/arraycolumn.py in create_array_column(spec, column_by_spec_getter)
186 def create_array_column(spec, column_by_spec_getter):
187 inner = spec[6:-1]
--> 188 return ArrayColumn(column_by_spec_getter(inner))
~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/columns/service.py in create_column_with_options(x)
37
38 def create_column_with_options(x):
---> 39 return get_column_by_spec(x, column_options)
40
41 if spec.startswith('FixedString'):
~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/columns/service.py in get_column_by_spec(spec, column_options)
60
61 except KeyError as e:
---> 62 raise errors.UnknownTypeError('Unknown type {}'.format(e.args[0]))
63
64
UnknownTypeError: Code: 50. Unknown type Tuple(Float64, Float64)
Hi,
Something new is happening while adding a ClickHouse DB to Superset, at the point where Superset tries to get table metadata.
Traceback (most recent call last):
File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/superset/views/core.py", line 1764, in testconn
return json_success(json.dumps(engine.table_names(), indent=4))
File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 2139, in table_names
return self.dialect.get_table_names(conn, schema)
File "<string>", line 2, in get_table_names
File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/sqlalchemy/engine/reflection.py", line 42, in cache
return fn(self, con, *args, **kw)
File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/clickhouse_sqlalchemy/drivers/base.py", line 482, in get_table_names
return [row.name for row in self._execute(connection, 'SHOW TABLES')]
File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/clickhouse_sqlalchemy/drivers/native/base.py", line 34, in _execute
return connection.execute(sql)
File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 942, in execute
return self._execute_text(object, multiparams, params)
File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1104, in _execute_text
statement, parameters
File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
context)
File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1416, in _handle_dbapi_exception
util.reraise(*exc_info)
File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 249, in reraise
raise value
File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
context)
File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/clickhouse_sqlalchemy/drivers/base.py", line 492, in do_execute
cursor.execute(statement, parameters, context=context)
File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/clickhouse_sqlalchemy/drivers/native/connector.py", line 125, in execute
external_tables=external_tables, settings=settings
File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/clickhouse_driver/client.py", line 102, in execute
self.connection.force_connect()
File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/clickhouse_driver/connection.py", line 124, in force_connect
self.connect()
File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/clickhouse_driver/connection.py", line 189, in connect
self.send_hello()
File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/clickhouse_driver/connection.py", line 260, in send_hello
write_binary_str(self.password, self.fout)
File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/clickhouse_driver/writer.py", line 18, in write_binary_str
text = text.encode('utf-8')
AttributeError: 'NoneType' object has no attribute 'encode'
2018-09-25 21:09:57,639:INFO:werkzeug:127.0.0.1 - - [25/Sep/2018 21:09:57] "POST /superset/testconn HTTP/1.1" 500 -
I suspect that something has changed in a new version of the driver and that it now checks that there are no non-UTF-8 bytes in the string before storing it?
How do you suggest I fix that? Is it something to be set on the string sent by Superset, or something in clickhouse-driver?
Thanks, Matthieu.
Request: the driver should be able to load settings from a client configuration file, similar to clickhouse-client, e.g.
~/.clickhouse-client/config.xml
<config>
<user>default</user>
<password>my_password</password>
</config>
This will remove the need to store credentials in code.
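Pending native support, a config file like the one above can be parsed with the standard library and its values passed to Client(); the tag names here simply mirror the example, and a real config may carry more settings:

```python
import xml.etree.ElementTree as ET

def load_client_kwargs(xml_text):
    """Map <config> child elements to Client(**kwargs) keyword arguments.

    For a file on disk, use ET.parse(path).getroot() instead of
    ET.fromstring().
    """
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}

kwargs = load_client_kwargs(
    '<config><user>default</user><password>my_password</password></config>')
# Then: Client('localhost', **kwargs)
```

Tag names would need to match Client's keyword arguments exactly for the **kwargs expansion to work, which is one design question native support would have to settle.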
After updating to 0.0.16 the following error appears:
/opt/conda/envs/ibis-dev-3.6/lib/python3.6/site-packages/clickhouse_driver/client.py:222: in process_ordinary_query
columnar=columnar)
/opt/conda/envs/ibis-dev-3.6/lib/python3.6/site-packages/clickhouse_driver/client.py:57: in receive_result
return result.get_result()
/opt/conda/envs/ibis-dev-3.6/lib/python3.6/site-packages/clickhouse_driver/result.py:39: in get_result
for packet in self.packet_generator:
/opt/conda/envs/ibis-dev-3.6/lib/python3.6/site-packages/clickhouse_driver/client.py:69: in packet_generator
packet = self.receive_packet()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <clickhouse_driver.client.Client object at 0x7ff6b95f0cc0>
def receive_packet(self):
    packet = self.connection.receive_packet()

    if packet.type == ServerPacketTypes.EXCEPTION:
        raise packet.exception

    elif packet.type == ServerPacketTypes.PROGRESS:
        return packet

    elif packet.type == ServerPacketTypes.END_OF_STREAM:
        return False

    elif packet.type == ServerPacketTypes.DATA:
        return packet

    elif packet.type == ServerPacketTypes.TOTALS:
        return packet

    elif packet.type == ServerPacketTypes.EXTREMES:
        return packet

    elif packet.type == ServerPacketTypes.PROFILE_INFO:
>       self.last_query.store_profile(packet)
E       AttributeError: 'NoneType' object has no attribute 'store_profile'
Hi, for a SELECT query only the values are returned; how do I get the column names and type info?
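For reference, execute() accepts a with_column_types=True flag that makes it return a (rows, columns) pair, where columns is a list of (name, type string) tuples. Unpacking that shape looks like this, with sample values standing in for a live query result:

```python
# Shape of client.execute(query, with_column_types=True); the sample
# values below stand in for a real server round-trip.
rows, columns = (
    [(1, 'a'), (2, 'b')],
    [('id', 'UInt64'), ('name', 'String')],
)

names = [name for name, _ in columns]
types = [type_ for _, type_ in columns]

# Rows as dicts keyed by column name:
dict_rows = [dict(zip(names, row)) for row in rows]
```

The type strings are ClickHouse type specs ('UInt64', 'Nullable(String)', ...), not Python types, so any further mapping is up to the caller.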
ClickHouse allows setting an ID on queries; you can see these IDs with:
select query_id from system.processes;
This is useful when you want to cancel a query from an external process, as explained here: https://stackoverflow.com/questions/40546983/how-to-kill-a-process-query-in-clickhouse
I propose that the connection.send_query() method accept a string argument to name the query. It could be used in the client.process_*_query() functions and in execute() as an optional argument:
import uuid
c = client('server')
query_id = str(uuid.uuid4())
c.execute("select something", query_id=query_id)
It might be worth considering optional support for pandas dataframes as inputs and outputs in clickhouse-driver.
Creating pandas dataframe would be quite straightforward from the block stream before transposing to row oriented block.
Here is a basic pandas dtype -> clickhouse type mapping: https://github.com/kszucs/ibis/blob/0101250a9d96f6a387129fcd770f3e092856dc56/ibis/clickhouse/types.py#L14
@xzkostyan would you like to include pandas support?
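Building a DataFrame from the columnar block stream is indeed straightforward. A minimal sketch, assuming columnar=True and with_column_types=True (both real execute() options); the helper name is invented here:

```python
# Sketch: turn a columnar result into {column_name: values}, which is the
# shape pandas.DataFrame accepts directly. The pandas step itself is a
# suggestion, not a built-in feature of the driver.
def to_frame_dict(columns_data, columns_with_types):
    return {name: list(col)
            for (name, _type), col in zip(columns_with_types, columns_data)}


# Usage (not run here):
# data, cols = client.execute('SELECT * FROM table',
#                             columnar=True, with_column_types=True)
# import pandas as pd
# df = pd.DataFrame(to_frame_dict(data, cols))
```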
[root@lyc tmp]# python
Python 2.6.6 (r266:84292, Aug 18 2016, 15:13:37)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-17)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
from clickhouse_driver import Client
client = Client('192.168.133.2',18123,'client_report','admin','************',insert_block_size=1)
client.execute('SHOW TABLES')
^CTraceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.6/site-packages/clickhouse_driver/client.py", line 159, in execute
self.connection.force_connect()
File "/usr/lib/python2.6/site-packages/clickhouse_driver/connection.py", line 122, in force_connect
self.connect()
File "/usr/lib/python2.6/site-packages/clickhouse_driver/connection.py", line 188, in connect
self.receive_hello()
File "/usr/lib/python2.6/site-packages/clickhouse_driver/connection.py", line 263, in receive_hello
packet_type = read_varint(self.fin)
File "/usr/lib/python2.6/site-packages/clickhouse_driver/reader.py", line 29, in read_varint
i = _read_one(f)
File "/usr/lib/python2.6/site-packages/clickhouse_driver/reader.py", line 14, in _read_one
c = f.read(1)
File "/usr/lib64/python2.6/socket.py", line 383, in read
data = self._sock.recv(left)
KeyboardInterrupt
I can't return unique combinations of two columns (e.g. src and dst). Executing a query with SELECT DISTINCT(field1, field2) causes the following error when returning data from ClickHouse:
Traceback (most recent call last):
File "/home/netbug/.local/lib/python3.6/site-packages/clickhouse_driver/columns/service.py", line 56, in get_column_by_spec
cls = column_by_type[spec]
KeyError: 'Tuple(String, String)'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./cronapp.py", line 71, in <module>
run(period=period)
File "./cronapp.py", line 52, in run
data = get_conn_list(period=period)
File "./cronapp.py", line 21, in get_conn_list
return client.execute(query)
File "/home/netbug/.local/lib/python3.6/site-packages/clickhouse_driver/client.py", line 176, in execute
types_check=types_check, columnar=columnar
File "/home/netbug/.local/lib/python3.6/site-packages/clickhouse_driver/client.py", line 227, in process_ordinary_query
columnar=columnar)
File "/home/netbug/.local/lib/python3.6/site-packages/clickhouse_driver/client.py", line 75, in receive_result
self.receive_no_progress_result(result, columnar)
File "/home/netbug/.local/lib/python3.6/site-packages/clickhouse_driver/client.py", line 102, in receive_no_progress_result
packet = self.receive_packet()
File "/home/netbug/.local/lib/python3.6/site-packages/clickhouse_driver/client.py", line 133, in receive_packet
packet = self.connection.receive_packet()
File "/home/netbug/.local/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 331, in receive_packet
packet.block = self.receive_data()
File "/home/netbug/.local/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 394, in receive_data
block = self.block_in.read()
File "/home/netbug/.local/lib/python3.6/site-packages/clickhouse_driver/streams/native.py", line 77, in read
column = read_column(column_type, n_rows, self.fin)
File "/home/netbug/.local/lib/python3.6/site-packages/clickhouse_driver/columns/service.py", line 64, in read_column
column = get_column_by_spec(column_spec)
File "/home/netbug/.local/lib/python3.6/site-packages/clickhouse_driver/columns/service.py", line 60, in get_column_by_spec
raise errors.UnknownTypeError('Unknown type {}'.format(e.args[0]))
clickhouse_driver.errors.UnknownTypeError: Code: 50. Unknown type Tuple(String, String)
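A workaround sketch, using the column names from the issue: DISTINCT(src, dst) makes the server return a single Tuple(String, String) column, which this driver version cannot deserialize, whereas listing the columns without parentheses returns two plain String columns.

```python
# DISTINCT with parentheses yields one Tuple column (unsupported here);
# DISTINCT without parentheses yields two plain columns and avoids the error.
query_tuple = 'SELECT DISTINCT (src, dst) FROM connections'  # one Tuple(String, String) column
query_plain = 'SELECT DISTINCT src, dst FROM connections'    # two String columns
# rows = client.execute(query_plain)
```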
Thanks for the hard work on this great project.
Does this driver implement the DB-API yet? If not, is there a plan to?
The documentation recommends inserting chunks of at least 1000 rows at a time. Is there an easy way to add some sort of buffer to gather inserts and thus improve performance? Of course, one could use a simple list and append to it, inserting once it reaches a certain size, but things get more complicated with multiprocessing etc.
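A minimal sketch of such a buffer; the class name, table, and column list are illustrative. With multiprocessing, keep one buffer (and one client) per worker process, since connections are not shareable across processes:

```python
# Insert buffer: rows accumulate in memory and are flushed in a single
# INSERT once batch_size is reached.
class BufferedInserter:
    def __init__(self, client, table, columns, batch_size=1000):
        self._client = client
        self._sql = 'INSERT INTO {} ({}) VALUES'.format(table, ', '.join(columns))
        self._batch_size = batch_size
        self._rows = []

    def add(self, row):
        self._rows.append(row)
        if len(self._rows) >= self._batch_size:
            self.flush()

    def flush(self):
        # Also call this explicitly at shutdown so the tail is not lost.
        if self._rows:
            self._client.execute(self._sql, self._rows)
            self._rows = []
```

Remember to call flush() once more before exiting, otherwise rows below the batch threshold are never written.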
Can you provide 'format JSON' as the clickhouse-client does?
When writing a FixedString column, the data is first encoded to bytes using UTF-8 in Python. This causes a problem when the value cannot be handled as UTF-8, for instance ciphertext bytes generated by HMAC-SHA256. I would suggest adding a type check before encoding in https://github.com/mymarilyn/clickhouse-driver/blob/master/src/writer.py#L18 and https://github.com/mymarilyn/clickhouse-driver/blob/master/src/writer.py#L24: if the value is already bytes, just don't encode it. For example, import hmac; hmac.new(b"done", msg=b"yes", digestmod="sha256").digest().decode('utf-8') throws the exception 'utf-8' codec can't decode byte 0xcb in position 3.
Hi,
it looks like there is no execute_iter method in version '0.0.10'.
from clickhouse_driver import Client
client = Client(host='localhost', port=2102, database='shard01')
settings = {'max_block_size': 100000}
rows_gen = client.execute_iter('select * from query.TCC_S1', settings=settings)
AttributeError Traceback (most recent call last)
in <module>()
2 client = Client(host='localhost', port=2102, database='shard01')
3 settings = {'max_block_size': 100000}
----> 4 rows_gen = client.execute_iter('select * from query.TCC_S1', settings=settings)
AttributeError: 'Client' object has no attribute 'execute_iter'
column/string value may be None
def try_encode(self, value):
    if not isinstance(value, bytes):
        return value.encode('utf-8')
    return value
Hi,
I'm afraid this is a rather specific request. I opened this issue last week on the Yandex/ClickHouse repository: ClickHouse/ClickHouse#2605. It concerns support for specific column types to store IPv4 and IPv6 data.
I didn't get any sort of positive answer from them... at least short term.
I was wondering if it could make sense to develop additional types in your code so that columns named IPv4_... or IPv6_... benefit from specific behaviour. The idea would be to apply a conversion function in your code and expose those columns as an INET type (similar to PostgreSQL): https://www.postgresql.org/docs/9.1/static/datatype-net-types.html.
With this feature, the type could be handled by upper layers such as your SQLAlchemy driver and related UIs and frontends.
Does this sound doable / acceptable?
Thanks Matthieu.
I use a domain name instead of an IP address for load balancing and failover.
But when I change the domain mapping, the script keeps connecting to the original IP address.
For example, the domain 'xxx.test.com' maps to ip1 and ip2; after starting the script, data is written to ip1 and ip2 round-robin.
After changing the mapping to ip2 and ip3, i.e. replacing ip1 with ip3, the script still writes to ip1.
Hi!
After inserting datetime.datetime(2018, 1, 19, 10) through this driver I see the value '2018-01-19 13:00:00' in the table.
The timezone on both my computer and the ClickHouse server is Moscow.
What must I do to see '2018-01-19 10:00:00' after the insert?
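One way to avoid the shift, sketched here, is to make the datetime timezone-aware before inserting, so the driver does not apply its own naive-datetime conversion. Whether this suffices depends on the driver version and the server's timezone settings, so verify against your setup (zoneinfo requires Python 3.9+; older code would use pytz):

```python
# Attach an explicit timezone instead of passing a naive datetime; the
# table/column names in the commented insert are placeholders.
from datetime import datetime
from zoneinfo import ZoneInfo

dt = datetime(2018, 1, 19, 10, tzinfo=ZoneInfo('Europe/Moscow'))
# client.execute('INSERT INTO t (ts) VALUES', [(dt,)])
```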
I've had some issues using clickhouse_driver in a multithreaded asyncio environment. The connection does not seem to be thread-safe. I'm not sure this is the best approach to the issues I've encountered, but here is the pooled connection implementation I'm using:
import multiprocessing
import asyncio
import logging
from functools import partial
from concurrent.futures import ThreadPoolExecutor
from clickhouse_driver import Client
import threading

class ThreadSafeClient:
    def __init__(self, host, compression):
        self._host = host
        self._compression = compression
        self._loop = asyncio.get_event_loop()
        self._pool = {}

    async def get_conn(self):
        tid = threading.get_ident()
        logging.info(f"{tid} - get connection")
        if tid not in self._pool:
            self._pool[tid] = Client(self._host, compression=self._compression, client_name=f"mlcoin-api-{tid}")
            logging.info(f"{tid} - new connection created")
        return self._pool[tid]

    async def execute(self, statement, values=None, types_check=True):
        def execute_in_pool(statement, values, types_check):
            tid = threading.get_ident()
            if tid not in self._pool:
                self._pool[tid] = Client(self._host, compression=self._compression, client_name=f"mlcoin-api-{tid}")
                logging.info(f"{tid} - new connection created")
            conn = self._pool[tid]
            return conn.execute(statement, values, types_check=types_check)

        execute_stmt = partial(execute_in_pool, types_check=True)
        result = await self._loop.run_in_executor(
            None,
            execute_stmt,
            statement,
            values)
        return result
I have list data like this:
[['2018-10-10', 600]]
table schema:
create table example(
TestDate Date,
TestValue Int32
) engine=TinyLog
When I use clickhouse-driver to insert the data into the table, a type error occurs:
Code: 53. Type mismatch in VALUES section. Expected Date got <class 'str'>: 2018-10-10 for column "TestDate".
How do I solve it? Thanks!
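The error message indicates the driver expects datetime.date objects for Date columns rather than strings, so one fix is converting before inserting; the table and column names below come from the issue:

```python
# Convert the string dates to datetime.date before passing rows to execute().
from datetime import datetime

data = [['2018-10-10', 600]]
rows = [(datetime.strptime(d, '%Y-%m-%d').date(), v) for d, v in data]
# client.execute('INSERT INTO example (TestDate, TestValue) VALUES', rows)
```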
ClickHouse queries accept lots of settings, according to https://github.com/yandex/ClickHouse/blob/master/dbms/src/Interpreters/Settings.h#L30
If a setting is set to a value different from its default, it should be sent to the server.
It could be done in connection.send_query()
before write_binary_str('', self.fout) # end of query settings
Example of use:
c = client('server')
settings = Settings()
settings.replace_running_query = True
c.execute('SELECT 1', settings, query_id=a_query_id)
It seems that selecting large datasets using the native client is extremely slow. Here is my benchmark https://gist.github.com/dmitriyshashkin/6a4849bdcf882ba340cdfbc1990da401
Initially, I've encountered this behavior on my own dataset, but I was able to reproduce it using the dataset and the structure described here https://clickhouse.yandex/docs/en/getting_started/example_datasets/ontime/
To simplify things a little bit I've used the data for just one month: http://transtats.bts.gov/PREZIP/On_Time_On_Time_Performance_2017_12.zip
As you can see the fastest way to get the data is by using HTTP protocol with requests and pandas. The problem gets worse as the number of the rows grows, on my own dataset with 5M rows I waited for 1 hour before I had to interrupt the process. The bottleneck is not CH itself, the "top" command shows that all the work is done by python with 100% CPU utilization, while CH is almost idle.
Create table with enum field and insert a few rows
DROP TABLE IF EXISTS events;
CREATE TABLE events (
time DateTime('UTC'),
project Enum8('' = 0, 'first' = 1),
text String
) ENGINE = MergeTree() ORDER BY (time) PARTITION BY toYYYYMM(time);
INSERT INTO events VALUES(now(), 'first', 'descr1');
INSERT INTO events VALUES(now(), '', 'descr2');
Try to select with python driver:
import clickhouse_driver
client = clickhouse_driver.Client(host='127.0.0.1')
print(client.execute("SELECT * FROM events"))
Traceback (most recent call last):
File "/tmp/driver-test.py", line 4, in <module>
print(client.execute("SELECT * FROM events"))
File "/Users/name/pyenv/lib/python2.7/site-packages/clickhouse_driver/client.py", line 189, in execute
types_check=types_check, columnar=columnar
File "/Users/name/pyenv/lib/python2.7/site-packages/clickhouse_driver/client.py", line 246, in process_ordinary_query
columnar=columnar)
File "/Users/name/pyenv/lib/python2.7/site-packages/clickhouse_driver/client.py", line 82, in receive_result
self.receive_no_progress_result(result, columnar)
File "/Users/name/pyenv/lib/python2.7/site-packages/clickhouse_driver/client.py", line 109, in receive_no_progress_result
packet = self.receive_packet()
File "/Users/name/pyenv/lib/python2.7/site-packages/clickhouse_driver/client.py", line 140, in receive_packet
packet = self.connection.receive_packet()
File "/Users/name/pyenv/lib/python2.7/site-packages/clickhouse_driver/connection.py", line 334, in receive_packet
packet.block = self.receive_data()
File "/Users/name/pyenv/lib/python2.7/site-packages/clickhouse_driver/connection.py", line 393, in receive_data
block = self.block_in.read()
File "/Users/name/pyenv/lib/python2.7/site-packages/clickhouse_driver/streams/native.py", line 80, in read
self.fin)
File "/Users/name/pyenv/lib/python2.7/site-packages/clickhouse_driver/columns/service.py", line 67, in read_column
column = get_column_by_spec(column_spec, column_options=column_options)
File "/Users/name/pyenv/lib/python2.7/site-packages/clickhouse_driver/columns/service.py", line 45, in get_column_by_spec
return create_enum_column(spec, column_options)
File "/Users/name/pyenv/lib/python2.7/site-packages/clickhouse_driver/columns/enumcolumn.py", line 67, in create_enum_column
return cls(Enum(cls.ch_type, d), **column_options)
File "/Users/name/pyenv/lib/python2.7/site-packages/enum/__init__.py", line 349, in __call__
return cls._create_(value, names, module=module, type=type, start=start)
File "/Users/name/pyenv/lib/python2.7/site-packages/enum/__init__.py", line 462, in _create_
classdict[member_name] = member_value
File "/Users/name/pyenv/lib/python2.7/site-packages/enum/__init__.py", line 125, in __setitem__
if _is_sunder(key):
File "/Users/name/pyenv/lib/python2.7/site-packages/enum/__init__.py", line 80, in _is_sunder
return (name[0] == name[-1] == '_' and
IndexError: string index out of range
First of all, great to have a native driver!
I'd like to use the clickhouse driver from asyncio; how should I implement an asyncio connection?
Would it be enough to provide asynchronous versions of Connection and Client?
Hi, thanks for the previous patches about settings!
Now, according to
https://github.com/yandex/ClickHouse/blob/master/dbms/src/Interpreters/Settings.h
https://github.com/yandex/ClickHouse/blob/master/dbms/src/Interpreters/Limits.h
ClickHouse also allows setting limits such as 'max_memory_usage' or 'max_execution_time', which can be very useful.
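For what it's worth, recent driver versions accept these through the settings dict on execute(); which limits are honored depends on the server version, and the query/table below are placeholders:

```python
# Per-query limits passed via the settings dict (a real execute() parameter
# in recent clickhouse-driver releases).
settings = {
    'max_execution_time': 30,             # seconds
    'max_memory_usage': 10 * 1024 ** 3,   # 10 GiB
}
# rows = client.execute('SELECT count() FROM big_table', settings=settings)
```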
Is there any way for a SELECT query to return a row iterator instead of loading the whole result into memory? Thank you!
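Newer clickhouse-driver releases expose execute_iter() for exactly this, yielding rows lazily; availability depends on the installed version. A hedged fallback helper (an illustration, not part of the driver) could look like:

```python
# Prefer execute_iter when the installed driver provides it; otherwise fall
# back to iterating the fully materialized list from execute().
def iter_rows(client, query, settings=None):
    if hasattr(client, 'execute_iter'):
        return client.execute_iter(query, settings=settings)
    return iter(client.execute(query, settings=settings))


# Usage (not run here):
# for row in iter_rows(client, 'SELECT * FROM big_table',
#                      settings={'max_block_size': 100000}):
#     process(row)
```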
Hi,
I can execute:
client.execute('SELECT * FROM test3')
or
client.execute('SELECT a, b, c FROM test3')
and I will get a list of tuples (one tuple per row), but no column names.
Is it somehow possible to get all column names (named tuples, or something similar) along with the data, to simplify further data manipulation?
For example, after getting the result set from client.execute('SELECT * FROM test3'), I have to manually find out all the column names and then provide that information to the pandas DataFrame.
Regards.
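One way to get named access, assuming with_column_types=True is available in the installed version (it is a real execute() option); named_rows is a helper invented here:

```python
# Combine with_column_types=True with collections.namedtuple so rows expose
# columns as attributes.
from collections import namedtuple


def named_rows(result_with_types, name='Row'):
    rows, columns = result_with_types
    Row = namedtuple(name, [col for col, _type in columns])
    return [Row(*r) for r in rows]


# Usage (not run here):
# result = client.execute('SELECT * FROM test3', with_column_types=True)
# for row in named_rows(result):
#     print(row.a, row.b)
```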
According to the ClickHouse documentation, an exception should be raised if input_format_skip_unknown_fields is set to false:
https://clickhouse.yandex/docs/en/operations/settings/settings/#input_format_skip_unknown_fields
I can reproduce this behavior as expected with clickhouse-client, but clickhouse-driver seems to ignore the setting. In the example below, "z" is not part of the schema.
clickhouse-driver==0.0.10
ClickHouse server version 1.1.54362
for skip in (True, False):
    print('Skip Unknown:', skip)
    record = {'x': 1, 'y': 2, 'z': 3}
    client.execute('DROP TABLE IF EXISTS test')
    client.execute('CREATE TABLE test (x Int32, y Int32) ENGINE=Memory')
    client.execute('INSERT INTO test FORMAT JSONEachRow', [record],
                   settings={'input_format_skip_unknown_fields': skip})
    print(client.execute('SELECT * FROM test'))
output (no exception raised in either case):
('Skip Unknown:', True)
[(1, 2)]
('Skip Unknown:', False)
[(1, 2)]
clickhouse-client - skip unknown fields
$ clickhouse-client -n --query='DROP TABLE IF EXISTS test; CREATE TABLE test (x Int32, y Int32) ENGINE=Memory; set input_format_skip_unknown_fields=1; INSERT INTO test FORMAT JSONEachRow' <<< '{"x": 1, "y": 2, "z": 3}'
$ clickhouse-client -E --query='SELECT * FROM test'
Row 1:
──────
x: 1
y: 2
clickhouse-client - don't skip unknown fields
$ clickhouse-client -n --query='DROP TABLE IF EXISTS test; CREATE TABLE test (x Int32, y Int32) ENGINE=Memory; set input_format_skip_unknown_fields=0; INSERT INTO test FORMAT JSONEachRow' <<< '{"x": 1, "y": 2, "z": 3}'
Error on processing query: INSERT INTO test FORMAT JSONEachRow
Code: 117, e.displayText() = DB::Exception: Unknown field found while parsing JSONEachRow format: z, e.what() = DB::Exception, Stack trace:
0. clickhouse-client(StackTrace::StackTrace()+0x16) [0x4d32516]
1. clickhouse-client(DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x1f) [0x26ae89f]
2. clickhouse-client(DB::JSONEachRowRowInputStream::read(std::vector<COWPtr<DB::IColumn>::mutable_ptr<DB::IColumn>, std::allocator<COWPtr<DB::IColumn>::mutable_ptr<DB::IColumn> > >&)+0x1401) [0x4a16ee1]
3. clickhouse-client(DB::BlockInputStreamFromRowInputStream::readImpl()+0x88) [0x4c37388]
4. clickhouse-client(DB::IProfilingBlockInputStream::read()+0x237) [0x408f377]
5. clickhouse-client(DB::AsynchronousBlockInputStream::calculate(MemoryTracker*)+0x47) [0x26c1327]
6. clickhouse-client(ThreadPool::worker()+0x167) [0x4facd67]
7. clickhouse-client() [0x53f6f1f]
8. /lib64/libpthread.so.0(+0x7e25) [0x7f35d8f0fe25]
9. /lib64/libc.so.6(clone+0x6d) [0x7f35d82c534d]