ildus / clickhouse_fdw

ClickHouse FDW for PostgreSQL

License: Apache License 2.0

CMake 0.64% C 20.41% C++ 78.38% Dockerfile 0.06% Shell 0.06% Go 0.23% PLpgSQL 0.15% Starlark 0.07%
clickhouse fdw postgresql pushdown http binary

clickhouse_fdw's People

Contributors: a-bro, alrocar, daniel-garcia, deem0n, denchick, gh56123, ildus, javisantana, laurenz, peter-sh, x4m, za-arthur, zzn01

clickhouse_fdw's Issues

Unknown function btrim

Hi guys.

I tried to use the function trim in the WHERE clause in Postgres and got the error message: "Unknown function btrim QUERY:SELECT field FROM "default".test_table WHERE ((btrim(field) <> ''))"

How to reproduce:

  1. Create a table in ClickHouse:
create table test_table (field String) engine = MergeTree() ORDER BY (field);
  2. Add a value to the table:
insert into test_table values ('213');
  3. Create the foreign table in Postgres:
create foreign table test_table ("field" text) server clickhouse_svr;
  4. Select data, using trim in the WHERE clause:
select * from test_table where trim(field) <> '';

I checked on versions:

  • PostgreSQL 12.3 (Debian 12.3-1.pgdg110+1+b1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 9.3.0-14) 9.3.0, 64-bit
  • ClickHouse version 20.5.4.40
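
For reference, ClickHouse's own equivalent of btrim is trimBoth() (available in ClickHouse 20.x), so a query that should succeed when run directly against ClickHouse, and which a fixed pushdown would need to emit, looks like this:

SELECT field FROM "default".test_table WHERE trimBoth(field) <> ''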

GROUP BY with date_part or extract fails

Hi.

First of all, thank you for the work you are doing. This FDW has been far more stable and performed far better than the one we used before.

We have seen a fault with the date_part (extract) function in the GROUP BY pushdown to ClickHouse.

A request like this works fine:
select extract('year' from "startDateTime")
from ran.kpi_cell_data
where "startDateTime"::date >= '2019/09/28'::date
and "startDateTime"::date <= '2019/09/30'::date
and site = '0001'

But the following will fail with the response (ERROR: clickhouse_fdw: DB::Exception: Unknown function date_part):

select date_part('day',"startDateTime")
from ran.kpi_cell_data
where "startDateTime"::date >= '2019/09/28'::date
and "startDateTime"::date <= '2019/09/30'::date
and site = '0001'
group by date_part('day',"startDateTime")

as does this one:

select extract('year' from "startDateTime")
from ran.kpi_cell_data
where "startDateTime"::date >= '2019/09/28'::date
and "startDateTime"::date <= '2019/09/30'::date
and site = '0001'
group by extract('year' from "startDateTime")

We have tried multiple options and settings, but it seems to be an issue in the mapping of the functions.
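
A workaround that may avoid the broken pushdown is to group locally in PostgreSQL over a subquery (table and column names taken from the failing queries above; whether the planner still attempts the remote grouping depends on the FDW):

select y, count(*)
from (
    select extract('year' from "startDateTime") as y
    from ran.kpi_cell_data
    where "startDateTime"::date >= '2019/09/28'::date
    and "startDateTime"::date <= '2019/09/30'::date
    and site = '0001'
) s
group by y;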

Aggregates argMin, argMax

Is there any way to implement ClickHouse-specific aggregate functions in the FDW? Ones like argMin and argMax would be very useful.
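
For context, argMax(a, b) in ClickHouse returns the value of a on the row where b is maximal; a hypothetical example using the tax_bills_nyc schema that appears later in these issues:

SELECT tax_class, argMax(owner_name, tbea) FROM tax_bills_nyc GROUP BY tax_class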

Support LIMIT OPTION for SQL SELECT QUERY

When executing the following SQL in postgres

select * from ran.ids limit 100;

QUERY PLAN
Limit (cost=0.00..0.00 rows=1 width=128)
-> Foreign Scan on ids (cost=0.00..0.00 rows=0 width=128)

The following is passed down to ClickHouse to execute:
SELECT vendor, node, "nodeType", "ID" FROM "RAN"."IDs"

The "LIMIT 100" query option is not past
This causes the server to provide the full data set to the client, which is not requested

Clickhouse log

2020.02.07 07:46:50.890220 [ 832 ] {} TCPHandler: Connected ClickHouse client version 1.1.0, revision: 54126, database: RAN, user: ran_user.
2020.02.07 07:46:50.891945 [ 832 ] {43b66dc7-cbee-4cd6-bef9-2507818ea23b} executeQuery: (from 172.28.247.16:51884, user: ran_user) SELECT vendor, node, "nodeType", "ID" FROM "RAN"."IDs"
2020.02.07 07:46:50.892903 [ 832 ] {43b66dc7-cbee-4cd6-bef9-2507818ea23b} RAN.IDs (SelectExecutor): Key condition: unknown
2020.02.07 07:46:50.892956 [ 832 ] {43b66dc7-cbee-4cd6-bef9-2507818ea23b} RAN.IDs (SelectExecutor): Selected 1 parts by date, 1 parts by key, 1537 marks to read from 1 ranges
2020.02.07 07:46:50.895235 [ 832 ] {43b66dc7-cbee-4cd6-bef9-2507818ea23b} executeQuery: Query pipeline:
Union
Expression × 32
Expression
MergeTreeThread

2020.02.07 07:46:50.908439 [ 31 ] {} DiskSpaceMonitor: Reserving 1.00 MiB on disk default, having unreserved 2.92 TiB.
2020.02.07 07:46:50.912089 [ 16 ] {} system.trace_log (MergerMutator): Selected 3 parts from 202002_70208_70208_0 to 202002_70210_70210_0
2020.02.07 07:46:50.912305 [ 16 ] {} DiskSpaceMonitor: Reserving 1.00 MiB on disk default, having unreserved 2.92 TiB.
2020.02.07 07:46:50.912373 [ 16 ] {} system.trace_log (MergerMutator): Merging 3 parts: from 202002_70208_70208_0 to 202002_70210_70210_0 into tmp_merge_202002_70208_70210_1
2020.02.07 07:46:50.913084 [ 16 ] {} system.trace_log (MergerMutator): Selected MergeAlgorithm: Horizontal
2020.02.07 07:46:50.917649 [ 16 ] {} system.trace_log (MergerMutator): Merge sorted 518 rows, containing 7 columns (7 merged, 0 gathered) in 0.01 sec., 98466.68 rows/sec., 14.03 MB/sec.
2020.02.07 07:46:58.412786 [ 31 ] {} DiskSpaceMonitor: Reserving 1.00 MiB on disk default, having unreserved 2.92 TiB.
2020.02.07 07:47:03.976250 [ 832 ] {43b66dc7-cbee-4cd6-bef9-2507818ea23b} executeQuery: Read 12531115 rows, 2.10 GiB in 13.084 sec., 957733 rows/sec., 164.20 MiB/sec.
2020.02.07 07:47:03.976336 [ 832 ] {43b66dc7-cbee-4cd6-bef9-2507818ea23b} MemoryTracker: Peak memory usage (for query): 257.36 MiB.
2020.02.07 07:47:03.978453 [ 832 ] {43b66dc7-cbee-4cd6-bef9-2507818ea23b} MemoryTracker: Peak memory usage (total): 257.36 MiB.
2020.02.07 07:47:03.978522 [ 832 ] {43b66dc7-cbee-4cd6-bef9-2507818ea23b} TCPHandler: Processed in 13.088 sec.
2020.02.07 07:47:05.915827 [ 31 ] {} DiskSpaceMonitor: Reserving 1.00 MiB on disk default, having unreserved 2.92

Installation clickhouse_fdw

Hi! The command "cmake .." fails with the error: "CMake Error at src/CMakeLists.txt:39 (target_compile_features): target_compile_features specified unknown feature "c_std_11" for target
"clickhouse_fdw"". Which C/C++ compilers are best for clickhouse_fdw? ALT Linux, postgresql11, gcc5, gcc5-c++, libcurl and libuuid are installed on the server. (The c_std_11 compile feature is only known to CMake 3.8 and newer, so an outdated CMake is a likely cause.)

Could you add "epoch" support in group by?

Hi! I'm trying to run the query

select extract(epoch from msgtime), count(*) from table group by 1

and getting message

date_part cannot be exported for: epoch

source msgtime format is 2020-01-15 22:50:07

Could you add support for epoch?
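
For reference, the ClickHouse counterpart of extract(epoch from ...) is toUnixTimestamp(), so the remote query a fixed mapping would need to emit is roughly:

SELECT toUnixTimestamp(msgtime), count(*) FROM table GROUP BY toUnixTimestamp(msgtime)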

GROUP BY in joined subqueries

The FDW seems not to push down a JOIN of subqueries if they contain a GROUP BY.

SELECT * FROM 
(
SELECT sale_id FROM sales
) a
JOIN
(
SELECT sale_id, product_id FROM products
) b ON a.sale_id = b.sale_id

Let's say both tables (sales and products) are foreign tables.
The previous query seems to work well, but the next one does not push down the join.

SELECT * FROM 
(
SELECT sale_id FROM sales
GROUP BY  sale_id
) a
JOIN
(
SELECT sale_id, product_id FROM products
) b ON a.sale_id = b.sale_id

Is there any way to change this behaviour?

Clickhouse database in IMPORT FOREIGN SCHEMA

Hi.

Do I understand correctly that when doing IMPORT FOREIGN SCHEMA "x" FROM SERVER clickhouse INTO ch the "x" is ignored?

And that if I want to import from a specific Clickhouse database (equivalent, more or less, to a Postgres schema), then I need to CREATE SERVER for each Clickhouse database?

This seems like an unintuitive workflow with poor usability; is there any chance this could be changed?

Thanks.

Array insert error (HV004)

Getting an error when attempting to insert array data type.

ClickHouse (19.14.3.3):

CREATE TABLE arr_test (
    ts Date, 
    arr Array(Nullable(String))
) ENGINE = MergeTree() PARTITION BY toYYYYMM(ts) ORDER BY (ts)

PostgreSQL (12.2):

CREATE SCHEMA click;
IMPORT FOREIGN SCHEMA "default" LIMIT TO ("arr_test") 
    FROM SERVER clickhouse_svr INTO click;
INSERT INTO click.arr_test SELECT now()::date, '{test}'::text[];

Message:

ERROR:  cannot convert constant value to clickhouse value
HINT:  Constant value data type: 1009
SQL state: HV004

(Type OID 1009 is PostgreSQL's text[] array type.)

Clickhouse database in IMPORT FOREIGN SCHEMA

#40 is closed as fixed, but it doesn't really work as intended:

CREATE SERVER clickhouse_my_stuff FOREIGN DATA WRAPPER clickhouse_fdw OPTIONS(host '10.2.3.4', port '8123');    -- OK
CREATE USER MAPPING FOR CURRENT_USER SERVER clickhouse_my_stuff OPTIONS (user 'readonly', password 'qwerty');    -- OK
CREATE SCHEMA my_stuff;    -- OK
IMPORT FOREIGN SCHEMA "some_db" FROM SERVER clickhouse_my_stuff INTO my_stuff;    -- OK

SELECT * FROM my_stuff.some_table;    -- ERROR:
ERROR:  clickhouse_fdw:Code: 60, e.displayText() = DB::Exception: Table default.some_table doesn't exist.
QUERY:SELECT a, b, c FROM "default".some_table

compilation error: ‘runtime_error’ is not a member of ‘std’

With the following versions:
ClickHouse server version 20.4.2.9 (official build).
gcc (GCC) 10.1.0
psql (PostgreSQL) 12.2

The cmake runs just fine:

-- The C compiler identification is GNU 10.1.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc - works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Setting clickhouse_fdw build type -
-- The CXX compiler identification is GNU 10.1.0
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ - works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PkgConfig: /usr/bin/pkg-config (found version "1.6.3")
-- Checking for module 'libcurl'
--   Found libcurl, version 7.70.0
-- Checking for module 'uuid'
--   Found uuid, version 2.35.1
-- Configuring done
-- Generating done
-- Build files have been written to: /home/ctlaltdefeat/clickhouse_fdw/build

But I get the following error when running make:

Scanning dependencies of target clickhouse_fdw_sql
[  1%] Generating ../clickhouse_fdw--1.2.sql
[  1%] Built target clickhouse_fdw_sql
Scanning dependencies of target lz4-lib
[  3%] Building C object src/clickhouse-cpp/contrib/lz4/CMakeFiles/lz4-lib.dir/lz4.c.o
[  4%] Building C object src/clickhouse-cpp/contrib/lz4/CMakeFiles/lz4-lib.dir/lz4hc.c.o
[  6%] Linking C static library liblz4-lib.a
[  6%] Built target lz4-lib
Scanning dependencies of target cityhash-lib
[  7%] Building CXX object src/clickhouse-cpp/contrib/cityhash/CMakeFiles/cityhash-lib.dir/city.cc.o
[  9%] Linking CXX static library libcityhash-lib.a
[  9%] Built target cityhash-lib
Scanning dependencies of target clickhouse-cpp-lib-static
[ 10%] Building CXX object src/clickhouse-cpp/clickhouse/CMakeFiles/clickhouse-cpp-lib-static.dir/base/coded.cpp.o
[ 12%] Building CXX object src/clickhouse-cpp/clickhouse/CMakeFiles/clickhouse-cpp-lib-static.dir/base/compressed.cpp.o
[ 14%] Building CXX object src/clickhouse-cpp/clickhouse/CMakeFiles/clickhouse-cpp-lib-static.dir/base/input.cpp.o
[ 15%] Building CXX object src/clickhouse-cpp/clickhouse/CMakeFiles/clickhouse-cpp-lib-static.dir/base/output.cpp.o
[ 17%] Building CXX object src/clickhouse-cpp/clickhouse/CMakeFiles/clickhouse-cpp-lib-static.dir/base/platform.cpp.o
[ 18%] Building CXX object src/clickhouse-cpp/clickhouse/CMakeFiles/clickhouse-cpp-lib-static.dir/base/socket.cpp.o
[ 20%] Building CXX object src/clickhouse-cpp/clickhouse/CMakeFiles/clickhouse-cpp-lib-static.dir/columns/array.cpp.o
[ 21%] Building CXX object src/clickhouse-cpp/clickhouse/CMakeFiles/clickhouse-cpp-lib-static.dir/columns/date.cpp.o
[ 23%] Building CXX object src/clickhouse-cpp/clickhouse/CMakeFiles/clickhouse-cpp-lib-static.dir/columns/decimal.cpp.o
[ 25%] Building CXX object src/clickhouse-cpp/clickhouse/CMakeFiles/clickhouse-cpp-lib-static.dir/columns/enum.cpp.o
[ 26%] Building CXX object src/clickhouse-cpp/clickhouse/CMakeFiles/clickhouse-cpp-lib-static.dir/columns/factory.cpp.o
In file included from /home/ctlaltdefeat/clickhouse_fdw/src/clickhouse-cpp/clickhouse/columns/factory.cpp:12:
/home/ctlaltdefeat/clickhouse_fdw/src/clickhouse-cpp/clickhouse/columns/nothing.h: In member function ‘virtual void clickhouse::ColumnNothing::Save(clickhouse::CodedOutputStream*)’:
/home/ctlaltdefeat/clickhouse_fdw/src/clickhouse-cpp/clickhouse/columns/nothing.h:46:20: error: ‘runtime_error’ is not a member of ‘std’
   46 |         throw std::runtime_error("method Save is not supported for Nothing column");
      |                    ^~~~~~~~~~~~~
make[2]: *** [src/clickhouse-cpp/clickhouse/CMakeFiles/clickhouse-cpp-lib-static.dir/build.make:213: src/clickhouse-cpp/clickhouse/CMakeFiles/clickhouse-cpp-lib-static.dir/columns/factory.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:284: src/clickhouse-cpp/clickhouse/CMakeFiles/clickhouse-cpp-lib-static.dir/all] Error 2
make: *** [Makefile:150: all] Error 2

(GCC 10 no longer includes <stdexcept> transitively through other headers, so nothing.h most likely needs an explicit #include <stdexcept>.)

Problem with timestamp selecting mapped in PostgreSQL

Hello there!

Thanks for your great job! We really like it!

We ran into an issue related to selecting a DateTime field from ClickHouse.

In the ClickHouse table we have a field msgtime defined as the ClickHouse DateTime type. Example data: │ 2019-12-13 22:46:30 │

In the PostgreSQL foreign table we defined the msgtime field as time without time zone.

When we do SELECT msgtime FROM security_logs LIMIT 10;
we get only the time, without the date: │ 22:46:30 │
But we expected to get │ 2019-12-13 22:46:30 │

Why does this happen?

Below is the information about the CH schema:
│ CREATE TABLE SEQUENCEDATA.security_logs ( msgdate Date, string1 Nullable(String), string2 Nullable(String), string3 Nullable(String), string4 Nullable(String), string5 Nullable(String), string6 Nullable(String), string7 Nullable(String), string8 Nullable(String), string9 Nullable(String), string10 Nullable(String)) ENGINE = MergeTree(msgdate, (uniqid, pattern_id, device_ip), 8192) │

Thanks!
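
A likely fix, assuming the intent is to map the ClickHouse DateTime column one-to-one: declare msgtime as timestamp rather than time, since time without time zone stores only the time of day and discards the date part. A minimal sketch:

CREATE FOREIGN TABLE security_logs (
    msgtime timestamp without time zone
    -- remaining columns as before
) SERVER clickhouse_svr;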

INSERT INTO SELECT operation fill all memory

Hi!
There is a table with 100 rows in PostgreSQL; the table is very small.
I'm attempting to perform an INSERT INTO SELECT operation (SELECT from the PG table, INSERT INTO ClickHouse) but get an error:
2020-06-23 16:29:04.937 MSK [1571]: [3-1] db=usn_core,tx=0,user=usn_core,app=[unknown],client=10.0.. ERROR: clickhouse_fdw:Code: 242, e.displayText() = DB::Exception: Table is in readonly mode
2020-06-23 16:29:04.937 MSK [1571]: [4-1] db=usn_core,tx=0,user=usn_core,app=[unknown],client=10.0.. DETAIL: query: INSERT INTO table_B(col_1, col_2, col_3) FORMAT TSV
1 2020-02-06 \N
2 2020-02-06 \N
....
2020-06-23 16:29:04.937 MSK [1571]: [5-1] db=usn_core,tx=0,user=usn_core,app=[unknown],client=10.0.. CONTEXT: SQL statement "insert into table_B select * from table_A"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed

0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 152 0 0 100 152 0 20479 --:--:-- --:--:-- --:--:-- 21714
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed

0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 124 0 0 100 124 0 30053 --:--:-- --:--:-- --:--:-- 31000
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed

0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 33.7M 0 33.7M 100 110 50.7M 165 --:--:-- --:--:-- --:--:-- 50.7M
2020-06-23 16:29:06.173 MSK [6304]: [2-1] db=template1,tx=0,user=postgres,app=[unknown],client=[local] LOG: connection authorized: user=postgres database=template1
2020-06-23 16:29:06.178 MSK [6304]: [3-1] db=template1,tx=0,user=postgres,app=psql,client=[local] LOG: disconnection: session time: 0:00:00.009 user=postgres database=template1 host=[local]
2020-06-23 16:29:06.381 MSK [6350]: [1-1] db=[unknown],tx=0,user=[unknown],app=[unknown],client=[local] LOG: connection received: host=[local]
2020-06-23 16:29:06.384 MSK [6350]: [2-1] db=template1,tx=0,user=postgres,app=[unknown],client=[local] LOG: connection authorized: user=postgres database=template1
2020-06-23 16:29:06.393 MSK [6350]: [3-1] db=template1,tx=0,user=postgres,app=psql,client=[local] LOG: disconnection: session time: 0:00:00.012 user=postgres database=template1 host=[local]

100 44.5M 0 44.5M 100 110 26.7M 66 0:00:01 0:00:01 --:--:-- 26.7M
................................................................................
100 31.6G 0 31.6G 0 110 252M 0 --:--:-- 0:02:08 --:--:-- 120M
2020-06-23 16:31:14.484 MSK [16360]: [1-1] db=[unknown],tx=0,user=[unknown],app=[unknown],client=[local] LOG: connection received: host=[local]
2020-06-23 16:31:14.493 MSK [16360]: [2-1] db=template1,tx=0,user=postgres,app=[unknown],client=[local] LOG: connection authorized: user=postgres database=template1
2020-06-23 16:31:14.578 MSK [16360]: [3-1] db=template1,tx=0,user=postgres,app=psql,client=[local] LOG: disconnection: session time: 0:00:00.098 user=postgres database=template1 host=[local]
2020-06-23 16:31:15.250 MSK [16392]: [1-1] db=[unknown],tx=0,user=[unknown],app=[unknown],client=[local] LOG: connection received: host=[local]
2020-06-23 16:31:15.254 MSK [16392]: [2-1] db=template1,tx=0,user=postgres,app=[unknown],client=[local] LOG: connection authorized: user=postgres database=template1
2020-06-23 16:31:15.285 MSK [16392]: [3-1] db=template1,tx=0,user=postgres,app=psql,client=[local] LOG: disconnection: session time: 0:00:00.035 user=postgres database=template1 host=[local]
2020-06-23 16:31:17.798 MSK [930]: [6-1] db=,tx=0,user=,app=,client= LOG: server process (PID 1571) was terminated by signal 9: Killed
2020-06-23 16:31:17.798 MSK [930]: [7-1] db=,tx=0,user=,app=,client= DETAIL: Failed process was running: select receipt_details()
2020-06-23 16:31:17.798 MSK [930]: [8-1] db=,tx=0,user=,app=,client= LOG: terminating any other active server processes
2020-06-23 16:31:17.798 MSK [11195]: [3-1] db=usn_core,tx=0,user=usn_core,app=[unknown],client=10.253.45.141 WARNING: terminating connection because of crash of another server process
2020-06-23 16:31:17.798 MSK [11195]: [4-1] db=usn_core,tx=0,user=usn_core,app=[unknown],client=10.253.45.141 DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-06-23 16:31:17.798 MSK [11195]: [5-1] db=usn_core,tx=0,user=usn_core,app=[unknown],client=10.253.45.141 HINT: In a moment you should be able to reconnect to the database and repeat your command.
.................................................................................
2020-06-23 16:31:17.892 MSK [930]: [10-1] db=,tx=0,user=,app=,client= LOG: database system is shut down

Error when working with uppercase letters

Example:
#Clickhouse version
create table Messages
(
LineTopic String,
weight Float64,
length Float64,
task_speed Float64,
density Float64,
ts datetime
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(ts)
ORDER BY (LineTopic , ts)

#Postgres version
CREATE FOREIGN TABLE Messages(
LineTopic text,
weight numeric(10, 2),
length numeric(10, 2),
task_speed numeric(10, 2),
density numeric(10, 2),
ts timestamp(0)
) SERVER clickhouse_svr OPTIONS (table_name 'Messages');

Select * from Messages Limit 5;
Result:
ERROR: clickhouse_fdw:Code: 47, e.displayText() = DB::Exception: Missing columns: 'linetopic' while processing query: 'SELECT linetopic, weight, length, task_speed, density, ts FROM mlines.Messages ', required columns: 'weight' 'linetopic' 'density' 'length' 'task_speed' 'ts', source columns: 'density' 'ts' 'task_speed' 'length' 'LineTopic' 'weight'
QUERY:SELECT linetopic, weight, length, task_speed, density, ts FROM mlines."Messages"

When I replaced LineTopic with line_topic, it worked.
Should it be so?
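
A likely fix, given that PostgreSQL folds unquoted identifiers to lowercase: quote the mixed-case column names in the foreign table definition so they reach ClickHouse verbatim, just as the table name already does via the table_name option:

CREATE FOREIGN TABLE "Messages"(
"LineTopic" text,
weight numeric(10, 2),
length numeric(10, 2),
task_speed numeric(10, 2),
density numeric(10, 2),
ts timestamp(0)
) SERVER clickhouse_svr OPTIONS (table_name 'Messages');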

communication error: No URL set!

After installing clickhouse_fdw I tried the example from the "Usage" part of the README, but when I run "SELECT bbl,tbea,bav,insertion_date FROM tax_bills_nyc LIMIT 5;"
Postgres returns:
clickhouse_fdw: communication error: No URL set!

Where am I wrong?
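
The "No URL set" message suggests the http driver has no host to build a URL from. A server definition that sets it explicitly might look like this (option names follow the examples elsewhere in these issues; the host and port values are assumptions):

CREATE SERVER clickhouse_svr FOREIGN DATA WRAPPER clickhouse_fdw
    OPTIONS (dbname 'default', driver 'http', host '127.0.0.1', port '8123');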

ERROR: clickhouse_fdw: unimplemented 13228

Hi there!

I'm trying to create a table schema:

CREATE FOREIGN TABLE table_1 (
  uuid text,
  pattern_id integer,
  device_ip text,       
  msgdate date,                   
  raw_log text,             
  "TokenElevationType" text,        
  "EAPReasonCode" text,             
  srcip text,                     
  srcmac text,          
  srcemail text,              
  "Security" text,                  
  "TargetLogonId" text,            
  "TargetDomainName" text,          
  "Opcode" integer,                    
  "MemberSid" text,
  "AuthenticationPackageName" text,
  srcuser text,
  dstgroup text,             
  "SamAccountName" text,
  "PreviousTime" text,      
  "Execution" text,        
  priority integer,           
  dstemail text,        
  "KeyLength" text,            
  "SubjectLogonId" text,
  "ErrorCode" text,
  "EAPErrorCode" text,
  srchost text,
  "TargetLinkedLogonId" text,
  "RestrictedAdminMode" text,  
  "SidHistory" text,
  "ActivityID" text,
  outiface text,               
  policyid integer,            
  "Keywords" text,
  "CommandLine" text,
  "ScriptPath" text,
  "SubjectDomainName" text,
  "SessionName" text,
  "Identity" text,
  msgtime time without time zone,
  dsthost text,
  pktssent integer,             
  "AllowedToDelegateTo" text,
  "NewUacValue" text,
  "ProviderName" text,
  "PeerMac" text,
  "ProviderGuid" text,             
  srcuid integer,        
  status text,          
  bytessent integer,             
  srcipnat text,        
  "AccountExpires" text,
  "PrivilegeList" text,
  command text,       
  reason text,             
  "EventID" integer,              
  "NewProcessId" text,
  "LogonType" text,
  srcportnat integer,           
  "UserWorkstations" text,
  "HomePath" text,
  "PasswordLastSet" text,
  severity integer,
  "Task" integer,
  "UserParameters" text,
  srczone text,      
  "ClientName" text,
  "DisplayName" text,
  dstdomain text,
  "ProcessName" text,
  "UserPrincipalName" text,
  "AccountName" text,
  "AccessMask" text,
  dstipnat text,
  dstuser text, 
  "Event" text,
  "LocalMac" text,
  "TransmittedServices" text,       
  "Computer" text, 
  dstip text,            
  dstportnat INTEGER,
  "NewTime" text,
  "ImpersonationLevel" text,
  dstgid integer,  
  "System" text,
  "ReasonCode" text,               
  "OldUacValue" TEXT, 
  "ShareName" TEXT,         
  "TargetUserSid" TEXT,
  appvendor TEXT,
  object TEXT,
  "AccessList" TEXT,
  "Level" INTEGER,
  appname TEXT,            
  "ProcessId" TEXT,
  "SSID" TEXT,
  "LogonProcessName" TEXT,
  "ObjectType" TEXT,
  "NewProcessName" TEXT,
  apphost TEXT,      
  "VirtualAccount" TEXT,
  "WorkstationName" TEXT,
  "UserAccountControl" TEXT,
  "HomeDirectory" TEXT,
  protocol TEXT,
  action TEXT,
  "ProfilePath" TEXT,
  "ClientAddress" TEXT,
  "IpPort" TEXT,
  srcgroup TEXT,
  "PrimaryGroupId" TEXT,
  "ParentProcessName" TEXT,
  "LmPackageName" TEXT,
  msgid TEXT,             
  srcdomain TEXT,
  duration INTEGER,
  "ReasonText" text,
  "IntfGuid" TEXT,
  "IpAddress" TEXT,
  "ElevatedToken" TEXT,
  "LogonHours" TEXT,
  appip TEXT,
  dstport INTEGER,        
  dstuid INTEGER,
  bytesrecv INTEGER,
  "ServiceShutdown" TEXT,
  "TargetOutboundDomainName" TEXT,
  "Version" INTEGER,               
  "MemberName" TEXT,
  "ShareLocalPath" TEXT,
  dstzone TEXT,
  method TEXT,
  "AccountDomain" TEXT,
  "LogFileCleared" TEXT,
  "Channel" TEXT,
  srcport INTEGER,
  sessionid INTEGER,
  "MandatoryLabel" TEXT,
  pktsrecv INTEGER,
  "LogonID" TEXT,
  "SubjectUserSid" TEXT,
  "ThreadId" INTEGER,
  "Correlation" TEXT,
  "LogonGuid" TEXT,
  srcgid INTEGER,
  "TargetOutboundUserName" TEXT,
  "EapRootCauseString" TEXT,
  dstmac TEXT,
  iniface TEXT,
  "TargetUserName" TEXT,
  "SubjectUserName" TEXT,
  "EventRecordID" INTEGER,
  "TargetSid" TEXT,
  string1 TEXT,
  string2 TEXT,
  string3 TEXT,
  string4 TEXT,
  string5 TEXT,
  string6 TEXT,
  string7 TEXT,
  string8 TEXT,
  string9 TEXT,
  string10 TEXT
) SERVER clickhouse_svr;

and after creating it I run a query:

select * from table_1 LIMIT 10;

and get an error:

ERROR:  clickhouse_fdw: unsupported column type: f��f��f��f��

And if I try the same select one more time, I get another error:

ERROR:  clickhouse_fdw: unimplemented 13228

Could you help with why this is happening? I noticed that if I create the schema with just the first two fields, everything is okay!

Problem with cursors

Hi,

I'm using Tableau connected to a PostgreSQL server; when I try to apply some filters it uses cursors:

BEGIN;declare "SQL_CUR000001DB5D2F8590" cursor with hold for SELECT CAST("table1"."p1" AS TEXT) AS "p1"
        FROM "public"."table1" "table1"
        GROUP BY 1
        ORDER BY 1 ASC NULLS FIRST;fetch 2048 in "SQL_CUR000001DB5D2F8590"

When I check the query in ClickHouse, I see that the full statement has not been sent:
SELECT p1 FROM "default".table1 GROUP BY p1

Would it be possible to push the entire request down to ClickHouse?

Thanks!

Support limit pushdown

In some cases, even after grouping by certain column(s), ClickHouse might still return a lot of rows, so performing the LIMIT on the ClickHouse side would be beneficial.

Password must be URL-encoded

Hi.
I set my ClickHouse password to 'odiF2t11#' and, using the http driver, every query returned an error.
After URL-encoding the password manually, it was OK.
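
A sketch of the workaround the reporter describes, percent-encoding the '#' (0x23) so the http driver can safely embed the password in the URL (option names follow the user-mapping examples elsewhere in these issues):

CREATE USER MAPPING FOR CURRENT_USER SERVER clickhouse_svr
    OPTIONS (user 'default', password 'odiF2t11%23');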

support `COUNT(DISTINCT col1) FILTER (WHERE col2='xx')`

COUNT(col1) FILTER (WHERE col2='xx') will translate to countIf(col1, (col2='xx'))

but with DISTINCT, it tries to fetch much more data from ClickHouse.
It seems that
COUNT(DISTINCT col1) FILTER (WHERE col2='xx') could translate to uniqExactIf(col1, col2='xx')

make error

Hi, Ildus,

I want to try your FDW, but I got the following ERROR when running 'make' on the master branch. Anything wrong?

[ 87%] Built target clickhouse_fdw_sql
make[1]: *** No rule to make target 'src/CMakeFiles/install.dir/all', needed by 'src/CMakeFiles/clickhouse_fdw.dir/all'. Stop.
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2

Yineng

Push functions down to clickhouse layer?

If I am interacting with DateTime64(x) types, I typically need either to cast a date to them via toDateTime64() or to cast them to a DateTime. However, I cannot use the toDateTime64 or toDateTime functions in a query, as Postgres tries to parse them itself; similarly with toStartOfInterval. From what I recall, clickhousedb_fdw supported pushing at least some functions through to ClickHouse (but doesn't support arrays, which is a bit of a show-stopper). Would it be possible to have some option in clickhouse_fdw to pass functions through to ClickHouse?

Could not map type when importing foreign schema

Using:
ClickHouse server version 20.4.2.9 (official build).
gcc (GCC) 10.1.0
psql (PostgreSQL) 12.2

When running:
IMPORT FOREIGN SCHEMA "kk" FROM SERVER clickhouse_svr INTO public;

I run into the following error:
[XX000] ERROR: clickhouse_fdw: could not map type <6> on events Detail: select name, type from system.columns where database='default' and table='events'

The clickhouse table events has the following definition:

CREATE TABLE events
(
    id                  String,
    timestamp           DateTime64(6),
    type                String,
    col1            String,
    col2             String,
    text                String,
    col3              UInt8,
    col4            String,
    col5 String,
    col6 UInt8,
    col7 Nested (name String, version UInt16),
    col8 Nested (name String, version UInt16),
    col9 Array(String),
    col10 String,
    col11 Int32,
    col12 UInt8,
    col13 Int16,
    col14 Int16,
    col15 Int32,
    col16 UInt8,
    col17 Decimal(10, 2),
    INDEX idx_type type TYPE set(30) GRANULARITY 4,
    INDEX idx_text lower(text) TYPE ngrambf_v1(3, 256, 2, 0) GRANULARITY 4,
    INDEX idx_time timestamp TYPE minmax GRANULARITY 4
)
    ENGINE = MergeTree() PARTITION BY toYYYYMM(timestamp) ORDER BY (col1, col2, timestamp);

and the query select name, type from system.columns where database='default' and table='events' shows the same thing except that instead of

col7 Nested (name String, version UInt16),
col8 Nested (name String, version UInt16)

There is:

│ col7.name    │ Array(String) │
│ col7.version │ Array(UInt16) │
│ col8.name    │ Array(String) │
│ col8.version │ Array(UInt16) │

I saw in an earlier issue that nested types seemed to be working, so I'm not sure where the issue could be.
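
Until the type can be mapped, one way to unblock the import is to skip the offending table with PostgreSQL's EXCEPT clause (standard IMPORT FOREIGN SCHEMA syntax; whether clickhouse_fdw honors it is an assumption here):

IMPORT FOREIGN SCHEMA "kk" EXCEPT (events) FROM SERVER clickhouse_svr INTO public;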

Crash when table has LowCardinality(Nullable(String)) field

Pg 12.5, latest clickhouse-fdw, latest clickhouse:20 docker image.

In clickhouse:

CREATE TABLE test (
    time DateTime ,
    test LowCardinality(Nullable(String))
) ENGINE = MergeTree()
PARTITION BY toDate(time)
ORDER BY (time);         

IMPORT FOREIGN SCHEMA "test" FROM SERVER clickhouse_svr INTO public; then crashes.

LowCardinality(String) and Nullable(String) work fine by themselves.

Tables containing VARCHAR type are not imported

Thanks so much for the FDW; we have already started using it between our databases.
When using the IMPORT FOREIGN SCHEMA "emma" FROM SERVER ch_servertv INTO ch_db; command, we found the following error:


ERROR:  syntax error at or near ","
LINE 5:  "block_type" VARCHAR(50)),                                       

QUERY:  CREATE FOREIGN TABLE ch_db.merged_blocks (
	"block_begin_time" INT4,
	"block_date" DATE NOT NULL,
	"block_is_actual" INT2,
	"block_type" VARCHAR(50)),
	"block_volume" INT2
) SERVER ch_servertv OPTIONS (table_name 'merged_blocks');

The generated command contains a doubled closing parenthesis ()) after the VARCHAR(50) column.

Converting field values for the BOOLEAN type on INSERT INTO SELECT operations

Without an explicit conversion from BOOLEAN to INTEGER, the INSERT INTO SELECT operation cannot be performed.

There are two tables.

In Clickhouse:

CREATE TABLE table_test_boolean
(
    block_begin_time Nullable(Int32), 
    block_date Date, 
    block_distr Nullable(UInt8),    
    block_id Int64    
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(block_date)
PRIMARY KEY (block_id, block_date)
ORDER BY (block_id, block_date)
SETTINGS index_granularity = 8192

In PostgreSQL:

CREATE TABLE blocks.table_test_boolean (
  block_begin_time INTEGER,
  block_date DATE NOT NULL,
  block_distr BOOLEAN,
  block_id BIGINT
) 

I execute query in PG:

INSERT INTO ch_db_http.merged_blocks
SELECT
    block_begin_time,
    block_date,
    block_distr,
    block_id
FROM data_marts.merged_blocks;

Response contains an error:

ERROR:  column "block_distr" is of type smallint but expression is of type boolean
LINE 5:     block_distr,
            ^
HINT:  You will need to rewrite or cast the expression.
QUERY:  INSERT INTO ch_db_http.merged_blocks
SELECT
 ...

There is no error when the BOOLEAN is explicitly converted to INTEGER (block_distr::INTEGER):

INSERT INTO ch_db_http.merged_blocks
SELECT
    block_begin_time,
    block_date,
    block_distr::INTEGER,
    block_id
FROM data_marts.merged_blocks;

Is it possible to add automatic BOOLEAN conversion (to a type supported by ClickHouse) for INSERT INTO SELECT operations?

Thanks.

error This file requires compiler and library support for the ISO C++ 2011 standard. This support must be enabled with the -std=c++11 or -std=gnu++11 compiler options.

[ 4%] Built target lz4-lib
[ 6%] Built target clickhouse_fdw_sql
[ 9%] Built target cityhash-lib
[ 45%] Built target clickhouse-cpp-lib-static
[ 46%] Building CXX object src/CMakeFiles/clickhouse_fdw.dir/binary.cc.o
In file included from /usr/include/c++/5/cstdint:35:0,
from /xxx/clickhouse_fdw/src/clickhouse-cpp/clickhouse/columns/../base/input.h:4,
from /xxx/clickhouse_fdw/src/clickhouse-cpp/clickhouse/columns/../base/coded.h:3,
from /xxx/clickhouse_fdw/src/clickhouse-cpp/clickhouse/columns/column.h:3,
from /xxx/clickhouse_fdw/src/clickhouse-cpp/clickhouse/columns/nullable.h:3,
from /xxx/clickhouse_fdw/src/binary.cc:6:
/usr/include/c++/5/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
#error This file requires compiler and library support
^
In file included from /xxx/clickhouse_fdw/src/clickhouse-cpp/clickhouse/columns/../base/coded.h:3:0,
from /xxx/clickhouse_fdw/src/clickhouse-cpp/clickhouse/columns/column.h:3,
from /xxx/clickhouse_fdw/src/clickhouse-cpp/clickhouse/columns/nullable.h:3,
from /xxx/clickhouse_fdw/src/binary.cc:6:
/xxx/clickhouse_fdw/src/clickhouse-cpp/clickhouse/columns/../base/input.h:11:26: error: expected ‘;’ at end of member declaration
virtual ~InputStream() noexcept (false)
^
/xxx/clickhouse_fdw/src/clickhouse-cpp/clickhouse/columns/../base/input.h:11:38: error: expected identifier before ‘false’
virtual ~InputStream() noexcept (false)
^
/xxx/clickhouse_fdw/src/clickhouse-cpp/clickhouse/columns/../base/input.h:11:38: error: expected ‘,’ or ‘...’ before ‘false’
/xxx/clickhouse_fdw/src/clickhouse-cpp/clickhouse/columns/../base/input.h:11:43: error: ISO C++ forbids declaration of ‘noexcept’ with no type [-fpermissive]
virtual ~InputStream() noexcept (false)
^
/xxx/clickhouse_fdw/src/clickhouse-cpp/clickhouse/columns/../base/input.h:15:26: error: ‘uint8_t’ has not been declared
inline bool ReadByte(uint8_t* byte) {

Postgres array function is pushed down to the ClickHouse server

Postgres function array_position is pushed down to the ClickHouse server.

test=# explain verbose
select 1 as fld
from clickhouse_fdw.events_dist
where "custom.value"[array_position( "custom.name", 'stream')] in ('elreg', 'issues');

QUERY PLAN
--------------
 Foreign Scan on clickhouse_fdw.events_dist  (cost=0.00..0.00 rows=0 width=4)
   Output: 1
   Remote SQL: SELECT NULL FROM main.events_dist WHERE ((("custom.value"[array_position("custom.name", 'stream')]) IN ('elreg','issues')))
(3 rows)

This throws a DB::Exception error as a result:

test=# select 1 as fld
from clickhouse_fdw.events_dist
where "custom.value"[array_position( "custom.name", 'stream')] in ('elreg', 'issues');

ERROR:  clickhouse_fdw: DB::Exception: Unknown function array_position
  • PostgreSQL 12.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
  • ClickHouse server version 19.16.2 revision 54427
  • clickhouse_fdw 1.2

PG json data not supported in WHERE conditions

I have a JSON string field (called udmap) in ClickHouse. When I handle it with the usual PG JSON syntax (udmap::jsonb -> 'src_stage')::int8 = 1 in a WHERE condition, I get an error:

SELECT * FROM event WHERE (udmap::jsonb -> 'src_stage')::int8 = 1

Error: clickhouse_fdw:Code: 62, e.displayText() = DB::Exception: Syntax error: failed at position 83: -> 'src_stage'), 'Nullable(Int64)') = 1)). Expected one of: NOT, AS, LIKE, AND, OR, IN, BETWEEN, alias, token, IS, NOT LIKE, NOT IN, GLOBAL IN, GLOBAL NOT IN, Comma, QuestionMark
QUERY:SELECT udmap FROM test_cdp.t_customer_event WHERE ((cast((CAST(udmap AS jsonb(0)) -> 'src_stage'), 'Nullable(Int64)') = 1))

Error on insert text that contains chr(10)

/* clickhouse */
CREATE TABLE table_test_text
(
calendar Date,
textfield Nullable(String)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(calendar)
ORDER BY (calendar)
SETTINGS index_granularity = 8192

/* postgres */
create schema click;
IMPORT FOREIGN SCHEMA "testdb" LIMIT TO ("table_test_text") FROM SERVER clickhouse_svr INTO click;

/* check select */
select * from click.table_test_text

/* insert data*/
insert into click.table_test_text(calendar, textfield)
select now()::date, 'test'||chr(10)||'test';

Get Error:

SQL Error [2F000]: ERROR: clickhouse_fdw:Code: 38, e.displayText() = DB::Exception: Cannot parse date: value is too short: (at row 2)

Row 1:
Column 0,   name: calendar,  type: Date,             parsed text: "2020-03-12"
Column 1,   name: textfield, type: Nullable(String), parsed text: "test"

Row 2:
Column 0,   name: calendar,  type: Date,             ERROR: text "test<LINE FEED>" is not like Date


  Detail: query: INSERT INTO testdb.table_test_text(calendar, textfield) FORMAT TSV
2020-03-12	test
test

Is it possible to replace or escape the LF character inside the FDW converter?

Could not find PG_REGRESS

Hi, I am trying to compile the sources based on the instructions but am getting the errors below. Any tips on how to solve this would be really helpful; I could not find how to install pg_regress. (pg_regress ships with the PostgreSQL server development package, e.g. postgresql-server-dev-NN on Debian/Ubuntu, so a missing dev package is a likely cause.)

user@postgresbi:~/clickhouse_fdw/build$ cmake ..
-- Setting clickhouse_fdw build type -
CMake Error at CMakeLists.txt:51 (find_program):
  Could not find PG_REGRESS using the following names: pg_regress


-- Configuring incomplete, errors occurred!
See also "/home/user/clickhouse_fdw/build/CMakeFiles/CMakeOutput.log".

timezone support?

When changing the time zone, for example:

select date_trunc('day', event_time AT TIME ZONE 'UTC')

the FDW will generate a remote query like:

SELECT toStartOfDay(timezone('UTC', event_time))

and it fails with:

ERROR:  clickhouse_fdw:Code: 42, e.displayText() = DB::Exception: Number of arguments for function timezone doesn't match: passed 2, should be 0

Is it possible to support time zones?

crash on IPv4 / IPv6 types on import foreign schema

create table t (t IPv4) engine=MergeTree() order by tuple();
IMPORT FOREIGN SCHEMA "default" LIMIT TO (t) FROM SERVER clickhouse_svr INTO public;

crashes. I guess the same happens with the IPv6 data type. Using the http driver.

ERROR: clickhouse_fdw: could not append data to column - std::bad_alloc

I created a foreign table in Postgres pointing at a ClickHouse table and used the SQL "insert into Postgres_clickhouse_fdw select * from Postgres_table". When I insert into ClickHouse from Postgres via the FDW, the following error is reported once I make more than two connections.
Binary mode is used. The server still has memory to spare.
ERROR: clickhouse_fdw: could not append data to column - std::bad_alloc.

DateTime interval

Hi there!

Thanks for your job!

I tried to query a time interval (the whole of today); directly in CH it worked well, but through clickhouse_fdw it didn't work:

SELECT msgtime as X, 
count(msgtime) as Y 
FROM table
WHERE msgtime >= toStartOfDay(today()) AND msgtime < toStartOfDay(today()+1)
GROUP BY msgtime 
ORDER BY msgtime asc 
LIMIT 100;

It says:

ERROR:  function today() does not exist
LINE 4: WHERE msgtime >= toStartOfDay(today()) AND msgtime < toStart...
                                      ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

Does clickhouse_fdw support DateTime intervals?
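
Since PostgreSQL parses the statement before the FDW sees it, the ClickHouse functions have to be replaced with PostgreSQL equivalents. A version PostgreSQL can parse (whether each expression is then pushed down depends on the FDW's deparse rules; "table" is the reporter's placeholder name):

SELECT msgtime AS x, count(msgtime) AS y
FROM table
WHERE msgtime >= date_trunc('day', now())
AND msgtime < date_trunc('day', now()) + interval '1 day'
GROUP BY msgtime
ORDER BY msgtime ASC
LIMIT 100;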

Broke the connection when running 'insert into select' statement

I was trying your examples, including the 'insert into select' statement,
but my connection to PostgreSQL broke when running it.
Below is my procedure.
Please kindly check this out and give me any feedback in a comment on this issue.

  • my env : centos 7 / postgresql12 / ClickHouse 20.6.4

*************************************** Clickhouse *****************************************
:) CREATE DATABASE test_database;

:) USE test_database;
:) CREATE TABLE tax_bills_nyc
(
bbl Int64
, owner_name String
, tax_class String
, tbea Float64
, bav Float64
, insertion_date DateTime MATERIALIZED now()
)
ENGINE = MergeTree
PARTITION BY tax_class
ORDER BY
(
owner_name
)
;

:) CREATE TABLE tax_bills
(
bbl bigint
, owner_name text
)
ENGINE = MergeTree
PARTITION BY bbl
ORDER BY
(
bbl
)
;

-- manual data uploading
cat tax_bills_nyc.csv | clickhouse-client --input_format_allow_errors_num=10 --query="INSERT INTO test_database.tax_bills_nyc FORMAT CSV"
*************************************** Clickhouse*****************************************

*************************************** PgSQL *****************************************

create server clickhouse_svr foreign data wrapper clickhouse_fdw OPTIONS (dbname 'test_database', driver 'binary' ,host '127.0.0.1');

create user MAPPING FOR CURRENT_USER server clickhouse_svr ;

IMPORT FOREIGN SCHEMA "default" FROM SERVER clickhouse_svr INTO public;

IMPORT FOREIGN SCHEMA "test_database" FROM SERVER clickhouse_svr INTO public;

\d

               List of relations

Schema | Name | Type | Owner
--------+--------------------+---------------+----------
public | pg_stat_statements | view | postgres
public | tax_bills | foreign table | postgres
public | tax_bills_nyc | foreign table | postgres
public | test_form | foreign table | postgres
(4 rows)

select * from tax_biils_nyc_cp ;

bbl     | owner_name | tax_class | tbea  |  bav   |   insertion_date    

------------+------------+-----------+-------+--------+---------------------
4000620001 | DVYA | d | 8961 | 80550 | 2020-11-18 03:36:50
1001200009 | LOXI | d | 72190 | 648900 | 2020-11-18 03:36:50
4157860094 | LROB | d | 13317 | 119700 | 2020-11-18 03:36:50
4123850237 | VYIE | d | 50 | 450 | 2020-11-18 03:36:50
4103150163 | WGZW | d | 2053 | 18450 | 2020-11-18 03:36:50
4123850237 | WGZW | d | 222 | 52413 | 2020-11-18 07:46:31
(6 rows)

insert into tax_bills ( select a.bbl , a.owner_name from tax_bills_nyc as a limit 1 ) ;

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> \q
*************************************** PgSQL *****************************************

Roadmap of clickhouse_fdw

Hi!
I'm interested in the roadmap of clickhouse_fdw for the next few months; it's important for realistic planning. When will the current limitations of clickhouse_fdw be addressed?

aggregates with filters

In PostgreSQL we have queries like sum(clicks) filter (where id = 42).
In ClickHouse the same thing can be done using the If combinator suffix.

It would be cool if the FDW could support that.
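
The requested mapping, side by side (events, clicks and id are hypothetical names):

-- PostgreSQL
SELECT sum(clicks) FILTER (WHERE id = 42) FROM events;
-- ClickHouse equivalent, using the If combinator
SELECT sumIf(clicks, id = 42) FROM events;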

ClickHouse DateTime64 is a read-only in Postgres

ClickHouse DateTime64 is read-only in Postgres: I can read it, but I cannot write it back:
"psql>insert into sss values(now());" -> "unexpected column type for TIMESTAMPOID: DateTime64(6)".
Probably it's just because there is no backward conversion from Postgres's timestamp to ClickHouse DateTime64 in the "column_append" function in "src/binary.cc", but it could be a challenge to implement, because ClickHouse DateTime64 has a precision setting, e.g. it can be DateTime64(1) or DateTime64(3) or DateTime64(XX).
Thanks.

has() translated from IN operator suffers from performance problem for long scalar array

clickhouse_fdw translates SELECT * FROM table WHERE col IN ('a', 'b', 'c') to
SELECT [columns] FROM table WHERE has(['a','b','c'], 'col')

However, the has() function seems to do repeated table scans in ClickHouse, which causes performance problems if the list grows long.

The function static void deparseScalarArrayOpExpr(ScalarArrayOpExpr *node, deparse_expr_cxt *context) in clickhousedb-deparse.c is responsible for wrapping the scalar array in square brackets inside the has() function. I made an ugly patch for this and it worked for my simple case, but is there a reason to use the has() function and square brackets instead of the original IN operator?

crash if there are DateTime columns

Any table in ClickHouse with a type such as DateTime(time-zone) or DateTime64, e.g. DateTime('Europe/Moscow') or DateTime64(3,'UTC'), leads to a crash on "IMPORT FOREIGN SCHEMA" in PostgreSQL (version 12.4):

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

The current workaround is a view in ClickHouse with conversion to timestamp.

But it would probably be nice to add a reasonable exception message when "unknown or bad" types are present.
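
A sketch of that view workaround for a hypothetical table t with a DateTime64 column ts: create a plain-DateTime view in ClickHouse and import the view instead of the table (assuming the FDW imports views like tables):

CREATE VIEW t_compat AS SELECT toDateTime(ts) AS ts FROM t; -- run in ClickHouse
IMPORT FOREIGN SCHEMA "default" LIMIT TO (t_compat) FROM SERVER clickhouse_svr INTO public; -- run in PostgreSQL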

INSERT INTO SELECT operation on large tables

In PostgreSQL there is a table data_marts.merged_blocks whose size is 8 GB.
Attempting to perform an INSERT INTO SELECT operation (SELECT from the PG table, INSERT INTO ClickHouse) produces the following error:

ERROR:  out of memory
DETAIL:  Cannot enlarge string buffer containing 1073741810 bytes by 32 more bytes.
CONTEXT:  SQL statement "INSERT INTO ch_db_http.merged_blocks
SELECT

The error occurs on side of the PostgreSQL.

Table data_marts.merged_blocks has 12 million rows. With LIMIT 1000000 the request completes.
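
A workaround consistent with the LIMIT observation above is to transfer the data in chunks so PostgreSQL never has to buffer the whole table, e.g. by ranges of the partitioning column (names reuse the merged_blocks schema from an earlier issue and are illustrative):

INSERT INTO ch_db_http.merged_blocks
SELECT * FROM data_marts.merged_blocks
WHERE block_date >= '2020-01-01' AND block_date < '2020-02-01';
-- repeat for each month until the whole table is copied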

expected comma in AggregateFunction

Hello
I'm using clickhouse_fdw with Postgres-12.0

Trying to connect it to ClickHouse, but every time, no matter which FOREIGN SCHEMA I try to import, I get the following error:
ERROR: clickhouse_fdw: expected comma in AggregateFunction

Steps I did:

CREATE SERVER clickhouse_svr FOREIGN DATA WRAPPER clickhouse_fdw OPTIONS(dbname 'xxxx'',host 'rfb_clickhouse');

CREATE USER MAPPING FOR CURRENT_USER SERVER clickhouse_svr OPTIONS (user 'xxx', password 'xxx');

IMPORT FOREIGN SCHEMA "tax_bills" FROM SERVER clickhouse_svr INTO public;

Schema tax_bills comes just from Readme.md:

CREATE TABLE tax_bills
(
    bbl Int64,
    owner_name String
)
ENGINE = MergeTree
PARTITION BY bbl
ORDER BY bbl;


How can I nail down the issue?

coalesce in the where clause

The FDW for some reason does not push down coalesce in the WHERE clause.
In pg: explain verbose select count(*) from clickhouse.test where coalesce(id, 0) = 123:

Limit  (cost=0.00..0.01 rows=1 width=8)
  Output: (count(*))
  ->  Aggregate  (cost=0.00..0.01 rows=1 width=8)
        Output: count(*)
        ->  Foreign Scan on clickhouse.test  (cost=0.00..0.00 rows=0 width=0)
              ...
              Filter: (COALESCE(test.id, 0) = 123)
              Remote SQL: SELECT id FROM default.test

Version mismatch during CREATE EXTENSION on PG12.0

Background Information:
PostgreSQL Version: PostgreSQL 12.0
OS: Red Hat Enterprise Linux 7.7

I compiled from source on the master branch on 24 Oct 2019 and installed into /usr/pgsql-12.
Executing CREATE EXTENSION clickhouse_fdw; returns

ERROR: incompatible library "/usr/pgsql-12/lib/clickhouse_fdw.so": version mismatch
DETAIL: Server is version 12, library is version 11.
SQL state: XX000

(The library was evidently built against PostgreSQL 11 headers; rebuilding with PostgreSQL 12's pg_config first on the PATH should likely fix this.)

INSERT operations to a Clickhouse table with field type of Nullable are not performed

In PostgreSQL we created a table in the database and filled it with data. Some fields were filled with NULL values:

CREATE TABLE test.merged_blocks_test_null (
  block_begin_time INTEGER,
  block_date DATE NOT NULL,
  block_is_actual SMALLINT,
  block_volume SMALLINT
) 

INSERT INTO test.merged_blocks_test_null ("block_begin_time", "block_date", "block_is_actual", "block_volume")
VALUES 
  (1, E'2019-10-10', NULL, NULL),
  (2, E'2019-10-10', NULL, NULL),
  (3, E'2019-10-12', 1, 1);

In ClickHouse we created a table in the database:

CREATE TABLE merged_blocks_test_nullable
(
    block_begin_time Nullable(Int32), 
    block_date Date, 
    block_is_actual Nullable(UInt8), 
    block_volume Nullable(Int8)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(block_date)
PRIMARY KEY block_date
ORDER BY block_date
SETTINGS index_granularity = 8192

In PostgreSQL we created the foreign table ch_db_http.merged_blocks_test_nullable using the command:
IMPORT FOREIGN SCHEMA "emma" FROM SERVER ch_emma_http INTO ch_db_http;

Then, when performing an INSERT INTO SELECT operation, an error occurs:

INSERT INTO ch_db_http.merged_blocks_test_nullable
SELECT * FROM test.merged_blocks_test_null

Presumably the problem is in this code section:
https://github.com/adjust/clickhouse_fdw/blob/b9ff511076a647fc92b3cab3a5efd51439bec860/src/clickhousedb_fdw.c#L1559-L1563

we made a temporary fix:

if (isnull)
{
	/* emit ClickHouse's TSV NULL literal (\N) instead of the column value */
	appendStringInfoString(&fmstate->sql, "\\N");
	continue;
}

This helped us insert into a table containing Nullable fields.
Perhaps we did not take all the details into account in this fix; could you check this error?
Thanks!
