blynk-technologies / clickhouse4j
Lighter and faster alternative for the official ClickHouse JDBC driver
License: Other
Add settings support to Copy Manager.
Problem:
String query = "... where id in (?)";
statement.setObject(listOfIds);
Result:
String query = "... where id in ([1,2,3,4])";
Should be:
String query = "... where id in (1,2,3,4)";
https://github.com/blynkkk/clickhouse4j/pull/12/files#diff-d50ae10941af3f019b71b508c67bd34dR7
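A workaround sketch for the collection-binding problem above, assuming the caller can expand the list into placeholders before preparing the statement. The helper name `placeholders` is made up for illustration; only the standard JDBC binding calls are assumed.

```java
import java.util.Collections;
import java.util.List;

class InClauseExpander {

    // Builds "?, ?, ?" — one placeholder per element of the collection.
    static String placeholders(int n) {
        return String.join(", ", Collections.nCopies(n, "?"));
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3, 4);
        String query = "SELECT * FROM t WHERE id IN (" + placeholders(ids.size()) + ")";
        System.out.println(query); // SELECT * FROM t WHERE id IN (?, ?, ?, ?)
        // With a live connection, each element is then bound individually:
        // for (int i = 0; i < ids.size(); i++) {
        //     statement.setObject(i + 1, ids.get(i));
        // }
    }
}
```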
Condition should be faster
Inspired by https://github.com/yandex/clickhouse-jdbc/pull/370/files. However, we need to make it consistent for all types, for chars and Objects as well.
Caused by: java.lang.Throwable: Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected \t before: \\N\t1\t50\t\\N\t\\N\t\\N\t\\N\t9139\t6916\t9118\t482856\n2019-10-03\tv X21UD A\t1\t3.4.3.3\t10\t\\N\t1\t52\t\\N\t\\N\t\\N\t\\N\t195\t136\t180\t1261641\n2019-10-03\tv Y66i A\t1\t2.8.1.1\t15\t\\N\t1\t: (at row 1)
It seems that clickhouse4j parsed the null value from Spark as \\N.
Current approach:
try (Connection connection = dataSource.getConnection();
OutputStream outputStream = Files.newOutputStream(outputFile, TRUNCATE_EXISTING)) {
CopyManager copyManager = CopyManagerFactory.create(connection); //lightweight
copyManager.copyFromDb(query, outputStream);
}
Could be improved with:
try (CopyManager copyManager = CopyManagerFactory.create(dataSource.getConnection());
OutputStream outputStream = Files.newOutputStream(outputFile, TRUNCATE_EXISTING)) {
copyManager.copyFromDb(query, outputStream);
}
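For the improved form above to be leak-free, CopyManager itself would need to implement AutoCloseable and close the connection it was created with, so a single try-with-resources covers both. A minimal sketch of that idea; the class and method names here are illustrative, not the actual clickhouse4j API.

```java
import java.io.OutputStream;
import java.sql.Connection;
import java.sql.SQLException;

class CloseableCopyManager implements AutoCloseable {

    private final Connection connection;

    CloseableCopyManager(Connection connection) {
        this.connection = connection;
    }

    void copyFromDb(String query, OutputStream target) throws SQLException {
        // delegate to the real driver implementation here
    }

    @Override
    public void close() throws SQLException {
        // closing here is what lets try-with-resources release the
        // connection obtained from dataSource.getConnection()
        connection.close();
    }
}
```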
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-checkstyle-plugin:3.0.0:check (validate) on project clickhouse4j: Failed during checkstyle execution: There are 55 errors reported by Checkstyle 6.18 with checkstyle.xml ruleset.
How to handle a large query result?
My query result is huge and I get a java.lang.OutOfMemoryError. How can I solve it?
java.lang.OutOfMemoryError
at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at cc.blynk.clickhouse.util.guava.StreamUtils.copy(StreamUtils.java:70)
at cc.blynk.clickhouse.http.DefaultHttpConnector.sendPostRequest(DefaultHttpConnector.java:182)
at cc.blynk.clickhouse.http.DefaultHttpConnector.post(DefaultHttpConnector.java:62)
at cc.blynk.clickhouse.http.DefaultHttpConnector.post(DefaultHttpConnector.java:74)
at cc.blynk.clickhouse.ClickHouseStatementImpl.sendRequest(ClickHouseStatementImpl.java:674)
at cc.blynk.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:168)
at cc.blynk.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:150)
at cc.blynk.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:137)
at cc.blynk.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:88)
... 59 elided
Thanks very much.
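The stack trace above shows the whole HTTP response being buffered through StreamUtils.copy into a ByteArrayOutputStream, so the full result set must fit in memory. Two workarounds suggest themselves: stream the result straight to an OutputStream with the CopyManager discussed in the other issues, or page through the result with LIMIT/OFFSET so no single response is huge. A sketch of the paging idea follows; the table name and page size are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class PagedQuery {

    // Appends a LIMIT/OFFSET clause to a base query.
    static String page(String baseQuery, long offset, long limit) {
        return baseQuery + " LIMIT " + limit + " OFFSET " + offset;
    }

    public static void main(String[] args) {
        List<String> pages = new ArrayList<>();
        long pageSize = 100_000;
        for (long offset = 0; offset < 300_000; offset += pageSize) {
            pages.add(page("SELECT * FROM big_table ORDER BY id", offset, pageSize));
            // With a live connection, each page would be fetched here, e.g.:
            // try (ResultSet rs = statement.executeQuery(pages.get(pages.size() - 1))) { ... }
        }
        System.out.println(pages.size()); // 3 pages
    }
}
```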
Current ClickHouseProperties is a mess, we need to refactor it and make the usage simpler.
Every property should be part of driver documentation, not code.
Current flow:
select x, y from z;
var entry = new EntryDTO(x, y);
file.write(entry.toCSV())
It would be nice to have some analog of the CopyManager present in the PostgreSQL driver, to download files directly instead of creating intermediate objects.
Error log:
scala> :paste
// Entering paste mode (ctrl-D to finish)
lv1.repartition(30)
.write
.mode(SaveMode.Append)
.format("jdbc")
.option("url", ckUrlDa)
.option("driver", driverClass4j)
//.option("numPartitions", "1")
.option("dbtable", "t_md_page_path_di_rep")
.option("user", userMe)
.option("password", passwordMe)
.option("fetchsize", fetchSize)
.option("batchsize", batchSize)
.save()
// Exiting paste mode, now interpreting.
scala> :paste
// Entering paste mode (ctrl-D to finish)
lv1.repartition(30)
.write
.mode(SaveMode.Append)
.format("jdbc")
.option("url", ckUrlDa)
.option("driver", driverClass4j)
//.option("numPartitions", "1")
.option("dbtable", "t_md_page_path_di_dis")
.option("user", userMe)
.option("password", passwordMe)
.option("fetchsize", fetchSize)
.option("batchsize", batchSize)
.save()
// Exiting paste mode, now interpreting.
cc.blynk.clickhouse.except.ClickHouseException: ClickHouse exception, code: 62, host: null, port: 0; Code: 62, e.displayText() = DB::Exception: Syntax error: failed at position 159: NOT NULL, "node_depth" INTEGER NOT NULL, "page_id_lv1" TEXT , "page_id_lv2" TEXT NOT NULL, "page_id_lv3" TEXT NOT NULL, "page_id_lv4" TEXT NOT NULL, "page_id_lv. Expected one of: DEFAULT, MATERIALIZED, ALIAS, COMMENT, CODEC, TTL, token, ClosingRoundBracket, Comma (version 19.15.3.6 (official build))
at cc.blynk.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:61)
at cc.blynk.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:31)
at cc.blynk.clickhouse.http.DefaultHttpConnector.checkForErrorAndThrow(DefaultHttpConnector.java:296)
at cc.blynk.clickhouse.http.DefaultHttpConnector.sendPostRequest(DefaultHttpConnector.java:175)
at cc.blynk.clickhouse.http.DefaultHttpConnector.post(DefaultHttpConnector.java:62)
at cc.blynk.clickhouse.http.DefaultHttpConnector.post(DefaultHttpConnector.java:74)
at cc.blynk.clickhouse.ClickHouseStatementImpl.sendRequest(ClickHouseStatementImpl.java:674)
at cc.blynk.clickhouse.ClickHouseStatementImpl.executeUpdate(ClickHouseStatementImpl.java:198)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createTable(JdbcUtils.scala:692)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:89)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:518)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:218)
... 59 elided
Caused by: java.lang.Throwable: Code: 62, e.displayText() = DB::Exception: Syntax error: failed at position 159: NOT NULL, "node_depth" INTEGER NOT NULL, "page_id_lv1" TEXT , "page_id_lv2" TEXT NOT NULL, "page_id_lv3" TEXT NOT NULL, "page_id_lv4" TEXT NOT NULL, "page_id_lv. Expected one of: DEFAULT, MATERIALIZED, ALIAS, COMMENT, CODEC, TTL, token, ClosingRoundBracket, Comma (version 19.15.3.6 (official build))
at cc.blynk.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:56)
... 70 more
I am new to ClickHouse and clickhouse4j, and I can't find any helpful documentation about batch inserts. More example code would be very helpful.
Thanks.
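Until the docs catch up, here is a minimal batch-insert sketch. The table and column names are invented; the JDBC calls themselves (prepareStatement, addBatch, executeBatch) are standard, and the driver sends one INSERT request per executed batch.

```java
import java.util.Collections;
import java.util.List;

class BatchInsertExample {

    // Builds "INSERT INTO <table> (c1, c2) VALUES (?, ?)".
    static String buildInsertSql(String table, List<String> columns) {
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + String.join(", ", columns)
                + ") VALUES (" + placeholders + ")";
    }

    public static void main(String[] args) {
        String sql = buildInsertSql("events", List.of("ts", "value"));
        System.out.println(sql); // INSERT INTO events (ts, value) VALUES (?, ?)
        // With a live connection:
        // try (Connection c = DriverManager.getConnection(url);
        //      PreparedStatement ps = c.prepareStatement(sql)) {
        //     for (Event e : events) {        // Event is a hypothetical row type
        //         ps.setTimestamp(1, e.ts);
        //         ps.setLong(2, e.value);
        //         ps.addBatch();
        //     }
        //     ps.executeBatch(); // one INSERT request for the whole batch
        // }
    }
}
```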
JSON is very crucial for high-performance applications. Right now the driver doesn't support it; we need to fix that.
It would be nice if CopyManager could accept a Statement or PreparedStatement.
Right now I have to use it like this:
copyManager.copyFromDb(((ClickHousePreparedStatement) ps).asSql(), zipOutputStream);
I want to insert an Array(String) value into a table, but it fails:
Cannot parse quoted string: expected opening quote: (at row 1)
['001|002|003','001|002|003']
But I can insert it successfully via the CLI.
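Until the driver handles Array(String) binding, one workaround is to build the array literal by hand, in the same form the CLI accepts. The escaping below is deliberately minimal (backslashes and single quotes only) and illustrative, not a complete quoting routine.

```java
import java.util.List;
import java.util.stream.Collectors;

class ArrayLiterals {

    // Renders a List<String> as a ClickHouse array literal: ['a','b'].
    static String arrayLiteral(List<String> values) {
        return values.stream()
                .map(v -> "'" + v.replace("\\", "\\\\").replace("'", "\\'") + "'")
                .collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        System.out.println(arrayLiteral(List.of("001|002|003", "001|002|003")));
        // ['001|002|003','001|002|003']
    }
}
```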
Hi,
We are testing ClickHouse with your connector and you did a great job with it. But we have a problem. When we run the service in AWS as a Docker container and try to insert a lot of records (2 billion), I don't know what is happening, but the connector gradually uses up all ports in the container and never releases them.
I tried to close every connection in a try/catch and get a new connection for every batch insert. I also tried to reuse the connection and only get a new one when it was closed; that was better because the service used fewer ports, but it was still thousands of ports.
...
try (final Connection connection = dataSource.getConnection();
     final PreparedStatement preparedStatement = connection.prepareStatement(
             ActivitySqlGenerator.createInsertPreparedStatement(getDbAndTableName(), QuoteType.BACK_QUOTE))) {
connection.setAutoCommit(false);
for (Activity activity: activities) {
ActivitySqlGenerator.bindInsert(1, preparedStatement, activity);
preparedStatement.addBatch();
}
preparedStatement.executeBatch();
preparedStatement.clearBatch();
connection.commit();
} catch (Exception e) {
logger.info("Failed to bulk insert.", e);
}
...
public ClickHouseConnection getConnection() throws SQLException {
if (connection == null || connection.isClosed()) {
this.connection = dataSource.getConnection();
}
return this.connection;
}
...
try
{
final Connection connection = this.getConnection();
final PreparedStatement preparedStatement = connection.prepareStatement(
ActivitySqlGenerator.createInsertPreparedStatement(getDbAndTableName(), QuoteType.BACK_QUOTE));
connection.setAutoCommit(false);
for (Activity activity: activities) {
ActivitySqlGenerator.bindInsert(1, preparedStatement, activity);
preparedStatement.addBatch();
}
preparedStatement.executeBatch();
preparedStatement.clearBatch();
connection.commit();
} catch (Exception e) {
logger.info("Failed to bulk insert.", e);
}
...
Can someone help? Is it a known problem, or am I doing something wrong?
Current: http://localhost:8123/?compress=1&database=local_reporting
Should be: http://localhost:8123?compress=1&database=local_reporting
A typical JDBC driver has an embedded LRU cache for queries, so in most apps each query is parsed only once and then reused. In that case the parser optimization could be avoided.
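The LRU cache idea can be sketched on top of LinkedHashMap in a few lines; the parsed-value type is just a String here for illustration, and the class name is made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A small LRU map from raw SQL to its parsed form, so repeated
// statements are parsed once and evicted least-recently-used first.
class ParsedQueryCache extends LinkedHashMap<String, String> {

    private final int maxEntries;

    ParsedQueryCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true gives LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return size() > maxEntries; // evict once the cap is exceeded
    }
}
```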
Release to MVN Central
Should be int, not boolean
Right now it only makes things more complex (javadocs, examples, tests). At the moment we don't need this specialization.