Comments (12)

VitaliyMF commented on July 25, 2024

Update: I've added default constructors for ClickHouseConnection / ClickHouseCommand; this was enough to use the connector with my data access library, so for now DbProviderFactory is not really needed.

Next, I tried to connect to CH from a simple .NET Core console app. There was a weird exception related to TcpClient.OpenAsync().RunSynchronously(); I've already provided a bugfix in my fork.
Finally, I was able to connect to the CH server and run a SELECT query, but I got an empty data reader (the same query does return rows when executed from clickhouse-client).

Here is my code snippet:

var conn = new ClickHouseConnection();
conn.ConnectionString = "Compress=False;Compressor=lz4;Host=localhost;Port=9000;Database=default;User=default;Password=";

var cmd = conn.CreateCommand();
cmd.CommandText = "SELECT Source,ProductCode,COUNT(*) as __Count FROM ( SELECT * FROM precos ) as t GROUP BY Source,ProductCode";

conn.Open();
try {
	var rdr = cmd.ExecuteReader();
	while (rdr.Read()) {
		Console.WriteLine("{0}, {1}: {2}", rdr["Source"], rdr["ProductCode"], rdr["__Count"]);
	}
} finally {
	conn.Close();
}

@killwort do you have any idea why the data reader doesn't return any rows?

from clickhouse-net.

killwort commented on July 25, 2024

If you can, please send a pull request with all your changes, especially the bugfixes. That said, I have used my driver with a netcoreapp1.1 target without any problems (except for the missing System.Data :) )

Regarding your problem with the reader: you must iterate through all result sets in the reader (do { ... } while (reader.NextResult())). This is needed because CH's protocol and engine are designed so that the same query can return several cursors with different schemas. You can see this in clickhouse-client too: for a non-aggregating query like "SELECT col1,col2 FROM table WHERE col3=1", the output is grouped into "blocks" by the MergeTree date key. By the way, the first result is often empty.
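A minimal sketch of that loop, written against the standard System.Data.IDataReader interface so it runs without a ClickHouse server. Here a DataTableReader over two DataTables (the first one empty) stands in for ClickHouseDataReader; that substitution and the helper names NextResultDemo, CountAllRows and MakeSampleReader are assumptions for illustration only.

```csharp
using System.Data;

static class NextResultDemo
{
    // Hypothetical helper: counts rows across every result set (block)
    // exposed by the reader, using the do { ... } while (NextResult()) pattern.
    public static int CountAllRows(IDataReader reader)
    {
        int rows = 0;
        do
        {
            // Read() only walks the rows of the current result set.
            while (reader.Read())
                rows++;
        } while (reader.NextResult()); // advance to the next block, if any
        return rows;
    }

    // Stand-in reader: an empty first result followed by a block with two
    // rows, mimicking the "first result is often empty" case.
    public static IDataReader MakeSampleReader()
    {
        var empty = new DataTable();
        empty.Columns.Add("Source", typeof(string));

        var block = new DataTable();
        block.Columns.Add("Source", typeof(string));
        block.Rows.Add("a");
        block.Rows.Add("b");

        return new DataTableReader(new[] { empty, block });
    }
}
```

CountAllRows(MakeSampleReader()) sees both rows, whereas a plain while (reader.Read()) loop on the same sample reader sees none, which matches the symptom in the opening comment.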

Alternatively, there's the extension method ClickHouse.Ado.AdoExtensions.ReadAll, which hides the data reader iteration logic from you.

VitaliyMF commented on July 25, 2024

@killwort you're absolutely right, I was able to get a query result by calling "reader.NextResult()".
Nevertheless, this is quite unexpected behavior :-) because "NextResult" is normally used when several different result sets are returned (say, 2 SELECTs, or a stored procedure that returns the results of 2 SELECTs). In ADO.NET, when a single SELECT is executed, only one result set is expected.
I'm trying to use ClickHouse.Ado with a data layer that can work with any SQL-compatible connector, so it handles data readers in the usual way (for a single query, it just calls "Read").

I understand that this is caused by the nature of the CH protocol; for now I will write a special wrapper for ClickHouseDataReader that iterates through all result sets and exposes them as a "single" result.
It would be nice if ClickHouse.Ado supported this mode out of the box (maybe controlled by some special ClickHouseConnection property); I think most users of ClickHouse.Ado will use it to execute a single query, and they will be surprised when the reader returns 0 records unless "NextResult" is called :-)

I can add this option if you point me to how to determine which CH blocks actually belong to the result of the same query. Or the option could simply iterate through all blocks with just "Read()", which is of course the simplest variant. What do you think?

killwort commented on July 25, 2024

Well, clickhouse does not support multiple queries per command roundtrip, so all results may be safely grouped. However, that does nothing to mitigate situations where blocks have different schemas (e.g. blocks from different shards, or even different MergeTree date blocks, may have a different column order if it was not explicitly set by the query).

As for conformance with the usual ADO.NET ways, IMO it could be dropped, as clickhouse itself behaves differently from most databases. Its SQL dialect is incompatible with the standards, and it doesn't support any kind of data alteration after insert. Compatibility would simply cost too much to develop without any immediate profit; anyway, why would one use CH as an ORM back-end or with any other automatic query builder? CH is designed to perform well with bulk inserts and highly aggregating OLAP queries, and both require manual query tuning.

VitaliyMF commented on July 25, 2024

anyway why would one use CH as ORM back-end or any other automatic query builders?

Let me explain a bit. I'm trying to use ClickHouse.Ado for a BI tool connector. This tool builds pivot tables / pivot charts from a dynamic configuration passed from the UI (in other words, the user can select dimensions / measures), and unlike a classic OLAP server it performs data aggregation on the fly (like ROLAP).

Technically, it produces a simple SELECT .. GROUP BY only for the columns that correspond to the dimensions/measures needed for a concrete pivot table. In this scenario ClickHouse SQL is standard enough; all dialect-specific things (functions, calculations) can be defined in the nested query SELECT ... FROM (SELECT ..).
Also, this tool supports user-defined conditions: the user can specify a complex filter in the UI like "(user_id=5 or user_id=6) and is_active=1". This condition is automatically translated to SQL by my NReco.Data library, and this works fine with ClickHouse too. I've found a way to use ClickHouse.Ado with NReco.Data without a DbProviderFactory implementation (fortunately, NReco.Data defines its own IDbFactory interface that uses only interfaces instead of base classes like DbConnection, DbCommand etc).
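For illustration, a hypothetical generator for the kind of SELECT .. GROUP BY described above. The BI tool's real query builder is not shown in this thread, so PivotQuery.Build and its parameters are assumptions. Column names and measure expressions are concatenated verbatim, so a real implementation would have to validate or quote them.

```csharp
using System;
using System.Collections.Generic;

static class PivotQuery
{
    // Hypothetical builder: aggregates an arbitrary (possibly dialect-specific)
    // inner query by the selected dimensions. Names/expressions are inserted
    // as-is; no quoting or validation is performed.
    public static string Build(string innerQuery,
                               IReadOnlyList<string> dimensions,
                               IReadOnlyList<string> measures)
    {
        if (dimensions.Count == 0)
            throw new ArgumentException("At least one dimension is required.", nameof(dimensions));

        string dims = string.Join(",", dimensions);
        string select = measures.Count == 0
            ? dims
            : dims + "," + string.Join(",", measures);

        // Dialect-specific logic stays inside the nested query.
        return $"SELECT {select} FROM ( {innerQuery} ) as t GROUP BY {dims}";
    }
}
```

With innerQuery "SELECT * FROM precos", dimensions Source and ProductCode, and the measure "COUNT(*) as __Count", this sketch produces exactly the query used in the opening comment.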

For now I've implemented special wrappers for ClickHouseCommand / ClickHouseDataReader that implement the "read all" logic, and this solves my problem. If you decide it would be nice to have this behavior in the connector code, just let me know.

Regarding

e.g. blocks from different shards or even different MergeTree date blocks may have different column order if it was not explicitly set by query

Is it possible to handle that somehow? I mean that ClickHouseDataReader in "ReadAll" mode could compare the previous block's schema with the next one, and if the column order / set differs, align them (reorder to match the previous block's order; return DBNull for columns that are missing in the next block but present in the previous one). I guess it is atypical for block schemas to differ; after all, "clickhouse-client" somehow returns the result as a single table?
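A minimal sketch of the proposed alignment, assuming the first block's columns define the reference schema; that needs no lookahead, so it fits a reader that only holds the current block. It is written against System.Data.IDataReader (a DataTableReader can stand in for ClickHouseDataReader when trying it out), and SchemaAlign and its methods are hypothetical names. Columns that only appear in later blocks are dropped, and names are matched case-sensitively.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

static class SchemaAlign
{
    // Column names of the current result set, in order.
    public static string[] ColumnNames(IDataRecord record)
    {
        var names = new string[record.FieldCount];
        for (int i = 0; i < names.Length; i++)
            names[i] = record.GetName(i);
        return names;
    }

    // For each reference column: the ordinal of the same-named column in the
    // current block, or -1 if the block does not contain it.
    public static int[] BuildOrdinalMap(string[] reference, IDataRecord current)
    {
        var byName = new Dictionary<string, int>(StringComparer.Ordinal);
        for (int i = 0; i < current.FieldCount; i++)
            byName[current.GetName(i)] = i;

        var map = new int[reference.Length];
        for (int i = 0; i < reference.Length; i++)
            map[i] = byName.TryGetValue(reference[i], out int ord) ? ord : -1;
        return map;
    }

    // Reads every row of every block, reordering values to the first block's
    // column order and substituting DBNull.Value for missing columns.
    public static List<object[]> ReadAllAligned(IDataReader reader)
    {
        var rows = new List<object[]>();
        string[] reference = ColumnNames(reader);
        do
        {
            int[] map = BuildOrdinalMap(reference, reader);
            while (reader.Read())
            {
                var row = new object[reference.Length];
                for (int i = 0; i < map.Length; i++)
                    row[i] = map[i] >= 0 ? reader.GetValue(map[i]) : DBNull.Value;
                rows.Add(row);
            }
        } while (reader.NextResult());
        return rows;
    }
}
```

For example, if the first block has columns (a, b) and the second only (b), rows from the second block come back as [DBNull, b-value].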

killwort commented on July 25, 2024

I think it's impossible to synchronize the schema across blocks with the current reading implementation. Each protocol block contains its own header describing the block structure, so there's no way to know the structure of future blocks before you have completely read the previous block. Currently only one (the current) block is kept in client memory (though I still think even that is a waste of resources, as a block may be quite big). The only possibility I see is to group sequential blocks with matching structure; in most cases you'll end up with a single big block.
As for clickhouse-client behaviour, it matches the server engine behaviour. If your query translates to raw block output (i.e. no grouping constructs, no joins, no aggregation), the result will be chunked (and it really is chunked in clickhouse-client output). Otherwise, your front-end server (the one you're connecting to) groups the blocks before output, effectively eliminating the chunking.
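Grouping sequential blocks with matching structure could look like the following minimal sketch. The Block type and BlockGrouping.GroupSequential are hypothetical in-memory stand-ins, not ClickHouse.Ado's internal types, and headers are compared by column names and order only (column types are ignored). Only the group currently being accumulated is held in memory, but that group can still grow into one big block.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a protocol block: header (column names) + rows.
sealed class Block
{
    public string[] Columns { get; }
    public List<object[]> Rows { get; }

    public Block(string[] columns, IEnumerable<object[]> rows)
    {
        Columns = columns;
        Rows = new List<object[]>(rows);
    }
}

static class BlockGrouping
{
    // Merges consecutive blocks whose headers match exactly (same names in
    // the same order). Each merged group is yielded as soon as a block with
    // a different header arrives, so no lookahead beyond one block is needed.
    public static IEnumerable<Block> GroupSequential(IEnumerable<Block> blocks)
    {
        Block current = null;
        foreach (var block in blocks)
        {
            if (current != null && current.Columns.SequenceEqual(block.Columns))
            {
                current.Rows.AddRange(block.Rows);
                continue;
            }
            if (current != null)
                yield return current;
            // Copy the rows so the caller's input blocks are not mutated.
            current = new Block(block.Columns, block.Rows);
        }
        if (current != null)
            yield return current;
    }
}
```

Three incoming blocks with headers (a, b), (a, b), (b, a) would come out as two groups: the first two merged, the third on its own because its column order differs.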

OlegStotsky commented on July 25, 2024

Wow, I've just spent the entire day trying to figure out why reader.Read() wasn't working. This library definitely needs documentation. I'm willing to help and cooperate, since we're going to use it in our project. What do you think?

killwort commented on July 25, 2024

Well, this behaviour is described in readme.md: https://github.com/killwort/ClickHouse-Net#always-use-nextresult

If you wish to add some docs and/or implement new functionality feel free to make a pull request.

VitaliyMF commented on July 25, 2024

@OlegStotsky in my project I've wrapped ClickHouseDataReader with my own wrapper that transparently calls "NextResult":

using System.Data;

class ReadAllDataReader : IDataReader {
	readonly IDataReader Reader;
	internal ReadAllDataReader(IDataReader rdr) {
		Reader = rdr;
	}
	// proxy IDataReader properties/methods implementation
	
	// Advances to the next row, transparently moving on to the next
	// result set (block) when the current one is exhausted. Loops because
	// several consecutive result sets may be empty (the first one often is).
	public bool Read() {
		while (!Reader.Read()) {
			if (!Reader.NextResult())
				return false; // no more result sets
		}
		return true;
	}
}

@killwort perhaps it would be a good idea to add an option (enabled by default) to ClickHouseCommand that forces the same behavior?

manojkothapalli7 commented on July 25, 2024

@killwort Is there any update on this?

bnuzhouwei commented on July 25, 2024

DbProviderFactory is important for many ORM libs and data helpers; how can it be implemented?

killwort commented on July 25, 2024

This will be released with 2.0.0-preview.
Don't use it in production right away; it is a major rewrite!
