Comments (10)

heanlan commented on June 2, 2024

Thanks @yanjunz97 and @yuntanghsu. I now have an answer to your questions about rawQuery and rawSql.

As Yun-Tang might not know, for the datasource plugin we previously used vertamedia-clickhouse-grafana and then switched to grafana-clickhouse-datasource. grafana-clickhouse-datasource only introduced rawSql, while vertamedia-clickhouse-grafana introduced formattedQuery, query, and rawQuery. So our dashboard JSONs now contain all four of these "query" fields, which is confusing. Sorry for not discovering this issue earlier. I'll delete the unused formattedQuery, query, and rawQuery from the dashboard JSONs. Ref issue: grafana/clickhouse-datasource#144
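
A minimal sketch of that cleanup, assuming the usual dashboard JSON layout in which each panel holds a "targets" list and row panels may nest further "panels". The sample dashboard and the function names are hypothetical, made up for illustration; only the three key names come from the comment above.

```python
import copy
import json

# Keys left behind by vertamedia-clickhouse-grafana, as listed in the comment above.
LEGACY_KEYS = ("formattedQuery", "query", "rawQuery")


def iter_panels(panels):
    """Yield every panel, descending into row panels that nest their own "panels"."""
    for panel in panels:
        yield panel
        yield from iter_panels(panel.get("panels", []))


def strip_legacy_query_keys(dashboard):
    """Return a copy of the dashboard with the legacy keys removed from every target."""
    cleaned = copy.deepcopy(dashboard)
    for panel in iter_panels(cleaned.get("panels", [])):
        for target in panel.get("targets", []):
            for key in LEGACY_KEYS:
                target.pop(key, None)
    return cleaned


# Hypothetical, trimmed-down dashboard used only for illustration.
sample_dashboard = {
    "title": "example",
    "panels": [
        {
            "title": "example panel",
            "targets": [
                {
                    "refId": "A",
                    "rawSql": "SELECT 1",
                    "rawQuery": "SELECT 1",
                    "query": "SELECT 1",
                    "formattedQuery": "SELECT 1",
                }
            ],
        }
    ],
}

cleaned_dashboard = strip_legacy_query_keys(sample_dashboard)
print(json.dumps(cleaned_dashboard, indent=2))
```

The function works on a deep copy, so the original JSON can still be diffed against the cleaned version before anything is written back.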

So for grafana-clickhouse-datasource, the only input is rawSql. The plugin does relatively complex parsing/translation on the rawSql and sends the result to the datasource.

Another (possibly good) update: the plugin team suggested we can retrieve the query results from api/ds/query. I'll spend some time verifying whether that's what we want, as it definitely contradicts what we were told by the Grafana team 🙃
https://grafana.com/docs/grafana/latest/developers/http_api/data_source/#query-a-data-source


Update: I did some small tests and it seems to work overall. So the test pipeline will be:

  • call the dashboard API to retrieve the dashboard JSON
  • parse the JSON and select the queries (including all the query configuration and the datasource uid) from it
  • call the query API to execute each query and get the result in data frame form
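
The parsing step of this pipeline can be sketched as follows. This is only an illustration under stated assumptions: the dashboard API response is assumed to wrap the dashboard under a "dashboard" key, each panel is assumed to carry its datasource as an object with a "uid", and the request body is assumed to follow the documented api/ds/query shape ("queries", "from", "to"). The sample response, SQL, and function names are hypothetical. The HTTP calls themselves are left out so that the sketch runs without a Grafana instance.

```python
import json


def iter_panels(panels):
    """Yield every panel, descending into row panels that nest their own "panels"."""
    for panel in panels:
        yield panel
        yield from iter_panels(panel.get("panels", []))


def extract_queries(dashboard_response):
    """Collect every target that carries rawSql, together with its datasource."""
    dashboard = dashboard_response["dashboard"]
    queries = []
    for panel in iter_panels(dashboard.get("panels", [])):
        panel_datasource = panel.get("datasource") or {}
        for target in panel.get("targets", []):
            if "rawSql" not in target:
                continue
            query = dict(target)
            # Assumption: a target-level datasource overrides the panel-level one.
            query["datasource"] = target.get("datasource") or panel_datasource
            queries.append(query)
    return queries


def build_query_request(queries, time_from="now-1h", time_to="now"):
    """Build the JSON body to POST to api/ds/query."""
    return {"queries": queries, "from": time_from, "to": time_to}


# Hypothetical dashboard API response used only for illustration.
sample_response = {
    "meta": {"slug": "example"},
    "dashboard": {
        "uid": "example-uid",
        "panels": [
            {
                "title": "panel one",
                "datasource": {"type": "grafana-clickhouse-datasource", "uid": "ch-uid"},
                "targets": [{"refId": "A", "rawSql": "SELECT 1"}],
            },
            {
                "type": "row",
                "panels": [
                    {
                        "title": "nested panel",
                        "datasource": {"uid": "ch-uid"},
                        "targets": [
                            {"refId": "A", "rawSql": "SELECT 2", "datasource": {"uid": "other-uid"}}
                        ],
                    }
                ],
            },
        ],
    },
}

request_body = build_query_request(extract_queries(sample_response))
print(json.dumps(request_body, indent=2))
```

In a real test, the first and last steps would wrap this with an authenticated GET on the dashboard API and a POST of request_body to api/ds/query.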

from theia.

heanlan commented on June 2, 2024

No, I mean is it possible to test these steps directly: "The datasource plugin takes the query as the input, sends the query to the external datasource API, and returns the query result data as the output to Grafana"? That is, without rendering dashboards and comparing screenshots in a browser.

Understood. I don't see that the plugin exposes such an interface. Let me open an issue on their repo to verify with them.

Comparing (a) talking to the ClickHouse server and (b) talking to the ClickHouse datasource plugin, I think the only difference is where we do the query parsing/translation:

  • For (a), the input query should be in a form that the ClickHouse server can understand. In the dashboard JSON file, it should be the rawQuery.

  • For (b), the input query is exactly the query we wrote, including macros such as $timeFilter(). In the dashboard JSON file, it should be the sqlQuery. The datasource plugin will parse the query, translate it into a format that the ClickHouse server can understand, and send the request to the ClickHouse server.

One pending issue with (a): I found that the rawQuery is not always aligned with the sqlQuery. I'm still waiting for a reply from the datasource plugin team.

yuntanghsu commented on June 2, 2024

I think the queries regarding the table flows and other MVs should have no difference between a distributed table and a standalone table? Does "most of the queries" refer to system tables for things like storage consumption? I think even if we introduce a ClickHouse cluster, we still need to support both cases, i.e., everything should work for both standalone ClickHouse and a ClickHouse cluster.

Yes, only for system tables. Adding "cluster" ensures we get the same behavior for standalone ClickHouse and a ClickHouse cluster.

heanlan commented on June 2, 2024

cc @dreamtalen @yanjunz97 @yuntanghsu @salv-orlando for discussion and evaluation. Any feedback would be much appreciated.

Updated the description a little bit.

dreamtalen commented on June 2, 2024

If we want to verify the query result, one alternative is: Send request to Grafana dashboard API, get the dashboard JSON file, extract the query from the dashboard JSON, and run the query independently against the datasource.

This approach sounds acceptable to me, although the steps of executing queries and getting the results back in Grafana are not covered.

In our case, it is the grafana-clickhouse datasource. The datasource plugin takes the query as the input, sends the query to the external datasource API, and returns the query result data as the output to Grafana

Is it possible to run e2e tests for grafana-clickhouse datasource directly to cover this missing part in the above approach?

heanlan commented on June 2, 2024

Is it possible to run e2e tests for grafana-clickhouse datasource directly to cover this missing part in the above approach?

Are you referring to this e2e test https://github.com/grafana/clickhouse-datasource/tree/5ff43fc6d609912e3ae1163bf19699d43e03d1a8/cypress-e2e ? It is built with Cypress and needs to be run in a browser.

dreamtalen commented on June 2, 2024

Is it possible to run e2e tests for grafana-clickhouse datasource directly to cover this missing part in the above approach?

Are you referring to this e2e test https://github.com/grafana/clickhouse-datasource/tree/5ff43fc6d609912e3ae1163bf19699d43e03d1a8/cypress-e2e ? It is built with Cypress and needs to be run in a browser.

No, I mean is it possible to test these steps directly: "The datasource plugin takes the query as the input, sends the query to the external datasource API, and returns the query result data as the output to Grafana"? That is, without rendering dashboards and comparing screenshots in a browser.

yuntanghsu commented on June 2, 2024

I think the only difference between (a) and (b) is the $timeFilter(), which can be deleted? And we can just use the parameters in (a) for the $timeInterval.

One thing that comes to mind: after the ClickHouse cluster change is merged, I'm not sure whether the original queries are still usable.

I'm sending queries to the ClickHouse DB to retrieve metrics and verify their values. But for most of the queries, I need to add "cluster" to the query in order to get all the data across the shards, e.g. "cluster('{cluster}', INFORMATION_SCHEMA.COLUMNS)".
If you only use distributed engine tables, I think it might not be an issue?
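
As a rough illustration of the wrapping described above, a test helper could build the table expression in one place. The helper name and the surrounding SELECT are hypothetical; only the cluster('{cluster}', ...) form comes from the comment, and '{cluster}' is assumed to be a macro that the ClickHouse server substitutes.

```python
def with_cluster(table, cluster_macro="{cluster}"):
    """Wrap a table name in cluster(...) so the query reads from all shards."""
    return f"cluster('{cluster_macro}', {table})"


# Hypothetical query built with the helper; the SELECT itself is only an example.
columns_query = f"SELECT count() FROM {with_cluster('INFORMATION_SCHEMA.COLUMNS')}"
print(columns_query)
```

Keeping the wrapping in a single helper makes it easy to apply it only to the queries that need it, such as those against system tables.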

yanjunz97 commented on June 2, 2024

I do not have much to add to the discussion. Regarding the choice between (a) and (b), (b) looks better to me, as it seems to include the datasource plugin in the overall e2e tests. But as it depends on the plugin API, let's wait to see the reply from them.

Besides, I'm curious about the fact that rawQuery is not always aligned with the sqlQuery. In this case, do we still get the result we expect based on the sqlQuery we write? Does that mean rawQuery is not what the datasource plugin uses to interact with the ClickHouse server? IMO, if we have to choose option (a), it might be better to use whatever the plugin uses to talk to the ClickHouse server.

yanjunz97 commented on June 2, 2024

One thing that comes to mind: after the ClickHouse cluster change is merged, I'm not sure whether the original queries are still usable.

I'm sending queries to the ClickHouse DB to retrieve metrics and verify their values. But for most of the queries, I need to add "cluster" to the query in order to get all the data across the shards, e.g. "cluster('{cluster}', INFORMATION_SCHEMA.COLUMNS)". If you only use distributed engine tables, I think it might not be an issue?

I think the queries regarding the table flows and other MVs should have no difference between a distributed table and a standalone table? Does "most of the queries" refer to system tables for things like storage consumption? I think even if we introduce a ClickHouse cluster, we still need to support both cases, i.e., everything should work for both standalone ClickHouse and a ClickHouse cluster.
