dalmatinerdb / dqe

DalmatinerDB Query Engine
License: MIT License
I now have support for new queries, like
SELECT 'a'.'b'.* BUCKET 'x' LAST 60s
but when I try providing the full path
SELECT 'a'.'b'.'c' BUCKET 'x' LAST 60s
I get errors. The crash log says:
Error in process <0.245.0> on node '[email protected]' with exit value:
{[{reason,function_clause},{mfa,{dalmatiner_idx_handler,handle,2}},{stacktrace,[{dqe,needs_buckets,[{calc,[],{get,{<<"565c43872c136ca11e786758">>,[<<"c3bcee12-0680-4bf3-8237-f51b48330dd8">>,<<"base">>,<<"cpu">>,<<"idle">>]}}},[]],[{file,"/Users/Szarsti/wsp/dalmatiner-fe/_checkouts/dqe/src/dqe.erl"},{line,421}]},{lists,foldl,3,[{file,"lists.erl"},{line,1262}]},{dqe,prepare,1,[{file,"/Users/Szarsti/wsp/dalmatiner-fe/_checkouts/dqe/src/dqe.erl"},{line,124}]},{dqe,run,2,[{file,"/Users/Szarsti/wsp/dalmatiner-fe/_checkouts/dqe/src/dqe.erl"},{line,83}]},{timer,tc,3,[{file,"timer.erl"},{line,197}]},{dalmatiner_idx_handler,handle,2,[{file,"/Users/Szarsti/wsp/dalmatiner-fe/_build/default/lib/dalmatiner_frontend/src/dalmatiner_idx_handler.erl"},{line,28}]},{cowboy_handler,handler_handle,4,[{file,"/Users/Szarsti/wsp/dalmatiner-fe/_build/default/lib/cowboy/src/cowboy_handler.erl"},{line,111}]},{cowboy_protocol,execute,4,[{file,"/Users/Szarsti/wsp/dalmatiner-fe/_build/default/lib/cowboy/src/cowboy_protocol.erl"},{line,442}]}]},{req,[{socket,#Port<0.2190>},{transport,ranch_tcp},{connection,keepalive},{pid,<0.245.0>},{method,<<"GET">>},{version,'HTTP/1.1'},{peer,{{127,0,0,1},56363}},{host,<<"localhost">>},{host_info,undefined},{port,8081},{path,<<"/">>},{path_info,undefined},{qs,<<"q=SELECT%20%27c3bcee12-0680-4bf3-8237-f51b48330dd8%27.%27base%27.%27cpu%27.%27idle%27%20BUCKET%20%27565c43872c136ca11e786758%27%20LAST%2060s">>},{qs_vals,undefined},{bindings,[]},{headers,[{<<"host">>,<<"localhost:8081">>},{<<"user-agent">>,<<"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:42.0) Gecko/20100101 Firefox/42.0">>},{<<"accept">>,<<"application/x-msgpack">>},{<<"accept-language">>,<<"en-GB,en;q=0.5">>},{<<"accept-encoding">>,<<"gzip, 
deflate">>},{<<"referer">>,<<"http://localhost:8081/?query=SELECT%20%27c3bcee12-0680-4bf3-8237-f51b48330dd8%27.%27base%27.%27cpu%27.%27idle%27%20BUCKET%20%27565c43872c136ca11e786758%27%20LAST%2060s">>},{<<"connection">>,<<"keep-alive">>}]},{p_headers,[{<<"connection">>,[<<"keep-alive">>]}]},{cookies,undefined},{meta,[]},{body_state,waiting},{buffer,<<>>},{multipart,undefined},{resp_compress,false},{resp_state,waiting},{resp_headers,[]},{resp_body,<<>>},{onresponse,undefined}]},{state,undefined}],[{cowboy_protocol,execute,4,[{file,"/Users/Szarsti/wsp/dalmatiner-fe/_build/default/lib/cowboy/src/cowboy_protocol.erl"},{line,442}]}]}
I am looking into a potential fix.
Remove functions from the parser/lexer and instead have one generic form there.
We then resolve from a function table based on name, return type (realized/unrealized/histogram), argument types, and number of arguments.
This will allow adding new functions without having to change the lexer/parser.
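A minimal sketch of what such a table-driven lookup could look like. The module and function names (`fn_table`, `resolve/3`) and the implementing modules in the table are hypothetical, not the real dqe API; the point is only that adding a function becomes a one-line table entry rather than a lexer/parser change.

```erlang
%% Hypothetical sketch: resolve functions from a table keyed on
%% {Name, ReturnType, Arity} instead of hard-coding them in the parser.
-module(fn_table).
-export([resolve/3]).

%% Table mapping {Name, ReturnType, Arity} to an implementing module.
%% The entries here are illustrative examples only.
table() ->
    #{{<<"avg">>,        realized, 1} => dqe_avg,
      {<<"derivate">>,   realized, 1} => dqe_derivate,
      {<<"confidence">>, realized, 1} => dqe_confidence}.

%% Look up a function by name, return type and arity.  Unknown
%% combinations yield a descriptive error instead of a parse failure.
resolve(Name, ReturnType, Arity) ->
    case maps:find({Name, ReturnType, Arity}, table()) of
        {ok, Mod} -> {ok, Mod};
        error     -> {error, {unknown_function, Name, Arity}}
    end.
```

With this shape, the lexer/parser only needs to recognize the generic call form; everything else is data.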
The functions derivate and confidence do not appear to work for a time window that is anything less than an hour, which could be due to the chunk mechanism in the new DQE functions.
This works:
SELECT confidence(m0) ALIAS 'fd4be66f-03-412e-8ec9-9f230927b629'.'base'.'network'.'eth0'.'packets_sent'
BUCKET 'fd' AS m0 BETWEEN "2015-11-01 00:00:00" AND "2015-11-01 01:00:00"
However, the following query does not return any results:
SELECT confidence(m0) ALIAS 'fd4be66f-03-412e-8ec9-9f230927b629'.'base'.'network'.'eth0'.'packets_sent'
BUCKET 'fd' AS m0 BETWEEN "2015-11-01 00:00:00" AND "2015-11-01 00:59:00"
The same applies to derivate.
Imagine that the indexer has already indexed some_known_metric, but knows nothing about some_unknown_metric.
Now if you run a combined query:
SELECT avg(some_known_metric FROM bucket, some_unknown_metric FROM bucket)
it will work as one would expect: you will get the average across all known metrics.
But if you try to combine metrics that are not looked up properly:
SELECT avg(some_unknown_metric FROM bucket)
you will get a very confusing error message:
Combination functions can't have mix resolutions as children.
This is not what one would expect, especially since aggregation functions in the same situation throw a message about missing data.
I think we should provide a more consistent error message.
I managed to track the error down to the 'get_times_' function in https://github.com/dataloop/dqe/blob/master/src/dql_resolution.erl#L96. Because the lookup returns an empty 'Elements' list, 'Rmss' will be empty and thus not match the case condition.
I am not entirely sure what we should do in this particular case. If we want to be consistent with other parts of the query system, we should change the 'dql_resolution' module to not throw an error but carry on, simply omitting the empty execution branch.
On the other hand, it would also make sense to treat a failed lookup as an error and raise it early in the 'dql_expand' module. 'Dql' already seems to treat the situation where no data can be fetched as an error. Raising those errors early, during the expansion phase, would allow us to provide more context, with details on which lookups are failing, thus helping the user fix the query.
I have already spent a fair amount of time investigating the issue, so I can prepare a fix. I just need some advice on which direction we should go and what makes more sense from an architectural point of view.
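The second option (raising the error early during expansion) could be sketched roughly as follows. The module name `expand_guard`, the function `expand_lookup/2`, and the error term are all illustrative assumptions, not the actual dql code:

```erlang
%% Hypothetical sketch: fail fast during expansion when a lookup
%% produces no elements, instead of letting the empty list crash
%% get_times_ (or similar) deep inside dql_resolution later.
-module(expand_guard).
-export([expand_lookup/2]).

%% Lookup is whatever identifies the metric lookup (so the error can
%% say *which* lookup failed); Elements is the result of resolving it.
expand_lookup(Lookup, []) ->
    %% Surface the failing lookup to the user with context.
    {error, {no_results_for_lookup, Lookup}};
expand_lookup(_Lookup, Elements) ->
    {ok, Elements}.
```

The benefit is exactly the one described above: the error carries the lookup that failed, rather than a confusing message about mixed resolutions.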
When using normal aggregates it is guaranteed that the resulting emit will always have the same size; this means we can combine them without running into the issue of an upper-level function receiving data in different chunk sizes.
However, combination functions do not guarantee this: when results from different terms come in in the wrong order (this can possibly happen!), the output can be the combination of more than one term.
This is a non-issue for normal aggregates, but if a combination sits on top of another combination there can be issues.
That said, I'm not sure of a use case that would result in something like this, but it's good to know about the issue.
Right now DQE always tries to deduplicate the query. With large queries this is itself expensive; a better way might be to only do it when we have a decent amount (10%?) of duplicated gets (this is easy to determine in the flattened form).
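The heuristic could be as simple as comparing the flattened list of gets against its sorted-unique form. A minimal sketch, assuming the flattened gets are comparable terms; the module name, the 10% threshold, and `should_dedup/1` are hypothetical, not existing dqe code:

```erlang
%% Hypothetical sketch: only run the (expensive) deduplication pass
%% when at least THRESHOLD of the flattened gets are duplicates.
-module(dedup_heuristic).
-export([should_dedup/1]).

-define(THRESHOLD, 0.1).  %% 10%, as floated in the discussion above

should_dedup([]) ->
    false;
should_dedup(Gets) ->
    Total  = length(Gets),
    Unique = length(lists:usort(Gets)),
    Dupes  = Total - Unique,
    Dupes / Total >= ?THRESHOLD.
```

Since the check is a single `usort` over the flattened form, it should be much cheaper than the full dedup pass it gates.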
When running a query against dimensions that end up having no elements, the query hangs indefinitely.
The dalmatinerdb query-language documentation says that
By default queries treat incoming data as a one second resolution, however this can be adjusted by passing a resolution section to the query. The syntax is: IN resolution:time.
But I always get a 400 and "Parser error in line 1: syntax error before: <<"IN">>" when I try to get data using the keyword <<"IN">>.
My query is like this:
SELECT 'adb212ff-b56c-6b01-f262-9d807e'.'cpu'.'usage' BUCKET 'zone' LAST 30s IN 10
Then I viewed the file dql_lexer; the keyword IN does not exist there.
Is the doc outdated, or is there another keyword for resolution?