ivoa / lyonetia
ADQL grammar validation
Home Page: http://wiki.ivoa.net/twiki/bin/view/IVOA/ADQL
License: GNU General Public License v3.0
At the 2016 Interop in Cape Town it was agreed that we would define a recommended syntax for a cross-match query, and that this would be added to the 2.1-WD specification.
According to the following rules (v2.1), the DISTANCE function can accept only POINT functions and column references:
<distance_function> ::=
DISTANCE <left_paren>
<coord_value> <comma>
<coord_value>
<right_paren>
| DISTANCE <left_paren>
<numeric_value_expression> <comma>
<numeric_value_expression> <comma>
<numeric_value_expression> <comma>
<numeric_value_expression>
<right_paren>
<coord_value> ::= <point> | <column_reference>
This makes it impossible to write the following ADQL query:
SELECT TOP 10 CENTROID(s_region),
DISTANCE(CENTROID(s_region), POINT('', 187.48, 2.05))
FROM ivoa.ObsCore
Since CENTROID returns a POINT, the above query should be correct.
This issue is a request to change the BNF so that a coord_value includes the CENTROID function:
<coord_value> ::= <point> | <column_reference> | <centroid>
Since coord_value is used only by DISTANCE, COORD1 and COORD2, this modification would be harmless. Of course, any other proposed change that allows CENTROID to be used in DISTANCE is welcome.
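As an illustration only, here is a minimal Python sketch of the proposed change, assuming a hypothetical, heavily simplified AST (`ColumnRef`, `Call`) rather than any real ADQL parser's data model. `is_coord_value` accepts POINT and column references, and accepts CENTROID only when the proposed rule is switched on; it does not check what CENTROID's own argument is.

```python
from dataclasses import dataclass


# Hypothetical, simplified AST nodes (not any real ADQL library's classes).
@dataclass(frozen=True)
class ColumnRef:
    name: str


@dataclass(frozen=True)
class Call:
    fname: str
    args: tuple = ()


def is_coord_value(node: object, allow_centroid: bool) -> bool:
    """<coord_value> ::= <point> | <column_reference> [ | <centroid> if allowed ]"""
    if isinstance(node, ColumnRef):
        return True
    if isinstance(node, Call):
        fname = node.fname.upper()
        if fname == "POINT":
            return True
        if fname == "CENTROID":
            # Only the proposed rule lets CENTROID stand in for a coord_value.
            return allow_centroid
    return False


def distance_args_ok(a: object, b: object, allow_centroid: bool) -> bool:
    # Two-argument form of DISTANCE: both arguments must be coord_values.
    return is_coord_value(a, allow_centroid) and is_coord_value(b, allow_centroid)


# Arguments of DISTANCE(CENTROID(s_region), POINT('', 187.48, 2.05))
centroid_arg = Call("CENTROID", (ColumnRef("s_region"),))
point_arg = Call("POINT", ("", 187.48, 2.05))
```

With `allow_centroid=False` (the current rule) the arguments of the ObsCore query above are rejected; with `allow_centroid=True` (the proposed rule) they are accepted, while something like `COUNT(*)` is still refused.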
--
This issue was originally raised by @almicol and @vforchi in another GitHub issue in the gmantele/taplib repository.
In the current BNF the rules for ORDER BY and GROUP BY are different, and don't allow some complex constructs put forward by users.
The proposal is to make ORDER BY and GROUP BY use the same rules, and to expand those rules to allow column references, select fields, aliases and expressions in both ORDER BY and GROUP BY, with the caveat that the column reference, select field, alias or expression must match a field from the SELECT list.
Using an alias defined in the SELECT list would be valid:
SELECT
0.25*(0.5+FLOOR(LOG10(Mvir)/0.25)) AS log_mass,
COUNT(*) AS num
FROM
MDR1.bdmv
WHERE
snapnum=85
GROUP BY
log_mass
ORDER BY
log_mass
Using an expression defined in the SELECT list would be valid:
SELECT
0.25*(0.5+FLOOR(LOG10(Mvir)/0.25)) AS log_mass,
COUNT(*) AS num
FROM
MDR1.bdmv
WHERE
snapnum=85
GROUP BY
0.25*(0.5+FLOOR(LOG10(Mvir)/0.25))
ORDER BY
0.25*(0.5+FLOOR(LOG10(Mvir)/0.25))
But using an expression that is not in the SELECT list would not be valid:
SELECT
0.25*(0.5+FLOOR(LOG10(Mvir)/0.25)) AS log_mass,
COUNT(*) AS num
FROM
MDR1.bdmv
WHERE
snapnum=85
GROUP BY
FLOOR(LOG10(Mvir)/0.25) -- This does not match a SELECT field
ORDER BY
FLOOR(LOG10(Mvir)/0.25) -- This does not match a SELECT field
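A minimal Python sketch of the matching rule proposed above, assuming a crude text comparison: `matches_select` (a hypothetical helper) accepts an ORDER BY / GROUP BY item if it equals a SELECT alias or repeats a SELECT expression, ignoring whitespace and case. A real validator would compare parse trees instead, and must not fold the case of delimited identifiers.

```python
import re
from typing import List, Optional, Tuple

# (expression text, alias or None) for each SELECT-list entry
SelectItem = Tuple[str, Optional[str]]


def normalize(expr: str) -> str:
    # Crude normalization for the sketch: drop all whitespace, upper-case the rest.
    return re.sub(r"\s+", "", expr).upper()


def matches_select(item: str, select_list: List[SelectItem]) -> bool:
    """True if the item names a SELECT alias or repeats a SELECT expression."""
    key = normalize(item)
    for expr, alias in select_list:
        if alias is not None and normalize(alias) == key:
            return True
        if normalize(expr) == key:
            return True
    return False


# SELECT list of the MDR1.bdmv examples above.
select_list: List[SelectItem] = [
    ("0.25*(0.5+FLOOR(LOG10(Mvir)/0.25))", "log_mass"),
    ("COUNT(*)", "num"),
]
```

Comparing text is the simplest way to state the rule, but it would also reject semantically equivalent rewrites (for example, swapping the operands of `*`), which is one reason an implementation would rather work on the parsed expression.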
In addition, the rules should be updated to allow either a long-form, fully qualified column reference or, where it is unique within the context of the query, a short-form column name.
For a column reference, using the fully qualified table name would be valid:
SELECT
TOP 10
TAP_SCHEMA.columns.column_name
FROM
TAP_SCHEMA.columns
ORDER BY
TAP_SCHEMA.columns.column_name
Using the unqualified table name would be valid:
SELECT
TOP 10
columns.column_name
FROM
TAP_SCHEMA.columns
ORDER BY
columns.column_name
Using a table alias would be valid:
SELECT
TOP 10
cols.column_name
FROM
TAP_SCHEMA.columns AS cols
ORDER BY
cols.column_name
Using the short form column name would be valid, as long as it is unique within the context:
SELECT
TOP 10
column_name
FROM
TAP_SCHEMA.columns
ORDER BY
column_name
However, in each of these cases, the column reference in the ORDER BY or GROUP BY clause MUST match a column reference defined in the SELECT list.
Mixing the short and long forms would not be valid:
SELECT
TOP 10
column_name
FROM
TAP_SCHEMA.columns AS cols
ORDER BY
cols.column_name -- This does not match a SELECT field.
Mixing the table name and the table alias would not be valid:
SELECT
TOP 10
columns.column_name
FROM
TAP_SCHEMA.columns AS cols
ORDER BY
cols.column_name -- This does not match a SELECT field.
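The same-form requirement for column references can be sketched in Python as follows, assuming only regular (unquoted, case-insensitive) identifiers. `order_by_ref_ok` and `short_name_is_unique` are hypothetical helpers: the first requires the ORDER BY reference to have exactly the same qualifier and column name as a SELECT-list reference; the second checks the "unique within the context" condition for a bare column name, given a mapping from FROM tables to their columns.

```python
from typing import Dict, Iterable, Set, Tuple


def split_ref(ref: str) -> Tuple[str, ...]:
    # "TAP_SCHEMA.columns.column_name" -> ("tap_schema", "columns", "column_name")
    # Assumption: regular identifiers only, compared case-insensitively.
    return tuple(part.strip().lower() for part in ref.split("."))


def order_by_ref_ok(order_ref: str, select_refs: Iterable[str]) -> bool:
    # Same qualifier (schema/table/alias) and same column name as a SELECT entry.
    return split_ref(order_ref) in {split_ref(r) for r in select_refs}


def short_name_is_unique(column: str, from_tables: Dict[str, Set[str]]) -> bool:
    # A bare column name is allowed only if exactly one FROM table has it.
    wanted = column.lower()
    hits = [t for t, cols in from_tables.items() if wanted in {c.lower() for c in cols}]
    return len(hits) == 1
```

Under this sketch, `cols.column_name` is rejected against a SELECT list containing `column_name` or `columns.column_name`, mirroring the two invalid examples above.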
Email from Alberto Micol at ESO
Is there a specific reason why column names cannot start with an underscore?
I have some tables with such columns, and I cannot serve them to the community;
so, I wonder if that could be changed?
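For reference, a Python sketch of the two identifier rules, assuming ADQL-2.0's <regular_identifier> is a Latin letter followed by any mix of Latin letters, digits and underscores (our reading of the BNF). `RELAXED_RULE` is a hypothetical relaxation that would also allow a leading underscore; neither pattern covers delimited (double-quoted) identifiers.

```python
import re

# Regular (unquoted) identifier as we read the ADQL-2.0 BNF:
# a Latin letter, then Latin letters, digits or underscores.
CURRENT_RULE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# Hypothetical relaxed rule that also accepts a leading underscore.
RELAXED_RULE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_regular_identifier(name: str, rule: "re.Pattern[str]" = CURRENT_RULE) -> bool:
    # fullmatch: the whole name must match, not just a prefix.
    return rule.fullmatch(name) is not None
```

A name like `_ra` fails the current rule but passes the relaxed one; a name starting with a digit fails both.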
From an email discussion with Markus
I had to dig into the ADQL grammar in another matter, too: CENTROID.
For one, pgSphere so far doesn't support computing centroids (except
for circles and points, which is lame), so I suspect nobody supports
this properly. I, at least, don't, and now try to give a sensible
error message.
But while digging into this I noticed back then I've quietly
sanitised the grammar, and I think we should fix this "upstream" now.
Currently, we have
<centroid> ::= CENTROID <left_paren> <geometry_value_expression> <right_paren>
<geometry_value_expression> ::=
<value_expression_primary>
| <geometry_value_function>
<value_expression_primary> ::=
<unsigned_value_specification>
| <column_reference>
| <set_function_specification>
| <left_paren> <value_expression> <right_paren>
So, something like CENTROID(47839) or CENTROID(COUNT(*)) is
grammatically ok, and I suspect that was not what people had in mind.
I suspect what they wanted was to allow a column reference. So, I
think what we want is
<geometry_value_expression> ::=
<column_reference>
| <geometry_value_function>
The other alternatives in <value_expression_primary> don't really
make sense for any place <geometry_value_expression> turns up in.
On the other hand, what I've been running all these past years was
essentially
geometryExpression = box | point | circle | polygon | region
geometryValue = columnReference.copy()
centroid = (CaselessKeyword("CENTROID")("fName")
+ '(' + Args(geometryValueExpression) + ')')
geometryValueExpression = geometryExpression | geometryValue | centroid
I can't promise I really thought through the inclusion of centroid in
geometryValueExpression, but I'd support including it in
<geometry_value_expression>, too. It's probably not deadly
important, but it'd allow stuff like
1=CONTAINS(CENTROID(coverage), CIRCLE('', 10, 2, 3))
or similar.
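A Python sketch of the restricted rule proposed above, using a hypothetical, simplified AST rather than DaCHS's or any other library's data model. `is_geometry_value_expression` accepts a column reference, one of the geometry constructors from the geometryExpression rule, or a CENTROID whose single argument is itself a geometry value expression; it does not validate the constructors' own arguments.

```python
from dataclasses import dataclass


# Hypothetical, simplified AST nodes for illustration only.
@dataclass(frozen=True)
class ColumnRef:
    name: str


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class Call:
    fname: str
    args: tuple = ()


# Constructors listed in the geometryExpression rule quoted above.
GEOMETRY_FUNCTIONS = {"BOX", "POINT", "CIRCLE", "POLYGON", "REGION"}


def is_geometry_value_expression(node: object) -> bool:
    """Proposed: <column_reference> | <geometry_value_function> | <centroid>."""
    if isinstance(node, ColumnRef):
        return True
    if isinstance(node, Call):
        fname = node.fname.upper()
        if fname in GEOMETRY_FUNCTIONS:
            return True
        if fname == "CENTROID":
            # CENTROID takes exactly one argument, itself a geometry value expression.
            return len(node.args) == 1 and is_geometry_value_expression(node.args[0])
    # Literals and other calls (e.g. COUNT(*)) are no longer accepted.
    return False
```

Under this rule, CENTROID(47839) and CENTROID(COUNT(*)) are rejected, while CENTROID(coverage) is accepted and can therefore appear inside CONTAINS.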
While working on TAP and ADQL in Trieste, we (actually Sonia) found the following discrepancy/issue:
ADQL seems to support a query like
SELECT "size"/2 AS half_size FROM TAP_SCHEMA.columns
while it seems not to support
SELECT "size" = 1 AS is_size_1 FROM TAP_SCHEMA.columns
At least Gregory's library (and TOPCAT, but IIRC TOPCAT uses Gregory's ADQL library for validation) behaves this way, and this seems correct with respect to ADQL-2.0.
However, in ADQL-2.1 the SELECT clause allows a <value_expression>, which can also be a <boolean_value_expression>. The latter, however, contains no <boolean_term> and so disallows the second query above. Including <boolean_term> would allow the SELECT clause to contain expressions like the one above, exactly as in a WHERE clause (see the <search_condition> definition).
We are wondering whether this is the intended behaviour or whether it belongs on the list of BNF issues.
To complicate the picture, there is also the <boolean_function> BNF term (p. 56 in ADQL-2.1), which is blank.
I'd like your opinion, however, just to know whether we can expect to use boolean-valued virtual columns (e.g. a = value AS vcol) in the future (we already have possible alternatives).
From Markus:
http://mail.ivoa.net/pipermail/dal/2016-July/007560.html
the table requires two arguments on both round() and trunc(), while
the grammar (sensibly) says that the second argument is optional
(what it doesn't say is that it defaults to 0, but we can pull that
from SQL92).
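A minimal Python sketch of the optional second argument, assuming it defaults to 0 as in SQL92 and that ties in round() go away from zero (an assumption; neither the table nor the grammar says so). `adql_round` and `adql_trunc` are hypothetical names; decimal arithmetic is used to avoid binary floating-point surprises such as 2.675 being stored just below 2.675.

```python
from decimal import ROUND_DOWN, ROUND_HALF_UP, Decimal


def _quantum(n: int) -> Decimal:
    # 10**(-n) as a Decimal: n=2 -> 1E-2, n=0 -> 1E0, n=-2 -> 1E2
    return Decimal(f"1E{-n}")


def adql_round(x: float, n: int = 0) -> float:
    # Assumption: ties round away from zero (Decimal's ROUND_HALF_UP).
    return float(Decimal(str(x)).quantize(_quantum(n), rounding=ROUND_HALF_UP))


def adql_trunc(x: float, n: int = 0) -> float:
    # ROUND_DOWN discards digits towards zero, i.e. truncation.
    return float(Decimal(str(x)).quantize(_quantum(n), rounding=ROUND_DOWN))
```

A negative n works to the left of the decimal point: `adql_round(1234.5, -2)` gives `1200.0`.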
We need to add a paragraph on the expected behaviour of mod(x,y) to make sure the sign is correctly preserved.
From the 2.0 errata:
http://wiki.ivoa.net/twiki/bin/view/IVOA/ADQL-2_0-Err-2
Modulo operator, where
M % N = R
with
R having the same sign as M,
|R| less than |N|, and
M = K * N + R for some integer K.
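The errata rule corresponds to truncated division, as in C's fmod. A minimal Python sketch follows (`adql_mod` is a hypothetical name); note that Python's own `%` operator does not follow this rule, because its result takes the sign of N.

```python
import math
from typing import Union

Number = Union[int, float]


def adql_mod(m: Number, n: Number) -> Number:
    """M % N = R with R of the same sign as M (or 0), |R| < |N|, M = K*N + R."""
    if n == 0:
        raise ZeroDivisionError("mod(x, 0) is undefined")
    if isinstance(m, int) and isinstance(n, int):
        # Work on magnitudes, then give the remainder the sign of M.
        r = abs(m) % abs(n)
        return r if m >= 0 else -r
    # For floats, math.fmod already returns a result with the sign of M.
    return math.fmod(m, n)
```

For example, `adql_mod(-7, 3)` returns `-1` (since -7 = -2 * 3 + (-1)), whereas Python's `-7 % 3` is `2`.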
I've noticed that ADQL-2.1 offers a function lower(VARCHAR)->VARCHAR. Why not also have the function upper(VARCHAR)->VARCHAR?
Both functions are supported in all DBMS that I have checked (Postgres, Oracle, MySQL, MS-SQLServer, SQLite, H2, ...).
If so, why not also support other very common string functions like length (or strlen), trim, replace, substr, instr, ...?
I am asking because, now that my ADQL library (v1.4, but still supporting only ADQL-2.0) really supports reserved keywords, a lot of users and implementers are complaining that it is no longer possible to have a UDF named lower or upper (which are reserved SQL keywords).
In a way, this forces implementers to use a different name, which will lead to every TAP implementation having a different name for the same function. I do not think it is a good idea to encourage such a thing.
Anyway, I am speaking about these common string functions now, but I am quite sure that other so-called reserved keywords of ADQL-2.x prevent TAP implementers from offering some common, useful functions. Maybe it would be worth checking these names to see whether functions with these names exist in most/all DBMSs and, if so, integrating them into ADQL-2.1 (or 2.2 if it is already too late for 2.1).
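To make the clash concrete, here is a Python sketch of the kind of check a validator performs when it enforces reserved keywords. `RESERVED` is a hypothetical subset chosen for illustration (these particular names are SQL92 reserved words); the normative list is the one in the ADQL specification, and `check_udf_name` is a hypothetical helper.

```python
# Hypothetical subset of reserved words, for illustration only; the
# normative list is the reserved-keyword list in the ADQL specification.
RESERVED = frozenset({"LOWER", "UPPER", "TRIM", "SUBSTRING", "POSITION", "CHAR_LENGTH"})


def check_udf_name(name: str) -> None:
    """Reject a user-defined function name that collides with a reserved word."""
    # Reserved words are matched case-insensitively, like regular identifiers.
    if name.upper() in RESERVED:
        raise ValueError(f"UDF name {name!r} is a reserved keyword")
```

Any UDF named lower or upper fails such a check, so each service has to pick its own replacement name unless the function itself becomes part of ADQL.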
From Markus:
http://mail.ivoa.net/pipermail/dal/2016-July/007560.html
the table requires an argument on rand(), whereas the grammar has
it (sensibly) optional [incidentally, if we touch rand's
description for 2.1, it might be smart to say what the seed value
is -- float? integer? any sizes I have to accept?]
TODO: