ivoa / lyonetia Goto Github PK


ADQL grammar validation

Home Page: http://wiki.ivoa.net/twiki/bin/view/IVOA/ADQL

License: GNU General Public License v3.0

Python 5.18% Shell 4.00% Dockerfile 0.89% Java 89.93%

astronomy adql adql-parsers ivoa bnf

lyonetia's People

aipescience kristinriebe molinaro-m zarquan connexcs tubbz-alt gmantele msdemlei gaybro8777 jontxu

lyonetia's Issues

Fix description for modulo operator

We need to add a paragraph on the expected behaviour of the mod(x,y) to be sure the sign is correctly preserved.

From the 2.0 errata:
http://wiki.ivoa.net/twiki/bin/view/IVOA/ADQL-2_0-Err-2

Modulo operator where

M % N = R

with

R having the same sign as M
|R| is less than |N|
M = K * N + R for a given integer K
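The three conditions above can be checked with a small sketch. The function name `adql_mod` is hypothetical and only illustrates the rule from the erratum; it is not taken from any ADQL implementation. Note that Python's built-in `%` follows the sign of the divisor, so it cannot be used directly.

```python
def adql_mod(m: int, n: int) -> int:
    """Return R such that M = K * N + R, |R| < |N|, and R has the sign of M.

    Hypothetical helper illustrating the erratum text; integer inputs only.
    """
    if n == 0:
        raise ZeroDivisionError("mod(x, 0) is left undefined in this sketch")
    r = abs(m) % abs(n)          # magnitude of the remainder, always < |n|
    return -r if m < 0 else r    # copy the sign of the dividend M


for m, n in [(7, 3), (-7, 3), (7, -3), (-7, -3)]:
    r = adql_mod(m, n)
    k = (m - r) // n             # exact, because m - r is a multiple of n
    print(f"mod({m}, {n}) = {r}   (K = {k}, Python's % gives {m % n})")
```

For `(-7, 3)` the sketch returns `-1`, whereas Python's `-7 % 3` returns `2`; a service that maps `mod()` onto a backend operator would need to check which convention that backend follows.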

Allow names to start with underscore

Email from Alberto Micol at ESO

Is there a specific reason why column names cannot start with an underscore?
I have some tables with such columns, and I cannot serve them to the community;
so, I wonder if that could be changed…?
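To make the request concrete, here is a minimal sketch of the two identifier rules as regular expressions. It assumes that a regular identifier currently has to start with a Latin letter and may continue with letters, digits and underscores; the pattern names and the column name `_raj2000` are made up for illustration.

```python
import re

# Assumed current rule: first character must be a letter.
CURRENT_RULE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
# Rule requested in this issue: a leading underscore is also allowed.
PROPOSED_RULE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_regular_identifier(name: str, rule: re.Pattern) -> bool:
    """True if the whole name matches the given identifier rule."""
    return rule.fullmatch(name) is not None


for name in ["ra", "_raj2000", "2mass"]:
    print(name,
          is_regular_identifier(name, CURRENT_RULE),
          is_regular_identifier(name, PROPOSED_RULE))
```

Only the underscore-prefixed name changes status between the two rules; a name starting with a digit stays invalid under both.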

Optional second argument for round() and trunc()

From Markus:
http://mail.ivoa.net/pipermail/dal/2016-July/007560.html

the table requires two arguments on both round() and trunc(), while
the grammar (sensibly) says that the second argument is optional
(what it doesn't say is that it defaults to 0, but we can pull that
from SQL92).
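A minimal sketch of what "optional, defaulting to 0" would mean for callers. The names `adql_round` and `adql_truncate` are hypothetical, and the tie-breaking behaviour of Python's `round()` is an artefact of this sketch, not something the issue specifies.

```python
import math


def adql_round(x: float, n: int = 0) -> float:
    """Round x to n decimal places; n defaults to 0 as proposed."""
    return float(round(x, n))


def adql_truncate(x: float, n: int = 0) -> float:
    """Cut x off after n decimal places (toward zero); n defaults to 0."""
    factor = 10 ** n
    return math.trunc(x * factor) / factor


print(adql_round(3.7))            # one-argument form
print(adql_round(1234.5678, -2))  # negative n rounds left of the point
print(adql_truncate(3.789))       # one-argument form
print(adql_truncate(-3.789, 1))   # truncation goes toward zero
```

With the default in place, the one-argument and two-argument forms in the table and in the grammar describe the same function.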

Addition of `UPPER`, `LOWER`, `STRLEN`, `SUBSTR`, ...

I've noticed that ADQL-2.1 offers a function lower(VARCHAR)->VARCHAR.
Why not also have the function upper(VARCHAR)->VARCHAR?

Both functions are supported in all DBMS that I have checked (Postgres, Oracle, MySQL, MS-SQLServer, SQLite, H2, ...).

If yes, why not also support other very common string functions like length (or strlen), trim, replace, substr, instr, ...?

I am asking because now that my ADQL Library (v1.4, but still supporting only ADQL-2.0) really enforces reserved keywords, a lot of users and implementers are complaining that it is no longer possible to have UDFs named lower or upper (which are reserved SQL keywords).
In effect, this forces implementers to pick a different name, so each TAP implementation ends up with a different name for the same function. I do not think it is a good idea to encourage such a thing.

I am speaking here about these common string functions, but I am quite sure that other reserved keywords of ADQL-2.x prevent TAP implementers from offering some common, useful functions. It may be worth checking these names to see whether functions associated with them exist in most or all DBMS and, if so, integrating them into ADQL-2.1 (or 2.2 if it is already too late for 2.1).

Grammar fix for DISTANCE

According to the following rules (v2.1), a DISTANCE function can accept only POINT functions and column references:

<distance_function> ::=
  DISTANCE <left_paren>
              <coord_value> <comma>
              <coord_value>
           <right_paren>
  | DISTANCE <left_paren>
               <numeric_value_expression> <comma>
               <numeric_value_expression> <comma>
               <numeric_value_expression> <comma>
               <numeric_value_expression>
             <right_paren>
 
<coord_value> ::= <point> | <column_reference>

This makes it impossible to write the following ADQL query:

SELECT TOP 10 CENTROID(s_region),
              DISTANCE(CENTROID(s_region), POINT('', 187.48, 2.05))
FROM ivoa.ObsCore

Since CENTROID returns a POINT, the above query should be correct.

This issue is a request to change the BNF so that a coord_value includes the CENTROID function:

<coord_value> ::= <point> | <column_reference> | <centroid>

Since coord_value is used only by DISTANCE, COORD1 and COORD2, this modification would be harmless. Any other suggestion that allows CENTROID to be used in DISTANCE is of course welcome.
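The effect of the proposed change on the two-argument form of DISTANCE can be sketched with a toy check. The sets and the function `distance_accepts` are hypothetical; they model only the alternatives of `<coord_value>` quoted in this issue, not a real parser.

```python
# Alternatives of <coord_value> as quoted above (v2.1 grammar).
CURRENT_COORD_VALUE = {"point", "column_reference"}
# Alternatives after the change requested in this issue.
PROPOSED_COORD_VALUE = CURRENT_COORD_VALUE | {"centroid"}


def distance_accepts(arg_kinds: list[str], coord_value: set[str]) -> bool:
    """True if DISTANCE(<coord_value>, <coord_value>) accepts these arguments."""
    return len(arg_kinds) == 2 and all(kind in coord_value for kind in arg_kinds)


# DISTANCE(CENTROID(s_region), POINT('', 187.48, 2.05))
query_args = ["centroid", "point"]
print(distance_accepts(query_args, CURRENT_COORD_VALUE))   # rejected today
print(distance_accepts(query_args, PROPOSED_COORD_VALUE))  # accepted after the fix
```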

--
This issue was originally raised by @almicol and @vforchi in another GitHub issue in the gmantele/taplib repository.

boolean_term(s) in SELECT clause

Email from Marco and Sonia
2018-01-25

Working on TAP and ADQL in Trieste, we (actually Sonia) found the following discrepancy.
ADQL seems to support a query like

SELECT "size"/2 AS half_size FROM TAP_SCHEMA.columns

while it seems not to support

SELECT "size" = 1 AS is_size_1 FROM TAP_SCHEMA.columns

At least Gregory's library (and TOPCAT, but IIRC TOPCAT uses Gregory's ADQL lib for validation) does so and it seems correct with respect to ADQL-2.0.

However in ADQL-2.1 the SELECT clause allows a <value_expression> that can be also a <boolean_value_expression>. This latter, however, contains no <boolean_term> and so disallows the above second query. That boolean_term would allow the SELECT to contain expressions like the above, exactly like it is in a WHERE clause (see <search_condition> definition).

We are wondering whether this is the intended behaviour or if it fits into the BNF-issues' list.
There's also the <boolean_function> BNF term (pg.56 in ADQL-2.1) that is blank, to complicate the view.

I'd like your opinion, however, just to know whether we can expect to use boolean-valued virtual columns
(e.g. a = value AS vcol) in the future or not (we already have possible alternatives).

Grammar fix for CENTROID

From an email discussion with Markus

I had to dig into the ADQL grammar in another matter, too: CENTROID.

For one, pgSphere so far doesn't support computing centroids (except
for circles and points, which is lame), so I suspect nobody supports
this properly. I, at least, don't, and now try to give a sensible
error message.

But while digging into this I noticed back then I've quietly
sanitised the grammar, and I think we should fix this "upstream" now.

Currently, we have

<centroid> ::= CENTROID <left_paren> <geometry_value_expression> <right_paren>

<geometry_value_expression> ::=
     <value_expression_primary>
   | <geometry_value_function>

<value_expression_primary> ::=
       <unsigned_value_specification>
     | <column_reference>
     | <set_function_specification>
     | <left_paren> <value_expression> <right_paren>

So, something like CENTROID(47839) or CENTROID(COUNT(*)) is
grammatically ok, and I suspect that was not what people had in mind.
I suspect what they wanted was to allow a column reference. So, I
think what we want is

<geometry_value_expression> ::=
     <column_reference>
   | <geometry_value_function>

The other alternatives in <value_expression_primary> don't really
make sense for any place <geometry_value_expression> turns up in.

On the other hand, what I've been running all these past years was
essentially

geometryExpression = box | point | circle | polygon | region

geometryValue = columnReference.copy()

centroid = (CaselessKeyword("CENTROID")("fName")
    + '(' + Args(geometryValueExpression) + ')')

geometryValueExpression = geometryExpression | geometryValue | centroid

I can't promise I really thought through the inclusion of centroid in
geometryValueExpression, but I'd support including it in
<geometry_value_expression>, too. It's probably not deadly
important, but it'd allow stuff like

1=CONTAINS(CENTROID(coverage), CIRCLE('', 10, 2, 3))

or similar.

Refactor the rules for ORDER BY and GROUP BY

In the current BNF the rules for ORDER BY and GROUP BY are different, and don't allow some complex constructs put forward by users.

The proposal is to change ORDER BY and GROUP BY to use the same rules, and to expand those rules to allow column references, select fields, aliases and expressions in both ORDER BY and GROUP BY, with the caveat that the column reference, select field, alias or expression must match a field from the SELECT list.

Using an alias defined in the SELECT list would be valid

    SELECT
        0.25*(0.5+FLOOR(LOG10(Mvir)/0.25)) AS log_mass,
        COUNT(*) AS num
    FROM
        MDR1.bdmv
    WHERE
        snapnum=85
    GROUP BY
        log_mass
    ORDER BY
        log_mass

Using an expression defined in the SELECT list would be valid

    SELECT
        0.25*(0.5+FLOOR(LOG10(Mvir)/0.25)) AS log_mass,
        COUNT(*) AS num
    FROM
        MDR1.bdmv
    WHERE
        snapnum=85
    GROUP BY
        0.25*(0.5+FLOOR(LOG10(Mvir)/0.25))
    ORDER BY
        0.25*(0.5+FLOOR(LOG10(Mvir)/0.25))

But using an expression that is not in the SELECT list would not be valid

    SELECT
        0.25*(0.5+FLOOR(LOG10(Mvir)/0.25)) AS log_mass,
        COUNT(*) AS num
    FROM
        MDR1.bdmv
    WHERE
        snapnum=85
    GROUP BY
        FLOOR(LOG10(Mvir)/0.25) -- This does not match a SELECT field
    ORDER BY
        FLOOR(LOG10(Mvir)/0.25) -- This does not match a SELECT field

In addition, the rules should be updated to allow both a long form fully qualified column reference, or, where it is unique within the context of the query, a short form column name.

For a column reference, using the fully qualified table name would be valid:

    SELECT
        TOP 10
        TAP_SCHEMA.columns.column_name
    FROM
        TAP_SCHEMA.columns
    ORDER BY
        TAP_SCHEMA.columns.column_name

Using the unqualified table name would be valid:

    SELECT
        TOP 10
        columns.column_name
    FROM
        TAP_SCHEMA.columns
    ORDER BY
        columns.column_name

Using a table alias would be valid:

    SELECT
        TOP 10
        cols.column_name
    FROM
        TAP_SCHEMA.columns AS cols
    ORDER BY
        cols.column_name

Using the short form column name would be valid, as long as it is unique within the context:

    SELECT
        TOP 10
        column_name
    FROM
        TAP_SCHEMA.columns
    ORDER BY
        column_name

However, in each of these cases, the column reference in the ORDER BY or GROUP BY clause MUST match a column reference defined in the SELECT list.

Mixing the short and long forms would not be valid:

    SELECT
        TOP 10
        column_name
    FROM
        TAP_SCHEMA.columns AS cols
    ORDER BY
        cols.column_name -- This does not match a SELECT field.

Mixing table name and table alias would not be valid:

    SELECT
        TOP 10
        columns.column_name
    FROM
        TAP_SCHEMA.columns AS cols
    ORDER BY
        cols.column_name -- This does not match a SELECT field.
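The matching rule used throughout the examples above can be sketched as a plain textual comparison. This is an assumption about how "match" might be tested: the helpers `normalise` and `matches_select_list` are hypothetical, they ignore whitespace and letter case, and they do not handle delimited identifiers.

```python
from typing import Optional


def normalise(expr: str) -> str:
    """Drop all whitespace and fold case so equal expressions compare equal."""
    return "".join(expr.split()).lower()


def matches_select_list(term: str,
                        select_items: list[tuple[str, Optional[str]]]) -> bool:
    """True if an ORDER BY / GROUP BY term equals a SELECT expression or its alias."""
    key = normalise(term)
    for expression, alias in select_items:
        if key == normalise(expression):
            return True
        if alias is not None and key == normalise(alias):
            return True
    return False


select_items = [
    ("0.25*(0.5+FLOOR(LOG10(Mvir)/0.25))", "log_mass"),
    ("COUNT(*)", "num"),
]
print(matches_select_list("log_mass", select_items))                 # alias
print(matches_select_list("0.25*(0.5+FLOOR(LOG10(Mvir)/0.25))", select_items))
print(matches_select_list("FLOOR(LOG10(Mvir)/0.25)", select_items))  # no match
print(matches_select_list("cols.column_name", [("column_name", None)]))
```

Because the comparison is purely textual, `cols.column_name` and `column_name` are treated as different terms, which is exactly the "mixing forms" case the proposal declares invalid.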

Define seed value for rand()

From Markus:
http://mail.ivoa.net/pipermail/dal/2016-July/007560.html

the table requires an argument on rand(), whereas the grammar has
it (sensibly) optional [incidentally, if we touch rand's
description for 2.1, it might be smart to say what the seed value 
is -- float? integer?  any sizes I have to accept?]

TODO:

Check how the random seed is implemented in Cosmopterix
Define what the seed means in ADQL
Check we can implement it in Cosmopterix
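One possible reading of "what the seed means" is that the same seed always produces the same value. The sketch below illustrates that reading with Python's standard `random` module; the function `adql_rand` is hypothetical and says nothing about how Cosmopterix or any database backend actually behaves.

```python
import random
from typing import Optional, Union


def adql_rand(seed: Optional[Union[int, float]] = None) -> float:
    """Return a value in [0.0, 1.0); with a seed the result is reproducible."""
    rng = random.Random(seed)   # seed=None falls back to system entropy
    return rng.random()


print(adql_rand(42) == adql_rand(42))    # same integer seed, same value
print(adql_rand(1.5) == adql_rand(1.5))  # a float seed is accepted here too
print(0.0 <= adql_rand() < 1.0)          # unseeded call still lands in range
```

Whether a float seed should be accepted, and what integer sizes must be supported, are exactly the open questions in the quoted email; this sketch accepts both only because Python's generator happens to.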

Recommended cross match syntax

It was agreed at the 2016 interop in Cape Town that we would agree on a definition for the recommended syntax for a cross match query, and this would be added to the 2.1-WD specification.
