ivoa / lyonetia

ADQL grammar validation

Home Page: http://wiki.ivoa.net/twiki/bin/view/IVOA/ADQL

License: GNU General Public License v3.0

Python 5.18% Shell 4.00% Dockerfile 0.89% Java 89.93%
astronomy adql adql-parsers ivoa bnf

lyonetia's People

Contributors

gmantele, jd-au, jontxu, msdemlei, stvoutsin, zarquan

lyonetia's Issues

Recommended cross match syntax

It was agreed at the 2016 Interop in Cape Town that we would settle on a definition of the recommended syntax for a cross match query, and that this would be added to the 2.1-WD specification.

Grammar fix for DISTANCE

According to the following rules (v2.1), a DISTANCE function can accept only POINT functions and column references:

<distance_function> ::=
  DISTANCE <left_paren>
              <coord_value> <comma>
              <coord_value>
           <right_paren>
  | DISTANCE <left_paren>
               <numeric_value_expression> <comma>
               <numeric_value_expression> <comma>
               <numeric_value_expression> <comma>
               <numeric_value_expression>
             <right_paren>
 
<coord_value> ::= <point> | <column_reference>

As a result, it is impossible to write the following ADQL query:

SELECT TOP 10 CENTROID(s_region),
              DISTANCE(CENTROID(s_region), POINT('', 187.48, 2.05))
FROM ivoa.ObsCore

Since CENTROID returns a POINT, the above query should be correct.

This issue is a request to change the BNF so that a coord_value includes the CENTROID function:

<coord_value> ::= <point> | <column_reference> | <centroid>

Since coord_value is used only by DISTANCE, COORD1 and COORD2, this modification should be harmless. Of course, any other suggested modification that allows the use of CENTROID in DISTANCE is welcome.

--
This issue was originally raised by @almicol and @vforchi in another GitHub issue in the gmantele/taplib repository.
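
The effect of the proposed change can be sketched in Python. This is only an illustration, not part of any ADQL parser: the `accepts_distance` helper and the string labels for argument kinds are hypothetical, and only the two-argument form of DISTANCE is modelled.

```python
# Alternatives of <coord_value> in the v2.1 BNF quoted above,
# and with the proposed <centroid> alternative added.
COORD_VALUE_CURRENT = {"point", "column_reference"}
COORD_VALUE_PROPOSED = COORD_VALUE_CURRENT | {"centroid"}


def accepts_distance(arg_kinds, coord_value):
    """Return True if the two-argument DISTANCE form accepts these arguments.

    arg_kinds is a pair of labels naming the grammar rule each argument was
    parsed as (hypothetical labels, used only for this sketch).
    """
    return len(arg_kinds) == 2 and all(kind in coord_value for kind in arg_kinds)


# DISTANCE(CENTROID(s_region), POINT('', 187.48, 2.05))
query_args = ("centroid", "point")
rejected_today = not accepts_distance(query_args, COORD_VALUE_CURRENT)
accepted_after_fix = accepts_distance(query_args, COORD_VALUE_PROPOSED)
```

With the current rule the query from the issue is rejected; with `<centroid>` added to `<coord_value>` it is accepted, and queries that already parse are unaffected.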

Refactor the rules for ORDER BY and GROUP BY

In the current BNF the rules for ORDER BY and GROUP BY are different, and don't allow some complex constructs put forward by users.

The proposal is to change ORDER BY and GROUP BY to use the same rules, and to expand those rules to allow column references, select fields, aliases and expressions in both ORDER BY and GROUP BY, with the caveat that the column reference, select field, alias or expression must match a field from the SELECT list.

Using an alias defined in the SELECT list would be valid:

    SELECT
        0.25*(0.5+FLOOR(LOG10(Mvir)/0.25)) AS log_mass,
        COUNT(*) AS num
    FROM
        MDR1.bdmv
    WHERE
        snapnum=85
    GROUP BY
        log_mass
    ORDER BY
        log_mass

Using an expression defined in the SELECT list would be valid:

    SELECT
        0.25*(0.5+FLOOR(LOG10(Mvir)/0.25)) AS log_mass,
        COUNT(*) AS num
    FROM
        MDR1.bdmv
    WHERE
        snapnum=85
    GROUP BY
        0.25*(0.5+FLOOR(LOG10(Mvir)/0.25))
    ORDER BY
        0.25*(0.5+FLOOR(LOG10(Mvir)/0.25))

But using an expression that is not in the SELECT list would not be valid:

    SELECT
        0.25*(0.5+FLOOR(LOG10(Mvir)/0.25)) AS log_mass,
        COUNT(*) AS num
    FROM
        MDR1.bdmv
    WHERE
        snapnum=85
    GROUP BY
        FLOOR(LOG10(Mvir)/0.25) -- This does not match a SELECT field
    ORDER BY
        FLOOR(LOG10(Mvir)/0.25) -- This does not match a SELECT field

In addition, the rules should be updated to allow either a long form, fully qualified column reference or, where it is unique within the context of the query, a short form column name.

For a column reference, using the fully qualified table name would be valid:

    SELECT
        TOP 10
        TAP_SCHEMA.columns.column_name
    FROM
        TAP_SCHEMA.columns
    ORDER BY
        TAP_SCHEMA.columns.column_name

Using the unqualified table name would be valid:

    SELECT
        TOP 10
        columns.column_name
    FROM
        TAP_SCHEMA.columns
    ORDER BY
        columns.column_name

Using a table alias would be valid:

    SELECT
        TOP 10
        cols.column_name
    FROM
        TAP_SCHEMA.columns AS cols
    ORDER BY
        cols.column_name

Using the short form column name would be valid, as long as it is unique within the context:

    SELECT
        TOP 10
        column_name
    FROM
        TAP_SCHEMA.columns
    ORDER BY
        column_name

However, in each of these cases, the column reference in the ORDER BY or GROUP BY clause MUST match a column reference defined in the SELECT list.

Mixing the short and long forms would not be valid:

    SELECT
        TOP 10
        column_name
    FROM
        TAP_SCHEMA.columns AS cols
    ORDER BY
        cols.column_name -- This does not match a SELECT field.

Mixing table name and table alias would not be valid:

    SELECT
        TOP 10
        columns.column_name
    FROM
        TAP_SCHEMA.columns AS cols
    ORDER BY
        cols.column_name -- This does not match a SELECT field.
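
The matching rule described above can be sketched as a small Python check. This is a hedged illustration only: `normalise` and `matches_select_list` are hypothetical helpers, and the sketch assumes regular (non-delimited) identifiers, so that case and whitespace can be ignored when comparing.

```python
import re


def normalise(expression):
    """Remove whitespace and fold case so equal expressions compare equal.

    Assumption for this sketch: identifiers are regular (not delimited),
    so neither case nor whitespace is significant.
    """
    return re.sub(r"\s+", "", expression).lower()


def matches_select_list(term, select_items):
    """Check an ORDER BY / GROUP BY term against the SELECT list.

    select_items is a list of (expression, alias) pairs; alias may be None.
    The term is valid if it equals a select expression or one of the aliases.
    """
    wanted = normalise(term)
    for expression, alias in select_items:
        if wanted == normalise(expression):
            return True
        if alias is not None and wanted == normalise(alias):
            return True
    return False


# SELECT list from the log_mass examples above.
select_items = [
    ("0.25*(0.5+FLOOR(LOG10(Mvir)/0.25))", "log_mass"),
    ("COUNT(*)", "num"),
]
```

Under these assumptions, `log_mass` and the full expression both match, while `FLOOR(LOG10(Mvir)/0.25)` does not. Because the comparison is textual, `cols.column_name` does not match a SELECT list that only contains `column_name`, which is the behaviour the last two examples call for.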

Allow names to start with underscore

Email from Alberto Micol at ESO

Is there a specific reason why column names cannot start with an underscore?
I have some tables with such columns, and I cannot serve them to the community;
so, I wonder if that could be changed…?
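
To make the request concrete, here is a minimal sketch of the two identifier rules as regular expressions. It assumes, as the question implies, that a regular identifier must currently start with a Latin letter followed by letters, digits or underscores; the relaxed pattern and the `is_identifier` helper are hypothetical, and reserved words are ignored.

```python
import re

# Assumed current rule: a Latin letter, then letters, digits or underscores.
CURRENT_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
# Hypothetical relaxed rule: a leading underscore is also allowed.
RELAXED_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(name, pattern):
    """Return True if the whole name matches the given identifier pattern."""
    return pattern.fullmatch(name) is not None
```

A column named `_raj2000` fails the first pattern and passes the second, while a name such as `2mass` is rejected by both.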

Grammar fix for CENTROID

From an email discussion with Markus

I had to dig into the ADQL grammar in another matter, too: CENTROID.

For one, pgSphere so far doesn't support computing centroids (except
for circles and points, which is lame), so I suspect nobody supports
this properly. I, at least, don't, and now try to give a sensible
error message.

But while digging into this I noticed back then I've quietly
sanitised the grammar, and I think we should fix this "upstream" now.

Currently, we have

<centroid> ::= CENTROID <left_paren> <geometry_value_expression> <right_paren>

<geometry_value_expression> ::=
     <value_expression_primary>
   | <geometry_value_function>

<value_expression_primary> ::=
       <unsigned_value_specification>
     | <column_reference>
     | <set_function_specification>
     | <left_paren> <value_expression> <right_paren>

So, something like CENTROID(47839) or CENTROID(COUNT(*)) is
grammatically ok, and I suspect that was not what people had in mind.
I suspect what they wanted was to allow a column reference. So, I
think what we want is

<geometry_value_expression> ::=
     <column_reference>
   | <geometry_value_function>

The other alternatives in <value_expression_primary> don't really
make sense for any place <geometry_value_expression> turns up in.

On the other hand, what I've been running all these past years was
essentially

geometryExpression = box | point | circle | polygon | region

geometryValue = columnReference.copy()

centroid = (CaselessKeyword("CENTROID")("fName")
    + '(' + Args(geometryValueExpression) + ')')

geometryValueExpression = geometryExpression | geometryValue | centroid

I can't promise I really thought through the inclusion of centroid in
geometryValueExpression, but I'd support including it in
<geometry_value_expression>, too. It's probably not deadly
important, but it'd allow stuff like

1=CONTAINS(CENTROID(coverage), CIRCLE('', 10, 2, 3))

or similar.
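
The difference between the current and the proposed `<geometry_value_expression>` can be sketched with a rough Python classifier. This is an illustration under stated assumptions, not a parser: `classify` looks only at the leading token of the argument, the label strings are hypothetical, and the list of geometry functions is taken from the names in the snippet above plus CENTROID.

```python
import re

# Alternatives of <value_expression_primary> in the BNF quoted above.
VALUE_EXPRESSION_PRIMARY = {
    "unsigned_value_specification",
    "column_reference",
    "set_function_specification",
    "parenthesised_value_expression",
}
CURRENT_RULE = VALUE_EXPRESSION_PRIMARY | {"geometry_value_function"}
PROPOSED_RULE = {"column_reference", "geometry_value_function"}

SET_FUNCTIONS = ("COUNT", "SUM", "AVG", "MIN", "MAX")
GEOMETRY_FUNCTIONS = ("BOX", "CENTROID", "CIRCLE", "POINT", "POLYGON", "REGION")


def classify(argument):
    """Very rough classification of a CENTROID argument by its leading token."""
    text = argument.strip()
    if text.startswith("("):
        return "parenthesised_value_expression"
    head = re.match(r"[A-Za-z_][A-Za-z0-9_.]*", text)
    if head is None:
        # Numbers and string literals end up here.
        return "unsigned_value_specification"
    name = head.group(0).upper()
    is_call = text[head.end():].lstrip().startswith("(")
    if is_call and name in SET_FUNCTIONS:
        return "set_function_specification"
    if is_call and name in GEOMETRY_FUNCTIONS:
        return "geometry_value_function"
    if is_call:
        return "other_function"
    return "column_reference"


def is_valid_centroid_argument(argument, rule):
    """Return True if the argument's kind is one of the rule's alternatives."""
    return classify(argument) in rule
```

With `CURRENT_RULE`, both `47839` and `COUNT(*)` are accepted as CENTROID arguments; with `PROPOSED_RULE` they are rejected, while `coverage` and `CIRCLE('', 10, 2, 3)` remain valid.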

boolean_term(s) in SELECT clause

Email from Marco and Sonia
2018-01-25

Working on TAP and ADQL in Trieste, we (actually Sonia) found the following discrepancy.
ADQL seems to support a query like

SELECT "size"/2 AS half_size FROM TAP_SCHEMA.columns

while it seems not to support

SELECT "size" = 1 AS is_size_1 FROM TAP_SCHEMA.columns

At least Gregory's library (and TOPCAT, but IIRC TOPCAT uses Gregory's ADQL lib for validation) does so and it seems correct with respect to ADQL-2.0.

However, in ADQL-2.1 the SELECT clause allows a <value_expression>, which can also be a <boolean_value_expression>. The latter, however, contains no <boolean_term> and so disallows the second query above. Allowing a <boolean_term> there would let the SELECT list contain expressions like the one above, exactly as in a WHERE clause (see the <search_condition> definition).

We are wondering whether this is the intended behaviour or if it fits into the BNF-issues' list.
There's also the <boolean_function> BNF term (pg.56 in ADQL-2.1), which is blank and complicates the picture.

I'd like your opinion, though, just to know whether we can expect to use boolean-valued virtual columns (e.g.
like a = value AS vcol) in the future or not (we already have possible alternatives).
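
For comparison, the sketch below runs both forms of the query against an SQL back end. It uses SQLite only because it ships with Python; it says nothing about what ADQL itself permits, and the `tap_columns` table with its two rows is invented for the illustration.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
# Hypothetical stand-in for TAP_SCHEMA.columns with a single "size" column.
connection.execute('CREATE TABLE tap_columns ("size" INTEGER)')
connection.executemany("INSERT INTO tap_columns VALUES (?)", [(1,), (4,)])

# Arithmetic expression in the SELECT list (the form ADQL-2.0 accepts).
half_sizes = connection.execute(
    'SELECT "size" / 2 AS half_size FROM tap_columns ORDER BY "size"'
).fetchall()

# Comparison in the SELECT list; SQLite returns 1 for true and 0 for false.
flags = connection.execute(
    'SELECT "size" = 1 AS is_size_1 FROM tap_columns ORDER BY "size"'
).fetchall()
connection.close()
```

In this back end the comparison yields an integer column, so a boolean-valued virtual column would reach the client as 0/1 unless the service maps it to a boolean type.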

Addition of `UPPER`, `LOWER`, `STRLEN`, `SUBSTR`, ...

I've noticed that ADQL-2.1 offers a function lower(VARCHAR)->VARCHAR.
Why not also have the function upper(VARCHAR)->VARCHAR?

Both functions are supported in all DBMS that I have checked (Postgres, Oracle, MySQL, MS-SQLServer, SQLite, H2, ...).

If yes, why not also support other very common string functions like length (or strlen), trim, replace, substr, instr, ...?

I am asking because, now that my ADQL Library (v1.4, but still supporting only ADQL-2.0) really supports reserved keywords, many users and implementers are complaining that it is no longer possible to have UDFs named lower or upper (which are reserved SQL keywords).
In effect, this forces implementers to use a different name, so that each TAP implementation ends up with a different name for the same function. I do not think it is a good idea to encourage this.

Anyway, I am talking about these common string functions here, but I am quite sure that other so-called reserved keywords of ADQL-2.x prevent TAP implementers from offering some common, useful functions. It may be worth checking these names to see whether the functions associated with them exist in most or all DBMS and, if so, integrating them into ADQL-2.1 (or 2.2 if it is already too late for 2.1).
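
As a quick check of the claim for one of the listed back ends, the following sketch calls the string functions named above in SQLite through Python's standard `sqlite3` module. It covers SQLite only; names and argument conventions in the other DBMS would need to be checked separately.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
# One row exercising each of the string functions mentioned in the issue.
row = connection.execute(
    "SELECT lower('ADQL'), upper('adql'), length('adql'), "
    "trim('  adql  '), replace('adql', 'ad', 'sp'), "
    "substr('adql', 1, 2), instr('adql', 'ql')"
).fetchone()
connection.close()
```

Here `substr` and `instr` use 1-based positions, which is one of the details a common ADQL definition would have to pin down.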

Define seed value for rand()

From Markus:
http://mail.ivoa.net/pipermail/dal/2016-July/007560.html

the table requires an argument on rand(), whereas the grammar has
it (sensibly) optional [incidentally, if we touch rand's
description for 2.1, it might be smart to say what the seed value 
is -- float? integer?  any sizes I have to accept?]

TODO:

  • Check how the random seed is implemented in Cosmopterix
  • Define what the seed means in ADQL
  • Check we can implement it in Cosmopterix
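
As background for the second TODO item, this sketch shows the usual meaning of a seed in a pseudo-random generator, using Python's standard `random` module. It does not describe how rand() behaves in ADQL or in any database; `first_values` is a hypothetical helper.

```python
import random


def first_values(seed, count=3):
    """Return the first few values from a generator created with this seed."""
    generator = random.Random(seed)
    return [generator.random() for _ in range(count)]


# The same seed reproduces the same sequence; a different seed does not.
repeatable = first_values(42) == first_values(42)
distinct = first_values(42) != first_values(43)
```

Python's generator happens to accept either an integer or a float as the seed; which types and sizes an ADQL service must accept is exactly the open question raised above.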
