
Deprecated

GQL is deprecated. It was replaced by NQL.

Issues should be raised here now.

GQL is still used in Ghost <= 2.7.1. We won't archive this repository yet in case we have to apply security fixes.

GQL

GQL stands for 'Ghost Query Language'

The aim is to provide a simple Gmail- or GitHub-style filter syntax for specifying conditions, whilst being flexible and powerful enough to support the majority of 'where' expressions available in SQL.

GQL itself is parsed and expanded out into a JSON object which can be used to build queries in SQL (and probably NoSQL).

Example:

The GQL expression featured:true+tags.count:>10

Would be converted to the following JSON object:

{statements: [
    {prop: "featured", op: "=", value: true},
    {prop: "tags.count", op: ">", value: 10, func: "and"}
]}

And via Knex, would be further converted to the following SQL:

where "featured" = true and "tags"."count" > 10

Inside of Ghost, this syntax is accepted via the filter parameter when browsing resources in our JSON API.

What's in the box?

This repository comes in three parts:

  • the language parsing functionality, providing gql.parse()
  • a set of lodash-like tools for processing the JSON objects returned
  • some currently Ghost-specific helpers for converting the JSON objects into SQL via knex's query builder

The intention is to eventually move all of the Ghost-specific code and replace it with generic query-building code for Knex and perhaps also a bookshelf plugin. It should also be possible to provide other interfaces, e.g. a direct conversion to SQL or NoSQL query formats.

Usage

Knex:

var filters = gql.parse('featured:true+tags.count:>10');
gql.knexify(knex('myTable'), filters);

Bookshelf:

var filters = gql.parse('featured:true+tags.count:>10');
myBookshelfModel.forge().query(function (qb) {
  gql.knexify(qb, filters);
});

To get raw SQL via Knex:

var filters = gql.parse('featured:true+tags.count:>10');
var myTable = knex('myTable');
gql.knexify(myTable, filters);
return myTable.toQuery();

Statement processing

GQL also supports grouped statements, e.g. author:-joe+(tag:photo,image:-null)

Which results in nested statements like this:

{statements: [
 {op: "!=", value: "joe", prop: "author"},
 {group: [
    {op: "=", value: "photo", prop: "tag"},
    {op: "IS NOT", value: null, prop: "image", func: "or"}
  ], func: "and"}
]}

And which should result in the following SQL:

where "author"."slug" != "joe" and ("tags"."slug" = "photo" or "posts"."image" is not null);

As the JSON returned by GQL is not always a simple set of objects, performing an operation on every statement requires a recursive loop. GQL provides tools for this:

  • eachStatement
  • findStatement
  • matchStatement
  • mergeStatements
  • rejectStatements
  • printStatements
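As an illustration of what that recursive loop involves, here is a simplified standalone sketch of the walk these helpers perform (illustrative only; the real implementations live in ghost-gql's json module):

```javascript
// Simplified sketch of the recursive statement walk (illustrative only;
// the real helper lives in ghost-gql's json module)
function eachStatement(statements, fn) {
    statements.forEach(function (statement) {
        if (statement.group) {
            // recurse into grouped (nested) statements
            eachStatement(statement.group, fn);
        } else {
            fn(statement);
        }
    });
}

var props = [];
eachStatement([
    {op: '!=', value: 'joe', prop: 'author'},
    {group: [
        {op: '=', value: 'photo', prop: 'tag'},
        {op: 'IS NOT', value: null, prop: 'image', func: 'or'}
    ], func: 'and'}
], function (statement) {
    props.push(statement.prop);
});
// props is now ['author', 'tag', 'image']
```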

There are currently two ways that you could use these functions externally (e.g. in Ghost). In the vein of 'naming things is hard', I can't decide which I prefer.

You could do:

var _ = require('lodash');
_.mixin(require('ghost-gql').json);

_.eachStatement(statements...);

Or you could do

var gql = require('ghost-gql');
gql.json.eachStatement(statements...);

For now you'll need to use the inline docs which explain how to use each function.

Syntax

The full spec can be found in TryGhost/Ghost#5604 - I will move this eventually.

How and why

GQL exists because we needed a very simple filter syntax that could be passed as a string in either a method call, a URL, or a handlebars helper attribute. The concept was originally proposed in TryGhost/Ghost#5463 (comment) and then later spec'd more fully in TryGhost/Ghost#5604. The syntax created works well no matter whether the API is being called internally or externally.

The two-step conversion process from GQL -> JSON -> SQL exists for flexibility. This library can and will handle the whole process, but with the JSON step in the middle and the lodash style tools for processing the JSON, it is possible to perform various operations on the JSON, for example, filtering out unsafe conditions.

Also it's possible to implement conversion from the JSON format to SQL either via knex or without it, as well as to no-SQL JSON-like query formats.

The conversion from GQL -> JSON is performed via a JISON parser. JISON is an amazing tool that allows you to easily specify the rules for a language in a JavaScript-like syntax, and it creates the parser for you.

In the /src/ folder is a .l and a .y file used by JISON to generate the parser. gql.l is the lexer or tokenizer that defines all of the symbols that GQL can understand. gql.y is the grammar, it defines the rules about in what order the symbols must appear. If you make changes to gql.l or gql.y, you'll need to run npm run build in order to generate a new version of the parser in /dist/.
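For a sense of what those files contain, a lexer rule in JISON's .l format looks roughly like this (an invented fragment for illustration, not the actual contents of gql.l):

```
%%
\s+                   /* skip whitespace */
":"                   return 'COLON';
"+"                   return 'AND';
","                   return 'OR';
[0-9]+                return 'NUMBER';
```

Each line pairs a pattern with the token name the grammar in gql.y will consume.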

Copyright & License

Copyright (c) 2015-2020 Ghost Foundation - Released under the MIT license. Ghost and the Ghost Logo are trademarks of Ghost Foundation Ltd. Please see our trademark policy for info on acceptable usage.

gql's Issues

Upgrade knex dependency - test restructure

Knex is pinned to 0.8.6 and used inside of this repo purely for testing that the "knexify" adapter is able to convert from GQL to a query builder object and ultimately to the correct SQL.

The tests in this library rely on being able to do require('knex')({}), passing no config, and getting a fallback SQL statement that reflects SQLite3, which used to be Ghost's default.

However, in later versions of knex, it is now required to pass in a client, e.g. require('knex')({client: 'sqlite3'}) or require('knex')({client: 'mysql'}).

This means that the tests fail after upgrading.

Using sqlite3 should make the tests work again, but that seems wrong as MySQL is now the default in Ghost. Therefore it seems like there needs to be some restructuring of the tests to make sure that we get the correct output for MySQL and SQLite3.

This is therefore a slightly bigger task than all the other dependency upgrades.

Revisit & improve the `knexify` concept.

This is a brain dump of thoughts on the currently very temporary knexify module that exists in #2.

Currently, the 'knexifier' is just a mish-mash of functions which provide glue between the JSON format output from the GQL parser, and a knex query builder instance. The key part is the buildWhere function which organises the filter JSON into a set of query builder calls. I envisage we need a similar buildJoin function to do the same for joins, but I'm not sure if that truly should live in GQL.

(currently it's a dirty hack in Ghost's core/server/models/base/utils.js similar to the existing 'query' behaviour that handles joins for the existing implementation of tag/author filtering)

For now, knexify contains a bunch of contextual information that it uses to do its thing, and it needs a whole load more - like which attributes are permitted on a model, what their valid values are, and so on.

All of this is information that is already encoded inside of Ghost's model layer, so having it splatted in here is just a duplication for convenience sake whilst we're building all this fancy stuff out and trying to connect the dots between GQL and Ghost.

Long term, we should be grabbing all of the contextual information we need from Ghost's models, in a uniform and predictable way. This suggests that what we want to hook into with GQL is bookshelf, rather than knex.

I believe that the best approach for doing this might be to provide a bookshelf plugin as part of GQL, one that can hook into any bookshelf models and produce the same effect, providing those models expose the same functions or properties.

Short term, this might be too time-consuming, and it may be best to continue duplicating the information inside GQL in the most useful format, whilst the API in Ghost is also extended out, and revisit doing this in a better way in the future.

A question for right now is: is there some sort of middle ground? E.g. restructuring knexify as a bookshelf plugin, but still using hard-coded contextual information for now, or perhaps providing both the utils like buildWhere and buildJoin in a 'knexify' module, and having a second bookshelf plugin module to hook in?

I'm loosely aiming to make GQL useful to other people, although it's not a priority, for now the main priority is getting it both working AND highly testable.

Some interesting things, mostly for later, that I'm not sure how to do as a bookshelf plugin are:

  • provide the allowed attributes - Ghost models have their own permittedAttributes function - can we include that in the plugin in some way?
  • provide information about relations - I haven't yet found a way to get a list of what relations a model has from bookshelf itself. It may be that this is possible, or that it needs to become possible as part of bookshelf, or that we need to expose a permittedJoins type function as well
  • specify which relations should have counts and how they should behave
  • specify special rules for particular queries (see the 'Special Rules' section here)

JSON API

To flesh out changes from #17.

I'm going to think of this as more of a conversation than a full spec in one shot.

I've written up docs over in the wiki where I did this work. In the wiki I gave examples to show what the GQL string syntax for a part of a query looks like along with the equivalent JSON and SQL. If you read the wiki with those goggles on while thinking about the JSON API it may seem more relevant. I'll summarize some here though.

I'll use the term filter because I think we have some context around that term. And when I refer to a filter I'll be speaking about a JSON object (which could have been created by calling gql.parse() on a string).

A filter is an array of JSON objects, each of which represents a condition. In the special case that a filter contains only a single condition it will be a single JSON object and not an array of objects.

Conditions

The conditions in a filter are related by boolean logic, either AND or OR. AND conditions look like {title: 'Hello World!'} while OR conditions look like {$or: {title: 'Hello World!'}}. When the filter is an array of objects it's expected that the first element in the array is NOT an $or condition.

Each condition is made up of a property, a comparison operator and a value.

The simplest condition is an equals condition which is simply {title: 'Home'}.

Other types of conditions are all written in a similar way. For example: less than is written like this {amount: {$lt: 5}}. A like comparison is written like this: {title: {$like: 'om'}}. Greater than or equals looks like this: {created_at: {$gte: '2016-01-01'}}.

See the wiki doc for comparison operators for the full set of operators.

Conditions containing an array for the value are interpreted as IN conditions. They will be converted to IN clauses when passed through gql.applyTo(...).
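Putting those shapes together, a filter combining the condition types described above might look like this (illustrative values):

```javascript
// A filter mixing the proposed condition shapes (illustrative values)
var filter = [
    {amount: {$lt: 5}},              // amount < 5
    {$or: {title: {$like: 'om'}}},   // OR title LIKE '%om%'
    {tag: ['photo', 'video']}        // AND tag IN ('photo', 'video')
];
```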

Negated Conditions

Conditions can be negated. All negative conditions are mapped in the following way: {$not: {featured: true}}.

Remember that the $not is always outside the property, condition and value. It's done this way because it allows the same string syntax (for GQL strings) and JSON structure for negation of nested filters as well as single conditions.

Nested Filters (operator precedence)

Conditions can be grouped together with their own relative boolean logic and then that set of conditions can be nested within another set of conditions. The nested set of conditions would be represented as an array. That would look like this: [{author_id: 5}, [{featured: true}, {$or: {created_at: {$gt: '2016-01-01'}}}]]. In this example the second element in the outer array is also an array which represents a clause containing two conditions related to one another by OR boolean logic.

Single character literals do not work

Originally reported in TryGhost/Ghost#8433

To reproduce:

  • On a local blog...
  • Create a post with the title A, the slug will be a. Publish the post.
  • Create a post with the title AB, the slug will be ab. Publish the post.
  • Visit the frontend of the blog to grab the client secret.
  • Open postman
  • Try doing a get request for http://localhost:2368/ghost/api/v0.1/posts?client_id=ghost-frontend&client_secret=[client-secret]&filter=slug:ab - see that it works
  • Try doing a get request for http://localhost:2368/ghost/api/v0.1/posts?client_id=ghost-frontend&client_secret=[client-secret]&filter=slug:a - see that it doesn't work

I believe this is due to the plus sign here: https://github.com/TryGhost/GQL/blob/master/src/gql.l#L22

The badcharsincnot regex matches a single character, and then the second group matches a second character. Therefore there must be at least 2 chars for a match.

Fixing this may have other implications, if so, we should close this issue and instead, add a rule that literals must be at least 2 characters to the documentation, provide tests to demo this limitation + if possible, improve the error message.
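To illustrate the two-character minimum, here is a toy regex with the same shape as the suspected rule (a hypothetical simplification, not the actual pattern from gql.l):

```javascript
// Hypothetical simplification of the suspected lexer rule: one mandatory
// leading character plus a second mandatory group means 1-char literals fail
var literalish = /^[^\s:+,()]([^\s:+,()])+$/;

console.log(literalish.test('ab')); // true  - two characters match
console.log(literalish.test('a'));  // false - the second group needs a char
```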

Revisit inconsistent true/false/null vs literal behaviour

Currently, there's a single commented out test in the 'Initial commit' PR:

https://github.com/TryGhost/GQL/pull/2/files#diff-40ba893e5d10c536e8f08afc0dceb9e6R84

Currently the Lexer treats true_thing as a literal, but true-thing as a true, a not and a literal. This slight inconsistency is somewhat annoying, and could be considered to be a bug, however I'm not sure how/if it can be mitigated and whether it matters enough to spend time on.

Note: surrounding both in quotes results in them correctly being treated as a string value - the string form should always behave slightly better than the literal form, I think that's to be expected?

Do we document that including true, false or null in literals is bad form / may have unexpected results, do we change the lexer to reliably treat these values as bad when in a literal, or is this a kink that can be ironed out?

Many-to-many conjunctive (AND operator) queries

This issue was already reported by @ErisDS on the main TryGhost/Ghost repository, issue reference:

TryGhost/Ghost#6158.

I actually really needed this feature to make my company's use-case work so I went ahead and implemented it in this branch of my fork:

Mickael-van-der-Beek/GQL#many-to-many-conjunctive-queries

And here's the diff compared to the current master branch:

https://github.com/TryGhost/GQL/compare/master...Mickael-van-der-Beek:many-to-many-conjunctive-queries?expand=1

As you can see the implementation is absolutely disgusting and that's also why I'm not trying to submit it as a PR. I just thought that maybe if someone worked on it in the future, they could use it as a reference starting point.

A few thoughts about this implementation in no particular order:

  • it uses the slower SQL aggregation way of solving many-to-many queries (as discussed in the original issue as well as the StackOverflow answer)
  • it hard codes the SQL grouping column in the having() call
  • it assumes (as a shitty heuristic) that the WHERE ... IN () query is done on the column that is most used
  • will remove any other mentions of the chosen column from the query
  • has to use a module-global variable to store the having parameters as a workaround to scope limitations in the arguments passed to processFilter() (this one is not too important and can be fixed easily)

But in the end it passes the test suite and seems to work well. For the use-case we had (which was a hack in itself using Ghost's tagging system), neither Ghost nor Ghost's API are public facing so that meant that this solution was sufficient for now.

After thinking about the problem at hand during the day, I don't think that it's possible to automatically generate the SQL query if the parent application doesn't hint at which column specifically will be used for the WHERE ... IN () clause. Also, the HAVING clause has a hard dependency on the GROUPed column(s), which means that for Ghost's use-case, where query composition is done, it will be hard to implement in a non-hardcoded way if GQL stays in its current form.

Improve error messages returned from Lexer & Parser

Currently, the error output has not been modified at all - and is just the standard output from JISON. The output when an error occurs looks like this:

When there's an error in the Lexer (this means a symbol cannot be interpreted as meaning something)

Lexical error on line 1. Unrecognized text.
slug:-=
------^

When there's an error in the Parser (this means all the symbols are understood, but something appears in an unexpected place)

Parse error on line 1:
slug:->
------^
Expecting 'LBRACKET', 'NULL', 'TRUE', 'FALSE', 'NUMBER', 'LITERAL', 'STRING', got 'GT'

These error messages could use some improvement.

First of all, there will only ever be one line, so it makes no sense to include the 'on line 1' part. Instead it would be more useful to say either nothing, or perhaps 'at character x', even though the diagram shows which character is causing the error.

Secondly, I don't know that it's particularly helpful to say Lexical error or Parse error - hardly anyone is going to get a benefit from this.

Instead I think the IDEAL forms of error should be something like:

Query Error: unrecognized text "=" in filter at char 7
slug:-=
------^

and

Query error, unexpected 'GT' in filter at char 7
slug:->
------^
Expecting one of: 'LBRACKET', 'NULL', 'TRUE', 'FALSE', 'NUMBER', 'LITERAL', 'STRING'

E.g. I think the error messages should both mention that they are Query or Filter errors (whichever is clearer?), one should mention 'unrecognized' and the other 'unexpected', but the difference here is minor.

If possible, the Lexer error should also include the unrecognised symbol as shown.

If possible, we should say which char the error is at, or say nothing at all about that.


How:

I believe it is possible to add custom error handling to JISON parsers, by overriding the parseError method as shown here: https://gist.github.com/GerHobbelt/e18f2ef8ee575d4ff49d#file-parser-wrapper-js-L49

Doing this should allow for the modification of the text in the messages, but it doesn't give access to extra information about which char is the bad one - especially not for the Lexer error. It may be possible to parse this info out of the error message, but not sure it's worth it.

Maybe worth proposing an improvement to the hash returned from the Lexer error in JISON?
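A rough sketch of the override (using a stand-in object here, since the real parser is generated by JISON):

```javascript
// Sketch: overriding parseError to produce the proposed "Query error" wording.
// `parser` is a stand-in for the JISON-generated parser object.
var parser = {};
parser.parseError = function (message, hash) {
    // rewrite JISON's default prefix into the proposed form
    throw new Error(message.replace(/^Parse error on line \d+:/, 'Query error:'));
};
```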

Filter for "not empty" columns

Hi @ErisDS

I'm trying to filter posts based on a column (called video) where I want all the posts where the video column is not empty. Actually, I tried using the video:-null approach, but that won't work for posts where I removed a given video, because the column is empty now instead of null.

Can't find a way to filter only for posts where a given column value isn't empty.

Can you please help?

Parser drops statement if it is an AND after an OR

If we have the following GQL - author:joe,author:doe+tag:photo - when parsing, GQL will drop the last clause, resulting in:

{
   statements:[
      {
         op:'=',
         value:'joe',
         prop:'author'
      },
      {
         op:'=',
         value:'doe',
         prop:'author',
         func:'or'
      }
   ]
}

But if you go the opposite way, swapping it to tag:photo+author:doe,author:joe, it will parse correctly:

{
   statements:[
      {
         op:'=',
         value:'photo',
         prop:'tag'
      },
      {
         op:'=',
         value:'doe',
         prop:'author',
         func:'and'
      },
      {
         op:'=',
         value:'joe',
         prop:'author',
         func:'or'
      }
   ]
}

Code used

const gql = require('./lib/gql');

console.log('author:joe,author:doe+tag:photo', gql.parse('author:joe,author:doe+tag:photo'));
console.log('author:joe,author:doe+(tag:photo)', gql.parse('author:joe,author:doe+(tag:photo)'));
console.log('(tag:photo)+author:doe,author:joe', gql.parse('(tag:photo)+author:doe,author:joe'));
console.log('tag:photo+author:doe,author:joe', gql.parse('tag:photo+author:doe,author:joe'));

Is it a bug, or intended behavior? Any thoughts?

Thank you!

Better string support (double quotes or escaped singles)

GQL works mostly on literals - strings that don't need to be quoted because they are obviously strings.

E.g. tag:photo, we don't need to do tag:'photo', it's totally redundant!

However, if a string contains a character that has another purpose in the GQL language, we have to use quotes to make it clear the whole thing is intended as a string.

E.g. in a date string published_at:<2017-09-01 12:45:12, both the space and the colons mean that we cannot use a literal, and we need to use quotes e.g. published_at:<'2017-09-01 12:45:12'.

This works absolutely wonderfully if the filter lives in a URL encoded URL string, or in JSON, but poses a problem in javascript code:

Example from the prev/next helper in Ghost:

apiOptions = {
    include: 'author,tags',
    order: 'published_at ' + order,
    limit: 1,
    filter: "slug:-" + slug + "+published_at:" + op + "'" + publishedAt + "'" // jscs:ignore
};

Normally we would use single quotes around the code, but we can't because this would clash with GQL.

In other languages, you'd switch to using double quotes, but GQL doesn't support this.

If we switch to using template literals:

apiOptions = {
    filter: `slug:-${slug}+published_at:${op}'${publishedAt}'` // jscs:ignore
};

The single quotes end up escaped like \' inside the string, which GQL also doesn't understand!

TODO:

  • At least support escaped single quotes
  • Consider supporting double quotes
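When the filter travels in a URL rather than JavaScript code, one workaround available today is to build the single-quoted string and URL-encode the whole thing, since single quotes survive encodeURIComponent (the slug and date below are hypothetical values):

```javascript
// Workaround when the filter is sent in a URL: single quotes survive
// encodeURIComponent, while ':', '+', '<' and spaces get escaped
var slug = 'welcome';                      // hypothetical values
var publishedAt = '2017-09-01 12:45:12';
var filter = "slug:-" + slug + "+published_at:<'" + publishedAt + "'";

var query = 'filter=' + encodeURIComponent(filter);
// query: "filter=slug%3A-welcome%2Bpublished_at%3A%3C'2017-09-01%2012%3A45%3A12'"
```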

Make resourceContext dynamic

Right now, if Ghost wants to enable filtering on any other table, it's impossible, because the tables (post, user, tag) are static in GQL. The consequence is that GQL will crash.

See https://github.com/TryGhost/GQL/blob/master/lib/context.js.

Temporary file that contains things that shouldn't be in this module
^ It says it was a temporary solution anyway?

I would suggest we forward all available tables from Ghost to GQL. GQL can then add the possible propAliases on top.

One possible solution could be to transform GQL into a class to instantiate all possible resource contexts.

e.g.

new GQL({tables: [.....]})
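A minimal sketch of what that constructor might look like (hypothetical; nothing like this exists in GQL today, and the method name is invented for illustration):

```javascript
// Hypothetical sketch of the proposed class-based API
function GQL(options) {
    options = options || {};
    this.tables = options.tables || []; // tables forwarded from the host app
}

// Invented helper name, purely to show how instance state would be used
GQL.prototype.knowsTable = function (name) {
    return this.tables.indexOf(name) !== -1;
};

var gql = new GQL({tables: ['posts', 'users', 'tags', 'subscribers']});
// gql.knowsTable('subscribers') -> true
```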

Date handling for GQL

At the moment, there is little-to-no consideration for date handling in GQL. There are a few mentions of dates in TryGhost/Ghost#5604. In particular the post-processing section under Implementation talks about handling dates after the parser is finished.

Date-like values can realistically appear in different forms:

  • A Number could also be a JavaScript timestamp
  • A Literal could be a valid ISO 8601 date like 2015-07-27T14:54:21
  • A String could contain a valid date in many different forms

It is very difficult to distinguish a date from another value at the lexer level, or to determine whether a String was intended to be a date or a string. The best way we have, therefore, to detect dates, is to assume that any value provided for a date-based property is a date.

That is, if the property name is created_at, updated_at, published_at, last_login or any other known dates and the value is a Number, a Literal or a String, we should try to turn the value into a date using new Date(). This provides us with a well understood set of recognised date formats, and should mean that passing dates from handlebars inside a Ghost theme will work.

E.g. {{#get "posts" filter="published_at: >'{{post.published_at}}'"}} (note the date is wrapped in single-quotes), without any date handling, will create a query in the form:

select * from "posts" where "posts"."published_at" > 'Sat Jun 27 2015 13:02:12 GMT+0100 (BST)'

And SQL doesn't know what to do with that date.

Instead we need to ensure that if we have a date that can be understood by new Date, then SQL is always passed a date in a format it understands.

We also need to ensure this works for all the supported SQL types: SQLite, MySQL and pg.
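A sketch of that post-processing step (the property list and output format are assumptions; 'YYYY-MM-DD HH:MM:SS' is a form all three SQL flavours accept):

```javascript
// Sketch: coerce date-like values for known date properties before they
// reach SQL. The property list and output format are assumptions.
var DATE_PROPS = ['created_at', 'updated_at', 'published_at', 'last_login'];

function coerceDate(prop, value) {
    if (DATE_PROPS.indexOf(prop) === -1) {
        return value;
    }
    var d = new Date(value);
    if (isNaN(d.getTime())) {
        return value; // not a recognisable date, leave untouched
    }
    // 'YYYY-MM-DD HH:MM:SS' (UTC), understood by SQLite, MySQL and pg
    return d.toISOString().replace('T', ' ').replace(/\.\d{3}Z$/, '');
}

coerceDate('published_at', 'Sat Jun 27 2015 13:02:12 GMT+0100 (BST)');
// -> '2015-06-27 12:02:12'
```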

Add linting

Would be good to get the same linting in place here as we have in Ghost.
