evgskv / logica

Logica is a logic programming language that compiles to SQL. It runs on Google BigQuery, PostgreSQL and SQLite.

Home Page: https://logica.dev

License: Apache License 2.0

Languages: Jupyter Notebook 86.93%, Python 10.29%, Lex 2.47%, Common Lisp 0.19%, PicoLisp 0.06%, Vim Script 0.06%
Topics: datalog, sql, bigquery, language, prolog-implementation, prolog, logica, postgresql, sqlite, trino

logica's People

Contributors

albinkc, artemworld, cuishuang, dee-guo, ericway1024, evgskv, jesse-greene, ksyusha123, odanoburu, rabraham, rvalek, sefgit, skvortsoves, smdesai, themplarer, wdduncan


logica's Issues

Databricks SQL support

Based on recent work for clients, I think support for Databricks SQL would be a huge deal. It seems to be used just about everywhere.

I recently learned that Databricks has teamed up with dbt to let users simplify their transformation logic with dbt's SQL templating, but in my opinion Logica would be on a completely different level.

Re-digest predicates from DB

I tried to pass one predicate to another with the following script on my local PC:

%%logica Child

@Engine("sqlite");

@AttachDatabase("mydata", "my_database.db");
@Dataset("ParentTable");

Parent(..r) :- mydata.ParentTable(..r);

Child(x):- Parent(child:x);

The Child table should be created successfully, but instead I get the error below:

Running predicate: Child
--- SQL ---
SELECT
  JSON_EXTRACT(mydata_ParentTable, "$.child") AS col0
FROM
  mydata.ParentTable AS mydata_ParentTable
[ Error ] Error while executing SQL:
Execution failed on sql 'SELECT
  JSON_EXTRACT(mydata_ParentTable, "$.child") AS col0
FROM
  mydata.ParentTable AS mydata_ParentTable': no such column: mydata_ParentTable

PyCharm syntax highlighting

This is how syntax highlighting could be added into PyCharm:
https://www.jetbrains.com/help/pycharm/tutorial-using-textmate-bundles.html

To add syntax highlighting for a new file type, PyCharm needs a TextMate Bundle.

According to https://github.com/EvgSkv/logica/blob/main/syntax/vscode/DEV.md,
the src/logica.tmLanguage.yaml file is already written as a TextMate grammar.

We could probably convert the existing file to a TextMate Bundle, which would make syntax highlighting available in PyCharm.
@EvgSkv , what do you think?

Import search path

In version 1.3.14, imports resolve relative to the current directory. We are trying to deploy Logica files embedded in our Python package, so they can be anywhere on sys.path. Would it be possible to change the parser so that a list of directories (like sys.path) could be provided for resolving imports?
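To make the request concrete, here is a minimal sketch of the desired sys.path-style resolution. The function name and behavior are illustrative, not Logica's actual API: the parser would try each directory in order and use the first match.

```python
import os

def resolve_import(module_name, search_paths):
    """Resolve a Logica import like 'pkg.queries' against a list of
    directories, sys.path-style, returning the first matching .l file.

    Hypothetical helper sketching the requested behavior; it is not
    part of Logica's actual parser API.
    """
    relative = os.path.join(*module_name.split(".")) + ".l"
    for directory in search_paths:
        candidate = os.path.join(directory, relative)
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(
        "import %s not found on search path %r" % (module_name, search_paths))
```

With this shape, the current behavior is just the special case `search_paths=["."]`.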

SQLite: StructureTest returns string instead of JSON object?

%%logica StructureTest
@Engine("sqlite");
StructureTest(a: {x: 1, y: 2, z: { w: "hello", v: "world"}});
StructureTest(a: {x: 3, y: 4, z: { w: "bonjour", v: "monde"}});
print(StructureTest['a'].values[0])
print(type(StructureTest['a'].values[0]))
print(StructureTest['a'].values[0]['z']['w']) # <===== This is the line in the tutorial that fails
{"x":1,"y":2,"z":{"w":"hello","v":"world"}}
<class 'str'>

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-14-dbaa4016c4ed> in <module>()
      2 print(StructureTest['a'].values[0])
      3 print(type(StructureTest['a'].values[0]))
----> 4 print(StructureTest['a'].values[0]['z']['w'])

TypeError: string indices must be integers
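Until structs round-trip as objects, the value can be decoded by hand. Since SQLite has no native struct type, Logica appears to return the struct column as a JSON-encoded string (as the print output above shows), so json.loads restores the nesting:

```python
import json

# The string below is what StructureTest['a'].values[0] printed above;
# decoding it recovers the nested structure the tutorial indexes into.
row = '{"x":1,"y":2,"z":{"w":"hello","v":"world"}}'
record = json.loads(row)
print(record["z"]["w"])  # hello
```

This is a workaround, not a fix: the tutorial code would still fail as written.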

Improve parser whitespace robustness

This predicate parses:

Human(name: "Socrates", iq: 250, year_of_birth: -470, location: "Athens");

but this one doesn't, because it contains a tab:

Human(name: "Socrates", iq: 250, year_of_birth: -470, location: "Athens"	);

(example inspired by the Logica Postgres quick start)

tested with

logica test.l print Human

where test.l has the contents of the source blocks above.

EDIT: I hadn't initially realized that the problem was solely caused by tabs.
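Until the lexer accepts tabs, one user-side workaround is to normalize whitespace before handing the source to Logica. This is a sketch with an illustrative function name, and it has a caveat: it also rewrites tabs inside string literals, so the real fix belongs in the parser's whitespace handling.

```python
def normalize_whitespace(source):
    """Replace tabs with spaces so the parser only ever sees spaces.

    Caveat: tabs inside quoted string literals are rewritten too; a
    proper fix would treat tabs as whitespace in the lexer instead.
    """
    return source.replace("\t", " ")

program = ('Human(name: "Socrates", iq: 250, '
           'year_of_birth: -470, location: "Athens"\t);')
print(normalize_whitespace(program))
```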

ResourceWarning: unclosed file

Version 1.3.14

We are using the Logica library in a Python application and are seeing the following warnings when running unittests for our Python application:

/usr/local/lib/python3.6/dist-packages/logica/parser_py/parse.py:1111: ResourceWarning: unclosed file <_io.TextIOWrapper name='***' mode='r' encoding='UTF-8'>
  file_content = open(file_path).read()

This is not blocking us from using Logica.

I would be willing to put up a PR with a fix (if desired).
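For reference, the usual shape of such a fix is a context manager, which closes the handle deterministically and silences the ResourceWarning. This is a sketch of what the change could look like; `read_file` is an illustrative wrapper, while the actual fix would be inline at the `parse.py:1111` call site quoted above.

```python
# Before (parser_py/parse.py, per the warning above):
#   file_content = open(file_path).read()
#
# After: "with" guarantees the file object is closed when the block
# exits, so no unclosed-file warning can be raised at GC time.
def read_file(file_path):
    with open(file_path, encoding="utf-8") as f:
        return f.read()
```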

Addressing issues of SQL

Hi Evgeny!
I'm somewhat involved in PostgreSQL development, so I decided to take a look at Logica.
Seems like an interesting piece of research, thanks!

There is a good post summarizing developers' complaints about SQL [0]. Do you think Logica addresses these issues?
Here's the list of SQL problems:

  • lack of proper orthogonality — SQL is hard to compose;
  • lack of compactness — SQL is a large language;
  • lack of consistency — SQL is inconsistent in syntax and semantics;
  • poor system cohesion — SQL does not integrate well enough with application languages and protocols.

Thanks!

[0] https://www.edgedb.com/blog/we-can-do-better-than-sql/

Make IN work in SQLite

Example of broken program:

logica - run Q <<<'Q(x) :- x in [1,3,5], Constraint(x in [1,2,3]); @Engine("sqlite");'

sqlite3.OperationalError: no such table: UNNEST
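SQLite indeed has no UNNEST, but most modern builds ship the JSON1 extension, whose json_each table-valued function can play the same role. A sketch of the kind of rewrite the sqlite backend could emit (this is an assumption about a possible fix, not Logica's actual codegen):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# UNNEST([1,3,5]) has no SQLite equivalent, but json_each('[1,3,5]')
# yields one row per array element, so "x in [1,3,5]" can become a
# scan over json_each with the constraint applied as a WHERE clause.
rows = conn.execute(
    "SELECT value FROM json_each('[1,3,5]') "
    "WHERE value IN (1, 2, 3)").fetchall()
print(rows)  # [(1,), (3,)]
```

This requires the JSON1 functions, which are compiled into SQLite by default in recent releases.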

Too many references to "advised": max 65535

Hi,

When running the following script

%%logica Lowest_Common_Ancestors_with_Name
@Engine("sqlite");

@AttachDatabase("mgdb","mgdb.db");
@Dataset("advised");
@Dataset("person");
@Dataset("dissertation");

Adv_Stu(advisor:, student:author) :- Advised(did:x, advisor:),Dissertation(did:y, author:), x=y;

@Recursive(Anc,33);
Anc(ancestor:advisor, student:m) distinct :- Adv_Stu(advisor:, student:m), m=63244;
Anc(ancestor:x, student:) distinct:- Adv_Stu(advisor:x, student:y),Anc(ancestor:y, student:);

Anc(ancestor:advisor, student:l) distinct :- Adv_Stu(advisor:, student:l), l=119280;
Anc(ancestor:x, student:) distinct:- Adv_Stu(advisor:x, student:y),Anc(ancestor:y, student:);

Common_Ancestors(pid_1:m, pid_2:l, anc_id: x) :- Anc(ancestor:x, student:m), 
Anc(ancestor:x, student:l), m=63244,l=119280 ;

Not_Lowest_Common_Ancestors(pid_1:m, pid_2:l, anc_id: x) :- Common_Ancestors(pid_1:m, pid_2:l, anc_id: x), 
Common_Ancestors(pid_1:m, pid_2:l, anc_id: y), Adv_Stu(advisor:x, student:y),m=63244,l=119280;

Lowest_Common_Ancestors_with_Name(pid:x, name:name) :- Common_Ancestors(pid_1:m, pid_2:l, anc_id: x),
~Not_Lowest_Common_Ancestors(pid_1:m, pid_2:l, anc_id: x), Person(pid:x,name:name),m=63244,l=119280;

I encounter the error below:

---------------------------------------------------------------------------
OperationalError                          Traceback (most recent call last)
File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/pandas/io/sql.py:2200, in SQLiteDatabase.execute(self, sql, params)
   2199 try:
-> 2200     cur.execute(sql, *args)
   2201     return cur

OperationalError: too many references to "advised": max 65535

The above exception was the direct cause of the following exception:

DatabaseError                             Traceback (most recent call last)
Cell In[13], line 1
----> 1 get_ipython().run_cell_magic('logica', 'Lowest_Common_Ancestors_with_Name', '\n@Engine("sqlite");\n\n@AttachDatabase("mgdb","mgdb.db");\n@Dataset("advised");\n@Dataset("person");\n@Dataset("dissertation");\n\nAdv_Stu(advisor:, student:author) :- Advised(did:x, advisor:),Dissertation(did:y, author:), x=y;\n\n@Recursive(Anc,33);\nAnc(ancestor:advisor, student:m) distinct :- Adv_Stu(advisor:, student:m), m=63244;\nAnc(ancestor:x, student:) distinct:- Adv_Stu(advisor:x, student:y),Anc(ancestor:y, student:);\n\nAnc(ancestor:advisor, student:l) distinct :- Adv_Stu(advisor:, student:l), l=119280;\nAnc(ancestor:x, student:) distinct:- Adv_Stu(advisor:x, student:y),Anc(ancestor:y, student:);\n\nCommon_Ancestors(pid_1:m, pid_2:l, anc_id: x) :- Anc(ancestor:x, student:m), \nAnc(ancestor:x, student:l), m=63244,l=119280 ;\n\nNot_Lowest_Common_Ancestors(pid_1:m, pid_2:l, anc_id: x) :- Common_Ancestors(pid_1:m, pid_2:l, anc_id: x), \nCommon_Ancestors(pid_1:m, pid_2:l, anc_id: y), Adv_Stu(advisor:x, student:y),m=63244,l=119280;\n\nLowest_Common_Ancestors_with_Name(pid:x, name:name) :- Common_Ancestors(pid_1:m, pid_2:l, anc_id: x),\n~Not_Lowest_Common_Ancestors(pid_1:m, pid_2:l, anc_id: x), Person(pid:x,name:name),m=63244,l=119280;\n')

File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/IPython/core/interactiveshell.py:2478, in InteractiveShell.run_cell_magic(self, magic_name, line, cell)
   2476 with self.builtin_trap:
   2477     args = (magic_arg_s, cell)
-> 2478     result = fn(*args, **kwargs)
   2480 # The code below prevents the output from being displayed
   2481 # when using magics with decodator @output_can_be_silenced
   2482 # when the last Python token in the expression is a ';'.
   2483 if getattr(fn, magic.MAGIC_OUTPUT_CAN_BE_SILENCED, False):

File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/logica/colab_logica.py:128, in logica(line, cell)
    126 @register_cell_magic
    127 def logica(line, cell):
--> 128   Logica(line, cell, run_query=True)

File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/logica/colab_logica.py:273, in Logica(line, cell, run_query)
    269   else:
    270     raise Exception('Logica only supports BigQuery, PostgreSQL and SQLite '
    271                     'for now.')   
--> 273   result_map = concertina_lib.ExecuteLogicaProgram(
    274     executions, sql_runner=sql_runner, sql_engine=engine)
    276 for idx, predicate in enumerate(predicates):
    277   t = result_map[predicate]

File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/logica/common/concertina_lib.py:270, in ExecuteLogicaProgram(logica_executions, sql_runner, sql_engine, display_mode)
    267   sql_runner(preamble, sql_engine, is_final=False)
    269 concertina = Concertina(config, engine, display_mode=display_mode)
--> 270 concertina.Run()
    271 return engine.final_result

File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/logica/common/concertina_lib.py:97, in Concertina.Run(self)
     95 def Run(self):
     96   while self.actions_to_run:
---> 97     self.RunOneAction()

File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/logica/common/concertina_lib.py:90, in Concertina.RunOneAction(self)
     88 self.running_actions |= {one_action}
     89 self.UpdateDisplay()
---> 90 self.engine.Run(self.action[one_action].get('action', {}))
     91 self.running_actions -= {one_action}
     92 self.complete_actions |= {one_action}

File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/logica/common/concertina_lib.py:32, in ConcertinaQueryEngine.Run(self, action)
     30   print('Running predicate:', predicate, end='')
     31 start = datetime.datetime.now()
---> 32 result = self.sql_runner(action['sql'], action['engine'],
     33                          is_final=(predicate in self.final_predicates))
     34 end = datetime.datetime.now()
     35 if self.print_running_predicate:

File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/logica/colab_logica.py:173, in SqliteRunner.__call__(self, sql, engine, is_final)
    172 def __call__(self, sql, engine, is_final):
--> 173   return RunSQL(sql, engine, self.connection, is_final)

File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/logica/colab_logica.py:160, in RunSQL(sql, engine, connection, is_final)
    158     print(sql)
    159     ShowError("Error while executing SQL:\n%s" % e)
--> 160     raise e
    161   return None
    162 else:

File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/logica/colab_logica.py:153, in RunSQL(sql, engine, connection, is_final)
    150 try:
    151   if is_final:
    152     # For final predicates this SQL is always a single statement.
--> 153     return pandas.read_sql(sql, connection)
    154   else:
    155     connection.executescript(sql)

File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/pandas/io/sql.py:633, in read_sql(sql, con, index_col, coerce_float, params, parse_dates, columns, chunksize, dtype_backend, dtype)
    631 with pandasSQL_builder(con) as pandas_sql:
    632     if isinstance(pandas_sql, SQLiteDatabase):
--> 633         return pandas_sql.read_query(
    634             sql,
    635             index_col=index_col,
    636             params=params,
    637             coerce_float=coerce_float,
    638             parse_dates=parse_dates,
    639             chunksize=chunksize,
    640             dtype_backend=dtype_backend,  # type: ignore[arg-type]
    641             dtype=dtype,
    642         )
    644     try:
    645         _is_table_name = pandas_sql.has_table(sql)

File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/pandas/io/sql.py:2264, in SQLiteDatabase.read_query(self, sql, index_col, coerce_float, parse_dates, params, chunksize, dtype, dtype_backend)
   2253 def read_query(
   2254     self,
   2255     sql,
   (...)
   2262     dtype_backend: DtypeBackend | Literal["numpy"] = "numpy",
   2263 ) -> DataFrame | Iterator[DataFrame]:
-> 2264     cursor = self.execute(sql, params)
   2265     columns = [col_desc[0] for col_desc in cursor.description]
   2267     if chunksize is not None:

File ~/opt/miniconda3/envs/logica/lib/python3.11/site-packages/pandas/io/sql.py:2212, in SQLiteDatabase.execute(self, sql, params)
   2209     raise ex from inner_exc
   2211 ex = DatabaseError(f"Execution failed on sql '{sql}': {exc}")
-> 2212 raise ex from exc

DatabaseError: Execution failed on sql ....: too many references to "advised": max 65535

Support MySQL

We should add support for running programs on MySQL.

Python eDSL

It would be really nice if there was a Python eDSL so that I could write queries in Python, which I have much better tooling for. 😄

PostgreSQL: syntax error at or near "AS" for tutorial StructureTest

Hi,
The following code errors out for psql.

# Setup
# Install Logica.
!pip install logica

# Install postgresql server.
!sudo apt-get -y -qq update
!sudo apt-get -y -qq install postgresql
!sudo service postgresql start

# Prepare database for Logica.
!sudo -u postgres psql -c "CREATE USER logica WITH SUPERUSER"
!sudo -u postgres psql -c "ALTER USER logica PASSWORD 'logica';"
!sudo -u postgres psql -U postgres -c 'CREATE DATABASE logica;'

# Connect to the database.
from logica import colab_logica
from sqlalchemy import create_engine
import pandas
engine = create_engine('postgresql+psycopg2://logica:[email protected]', pool_recycle=3600);
connection = engine.connect();
colab_logica.SetDbConnection(connection)
%%logica StructureTest

@Engine("psql");


StructureTest(a: {x: 1, y: 2, z: { w: "hello", v: "world"}});
StructureTest(a: {x: 3, y: 4, z: { w: "bonjour", v: "monde"}});

The Error:

The following query is stored at StructureTest_sql variable.
-- Initializing PostgreSQL environment.
set client_min_messages to warning;
create schema if not exists logica_test;

SELECT * FROM (
  
    SELECT
      STRUCT(1 AS x, 2 AS y, STRUCT('hello' AS w, 'world' AS v) AS z) AS a
   UNION ALL
  
    SELECT
      STRUCT(3 AS x, 4 AS y, STRUCT('bonjour' AS w, 'monde' AS v) AS z) AS a
  
) AS UNUSED_TABLE_NAME  ;

StructureTest

Running predicate: StructureTest

---------------------------------------------------------------------------

ProgrammingError                          Traceback (most recent call last)

/usr/local/lib/python3.7/dist-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, execution_options, *args, **kw)
   1760                     self.dialect.do_execute_no_params(
-> 1761                         cursor, statement, context
   1762                     )

20 frames

ProgrammingError: syntax error at or near "AS"
LINE 4:       STRUCT(1 AS x, 2 AS y, STRUCT('hello' AS w, 'world' AS...
                       ^


The above exception was the direct cause of the following exception:

ProgrammingError                          Traceback (most recent call last)

/usr/local/lib/python3.7/dist-packages/sqlalchemy/engine/default.py in do_execute_no_params(self, cursor, statement, context)
    718 
    719     def do_execute_no_params(self, cursor, statement, context=None):
--> 720         cursor.execute(statement)
    721 
    722     def is_disconnect(self, e, connection, cursor):

ProgrammingError: (psycopg2.ProgrammingError) syntax error at or near "AS"
LINE 4:       STRUCT(1 AS x, 2 AS y, STRUCT('hello' AS w, 'world' AS...
                       ^

[SQL: SELECT * FROM (
  
    SELECT
      STRUCT(1 AS x, 2 AS y, STRUCT('hello' AS w, 'world' AS v) AS z) AS a
   UNION ALL
  
    SELECT
      STRUCT(3 AS x, 4 AS y, STRUCT('bonjour' AS w, 'monde' AS v) AS z) AS a
  
) AS UNUSED_TABLE_NAME  ]
(Background on this error at: http://sqlalche.me/e/14/f405)

Connecting to a SQLite/PostgreSQL database in a script

Logica is a wonderful alternative to SQL, and I would like to connect to a SQLite/PostgreSQL database from an .l script for data queries. However, support for this currently seems limited.

Connection to a SQLite database file seems to have been already implemented but has gone undocumented. We could use the @AttachDatabase annotation:

@Engine("sqlite");
@AttachDatabase("db", "databaseFile.db");
Table(..r) :- `(db.tableName)`(..r);

and then run this script from the local command line. But this usage does not appear anywhere in the tutorial or examples.

We have an example of connecting to a PostgreSQL database in ipynb files with the help of the sqlalchemy library:

# Connect to the database.
from logica import colab_logica
from sqlalchemy import create_engine
import pandas
engine = create_engine('postgresql+psycopg2://logica:[email protected]', pool_recycle=3600);
connection = engine.connect();
colab_logica.SetDbConnection(connection)

which is great, but I'm not sure if this is possible in a script program.

Benchmark performance

It would be great to do a somewhat rigorous analysis of whether there are performance issues when a query is written in Logica vs. SQL.

FileNotFoundError: [Errno 2] No such file or directory: 'bq'

Whenever the example command logica primes.l run Prime is executed, this error is produced:
FileNotFoundError: [Errno 2] No such file or directory: 'bq'

This happens in a docker container (ubuntu:latest) with Python 3.8.5 installed.

Please see the log output:

root@d0094b59f905:/home# logica primes.l run Prime
Traceback (most recent call last):
  File "/usr/local/bin/logica", line 8, in <module>
    sys.exit(run_main())
  File "/usr/local/lib/python3.8/dist-packages/logica/logica.py", line 184, in run_main
    main(sys.argv)
  File "/usr/local/lib/python3.8/dist-packages/logica/logica.py", line 157, in main
    p = subprocess.Popen(['bq', 'query',
  File "/usr/lib/python3.8/subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.8/subprocess.py", line 1702, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'bq'

Engineers example broken under sqlite

Code:

%%logica Engineers
@Engine("sqlite");
Employee(name: "Alice", role: "Product Manager");
Employee(name: "Bob", role: "Engineer");
Employee(name: "Caroline", role: "Engineer");
Employee(name: "David", role: "Data Scientist");
Employee(name: "Eve", role: "Data Scientist");
Engineers(..r) :- Employee(..r), r.role == "Engineer";

Error:

OperationalError: no such column: Employee

I know that at this stage meaningful error messages may be a tall order, but if Logica could print out the SQL statement, it should be fairly easy to see what went wrong.

Also, sincerely thank you for this project: it is inspired.

Add checks of predicate arguments

It will take time to implement type inference.
In the meantime we could have a simpler check: when a predicate is called (including database tables) and arguments are used, verify that the predicate has corresponding columns.
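The proposed check reduces to a set difference between the call's named arguments and the callee's known columns. An illustrative sketch of that logic (names are hypothetical, not compiler code):

```python
def check_call(predicate, called_args, known_columns):
    """Raise if a call uses arguments the callee has no columns for.

    Illustrative sketch of the proposed check; 'known_columns' stands
    in for whatever schema information the compiler has available.
    """
    unknown = sorted(set(called_args) - set(known_columns[predicate]))
    if unknown:
        raise ValueError(
            "Predicate %s has no columns %s" % (predicate, unknown))

schema = {"Employee": ["name", "role"]}
check_call("Employee", ["name"], schema)       # passes silently
# check_call("Employee", ["nmae"], schema)     # would raise ValueError
```

Catching a misspelled column at compile time would turn errors like "no such column" into messages that name the offending argument.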

Imports in an imported file cannot be found

Version 1.3.14.15.92

Imports in an imported file cannot be found if they are relative to a path in import_root. They can only be found if they are relative to the directory Logica is initially called from.

To reproduce using integration_tests/import_root_test.l change the imports in integration_tests/import_tests/canada.l to be relative to the import_root used for the test:

import usa.ProductUSA;
import usa.SoftwareEngineer;

The test fails to find usa.l even though usa.l is on the import_root given to the test.

Expected ‘undefined’ error, but Logica compiles fine

A bit of context: I was translating TPC-H's 10th query and forgot to bring into scope the value of l_orderkey from the lineitem table. Logica thus took l_orderkey == o_orderkey to mean an assignment, while I meant it as an equality test.

That Logica makes no syntactic distinction between the equality and assignment operators is explained in the Appendix of the Logica tutorial, but maybe this example shows why it can be confusing; if we had different operators, the error would be clearer and easier to report to the user. Another option is to detect that l_orderkey goes unused, and conclude that either it's a spurious assignment the user should be warned about, or it was meant as an equality test that, due to a user mistake as in this example, cannot be performed as intended.

@Engine("sqlite");

Query(c_custkey:, c_name:,
      revenue? += l_extendedprice * (1 - l_discount),
      c_acctbal:, n_name:, c_address:, c_phone:, c_comment:) distinct :-
  customer(c_custkey:, c_name:, c_acctbal:, c_address:,
           c_phone:, c_comment:, c_nationkey:),
  orders(o_orderkey:, o_custkey:, o_orderdate:),
  lineitem(l_extendedprice:, l_discount:, l_returnflag:),
  nation(n_nationkey:, n_name:),
  c_custkey == o_custkey,
  l_orderkey == o_orderkey,
  o_orderdate >= Date(cutoff_date),
  o_orderdate < Date(cutoff_date, "+3 month"),
  l_returnflag == "R",
  c_nationkey == n_nationkey,
  cutoff_date == "1993-10-01";
@OrderBy(Query, "revenue desc");
@Limit(Query, 20);

Improve tutorial

I am very curious about experience of newcomers with the tutorial. I am sure it can be improved to make onboarding easier.

CamelCase weirdness

👋 Hello! I'm having great fun playing around with Logica. Thanks for the project.

I ran into something a bit strange and was wondering if it is a bug.

I have a fact and rule like so:

Tuple(namespace: "document", id: "1", rel: "writer", user: "id", userId: "badger", user_namespace: "", user_object_id: "");

List(namespace:, id:, rel:, user:, userId:, user_namespace:, user_object_id:) :-
    Tuple(namespace:, id:, rel:, user:, userId:, user_namespace:, user_object_id:);

and if I try to run this as is, it returns a parsing error:

Parsing:
List(namespace:, id:, rel:, user:, userId:, user_namespace:, user_object_id:) :-
    Tuple(namespace:, id:, rel:, user:, userId:, user_namespace:, user_object_id:)

[ Error ] Could not parse expression of a value.

If I parametrize userId (with an x) as in the following:

Tuple(namespace: "document", id: "1", rel: "writer", user: "id", userId: "badger", user_namespace: "", user_object_id: "");

List(namespace:, id:, rel:, user:, userId: x, user_namespace:, user_object_id:) :-
    Tuple(namespace:, id:, rel:, user:, userId: x, user_namespace:, user_object_id:);

then it works fine:

(env) $ logica zanzibar.l run List
+-----------+----+--------+--------+----------+----------------+----------------+
| namespace | id | rel    | user   | userId   | user_namespace | user_object_id |
+-----------+----+--------+--------+----------+----------------+----------------+
| document  | 1  | writer | id     | badger   |                |                |
+-----------+----+--------+--------+----------+----------------+----------------+

Similarly, if I don't parametrize but use the snake_case user_id, it also works fine.

So, should I always just stick with snake_case?

Thanks for any insights 🙌

Documentation on -> operator

At one point, the tutorial uses the -> operator.

# Finding most popular name for each year.
@OrderBy(TopNameByYear, "year");
TopNameByYear(year) ArgMax= name -> NameCountByYear(name:, year:);

While this seems to be a variant on the functional aggregation method, it's not documented anywhere. Is there a more complete language reference that can be linked?

Thanks!
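Pending official documentation, my reading (an assumption, not confirmed by the source) is that `value -> key` builds a pair and `ArgMax=` keeps, per group, the value whose key is largest. In Python terms, that semantics looks like:

```python
# Assumed semantics of: TopNameByYear(year) ArgMax= name -> count
# i.e., for each year keep the name whose count is largest.
# The rows below are made-up illustration data.
name_count_by_year = [
    ("1990", "Alice", 120),
    ("1990", "Bob", 80),
    ("1991", "Bob", 95),
]

top = {}
for year, name, count in name_count_by_year:
    # Keep the (name, count) pair with the maximal count per year.
    if year not in top or count > top[year][1]:
        top[year] = (name, count)

print({year: name for year, (name, _) in top.items()})
# {'1990': 'Alice', '1991': 'Bob'}
```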

How to translate case when?

In BigQuery, with Firebase data:

SELECT user_pseudo_id,
 event_timestamp ,
CASE event_name
  WHEN 'screen_view' THEN 
         event_name || (SELECT param.value.string_value FROM UNNEST(event_params) AS param WHERE 
        param.key="firebase_screen_class") 
else event_name
END as name
 FROM `bq.x.x` 

order by event_timestamp

I can't find an example of this in the Logica tutorial.ipynb document.

Could we support raw SQL in Logica, so that complicated SQL can be written inside Logica?

how to express array_agg(x order by t)?

In such query

GroupEvents1(user_pseudo_id, events? List= custom_event_name) distinct :-
    Events(user_pseudo_id:, custom_event_name:, event_timestamp:);

I want to get the events ordered by event_timestamp, but I still can't find an answer in the documentation.

The same situation arises with functional aggregation:

GroupEvents2(user_pseudo_id) List=  custom_event_name  :-
    Events(user_pseudo_id:, custom_event_name:, event_timestamp:);
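For reference, what ARRAY_AGG(x ORDER BY t) is being asked to compute, written out in plain Python (illustration data and variable names are made up):

```python
# Desired semantics: per user, collect custom_event_name values in
# ascending event_timestamp order.
events = [
    ("u1", "click", 30),
    ("u1", "open", 10),
    ("u1", "scroll", 20),
]

by_user = {}
for user, name, ts in sorted(events, key=lambda e: e[2]):
    by_user.setdefault(user, []).append(name)

print(by_user)  # {'u1': ['open', 'scroll', 'click']}
```

The question is how to express that ORDER BY inside Logica's `List=` aggregation.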

Hi! Does it work in BigQuery?

Hi, I have a question: does Logica work in BigQuery, and will it continue to work in the future? I haven't figured out how to test it yet, but if it really works in the Google Cloud console, I will definitely start testing it in the near future.

how to launch a subset query?

Hi, does Logica have a subset function? I want to compare one set to another and return a boolean value showing whether the second set is equal to or a subset of the first. For example:
SUBSET({ }, { }) = TRUE
SUBSET({ 1, 2, 3 }, { }) = TRUE
SUBSET({ 1, 2 }, { 1, 2 }) = TRUE
SUBSET({ 1, 2, 3 }, { 1, 2 }) = TRUE
SUBSET({ 1, 3, 5 }, { 1, 2 }) = FALSE
many thanks!

How to use Logica on macOS?

Thanks to everyone who created this project.
I want to learn Logica, but I don't know how to install it.
I visited the README.md file, installed the Google Cloud SDK, and cloned https://github.com/EvgSkv/logica.
But I don't know how to use Logica to run a file.
Thank you very much.

PostgreSQL support

Most of Logica works with PostgreSQL, with the following exceptions:

  • Output of structures is not supported. E.g. with "psql" engine currently you can't do
    Q(contact: {name: "John", email: "[email protected]"});
    This is because PostgreSQL requires pre-defining types before using the structs and this is not implemented.
  • ArgMaxK is ignoring the limit, so picking a few maximal values from a very large set is impossible. We don't know how to fix this, as PostgreSQL does not support LIMIT in ARRAY_AGG.

Originally posted by @EvgSkv in #20 (comment)

Number of arguments of `TimestampSub()`

Since BigQuery's TIMESTAMP_SUB() has two arguments, I assumed that TimestampSub() also has two arguments. However, the following predicate results in an error.

A(t: TimestampSub(CurrentDate("Asia/Tokyo"), SqlExpr("INTERVAL 9 HOUR",{})));
Compiling:
A(t: TimestampSub(CurrentDate("Asia/Tokyo"), SqlExpr("INTERVAL 9 HOUR",{})))

[ Error ] Built-in function TimestampSub takes (3, 3) arguments, but 2 arguments were given.

Occasional race condition during compilation

Hi @EvgSkv!

Problem statement:

I am noticing a rather infrequent crash during compilation:

  File "/.venv/lib/python3.9/site-packages/logica/compiler/universe.py", line 457, in __init__
    rules = self.UnfoldRecursion(rules)
  File "/.venv/lib/python3.9/site-packages/logica/compiler/universe.py", line 506, in UnfoldRecursion
    annotations = Annotations(rules, {})
  File "/.venv/lib/python3.9/site-packages/logica/compiler/universe.py", line 138, in __init__
    self.annotations = self.ExtractAnnotations(
  File "/.venv/lib/python3.9/site-packages/logica/compiler/universe.py", line 396, in ExtractAnnotations
    ql = expr_translate.QL(Thrower(), ThrowException, ThrowException,
  File "/.venv/lib/python3.9/site-packages/logica/compiler/expr_translate.py", line 146, in __init__
    self.built_in_functions = copy.deepcopy(self.bulk_functions)
  File "/usr/local/lib/python3.9/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/local/lib/python3.9/copy.py", line 229, in _deepcopy_dict
    for key, value in x.items():
RuntimeError: dictionary changed size during iteration

(the line numbers might be a bit shifted because I caught this on a slightly outdated compiler version -- this shouldn't be relevant here).

This is coming from inside the builtin copy.deepcopy, and indicates that the number of keys of the dict in self.bulk_functions was modified while it was being copied.
I am getting this in a Python process that may concurrently run several compilation (LogicaProgram(parsed_logica, user_flags=flags).FormattedPredicateSql(...)) threads.

So, the guess is that one thread crashes during copy.deepcopy while another has somehow modified the same dict.


Sentiment:

I know that compiler thread-safety is nowhere guaranteed, so I would totally understand if you chose to disregard this issue.

That said, I chose to post here because I am a little dumbfounded: I see this error periodically happening on this threaded service, but I can't reproduce it directly. I am also out of ideas for now as to what causes it, because going through the code I reach the following conclusions:

  1. If compilation in thread 2 is modifying the dict in self.bulk_functions that is simultaneously being copied by another compilation in thread 1, then thread 2 is at a stage of compilation that is before or after making its own copy.deepcopy.
  2. It can't be AFTER, because once thread 2's compilation has made its own deepcopy, it will operate on the copy and not touch the original.
  3. All operations that the compiler does to the bulk_functions dict BEFORE making a deep copy must have already been done inside the thread 1 compilation (because it has reached copy.deepcopy and crashed), and looking at the code I see that once bulk_functions is populated (prior to copying), it is not changed.

So, I can't even put a finger on a specific place where thread 2 might be mutating bulk_functions during its own compilation to cause a crash in thread 1's compilation at copy.deepcopy.

What am I missing? Do you have an idea where I should be looking? Does the reasoning make sense? Is the whole direction wrong?

P.S.

While analysing the above, I was trying to understand the relevance of CleanDictionary: I see it being called sometimes, but I never actually reach the deletion of any keys when I step through with the debugger. Either way, it shouldn't be related to the above issue, because CleanDictionary operates on the deepcopy after it has been made, so it can't touch the original dict, right?

Any ideas on this subject would be appreciated!
Sorry for the tl;dr...

Graph did not work

I am using the following package versions within a Python 3.11.3 conda env

logica == 1.3.141592
graphviz == 0.20.1
jupyterlab == 3.5.3

and I want to execute

from logica.common import graph
graph.DirectedGraph(dataset)

In JupyterLab, I didn't receive any feedback. However, I did encounter an error in the browser console.

Uncaught ReferenceError: require is not defined
    at <anonymous>:2:5
    at t.attachWidget (jlab_core.6144df8e3fe03e33f880.js?v=6144df8e3fe03e33f880:2:1594563)
    at t.insertWidget (jlab_core.6144df8e3fe03e33f880.js?v=6144df8e3fe03e33f880:2:1593914)
    at j._insertOutput (jlab_core.6144df8e3fe03e33f880.js?v=6144df8e3fe03e33f880:2:925250)
    at j.onModelChanged (jlab_core.6144df8e3fe03e33f880.js?v=6144df8e3fe03e33f880:2:922427)
    at m (jlab_core.6144df8e3fe03e33f880.js?v=6144df8e3fe03e33f880:2:1560565)
    at Object.l [as emit] (jlab_core.6144df8e3fe03e33f880.js?v=6144df8e3fe03e33f880:2:1560225)
    at e.emit (jlab_core.6144df8e3fe03e33f880.js?v=6144df8e3fe03e33f880:2:1558470)
    at c._onListChanged (jlab_core.6144df8e3fe03e33f880.js?v=6144df8e3fe03e33f880:2:918850)
    at m (jlab_core.6144df8e3fe03e33f880.js?v=6144df8e3fe03e33f880:2:1560565)

@EvgSkv Could you provide suggestions on how to address this error? Thanks : )

Support SparkSQL

Spark support would be really nice since it would allow processing big datasets on on-premises deployments.

External Functors

Functors in Logica are functions that map named tuples of predicates to a predicate.
Currently the only available functors are substitutions; any predicate can work as a functor.

External Functors would be arbitrary executables that run in the middle of Logica program execution. They would create new database tables from the ones that Logica has already produced. Here is a toy high-level example of what an application of such a functor would look like:

# LargestConnectedComponent takes in a graph and produces a graph
# that consists of a largest connected component.
# Note that this functor is in fact easy to implement in Logica, we use this example
# just because it's an intuitive problem.
@ExternalFunctor(
    LargestConnectedComponent,
    binary: "/path/to/extract_component");

# Building a traveling graph. Assuming Trips table that has source and destination of trips.
# There is an edge in traveling graph if there are more than 1000 trips between the cities.
TravelVolume(source, target) += 1 :- Trips(source:, target:);
@Ground(TravelEdge);
TravelEdge(source, target) :- TravelVolume(source, target) > 1000;

# This functor call results in execution of command like:
# /path/to/extract_component --args='{"Graph": "TravelEdge"}' --dataset="logica_test"
# logica_test is a default dataset where Logica writes tables.
# The binary would read the logica_test.Graph table and write the logica_test.LargestAreaGraph table.
# Logica would then associate the logica_test.LargestAreaGraph table with the
# LargestAreaGraph predicate.
LargestAreaGraph := LargestConnectedComponent(Graph: TravelEdge);

NumCitiesInLargestComponent() Count= city :-
  city in [source, target], LargestAreaGraph(source, target);
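As a hypothetical illustration of what the core of an extract_component binary might look like (table reading and writing omitted; the `largest_component` name and edge-list shape are my assumptions, not part of the proposal), a union-find sketch in Python:

```python
from collections import Counter

# Hypothetical core of an extract_component binary (table I/O omitted):
# keep only the edges belonging to the largest connected component.
def largest_component(edges):
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Union the endpoints of every edge.
    for a, b in edges:
        parent[find(a)] = find(b)
    # Count nodes per component root and pick the biggest one.
    sizes = Counter(find(node) for node in parent)
    biggest_root = max(sizes, key=sizes.get)
    return [(a, b) for a, b in edges if find(a) == biggest_root]
```

For example, `largest_component([('SEA', 'LAX'), ('LAX', 'SFO'), ('NYC', 'BOS')])` keeps the two edges of the three-city component and drops the NYC-BOS pair.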

To implement this we would need a new execution engine that performs these calls.

Estimated work time to implement: 3 - 5 weeks.

Even numbers (in tutorial) do not work for sqlite

Hi,
If I run this, I don't get even numbers.

%%logica Even
@Engine("sqlite");

Z(x) :- x in Range(20);

Even(x) :- Z(x), Z(x / 2);
print('Even numbers:', ', '.join(map(str, Even['col0'])))

Even numbers: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19

This is the generated SQL:

The following query is stored at Even_sql variable.
SELECT
  x_4.value AS col0
FROM
  JSON_EACH((select json_group_array(n) from (with recursive t as(select 0 as n union all select n + 1 as n from t where n + 1 < 20) select n from t))) as x_4, JSON_EACH((select json_group_array(n) from (with recursive t as(select 0 as n union all select n + 1 as n from t where n + 1 < 20) select n from t))) as x_6
WHERE
  (x_6.value = ((x_4.value) / (2)));
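My hypothesis (not a confirmed diagnosis): SQLite's `/` on two integers is integer division, so `x / 2` lands back in `Z` for every `x` in the range, while under real division (as BigQuery does it) only even `x` would. A quick check, including one way to express evenness that survives integer division:

```python
import sqlite3

con = sqlite3.connect(':memory:')
# SQLite divides integers as integers, so x / 2 is in 0..19 for every x in 0..19.
assert con.execute('SELECT 5 / 2').fetchone()[0] == 2
# Real division would give 2.5, which only lands back in Z for even x.
assert con.execute('SELECT 5 / 2.0').fetchone()[0] == 2.5
# Checking n = 2 * (n / 2) recovers the evenness test under integer division.
rows = con.execute(
    'WITH RECURSIVE t(n) AS (SELECT 0 UNION ALL SELECT n + 1 FROM t WHERE n + 1 < 20) '
    'SELECT n FROM t WHERE n = 2 * (n / 2)').fetchall()
```

Here `rows` comes out as the even numbers 0 through 18, which is the output the tutorial expects.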

STRUCT columns

I have a table with a STRUCT column, e.g. STRUCT<x INT64, y INT64>. It looks like Logica variables can only be alphanumeric plus underscore. Is there a way to refer to the fields of a STRUCT? I tried using backticks for the variable name.

Add type inference

We would like to infer types of arguments of the predicates, including reading types of database columns.
Then we can display a native error to the user, rather than compiling to SQL and having the user debug an SQL error.

Integrating DuckDB

Hi,
I'm interested in using DuckDB (https://duckdb.org/) as a data source for Logica and I was wondering what the steps to integrate it would be. It's like SQLite for analytics and uses a subset of the PostgreSQL syntax.

We could use SQLAlchemy, but 1) I don't know how to proceed after that, and 2) I think the ORM design of SQLAlchemy may make it slow if we are returning a lot of data (just a hypothesis based on duckdb/duckdb#305).

So the question is twofold.

  1. If we use the SQLAlchemy interface, the setup code would be something like this:
# Install Logica.
!pip install logica
!pip install duckdb
!pip install duckdb-engine



# Connect to the database.
from logica import colab_logica
from sqlalchemy import create_engine

engine = create_engine('duckdb:///:memory:')
connection = engine.connect()
colab_logica.SetDbConnection(connection)

but how do I use it after that?

  2. If I don't want to use the SQLAlchemy interface, how would I go about doing that? The general usage is:
import duckdb
con = duckdb.connect(database=':memory:', read_only=False) #can also use a file like sqlite
# create a table
con.execute("CREATE TABLE items(item VARCHAR, value DECIMAL(10,2), count INTEGER)")
# insert two items into the table
con.execute("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)")

# retrieve the items again
con.execute("SELECT * FROM items")
print(con.fetchall())
# [('jeans', 20.0, 1), ('hammer', 42.2, 2)]

Sqlite: Tutorial. OperationalError: no such function: APPROX_COUNT_DISTINCT

Hi,

%%logica OfficeColumn,Offices,EmployeesPerOffice,EmployeesOfOffice
@Engine("sqlite");
Employee(name: "Alice", role: "Product Manager", office: "SEA");
Employee(name: "Bob", role: "Engineer", office: "SEA");
Employee(name: "Caroline", role: "Engineer", office: "LAX");
Employee(name: "David", role: "Data Scientist", office: "LAX");
Employee(name: "Eve", role: "Data Scientist", office: "SEA");

OfficeColumn(office:) :- Employee(office:);

Offices(office:) distinct :- Employee(office:);

EmployeesPerOffice(office:,
                  num_employees? += 1,
                  role_count? Count=role) distinct :-
  Employee(name: unused_name, office:, role:);
  
EmployeesOfOffice(office:, employees? List= name) distinct :-
  Employee(name:, office:);
  

The logs are:

Running predicate: Offices
Running predicate: OfficeColumn
Running predicate: EmployeesOfOffice
Running predicate: EmployeesPerOffice

---------------------------------------------------------------------------

OperationalError                          Traceback (most recent call last)

<ipython-input-20-9c03d875318d> in <module>()
----> 1 get_ipython().run_cell_magic('logica', 'OfficeColumn,Offices,EmployeesPerOffice,EmployeesOfOffice', '@Engine("sqlite");\nEmployee(name: "Alice", role: "Product Manager", office: "SEA");\nEmployee(name: "Bob", role: "Engineer", office: "SEA");\nEmployee(name: "Caroline", role: "Engineer", office: "LAX");\nEmployee(name: "David", role: "Data Scientist", office: "LAX");\nEmployee(name: "Eve", role: "Data Scientist", office: "SEA");\n\nOfficeColumn(office:) :- Employee(office:);\n\nOffices(office:) distinct :- Employee(office:);\n\nEmployeesPerOffice(office:,\n                  num_employees? += 1,\n                  role_count? Count=role) distinct :-\n  Employee(name: unused_name, office:, role:);\n  \nEmployeesOfOffice(office:, employees? List= name) distinct :-\n  Employee(name:, office:);\n  ')

8 frames

/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
   2115             magic_arg_s = self.var_expand(line, stack_depth)
   2116             with self.builtin_trap:
-> 2117                 result = fn(magic_arg_s, cell)
   2118             return result
   2119 

/usr/local/lib/python3.7/dist-packages/logica/colab_logica.py in logica(line, cell)
    118 @register_cell_magic
    119 def logica(line, cell):
--> 120   Logica(line, cell, run_query=True)
    121 
    122 

/usr/local/lib/python3.7/dist-packages/logica/colab_logica.py in Logica(line, cell, run_query)
    239       sql_runner = RunSQL
    240     result_map = concertina_lib.ExecuteLogicaProgram(
--> 241       executions, sql_runner=sql_runner, sql_engine=engine)
    242 
    243   for idx, predicate in enumerate(predicates):

/usr/local/lib/python3.7/dist-packages/logica/common/concertina_lib.py in ExecuteLogicaProgram(logica_executions, sql_runner, sql_engine)
    215 
    216   concertina = Concertina(config, engine)
--> 217   concertina.Run()
    218   return engine.final_result

/usr/local/lib/python3.7/dist-packages/logica/common/concertina_lib.py in Run(self)
     76   def Run(self):
     77     while self.actions_to_run:
---> 78       self.RunOneAction()
     79 
     80   def ActionColor(self, a):

/usr/local/lib/python3.7/dist-packages/logica/common/concertina_lib.py in RunOneAction(self)
     69     self.running_actions |= {one_action}
     70     self.UpdateDisplay()
---> 71     self.engine.Run(self.action[one_action].get('action', {}))
     72     self.running_actions -= {one_action}
     73     self.complete_actions |= {one_action}

/usr/local/lib/python3.7/dist-packages/logica/common/concertina_lib.py in Run(self, action)
     18       print('Running predicate:', predicate)
     19       result = self.sql_runner(action['sql'], action['engine'],
---> 20                                is_final=(predicate in self.final_predicates))
     21       if predicate in self.final_predicates:
     22         self.final_result[predicate] = result

/usr/local/lib/python3.7/dist-packages/logica/colab_logica.py in __call__(self, sql, engine, is_final)
    159   # TODO: Sqlite runner should not be accepting an engine.
    160   def __call__(self, sql, engine, is_final):
--> 161     return RunSQL(sql, engine, self.connection, is_final)
    162 
    163 

/usr/local/lib/python3.7/dist-packages/logica/colab_logica.py in RunSQL(sql, engine, connection, is_final)
    142   elif engine == 'sqlite':
    143     statements = parse.SplitRaw(sql, ';')
--> 144     connection.executescript(sql)
    145     if is_final:
    146       return pandas.read_sql(statements[-1], connection)

OperationalError: no such function: APPROX_COUNT_DISTINCT
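The direct cause appears to be that the generated SQL calls APPROX_COUNT_DISTINCT, which SQLite does not provide, even though SQLite does support exact COUNT(DISTINCT ...). A minimal illustration (my own check, not the Logica-generated SQL):

```python
import sqlite3

con = sqlite3.connect(':memory:')
# SQLite supports exact distinct counting natively...
distinct_roles = con.execute(
    "SELECT COUNT(DISTINCT role) FROM (SELECT 'Engineer' AS role "
    "UNION ALL SELECT 'Engineer' UNION ALL SELECT 'Product Manager')"
).fetchone()[0]
# ...but has no APPROX_COUNT_DISTINCT, which is what the error complains about.
try:
    con.execute('SELECT APPROX_COUNT_DISTINCT(1)')
    missing = False
except sqlite3.OperationalError:
    missing = True
```

So a COUNT(DISTINCT ...) rendering for the sqlite engine would presumably avoid the error.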

Sqlite: Custom Aggregation function returning none.

Hi,
Re: the following code in the tutorial.

@Engine("sqlite");
AnonymizedCodeContribution(cl_lengths: [110, 220, 405], org: "ads");
AnonymizedCodeContribution(cl_lengths: [30, 51, 95], org: "ads");
AnonymizedCodeContribution(cl_lengths: [10, 20, 1000], org: "games");

HarmonicMean(x) = Sum(1) / Sum(1 / x);

OrgStats(
    org:,
    mean_cl_size? Avg= cl_size,
    harmonic_mean_cl_size? HarmonicMean= cl_size) distinct :-
  AnonymizedCodeContribution(cl_lengths:, org:),
  cl_size in cl_lengths;

The output from BigQuery has values for harmonic_mean_cl_size



  | org | mean_cl_size | harmonic_mean_cl_size
-- | -- | -- | --
ads | 151.833333 | 75.402468
games | 343.333333 | 19.867550


but when I run the above code for sqlite, I get none for the same column



  | org | mean_cl_size | harmonic_mean_cl_size
-- | -- | -- | --
ads | 151.833333 | None
games | 343.333333 | None


Generated SQL Query:

The following query is stored at OrgStats_sql variable.
WITH t_0_AnonymizedCodeContribution AS (SELECT * FROM (
  
    SELECT
      JSON_ARRAY(110, 220, 405) AS cl_lengths,
      'ads' AS org
   UNION ALL
  
    SELECT
      JSON_ARRAY(30, 51, 95) AS cl_lengths,
      'ads' AS org
   UNION ALL
  
    SELECT
      JSON_ARRAY(10, 20, 1000) AS cl_lengths,
      'games' AS org
  
) AS UNUSED_TABLE_NAME  )
SELECT
  AnonymizedCodeContribution.org AS org,
  AVG(x_5.value) AS mean_cl_size,
  ((SUM(1)) / (SUM(((1) / (x_5.value))))) AS harmonic_mean_cl_size
FROM
  t_0_AnonymizedCodeContribution AS AnonymizedCodeContribution, JSON_EACH(AnonymizedCodeContribution.cl_lengths) as x_5
GROUP BY org;
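My hypothesis (not a confirmed diagnosis): SQLite divides integers as integers, so `SUM(1 / x)` sums zeros, and SQLite yields NULL on division by zero instead of raising an error, hence the None. A check, including a real-division rewrite that reproduces the BigQuery value for the "ads" org:

```python
import sqlite3

con = sqlite3.connect(':memory:')
# Integer division: 1 / 110 is 0 in SQLite, so SUM(1 / x) sums zeros.
assert con.execute('SELECT 1 / 110').fetchone()[0] == 0
# Division by zero yields NULL in SQLite rather than an error, hence None.
assert con.execute('SELECT 10 / 0').fetchone()[0] is None
# Forcing real division recovers the harmonic mean for the "ads" org.
hmean = con.execute(
    'SELECT SUM(1) / SUM(1.0 / x) FROM '
    '(SELECT 110 AS x UNION ALL SELECT 220 UNION ALL SELECT 405 '
    'UNION ALL SELECT 30 UNION ALL SELECT 51 UNION ALL SELECT 95)'
).fetchone()[0]
```

Here `hmean` comes out as roughly 75.4025, matching the BigQuery harmonic_mean_cl_size for "ads".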

Tutorial: Possibly outdated documentation on intentional error (but it doesn't error out)

Hi,
In the tutorial it mentions,
Code:

%%logica Engineers
@Engine("psql");

Employee(name: "Alice", role: "Product Manager", office: "SEA");
Employee(name: "Bob", role: "Engineer", office: "SEA");
Employee(name: "Caroline", role: "Engineer", office: "LAX");
Employee(name: "David", role: "Data Scientist", office: "LAX");
Employee(name: "Eve", role: "Data Scientist", office: "SEA");

IsEngineer(record) :- record.role == "Engineer";
  
Engineers(..r) :- Employee(..r), IsEngineer(r);

[screenshot: cell runs successfully with no error]

I don't get any error :) as mentioned above. Also, if you look at the screenshot above (and at the tutorial, which I didn't run), there is a successful output. I feel like I'm missing something.

Sqlite: whole row syntax gives error. "OperationalError: no such column"

Hi,

%%logica Engineers

@Engine("sqlite");

Employee(name: "Alice", role: "Product Manager");
Employee(name: "Bob", role: "Engineer");
Employee(name: "Caroline", role: "Engineer");
Employee(name: "David", role: "Data Scientist");
Employee(name: "Eve", role: "Data Scientist");

Engineers(..r) :- Employee(..r), r.role == "Engineer";  # <====== This fails
# Engineers(name:, role:) :- Employee(name:, role:) , role == "Engineer"  <===== This works
The following query is stored at Engineers_sql variable.
WITH t_0_Employee AS (SELECT * FROM (
  
    SELECT
      'Alice' AS name,
      'Product Manager' AS role
   UNION ALL
  
    SELECT
      'Bob' AS name,
      'Engineer' AS role
   UNION ALL
  
    SELECT
      'Caroline' AS name,
      'Engineer' AS role
   UNION ALL
  
    SELECT
      'David' AS name,
      'Data Scientist' AS role
   UNION ALL
  
    SELECT
      'Eve' AS name,
      'Data Scientist' AS role
  
) AS UNUSED_TABLE_NAME  )
SELECT
  Employee.*
FROM
  t_0_Employee AS Employee
WHERE
  (JSON_EXTRACT(Employee, "$.role") = 'Engineer');

Engineers

Running predicate: Engineers

---------------------------------------------------------------------------

OperationalError                          Traceback (most recent call last)

<ipython-input-19-db61baf7c59b> in <module>()
----> 1 get_ipython().run_cell_magic('logica', 'Engineers', '\n@Engine("sqlite");\n\nEmployee(name: "Alice", role: "Product Manager");\nEmployee(name: "Bob", role: "Engineer");\nEmployee(name: "Caroline", role: "Engineer");\nEmployee(name: "David", role: "Data Scientist");\nEmployee(name: "Eve", role: "Data Scientist");\n\nEngineers(..r) :- Employee(..r), r.role == "Engineer";\n# Engineers(name:, role:) :- Employee(name:, role:) , role == "Engineer"')

8 frames

/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
   2115             magic_arg_s = self.var_expand(line, stack_depth)
   2116             with self.builtin_trap:
-> 2117                 result = fn(magic_arg_s, cell)
   2118             return result
   2119 

/usr/local/lib/python3.7/dist-packages/logica/colab_logica.py in logica(line, cell)
    118 @register_cell_magic
    119 def logica(line, cell):
--> 120   Logica(line, cell, run_query=True)
    121 
    122 

/usr/local/lib/python3.7/dist-packages/logica/colab_logica.py in Logica(line, cell, run_query)
    239       sql_runner = RunSQL
    240     result_map = concertina_lib.ExecuteLogicaProgram(
--> 241       executions, sql_runner=sql_runner, sql_engine=engine)
    242 
    243   for idx, predicate in enumerate(predicates):

/usr/local/lib/python3.7/dist-packages/logica/common/concertina_lib.py in ExecuteLogicaProgram(logica_executions, sql_runner, sql_engine)
    215 
    216   concertina = Concertina(config, engine)
--> 217   concertina.Run()
    218   return engine.final_result

/usr/local/lib/python3.7/dist-packages/logica/common/concertina_lib.py in Run(self)
     76   def Run(self):
     77     while self.actions_to_run:
---> 78       self.RunOneAction()
     79 
     80   def ActionColor(self, a):

/usr/local/lib/python3.7/dist-packages/logica/common/concertina_lib.py in RunOneAction(self)
     69     self.running_actions |= {one_action}
     70     self.UpdateDisplay()
---> 71     self.engine.Run(self.action[one_action].get('action', {}))
     72     self.running_actions -= {one_action}
     73     self.complete_actions |= {one_action}

/usr/local/lib/python3.7/dist-packages/logica/common/concertina_lib.py in Run(self, action)
     18       print('Running predicate:', predicate)
     19       result = self.sql_runner(action['sql'], action['engine'],
---> 20                                is_final=(predicate in self.final_predicates))
     21       if predicate in self.final_predicates:
     22         self.final_result[predicate] = result

/usr/local/lib/python3.7/dist-packages/logica/colab_logica.py in __call__(self, sql, engine, is_final)
    159   # TODO: Sqlite runner should not be accepting an engine.
    160   def __call__(self, sql, engine, is_final):
--> 161     return RunSQL(sql, engine, self.connection, is_final)
    162 
    163 

/usr/local/lib/python3.7/dist-packages/logica/colab_logica.py in RunSQL(sql, engine, connection, is_final)
    142   elif engine == 'sqlite':
    143     statements = parse.SplitRaw(sql, ';')
--> 144     connection.executescript(sql)
    145     if is_final:
    146       return pandas.read_sql(statements[-1], connection)

OperationalError: no such column: Employee
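The generated SQL passes the bare table alias Employee to JSON_EXTRACT, and SQLite resolves a bare identifier as a column reference, which is exactly where the "no such column" error comes from. A minimal reproduction (the json_object rewrite is my illustration of a shape SQLite does accept, not a proposed compiler fix):

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE Employee(name TEXT, role TEXT)")
con.execute("INSERT INTO Employee VALUES ('Bob', 'Engineer')")
# A bare table alias is parsed as a column reference, so this fails like the issue.
try:
    con.execute("SELECT JSON_EXTRACT(Employee, '$.role') FROM Employee")
    error = None
except sqlite3.OperationalError as e:
    error = str(e)
# Building an actual JSON value from the row's columns does work.
role = con.execute(
    "SELECT JSON_EXTRACT(json_object('name', name, 'role', role), '$.role') "
    "FROM Employee").fetchone()[0]
```

Here `error` is "no such column: Employee" and `role` is 'Engineer', so the `..r` row syntax would need the sqlite backend to materialize the row as a JSON value before extracting from it.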

Aggregating an Ordered Set in BigQuery — ARRAY_AGG(DISTINCT ... ORDER BY ...) is not supported! 😭

I'm trying to generate a set ordered by another value, but I'm really struggling to pull it off. I'm not sure how to work around BigQuery's limitations after quite a few attempts. Without some clever workaround for BigQuery not supporting ARRAY_AGG(DISTINCT ... ORDER BY ...), the only other way I see is JOINing another select after the GROUP BY is performed, but I'm also struggling to figure out how to pull that off in Logica. Anyways, love the project! Overall it's been really cool! 🚀

Code

Here's the most straightforward attempt

%%logica Test

@Engine("bigquery");

OrderedSet(a) = SqlExpr("ARRAY_AGG(DISTINCT {value} ORDER BY {arg})", {value: a.value, arg: a.arg});

TestOrderedSet(g:, ids_by_t? Array= t -> id) distinct :-
  Data(g:, id:, t:);

Data(g: 1, id: 1, t: 5);
Data(g: 1, id: 2, t: 4);
Data(g: 1, id: 2, t: 4);
Data(g: 1, id: 3, t: 3);
Data(g: 1, id: 3, t: 3);
Data(g: 1, id: 4, t: 2);
Data(g: 1, id: 5, t: 1);

Data(g: 2, id: 10, t: 1);
Data(g: 2, id: 10, t: 1);
Data(g: 2, id: 20, t: 2);
Data(g: 2, id: 20, t: 0);
Data(g: 2, id: 30, t: 3);

Expected(g: 1, ids_by_t: [5, 4, 3, 2, 1]);
Expected(g: 2, ids_by_t: [20, 10, 30]);
Expected(g: 3, ids_by_t: [20, 10, 30]);

Test(g:, actual:, expected:) :-
  TestOrderedSet(g:, ids_by_t: actual),
  Expected(g:, ids_by_t: expected);

Output

And the error 😢

BadRequest: 400 An aggregate function that has both DISTINCT and ORDER BY arguments can only ORDER BY expressions that are arguments to the function at [77:39]

                   -----Query Job SQL Follows-----                   

    |    .    |    .    |    .    |    .    |    .    |    .    |
   1:WITH t_2_Data AS (SELECT * FROM (
   2:  
   3:    SELECT
   4:      1 AS g,
   5:      1 AS id,
   6:      5 AS t
   7:   UNION ALL
   8:  
   9:    SELECT
  10:      1 AS g,
  11:      2 AS id,
  12:      4 AS t
  13:   UNION ALL
  14:  
  15:    SELECT
  16:      1 AS g,
  17:      2 AS id,
  18:      4 AS t
  19:   UNION ALL
  20:  
  21:    SELECT
  22:      1 AS g,
  23:      3 AS id,
  24:      3 AS t
  25:   UNION ALL
  26:  
  27:    SELECT
  28:      1 AS g,
  29:      3 AS id,
  30:      3 AS t
  31:   UNION ALL
  32:  
  33:    SELECT
  34:      1 AS g,
  35:      4 AS id,
  36:      2 AS t
  37:   UNION ALL
  38:  
  39:    SELECT
  40:      1 AS g,
  41:      5 AS id,
  42:      1 AS t
  43:   UNION ALL
  44:  
  45:    SELECT
  46:      2 AS g,
  47:      10 AS id,
  48:      1 AS t
  49:   UNION ALL
  50:  
  51:    SELECT
  52:      2 AS g,
  53:      10 AS id,
  54:      1 AS t
  55:   UNION ALL
  56:  
  57:    SELECT
  58:      2 AS g,
  59:      20 AS id,
  60:      2 AS t
  61:   UNION ALL
  62:  
  63:    SELECT
  64:      2 AS g,
  65:      20 AS id,
  66:      0 AS t
  67:   UNION ALL
  68:  
  69:    SELECT
  70:      2 AS g,
  71:      30 AS id,
  72:      3 AS t
  73:  
  74:) AS UNUSED_TABLE_NAME  ),
  75:t_0_TestOrderedSet AS (SELECT
  76:  Data.g AS g,
  77:  ARRAY_AGG(DISTINCT Data.id ORDER BY Data.t) AS ids_by_t
  78:FROM
  79:  t_2_Data AS Data
  80:GROUP BY g),
  81:t_3_Expected AS (SELECT * FROM (
  82:  
  83:    SELECT
  84:      1 AS g,
  85:      ARRAY[5, 4, 3, 2, 1] AS ids_by_t
  86:   UNION ALL
  87:  
  88:    SELECT
  89:      2 AS g,
  90:      ARRAY[20, 10, 30] AS ids_by_t
  91:   UNION ALL
  92:  
  93:    SELECT
  94:      3 AS g,
  95:      ARRAY[20, 10, 30] AS ids_by_t
  96:  
  97:) AS UNUSED_TABLE_NAME  )
  98:SELECT
  99:  TestOrderedSet.g AS g,
 100:  TestOrderedSet.ids_by_t AS actual,
 101:  Expected.ids_by_t AS expected
 102:FROM
 103:  t_0_TestOrderedSet AS TestOrderedSet, t_3_Expected AS Expected
 104:WHERE
 105:  (Expected.g = TestOrderedSet.g)
    |    .    |    .    |    .    |    .    |    .    |    .    |

Generate SQL

And the generated SQL for the lazy

WITH t_2_Data AS (SELECT * FROM (
  
    SELECT
      1 AS g,
      1 AS id,
      5 AS t
   UNION ALL
  
    SELECT
      1 AS g,
      2 AS id,
      4 AS t
   UNION ALL
  
    SELECT
      1 AS g,
      2 AS id,
      4 AS t
   UNION ALL
  
    SELECT
      1 AS g,
      3 AS id,
      3 AS t
   UNION ALL
  
    SELECT
      1 AS g,
      3 AS id,
      3 AS t
   UNION ALL
  
    SELECT
      1 AS g,
      4 AS id,
      2 AS t
   UNION ALL
  
    SELECT
      1 AS g,
      5 AS id,
      1 AS t
   UNION ALL
  
    SELECT
      2 AS g,
      10 AS id,
      1 AS t
   UNION ALL
  
    SELECT
      2 AS g,
      10 AS id,
      1 AS t
   UNION ALL
  
    SELECT
      2 AS g,
      20 AS id,
      2 AS t
   UNION ALL
  
    SELECT
      2 AS g,
      20 AS id,
      0 AS t
   UNION ALL
  
    SELECT
      2 AS g,
      30 AS id,
      3 AS t
  
) AS UNUSED_TABLE_NAME  ),
t_0_TestOrderedSet AS (SELECT
  Data.g AS g,
  ARRAY_AGG(DISTINCT Data.id ORDER BY Data.t) AS ids_by_t
FROM
  t_2_Data AS Data
GROUP BY g),
t_3_Expected AS (SELECT * FROM (
  
    SELECT
      1 AS g,
      ARRAY[5, 4, 3, 2, 1] AS ids_by_t
   UNION ALL
  
    SELECT
      2 AS g,
      ARRAY[20, 10, 30] AS ids_by_t
   UNION ALL
  
    SELECT
      3 AS g,
      ARRAY[20, 10, 30] AS ids_by_t
  
) AS UNUSED_TABLE_NAME  )
SELECT
  TestOrderedSet.g AS g,
  TestOrderedSet.ids_by_t AS actual,
  Expected.ids_by_t AS expected
FROM
  t_0_TestOrderedSet AS TestOrderedSet, t_3_Expected AS Expected
WHERE
  (Expected.g = TestOrderedSet.g);
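One possible direction (untested against BigQuery, and keeping the minimum t for the duplicated (g: 2, id: 20) rows is my assumption about the intended semantics): deduplicate ids in a first aggregation step, then ARRAY_AGG(... ORDER BY ...) without DISTINCT. The two-step logic, sketched in plain Python against the issue's data:

```python
from collections import defaultdict

data = [(1, 1, 5), (1, 2, 4), (1, 2, 4), (1, 3, 3), (1, 3, 3), (1, 4, 2),
        (1, 5, 1), (2, 10, 1), (2, 10, 1), (2, 20, 2), (2, 20, 0), (2, 30, 3)]

# Step 1: deduplicate to one row per (g, id), keeping the minimum t.
first_t = {}
for g, i, t in data:
    if (g, i) not in first_t or t < first_t[(g, i)]:
        first_t[(g, i)] = t

# Step 2: per group, order the now-unique ids by their t (no DISTINCT needed).
groups = defaultdict(list)
for (g, i), t in first_t.items():
    groups[g].append((t, i))
ordered = {g: [i for _, i in sorted(pairs)] for g, pairs in groups.items()}
```

This reproduces the Expected rows for g 1 and 2 ([5, 4, 3, 2, 1] and [20, 10, 30]), so a Logica version would presumably be one distinct predicate that collapses duplicates, followed by the ordered Array aggregation over it.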
