totalhack / zillion
Make sense of it all. Semantic data modeling and analytics with a sprinkle of AI. https://totalhack.github.io/zillion/
License: MIT License
Currently a Warehouse can only be safely saved if it was created from a config file and the Warehouse config was not changed in memory after init (otherwise it would be out of sync with the referenced file). This works OK in use cases where a config file is the master structure of the Warehouse and you aren't editing the Warehouse in memory, but it would be better if save() could reconstruct the current active config and write it to a specified file path even if the Warehouse wasn't created from a config file/URL.
One caveat: if the Warehouse was created from a remote config file there may be no way to post changes back, so save() would only be able to write a local config file unless additional work is done to support pushing warehouse changes to other locations (remote files, git, s3, etc).
One way or another this process should be cleaned up. Either the door needs to be closed on in-memory editing of a Warehouse config (a file is the only possible master and all changes go through that) or we need to support reconstructing and saving a config from the current Warehouse settings.
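A minimal sketch of the second option, assuming a hypothetical to_config() method that reconstructs a config dict from the live in-memory Warehouse objects (no such method is confirmed to exist today):

```python
import json

def save_warehouse_config(wh, path):
    """Serialize the Warehouse's current in-memory state back to a local
    JSON config file. `wh.to_config()` is a hypothetical method that would
    rebuild a config dict from the active Warehouse settings."""
    config = wh.to_config()
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
```

This only covers the local-file case; pushing the reconstructed config to remote locations would need the additional work noted above.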
Currently using entity-attribute-value tables (see example image) requires creating views for each attribute and putting those views in your warehouse config as individual dimension tables. It would be nice if zillion had a new "attribute" table type that could automatically adjust the warehouse definition based on the attributes that are supported.
To support this we'd need to:
A more flexible method to chat with the warehouse, including the ability to execute other warehouse methods besides just running reports (which is all execute_text does). An example might be asking for a list of dimensions in table X, etc.
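As a rough sketch of the idea, a chat method could route recognized intents to other warehouse methods and fall back to report execution. Everything here is hypothetical: a real implementation would likely use an LLM for intent classification, and the wh.dimensions/wh.metrics attribute names are illustrative, not confirmed zillion API:

```python
# Hypothetical intent router for a warehouse "chat" method. Keyword
# matching stands in for real natural language understanding.
def chat(wh, text):
    lowered = text.lower()
    if "list dimensions" in lowered:
        # e.g. "list dimensions in table X"
        return sorted(wh.dimensions.keys())
    if "list metrics" in lowered:
        return sorted(wh.metrics.keys())
    # Fall back to report execution, as execute_text does today
    return wh.execute_text(text)
```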
Currently only a parent-child or parent-sibling (for dimensions only) join model is supported in Zillion. This keeps things simpler, works for many analytics use-cases, and helps prevent introduction of "bad joins" that can throw off aggregation. It's worth investigating how many-to-many relationships might be supported via bridge tables.
Sometimes it's useful to filter one report based on the results of another. This theoretically would not be that hard to implement: something like [("some_field", "in_report", <report_id>)] would spawn a sub-report to get that result first. The sub-report would have to have "some_field" as a dimension.
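A sketch of how such a criterion could be expanded, assuming the sub-report spec is passed as the criterion value and results come back as a DataFrame (the function and spec format are hypothetical, not existing zillion API):

```python
# Hypothetical expansion of an "in_report" criterion: run the sub-report
# first, then rewrite the criterion as a plain "in" filter over the
# sub-report's values for that field.
def expand_in_report_criteria(wh, criteria):
    expanded = []
    for field, op, value in criteria:
        if op == "in_report":
            # `value` is a sub-report spec; the sub-report must include
            # `field` as a dimension so its values can be extracted.
            sub_result = wh.execute(**value)
            values = sub_result.df.reset_index()[field].unique().tolist()
            expanded.append((field, "in", values))
        else:
            expanded.append((field, op, value))
    return expanded
```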
Zillion's NLP support allows using natural language to execute SQL:
result = wh.execute_text("sales for Partner A")
print(result.df) # Pandas DataFrame
The above example shows a case of using the NLP capability to execute a query. The execute_text callable presumably constructs a SQL command before executing it. Wondering if the raw SQL can be exposed before executing it.
Here is some pseudocode of what I'm thinking.
q = wh.create_nlp_q("sales Partner A")
print(q)
# SELECT sales from ...
Steampipe allows querying a variety of APIs as SQL, and provides sqlite extensions to help. In theory it should be possible to represent each API source as a zillion config and run reports against it as you would any other SQL datasource. It would be nice if there was already metadata we could use to convert each datasource to a zillion config. It would also help if there are tools that can make the sqlite extension installation more seamless, so that it might be handled automatically if the extension is missing at warehouse init time.
https://steampipe.io/
https://til.simonwillison.net/sqlite/steampipe
Currently supported technical computations are defined in configs.py and added to TECHNICAL_CLASS_MAP which maps the technical name to a computation class. That map could be updated in place but it would be better to provide an API to manage which technical computations are supported.
A user questioned why we need the restriction of defining tables as either metric/fact tables or dimension tables. This organization is currently important to zillion's understanding of how to appropriately form queries, and somewhat follows the data warehousing ideas outlined by Kimball. I think it's worth investigating a more flexible model, perhaps controlled/allowed by a mode flag in the zillion config, that doesn't require you to label your tables as metric or dimension tables in your config and instead determines dynamically how the tables must act to satisfy a report request. In other words, when a metric is requested from a table, treat it like a metric table for the scope of that report request, etc.
Disclaimer: this may be a bad idea. It's possible this could make it too easy for users to have zillion put together bad queries/reports based on a poorly defined warehouse structure. But if there are cases where this would be valuable then I would lean towards letting users utilize this at their own risk, assuming I can manage to fully understand and explain the caveats and gotchas.
Currently the datasource level queries just group by the dimensions numerically (i.e. group by 1,2,3 if there are 3 dimensions in use). There are cases where you might want to be able to customize the clause that gets used here, such as adjusting/coercing column collation in MySQL at query time. The flexibility should maybe be limited here though, as screwing up the group by logic when trying to do something more complex could lead to unexpected behavior/output that might be hard to diagnose at first.
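A minimal sketch of what limited customization could look like: positional grouping stays the default, with optional per-dimension expression overrides (the override mechanism is hypothetical, and as noted, probably should be constrained):

```python
# Sketch: build a GROUP BY clause that defaults to positional references
# (as zillion does today: GROUP BY 1, 2, 3) but allows a per-dimension
# override expression, e.g. to coerce collation in MySQL at query time.
def build_group_by(dimensions, overrides=None):
    overrides = overrides or {}
    parts = []
    for i, dim in enumerate(dimensions, start=1):
        # Use the override expression if one is given, else the position
        parts.append(overrides.get(dim, str(i)))
    return "GROUP BY " + ", ".join(parts)
```

For example, build_group_by(["date", "partner"], {"partner": "partner COLLATE utf8mb4_bin"}) keeps positional grouping for the first dimension while coercing collation on the second.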
Following the example tutorial from the docs, I'm running into an error.
System:
Windows 10
Python 3.11
qdrant docker-compose file being used
Code I ran:
from zillion import Warehouse
wh = Warehouse(config="https://raw.githubusercontent.com/totalhack/zillion/master/examples/example_wh_config.json")
Error:
No ZILLION_CONFIG specified, using default settings
Traceback (most recent call last):
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 3366, in _wrap_pool_connect
return fn()
^^^^
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\pool\base.py", line 327, in connect
return _ConnectionFairy._checkout(self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\pool\base.py", line 894, in _checkout
fairy = _ConnectionRecord.checkout(pool)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\pool\base.py", line 493, in checkout
rec = pool._do_get()
^^^^^^^^^^^^^^
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\pool\impl.py", line 256, in _do_get
return self._create_connection()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\pool\base.py", line 273, in _create_connection
return _ConnectionRecord(self)
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\pool\base.py", line 388, in __init__
self.__connect()
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\pool\base.py", line 690, in __connect
with util.safe_reraise():
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\util\langhelpers.py", line 70, in __exit__
compat.raise_(
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\util\compat.py", line 211, in raise_
raise exception
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\pool\base.py", line 686, in __connect
self.dbapi_connection = connection = pool._invoke_creator(self)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\engine\create.py", line 574, in connect
return dialect.connect(*cargs, **cparams)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\engine\default.py", line 598, in connect
return self.dbapi.connect(*cargs, **cparams)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sqlite3.OperationalError: unable to open database file
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\yeman_s1h20q2\Yemane\zill\main.py", line 1, in <module>
from zillion import Warehouse
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\zillion\__init__.py", line 21, in <module>
from .datasource import DataSource
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\zillion\datasource.py", line 30, in <module>
from zillion.field import (
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\zillion\field.py", line 18, in <module>
from zillion.model import zillion_engine, DimensionValues
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\zillion\model.py", line 51, in <module>
zillion_metadata.create_all(zillion_engine)
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\sql\schema.py", line 4930, in create_all
bind._run_ddl_visitor(
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 3232, in _run_ddl_visitor
with self.begin() as conn:
^^^^^^^^^^^^
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 3148, in begin
conn = self.connect(close_with_result=close_with_result)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 3320, in connect
return self._connection_cls(self, close_with_result=close_with_result)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 96, in __init__
else engine.raw_connection()
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 3399, in raw_connection
return self._wrap_pool_connect(self.pool.connect, _connection)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 3369, in _wrap_pool_connect
Connection._handle_dbapi_exception_noconnection(
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 2203, in _handle_dbapi_exception_noconnection
util.raise_(
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\util\compat.py", line 211, in raise_
raise exception
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 3366, in _wrap_pool_connect
return fn()
^^^^
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\pool\base.py", line 327, in connect
return _ConnectionFairy._checkout(self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\pool\base.py", line 894, in _checkout
fairy = _ConnectionRecord.checkout(pool)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\pool\base.py", line 493, in checkout
rec = pool._do_get()
^^^^^^^^^^^^^^
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\pool\impl.py", line 256, in _do_get
return self._create_connection()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\pool\base.py", line 273, in _create_connection
return _ConnectionRecord(self)
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\pool\base.py", line 388, in __init__
self.__connect()
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\pool\base.py", line 690, in __connect
with util.safe_reraise():
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\util\langhelpers.py", line 70, in __exit__
compat.raise_(
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\util\compat.py", line 211, in raise_
raise exception
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\pool\base.py", line 686, in __connect
self.dbapi_connection = connection = pool._invoke_creator(self)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\engine\create.py", line 574, in connect
return dialect.connect(*cargs, **cparams)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\yeman_s1h20q2\Yemane\zill\.venv\Lib\site-packages\sqlalchemy\engine\default.py", line 598, in connect
return self.dbapi.connect(*cargs, **cparams)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) unable to open database file
(Background on this error at: https://sqlalche.me/e/14/e3q8)
I am using Pydigger to monitor recent uploads to PyPI that don't have any Continuous Integration (CI) system configured. A CI system can greatly improve the development experience by providing quick feedback to developers and contributors, even for a toy or experimental project. As my contribution to open source (see why), I try to contribute a simple CI configuration to get these projects started.
I've started to work on adding a GitHub Actions workflow, and I am going to report the issues here as I encounter them.
Currently natural language interfaces are limited to using existing field definitions. It might be useful to also allow a more direct form of querying that can produce arbitrary datasource level SQL formulas if an appropriate field doesn't exist to satisfy a request.
The NLP features leverage langchain under the hood, and I think there are examples out there of using it to make a natural language interface for editing a DataFrame (which the Report has for output). It might be interesting to support further modifying the report data via natural language, or editing and re-running the report with natural language (assuming we don't cover that in a warehouse-level chat interface).
Currently there are methods to add a new DataSource to an existing Warehouse object, but not to add a new table to an existing DataSource. This would allow for more flexibility in how ad hoc tables can be combined with existing data. The current workaround is to add any tables you want to stick around to your config file and recreate the Warehouse.
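The workaround can be sketched as follows; the config keys used here are illustrative rather than a complete zillion table schema, and the Warehouse recreation step is shown as a comment:

```python
import json

# Sketch of the workaround described above: append a table definition to
# the warehouse config file, then rebuild the Warehouse from it. The
# config structure assumed here is illustrative.
def add_table_to_config(config_path, datasource_name, table_def):
    with open(config_path) as f:
        config = json.load(f)
    ds = config["datasources"][datasource_name]
    ds.setdefault("tables", {})[table_def["name"]] = table_def
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
    # Then recreate the warehouse so the new table is picked up:
    # wh = Warehouse(config=config_path)
```

A native add-table method on DataSource would avoid this round trip through the config file for truly ad hoc tables.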