tokern / data-lineage
Generate and Visualize Data Lineage from query history
Home Page: https://tokern.io/data-lineage/
License: MIT License
I changed CATALOG_PASSWORD, CATALOG_USER, CATALOG_DB, and CATALOG_HOST accordingly and ran this command: docker-compose -f tokern-lineage-engine.yml up.
It throws me an error:
return self.dbapi.connect(*cargs, **cparams)
tokern-data-lineage | File "/opt/pysetup/.venv/lib/python3.8/site-packages/psycopg2/__init__.py", line 122, in connect
tokern-data-lineage | conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
tokern-data-lineage | sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not translate host name "-xxxxxxx.amazonaws.com" to address: Temporary failure in name resolution
As per the lines below, job_execution.status will always be JobExecutionStatus.SUCCESS. Is this intentional?
https://github.com/tokern/data-lineage/blob/master/data_lineage/server.py#L311-L313
It would be neat if the actual execution status were reported.
Also, select * queries are not supported. I will file a new issue for that.
Originally posted by @vrajat in #42 (comment)
I was able to successfully load the catalog using dbcat, but I'm getting the following error when I try to parse queries from a file in JSON format (I also tried the given test file):
File "~/Python/3.8/lib/python/site-packages/data_lineage/parser/__init__.py", line 124, in parse
name = str(hash(sql))
TypeError: unhashable type: 'dict'
Here's line 124:
data-lineage/data_lineage/parser/__init__.py
Line 124 in f347484
Code executed:
from dbcat import catalog_connection
from data_lineage.parser import parse_queries, visit_dml_query
import json
with open("queries2.json", "r") as file:
queries = json.load(file)
catalog_conf = """
catalog:
  user: test
  password: t@st
  host: 127.0.0.1
  port: 5432
  database: postgres
"""
catalog = catalog_connection(catalog_conf)
parsed = parse_queries(queries)
visited = visit_dml_query(catalog, parsed)
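The TypeError shows hash(sql) receiving a dict, which suggests json.load returned a list of objects rather than plain SQL strings. A minimal sketch of normalizing the loaded file before calling parse_queries; the "query" key name is an assumption, not something documented:

```python
def extract_sql(queries):
    """Normalize a loaded queries file to a list of SQL strings.

    Each entry may already be a plain string, or a dict holding the
    SQL under a key such as "query" (key name assumed here).
    """
    sql = []
    for q in queries:
        if isinstance(q, str):
            sql.append(q)
        elif isinstance(q, dict) and "query" in q:
            sql.append(q["query"])
        else:
            raise TypeError(f"unsupported query entry: {type(q).__name__}")
    return sql
```

With this, `parse_queries(extract_sql(queries))` receives hashable strings either way the file is shaped.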
Hello!
I used these docs:
https://tokern.io/docs/data-lineage/installation
1. curl https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose-demodb/docker-compose.yml -o docker-compose.yml
2. docker-compose up -d
3. Got this error:
ERROR: In file './docker-compose.yml', the services name 404 must be a quoted string, i.e. '404'.
Trying to follow: https://tokern.io/docs/data-lineage/example
... to get data lineage from snowflake
Using this example...
from data_lineage import Parser
parser = Parser(docker_address)
for query in queries:
print(query)
parser.parse(**query, source=source)
Using data-lineage 0.8.3 and Python 3.8 in an isolated venv, I get: cannot import name 'Parser' from 'data_lineage'
Looks like it's scanning now but getting lots of
tokern-catalog | 2021-10-14 13:52:05.252 UTC [36] DETAIL: Key (source_id, name)=(142, foo) already exists.
tokern-catalog | 2021-10-14 13:52:05.252 UTC [36] STATEMENT: INSERT INTO schemata (name, source_id) VALUES ('foo', 142) RETURNING schemata.id
tokern-catalog | 2021-10-14 13:52:08.597 UTC [36] ERROR: duplicate key value violates unique constraint "unique_schema_name"
tokern-catalog | 2021-10-14 13:52:08.597 UTC [36] DETAIL: Key (source_id, name)=(142, bar) already exists.
tokern-catalog | 2021-10-14 13:52:08.597 UTC [36] STATEMENT: INSERT INTO schemata (name, source_id) VALUES ('bar', 142) RETURNING schemata.id
tokern-catalog | 2021-10-14 13:52:08.675 UTC [36] ERROR: duplicate key value violates unique constraint "unique_schema_name"
... which could be because we have the same schema names across different databases. Might be able to ignore this because we're only concerned with one database.
Another issue, though (I can open a separate ticket if desired): while the above keeps running, I get a 504 Server Error: Gateway Time-out for url: http://127.0.0.1:8000/api/v1/catalog/scanner
- a different error than the gunicorn one from before... is nginx timing out now instead of gunicorn?
Originally posted by @peteclark3 in #75 (comment)
Hi,
I am trying to connect data_lineage to an external PostgreSQL database but am receiving a 400 Bad Request error when calling the /api/v1/catalog/sources endpoint. We deployed data_lineage using Docker. Following the example, we pass:
edw_db = { "username": "<external postgres username>", "password": "somepassw0rd|", "uri": "<external postgresql hostname>", "port": "<external postgresql port>", "database": "<external postgresql database>" }
to
source = catalog.add_source(name="edw", source_type="postgresql", **edw_db)
but it seems we get the error here. Please help.
I am working with the docker compose file. When I select an output node, I need only the input linked to it in the column_datalineage table to be highlighted, not all the inputs linked to the load node.
Hi,
I have deployed Tokern data lineage using Docker as stated in your documentation. I connect data lineage to an external PostgreSQL database, so I overrode the CATALOG_PASSWORD, CATALOG_USER, CATALOG_DB and CATALOG_HOST variables.
I have written a Python script which grabs all queries logged in PostgreSQL and passes them to Tokern data lineage. However, I keep getting the following error:
503 Server Error: Service Unavailable for url: https://<host ip>:8000/api/v1/catalog/sources
this is calling the following module:
source = catalog.add_source(name="edw", source_type="postgresql", **edw_db)
The logs do not throw any error even when log level is set to debug. Please help. Thank You.
Currently it doesn't appear that the dml_visitor walks through common table expressions to build lineage. Am I interpreting this wrong? Within visitor.py, lines 45 and 61 both visit the "with clause", but there doesn't seem to be any functionality for handling CommonTableExpr or CTEs within the parsed statements. This causes any statement with CTEs to throw an error when calling parse_queries, as no table is found when attempting to bind a CTE in a FROM clause.
It would be nice to have support for more databases, maybe via a generic JDBC mechanism or pluggable third-party drivers.
Any possibility of looking at Google BigQuery?
BigQuery already has a query history feature, and we can retrieve it from the BigQuery logs export table.
Hi,
I am working on a POC and have created metadata for Teradata. I will convert that metadata to the demo mock JSON that was in the kedro-viz modular example. The JSON will be fully formatted and parsed already. I just need a quick pointer on how to load this JSON and display it in kedro-viz.
Sorry to bother everyone, but I am in a hurry; I have all the metadata, and I only learned kedro and kedro-viz yesterday, so I need a quick shortcut.
Thanks,
from data_lineage import Analyze, Catalog
...
Traceback (most recent call last):
File "/opt/workspace/anaconda2/envs/tf21/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/opt/workspace/anaconda2/envs/tf21/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/cobolbaby/.vscode/extensions/ms-python.python-2021.11.1422169775/pythonFiles/lib/python/debugpy/__main__.py", line 45, in <module>
cli.main()
File "/home/cobolbaby/.vscode/extensions/ms-python.python-2021.11.1422169775/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 444, in main
run()
File "/home/cobolbaby/.vscode/extensions/ms-python.python-2021.11.1422169775/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 285, in run_file
runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
File "/opt/workspace/anaconda2/envs/tf21/lib/python3.6/runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "/opt/workspace/anaconda2/envs/tf21/lib/python3.6/runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "/opt/workspace/anaconda2/envs/tf21/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/media/cobolbaby/data/ubuntu/opt/workspace/git/lineage/analyse.py", line 3, in <module>
from data_lineage import Analyze, Catalog
File "/opt/workspace/anaconda2/envs/tf21/lib/python3.6/site-packages/data_lineage/__init__.py", line 10, in <module>
from dbcat.catalog.models import JobExecutionStatus
File "/opt/workspace/anaconda2/envs/tf21/lib/python3.6/site-packages/dbcat/__init__.py", line 7, in <module>
from dbcat.catalog import Catalog
File "/opt/workspace/anaconda2/envs/tf21/lib/python3.6/site-packages/dbcat/catalog/__init__.py", line 3, in <module>
from .catalog import Catalog
File "/opt/workspace/anaconda2/envs/tf21/lib/python3.6/site-packages/dbcat/catalog/catalog.py", line 9, in <module>
from dbcat.catalog.models import (
File "/opt/workspace/anaconda2/envs/tf21/lib/python3.6/site-packages/dbcat/catalog/models.py", line 5, in <module>
from snowflake.sqlalchemy import URL
File "/opt/workspace/anaconda2/envs/tf21/lib/python3.6/site-packages/snowflake/sqlalchemy/__init__.py", line 25, in <module>
from . import base, snowdialect
File "/opt/workspace/anaconda2/envs/tf21/lib/python3.6/site-packages/snowflake/sqlalchemy/base.py", line 17, in <module>
from .custom_commands import AWSBucket, AzureContainer, ExternalStage
File "/opt/workspace/anaconda2/envs/tf21/lib/python3.6/site-packages/snowflake/sqlalchemy/custom_commands.py", line 14, in <module>
from sqlalchemy.sql.roles import FromClauseRole
ModuleNotFoundError: No module named 'sqlalchemy.sql.roles'
This is probably due to SQLAlchemy's requirement of mysqlclient: when doing
pip install data-lineage
the following is seen:
Collecting mysqlclient<3,>=1.3.6
Using cached mysqlclient-2.1.1.tar.gz (88 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [16 lines of output]
/bin/sh: mysql_config: command not found
/bin/sh: mariadb_config: command not found
/bin/sh: mysql_config: command not found
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/private/var/folders/th/yz4tb0ss5t3_4df1xnfrkg3r0000gn/T/pip-install-auypdvbk/mysqlclient_42a825d5ee084d6686c16912ef8320cc/setup.py", line 15, in <module>
metadata, options = get_config()
File "/private/var/folders/th/yz4tb0ss5t3_4df1xnfrkg3r0000gn/T/pip-install-auypdvbk/mysqlclient_42a825d5ee084d6686c16912ef8320cc/setup_posix.py", line 70, in get_config
libs = mysql_config("libs")
File "/private/var/folders/th/yz4tb0ss5t3_4df1xnfrkg3r0000gn/T/pip-install-auypdvbk/mysqlclient_42a825d5ee084d6686c16912ef8320cc/setup_posix.py", line 31, in mysql_config
raise OSError("{} not found".format(_mysql_config_path))
OSError: mysql_config not found
mysql_config --version
mariadb_config --version
mysql_config --libs
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
Installing the MySQL client libraries fixes it.
Since you are using SQLAlchemy, this is out of your hands, but this issue is to suggest adding a note to that effect in the docs.
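For anyone hitting the same mysql_config error, a sketch of the system packages that provide it; exact package names vary by distribution, so treat these as a starting point:

```shell
# Debian/Ubuntu: headers and mysql_config that mysqlclient builds against
sudo apt-get install -y default-libmysqlclient-dev build-essential

# macOS (Homebrew): mysql-client is keg-only, so put mysql_config on PATH
brew install mysql-client
export PATH="$(brew --prefix mysql-client)/bin:$PATH"
```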
Calling
analyze.analyze(**{"query":query}, source=dl_source, start_time=datetime.now(), end_time=datetime.now())
with a large query, I get a "request too long" error - it seems that even though it is POSTing, it's still appending the query to the URL, so the request fails, e.g.
tokern-data-lineage-visualizer | 10.10.0.1 - - [14/Oct/2021:14:39:00 +0000] "POST /api/v1/analyze?query=ANY_REALLY_LONG_QUERY_HERE
After fixing issue #33, it is still failing. Note that I am using Snowflake.
import datetime
end_time = datetime.datetime.now()
start_time = end_time - datetime.timedelta(days=7)
query = f"""
SELECT query_text
FROM table(information_schema.query_history(
end_time_range_start=>to_timestamp_ltz('{start_time.isoformat()}'),
end_time_range_end=>to_timestamp_ltz('{end_time.isoformat()}')));
"""
cursors = conn.execute_string(
sql_text=query
)
queries = []
for cursor in cursors:
for row in cursor:
print(f"{row[0]}")
queries.append(row[0])
This shows query history as follows.
SELECT query_text
FROM table(information_schema.query_history(
end_time_range_start=>to_timestamp_ltz('2021-04-25T13:46:32.544154'),
end_time_range_end=>to_timestamp_ltz('2021-05-02T13:46:32.544154')));
SELECT query_text
FROM table(information_schema.query_history(
end_time_range_start=>to_timestamp_ltz('2021-04-25T13:46:20.237862'),
end_time_range_end=>to_timestamp_ltz('2021-05-02T13:46:20.237862')));
SELECT query_text
FROM table(information_schema.query_history(
end_time_range_start=>to_timestamp_ltz('2021-04-25T13:45:18.371513'),
end_time_range_end=>to_timestamp_ltz('2021-05-02T13:45:18.371513')));
SELECT query_text
FROM table(information_schema.query_history(
end_time_range_start=>to_timestamp_ltz('2021-04-25T13:44:27.187499'),
end_time_range_end=>to_timestamp_ltz('2021-05-02T13:44:27.187499')));
SELECT query_text
FROM table(information_schema.query_history(
end_time_range_start=>to_timestamp_ltz('2021-04-24T07:25:55.213431'),
end_time_range_end=>to_timestamp_ltz('2021-05-01T07:25:55.213431')));
SELECT query_text
FROM table(information_schema.query_history(
end_time_range_start=>to_timestamp_ltz('2021-04-24T07:25:23.433387'),
end_time_range_end=>to_timestamp_ltz('2021-05-01T07:25:23.433387')));
SELECT query_text
FROM table(information_schema.query_history(
end_time_range_start=>to_timestamp_ltz('2021-04-24T07:10:29.311609'),
end_time_range_end=>to_timestamp_ltz('2021-05-01T07:10:29.311609')));
SELECT query_text
FROM table(information_schema.query_history(
end_time_range_start=>to_timestamp_ltz('2021-04-24T07:03:48.882660'),
end_time_range_end=>to_timestamp_ltz('2021-05-01T07:03:48.882660')));
SELECT query_text
FROM table(information_schema.query_history(
end_time_range_start=>to_timestamp_ltz('2021-04-24T07:02:13.962780'),
end_time_range_end=>to_timestamp_ltz('2021-05-01T07:02:13.962780')));
SELECT query_text
FROM table(information_schema.query_history(
end_time_range_start=>to_timestamp_ltz('2021-04-24T07:02:03.205936'),
end_time_range_end=>to_timestamp_ltz('2021-05-01T07:02:03.205936')));
SELECT query_text
FROM table(information_schema.query_history(
end_time_range_start=>to_timestamp_ltz('2021-04-01 00:00:00 +0800'),
end_time_range_end=>to_timestamp_ltz('2021-05-01 00:00:00 +0800')));
SELECT query_text
FROM table(information_schema.query_history(
end_time_range_start=>to_timestamp_ltz('2021-04-30 23:59:59 +0800'),
end_time_range_end=>to_timestamp_ltz('2021-04-26 00:00:00 +0800')));
SELECT query_text
FROM table(information_schema.query_history(
end_time_range_start=>to_timestamp_ltz('2021-04-30 23:59:59 +0800'),
end_time_range_end=>to_timestamp_ltz('2021-04-6 00:00:00 +0800')));
put file:☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺.csv @-/staged;
PUT file:☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺.csv @-/staged;
COPY INTO tmp1
FROM @ANALYTICS_CUSTOM_PH/tmp1.csv
FILE_FORMAT = (
TYPE = CSV SKIP_HEADER = 1
);
LIST @ANALYTICS_CUSTOM_PH;
LIST @ANALYTICS_CUSTOM;
CREATE OR REPLACE TABLE tmp1(a INT, b STRING);
SHOW GRANTS TO USER identifier('"YOHEI"');
SELECT * FROM identifier('"SINGLIFE"."ANALYTICS_CUSTOM"."TMP1"') LIMIT 100;
COPY INTO tmp1
FROM @ANALYTICS_CUSTOM/tmp1.csv
FILE_FORMAT = (
TYPE = CSV SKIP_HEADER = 1
);
COPY INTO tmp1
FROM @ANALYTICS_CUSTOM/tmp1.csv
HEADER;
COPY INTO tmp1
FROM @ANALYTICS_CUSTOM/tmp1.csv
HEADER = TRUE;
COPY INTO tmp1
FROM @ANALYTICS_CUSTOM/tmp1.csv
FILE_FORMAT = (TYPE = CSV)
HEADER = TRUE;
COPY INTO tmp1
FROM @ANALYTICS_CUSTOM/tmp1.csv
FILE_FORMAT = (TYPE = CSV HEADER = TRUE);
COPY INTO tmp1
FROM @ANALYTICS_CUSTOM/tmp1.csv;
LIST @ANALYTICS_CUSTOM;
SHOW STAGES LIKE 'ANALYTICS_CUSTOM_PH' IN SCHEMA "SINGLIFE"."ANALYTICS_CUSTOM_PH"
SHOW STAGES LIKE 'ANALYTICS_CUSTOM' IN SCHEMA "SINGLIFE"."ANALYTICS_CUSTOM"
DESCRIBE STAGE "SINGLIFE"."ANALYTICS_CUSTOM"."ANALYTICS_CUSTOM"
DESCRIBE STAGE "SINGLIFE"."ANALYTICS_CUSTOM_PH"."ANALYTICS_CUSTOM_PH"
ALTER STAGE "SINGLIFE"."ANALYTICS_CUSTOM_PH"."ANALYTICS_CUSTOM_PH" SET URL = 's3://singlife-data-pf-sandbox-dev/analytics_custom_ph/'
ALTER STAGE "SINGLIFE"."ANALYTICS_CUSTOM"."ANALYTICS_CUSTOM" SET URL = 's3://singlife-data-pf-sandbox-dev/analytics_custom/'
SHOW GRANTS OF ROLE "DATA_ENGINEERING_ADVANCED"
SHOW GRANTS OF ROLE "DATA_ANALYST_ADVANCED_PH"
SHOW GRANTS OF ROLE "DATA_ANALYST_BASE_PH"
SHOW GRANTS OF ROLE "DATA_ANALYST_ADVANCED"
SHOW GRANTS OF ROLE "SNOWPIPE"
SHOW GRANTS OF ROLE "DEPLOY_ADMIN"
SHOW GRANTS ON SCHEMA "SINGLIFE"."LANDING_REALTIME"
SHOW GRANTS ON SCHEMA "SINGLIFE"."LANDING_REALTIME"
SHOW GRANTS ON SCHEMA "SINGLIFE"."LANDING_REALTIME"
SHOW GRANTS ON SCHEMA "SINGLIFE"."LANDING_REALTIME"
SHOW GRANTS ON SCHEMA "SINGLIFE"."LANDING_REALTIME"
SHOW FUTURE GRANTS IN SCHEMA "SINGLIFE"."LANDING_REALTIME"
SHOW GRANTS ON SCHEMA "SINGLIFE"."LANDING_REALTIME"
SHOW GRANTS ON SCHEMA "SINGLIFE"."LANDING_REALTIME"
SHOW GRANTS ON SCHEMA "SINGLIFE"."LANDING_REALTIME"
SHOW GRANTS ON SCHEMA "SINGLIFE"."LANDING_REALTIME"
SHOW FUTURE GRANTS IN SCHEMA "SINGLIFE"."LANDING_REALTIME"
SHOW GRANTS ON SCHEMA "SINGLIFE"."LANDING_REALTIME"
SHOW FUTURE GRANTS IN SCHEMA "SINGLIFE"."LANDING_REALTIME"
SHOW GRANTS ON SCHEMA "SINGLIFE"."LANDING_REALTIME"
SHOW FUTURE GRANTS IN SCHEMA "SINGLIFE"."LANDING_REALTIME"
SHOW GRANTS ON SCHEMA "SINGLIFE"."LANDING_REALTIME"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS_PH"
SHOW GRANTS ON SCHEMA "SINGLIFE"."LANDING_REALTIME"
SHOW FUTURE GRANTS IN SCHEMA "SINGLIFE"."LANDING_REALTIME"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS_PH"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS_PH"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS_PH"
SHOW GRANTS ON SCHEMA "SINGLIFE"."LANDING_REALTIME"
SHOW FUTURE GRANTS IN SCHEMA "SINGLIFE"."ANALYTICS"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS_PH"
SHOW FUTURE GRANTS IN SCHEMA "SINGLIFE"."ANALYTICS_PH"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS_PH"
SHOW FUTURE GRANTS IN SCHEMA "SINGLIFE"."ANALYTICS_PH"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS_PH"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS_PH"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS"
SHOW FUTURE GRANTS IN SCHEMA "SINGLIFE"."ANALYTICS"
SHOW FUTURE GRANTS IN SCHEMA "SINGLIFE"."ANALYTICS_PH"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS_PH"
SHOW FUTURE GRANTS IN SCHEMA "SINGLIFE"."ANALYTICS_PH"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS_PH"
SHOW FUTURE GRANTS IN SCHEMA "SINGLIFE"."ANALYTICS"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS_PH"
SHOW FUTURE GRANTS IN SCHEMA "SINGLIFE"."ANALYTICS_PH"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS_PH"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS_PH"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS_PH"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS_CUSTOM_PH"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS_CUSTOM_PH"
SHOW FUTURE GRANTS IN SCHEMA "SINGLIFE"."ANALYTICS"
SHOW FUTURE GRANTS IN SCHEMA "SINGLIFE"."ANALYTICS"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS_CUSTOM_PH"
SHOW GRANTS ON SCHEMA "SINGLIFE"."ANALYTICS_CUSTOM_PH"
Then, trying to parse the queries:
from data_lineage.parser import parse_queries, visit_dml_queries
# Parse all queries
parsed = parse_queries(queries)
# Visit the parse trees to extract source and target queries
visited = visit_dml_queries(catalog, parsed)
# Create a graph and visualize it
from data_lineage.parser import create_graph
graph = create_graph(catalog, visited)
import plotly
plotly.offline.iplot(graph.fig())
Got this error.
---------------------------------------------------------------------------
ParseError Traceback (most recent call last)
<ipython-input-12-151c67ea977c> in <module>
2
3 # Parse all queries
----> 4 parsed = parse_queries(queries)
5
6 # Visit the parse trees to extract source and target queries
/opt/conda/lib/python3.8/site-packages/data_lineage/parser/__init__.py in parse_queries(queries)
17
18 def parse_queries(queries: List[str]) -> List[Parsed]:
---> 19 return [parse(query) for query in queries]
20
21
/opt/conda/lib/python3.8/site-packages/data_lineage/parser/__init__.py in <listcomp>(.0)
17
18 def parse_queries(queries: List[str]) -> List[Parsed]:
---> 19 return [parse(query) for query in queries]
20
21
/opt/conda/lib/python3.8/site-packages/data_lineage/parser/node.py in parse(sql, name)
319 if name is None:
320 name = str(hash(sql))
--> 321 node = AcceptingNode(parse_sql(sql))
322
323 return Parsed(name, node)
pglast/parser.pyx in pglast.parser.parse_sql()
ParseError: syntax error at or near "table", at location 24
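pglast implements the PostgreSQL grammar, so the Snowflake-only statements in the history above (SHOW GRANTS, LIST, PUT, and SELECT ... FROM table(...)) will raise ParseError. A rough pre-filter keeping only statements that can produce lineage might help; the prefix list below is an assumption, not part of the data-lineage API:

```python
# Statements the DML visitor can use; anything else (SHOW, LIST, PUT,
# COPY INTO, DESCRIBE, ALTER STAGE, ...) is dropped before parsing.
DML_PREFIXES = ("insert", "update", "delete", "create", "merge", "with")

def keep_dml(queries):
    """Filter raw query history down to probable DML statements."""
    kept = []
    for q in queries:
        head = q.lstrip().lower()
        if head.startswith(DML_PREFIXES):
            kept.append(q)
    return kept
```

Running `parse_queries(keep_dml(queries))` instead of parsing the whole history should at least avoid aborting on the first Snowflake-specific statement.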
$ data_lineage --catalog-user xxx --catalog-password yyy
Traceback (most recent call last):
File "/opt/homebrew/bin/data_lineage", line 5, in <module>
from data_lineage.__main__ import main
File "/opt/homebrew/lib/python3.9/site-packages/data_lineage/__main__.py", line 7, in <module>
from data_lineage.server import create_server
File "/opt/homebrew/lib/python3.9/site-packages/data_lineage/server.py", line 5, in <module>
import flask_restless
File "/opt/homebrew/lib/python3.9/site-packages/flask_restless/__init__.py", line 22, in <module>
from .manager import APIManager # noqa
File "/opt/homebrew/lib/python3.9/site-packages/flask_restless/manager.py", line 24, in <module>
from flask import Blueprint
File "/opt/homebrew/lib/python3.9/site-packages/flask/__init__.py", line 14, in <module>
from jinja2 import escape
File "/opt/homebrew/lib/python3.9/site-packages/jinja2/__init__.py", line 12, in <module>
from .environment import Environment
File "/opt/homebrew/lib/python3.9/site-packages/jinja2/environment.py", line 25, in <module>
from .defaults import BLOCK_END_STRING
File "/opt/homebrew/lib/python3.9/site-packages/jinja2/defaults.py", line 3, in <module>
from .filters import FILTERS as DEFAULT_FILTERS # noqa: F401
File "/opt/homebrew/lib/python3.9/site-packages/jinja2/filters.py", line 13, in <module>
from markupsafe import soft_unicode
ImportError: cannot import name 'soft_unicode' from 'markupsafe' (/opt/homebrew/lib/python3.9/site-packages/markupsafe/__init__.py)
Looks like that was removed in 2.1.0. You may want to specify markupsafe==2.0.1
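A minimal workaround until the dependency is repinned, installing the last markupsafe release that still exports soft_unicode as suggested above:

```shell
pip install 'markupsafe==2.0.1'
```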
I'm adding a Snowflake source as follows, where sf_db_name is my database name, e.g. snowfoo (verified in the debugger):
source = catalog.add_source(name=f"sf1_{time.time_ns()}", source_type="snowflake", database=sf_db_name, username=sf_username, password=sf_password, account=sf_account, role=sf_role, warehouse=sf_warehouse)
... but when it goes to scan, it looks like the code thinks my database name is 'prod':
tokern-data-lineage | sqlalchemy.exc.ProgrammingError: (snowflake.connector.errors.ProgrammingError) 002003 (02000): SQL compilation error:
tokern-data-lineage | Database 'PROD' does not exist or not authorized.
tokern-data-lineage | [SQL:
tokern-data-lineage | SELECT
tokern-data-lineage | lower(c.column_name) AS col_name,
tokern-data-lineage | c.comment AS col_description,
tokern-data-lineage | lower(c.data_type) AS col_type,
tokern-data-lineage | lower(c.ordinal_position) AS col_sort_order,
tokern-data-lineage | lower(c.table_catalog) AS database,
tokern-data-lineage | lower(c.table_catalog) AS cluster,
tokern-data-lineage | lower(c.table_schema) AS schema,
tokern-data-lineage | lower(c.table_name) AS name,
tokern-data-lineage | t.comment AS description,
tokern-data-lineage | decode(lower(t.table_type), 'view', 'true', 'false') AS is_view
tokern-data-lineage | FROM
tokern-data-lineage | prod.INFORMATION_SCHEMA.COLUMNS AS c
tokern-data-lineage | LEFT JOIN
tokern-data-lineage | prod.INFORMATION_SCHEMA.TABLES t
tokern-data-lineage | ON c.TABLE_NAME = t.TABLE_NAME
tokern-data-lineage | AND c.TABLE_SCHEMA = t.TABLE_SCHEMA
tokern-data-lineage | ;
tokern-data-lineage | ]
tokern-data-lineage | (Background on this error at: http://sqlalche.me/e/13/f405)
.. I'm trying to look through the tokern code repos to see where the disconnect might be happening, but not sure yet...
I tried using the latest Docker file. When I tried to execute the sample notebook, it gave me the following error:
Traceback (most recent call last):
File "~/Packages/User/lin_test.py", line 27, in <module>
source = catalog.add_source(name="dev", source_type="redshift", **wikimedia_db)
File "~/Library/Python/3.8/lib/python/site-packages/data_lineage/__init__.py", line 379, in add_source
payload = self._post(path="sources", data=data, type="sources")
File "~/Library/Python/3.8/lib/python/site-packages/data_lineage/__init__.py", line 202, in _post
response.raise_for_status()
File "/Library/Python/3.8/site-packages/requests/models.py", line 943, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 502 Server Error: Bad Gateway for url: http://127.0.0.1:8000/api/v1/catalog/sources
2021/07/09 16:45:09 [error] 25#25: *12 connect() failed (113: Host is unreachable) while connecting to upstream, client:
import psycopg2
File "/opt/pysetup/.venv/lib/python3.8/site-packages/psycopg2/__init__.py", line 51, in <module>
from psycopg2._psycopg import ( # noqa
ImportError: libpq.so.5: cannot open shared object file: No such file or directory
On a side note, I'm planning on doing the server implementation without using Docker; I thought I'd first try to see if I can achieve it using the Docker file.
Originally posted by @siva-mudiyanur in #57 (comment)
Hi team,
We are planning to run the data lineage tool against AWS Redshift to generate column-level lineage.
In the demo, the only option to connect to the source database is with a username and password, which is not in line with our security policy. Hence I'm raising this request for options, such as AWS KMS tokens, that we can use to log in to AWS Redshift without a username and password in Python code.
Also, is there any option where we do NOT log in to the source database and instead generate lineage from the queries file? For example, can we download the DDL so the tool uses that DDL for query parsing rather than an active connection to the source database?
I appreciate your response on this; please let me know if more information is required.
Hi, I'm trying to install the 0.8 version in a Docker image based on Debian Buster, and when pip runs the install it prints the following warning/error:
#12 9.444 Collecting data-lineage==0.8.0 (from -r /project/requirements.txt (line 25))
#12 9.466 Could not find a version that satisfies the requirement data-lineage==0.8.0 (from -r /project/requirements.txt (line 25)) (from versions: 0.1.2, 0.2.0, 0.3.0, 0.5.1, 0.5.2, 0.6.0, 0.7.0)
#12 9.541 No matching distribution found for data-lineage==0.8.0 (from -r /project/requirements.txt (line 25))
Is this normal behavior? Do I have to add something before trying to install?
The Docker image doesn't run successfully, and I can't run the init or runserver commands.
python:3.8.1-slim is old and has 21 security vulnerabilities.
At least consider swapping to python:3.8-slim.
When I add my snowflake DB for scanning, using this bit of code (with the values replaced as per my snowflake database):
from data_lineage import Catalog
catalog = Catalog(docker_address)
# Register wikimedia datawarehouse with data-lineage app.
source = catalog.add_source(name="wikimedia", source_type="postgresql", **wikimedia_db)
# Scan the wikimedia data warehouse and register all schemata, tables and columns.
catalog.scan_source(source)
... I get
tokern-data-lineage-visualizer | 2021/10/08 21:51:40 [error] 34#34: *1 upstream prematurely closed connection while reading response header from upstream, client: 10.10.0.1, server: , request: "POST /api/v1/catalog/scanner HTTP/1.1", upstream: "http://10.10.0.3:4142/api/v1/catalog/scanner", host: "127.0.0.1:8000"
... I think it's because Snowflake isn't returning fast enough, but I'm not sure. I tried updating the warehouse size to large to make the scan faster, but I get the same thing. It seems to time out pretty fast, at least for my large database. Any ideas?
Python 3.8.0 in an isolated venv, 0.8.3 data-lineage. Thanks for this package!
I am trying to use this example:
https://tokern.io/docs/data-lineage/queries
... first issue: this bit of code looks like it's just going to fetch a single row of the query history from Snowflake:
queries = []
with connection.get_cursor() as cursor:
cursor.execute(query)
row = cursor.fetchone()
while row is not None:
queries.append(row[0])
... is this intended? Note that it's using .fetchone()
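For reference, the snippet above never re-assigns row inside the loop, so besides fetching a single row it would also spin forever once a row is found. A corrected sketch of the fetch loop:

```python
def fetch_queries(cursor):
    """Collect the first column of every row from an executed cursor.

    The docs snippet called fetchone() once and then looped on the same
    row; re-fetching inside the loop is what lets it terminate.
    """
    queries = []
    row = cursor.fetchone()
    while row is not None:
        queries.append(row[0])
        row = cursor.fetchone()  # advance to the next row
    return queries
```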
Then.. second issue... when I go back to the example here: https://tokern.io/docs/data-lineage/example
I see this bit of code...
analyze = Analyze(docker_address)
for query in queries:
print(query)
analyze.analyze(**query, source=source, start_time=datetime.now(), end_time=datetime.now())
... what does the queries array look like? Or better yet, what does a single query item look like? Above it, in the example, it looks to be a JSON payload...
with open("queries.json", "r") as file:
queries = json.load(file)
.... but I've no idea what the payload is supposed to look like.
I've tried 8 different ways of passing this **query variable into analyze(...) - using the results from the snowflake example on https://tokern.io/docs/data-lineage/queries - but I can never seem to get it right. Either I get an error saying that ** expects a mapping when I use strings or tuples (which is fine, but what mapping does the function expect?) - or I get an error in the API console itself like
tokern-data-lineage | raise ValueError('Bad argument, expected a ast.Node instance or a tuple')
tokern-data-lineage | ValueError: Bad argument, expected a ast.Node instance or a tuple
.. could we get a more concrete snowflake example, or at the bare minimum please indicate what the query variable is supposed to look like?
Note that I am also trying to inspect the unit tests and use those as examples, but still not getting very far.
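Judging by the analyze.analyze(**{"query": query}, ...) call quoted earlier in this thread, each item appears to be a mapping with a "query" key holding the SQL text; this shape is inferred from that call, not from documentation. A sketch:

```python
# Hypothetical payload shape, inferred from analyze.analyze(**{"query": q}, ...).
raw_sql = [
    "INSERT INTO page_lookup SELECT * FROM pages",
    "CREATE TABLE tmp1 AS SELECT a, b FROM src",
]

# **query unpacks each item as keyword arguments, so each item must be a
# dict whose keys match analyze()'s parameter names.
queries = [{"query": sql} for sql in raw_sql]
```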
Thanks for this package!
Followed the example, ran docker-compose up -d, and got this error:
Traceback (most recent call last):
File "urllib3/connectionpool.py", line 670, in urlopen
File "urllib3/connectionpool.py", line 392, in _make_request
File "http/client.py", line 1255, in request
File "http/client.py", line 1301, in _send_request
File "http/client.py", line 1250, in endheaders
File "http/client.py", line 1010, in _send_output
File "http/client.py", line 950, in send
File "docker/transport/unixconn.py", line 43, in connect
FileNotFoundError: [Errno 2] No such file or directory
Are the instructions correct?
Opening this issue to let everyone know until it gets fixed: the installation of pglast fails, requiring an xxhash.h file. Here's a link to the issue and how to resolve it: lelit/pglast#82
Please feel free to close if you think it's inappropriate.
Trying out a demo, I tried to run catalog.scan_source(source), but that does not exist. After some digging, it looks like this works:
from data_lineage import Scan
Scan('http://127.0.0.1:8000').start(source)
Please fix the demo pages.
Couple of issues:
Traceback (most recent call last):
File "~/Library/Application Support/Sublime Text 3/Packages/User/lin_test.py", line 46, in <module>
catalog.update_source(source,schema)
File "~/Library/Python/3.8/lib/python/site-packages/data_lineage/__init__.py", line 489, in update_source
attributes=payload["attributes"],
KeyError: 'attributes'
Code executed:
source = catalog.get_source("rs")
schema = catalog.get_schema("rs", "test")
catalog.update_source(source,schema)
Last line from visualizer log:
10.10.0.1 - - [16/Jul/2021:15:53:08 +0000] "POST /api/v1/parse?query=<query>&source_id=7 HTTP/1.1" 200 181 "-" "python-requests/2.25.1"
*removed the query as its pretty long
Originally posted by @siva-mudiyanur in #57 (comment)
I tried to run the example at
https://github.com/tokern/data-lineage/blob/master/api_example.ipynb
Could you tell me how I can fix it?
Thank you
Steps to reproduce:
This can't currently be run on ARM hardware without being in emulation mode. Given Mac M1 and AWS EC2 Graviton, it would be nice to have ARM support.
I'm trying to run the data lineage wikimedia demo but I'm running into an error:
Traceback (most recent call last):
File "/Users/georgebezerra/opt/anaconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Users/georgebezerra/opt/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/georgebezerra/.vscode/extensions/ms-python.python-2021.12.1559732655/pythonFiles/lib/python/debugpy/__main__.py", line 45, in <module>
cli.main()
File "/Users/georgebezerra/.vscode/extensions/ms-python.python-2021.12.1559732655/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 444, in main
run()
File "/Users/georgebezerra/.vscode/extensions/ms-python.python-2021.12.1559732655/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 285, in run_file
runpy.run_path(target_as_str, run_name=compat.force_str("main"))
File "/Users/georgebezerra/opt/anaconda3/lib/python3.8/runpy.py", line 265, in run_path
return _run_module_code(code, init_globals, run_name,
File "/Users/georgebezerra/opt/anaconda3/lib/python3.8/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/Users/georgebezerra/opt/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/georgebezerra/Dev/demo.py", line 19, in <module>
source = catalog.add_source(name="wikimedia", source_type="postgresql", **wikimedia_db)
File "/Users/georgebezerra/opt/anaconda3/lib/python3.8/site-packages/data_lineage/__init__.py", line 319, in add_source
payload = self._post(path="sources", data=data, type="sources")
File "/Users/georgebezerra/opt/anaconda3/lib/python3.8/site-packages/data_lineage/__init__.py", line 144, in _post
return response.json()["data"]
KeyError: 'data'
The Docker piece seems to be running fine except for the Tokern worker, which returns the following message:
/docker-entrypoint.sh: 11: exec: rq: not found
This is running on a MacBook Pro with an M1 chip.
Hi, I am trying to parse query history from Snowflake on Jupyter notebook.
data lineage version 0.3.0
!pip install snowflake-connector-python[secure-local-storage,pandas] data-lineage
import datetime
end_time = datetime.datetime.now()
start_time = end_time - datetime.timedelta(days=7)
query = f"""
SELECT query_text
FROM table(information_schema.query_history(
end_time_range_start=>to_timestamp_ltz('{start_time.isoformat()}'),
end_time_range_end=>to_timestamp_ltz('{end_time.isoformat()}')));
"""
cursors = conn.execute_string(
sql_text=query
)
queries = []
for cursor in cursors:
for row in cursor:
print(row[0])
queries.append(row[0])
from data_lineage.parser import parse_queries, visit_dml_queries
# Parse all queries
parsed = parse_queries(queries)
# Visit the parse trees to extract source and target queries
visited = visit_dml_queries(catalog, parsed)
# Create a graph and visualize it
from data_lineage.parser import create_graph
graph = create_graph(catalog, visited)
import plotly
plotly.offline.iplot(graph.fig())
Then I got this error. Would you help me find the root cause?
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-33-151c67ea977c> in <module>
----> 1 from data_lineage.parser import parse_queries, visit_dml_queries
2
3 # Parse all queries
4 parsed = parse_queries(queries)
5
ImportError: cannot import name 'parse_queries' from 'data_lineage.parser' (/opt/conda/lib/python3.8/site-packages/data_lineage/parser/__init__.py)
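An ImportError like this often means the installed data-lineage version predates (or postdates) the API the notebook uses. A generic way to check which public names a module actually exports is sketched below; the public_names helper is a hypothetical diagnostic, not part of data-lineage, and is demonstrated here on the stdlib json module since data-lineage may not be installed:

```python
import importlib

def public_names(module_name):
    """List the public names a module actually exports,
    to compare against what the failing import expects."""
    mod = importlib.import_module(module_name)
    return sorted(n for n in dir(mod) if not n.startswith("_"))

# With data-lineage installed, you would run:
#   public_names("data_lineage.parser")
# Demonstration on a stdlib module:
print(public_names("json"))
```

Comparing that list against the example's imports shows whether you need to upgrade the package or adapt the example to the installed version.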
We have a requirement to parse SQL queries that contain MERGE operations. Is there a plan to add functionality to support this?
I see that this error log got appended to the lineage log after I posted the previous comment (it took about 3-5 minutes):
ERROR:data_lineage.server:Exception on /api/main [GET]
Traceback (most recent call last):
File "/opt/pysetup/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/opt/pysetup/.venv/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
psycopg2.OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/pysetup/.venv/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "/opt/pysetup/.venv/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/opt/pysetup/.venv/lib/python3.8/site-packages/flask_restful/__init__.py", line 467, in wrapper
resp = resource(*args, **kwargs)
File "/opt/pysetup/.venv/lib/python3.8/site-packages/flask/views.py", line 89, in view
return self.dispatch_request(*args, **kwargs)
File "/opt/pysetup/.venv/lib/python3.8/site-packages/flask_restful/__init__.py", line 582, in dispatch_request
resp = meth(*args, **kwargs)
File "/opt/pysetup/.venv/lib/python3.8/site-packages/data_lineage/server.py", line 69, in get
column_edges = self._catalog.get_column_lineages(args["job_ids"])
File "/opt/pysetup/.venv/lib/python3.8/site-packages/dbcat/catalog/catalog.py", line 299, in get_column_lineages
return query.all()
File "/opt/pysetup/.venv/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3373, in all
return list(self)
File "/opt/pysetup/.venv/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3535, in __iter__
return self._execute_and_instances(context)
File "/opt/pysetup/.venv/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3560, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File "/opt/pysetup/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/opt/pysetup/.venv/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/opt/pysetup/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
ret = self._execute_context(
File "/opt/pysetup/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
self._handle_dbapi_exception(
File "/opt/pysetup/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
util.raise_(
File "/opt/pysetup/.venv/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/opt/pysetup/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/opt/pysetup/.venv/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[SQL: SELECT column_lineage.id AS column_lineage_id, column_lineage.context AS column_lineage_context, column_lineage.source_id AS column_lineage_source_id, column_lineage.target_id AS column_lineage_target_id, column_lineage.job_execution_id AS column_lineage_job_execution_id, sources_1.id AS sources_1_id, sources_1.source_type AS sources_1_source_type, sources_1.name AS sources_1_name, sources_1.dialect AS sources_1_dialect, sources_1.uri AS sources_1_uri, sources_1.port AS sources_1_port, sources_1.username AS sources_1_username, sources_1.password AS sources_1_password, sources_1.database AS sources_1_database, sources_1.instance AS sources_1_instance, sources_1.cluster AS sources_1_cluster, sources_1.project_id AS sources_1_project_id, sources_1.project_credentials AS sources_1_project_credentials, sources_1.page_size AS sources_1_page_size, sources_1.filter_key AS sources_1_filter_key, sources_1.included_tables_regex AS sources_1_included_tables_regex, sources_1.key_path AS sources_1_key_path, sources_1.account AS sources_1_account, sources_1.role AS sources_1_role, sources_1.warehouse AS sources_1_warehouse, schemata_1.id AS schemata_1_id, schemata_1.name AS schemata_1_name, schemata_1.source_id AS schemata_1_source_id, tables_1.id AS tables_1_id, tables_1.name AS tables_1_name, tables_1.schema_id AS tables_1_schema_id, columns_1.id AS columns_1_id, columns_1.name AS columns_1_name, columns_1.data_type AS columns_1_data_type, columns_1.sort_order AS columns_1_sort_order, columns_1.table_id AS columns_1_table_id, sources_2.id AS sources_2_id, sources_2.source_type AS sources_2_source_type, sources_2.name AS sources_2_name, sources_2.dialect AS sources_2_dialect, sources_2.uri AS sources_2_uri, sources_2.port AS sources_2_port, sources_2.username AS sources_2_username, sources_2.password AS sources_2_password, sources_2.database AS sources_2_database, sources_2.instance AS sources_2_instance, sources_2.cluster AS sources_2_cluster, sources_2.project_id AS 
sources_2_project_id, sources_2.project_credentials AS sources_2_project_credentials, sources_2.page_size AS sources_2_page_size, sources_2.filter_key AS sources_2_filter_key, sources_2.included_tables_regex AS sources_2_included_tables_regex, sources_2.key_path AS sources_2_key_path, sources_2.account AS sources_2_account, sources_2.role AS sources_2_role, sources_2.warehouse AS sources_2_warehouse, schemata_2.id AS schemata_2_id, schemata_2.name AS schemata_2_name, schemata_2.source_id AS schemata_2_source_id, tables_2.id AS tables_2_id, tables_2.name AS tables_2_name, tables_2.schema_id AS tables_2_schema_id, columns_2.id AS columns_2_id, columns_2.name AS columns_2_name, columns_2.data_type AS columns_2_data_type, columns_2.sort_order AS columns_2_sort_order, columns_2.table_id AS columns_2_table_id, jobs_1.id AS jobs_1_id, jobs_1.name AS jobs_1_name, jobs_1.context AS jobs_1_context, jobs_1.source_id AS jobs_1_source_id, job_executions_1.id AS job_executions_1_id, job_executions_1.job_id AS job_executions_1_job_id, job_executions_1.started_at AS job_executions_1_started_at, job_executions_1.ended_at AS job_executions_1_ended_at, job_executions_1.status AS job_executions_1_status
FROM column_lineage LEFT OUTER JOIN columns AS columns_1 ON columns_1.id = column_lineage.source_id LEFT OUTER JOIN tables AS tables_1 ON tables_1.id = columns_1.table_id LEFT OUTER JOIN schemata AS schemata_1 ON schemata_1.id = tables_1.schema_id LEFT OUTER JOIN sources AS sources_1 ON sources_1.id = schemata_1.source_id LEFT OUTER JOIN columns AS columns_2 ON columns_2.id = column_lineage.target_id LEFT OUTER JOIN tables AS tables_2 ON tables_2.id = columns_2.table_id LEFT OUTER JOIN schemata AS schemata_2 ON schemata_2.id = tables_2.schema_id LEFT OUTER JOIN sources AS sources_2 ON sources_2.id = schemata_2.source_id LEFT OUTER JOIN job_executions AS job_executions_1 ON job_executions_1.id = column_lineage.job_execution_id LEFT OUTER JOIN jobs AS jobs_1 ON jobs_1.id = job_executions_1.job_id]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
Originally posted by @siva-mudiyanur in #57 (comment)
Could you supply the instructions related to PostgreSQL?
Currently, the parser trips up on many common Snowflake query history entries, such as select query_text from table(information_schema.query_history());
It also fails on queries with the rm @SNOWFLAKE_...
syntax, and on queries with the keyword recluster.
In the latter case, the error is: syntax error at or near "recluster", at index 35
I am systematically removing these from the analysis before sending them to the analyzer, but just FYI that without doing this, the analyzer throws an exception.
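That pre-filtering step can be sketched as a simple regex filter over the query list before calling parse_queries. This is a minimal illustration, not part of data-lineage; the patterns and the filter_queries name are assumptions based on the three statement shapes mentioned above:

```python
import re

# Statements the parser is reported to choke on: table functions,
# stage file commands (rm @...), and ALTER TABLE ... RECLUSTER.
UNSUPPORTED = [
    re.compile(r"\btable\s*\(", re.IGNORECASE),   # table(information_schema.query_history())
    re.compile(r"^\s*rm\s+@", re.IGNORECASE),     # stage file removal
    re.compile(r"\brecluster\b", re.IGNORECASE),  # recluster keyword
]

def filter_queries(queries):
    """Drop query-history entries matching any unsupported pattern."""
    return [q for q in queries if not any(p.search(q) for p in UNSUPPORTED)]
```

The surviving queries can then be passed to the analyzer as usual; anything filtered out can be logged for manual review.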
Hi! I'm interested in extracting data lineage for multiple SQL queries in a Teradata database. As I read in other issues, you need access to the query history for data-lineage to do its work. If you have any documentation on how I could integrate this, I'm really interested in contributing a connector for Teradata!
If you want to use an external Postgres database, replace the following parameters in
tokern-lineage-engine.yml
:
- CATALOG_HOST
- CATALOG_USER
- CATALOG_PASSWORD
- CATALOG_DB
This was my first approach, but it wasn't working. Here are my observations:
Values provided for External Catalog:
CATALOG_PASSWORD: t@st_passw0rd
CATALOG_USER: catalog_test
CATALOG_DB: tokern
CATALOG_HOST: "127.0.0.1"
Originally posted by @siva-mudiyanur in #57 (comment)
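One likely cause worth noting: inside a Docker container, 127.0.0.1 resolves to the container itself, not the host machine, so a catalog database running on the host (or on RDS) needs a hostname the container can actually reach. A hedged sketch of the override follows; the service name tokern-api and the host.docker.internal alias are assumptions (the alias works on Docker Desktop for Mac/Windows, not on plain Linux), and the credential values are the ones from the report above:

```yaml
# Hypothetical fragment of tokern-lineage-engine.yml:
# point CATALOG_HOST at a hostname reachable from inside the container,
# not at 127.0.0.1 (which is the container's own loopback).
services:
  tokern-api:
    environment:
      CATALOG_HOST: host.docker.internal   # assumption: Docker Desktop; use the real DB hostname otherwise
      CATALOG_USER: catalog_test
      CATALOG_PASSWORD: t@st_passw0rd
      CATALOG_DB: tokern
```

On Linux, the host's IP on the Docker bridge network (or an extra_hosts entry) would be needed instead of host.docker.internal.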