datapao / dac
Databricks Admin Center
License: Apache License 2.0
A KeyError is raised when accessing job_run_dict['cluster_instance']. According to the Databricks Jobs API docs, this field is not always present:
The cluster used for this run. If the run is specified to use a new cluster, this field will be set once the Jobs service has requested a cluster for the run.
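A defensive lookup avoids the crash for runs that have not been assigned a cluster yet. This is a minimal sketch; the helper name is hypothetical, not the scraper's actual code:

```python
def get_cluster_instance(job_run_dict):
    # 'cluster_instance' is absent until the Jobs service has
    # requested a cluster for the run, so fall back to None
    # instead of indexing the key directly.
    return job_run_dict.get("cluster_instance")
```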
File "dac/scraping/scraper.py", line 151, in scrape_job_run
state_result_state=state["result_state"] if not failed_run else 'FAIL',
KeyError: 'result_state'
See the docs on the availability of result_state, which depends on the life_cycle_state.
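Since result_state only appears once the run reaches a terminal life_cycle_state, the scraper could read it optionally. A sketch under that assumption; the function name and failed_run flag mirror the traceback but are illustrative:

```python
def get_result_state(state, failed_run=False):
    # Per the Jobs API docs, result_state is only set once the run
    # reaches a terminal life_cycle_state; until then the key is
    # simply missing, so use .get() rather than state["result_state"].
    if failed_run:
        return "FAIL"
    return state.get("result_state")
```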
The dac-scraper process dies with the following:
ValueError: Unkown event: { ..., 'type': 'INIT_SCRIPTS_STARTED', ... }
Recognized events are: ['INIT', 'CREATING', 'DID_NOT_EXPAND_DISK', 'EXPANDED_DISK', 'FAILED_TO_EXPAND_DISK', 'INIT_SCRIPTS_STARTING', 'INIT_SCRIPTS_FINISHED', 'STARTING', 'RESTARTING', 'TERMINATING', 'EDITED', 'RUNNING', 'RESIZING', 'UPSIZE_COMPLETED', 'NODES_LOST', 'DRIVER_HEALTHY', 'DRIVER_UNAVAILABLE', 'SPARK_EXCEPTION', 'DRIVER_NOT_RESPONDING', 'DBFS_DOWN', 'METASTORE_DOWN', 'AUTOSCALING_STATS_REPORT', 'NODE_BLACKLISTED', 'PINNED', 'UNPINNED']
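Since Databricks adds new event types over time, a more tolerant parser would log and skip unrecognized events instead of raising. A minimal sketch (the event set here is abbreviated for illustration):

```python
import logging

# Abbreviated set of recognized event types; the real list is longer.
KNOWN_EVENTS = {"INIT", "CREATING", "STARTING", "RUNNING", "TERMINATING"}

def parse_event_type(event):
    # New event types (e.g. INIT_SCRIPTS_STARTED) appear as the API
    # evolves; log and skip unknown ones instead of raising ValueError
    # and killing the scraping thread.
    event_type = event.get("type")
    if event_type not in KNOWN_EVENTS:
        logging.warning("Skipping unknown cluster event type: %s", event_type)
        return None
    return event_type
```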
Stack trace:
Exception in thread scraping-loop-Thread:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1229, in _execute_context
cursor, statement, parameters, context
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 577, in do_executemany
cursor.executemany(statement, parameters)
sqlite3.IntegrityError: NOT NULL constraint failed: cluster_types.type
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/app/scraping/scraper.py", line 440, in scraping_loop
result = scrape(json_path, session)
File "/app/scraping/scraper.py", line 462, in scrape
instance_types = upsert_instance_types(session)
File "/app/scraping/scraper.py", line 138, in upsert_instance_types
session.commit()
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 1027, in commit
self.transaction.commit()
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 494, in commit
self._prepare_impl()
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 473, in _prepare_impl
self.session.flush()
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2470, in flush
self._flush(objects)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2608, in _flush
transaction.rollback(_capture_exception=True)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
compat.reraise(exc_type, exc_value, exc_tb)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 153, in reraise
raise value
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2568, in _flush
flush_context.execute()
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/unitofwork.py", line 422, in execute
rec.execute(self)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/unitofwork.py", line 589, in execute
uow,
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/persistence.py", line 245, in save_obj
insert,
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/persistence.py", line 1084, in _emit_insert_statements
c = cached_connections[connection].execute(statement, multiparams)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 988, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 287, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1107, in _execute_clauseelement
distilled_params,
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1253, in _execute_context
e, statement, parameters, cursor, context
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1473, in _handle_dbapi_exception
util.raise_from_cause(sqlalchemy_exception, exc_info)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 398, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 152, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1229, in _execute_context
cursor, statement, parameters, context
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 577, in do_executemany
cursor.executemany(statement, parameters)
sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) NOT NULL constraint failed: cluster_types.type
[SQL: INSERT INTO cluster_types (scrape_time, type, cpu, mem, dbu_light, dbu_job, dbu_analysis) VALUES (?, ?, ?, ?, ?, ?, ?)]
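One way to avoid the NOT NULL failure is to drop rows without a type name before the bulk insert. A sketch, assuming the scraped rows are dicts keyed like the cluster_types columns:

```python
def filter_instance_types(instance_types):
    # cluster_types.type is declared NOT NULL, so discard any scraped
    # row whose type name is missing, None, or empty before the INSERT.
    return [row for row in instance_types if row.get("type")]
```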
We have a job where run_state is not present:
'state': {'life_cycle_state': 'INTERNAL_ERROR', 'state_message': 'Notebook not found: ***REDACTED***'}
File "dac/scraping/scraper.py", line 145, in scrape_job_run
state_result_state=state["result_state"],
KeyError: 'result_state'
In the case of autoscaling clusters there is no num_workers key in the dictionary. (Instead there is an 'autoscale' key, e.g. 'autoscale': {'min_workers': 1, 'max_workers': 2}.)
File "dac/scraping/scraper.py", line 72, in scrape_cluster
num_workers=cluster_dict["num_workers"],
KeyError: 'num_workers'
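The scraper could branch on which shape the cluster dict has. A minimal sketch; the helper name and the (min, max) tuple convention are assumptions, not the project's API:

```python
def get_worker_bounds(cluster_dict):
    # Fixed-size clusters expose 'num_workers'; autoscaling clusters
    # expose an 'autoscale' dict with min_workers/max_workers instead.
    if "autoscale" in cluster_dict:
        autoscale = cluster_dict["autoscale"]
        return autoscale["min_workers"], autoscale["max_workers"]
    num_workers = cluster_dict.get("num_workers", 0)
    return num_workers, num_workers
```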
It seems that this should be scraper-run: https://github.com/datapao/dac/blob/master/db/db.py#L316
In that case, the settings object contains an existing_cluster_id key but not a new_cluster key. This is permitted according to the Databricks Jobs API docs.
File "dac/scraping/scraper.py", line 174, in scrape_jobs
new_cluster=job_dict["settings"]["new_cluster"],
KeyError: 'new_cluster'
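Handling both job shapes could look like the sketch below; the helper and its ("new"/"existing", value) return convention are hypothetical:

```python
def get_job_cluster(settings):
    # Job settings carry either an inline 'new_cluster' spec or an
    # 'existing_cluster_id' reference; only one of the two is present.
    if "new_cluster" in settings:
        return "new", settings["new_cluster"]
    return "existing", settings.get("existing_cluster_id")
```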
Recently the scraper stopped working for us, with the following problem:
Exception in thread scraping-loop-Thread:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1229, in _execute_context
cursor, statement, parameters, context
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 577, in do_executemany
cursor.executemany(statement, parameters)
sqlite3.IntegrityError: UNIQUE constraint failed: cluster_states.user_id, cluster_states.cluster_id, cluster_states.timestamp, cluster_states.state
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/app/scraping/scraper.py", line 313, in scraping_loop
result = scrape(json_path)
File "/app/scraping/scraper.py", line 342, in scrape
session.commit()
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 1027, in commit
self.transaction.commit()
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 494, in commit
self._prepare_impl()
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 473, in _prepare_impl
self.session.flush()
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2470, in flush
self._flush(objects)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2608, in _flush
transaction.rollback(_capture_exception=True)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
compat.reraise(exc_type, exc_value, exc_tb)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 153, in reraise
raise value
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2568, in _flush
flush_context.execute()
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/unitofwork.py", line 422, in execute
It's very strange indeed, but it seems that the API might return some events twice?
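If the API does return duplicate events, de-duplicating on the same key as the UNIQUE constraint (user_id, cluster_id, timestamp, state) before committing would sidestep the IntegrityError. A sketch, assuming the states are dicts keyed like the table columns:

```python
def dedupe_cluster_states(states):
    # Keep only the first occurrence of each
    # (user_id, cluster_id, timestamp, state) tuple -- the columns
    # covered by the UNIQUE constraint on cluster_states.
    seen = set()
    unique = []
    for state in states:
        key = (state["user_id"], state["cluster_id"],
               state["timestamp"], state["state"])
        if key not in seen:
            seen.add(key)
            unique.append(state)
    return unique
```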
File "dac/aggregation/aggregator.py", line 65, in get_cluster_type
names = clusters[['cluster_name']].copy()
...
KeyError: "None of [Index(['cluster_name'], dtype='object')] are in the [columns]"
It seems to be right: https://github.com/datapao/dac/blob/master/db/db.py#L409
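This pandas error typically means the DataFrame was built from zero rows and therefore has no columns at all. A guard like the sketch below (the helper name is illustrative) would avoid the crash:

```python
import pandas as pd

def get_cluster_names(clusters):
    # A DataFrame constructed from zero scraped rows has no columns,
    # which makes clusters[['cluster_name']] raise a KeyError; check
    # for the column before selecting it.
    if "cluster_name" not in clusters.columns:
        return pd.DataFrame(columns=["cluster_name"])
    return clusters[["cluster_name"]].copy()
```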
According to the Databricks Jobs API docs, the creator user name is not included in the response if the user has already been deleted. This can cause a KeyError when scraping a job upon accessing job_dict["creator_user_name"].
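Reading the field optionally handles deleted users. A minimal sketch with a hypothetical helper name:

```python
def get_creator(job_dict):
    # creator_user_name is omitted from the response when the user
    # who created the job has since been deleted, so default to None.
    return job_dict.get("creator_user_name")
```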
We have already been running the DAC for a while, and after around a week the operating system killed the scraper because it reached its memory limit:
scripts/dac.sh: line 23: 6 Killed python main.py scrape
[Wed Apr 8 02:51:32 2020] Memory cgroup out of memory: Killed process 7871 (python) total-vm:1209768kB, anon-rss:334320kB, file-rss:33360kB, shmem-rss:0kB
Could there be a memory leak somewhere? (We didn't really give it too much memory, I know...)