Comments (12)
Ok, I think I have a fix for this.
This is a horrible bit of bookkeeping. For context, this is what is happening:
We set the catalog using a context manager, and we set the database using another context manager. What you're hitting is this weird edge case:
1. set catalog to comms_media_dev (succeeds)
2. set database to dart_extensions within comms_media_dev (succeeds)
3. write the table -- great!

Then we try to change catalog and database back in reverse order, and...

4. set database to default (the previous value that we saved) within comms_media_dev (fails: you do not have permission to access that)
5. set catalog back to spark_catalog (or the previous value -- but we never get here because of the previous error)
So I think what we need to do instead is:

1. set catalog
2. set database
3. write table
4. set catalog back
5. set database back
It would be really great if spark would allow for setting both of these values at the same time, but that is apparently not a thing.
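Concretely, here is a minimal sketch of that ordering using only the public PySpark Catalog API (illustrative only, not the exact ibis code; currentCatalog/setCurrentCatalog need pyspark >= 3.4, and the helper name is made up):

from contextlib import contextmanager

@contextmanager
def set_catalog_and_database(session, catalog, db):
    # Remember where we were so we can restore it afterwards.
    prev_catalog = session.catalog.currentCatalog()
    prev_db = session.catalog.currentDatabase()
    try:
        session.catalog.setCurrentCatalog(catalog)  # 1. set catalog
        session.catalog.setCurrentDatabase(db)      # 2. set database
        yield                                       # 3. write table
    finally:
        # 4. restore the catalog first, so the saved database name is
        #    resolved in the catalog it actually belongs to...
        session.catalog.setCurrentCatalog(prev_catalog)
        # 5. ...and only then restore the database.
        session.catalog.setCurrentDatabase(prev_db)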
If you want to try out that PR, @mark-druffel, that would be a huge help until I can get a much more complicated pyspark testing setup put together.
Hey @mark-druffel -- let's take the conversation over to #9042 -- I think I know what I missed in that PR that is still causing you errors. Thanks for helping us test this out!
EDIT: let's continue over in #9067 where I'm trying to fix this
Hey @mark-druffel -- we've merged in my fixes from #9067, so hopefully main will be working now -- definitely let us know if things are still failing, and thanks for your help in testing this out!
@gforsyth I'm still trying to debug this myself to understand what's happening, but wanted to post here as well. Let me know if I should open a new issue instead of posting on the closed one.
The fix doesn't seem to be working on my end, if I'm using it correctly. I double-checked my Ibis version to make sure I'm on the right one: I installed with ibis-framework[pyspark] @ git+https://github.com/ibis-project/ibis.git@2c1a58e25575f9f0d9876e37b49154e276558526 and the version shown in my environment was:
%pip show ibis-framework
Name: ibis-framework
Version: 9.0.0.dev677
Summary: The portable Python dataframe library
Home-page: https://ibis-project.org
Author: Ibis Maintainers
Author-email: [email protected]
License: Apache-2.0
Location: /local_disk0/.ephemeral_nfs/envs/pythonEnv-59b1aa40-b629-4cf3-82ae-6f44d5b1b2f6/lib/python3.10/site-packages
Requires: atpublic, bidict, numpy, pandas, parsy, pyarrow, pyarrow-hotfix, python-dateutil, pytz, rich, sqlglot, toolz, typing-extensions
Required-by:
I tested with the following code and got an error that made me think it was trying to split a string into a tuple, but I had already provided a tuple:
import ibis
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
ispark = ibis.pyspark.connect(session = spark)
df = ispark.read_parquet("abfss://media_meas_campaign_info/")
ispark.create_table(name = "raw_media_meas_campaign_info", obj = df, database=["comms_media_dev", "dart_extensions"], overwrite=True)
ValueError: oops
File , line 7
4 ispark = ibis.pyspark.connect(session = spark)
6 df = ispark.read_parquet("abfss://media_meas_campaign_info/")
----> 7 ispark.create_table(name = "raw_media_meas_campaign_info", obj = df, database=["comms_media_dev", "dart_extensions"], overwrite=True)
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-59b1aa40-b629-4cf3-82ae-6f44d5b1b2f6/lib/python3.10/site-packages/ibis/backends/pyspark/__init__.py:497, in Backend.create_table(self, name, obj, schema, database, temp, overwrite, format)
492 if temp is True:
493 raise NotImplementedError(
494 "PySpark backend does not yet support temporary tables"
495 )
--> 497 table_loc = self._to_sqlglot_table(database)
498 catalog, db = self._to_catalog_db_tuple(table_loc)
500 if obj is not None:
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-59b1aa40-b629-4cf3-82ae-6f44d5b1b2f6/lib/python3.10/site-packages/ibis/backends/sql/__init__.py:561, in SQLBackend._to_sqlglot_table(self, database)
559 database = sg.exp.Table(catalog=catalog, db=db)
560 else:
--> 561 raise ValueError("oops")
563 return database
So I tried providing the catalog and database as a string with a dot separator, and my error looks similar to the error I got when I opened the issue initially. It seems like it accepted my catalog argument, but dropped my database argument and substituted default:
import ibis
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
ispark = ibis.pyspark.connect(session = spark)
df = ispark.read_parquet("abfss://media_meas_campaign_info.parquet")
ispark.create_table(name = "raw_media_meas_campaign_info", obj = df, database="comms_media_dev.dart_extensions", overwrite=True)
Py4JJavaError: An error occurred while calling o435.setCurrentDatabase.
: com.databricks.sql.managedcatalog.acl.UnauthorizedAccessException: PERMISSION_DENIED: User does not have USE SCHEMA on Schema 'comms_media_dev.default'.
at com.databricks.managedcatalog.UCReliableHttpClient.reliablyAndTranslateExceptions(UCReliableHttpClient.scala:87)
at com.databricks.managedcatalog.UCReliableHttpClient.get(UCReliableHttpClient.scala:139)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$getSchema$1(ManagedCatalogClientImpl.scala:540)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$recordAndWrapException$2(ManagedCatalogClientImpl.scala:4400)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$recordAndWrapException$1(ManagedCatalogClientImpl.scala:4399)
at com.databricks.managedcatalog.ErrorDetailsHandler.wrapServiceException(ErrorDetailsHandler.scala:25)
at com.databricks.managedcatalog.ErrorDetailsHandler.wrapServiceException$(ErrorDetailsHandler.scala:23)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.wrapServiceException(ManagedCatalogClientImpl.scala:151)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.recordAndWrapException(ManagedCatalogClientImpl.scala:4396)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.getSchema(ManagedCatalogClientImpl.scala:533)
at com.databricks.sql.managedcatalog.ManagedCatalogCommon.shouldUpdateSchemaMetadata(ManagedCatalogCommon.scala:2199)
at com.databricks.sql.managedcatalog.ManagedCatalogCommon.getSchemaMetadataInternal(ManagedCatalogCommon.scala:2652)
at com.databricks.sql.managedcatalog.ManagedCatalogCommon.$anonfun$getSchemaMetadata$3(ManagedCatalogCommon.scala:282)
at scala.Option.getOrElse(Option.scala:189)
at com.databricks.sql.managedcatalog.ManagedCatalogCommon.getSchemaMetadata(ManagedCatalogCommon.scala:282)
at com.databricks.sql.managedcatalog.ManagedCatalogCommon.schemaExists(ManagedCatalogCommon.scala:287)
at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.$anonfun$schemaExists$1(ProfiledManagedCatalog.scala:143)
at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
at org.apache.spark.sql.catalyst.MetricKeyUtils$.measure(MetricKey.scala:672)
at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.$anonfun$profile$1(ProfiledManagedCatalog.scala:60)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.profile(ProfiledManagedCatalog.scala:59)
at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.schemaExists(ProfiledManagedCatalog.scala:143)
at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.databaseExists(ManagedCatalogSessionCatalog.scala:625)
at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.requireScExists(ManagedCatalogSessionCatalog.scala:275)
at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.setCurrentDatabase(ManagedCatalogSessionCatalog.scala:486)
at com.databricks.sql.DatabricksCatalogManager.setCurrentNamespace(DatabricksCatalogManager.scala:156)
at org.apache.spark.sql.internal.CatalogImpl.setCurrentDatabase(CatalogImpl.scala:100)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
at java.lang.Thread.run(Thread.java:750)
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-59b1aa40-b629-4cf3-82ae-6f44d5b1b2f6/lib/python3.10/site-packages/ibis/backends/pyspark/__init__.py:240, in Backend._active_database(self, name)
239 self._session.catalog.setCurrentDatabase(name)
--> 240 yield
241 finally:
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-59b1aa40-b629-4cf3-82ae-6f44d5b1b2f6/lib/python3.10/site-packages/ibis/backends/pyspark/__init__.py:507, in Backend.create_table(self, name, obj, schema, database, temp, overwrite, format)
506 df = self._session.sql(query)
--> 507 df.write.saveAsTable(name, format=format, mode=mode)
508 elif schema is not None:
File /databricks/spark/python/pyspark/instrumentation_utils.py:47, in _wrap_function.<locals>.wrapper(*args, **kwargs)
46 try:
---> 47 res = func(*args, **kwargs)
48 logger.log_success(
49 module_name, class_name, function_name, time.perf_counter() - start, signature
50 )
File /databricks/spark/python/pyspark/sql/readwriter.py:1841, in DataFrameWriter.saveAsTable(self, name, format, mode, partitionBy, **options)
1840 self.format(format)
-> 1841 self._jwrite.saveAsTable(name)
File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1355, in JavaMember.__call__(self, *args)
1354 answer = self.gateway_client.send_command(command)
-> 1355 return_value = get_return_value(
1356 answer, self.gateway_client, self.target_id, self.name)
1358 for temp_arg in temp_args:
File /databricks/spark/python/pyspark/errors/exceptions/captured.py:230, in capture_sql_exception.<locals>.deco(*a, **kw)
227 if not isinstance(converted, UnknownException):
228 # Hide where the exception came from that shows a non-Pythonic
229 # JVM exception message.
--> 230 raise converted from None
231 else:
AnalysisException: [RequestId=5899623e-983f-4972-812a-dbdc7706c8a3 ErrorClass=INVALID_PARAMETER_VALUE.MANAGED_TABLE_FORMAT] Only Delta is supported for managed tables

During handling of the above exception, another exception occurred:
Py4JJavaError Traceback (most recent call last)
File , line 7
4 ispark = ibis.pyspark.connect(session = spark)
6 df = ispark.read_parquet("abfss://lmedia_meas_campaign_info/")
----> 7 ispark.create_table(name = "raw_media_meas_campaign_info", obj = df, database="comms_media_dev.dart_extensions", overwrite=True)
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-59b1aa40-b629-4cf3-82ae-6f44d5b1b2f6/lib/python3.10/site-packages/ibis/backends/pyspark/__init__.py:504, in Backend.create_table(self, name, obj, schema, database, temp, overwrite, format)
502 query = self.compile(table)
503 mode = "overwrite" if overwrite else "error"
--> 504 with self._active_catalog(catalog), self._active_database(db):
505 self._run_pre_execute_hooks(table)
506 df = self._session.sql(query)
File /usr/lib/python3.10/contextlib.py:153, in _GeneratorContextManager.__exit__(self, typ, value, traceback)
151 value = typ()
152 try:
--> 153 self.gen.throw(typ, value, traceback)
154 except StopIteration as exc:
155 # Suppress StopIteration unless it's the same exception that
156 # was passed to throw(). This prevents a StopIteration
157 # raised inside the "with" statement from being suppressed.
158 return exc is not value
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-59b1aa40-b629-4cf3-82ae-6f44d5b1b2f6/lib/python3.10/site-packages/ibis/backends/pyspark/__init__.py:242, in Backend._active_database(self, name)
240 yield
241 finally:
--> 242 self._session.catalog.setCurrentDatabase(current)
File /databricks/spark/python/pyspark/instrumentation_utils.py:47, in _wrap_function.<locals>.wrapper(*args, **kwargs)
45 start = time.perf_counter()
46 try:
---> 47 res = func(*args, **kwargs)
48 logger.log_success(
49 module_name, class_name, function_name, time.perf_counter() - start, signature
50 )
51 return res
File /databricks/spark/python/pyspark/sql/catalog.py:193, in Catalog.setCurrentDatabase(self, dbName)
183 def setCurrentDatabase(self, dbName: str) -> None:
184 """
185 Sets the current default database in this session.
186
(...)
191 >>> spark.catalog.setCurrentDatabase("default")
192 """
--> 193 return self._jcatalog.setCurrentDatabase(dbName)
File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1355, in JavaMember.__call__(self, *args)
1349 command = proto.CALL_COMMAND_NAME +\
1350 self.command_header +\
1351 args_command +\
1352 proto.END_COMMAND_PART
1354 answer = self.gateway_client.send_command(command)
-> 1355 return_value = get_return_value(
1356 answer, self.gateway_client, self.target_id, self.name)
1358 for temp_arg in temp_args:
1359 if hasattr(temp_arg, "_detach"):
File /databricks/spark/python/pyspark/errors/exceptions/captured.py:224, in capture_sql_exception.<locals>.deco(*a, **kw)
222 def deco(*a: Any, **kw: Any) -> Any:
223 try:
--> 224 return f(*a, **kw)
225 except Py4JJavaError as e:
226 converted = convert_exception(e.java_exception)
File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(
331 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
332 format(target_id, ".", name, value))
Hey @mark-druffel -- I don't have multiple catalogs set up, so it's very possible I missed something.
That first error you got is because it needs to be a tuple, not a list -- we can almost certainly relax that requirement (and also have a much better error message). So it would be:
ispark.create_table(name = "raw_media_meas_campaign_info", obj = df, database=("comms_media_dev", "dart_extensions"), overwrite=True)
That said, the second way should be equivalent to the tuple way, so something is a bit wrong. I will try to figure out what's going sideways.
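For what it's worth, relaxing the tuple-only requirement (with a better error message) could be a small normalization step up front. A hypothetical sketch -- normalize_database is a made-up helper name, not actual ibis API:

def normalize_database(database):
    # Accept either a "catalog.db" string or a (catalog, db) pair and
    # return a (catalog, db) tuple; catalog is None if not given.
    if isinstance(database, str):
        catalog, sep, db = database.partition(".")
        return (catalog, db) if sep else (None, database)
    if isinstance(database, (tuple, list)) and len(database) == 2:
        catalog, db = database
        return (catalog, db)
    raise ValueError(
        "expected a 'catalog.database' string or a (catalog, database) "
        f"pair, got {database!r}"
    )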
That makes sense. I just tried that, and the error looks the same as the second attempt above. Please let me know if there's anything I can do to help, and thanks so much for your quick response!
ispark.create_table(name = "raw_media_meas_campaign_info", obj = df, database=("comms_media_dev", "dart_extensions"), overwrite=True)
> Py4JJavaError: An error occurred while calling o435.setCurrentDatabase.
> : com.databricks.sql.managedcatalog.acl.UnauthorizedAccessException: PERMISSION_DENIED: User does not have USE SCHEMA on Schema 'comms_media_dev.default'.
> at com.databricks.managedcatalog.UCReliableHttpClient.reliablyAndTranslateExceptions(UCReliableHttpClient.scala:87)
> at com.databricks.managedcatalog.UCReliableHttpClient.get(UCReliableHttpClient.scala:139)
> at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$getSchema$1(ManagedCatalogClientImpl.scala:540)
> at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$recordAndWrapException$2(ManagedCatalogClientImpl.scala:4400)
> at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
> at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$recordAndWrapException$1(ManagedCatalogClientImpl.scala:4399)
> at com.databricks.managedcatalog.ErrorDetailsHandler.wrapServiceException(ErrorDetailsHandler.scala:25)
> at com.databricks.managedcatalog.ErrorDetailsHandler.wrapServiceException$(ErrorDetailsHandler.scala:23)
> at com.databricks.managedcatalog.ManagedCatalogClientImpl.wrapServiceException(ManagedCatalogClientImpl.scala:151)
> at com.databricks.managedcatalog.ManagedCatalogClientImpl.recordAndWrapException(ManagedCatalogClientImpl.scala:4396)
> at com.databricks.managedcatalog.ManagedCatalogClientImpl.getSchema(ManagedCatalogClientImpl.scala:533)
> at com.databricks.sql.managedcatalog.ManagedCatalogCommon.shouldUpdateSchemaMetadata(ManagedCatalogCommon.scala:2199)
> at com.databricks.sql.managedcatalog.ManagedCatalogCommon.getSchemaMetadataInternal(ManagedCatalogCommon.scala:2652)
> at com.databricks.sql.managedcatalog.ManagedCatalogCommon.$anonfun$getSchemaMetadata$3(ManagedCatalogCommon.scala:282)
> at scala.Option.getOrElse(Option.scala:189)
> at com.databricks.sql.managedcatalog.ManagedCatalogCommon.getSchemaMetadata(ManagedCatalogCommon.scala:282)
> at com.databricks.sql.managedcatalog.ManagedCatalogCommon.schemaExists(ManagedCatalogCommon.scala:287)
> at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.$anonfun$schemaExists$1(ProfiledManagedCatalog.scala:143)
> at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
> at org.apache.spark.sql.catalyst.MetricKeyUtils$.measure(MetricKey.scala:672)
> at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.$anonfun$profile$1(ProfiledManagedCatalog.scala:60)
> at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
> at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.profile(ProfiledManagedCatalog.scala:59)
> at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.schemaExists(ProfiledManagedCatalog.scala:143)
> at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.databaseExists(ManagedCatalogSessionCatalog.scala:625)
> at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.requireScExists(ManagedCatalogSessionCatalog.scala:275)
> at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.setCurrentDatabase(ManagedCatalogSessionCatalog.scala:486)
> at com.databricks.sql.DatabricksCatalogManager.setCurrentNamespace(DatabricksCatalogManager.scala:156)
> at org.apache.spark.sql.internal.CatalogImpl.setCurrentDatabase(CatalogImpl.scala:100)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
> at py4j.Gateway.invoke(Gateway.java:306)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
> at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
> at java.lang.Thread.run(Thread.java:750)
> File /local_disk0/.ephemeral_nfs/envs/pythonEnv-59b1aa40-b629-4cf3-82ae-6f44d5b1b2f6/lib/python3.10/site-packages/ibis/backends/pyspark/__init__.py:240, in Backend._active_database(self, name)
> 239 self._session.catalog.setCurrentDatabase(name)
> --> 240 yield
> 241 finally:
> File /local_disk0/.ephemeral_nfs/envs/pythonEnv-59b1aa40-b629-4cf3-82ae-6f44d5b1b2f6/lib/python3.10/site-packages/ibis/backends/pyspark/__init__.py:507, in Backend.create_table(self, name, obj, schema, database, temp, overwrite, format)
> 506 df = self._session.sql(query)
> --> 507 df.write.saveAsTable(name, format=format, mode=mode)
> 508 elif schema is not None:
> File /databricks/spark/python/pyspark/instrumentation_utils.py:47, in _wrap_function.<locals>.wrapper(*args, **kwargs)
> 46 try:
> ---> 47 res = func(*args, **kwargs)
> 48 logger.log_success(
> 49 module_name, class_name, function_name, time.perf_counter() - start, signature
> 50 )
> File /databricks/spark/python/pyspark/sql/readwriter.py:1841, in DataFrameWriter.saveAsTable(self, name, format, mode, partitionBy, **options)
> 1840 self.format(format)
> -> 1841 self._jwrite.saveAsTable(name)
> File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1355, in JavaMember.__call__(self, *args)
> 1354 answer = self.gateway_client.send_command(command)
> -> 1355 return_value = get_return_value(
> 1356 answer, self.gateway_client, self.target_id, self.name)
> 1358 for temp_arg in temp_args:
> File /databricks/spark/python/pyspark/errors/exceptions/captured.py:230, in capture_sql_exception.<locals>.deco(*a, **kw)
> 227 if not isinstance(converted, UnknownException):
> 228 # Hide where the exception came from that shows a non-Pythonic
> 229 # JVM exception message.
> --> 230 raise converted from None
> 231 else:
> AnalysisException: [RequestId=952d4d82-e41a-4892-83ba-d52cbbfce80e ErrorClass=INVALID_PARAMETER_VALUE.MANAGED_TABLE_FORMAT] Only Delta is supported for managed tables
>
> During handling of the above exception, another exception occurred:
> Py4JJavaError Traceback (most recent call last)
> File <command-1657592028371427>, line 7
> 4 ispark = ibis.pyspark.connect(session = spark)
> 6 df = ispark.read_parquet("abfss://media_meas_campaign_info")
> ----> 7 ispark.create_table(name = "raw_media_meas_campaign_info", obj = df, database=("comms_media_dev", "dart_extensions"), overwrite=True)
> File /local_disk0/.ephemeral_nfs/envs/pythonEnv-59b1aa40-b629-4cf3-82ae-6f44d5b1b2f6/lib/python3.10/site-packages/ibis/backends/pyspark/__init__.py:504, in Backend.create_table(self, name, obj, schema, database, temp, overwrite, format)
> 502 query = self.compile(table)
> 503 mode = "overwrite" if overwrite else "error"
> --> 504 with self._active_catalog(catalog), self._active_database(db):
> 505 self._run_pre_execute_hooks(table)
> 506 df = self._session.sql(query)
> File /usr/lib/python3.10/contextlib.py:153, in _GeneratorContextManager.__exit__(self, typ, value, traceback)
> 151 value = typ()
> 152 try:
> --> 153 self.gen.throw(typ, value, traceback)
> 154 except StopIteration as exc:
> 155 # Suppress StopIteration *unless* it's the same exception that
> 156 # was passed to throw(). This prevents a StopIteration
> 157 # raised inside the "with" statement from being suppressed.
> 158 return exc is not value
> File /local_disk0/.ephemeral_nfs/envs/pythonEnv-59b1aa40-b629-4cf3-82ae-6f44d5b1b2f6/lib/python3.10/site-packages/ibis/backends/pyspark/__init__.py:242, in Backend._active_database(self, name)
> 240 yield
> 241 finally:
> --> 242 self._session.catalog.setCurrentDatabase(current)
> File /databricks/spark/python/pyspark/instrumentation_utils.py:47, in _wrap_function.<locals>.wrapper(*args, **kwargs)
> 45 start = time.perf_counter()
> 46 try:
> ---> 47 res = func(*args, **kwargs)
> 48 logger.log_success(
> 49 module_name, class_name, function_name, time.perf_counter() - start, signature
> 50 )
> 51 return res
> File /databricks/spark/python/pyspark/sql/catalog.py:193, in Catalog.setCurrentDatabase(self, dbName)
> 183 def setCurrentDatabase(self, dbName: str) -> None:
> 184 """
> 185 Sets the current default database in this session.
> 186
> (...)
> 191 >>> spark.catalog.setCurrentDatabase("default")
> 192 """
> --> 193 return self._jcatalog.setCurrentDatabase(dbName)
> File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1355, in JavaMember.__call__(self, *args)
> 1349 command = proto.CALL_COMMAND_NAME +\
> 1350 self.command_header +\
> 1351 args_command +\
> 1352 proto.END_COMMAND_PART
> 1354 answer = self.gateway_client.send_command(command)
> -> 1355 return_value = get_return_value(
> 1356 answer, self.gateway_client, self.target_id, self.name)
> 1358 for temp_arg in temp_args:
> 1359 if hasattr(temp_arg, "_detach"):
> File /databricks/spark/python/pyspark/errors/exceptions/captured.py:224, in capture_sql_exception.<locals>.deco(*a, **kw)
> 222 def deco(*a: Any, **kw: Any) -> Any:
> 223 try:
> --> 224 return f(*a, **kw)
> 225 except Py4JJavaError as e:
> 226 converted = convert_exception(e.java_exception)
> File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
> 324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
> 325 if answer[1] == REFERENCE_TYPE:
> --> 326 raise Py4JJavaError(
> 327 "An error occurred while calling {0}{1}{2}.\n".
> 328 format(target_id, ".", name), value)
> 329 else:
> 330 raise Py4JError(
> 331 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
> 332 format(target_id, ".", name, value))
Question that might seem a bit odd, but does the table show up in the appropriate place in spite of the error message?
In the output of something like:
ispark.list_tables(database=("comms_media_dev", "dart_extensions"))
Not sure if this helps at all, but if I set the catalog & db from the spark session and then pass the database parameter, it appears the db switches back to default.
^ Sorry, disregard -- I didn't see your last comment pop through. Yeah, spark not allowing both at the same time is really annoying imho.
Sorry for the delay, databricks takes forever to start... Now it says the schema can't be found, but I provided an obj parameter 🤔 I also added format='delta' because this time I got an error without it saying managed tables must be in delta format. I had forgotten to add it in prior tests, but that's a valid error that I wasn't getting before.
import ibis
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
ispark = ibis.pyspark.connect(session = spark)
df = ispark.read_parquet("abfss://media_meas_campaign_info/")
print(f"Tables: {ispark.list_tables(database = ('comms_media_dev','dart_extensions'))}\n")
print(f"Current Catalog: {ispark._session.catalog.currentCatalog()}\n")
print(f"Current Database: {ispark._session.catalog.currentDatabase()}\n")
ispark.create_table(name = "raw_media_meas_campaign_info", obj = df, database=('comms_media_dev','dart_extensions'), overwrite=True, format = "delta")
print(f"Current Catalog: {ispark._session.catalog.currentCatalog()}\n")
print(f"Current Database: {ispark._session.catalog.currentDatabase()}\n")
Tables: ['ibis_read_parquet_472gdhsajjakrgoq2mzf7ffz7u', 'ibis_read_parquet_73xgg7oaunet5oyv5rmderp7wa', 'ibis_read_parquet_dzbw5jngqngsxpg6ug7u266w2i', 'ibis_read_parquet_g2kop6usdncf3k67qgk4i7igpi', 'ibis_read_parquet_j6q3xnj7uzcg5ecfsmdty6l4xa', 'ibis_read_parquet_wifrr4hijbevvdhhlv5kivn2ey', 'ibis_read_parquet_xzhqoneiorfqhfiqdk7nqmpe4u', 'raw_media_meas_offer_info', 'raw_target_history', 'standardized_media_meas_campaign_info', 'standardized_media_meas_offer_info', 'standardized_target_history']
Current Catalog: hive_metastore
Current Database: default
[[SCHEMA_NOT_FOUND](https://docs.microsoft.com/azure/databricks/error-messages/error-classes#schema_not_found)] The schema `dart_extensions` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a catalog, verify the current_schema() output, or qualify the name with the correct catalog.
To tolerate the error on drop use DROP SCHEMA IF EXISTS. SQLSTATE: 42704
File <command-4437199335976496>, line 10
8 print(f"Current Catalog: {ispark._session.catalog.currentCatalog()}\n")
9 print(f"Current Database: {ispark._session.catalog.currentDatabase()}\n")
---> 10 ispark.create_table(name = "raw_media_meas_campaign_info", obj = df, database=('comms_media_dev','dart_extensions'), overwrite=True, format = "delta")
11 print(f"Current Catalog: {ispark._session.catalog.currentCatalog()}\n")
12 print(f"Current Database: {ispark._session.catalog.currentDatabase()}\n")
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-10d3656f-1fae-4528-917f-49d0869552d4/lib/python3.10/site-packages/ibis/backends/pyspark/__init__.py:532, in Backend.create_table(self, name, obj, schema, database, temp, overwrite, format)
529 else:
530 raise com.IbisError("The schema or obj parameter is required")
--> 532 return self.table(name, database=db)
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-10d3656f-1fae-4528-917f-49d0869552d4/lib/python3.10/site-packages/ibis/backends/sql/__init__.py:137, in SQLBackend.table(self, name, schema, database)
134 catalog = table_loc.catalog or None
135 database = table_loc.db or None
--> 137 table_schema = self.get_schema(name, catalog=catalog, database=database)
138 return ops.DatabaseTable(
139 name,
140 schema=table_schema,
141 source=self,
142 namespace=ops.Namespace(catalog=catalog, database=database),
143 ).to_expr()
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-10d3656f-1fae-4528-917f-49d0869552d4/lib/python3.10/site-packages/ibis/backends/pyspark/__init__.py:459, in Backend.get_schema(self, table_name, catalog, database)
457 table_loc = self._to_sqlglot_table((catalog, database))
458 catalog, db = self._to_catalog_db_tuple(table_loc)
--> 459 with self._active_catalog_database(catalog, db):
460 df = self._session.table(table_name)
461 struct = PySparkType.to_ibis(df.schema)
File /usr/lib/python3.10/contextlib.py:135, in _GeneratorContextManager.__enter__(self)
133 del self.args, self.kwds, self.func
134 try:
--> 135 return next(self.gen)
136 except StopIteration:
137 raise RuntimeError("generator didn't yield") from None
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-10d3656f-1fae-4528-917f-49d0869552d4/lib/python3.10/site-packages/ibis/backends/pyspark/__init__.py:254, in Backend._active_catalog_database(self, catalog, db)
252 if not PYSPARK_LT_34 and catalog is not None:
253 self._session.catalog.setCurrentCatalog(catalog)
--> 254 self._session.catalog.setCurrentDatabase(db)
255 yield
256 finally:
File /databricks/spark/python/pyspark/instrumentation_utils.py:47, in _wrap_function.<locals>.wrapper(*args, **kwargs)
45 start = time.perf_counter()
46 try:
---> 47 res = func(*args, **kwargs)
48 logger.log_success(
49 module_name, class_name, function_name, time.perf_counter() - start, signature
50 )
51 return res
File /databricks/spark/python/pyspark/sql/catalog.py:193, in Catalog.setCurrentDatabase(self, dbName)
183 def setCurrentDatabase(self, dbName: str) -> None:
184 """
185 Sets the current default database in this session.
186
(...)
191 >>> spark.catalog.setCurrentDatabase("default")
192 """
--> 193 return self._jcatalog.setCurrentDatabase(dbName)
File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1355, in JavaMember.__call__(self, *args)
1349 command = proto.CALL_COMMAND_NAME +\
1350 self.command_header +\
1351 args_command +\
1352 proto.END_COMMAND_PART
1354 answer = self.gateway_client.send_command(command)
-> 1355 return_value = get_return_value(
1356 answer, self.gateway_client, self.target_id, self.name)
1358 for temp_arg in temp_args:
1359 if hasattr(temp_arg, "_detach"):
File /databricks/spark/python/pyspark/errors/exceptions/captured.py:230, in capture_sql_exception.<locals>.deco(*a, **kw)
226 converted = convert_exception(e.java_exception)
227 if not isinstance(converted, UnknownException):
228 # Hide where the exception came from that shows a non-Pythonic
229 # JVM exception message.
--> 230 raise converted from None
231 else:
232 raise