Comments (5)
Can you please elaborate on what your requirements are?
from spark-bigquery-connector.
We are diagnosing variable job runtime and are looking at a Spark job that does a large read from BigQuery. It is difficult to tell how long the BigQuery read takes in isolation, because a stage containing the read might also include something like a broadcast join, so the plan view in the Spark History UI doesn't always represent just the BQ portion.
Correction on the connector version, it is 0.30.
The Spark driver logs this by default and I was looking for some other options.
24/03/27 03:31:44 INFO DirectBigQueryRelation: |Querying table xyz.123, parameters sent from Spark:|requiredColumns=[<column>],|filters=[]
24/03/27 03:31:46 INFO ReadSessionCreator: Read session:{"readSessionName":"projects/xyz","readSessionCreationStartTime":"2024-03-27T03:31:44.470062Z","readSessionCreationEndTime":"2024-03-27T03:31:46.047985Z","readSessionPrepDuration":740,"readSessionCreationDuration":837,"readSessionDuration":1577}
24/03/27 03:31:46 INFO ReadSessionCreator: Requested 20000 max partitions, but only received 2 from the BigQuery Storage API for session xyz.123. Notice that the number of streams in actual may be lower than the requested number, depending on the amount parallelism that is reasonable for the table and the maximum amount of parallelism allowed by the system.
24/03/27 03:31:46 INFO BigQueryRDDFactory: Created read session for table 'xyz.123': xyz.123
I don't think readSessionDuration represents the actual time in BQ retrieval. Looks like there has been a lot of work around this recently.
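To make the point about readSessionDuration concrete, here is a minimal sketch that parses the ReadSessionCreator line from the driver log above and compares its fields. The helper names (`parse_read_session`, `summarize`) are hypothetical and not part of the connector, and reading the duration fields as milliseconds is an inference from the timestamps, not something the log states.

```python
import json
import re
from datetime import datetime

# Sample line copied from the driver log quoted in this thread.
LOG_LINE = (
    '24/03/27 03:31:46 INFO ReadSessionCreator: Read session:'
    '{"readSessionName":"projects/xyz",'
    '"readSessionCreationStartTime":"2024-03-27T03:31:44.470062Z",'
    '"readSessionCreationEndTime":"2024-03-27T03:31:46.047985Z",'
    '"readSessionPrepDuration":740,'
    '"readSessionCreationDuration":837,'
    '"readSessionDuration":1577}'
)

# Captures the JSON payload that follows "Read session:".
SESSION_RE = re.compile(r"ReadSessionCreator: Read session:\s*(\{.*\})")


def parse_read_session(line):
    """Return the read-session JSON as a dict, or None if the line has none."""
    match = SESSION_RE.search(line)
    if match is None:
        return None
    return json.loads(match.group(1))


def parse_ts(value):
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize(session):
    """Compare the logged durations with the logged start/end timestamps."""
    start = parse_ts(session["readSessionCreationStartTime"])
    end = parse_ts(session["readSessionCreationEndTime"])
    return {
        "wall_clock_ms": (end - start).total_seconds() * 1000,
        "prep_plus_creation_ms": (
            session["readSessionPrepDuration"]
            + session["readSessionCreationDuration"]
        ),
        "reported_ms": session["readSessionDuration"],
    }


if __name__ == "__main__":
    print(summarize(parse_read_session(LOG_LINE)))
```

For this log line, the prep and creation durations add up to readSessionDuration, and that total matches the gap between the creation start and end timestamps. That is consistent with the field covering only read session setup, not the time the executors later spend pulling rows from the streams.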
Are you using filters? Can you please upgrade to version 0.37.0? Also, switching to the latest flavor of the connector (spark-3.x-bigquery) may help.
Some queries use filters, perhaps most?
A BQ upgrade is on the horizon; I think there is a breaking decimal change sometime after 0.30 that I haven't looked closely at yet.
Just to be clear, there's nothing on 0.30 that I can change to DEBUG in a logging config for more activity timing?
Looks pretty good if we can get there.