Comments (14)
The number of rows in the table currently seems to be 51384 (both using sql as
well as from API). Could you please provide a bit more information on the API
call that you are performing that shows 57672 rows?
Original comment by [email protected]
on 13 Apr 2016 at 5:00
from google-bigquery.
> The number of rows in the table currently seems to be 51384 (both using sql
> as well as from API)
My colleague retried the load and removed the old table, because correct data
was required for analysis. That is probably why the row count differs now.
I have asked him to copy and keep the bad table next time (we hit this problem
about 3 times per day).
> Could you please provide a bit more information on the API call that you are
> performing that shows 57672 rows?
The body of the load job API request looks like this:
https://github.com/embulk/embulk-output-bigquery/blob/7698ef320a78dd93ec79a512d518daec8865d3bc/lib/embulk/output/bigquery/bigquery_client.rb#L152-L170
    # Load job configuration: append newline-delimited JSON to the target table.
    body = {
      configuration: {
        load: {
          destination_table: {
            project_id: @project,
            dataset_id: @dataset,
            table_id: table,
          },
          schema: {
            fields: fields,
          },
          write_disposition: 'WRITE_APPEND',
          source_format: 'NEWLINE_DELIMITED_JSON',
          max_bad_records: 0,
          field_delimiter: nil,
          encoding: 'UTF-8',
          ignore_unknown_values: false,
          allow_quoted_newlines: false,
        }
      }
    }
    # Upload options: the local file at `path` is sent as the job's payload.
    opts = {
      upload_source: path,
      content_type: "application/octet-stream",
    }
We issue these load jobs in parallel against a single temporary table.
Original comment by [email protected]
on 13 Apr 2016 at 5:58
The get_table API call is simply this:
https://github.com/embulk/embulk-output-bigquery/blob/7698ef320a78dd93ec79a512d518daec8865d3bc/lib/embulk/output/bigquery/bigquery_client.rb#L380
Nothing special.
Original comment by [email protected]
on 13 Apr 2016 at 6:01
I found some information from the history of the table. I will provide an
update with more details 04/13 morning PDT.
Original comment by [email protected]
on 13 Apr 2016 at 6:41
It occurred again. In this case the correct number of rows is 3, but the table
ended up with 6.
Load job IDs:
job_nbtQdZBoT0PzW9cW46NVohP40H0 (only 1 job because number of inputs was small)
Copy job ID:
job_gg2CvFl9fEH7XrMhJ3fZPp7cl4E
Response.statistics for Load Job IDs
[job_nbtQdZBoT0PzW9cW46NVohP40H0]
response.statistics:{:creation_time=>"1460535126422",
:start_time=>"1460535144895", :load=>{:output_bytes=>"838", :output_rows=>"3",
:input_files=>"1", :input_file_bytes=>"412"}, :end_time=>"1460535155398"}
Report from embulk-output-bigquery:
{"num_input_rows":3,"num_response_rows":3,"num_output_rows":6,"num_rejected_rows":-3}
Original comment by [email protected]
on 13 Apr 2016 at 8:41
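The negative `num_rejected_rows` in the report above is what gives the duplication away. A minimal sketch of that arithmetic follows, assuming `num_rejected_rows = num_input_rows - num_output_rows`; this relationship is inferred from the reported numbers, not taken from the embulk-output-bigquery source, and the helper names `load_report` and `duplicated_rows` are hypothetical.

```ruby
# Assumed relationship, inferred from the report in this thread:
#   num_rejected_rows = num_input_rows - num_output_rows
# A negative result means the table holds more rows than were sent.
def load_report(num_input_rows:, num_response_rows:, num_output_rows:)
  {
    num_input_rows: num_input_rows,
    num_response_rows: num_response_rows,
    num_output_rows: num_output_rows,
    num_rejected_rows: num_input_rows - num_output_rows,
  }
end

# Hypothetical check: rows present in the table beyond what the load job
# responses reported are treated as duplicates.
def duplicated_rows(report)
  extra = report[:num_output_rows] - report[:num_response_rows]
  extra.positive? ? extra : 0
end

report = load_report(num_input_rows: 3, num_response_rows: 3, num_output_rows: 6)
puts report[:num_rejected_rows] # prints -3
puts duplicated_rows(report)    # prints 3
```

With the numbers from this thread, the load job itself reported 3 output rows, so the 3 extra rows must have come from something other than that job's own response, which is consistent with a second, unseen submission.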
Original comment by [email protected]
on 13 Apr 2016 at 3:40
- Changed state: Accepted
Looking at the first example (with 9 load jobs):
The job corresponding to loading 6288 rows (job_Quhqjcf1DuBQTYYXhWgfPXKjrek)
was submitted twice (possibly due to a retry?). The two job ids are:
job_EWDq-z0DQ9Ho5FpkeSvbm3_Iz0Q
job_Quhqjcf1DuBQTYYXhWgfPXKjrek
If you would like loads to be idempotent, you can supply a job_id to the load
job.
Original comment by [email protected]
on 13 Apr 2016 at 5:46
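One way to follow the advice about supplying a job_id is to derive it from the load's destination and the file contents, so that a resubmission of the same load reuses the same ID. The sketch below is only an illustration under assumptions: the helper names `deterministic_job_id` and `with_job_reference` are hypothetical, and the `job_reference` key assumes the same snake_case hash style as the request body quoted earlier in this thread (mirroring the `jobReference.jobId` field of the BigQuery jobs resource).

```ruby
require 'digest'
require 'json'

# Hypothetical helper: build a stable job ID from the destination and the
# SHA-256 digest of the file being loaded. The result contains only letters,
# digits and underscores, which BigQuery job IDs allow.
def deterministic_job_id(project:, dataset:, table:, path:, prefix: 'embulk_load')
  file_digest = Digest::SHA256.file(path).hexdigest
  key = JSON.generate([project, dataset, table, file_digest])
  "#{prefix}_#{Digest::SHA256.hexdigest(key)}"
end

# Hypothetical helper: return a copy of the load job body with the job ID
# attached. The original hash is not modified.
def with_job_reference(body, project:, job_id:)
  body.merge(job_reference: { project_id: project, job_id: job_id })
end
```

A caveat on this design: with `WRITE_APPEND`, loading the same file into the same table twice on purpose would produce the same ID and be rejected as a duplicate, so in practice a per-run token would probably need to be mixed into the key.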
Detailed advice about managing job retry can be found here:
https://cloud.google.com/bigquery/docs/managing_jobs_datasets_projects#managingjobs
Original comment by [email protected]
on 13 Apr 2016 at 5:57
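To illustrate why a fixed job_id makes retries safe, here is a rough sketch of a retry loop that reuses the same ID on every attempt. It does not use the real client library: `TransientError` and `DuplicateJobError` are hypothetical stand-ins for whatever the actual client raises on a timeout and on an already-existing job ID, and the block passed to `submit_with_retry` stands in for the real insert call.

```ruby
# Hypothetical error classes standing in for the real client's errors.
class TransientError < StandardError; end
class DuplicateJobError < StandardError; end

# Submit a job with the SAME job_id on every attempt. If an earlier attempt
# reached the server before the client saw a failure, the retry is rejected
# as a duplicate instead of loading the rows a second time.
def submit_with_retry(job_id, max_attempts: 3)
  attempts = 0
  begin
    attempts += 1
    yield job_id
    :submitted
  rescue DuplicateJobError
    # The job already exists, so an earlier attempt succeeded server-side.
    :already_exists
  rescue TransientError
    retry if attempts < max_attempts
    raise
  end
end
```

In the duplicate case a real caller would presumably go on to fetch the existing job by its ID and wait for it to finish, rather than treating the rejection as a failure.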
Hmm, looking at the DEBUG log of google-api-ruby-client, I could not find
job_EWDq-z0DQ9Ho5FpkeSvbm3_Iz0Q.
But thank you for the information. I will try generating and supplying a
job_id myself.
Original comment by [email protected]
on 13 Apr 2016 at 6:33
Original comment by [email protected]
on 13 Apr 2016 at 7:15
- Changed state: Done