einride / protobuf-bigquery-go

Seamlessly save and load protocol buffers to and from BigQuery using Go.

Home Page: https://pkg.go.dev/go.einride.tech/protobuf-bigquery

License: MIT License

Protobuf + BigQuery + Go

This library provides add-ons to cloud.google.com/bigquery for first-class protobuf support using protobuf reflection.

Installing

$ go get -u go.einride.tech/protobuf-bigquery

Examples

protobq.InferSchema

BigQuery schema inference for arbitrary protobuf messages.

func ExampleInferSchema() {
	msg := &library.Book{}
	schema := protobq.InferSchema(msg)
	expected := bigquery.Schema{
		{Name: "name", Type: bigquery.StringFieldType},
		{Name: "author", Type: bigquery.StringFieldType},
		{Name: "title", Type: bigquery.StringFieldType},
		{Name: "read", Type: bigquery.BooleanFieldType},
	}
	fmt.Println(cmp.Equal(expected, schema))
	// Output: true
}

protobq.MessageSaver

An implementation of bigquery.ValueSaver that saves arbitrary protobuf messages to BigQuery.

func ExampleMessageSaver() {
	ctx := context.Background()
	// Write protobuf messages to a BigQuery table.
	projectID := flag.String("project", "", "BigQuery project to write to.")
	datasetID := flag.String("dataset", "", "BigQuery dataset to write to.")
	tableID := flag.String("table", "", "BigQuery table to write to.")
	create := flag.Bool("create", false, "Flag indicating whether to create the table.")
	flag.Parse()
	// Connect to BigQuery.
	client, err := bigquery.NewClient(ctx, *projectID)
	if err != nil {
		panic(err) // TODO: Handle error.
	}
	table := client.Dataset(*datasetID).Table(*tableID)
	// Create the table by inferring the BigQuery schema from the protobuf schema.
	if *create {
		if err := table.Create(ctx, &bigquery.TableMetadata{
			Schema: protobq.InferSchema(&publicv1.FilmLocation{}),
		}); err != nil {
			panic(err) // TODO: Handle error.
		}
	}
	// Insert the protobuf messages.
	inserter := table.Inserter()
	for i, filmLocation := range []*publicv1.FilmLocation{
		{Title: "Dark Passage", ReleaseYear: 1947, Locations: "Filbert Steps"},
		{Title: "D.O.A", ReleaseYear: 1950, Locations: "Union Square"},
		{Title: "Flower Drum Song", ReleaseYear: 1961, Locations: "Chinatown"},
	} {
		if err := inserter.Put(ctx, &protobq.MessageSaver{
			Message:  filmLocation,
			InsertID: strconv.Itoa(i), // include an optional insert ID
		}); err != nil {
			panic(err) // TODO: Handle error.
		}
	}
}

protobq.MessageLoader

An implementation of bigquery.ValueLoader that loads arbitrary protobuf messages from BigQuery.

func ExampleMessageLoader() {
	ctx := context.Background()
	// Read from the public "film locations" BigQuery dataset into a proto message.
	const (
		project = "bigquery-public-data"
		dataset = "san_francisco_film_locations"
		table   = "film_locations"
	)
	// Connect to BigQuery.
	client, err := bigquery.NewClient(ctx, project)
	if err != nil {
		panic(err) // TODO: Handle error.
	}
	// Load BigQuery rows into a FilmLocation message.
	messageLoader := &protobq.MessageLoader{
		Message: &publicv1.FilmLocation{},
	}
	// Iterate rows in table.
	rowIterator := client.Dataset(dataset).Table(table).Read(ctx)
	for {
		// Load next row into the FilmLocation message.
		if err := rowIterator.Next(messageLoader); err != nil {
			if errors.Is(err, iterator.Done) {
				break
			}
			panic(err) // TODO: Handle error.
		}
		// Print the message.
		fmt.Println(prototext.Format(messageLoader.Message))
	}
}

Features

Support for Well-Known Types (google.protobuf)

Protobuf                       BigQuery
google.protobuf.Timestamp      TIMESTAMP
google.protobuf.Duration       FLOAT (seconds)
google.protobuf.DoubleValue    FLOAT
google.protobuf.FloatValue     FLOAT
google.protobuf.Int32Value     INTEGER
google.protobuf.Int64Value     INTEGER
google.protobuf.Uint32Value    INTEGER
google.protobuf.Uint64Value    INTEGER
google.protobuf.BoolValue      BOOLEAN
google.protobuf.StringValue    STRING
google.protobuf.BytesValue     BYTES
google.protobuf.Struct         STRING (JSON)
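
To illustrate what the FLOAT (seconds) mapping means for google.protobuf.Duration, here is a minimal standard-library sketch. It assumes the duration is available as the message's two fields, seconds and nanos. The durationSeconds helper is hypothetical and is not part of protobq, whose internal conversion may differ.

```go
package main

import (
	"fmt"
	"time"
)

// durationSeconds is a hypothetical helper (not part of protobq). It combines
// the seconds and nanos fields of a google.protobuf.Duration into a single
// floating-point number of seconds, the shape a FLOAT column would hold.
func durationSeconds(seconds int64, nanos int32) float64 {
	d := time.Duration(seconds)*time.Second + time.Duration(nanos)*time.Nanosecond
	return d.Seconds()
}

func main() {
	// One second plus 500 million nanoseconds, i.e. one and a half seconds.
	fmt.Println(durationSeconds(1, 500_000_000))
}
```

Storing a duration as a float trades nanosecond exactness for a column that can be summed and averaged directly in SQL.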

Support for API Common Protos (google.type)

Protobuf                 BigQuery
google.type.Date         DATE
google.type.DateTime     RECORD (or DATETIME)
google.type.LatLng       GEOGRAPHY
google.type.TimeOfDay    TIME
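
As an illustration of the LatLng to GEOGRAPHY mapping, the sketch below renders a latitude/longitude pair as a WKT point, one of the text forms BigQuery accepts for GEOGRAPHY values. The wktPoint helper is hypothetical and is not part of protobq; how the library encodes the value internally is not shown in this README.

```go
package main

import (
	"fmt"
	"strconv"
)

// wktPoint is a hypothetical helper (not part of protobq). It renders a
// latitude/longitude pair as a WKT point. Note that WKT puts longitude
// first, the opposite of the field order in google.type.LatLng.
func wktPoint(latitude, longitude float64) string {
	lng := strconv.FormatFloat(longitude, 'f', -1, 64)
	lat := strconv.FormatFloat(latitude, 'f', -1, 64)
	return "POINT(" + lng + " " + lat + ")"
}

func main() {
	// Example coordinates chosen for illustration only.
	fmt.Println(wktPoint(37.7749, -122.4194))
}
```

The swapped coordinate order is the most common mistake when producing GEOGRAPHY values by hand, which is why the helper takes named parameters.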

protobuf-bigquery-go's People

Contributors

alfredgunnar, dependabot[bot], m4mm4r, niklaskb, odsod, sofiathoren, stinakas, tiopramayudi

protobuf-bigquery-go's Issues

Question: How does this work with the BigQuery managed writer?

We recently migrated to the BigQuery Storage Write API (https://cloud.google.com/bigquery/docs/write-api-batch#go) and are wondering how this package works with the new API. Schema inference works perfectly, but it conflicts with the proto descriptor that is now required when writing to pending streams:

        // We need to communicate the descriptor of the protocol buffer message we're using, which
        // is analogous to the "schema" for the message.  Both SampleData and SampleStruct are
        // two distinct messages in the compiled proto file, so we'll use adapt.NormalizeDescriptor
        // to unify them into a single self-contained descriptor representation.
        m := &exampleproto.SampleData{}
        descriptorProto, err := adapt.NormalizeDescriptor(m.ProtoReflect().Descriptor())
        if err != nil {
                return fmt.Errorf("NormalizeDescriptor: %w", err)
        }

        // Instantiate a ManagedStream, which manages low level details like connection state and provides
        // additional features like a future-like callback for appends, etc.  NewManagedStream can also create
        // the stream on your behalf, but in this example we're being explicit about stream creation.
        managedStream, err := client.NewManagedStream(ctx, managedwriter.WithStreamName(pendingStream.GetName()),
                managedwriter.WithSchemaDescriptor(descriptorProto))
        if err != nil {
                return fmt.Errorf("NewManagedStream: %w", err)
        }
        defer managedStream.Close()

OneOf fields skipped as "synthetic" but still marshalled by the Marshaller

We recently upgraded from v0.25.0 to a newer version and now see issues with newly created tables.
We use options.InferSchema(msg) to create and update schemas for our tables.
We use options.Marshal(msg) to marshal the messages.

With v0.26.0+, some rows cannot be ingested, and BigQuery rejects them with an error:

some.field._some_one_of: no such field: _some_one_of."; Reason: "invalid"

We use these Options:

protobq.SchemaOptions{
	UseEnumNumbers:           false,
	UseOneofFields:           true,
	UseDateTimeWithoutOffset: true,
}

When calling options.Marshal(msg), the nested map[string]bigquery.Value will contain the _some_one_of field.
When calling options.InferSchema(msg) the nested bigquery.Schema will not contain the _some_one_of field anymore.

When I disable this change:

  // if oneof.IsSynthetic() {
  // 	continue
  // }

everything works as expected.

How do you handle this case?
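
One way to catch this kind of mismatch before BigQuery rejects the row is to compare the marshalled row against the inferred schema. The sketch below is a rough illustration only: the field type is a hypothetical, simplified stand-in for a bigquery.Schema entry, missingFields is not part of protobq, and the row data is made up to mirror the error above.

```go
package main

import (
	"fmt"
	"sort"
)

// field is a hypothetical, simplified stand-in for a BigQuery schema field.
type field struct {
	Name   string
	Nested []field // non-empty for RECORD fields
}

// missingFields returns the dotted paths of all row keys that have no
// matching field in the schema, descending into nested records.
func missingFields(row map[string]any, schema []field, prefix string) []string {
	byName := make(map[string]field, len(schema))
	for _, f := range schema {
		byName[f.Name] = f
	}
	var missing []string
	for key, value := range row {
		path := prefix + key
		f, ok := byName[key]
		if !ok {
			missing = append(missing, path)
			continue
		}
		if nested, ok := value.(map[string]any); ok {
			missing = append(missing, missingFields(nested, f.Nested, path+".")...)
		}
	}
	sort.Strings(missing) // map iteration order is random, so sort for stable output
	return missing
}

func main() {
	// Made-up schema and row: the row carries a key the schema lacks.
	schema := []field{{Name: "some", Nested: []field{
		{Name: "field", Nested: []field{{Name: "value"}}},
	}}}
	row := map[string]any{"some": map[string]any{
		"field": map[string]any{"value": "x", "_some_one_of": "value"},
	}}
	fmt.Println(missingFields(row, schema, ""))
}
```

Running a check like this in a test makes a schema and marshaller disagreement visible at build time rather than as a rejected insert.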

`bool` set to `false` is skipped during marshal

If a bool field in a proto.Message is set to false, it is omitted from the map[string]bigquery.Value produced during marshalling. As a result, the column is null after the row is inserted into BigQuery.

null would be fine if the field used the BoolValue wrapper type, but a plain bool should resolve to false.

I modified the existing test in example_test.go:

func TestMarshal(t *testing.T) {
	msg := &library.Book{
		Name:   "publishers/123/books/456",
		Author: "P.L. Travers",
		Title:  "Mary Poppins",
		Read:   false,
	}
	row, err := protobq.Marshal(msg)
	if err != nil {
		t.Fatal(err)
	}
	expected := map[string]bigquery.Value{
		"name":   "publishers/123/books/456",
		"author": "P.L. Travers",
		"title":  "Mary Poppins",
		"read":   false,
	}

	fmt.Println("Expected:", expected)
	fmt.Println("Got:", row)
	fmt.Println("Equal:", cmp.Equal(expected, row))
}

Output

Expected: map[author:P.L. Travers name:publishers/123/books/456 read:false title:Mary Poppins]
Got: map[author:P.L. Travers name:publishers/123/books/456 title:Mary Poppins]
Equal: false

I can't find any MarshalOptions to control this behaviour.
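
Until such an option exists, a possible workaround is to fill in zero values after marshalling. The sketch below is only an illustration under stated assumptions: the column type is a hypothetical, simplified stand-in for a schema field, withDefaults is not part of protobq, and the type names are plain strings rather than bigquery.FieldType values.

```go
package main

import "fmt"

// column is a hypothetical, simplified stand-in for a BigQuery schema field.
type column struct {
	Name string
	Type string // for example "BOOLEAN", "STRING", "INTEGER", "FLOAT"
}

// withDefaults returns a copy of row in which every listed scalar column
// that is absent from the row is set to the zero value for its type.
func withDefaults(row map[string]any, columns []column) map[string]any {
	out := make(map[string]any, len(columns))
	for k, v := range row {
		out[k] = v
	}
	for _, c := range columns {
		if _, ok := out[c.Name]; ok {
			continue // value already present, leave it untouched
		}
		switch c.Type {
		case "BOOLEAN":
			out[c.Name] = false
		case "STRING":
			out[c.Name] = ""
		case "INTEGER":
			out[c.Name] = int64(0)
		case "FLOAT":
			out[c.Name] = float64(0)
		}
	}
	return out
}

func main() {
	columns := []column{
		{Name: "name", Type: "STRING"},
		{Name: "author", Type: "STRING"},
		{Name: "title", Type: "STRING"},
		{Name: "read", Type: "BOOLEAN"},
	}
	// The row as marshalled in the test above, with "read" missing.
	row := map[string]any{
		"name":   "publishers/123/books/456",
		"author": "P.L. Travers",
		"title":  "Mary Poppins",
	}
	fmt.Println(withDefaults(row, columns))
}
```

A helper like this should only list columns backed by plain scalar fields. Columns backed by wrapper types such as BoolValue should be left out so that an unset wrapper still ends up as null.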
