This simple Maven project is a utility for building an Avro schema from a Google BigQuery table and for writing any JSON data string into a GenericRecord (Avro format). It is also useful for translating a TableRow into a GenericRecord when you need to read from BigQuery and write in Avro format.
Google Cloud Platform provides no such utility out of the box, so this project can help when you need one.
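To illustrate the core idea (JSON string to GenericRecord), here is a minimal sketch using only stock Avro APIs; the schema and field names are made up for the example, and note that Avro's built-in JSON decoder is stricter about union encoding than this project's JsonGenericRecordReader:

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;

public class JsonToAvro {
    public static void main(String[] args) throws Exception {
        // Hypothetical schema standing in for one derived from a BigQuery table
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Row\",\"fields\":["
          + "{\"name\":\"id\",\"type\":\"long\"},"
          + "{\"name\":\"name\",\"type\":\"string\"}]}");
        String json = "{\"id\": 1, \"name\": \"alice\"}";
        // Decode the JSON string into a GenericRecord matching the schema
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
        GenericRecord record = reader.read(null,
            DecoderFactory.get().jsonDecoder(schema, json));
        System.out.println(record.get("id")); // prints 1
    }
}
```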
version: DRAFT
In a transformation process:
public void processElement(ProcessContext c) throws IOException {
    final TableRow row = c.element();
    final GenericRecord record = new JsonGenericRecordReader().read(row, schema);
    c.output(record);
}
Table schema conversion:
private final BigQueryToAvroSchema converter = BigQuerySchemaConverter.getInstance();

private Schema getAvroSchema(Table table) {
    return converter.toAvroSchema(table);
}
Avro schema conversion:
public TableSchema getTableSchema(Schema avroSchema) {
    final TableSchema tableSchema = AvroSchemaConverter.toTableSchema(avroSchema, "firstRecord", "secondField", "internalField");
    return tableSchema;
}
TableRow transformation from Avro record:
public TableRow getTableRow(GenericRecord message, TableSchema tableSchema) {
    final TableRow tableRow = AvroUtils.convertGenericRecordToTableRow(message, tableSchema);
    return tableRow;
}
Since this has only been tested on some internal Google BigQuery tables, any review, bug report, or feedback is welcome; please contribute by writing here.