
Comments (4)

buzzware commented on June 5, 2024

I see a similar question in #307, but it seems to have been closed without an answer.


buzzware commented on June 5, 2024

This works with to/fromBuffer, but fails when I use BlockEncoder; I will submit a test case later. It seems I need to override _resolve(), but how do I implement it?
I see a lot of test cases using to/fromBuffer but none using BlockEncoder with logical types.

It would be nice if I could just:

  1. override StringType and add attributes, but only LogicalType supports attributes,
    OR
  2. provide the exact JSON schema I want and have the library write it verbatim, perhaps with a rawSchema flag.
import {Type, types} from "avsc";
const {LogicalType} = types;

class GoogleJson extends LogicalType {

	// Copy the custom attribute into the exported schema.
	_export(attrs) {
		attrs.sqlType = 'JSON';
	}

	// Stringify objects before the underlying string type encodes them.
	_toValue(input) {
		return JSON.stringify(input);
	}

	// Parse the decoded string back into an object.
	_fromValue(input) {
		return JSON.parse(input);
	}
}

const schema = {
	name: 'Thing',
	type: 'record',
	fields: [
		{name: 'amount', type: 'int'},
		{name: 'calc', type: {type: 'string', logicalType: 'google-json'}}
	]
};


const thingAvroType = Type.forSchema(
	//@ts-ignore
	schema,
	{logicalTypes: {'google-json': GoogleJson}}
);


describe('GoogleJson', () => {

	it('buffer', async () => {
		const thing = {
			amount: 32,
			calc: {a: 1, b: 2}
		};
		const buf = thingAvroType.toBuffer(thing);
		const thing2 = thingAvroType.fromBuffer(buf);
		expect(thing2).toMatchObject(thing);
	});
});
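
For reference, here is roughly the BlockEncoder round trip in question (a hypothetical sketch, not the promised test case; it reuses GoogleJson, thingAvroType, and Type from above, and using BlockDecoder's parseHook to re-register the logical type on the read side is my assumption):

import {streams} from "avsc";

it('block encoder round trip (sketch)', (done) => {
	const thing = {amount: 32, calc: {a: 1, b: 2}};
	const encoder = new streams.BlockEncoder(thingAvroType);
	// parseHook rebuilds the reader type with the google-json logical type;
	// without it, calc would likely decode as a plain JSON string.
	const decoder = new streams.BlockDecoder({
		parseHook: (writerSchema) =>
			Type.forSchema(writerSchema, {logicalTypes: {'google-json': GoogleJson}})
	});
	encoder.pipe(decoder);
	decoder.on('data', (thing2) => {
		expect(thing2).toMatchObject(thing);
		done();
	});
	encoder.end(thing);
});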


mtth commented on June 5, 2024

Hi @buzzware. The best option with decorated schemas is typically to keep a copy alongside the generated type and reference it directly when the custom attributes are needed. However, this doesn't work well with BlockEncoders, which currently expect a single schema or type argument. I think it would be worth extending the BlockEncoder API to better support this, for example via an additional schema option. In the meantime, here are a couple of workarounds:

  1. If you don't need any type options when writing records, you can pass the raw schema directly to the BlockEncoder, which will then write it as-is (see the sketch after the snippet below).
  2. If you do need type options, it's a bit trickier but you can use two encoders where the first will only write the header. Something like:
const crypto = require('crypto');
const {pipeline} = require('node:stream/promises');
const {streams: {BlockEncoder}} = require('avsc');

// Writes the file header from `schema` (keeping its custom attributes),
// then streams data blocks encoded with `type`, sharing one sync marker.
async function pipedBlockEncoder(type, schema, writable) {
  const syncMarker = crypto.randomBytes(16);
  // Header-only encoder (note the schema argument)
  const prelude = new BlockEncoder(schema, {writeHeader: true, syncMarker});
  prelude.end();
  await pipeline(prelude, writable, {end: false}); // keep `writable` open for the data encoder
  // Data encoder (we pass in the type here, not the schema)
  const content = new BlockEncoder(type, {writeHeader: false, syncMarker});
  content.pipe(writable);
  return content;
}
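
And a minimal sketch of the first workaround, where the decorated schema goes straight to the encoder (the output path is hypothetical; with no logical type registered, calc values must already be strings):

const fs = require('fs');
const {streams: {BlockEncoder}} = require('avsc');

// The raw schema is passed directly, so custom attributes like
// sqlType are written verbatim into the file header.
const encoder = new BlockEncoder({
  name: 'Thing',
  type: 'record',
  fields: [
    {name: 'amount', type: 'int'},
    {name: 'calc', type: {type: 'string', sqlType: 'JSON'}}
  ]
});
encoder.pipe(fs.createWriteStream('things.avro')); // hypothetical path
encoder.write({amount: 32, calc: JSON.stringify({a: 1, b: 2})});
encoder.end();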


buzzware commented on June 5, 2024

Thanks @mtth, I just got it working for the first time with that, including importing into BigQuery with an auto-generated JSON column.

import {Schema, streams, Type} from "avsc";
import fs = require("fs");
import BlockEncoder = streams.BlockEncoder;
import {randomBytes} from "crypto";
import {finished, pipeline} from "node:stream/promises";

describe('GoogleJson', () => {

	async function pipedBlockEncoder(type, schema, writable) {
		const syncMarker = randomBytes(16);
		// Header-only encoder (note the schema argument)
		const prelude = new BlockEncoder(schema, {writeHeader: true, syncMarker});
		prelude.end();
		await pipeline(prelude, writable, {end: false}); // from node:stream/promises
		// Data encoder (we pass in the type here, not the schema)
		const content = new BlockEncoder(type, {writeHeader: false, syncMarker});
		content.pipe(writable);
		return content;
	}

	it('mtth file example', async () => {
		const thing = {
			amount: 32,
			calc: JSON.stringify({a: 1, b: 2})
		};
		const testFile = '/Users/gary/Downloads/avro_test.avro';
		fs.rmSync(testFile,{force: true});

		const schema: Schema = {
			name: 'Thing',
			type: 'record',
			fields: [
				{name: 'amount', type: 'int'},
				{name: 'calc', type: {type: 'string', sqlType: 'JSON'}}
			]
		};
		const type = Type.forSchema(schema);
		const writable = fs.createWriteStream(testFile, {encoding: 'binary'});
		const encoder = await pipedBlockEncoder(type, schema, writable);
		encoder.write(thing);
		encoder.write(thing);
		encoder.write(thing);
		encoder.end();
		await finished(writable);
		console.log('end');
	});
});
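
For completeness, a sketch of reading the file back with createFileDecoder (my own quick check, not part of the original test; since no logical type is registered, calc comes back as a JSON string):

import {createFileDecoder} from "avsc";

it('reads the file back (sketch)', async () => {
	const things: any[] = [];
	// Same path the test above wrote to.
	const decoder = createFileDecoder('/Users/gary/Downloads/avro_test.avro');
	decoder.on('data', (t) => things.push(t));
	await finished(decoder);
	expect(things).toHaveLength(3);
	expect(JSON.parse(things[0].calc)).toMatchObject({a: 1, b: 2});
});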

