
Comments (4)

buzzware commented on June 5, 2024

I see a similar question in #307, but it seems to have been closed without an answer.


buzzware commented on June 5, 2024

This works with to/fromBuffer, but fails when I use BlockEncoder; I will submit a test case later. It seems I need to override _resolve(), but how do I implement it?
I see a lot of test cases using to/fromBuffer but none using BlockEncoder with logical types.

It would be nice if I could just:

  1. override StringType and add attributes, but only LogicalType supports attributes,
    OR
  2. provide the exact JSON schema I want and have the library write it verbatim, perhaps with a rawSchema flag.
import {Type, types} from "avsc";
const {LogicalType} = types;

class GoogleJson extends LogicalType {

	// Copy the custom attribute into the exported schema.
	_export(attrs) {
		attrs.sqlType = 'JSON';
	}

	// Stringify objects before the underlying string type encodes them.
	_toValue(input) {
		return JSON.stringify(input);
	}

	// Parse the decoded string back into an object.
	_fromValue(input) {
		return JSON.parse(input);
	}
}

const schema = {
	name: 'Thing',
	type: 'record',
	fields: [
		{name: 'amount', type: 'int'},
		{name: 'calc', type: {type: 'string', logicalType: 'google-json'}}
	]
};


const thingAvroType = Type.forSchema(
	//@ts-ignore
	schema,
	{logicalTypes: {'google-json': GoogleJson}}
);


describe('GoogleJson', () => {

	it('buffer', async () => {
		const thing = {
			amount: 32,
			calc: {a: 1, b: 2}
		};
		const buf = thingAvroType.toBuffer(thing);
		const thing2 = thingAvroType.fromBuffer(buf);
		expect(thing2).toMatchObject(thing);
	});
});
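
For reference, here is roughly the BlockEncoder round trip in question (a hypothetical sketch, not the promised test case; it reuses GoogleJson, thingAvroType, and Type from above, and using BlockDecoder's parseHook to re-register the logical type on the read side is my assumption):

import {streams} from "avsc";

it('block encoder round trip (sketch)', (done) => {
	const thing = {amount: 32, calc: {a: 1, b: 2}};
	const encoder = new streams.BlockEncoder(thingAvroType);
	// parseHook rebuilds the reader type with the google-json logical type;
	// without it, calc would likely decode as a plain JSON string.
	const decoder = new streams.BlockDecoder({
		parseHook: (writerSchema) =>
			Type.forSchema(writerSchema, {logicalTypes: {'google-json': GoogleJson}})
	});
	encoder.pipe(decoder);
	decoder.on('data', (thing2) => {
		expect(thing2).toMatchObject(thing);
		done();
	});
	encoder.end(thing);
});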


mtth commented on June 5, 2024

Hi @buzzware. The best option with decorated schemas is typically to keep a copy alongside the generated type and reference it directly when the custom attributes are needed. However, this doesn't work well with BlockEncoders, which currently expect a single schema or type argument. I think it would be worth extending the BlockEncoder API to better support this, for example via an additional schema option. In the meantime, here are a couple of workarounds:

  1. If you don't need any type options when writing records, you can pass the raw schema directly to the BlockEncoder, which will then write it as-is (see the sketch after the snippet below).
  2. If you do need type options, it's a bit trickier but you can use two encoders where the first will only write the header. Something like:
const crypto = require('crypto');
const {pipeline} = require('node:stream/promises');
const {streams: {BlockEncoder}} = require('avsc');

// Writes the file header from `schema` (keeping its custom attributes),
// then streams data blocks encoded with `type`, sharing one sync marker.
async function pipedBlockEncoder(type, schema, writable) {
  const syncMarker = crypto.randomBytes(16);
  // Header-only encoder (note the schema argument)
  const prelude = new BlockEncoder(schema, {writeHeader: true, syncMarker});
  prelude.end();
  await pipeline(prelude, writable, {end: false}); // keep `writable` open for the data encoder
  // Data encoder (we pass in the type here, not the schema)
  const content = new BlockEncoder(type, {writeHeader: false, syncMarker});
  content.pipe(writable);
  return content;
}
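
And a minimal sketch of the first workaround, where the decorated schema goes straight to the encoder (the output path is hypothetical; with no logical type registered, calc values must already be strings):

const fs = require('fs');
const {streams: {BlockEncoder}} = require('avsc');

// The raw schema is passed directly, so custom attributes like
// sqlType are written verbatim into the file header.
const encoder = new BlockEncoder({
  name: 'Thing',
  type: 'record',
  fields: [
    {name: 'amount', type: 'int'},
    {name: 'calc', type: {type: 'string', sqlType: 'JSON'}}
  ]
});
encoder.pipe(fs.createWriteStream('things.avro')); // hypothetical path
encoder.write({amount: 32, calc: JSON.stringify({a: 1, b: 2})});
encoder.end();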


buzzware commented on June 5, 2024

Thanks @mtth, I just got it working for the first time with that, including importing into BigQuery with an auto-generated JSON column.

import {Schema, streams, Type} from "avsc";
import fs = require("fs");
import BlockEncoder = streams.BlockEncoder;
import {randomBytes} from "crypto";
import {finished, pipeline} from "node:stream/promises";

describe('GoogleJson', () => {

	async function pipedBlockEncoder(type, schema, writable) {
		const syncMarker = randomBytes(16);
		// Header-only encoder (note the schema argument)
		const prelude = new BlockEncoder(schema, {writeHeader: true, syncMarker});
		prelude.end();
		await pipeline(prelude, writable, {end: false}); // from node:stream/promises
		// Data encoder (we pass in the type here, not the schema)
		const content = new BlockEncoder(type, {writeHeader: false, syncMarker});
		content.pipe(writable);
		return content;
	}

	it('mtth file example', async () => {
		const thing = {
			amount: 32,
			calc: JSON.stringify({a: 1, b: 2})
		};
		const testFile = '/Users/gary/Downloads/avro_test.avro';
		fs.rmSync(testFile,{force: true});

		const schema: Schema = {
			name: 'Thing',
			type: 'record',
			fields: [
				{name: 'amount', type: 'int'},
				{name: 'calc', type: {type: 'string', sqlType: 'JSON'}}
			]
		};
		const type = Type.forSchema(schema);
		const writable = fs.createWriteStream(testFile, {encoding: 'binary'});
		const encoder = await pipedBlockEncoder(type, schema, writable);
		encoder.write(thing);
		encoder.write(thing);
		encoder.write(thing);
		encoder.end();
		await finished(writable);
		console.log('end');
	});
});
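
For completeness, a sketch of reading the file back with createFileDecoder (my own quick check, not part of the original test; since no logical type is registered, calc comes back as a JSON string):

import {createFileDecoder} from "avsc";

it('reads the file back (sketch)', async () => {
	const things: any[] = [];
	// Same path the test above wrote to.
	const decoder = createFileDecoder('/Users/gary/Downloads/avro_test.avro');
	decoder.on('data', (t) => things.push(t));
	await finished(decoder);
	expect(things).toHaveLength(3);
	expect(JSON.parse(things[0].calc)).toMatchObject({a: 1, b: 2});
});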

