
isabella232 / kirby-schema-generator


This project forked from bonniernews/kirby-schema-generator


Generates the BigQuery schema from newline-delimited JSON or CSV data records.

License: Apache License 2.0

Python 99.79% Makefile 0.21%


Kirby Schema Generator

Generates schema files that describe the data ingested by Kirby.

Usage

A sample data file is required. The tool analyzes it and uses the result to generate the final schema.
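The README does not document how the sample is analyzed. As a rough illustration only, the sketch below shows one way per-column type inference over sample records could work: each value is mapped to a BigQuery-style type, and conflicting types for the same column are widened. The function names, the type rules, and the `NULLABLE` mode are assumptions, not this tool's actual logic.

```python
import re
from typing import Any, Dict, List

# Hypothetical rules for illustration; the real generator's rules may differ.
INT_RE = re.compile(r"^-?\d+$")
FLOAT_RE = re.compile(r"^-?\d+\.\d+$")
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def infer_type(value: Any) -> str:
    """Guess a BigQuery-style type name for one sample value."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    text = str(value)
    # CSV values arrive as strings, so also try to parse numbers and dates.
    if INT_RE.match(text):
        return "INTEGER"
    if FLOAT_RE.match(text):
        return "FLOAT"
    if DATE_RE.match(text):
        return "DATE"
    return "STRING"


def merge_types(old: str, new: str) -> str:
    """Combine two types seen for the same column."""
    if old == new:
        return old
    if {old, new} == {"INTEGER", "FLOAT"}:
        return "FLOAT"  # widen integers to floats
    return "STRING"  # any other conflict falls back to STRING


def infer_schema(records: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Infer one field entry per column across all sample records."""
    types: Dict[str, str] = {}
    for record in records:
        for name, value in record.items():
            if value is None:
                continue  # nulls carry no type information
            seen = infer_type(value)
            types[name] = merge_types(types[name], seen) if name in types else seen
    # Columns that were always null are omitted in this simplified sketch.
    return [{"name": n, "type": t, "mode": "NULLABLE"} for n, t in types.items()]
```

Looking at every record, rather than only the first, is what lets a column that holds both `1000` and `1.5` end up as `FLOAT` instead of failing on later rows.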

$ python3 generate_schema.py <sample_data_file> --encryption_key_id <key_column_name> --personal_columns <column_names> 

sample_data_file: Path to the sample file to be analyzed.

encryption_key_id: If the dataset is to be encrypted, the column whose values are used as encryption keys, usually the user ID.

personal_columns: One or more names of columns that contain personal information.

The generated schema is written to stdout. To save it to a file:

$ python3 generate_schema.py <sample_data_file> [options] > schema.json
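To run the generator from a Python script (for example in a build step), the documented flags can be assembled into an argument list and stdout redirected to a file. `build_command` and `generate_schema_to_file` are hypothetical helpers; they assume `generate_schema.py` is in the current working directory and use `sys.executable` in place of `python3`.

```python
import subprocess
import sys
from typing import List, Optional


def build_command(sample_file: str,
                  encryption_key_id: Optional[str] = None,
                  personal_columns: Optional[List[str]] = None,
                  input_format: Optional[str] = None) -> List[str]:
    """Assemble the argument list for generate_schema.py (hypothetical helper)."""
    cmd = [sys.executable, "generate_schema.py", sample_file]
    if input_format:
        cmd += ["--input_format", input_format]
    if encryption_key_id:
        cmd += ["--encryption_key_id", encryption_key_id]
    if personal_columns:
        # The README passes several column names after a single flag.
        cmd += ["--personal_columns", *personal_columns]
    return cmd


def generate_schema_to_file(cmd: List[str], out_path: str) -> None:
    """Run the generator and write its stdout to out_path, like `> schema.json`."""
    with open(out_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)  # raises on non-zero exit
```

For example, `generate_schema_to_file(build_command("users.json", "user_id", ["name", "email"], input_format="json"), "schema.json")` mirrors the JSON example command with its output redirected to `schema.json`. Passing a list instead of a shell string avoids quoting problems with column names.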

For a description of all options, run:

$ python3 generate_schema.py --help

Schema column description

Column descriptions have to be added to the schema manually after it has been generated.
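Descriptions can also be added with a small script instead of editing the file by hand. This sketch assumes `schema.json` is a JSON array of field objects with a `name` key, as in BigQuery's JSON schema format; the format Kirby expects may differ, and `add_descriptions` and `annotate_file` are hypothetical names.

```python
import json
from typing import Dict, List


def add_descriptions(schema: List[Dict], descriptions: Dict[str, str]) -> List[str]:
    """Set 'description' on matching fields; return names that matched no field."""
    names = {field.get("name") for field in schema}
    for field in schema:
        if field.get("name") in descriptions:
            field["description"] = descriptions[field["name"]]
    return [name for name in descriptions if name not in names]


def annotate_file(path: str, descriptions: Dict[str, str]) -> List[str]:
    """Load a schema file, add descriptions in place, and write it back."""
    with open(path) as f:
        schema = json.load(f)
    unknown = add_descriptions(schema, descriptions)
    with open(path, "w") as f:
        json.dump(schema, f, indent=2)
        f.write("\n")
    return unknown
```

Returning the unmatched names makes a misspelled column in the descriptions mapping visible instead of silently ignored.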

Examples

From a CSV sample file

users.csv:

user_id,name,email,subscription_start,subscription_end
1000,Kirby Kirbysson,[email protected],2019-01-02,2021-02-01
1001,Luigi Plumberson,[email protected],2019-03-01,2019-04-01

$ python3 generate_schema.py users.csv --encryption_key_id user_id --personal_columns name email
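The names passed to `--encryption_key_id` and `--personal_columns` presumably need to match column names in the sample file, so a quick pre-check can catch typos before running the generator. `missing_columns` is a hypothetical helper built on Python's standard `csv` module; `skipinitialspace=True` lets it tolerate headers written as `user_id, name, email`.

```python
import csv
from typing import Iterable, List


def missing_columns(lines: Iterable[str], wanted: Iterable[str]) -> List[str]:
    """Return the names in `wanted` that are absent from the CSV header row."""
    # skipinitialspace=True strips the blank after each comma in the header.
    reader = csv.reader(lines, skipinitialspace=True)
    header = next(reader, [])
    return [name for name in wanted if name not in header]
```

Usage with the sample file: `with open("users.csv", newline="") as f: print(missing_columns(f, ["user_id", "name", "email"]))` prints an empty list when every column is present.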

From a JSON sample file

users.json:

{"user_id": 1000, "name": "Kirby Kirbysson", "email": "[email protected]", "subscription_start": "2019-01-02", "subscription_end": "2021-02-01"}
{"user_id": 1001, "name": "Luigi Plumberson", "email": "[email protected]", "subscription_start": "2019-03-01", "subscription_end": "2019-04-01"}

$ python3 generate_schema.py users.json --input_format json --encryption_key_id user_id --personal_columns name email
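Newline-delimited JSON puts each record on a single line, so a reader that parses the file line by line cannot handle pretty-printed objects. A minimal sketch for producing such a sample file from Python records, where `write_ndjson` is a hypothetical helper and the email field is left out of the demo data:

```python
import io
import json
from typing import Any, Dict, Iterable, TextIO


def write_ndjson(records: Iterable[Dict[str, Any]], out: TextIO) -> None:
    """Write each record as one compact JSON object per line."""
    for record in records:
        out.write(json.dumps(record) + "\n")


# Example: records shaped like users.json, written to an in-memory buffer.
buf = io.StringIO()
write_ndjson([
    {"user_id": 1000, "name": "Kirby Kirbysson",
     "subscription_start": "2019-01-02", "subscription_end": "2021-02-01"},
    {"user_id": 1001, "name": "Luigi Plumberson",
     "subscription_start": "2019-03-01", "subscription_end": "2019-04-01"},
], buf)
print(buf.getvalue(), end="")
```

To write a real sample file, pass an open file instead of the buffer, e.g. `open("users.json", "w")`.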

TODO

  • Add support for nested JSON properties

Contributors

bxparks, korotkevics, jtschichold, abroglesc, jonwarghed, sofiebn, de-code, frans-k, falderin-bn, janfang-bn, jolin1337, riccardomc, ziggerzz
