
responsibleaiml / kedro-auto-catalog


This project forked from waylonwalker/kedro-auto-catalog


Kedro catalog create with default configuration

License: MIT License


Kedro Auto Catalog

A configurable version of the built-in kedro catalog create CLI. Default dataset types can be configured in the project's settings.py, so that generated catalog entries use these types rather than MemoryDataSet.


Installation

pip install kedro-auto-catalog

Configuration

Configure the project defaults by adding this dict to src/<project_name>/settings.py:

AUTO_CATALOG = {
    "directory": "data",
    "subdirs": ["raw", "intermediate", "primary"],
    "layers": ["raw", "intermediate", "primary"],
    "default_extension": "parquet",
    "default_type": "pandas.ParquetDataSet",
}
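To make the role of each key concrete, here is a minimal sketch of how a dataset name without any prefix could be turned into a catalog entry from this dict. The helper default_entry is hypothetical, not part of kedro-auto-catalog; it only reproduces the pattern visible in the example output further down (directory, then name, then default_extension, with default_type as the type).

```python
from typing import Dict

# Hypothetical helper, not kedro-auto-catalog's API: builds the entry for a
# dataset name that carries no subdir/layer prefix.
def default_entry(name: str, config: Dict) -> Dict[str, str]:
    filepath = f"{config['directory']}/{name}.{config['default_extension']}"
    return {"filepath": filepath, "type": config["default_type"]}


AUTO_CATALOG = {
    "directory": "data",
    "subdirs": ["raw", "intermediate", "primary"],
    "layers": ["raw", "intermediate", "primary"],
    "default_extension": "parquet",
    "default_type": "pandas.ParquetDataSet",
}

print(default_entry("y_test", AUTO_CATALOG))
# {'filepath': 'data/y_test.parquet', 'type': 'pandas.ParquetDataSet'}
```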

Usage

To automatically create catalog entries for the __default__ pipeline, run this from the command line:

kedro auto-catalog -p __default__

For a reminder of the available options, use --help:

❯ kedro auto-catalog --help
Usage: kedro auto-catalog [OPTIONS]

  Create Data Catalog YAML configuration with missing datasets.

  Add configurable datasets to Data Catalog YAML configuration file for each
  dataset in a registered pipeline if it is missing from the `DataCatalog`.

  The catalog configuration will be saved to
  `<conf_source>/<env>/catalog/<pipeline_name>.yml` file.

  Configure the project defaults in `src/<project_name>/settings.py` with this
  dict.

Options:
  -e, --env TEXT       Environment to create Data Catalog YAML file in.
                       Defaults to `base`.
  -p, --pipeline TEXT  Name of a pipeline.  [required]
  -h, --help           Show this message and exit.
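The help text describes two steps: find the datasets a pipeline uses that have no entry in the DataCatalog, then write entries for them to <conf_source>/<env>/catalog/<pipeline_name>.yml, with env defaulting to base. The sketch below models both steps with plain Python sets and paths. The function names and the sample dataset names are hypothetical; the real command reads the registered pipeline and the project's catalog through Kedro rather than taking lists of strings.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Hypothetical helpers modelling the --help description; they are not
# functions exported by kedro-auto-catalog or Kedro.
def missing_datasets(pipeline_datasets: Iterable[str],
                     catalog_datasets: Iterable[str]) -> List[str]:
    """Datasets the pipeline uses that have no DataCatalog entry yet."""
    return sorted(set(pipeline_datasets) - set(catalog_datasets))


def output_path(conf_source: str, pipeline_name: str,
                env: str = "base") -> PurePosixPath:
    """Target file: <conf_source>/<env>/catalog/<pipeline_name>.yml."""
    return PurePosixPath(conf_source) / env / "catalog" / f"{pipeline_name}.yml"


print(missing_datasets(["companies", "X_train", "y_train"], ["companies"]))
# ['X_train', 'y_train']
print(output_path("conf", "__default__"))
# conf/base/catalog/__default__.yml
```

Because only missing datasets are selected, rerunning the command should not touch datasets that are already declared in the catalog.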

Example

Using the kedro-spaceflights example, running kedro auto-catalog -p __default__ produces the following catalog in conf/base/catalog/__default__.yml:

X_test:
  filepath: data/X_test.parquet
  type: pandas.ParquetDataSet
X_train:
  filepath: data/X_train.parquet
  type: pandas.ParquetDataSet
y_test:
  filepath: data/y_test.parquet
  type: pandas.ParquetDataSet
y_train:
  filepath: data/y_train.parquet
  type: pandas.ParquetDataSet
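The entries above are sorted by dataset name (uppercase before lowercase) and by key within each entry, which is consistent with PyYAML's yaml.dump, whose sort_keys option defaults to True. Whether the plugin actually uses PyYAML is an assumption. The sketch below is a standard-library stand-in that renders the same flat layout. render_catalog is hypothetical, and it does no YAML quoting or escaping, so it only suits plain string values like these.

```python
from typing import Dict

# Hypothetical stand-in for a YAML dumper: handles only the flat
# {dataset: {key: str}} shape shown above and performs no quoting.
def render_catalog(entries: Dict[str, Dict[str, str]]) -> str:
    lines = []
    for name in sorted(entries):            # code-point order: "X_..." before "y_..."
        lines.append(f"{name}:")
        for key in sorted(entries[name]):   # filepath, layer, type
            lines.append(f"  {key}: {entries[name][key]}")
    return "\n".join(lines) + "\n"


catalog = {
    "y_train": {"type": "pandas.ParquetDataSet", "filepath": "data/y_train.parquet"},
    "X_test": {"type": "pandas.ParquetDataSet", "filepath": "data/X_test.parquet"},
}
print(render_catalog(catalog), end="")
```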

subdirs and layers

With the example configuration ("subdirs": ["raw", "intermediate", "primary"] and "layers": ["raw", "intermediate", "primary"]), a leading subdir/layer name in a dataset name is converted into a directory and a layer. For example, renaming y_test to raw_y_test puts y_test.parquet in the raw directory and assigns it to the raw layer:

raw_y_test:
  filepath: data/raw/y_test.parquet
  layer: raw
  type: pandas.ParquetDataSet
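A minimal sketch of this prefix rule, assuming the prefix is the part of the name before the first underscore and that subdirs and layers are checked independently. auto_entry is a hypothetical re-creation for illustration only; the plugin's actual matching may differ.

```python
from typing import Dict

AUTO_CATALOG = {
    "directory": "data",
    "subdirs": ["raw", "intermediate", "primary"],
    "layers": ["raw", "intermediate", "primary"],
    "default_extension": "parquet",
    "default_type": "pandas.ParquetDataSet",
}

# Hypothetical re-creation of the subdir/layer prefix rule; assumes the
# prefix ends at the first underscore.
def auto_entry(name: str, config: Dict) -> Dict[str, str]:
    entry: Dict[str, str] = {}
    directory = config["directory"]
    prefix, sep, rest = name.partition("_")
    if sep and rest and (prefix in config["subdirs"] or prefix in config["layers"]):
        if prefix in config["subdirs"]:
            directory = f"{directory}/{prefix}"   # raw_y_test -> data/raw/...
        if prefix in config["layers"]:
            entry["layer"] = prefix               # layer: raw
        name = rest                               # prefix dropped from the file name
    entry["filepath"] = f"{directory}/{name}.{config['default_extension']}"
    entry["type"] = config["default_type"]
    return entry


print(auto_entry("raw_y_test", AUTO_CATALOG))
# {'layer': 'raw', 'filepath': 'data/raw/y_test.parquet', 'type': 'pandas.ParquetDataSet'}
```

Names whose first segment is not a configured subdir or layer, such as y_test or X_train, fall through unchanged to the default data/<name>.parquet path.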

License

kedro-auto-catalog is distributed under the terms of the MIT license.

Contributors

waylonwalker
