kedro-inspect

Overview

The single objective of kedro-inspect is to decouple the representation of a Kedro pipeline from its implementation and execution. This is useful for inspecting the pipeline without having access to the Kedro project or setting up dependencies that are only needed when running the pipeline.

Once we isolate the pipeline representation, we can use it for various purposes, such as analysing its structure, documenting it, or sharing it with others.

This representation can be saved to a static file (e.g. JSON). The saved pipeline can then be visualised with the Kedro-Viz package or with any other tool, written in any programming language, that can read the pipeline file format.
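As a rough illustration of reading a saved representation without Kedro installed, the sketch below parses a JSON document using only the Python standard library. It assumes the file is a JSON object with a `nodes` list whose entries carry a `name` field, as in the example output in the Usage section; the `SAMPLE` string and the `node_names` helper are hypothetical and not part of kedro-inspect.

```python
import json

# Hypothetical, trimmed-down sample shaped like kedro-inspect's JSON output.
SAMPLE = """
{
    "nodes": [
        {
            "name": "preprocess_companies_node",
            "inputs": "companies",
            "outputs": "preprocessed_companies"
        }
    ]
}
"""


def node_names(pipeline_json: str) -> list[str]:
    """Return the node names found in a serialised pipeline (assumed layout)."""
    pipeline = json.loads(pipeline_json)
    return [node["name"] for node in pipeline["nodes"]]


print(node_names(SAMPLE))
```

In practice the string would come from the file written with the `-o` option rather than from an inline sample.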

Inspection

The plan is to make inspection richer over time by adding more information to the pipeline representation, such as fine-grained type information or per-node package dependencies.

This added information can be useful for various purposes, such as:

  • Generating documentation & schemas for the pipeline
  • Visualisation
  • Optimising pipeline execution
  • Generating a pipeline test suite
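To make the documentation use case concrete, here is a minimal sketch that renders a readable signature from one node's `function` entry. It relies only on the fields visible in the example output in the Usage section (`func`, `parameters`, `type_hint`, `return_value`); the `render_signature` helper is hypothetical, and treating a missing or null type hint as "no annotation" is an assumption.

```python
# One node, copied from the example output and trimmed to the fields used here.
NODE = {
    "name": "preprocess_companies_node",
    "function": {
        "func": "spaceflights_pandas.pipelines.data_processing.nodes.preprocess_companies",
        "parameters": [
            {
                "name": "companies",
                "kind": "POSITIONAL_OR_KEYWORD",
                "type_hint": "pandas.core.frame.DataFrame",
            }
        ],
        "return_value": "pandas.core.frame.DataFrame",
    },
}


def render_signature(node: dict) -> str:
    """Build a one-line signature string for documentation purposes."""
    function = node["function"]
    # Keep only the last component of the dotted import path.
    short_name = function["func"].rsplit(".", 1)[-1]

    params = []
    for param in function["parameters"]:
        hint = param.get("type_hint")
        # Assumption: a missing or null hint means the parameter is unannotated.
        params.append(f"{param['name']}: {hint}" if hint else param["name"])

    signature = f"{short_name}({', '.join(params)})"
    return_hint = function.get("return_value")
    if return_hint:
        signature += f" -> {return_hint}"
    return signature


print(render_signature(NODE))
```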

Comparison with current Kedro functionality

Kedro already provides serialisation of the pipeline. The crucial difference is that kedro-inspect does not require the Kedro project, so it can be used without setting up the project or its dependencies.

Usage

usage: kedro-inspect [-h] [-p PIPELINE] [-o OUTPUT] [--indent INDENT] project_path

Inspect a Kedro pipeline.

positional arguments:
  project_path          path to the Kedro project

optional arguments:
  -h, --help            show this help message and exit
  -p PIPELINE, --pipeline PIPELINE
                        name of the pipeline to inspect (default: __default__)
  -o OUTPUT, --output OUTPUT
                        path to the output file (default: None)
  --indent INDENT       indentation for JSON output (default: None)
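For scripted use, the command line above can be assembled and invoked from Python. The sketch below uses only the options listed in the help text; the `build_command` and `run_inspect` helpers are hypothetical, and running the command assumes the `kedro-inspect` executable is available on the PATH.

```python
import subprocess


def build_command(project_path, pipeline="__default__", output=None, indent=None):
    """Assemble the kedro-inspect argument list from the documented options."""
    command = ["kedro-inspect", "-p", pipeline]
    if output is not None:
        command += ["-o", str(output)]
    if indent is not None:
        command += ["--indent", str(indent)]
    # The project path is the single positional argument.
    command.append(str(project_path))
    return command


def run_inspect(project_path, **options):
    """Run kedro-inspect and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_command(project_path, **options), check=True)


# Example paths are placeholders; printing the command does not execute it.
print(build_command("spaceflights-pandas", output="pipeline.json", indent=4))
```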

Running kedro-inspect on the spaceflights-pandas project produces a list of node representations. For example, the first node is represented as follows:

"nodes": [
        {
            "name": "preprocess_companies_node",
            "tags": [],
            "confirms": [],
            "namespace": null,
            "inputs": "companies",
            "outputs": "preprocessed_companies",
            "function": {
                "func": "spaceflights_pandas.pipelines.data_processing.nodes.preprocess_companies",
                "parameters": [
                    {
                        "name": "companies",
                        "kind": "POSITIONAL_OR_KEYWORD",
                        "type_hint": "pandas.core.frame.DataFrame"
                    }
                ],
                "return_value": "pandas.core.frame.DataFrame"
            },
            "param_to_input": {
                "companies": [
                    "companies"
                ]
            }
        },
        ...
]
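A representation like this is enough for simple structural analysis. The sketch below finds the pipeline's free inputs, meaning datasets that are consumed by some node but produced by none. The second node in `NODES` is hypothetical, and so is the assumption that `inputs` and `outputs` may be serialised as a string, a list, a mapping to dataset names, or null; only the string form appears in the example above.

```python
def as_names(value) -> list[str]:
    """Normalise an inputs/outputs field to a list of dataset names."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    if isinstance(value, dict):
        # Assumption: mapping values are the dataset names.
        return list(value.values())
    return list(value)


def free_inputs(nodes: list[dict]) -> list[str]:
    """Return datasets read by at least one node and written by none."""
    consumed, produced = set(), set()
    for node in nodes:
        consumed.update(as_names(node.get("inputs")))
        produced.update(as_names(node.get("outputs")))
    return sorted(consumed - produced)


NODES = [
    {
        "name": "preprocess_companies_node",
        "inputs": "companies",
        "outputs": "preprocessed_companies",
    },
    {
        # Hypothetical downstream node, added only for illustration.
        "name": "hypothetical_join_node",
        "inputs": ["preprocessed_companies", "reviews"],
        "outputs": "joined",
    },
]

print(free_inputs(NODES))
```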

Contributors

alparibal
