GithubHelp home page GithubHelp logo

ascend-great-expectations-gcs's Introduction

Ascend.io / Great Expectations docker image for Google Cloud storage

This image is a wrapper around official Ascend.io image to use Great Expectations validation tool.

Build the docker image

The image is built with Github action located in this file : .github/workflows/docker-build.yaml.

For now the image is pushed on docker hub at this address: fosk06/ascend-great-expectations-gcs:latest

The image is build on push on main branch and with git tag with the following form "v{X}.{Y}.{Z}"

With:

  • X = Major version
  • Y = Minor version
  • Z = Correction version

Use it on ascend.io platform

This docker image is built for PySpark transforms on Ascend.io platform. First you need a Google cloud storage bucket named for example "great_expectations_store" and a service account with the role "storage.admin" on this bucket. Then upload this service account as a credentials on your Ascend.io instance and name it for example "great_expectations_sa".

Now you can create your PySpark transform on Ascend.io. In the advanced settings> Runtime settings > container image URL set the correct docker hub image url : fosk06/ascend-great-expectations-gcs:latest

Then in the "Custom Spark Params" click on "require credentials" and chose you credential previously uploaded "great_expectations_sa".

Write the PySpark transform

# import the custom package
from ascend_great_expectations_gcs.validator import Validator

# lets admit we are working on a "customer" table, write the expectations in specific function
def expectations(validator):
  validator.expect_column_to_exist("customer_id")
  validator.expect_column_values_to_not_be_null("customer_id")
  validator.expect_column_to_exist('created_at')

# Ascend.io transform callback
def transform(spark_session: SparkSession, inputs: List[DataFrame], credentials=None):
    df = inputs[0]
    # instanciate the validator
    validator = Validator(
        name= NAME, # name of the validator
        gcp_project=PROJECT, # your GCP project
        bucket=BUCKET, # the name of your GCP bucket, for example "great_expectations_store"
        credentials=credentials, # credentials from the transform callback
    )
    validator.add_expectations(expectations)
    validator.run(df)
    return df

test the class

create a virtual env then in ./venv/lib/python3.9/site-package write those two files

ascend_great_expectations_gcs_test.pth => set you package folder path ascend_great_expectations_gcs.pth => set you package folder path here

https://webdevdesigner.com/q/how-do-you-set-your-pythonpath-in-an-already-created-virtualenv-55773/

git tag -d v1.4.1 git push --delete origin v1.4.1

ascend-great-expectations-gcs's People

Contributors

fosk06 avatar ps-remi avatar

Watchers

 avatar  avatar  avatar  avatar

Forkers

isabella232

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.