dqops / dqo

Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observability. Configure data quality checks from the UI or in YAML files, and let DQOps run them daily to detect data quality issues.

Home Page: https://dqops.com/docs/

License: Apache License 2.0

Python 18.30% Shell 0.06% Batchfile 0.05% HCL 0.01% Java 72.79% Jinja 4.09% Dockerfile 0.01% JavaScript 0.01% HTML 0.01% TypeScript 4.58% CSS 0.01% Handlebars 0.08%
data-ops data-quality data-quality-checks data-quality-measurement data-quality-monitoring data-quality-report monitoring data-observability data-profiling

dqo's Introduction

DQOps Data Quality Operations Center

DQOps is a DataOps-friendly data quality monitoring tool with customizable data quality checks and data quality dashboards. DQOps comes with around 150 predefined data quality checks that help you monitor the quality of your data.

Key features

  • Intuitive graphical interface and access via CLI
  • Support of a number of different data sources: BigQuery, Snowflake, PostgreSQL, Redshift, SQL Server and MySQL
  • ~150 built-in table and column checks with easy customization
  • Table and column-level checks that allow you to write your own SQL queries
  • Daily and monthly date partition testing
  • Data grouping by up to 9 different data grouping levels
  • Built-in scheduling
  • Calculation of data quality KPIs which can be displayed on multiple built-in data quality dashboards
  • Data quality incident management and notifications

Installation

To use DQOps you need:

  • Python version 3.8 or greater (for details see Python's documentation and download sites).
  • Ability to install Python packages with pip.
  • If you want to compile DQOps locally, also Java JDK (version 17 or higher), and a configured JAVA_HOME environment variable.
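The requirements above can be checked with a short script before installing. This is a minimal sketch, not part of DQOps: the function name `check_prerequisites` is hypothetical, and it only tests whether JAVA_HOME is set, not whether the JDK it points to is version 17 or higher.

```python
import importlib.util
import os
import sys

MIN_PYTHON = (3, 8)  # minimum Python version listed in the requirements


def check_prerequisites(version, has_pip, java_home, compile_locally=False):
    """Return a list of unmet requirements; an empty list means everything is in place."""
    problems = []
    if tuple(version) < MIN_PYTHON:
        problems.append("Python 3.8 or greater")
    if not has_pip:
        problems.append("pip")
    # Java is only needed when compiling DQOps from source.
    if compile_locally and not java_home:
        problems.append("JAVA_HOME pointing to a JDK")
    return problems


if __name__ == "__main__":
    missing = check_prerequisites(
        version=sys.version_info[:2],
        has_pip=importlib.util.find_spec("pip") is not None,
        java_home=os.environ.get("JAVA_HOME"),
    )
    for item in missing:
        print("Missing requirement:", item)
```

Pass `compile_locally=True` if you plan to build DQOps from the source code.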

DQOps is available on PyPI.

  1. To install DQOps with pip, run:

    python -m pip install --user dqops
    

    If you prefer to work with the source code, just clone our GitHub repository https://github.com/dqops/dqo and run

  2. Run dqops app to finalize the installation.

    python -m dqops
    
  3. Create the DQOps user home folder.

    After installation, you will be asked whether to initialize the DQOps user's home folder in the default location. Type Y to create the folder.
    The user's home folder locally stores data such as sensor readouts and the data quality check results, as well as data source configurations. You can learn more about data storage here.

  4. Log in to DQOps Cloud.

    To use DQOps features such as data quality dashboards or storing data quality definitions and results in the cloud, you must create a DQOps Cloud account.

    After creating the user's home folder, you will be asked whether to log in to DQOps Cloud. After typing Y, you will be redirected to https://cloud.dqops.com/registration, where you can create a new account, use Google single sign-on (SSO), or log in if you already have an account.

    During the first registration, a unique identification code (API Key) will be generated and automatically retrieved by the DQOps application. The API Key is then stored in the configuration file.

  5. Open the DQOps User Interface Console in your browser by CTRL-clicking on the link displayed on the command line (for example http://localhost:8888) or by copying the link.
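If you start DQOps from a script, it can be handy to confirm that the user interface answers before opening it or calling it from a pipeline. The following is a minimal sketch using only the Python standard library. The helper name `ui_is_reachable` is hypothetical, and the URL is the example address mentioned above, so adjust it to the link shown on your command line.

```python
import http.client
import urllib.error
import urllib.request


def ui_is_reachable(url="http://localhost:8888", timeout=3.0):
    """Return True if an HTTP server answers at url, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status code.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, or a malformed response.
        return False


if __name__ == "__main__":
    print("DQOps UI reachable:", ui_is_reachable())
```

The check only tells you that something responds over HTTP at that address. It does not verify that the responding server is DQOps.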

What you can do with DQOps

DQOps is designed as the primary platform for data quality teams, and for all data engineering or data science teams who want to apply data quality practices to their data platforms.

The following list shows selected use cases, with examples and best practices.

The following examples also show the whole process of configuring data quality checks, either with YAML files or with the DQOps user interface.

DQOps client

You can integrate DQOps into data pipelines and ML pipelines by calling a Python client for DQOps. Install the client as a Python package:

python -m pip install --user dqops

The dqops package contains a remote client that can connect to a DQOps instance and perform all operations supported by the user interface. The DQOps client can be used inside data pipelines or data preparation code to verify the quality of tables.

You can use the unauthenticated client to connect to a local DQOps instance from your data pipeline code. First, create the client object.

from dqops import client

dqops_client = client.Client(base_url="http://localhost:8888")

Alternatively, if you are connecting to a production instance of DQOps that has authentication enabled, open the user's profile screen in DQOps and generate your DQOps API Key. Then pass the key as the token when creating an AuthenticatedClient instead.

from dqops import client

dqops_client = client.AuthenticatedClient(base_url="http://localhost:8888", token="Your DQO API Key")
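To avoid hard-coding the API Key in pipeline code, the choice between the two clients can be driven by environment variables. This is a minimal sketch under stated assumptions: the variable names `DQOPS_BASE_URL` and `DQOPS_API_KEY` and both helper functions are hypothetical and not defined by DQOps. Only the `Client` and `AuthenticatedClient` constructors with `base_url` and `token` come from the examples above.

```python
import os


def client_settings(environ=os.environ):
    """Pick the client class name and its constructor arguments.

    DQOPS_BASE_URL and DQOPS_API_KEY are hypothetical variable names
    chosen for this sketch.
    """
    base_url = environ.get("DQOPS_BASE_URL", "http://localhost:8888")
    token = environ.get("DQOPS_API_KEY")
    if token:
        return "AuthenticatedClient", {"base_url": base_url, "token": token}
    return "Client", {"base_url": base_url}


def create_client(environ=os.environ):
    """Create the client object; requires the dqops package to be installed."""
    # Imported lazily so client_settings() works without the package.
    from dqops import client

    class_name, kwargs = client_settings(environ)
    return getattr(client, class_name)(**kwargs)
```

With no variables set, `create_client()` falls back to the unauthenticated client on the local instance.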

Now, you can call operations on DQOps. The following code shows how to execute data quality checks on data sources that are already registered in DQOps.

from dqops.client.api.jobs import run_checks
from dqops.client.models import CheckSearchFilters, \
                              RunChecksParameters


request_body = RunChecksParameters(
  check_search_filters=CheckSearchFilters(
      column='sample_column',
      column_data_type='string',
      connection='sample_connection',
      full_table_name='sample_schema.sample_table',
      enabled=True
  )
)

check_results = run_checks.sync(
  client=dqops_client,
  json_body=request_body
)

The run_checks operation returns a summary of executed data quality checks and the highest data quality issue severity level. In the following example, the most severe issue was at an error severity level.

{
  "jobId" : {
    "jobId" : 123456789,
    "createdAt" : "2023-10-11T13:42:00Z"
  },
  "result" : {
    "highest_severity" : "error",
    "executed_checks" : 10,
    "valid_results" : 7,
    "warnings" : 1,
    "errors" : 2,
    "fatals" : 0,
    "execution_errors" : 0
  },
  "status" : "succeeded"
}
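In a data pipeline, a summary like this is typically turned into a pass or fail decision. The sketch below works on the JSON form shown above, loaded as a plain dictionary; how you obtain that dictionary from the client's return value is left open. The function name `should_fail_pipeline` is hypothetical, and the severity names other than "error" are assumptions inferred from the counters in the sample.

```python
import json

# Severity names from least to most severe. Only "error" appears as a value in
# the sample response; "valid", "warning" and "fatal" are assumed names.
SEVERITY_ORDER = ["valid", "warning", "error", "fatal"]


def should_fail_pipeline(response, fail_at="error"):
    """Return True when the pipeline should stop because of data quality issues."""
    result = response.get("result") or {}
    highest = result.get("highest_severity")
    if highest not in SEVERITY_ORDER:
        # Missing or unknown severity: stop rather than silently continue.
        return True
    if result.get("execution_errors", 0) > 0:
        # Some checks could not be executed at all.
        return True
    return SEVERITY_ORDER.index(highest) >= SEVERITY_ORDER.index(fail_at)


sample = json.loads("""
{
  "jobId": {"jobId": 123456789, "createdAt": "2023-10-11T13:42:00Z"},
  "result": {
    "highest_severity": "error",
    "executed_checks": 10,
    "valid_results": 7,
    "warnings": 1,
    "errors": 2,
    "fatals": 0,
    "execution_errors": 0
  },
  "status": "succeeded"
}
""")

if should_fail_pipeline(sample):
    print("Data quality gate failed, stopping the pipeline")
```

Raising the threshold with `fail_at="fatal"` lets the pipeline continue when only errors or warnings were detected.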

Learn more about the DQOps Python client in the DQOps REST API client reference documentation that shows Python code examples for every operation supported by the client.

Documentation

For full documentation with guides and use cases, visit https://dqops.com/docs/

The getting started guide shows how to start using DQOps.

Also, read the DQOps concept guide to learn how DQOps operates and how to configure data quality checks.

Contact and issues

If you find any issues with the tool, post them here:

https://github.com/dqops/dqo/issues

or contact us via https://dqops.com/

dqo's People

Contributors

andreijanowski, documati, dqops, fibonafide, joankal, joanna-kalinowska-pg, lazarmiuchin, mackrack, mj426382, nephasia, patrykjay-dsstream, pawel-duda, piotrczarnas, psychologianauki, radoslawnowak, rklimek123-dqo

dqo's Issues

Able to host in our own GCP Org?

Looking at what you are offering here & think it's wonderful. Kudos for the level of detail & knowledge sharing put in across the board - not unnoticed!

I am Dir of Data Science & MLops with an org committed to automating & proactively responding to data incidents. We are a GCP, cloud-first group.

My question is, rather than joining DQOPS GCP resources, are we able to set up shop within our organization? Meaning not just the front end, but everything??

I haven't been able to find anything speaking to that in the resources you have shared - but if that's a possibility I'd love to learn more.

I'm guessing the initial reaction is something like, it's not just one service or tool, it's integrated widely across the board. Compute, messaging, hosting etc...which would be understandable certainly.

That said, this is what we do on a daily basis (manage massive data & provide that to customers end-to-end). I'm not worried about having to stand up, maintain & sustain on our own.

So if you could share a bit about what that would look like, or perhaps we could jump on a call to do the same.

Appreciate it! 🔥

Error trying to add DQOps connection to DuckDB using the CLI

dqo> connection add --duckdb-directories=<"path"="/folder1/subfolder2/my-salesforce-pipeline">

Command failed, error message: Cannot invoke "com.dqops.cli.terminal.TerminalFactory.getReader()" because "this.terminalFactory" is null

Not sure what the proper syntax is for the console command.

I tried to do a similar thing using the Web interface and got this error:

com.zaxxer.hikari.pool.HikariPool$PoolInitializationException: Failed to initialize pool: Invalid Input Error: Unrecognized configuration property "path"

Thanks.

Error adding MS SQL SERVER

Command failed, error message: Cannot invoke "com.dqops.connectors.sqlserver.SqlServerAuthenticationMode.toString()" because "cloned.authenticationMode" is null

Please enter one of the [] values: 10
SQL Server host name (--sqlserver-host) [${SQLSERVER_HOST}]: 00000
SQL Server port number (--sqlserver-port) [${SQLSERVER_PORT}]: 1433
SQL Server database name (--sqlserver-database) [${SQLSERVER_DATABASE}]: mydatabase
SQL Server user name (--sqlserver-user) [${SQLSERVER_USER}]: myuser
SQL Server user password (--sqlserver-password) [${SQLSERVER_PASSWORD}]: mypassword
Disable SSL encryption (--sqlserver-disable-encryption) [y,N]: Y
Connection CIGAM was successfully added.
Run 'table import -c=mydatabase' to import tables.
dqo> table import -c=mydatabase
Command failed, error message: Cannot invoke "com.dqops.connectors.sqlserver.SqlServerAuthenticationMode.toString()" because "cloned.authenticationMode" is null
dqo>
