odysseusinc / dataqualitydashboard

This project forked from ohdsi/dataqualitydashboard

A tool to help improve data quality standards in observational data science.

Home Page: https://ohdsi.github.io/DataQualityDashboard

License: Apache License 2.0

R 5.36% CSS 0.76% JavaScript 90.70% HTML 1.51% TSQL 1.43% Perl 0.24%

DataQualityDashboard

The DataQualityDashboard is a tool to help improve data quality standards in observational data science.

Introduction

An R package for characterizing the data quality of a person-level data source that has been converted into the OMOP CDM 5.3.1 format.

Features

  • Utilizes configurable data check thresholds
  • Analyzes data in the OMOP Common Data Model format for all data checks
  • Produces a set of data check results with supplemental investigation assets.

Technology

DataQualityDashboard is an R package.

System Requirements

Requires R (version 3.2.2 or higher). Requires DatabaseConnector and SqlRender.
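
The requirements above can be checked from an R session before installing. The following is a minimal sketch using base R only; `checkDqdRequirements` is a hypothetical helper written for this example and is not part of the package.

```r
# Hypothetical helper (not part of DataQualityDashboard): reports whether the
# running R version is recent enough and which required packages are missing.
checkDqdRequirements <- function(minRVersion = "3.2.2",
                                 packages = c("DatabaseConnector", "SqlRender")) {
  rVersionOk <- getRversion() >= minRVersion
  # requireNamespace() returns FALSE (without an error) if a package is absent
  installed <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)
  list(rVersionOk = rVersionOk, missingPackages = packages[!installed])
}

status <- checkDqdRequirements()
if (!status$rVersionOk) {
  message("R 3.2.2 or higher is required.")
}
if (length(status$missingPackages) > 0) {
  message("Missing packages: ", paste(status$missingPackages, collapse = ", "))
}
```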

R Installation

install.packages("devtools")
devtools::install_github("OHDSI/DataQualityDashboard")

Executing Data Quality Checks

# fill out the connection details -----------------------------------------------------------------------
connectionDetails <- DatabaseConnector::createConnectionDetails(dbms = "", 
                                                              user = "", 
                                                              password = "", 
                                                              server = "", 
                                                              port = "", 
                                                              extraSettings = "")

cdmDatabaseSchema <- "yourCdmSchema" # the fully qualified database schema name of the CDM
resultsDatabaseSchema <- "yourResultsSchema" # the fully qualified database schema name of the results schema (that you can write to)
cdmSourceName <- "Your CDM Source" # a human readable name for your CDM source

# determine how many threads (concurrent SQL sessions) to use ----------------------------------------
numThreads <- 1 # on Redshift, 3 seems to work well

# specify if you want to execute the queries or inspect them ------------------------------------------
sqlOnly <- FALSE # set to TRUE if you just want to get the SQL scripts and not actually run the queries

# where should the logs go? -------------------------------------------------------------------------
outputFolder <- "output"

# logging type -------------------------------------------------------------------------------------
verboseMode <- FALSE # set to TRUE if you want to see activity written to the console

# write results to table? ------------------------------------------------------------------------------
writeToTable <- TRUE # set to FALSE if you want to skip writing to a SQL table in the results schema

# if writing to table and using Redshift, bulk loading can be initialized -------------------------------

# Sys.setenv("AWS_ACCESS_KEY_ID" = "",
#            "AWS_SECRET_ACCESS_KEY" = "",
#            "AWS_DEFAULT_REGION" = "",
#            "AWS_BUCKET_NAME" = "",
#            "AWS_OBJECT_KEY" = "",
#            "AWS_SSE_TYPE" = "AES256",
#            "USE_MPP_BULK_LOAD" = TRUE)

# which DQ check levels to run -------------------------------------------------------------------
checkLevels <- c("TABLE", "FIELD", "CONCEPT")

# which DQ checks to run? ------------------------------------

checkNames <- c() # Names can be found in inst/csv/OMOP_CDMv5.3.1_Check_Descriptions.csv

# run the job --------------------------------------------------------------------------------------
DataQualityDashboard::executeDqChecks(connectionDetails = connectionDetails, 
                                    cdmDatabaseSchema = cdmDatabaseSchema, 
                                    resultsDatabaseSchema = resultsDatabaseSchema,
                                    cdmSourceName = cdmSourceName, 
                                    numThreads = numThreads,
                                    sqlOnly = sqlOnly, 
                                    outputFolder = outputFolder, 
                                    verboseMode = verboseMode,
                                    writeToTable = writeToTable,
                                    checkLevels = checkLevels,
                                    checkNames = checkNames)

# inspect logs ----------------------------------------------------------------------------
ParallelLogger::launchLogViewer(logFileName = file.path(outputFolder, cdmSourceName, 
                                                      sprintf("log_DqDashboard_%s.txt", cdmSourceName)))

# (OPTIONAL) if you want to write the JSON file to the results table separately -----------------------------
jsonFilePath <- ""
DataQualityDashboard::writeJsonResultsToTable(connectionDetails = connectionDetails, 
                                            resultsDatabaseSchema = resultsDatabaseSchema, 
                                            jsonFilePath = jsonFilePath)
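
The connection details in the script above are left blank. One way to avoid typing credentials into the script is to read them from environment variables, sketched below. The helper `readConnectionSettings` and the `DQD_*` variable names are assumptions made for this example; neither is defined by DataQualityDashboard or DatabaseConnector.

```r
# Hypothetical helper: collects connection settings from environment variables
# named <prefix>DBMS, <prefix>USER, <prefix>PASSWORD, <prefix>SERVER, <prefix>PORT.
# The variable names are an assumption for this sketch.
readConnectionSettings <- function(prefix = "DQD_") {
  keys <- c("dbms", "user", "password", "server", "port")
  values <- Sys.getenv(paste0(prefix, toupper(keys)), unset = NA, names = TRUE)
  missingVars <- names(values)[is.na(values)]
  if (length(missingVars) > 0) {
    stop("Unset environment variables: ", paste(missingVars, collapse = ", "))
  }
  # Return a named list whose names match the createConnectionDetails arguments
  stats::setNames(as.list(unname(values)), keys)
}

# Usage, once the variables are set and DatabaseConnector is installed:
# connectionDetails <- do.call(DatabaseConnector::createConnectionDetails,
#                              readConnectionSettings())
```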
                                            

Viewing Results

Launching Dashboard as Shiny App

DataQualityDashboard::viewDqDashboard(jsonPath = file.path(getwd(), outputFolder, cdmSourceName, sprintf("results_%s.json", cdmSourceName)))
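
If the dashboard is launched before the checks have finished, the results file will not exist yet. The sketch below builds the same path as the call above and only launches the app when the file is present. `getDqResultsPath` is a hypothetical helper, and the folder and source names are the placeholder values from the execution script.

```r
# Hypothetical helper: builds the results path used in the call above.
getDqResultsPath <- function(outputFolder, cdmSourceName) {
  file.path(getwd(), outputFolder, cdmSourceName,
            sprintf("results_%s.json", cdmSourceName))
}

# Placeholder values taken from the execution script
resultsPath <- getDqResultsPath("output", "Your CDM Source")

if (file.exists(resultsPath)) {
  DataQualityDashboard::viewDqDashboard(jsonPath = resultsPath)
} else {
  message("No results file found at: ", resultsPath)
}
```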

Launching on a web server

If you have npm installed:

  1. Install http-server:

npm install -g http-server

  2. Rename the JSON file to results.json and place it in inst/shinyApps/www.

  3. Go to inst/shinyApps/www, then run:

http-server

By default, a results JSON file for the Synthea synthetic dataset is shown. To view your own results, replace that results.json with your own results file, renamed to results.json.
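
The rename-and-place step can also be done from R. This is a minimal sketch using base R file functions; `stageDqResults` is a hypothetical helper, and the default folder assumes the working directory is the root of the package source.

```r
# Hypothetical helper: copies a results file into the web folder under the
# name results.json, overwriting any file already there (for example the
# bundled Synthea results).
stageDqResults <- function(jsonPath,
                           wwwDir = file.path("inst", "shinyApps", "www")) {
  if (!file.exists(jsonPath)) stop("Results file not found: ", jsonPath)
  if (!dir.exists(wwwDir)) stop("Web folder not found: ", wwwDir)
  target <- file.path(wwwDir, "results.json")
  copied <- file.copy(from = jsonPath, to = target, overwrite = TRUE)
  if (!copied) stop("Could not copy results to ", target)
  invisible(target)
}

# Usage (paths are placeholders):
# stageDqResults(file.path("output", "Your CDM Source",
#                          "results_Your CDM Source.json"))
```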

View checks

To see descriptions of the checks using R, execute the command below:

View(read.csv(system.file("csv", "OMOP_CDMv5.3.1_Check_Descriptions.csv", package = "DataQualityDashboard"), as.is = TRUE))
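
The same table can be used to build a `checkNames` vector for `executeDqChecks`. The sketch below assumes the CSV has columns named `checkName` and `checkLevel`; those column names are an assumption and should be confirmed with `names()` on the loaded data frame. `selectCheckNames` is a hypothetical helper.

```r
# Hypothetical helper: returns the check names for one check level.
# The default column names are assumptions about the CSV layout.
selectCheckNames <- function(checks, level,
                             nameColumn = "checkName",
                             levelColumn = "checkLevel") {
  missingCols <- setdiff(c(nameColumn, levelColumn), names(checks))
  if (length(missingCols) > 0) {
    stop("Columns not found: ", paste(missingCols, collapse = ", "))
  }
  # which() drops rows where the level is NA
  unique(checks[[nameColumn]][which(checks[[levelColumn]] == level)])
}

# Usage with the packaged descriptions file:
# checks <- read.csv(system.file("csv", "OMOP_CDMv5.3.1_Check_Descriptions.csv",
#                                package = "DataQualityDashboard"), as.is = TRUE)
# checkNames <- selectCheckNames(checks, level = "TABLE")
```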

Support

License

DataQualityDashboard is licensed under Apache License 2.0

Development status

In early development phase. Not ready for use.

Acknowledgements

This project is supported in part through the National Science Foundation grant IIS 1251151.

dataqualitydashboard's People

Contributors

anthonymolinaro, anthonysena, clairblacketer, cukarthik, fdefalco, pbr6cornell, vojtechhuser
