agile-lab-dev / dataquality

DataQuality for BigData

License: GNU Lesser General Public License v3.0

Scala 76.93% Shell 0.87% PLpgSQL 0.50% CSS 0.70% HTML 6.29% TypeScript 13.99% JavaScript 0.73%

dataquality's People

Contributors

agile-lab, emakhov, erond, noiano

dataquality's Issues

Is fileType = csv supported?

I tried to run the ExampleCustomer.conf configuration and got the following exception:
18/01/14 20:27:13 WARN apps.DQMasterBatch$: ************************************************************************
Exception in thread "main" it.agilelab.bigdata.DataQuality.exceptions.IllegalParameterException: Unknown parameters = csv
at it.agilelab.bigdata.DataQuality.configs.ConfigReader$$anonfun$getSourcesById$1.apply(ConfigReader.scala:81)
at it.agilelab.bigdata.DataQuality.configs.ConfigReader$$anonfun$getSourcesById$1.apply(ConfigReader.scala:61)

I looked into the code, and it seems that only the "fixed" fileType is supported:

        val schema    = generalConfig.getString("fileType") match {
          case "fixed"   => {
            if      (Try(generalConfig.getObjectList("schema")).isSuccess) getFixedStructSchema(generalConfig)
            else if (Try(generalConfig.getStringList("fieldLengths")).isSuccess) getFixedSchema(generalConfig)
            else {
              val allKeys = generalConfig.entrySet().map(_.getKey)
              throw new IllegalParameterException("\n CONFIG: "+allKeys.mkString(" - "))
            }
          }
          case x => throw new IllegalParameterException(x)
        }
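
The fall-through described above can be reproduced in isolation. The following is a minimal sketch, not the project's code: `FileTypeDispatch`, `schemaKind`, and the local `IllegalParameterException` class are hypothetical stand-ins, and the message prefix is only inferred from the log line in this report.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for the project's exception class; the message
// prefix is an assumption based on the log output quoted in this issue.
final class IllegalParameterException(message: String)
    extends IllegalArgumentException(s"Unknown parameters = $message")

object FileTypeDispatch {
  // Same shape as the match in the snippet above: only "fixed" has its
  // own branch, so every other value reaches the catch-all case.
  def schemaKind(fileType: String): String = fileType match {
    case "fixed" => "fixed-width schema"
    case other   => throw new IllegalParameterException(other)
  }

  def main(args: Array[String]): Unit = {
    Seq("fixed", "csv").foreach { ft =>
      Try(schemaKind(ft)) match {
        case Success(kind) => println(s"$ft -> $kind")
        case Failure(e)    => println(s"$ft -> ${e.getMessage}")
      }
    }
  }
}
```

With this shape, any `fileType` other than "fixed" (including "csv") ends in the exception branch, which matches the stack trace above.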

Is there a more up-to-date branch than master? Thanks!

SERVER ERROR: Gateway Time-out

Hi emakhov,
I am working on a POC for using the DQ tool in my organization. I am trying to set up the DQ code base on my local machine/GCP, but I am stuck on the following errors.

[error] SERVER ERROR: Gateway Time-out url=https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/org.webjars.npm/graceful-fs/
[error] SERVER ERROR: Gateway Time-out url=https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/org.webjars.npm/jsonfile/
[error] Server access Error: Operation timed out (Connection timed out) url=https://repo1.maven.org/maven2/org/webjars/npm/minimatch/maven-metadata.xml
[error] Server access Error: Operation timed out (Connection timed out) url=https://repo1.maven.org/maven2/org/webjars/npm/semver/maven-metadata.xml
[error] SERVER ERROR: Gateway Time-out url=https://repo.typesafe.com/typesafe/ivy-releases/org.webjars.npm/minimist/

[ERROR] [03/25/2019 15:53:32.052] [sbt-web-akka.actor.default-dispatcher-2] [akka://sbt-web/user/$a/process] null
akka.actor.ActorInitializationException: exception during creation

Could you please let me know how to fix this issue?

Are you interested in integrating DataQuality with DataSphere Studio?

DataQuality is a great project in the field of data quality, and I think a good way to increase the reach of both our projects is to integrate DataQuality with DataSphere Studio.
What is DataSphere Studio?
DataSphere Studio is a one-stop data application development and management portal open-sourced by WeBank. It meets the requirements of the entire process of data application development from data exchange, desensitization and cleaning, analysis and mining, quality inspection, visual display, regular scheduling to data output.
Github address: https://github.com/WeBankFinTech/DataSphereStudio
Are you interested?
