agile-lab-dev / dataquality
DataQuality for BigData
License: GNU Lesser General Public License v3.0
Will this work with AWS S3 as a data source instead of HDFS?
Thanks
I see that a PR with S3 support was merged in November. Does the tool now support S3 as a source instead of HDFS?
I tried to run the ExampleCustomer.conf configuration and got the following exception:
18/01/14 20:27:13 WARN apps.DQMasterBatch$: ************************************************************************
Exception in thread "main" it.agilelab.bigdata.DataQuality.exceptions.IllegalParameterException: Unknown parameters = csv
    at it.agilelab.bigdata.DataQuality.configs.ConfigReader$$anonfun$getSourcesById$1.apply(ConfigReader.scala:81)
    at it.agilelab.bigdata.DataQuality.configs.ConfigReader$$anonfun$getSourcesById$1.apply(ConfigReader.scala:61)
I looked into the code, and it appears that only the "fixed" fileType is supported:
    val schema = generalConfig.getString("fileType") match {
      case "fixed" => {
        if (Try(generalConfig.getObjectList("schema")).isSuccess) getFixedStructSchema(generalConfig)
        else if (Try(generalConfig.getStringList("fieldLengths")).isSuccess) getFixedSchema(generalConfig)
        else {
          val allKeys = generalConfig.entrySet().map(_.getKey)
          throw new IllegalParameterException("\n CONFIG: " + allKeys.mkString(" - "))
        }
      }
      case x => throw new IllegalParameterException(x)
    }
Is there a more up-to-date branch than master? Thanks!
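To make the failure mode concrete, here is a minimal, self-contained sketch of the same dispatch pattern. It is not the project's code. FileTypeDispatch, schemaKind, the plain Map used in place of the Typesafe Config object, and the local IllegalParameterException are all hypothetical stand-ins, and the returned strings only label which branch was taken. It shows that any fileType other than "fixed" falls through to the catch-all case and throws:

```scala
// Hypothetical stand-in for the project's exception type.
class IllegalParameterException(msg: String) extends Exception(msg)

object FileTypeDispatch {
  // Assumption: the config is modelled as a plain Map rather than a
  // Typesafe Config object, so the sketch needs no external libraries.
  def schemaKind(generalConfig: Map[String, Any]): String =
    generalConfig("fileType").toString match {
      case "fixed" =>
        // Mirrors the two "fixed" variants from the quoted snippet.
        if (generalConfig.contains("schema")) "fixed-struct"
        else if (generalConfig.contains("fieldLengths")) "fixed-lengths"
        else throw new IllegalParameterException(
          "\n CONFIG: " + generalConfig.keys.mkString(" - "))
      // Every other value, including "csv", ends up here.
      case x => throw new IllegalParameterException(x)
    }
}
```

With this sketch, a config such as Map("fileType" -> "csv") throws IllegalParameterException, which matches the symptom above. Supporting CSV in a match like this would presumably need an extra case "csv" branch.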
Hi emakhov,
I am working on a POC for using the DQ tool in my organization. I am trying to set up the DQ code base on my local machine/GCP, but I am stuck with the following errors.
[error] SERVER ERROR: Gateway Time-out url=https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/org.webjars.npm/graceful-fs/
[error] SERVER ERROR: Gateway Time-out url=https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/org.webjars.npm/jsonfile/
[error] Server access Error: Operation timed out (Connection timed out) url=https://repo1.maven.org/maven2/org/webjars/npm/minimatch/maven-metadata.xml
[error] Server access Error: Operation timed out (Connection timed out) url=https://repo1.maven.org/maven2/org/webjars/npm/semver/maven-metadata.xml
[error] SERVER ERROR: Gateway Time-out url=https://repo.typesafe.com/typesafe/ivy-releases/org.webjars.npm/minimist/
[ERROR] [03/25/2019 15:53:32.052] [sbt-web-akka.actor.default-dispatcher-2] [akka://sbt-web/user/$a/process] null
akka.actor.ActorInitializationException: exception during creation
Please let me know how to fix this issue.
DataQuality is a great project in the field of data quality, and I think a good way to increase the reach of both our projects is to integrate DataQuality with DataSphere Studio.
What is DataSphere Studio?
DataSphere Studio is a one-stop data application development and management portal open-sourced by WeBank. It meets the requirements of the entire process of data application development from data exchange, desensitization and cleaning, analysis and mining, quality inspection, visual display, regular scheduling to data output.
Github address: https://github.com/WeBankFinTech/DataSphereStudio
Are you interested?