This project aims to produce a set of tools that help big data integration engineers model data automatically, with a given confidence interval.
These instructions will help you get the project running for development and testing purposes.
- Netbeans
- Hadoop Cluster
To start developing:
- Install NetBeans;
- Clone the Git repository;
- Configure it as a Maven project;
- Run `mvn package` and deploy the resulting artifacts to the Big Data infrastructure.
- Change endpoints in `AtlasClient.AtlasCosumer`
- Quality Tests: select tables in `basicProfiler.Profiler`
- Similarity Tests: select tables in `Similarity.SimilarityAnalysis`
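The configuration steps above amount to editing a few constants in the project's source before packaging. As a minimal sketch only: the class and field names below are illustrative placeholders, not the actual members of `AtlasClient.AtlasCosumer`, `basicProfiler.Profiler`, or `Similarity.SimilarityAnalysis`, and the URLs are dummy values to be replaced with your cluster's endpoints.

```java
import java.util.List;

// Hypothetical sketch of the kind of values to adjust before running
// `mvn package`; check the real classes listed above for the actual fields.
public class EndpointConfig {

    // Atlas REST endpoint (placeholder host; 21000 is Atlas's default port)
    static final String ATLAS_ENDPOINT = "http://atlas-host:21000";

    // Tables selected for the quality (profiling) tests -- placeholder names
    static final List<String> PROFILED_TABLES = List.of("sales.orders", "sales.customers");

    // Table pairs selected for the similarity tests -- placeholder names
    static final List<String> SIMILARITY_TABLES = List.of("sales.orders", "staging.orders_raw");

    public static void main(String[] args) {
        System.out.println("Atlas endpoint: " + ATLAS_ENDPOINT);
        System.out.println("Profiling " + PROFILED_TABLES.size() + " tables");
    }
}
```

After editing the endpoints and table lists, repackage with `mvn package` and redeploy so the cluster picks up the new values.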
- Spark - The scalable event processing engine
- Atlas - Data governance and metadata framework for Hadoop
- Ranger - Enables, monitors and manages comprehensive data security across the Hadoop platform
- José Magalhães
- João Galvão
- Maria Inês Costa
We use SemVer for versioning. For the versions available, see the tags on this repository.
This project is currently internal.
- Cheers to the LID4 community