john-liu / incubator-griffin

This project forked from apache/griffin

Mirror of Apache griffin (Incubating)

License: Other

Scala 20.03% Python 2.08% Java 18.57% CSS 4.42% HTML 25.67% JavaScript 29.23%

incubator-griffin's Introduction

Apache Griffin

Apache Griffin is a model-driven data quality solution for modern data systems. It provides a standard process to define, execute, and report data quality measures, as well as a unified dashboard across multiple data systems. You can access our home page here. You can access our wiki page here. You can access our issues jira page here.

Contact us

Dev List

CI

Repository

Snapshot:

Release:

How to run in docker

  1. Install docker.
  2. Pull our built docker image.
    docker pull bhlx3lyx7/griffin_demo:0.0.1
    
  3. Increase vm.max_map_count on your local machine so that Elasticsearch can run.
    sysctl -w vm.max_map_count=262144
    
  4. Run this Docker image and wait about one minute for Griffin to be ready.
    docker run -it -h sandbox --name griffin_demo -m 8G --memory-swap -1 \
    -p 32122:2122 -p 37077:7077 -p 36066:6066 -p 38088:8088 -p 38040:8040 \
    -p 33306:3306 -p 39000:9000 -p 38042:8042 -p 38080:8080 -p 37017:27017 \
    -p 39083:9083 -p 38998:8998 -p 39200:9200 bhlx3lyx7/griffin_demo:0.0.1
    
  5. Now you can visit the UI in your browser; log in with account "test" and password "test" if required.
    http://<your local IP address>:38080/
    
    You can also follow the steps using UI here.
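
Step 4 asks you to wait about a minute before opening the UI. As a minimal sketch, the wait can be scripted by polling the UI address until it answers. The function name `wait_for_griffin`, the 5-second polling interval, and the default 120-second timeout are illustrative assumptions, not part of Griffin; the sketch also assumes `curl` is installed.

```shell
# Poll a URL until it answers or the timeout (in seconds) is reached.
# Hypothetical helper: the name, interval and default timeout are assumptions.
# Returns 0 as soon as the server sends any HTTP response, 1 on timeout.
wait_for_griffin() {
  url="$1"
  timeout="${2:-120}"
  elapsed=0
  # curl exits non-zero while nothing is listening on the port yet.
  until curl -s --max-time 5 -o /dev/null "$url"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "No response from $url after ${timeout}s" >&2
      return 1
    fi
    sleep 5
    # Only the sleep time is counted, so the timeout is approximate.
    elapsed=$((elapsed + 5))
  done
}
```

For example, `wait_for_griffin "http://<your local IP address>:38080/" 180` returns once the port mapped in step 4 starts answering. Any HTTP response counts as ready, so a login page or redirect also ends the wait.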

How to deploy and run locally

  1. Install JDK (1.8 or later).
  2. Install MySQL.
  3. Install npm (version 6.0.0+).
  4. Install Hadoop (2.6.0 or later); you can get some help here.
  5. Install Spark (version 1.6.x; Griffin does not currently support 2.0.x). If you want to install a Pseudo Distributed/Single Node Cluster, you can get some help here.
  6. Install Hive (version 1.2.1 or later); you can get some help here. Make sure that your Spark cluster can access your HiveContext.
  7. Install Livy; you can get some help here. Griffin needs a server to schedule Spark jobs, so we use Livy to submit our jobs. To work around some issues between Livy and HiveContext, download the following 3 files and put them into HDFS.
    datanucleus-api-jdo-3.2.6.jar
    datanucleus-core-3.2.10.jar
    datanucleus-rdbms-3.2.9.jar
    
  8. Install Elasticsearch. Elasticsearch works as a metrics collector: Griffin writes metrics to it, and our default UI reads metrics from it. You can use your own way as well.
  9. Modify the configuration for your environment. You need to modify the configuration files below so that Griffin works well in your environment.
    service/src/main/resources/application.properties
    spring.datasource.url = jdbc:mysql://<your IP>:3306/quartz?autoReconnect=true&useSSL=false
    spring.datasource.username = <user name>
    spring.datasource.password = <password>
    
    hive.metastore.uris = thrift://<your IP>:9083
    hive.metastore.dbname = <hive database name>    # default is "default"
    
    service/src/main/resources/sparkJob.properties
    sparkJob.file = hdfs://<griffin measure path>/griffin-measure.jar
    sparkJob.args_1 = hdfs://<griffin env path>/env.json
    sparkJob.jars_1 = hdfs://<datanucleus path>/datanucleus-api-jdo-3.2.6.jar
    sparkJob.jars_2 = hdfs://<datanucleus path>/datanucleus-core-3.2.10.jar
    sparkJob.jars_3 = hdfs://<datanucleus path>/datanucleus-rdbms-3.2.9.jar
    sparkJob.uri = http://<your IP>:8998/batches
    
    ui/js/services/services.js
    ES_SERVER = "http://<your IP>:9200"
    
    Configure measure/measure-batch/src/main/resources/env.json for your environment, and put it into Hdfs /
  10. Build the whole project and deploy. (npm must be installed; on Mac you can try 'brew install node'.)
    mvn install
    
    Create a directory in HDFS and put our measure package into it.
    cp /measure/target/measure-0.1.3-incubating-SNAPSHOT.jar /measure/target/griffin-measure.jar
    hdfs dfs -put /measure/target/griffin-measure.jar <griffin measure path>/
    
    After all the environment services have started, we can start our server.
    java -jar service/target/service.jar
    
    After a few seconds, we can visit the default Griffin UI (Spring Boot listens on port 8080 by default).
    http://<your IP>:8080
    
  11. Follow the steps using UI here.
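
The sparkJob.properties entries in step 9 all derive from four values: the HDFS location of the measure jar, the HDFS location of env.json, the HDFS location of the datanucleus jars, and the Livy host. A minimal sketch of generating them from those values follows. The function name `render_spark_job_properties` and its argument order are hypothetical, and only the keys shown in step 9 are covered; your copy of the file may contain other keys, so treat the output as something to compare against the file rather than a replacement for it.

```shell
# Print the sparkJob.properties entries from step 9 for a given environment.
# Hypothetical helper; arguments are placeholders to substitute:
#   $1  HDFS location holding griffin-measure.jar (no hdfs:// prefix)
#   $2  HDFS location holding env.json
#   $3  HDFS location holding the three datanucleus jars
#   $4  host or IP of the Livy server (port 8998 as in step 9)
render_spark_job_properties() {
  measure_path="$1"
  env_path="$2"
  datanucleus_path="$3"
  livy_host="$4"
  cat <<EOF
sparkJob.file = hdfs://${measure_path}/griffin-measure.jar
sparkJob.args_1 = hdfs://${env_path}/env.json
sparkJob.jars_1 = hdfs://${datanucleus_path}/datanucleus-api-jdo-3.2.6.jar
sparkJob.jars_2 = hdfs://${datanucleus_path}/datanucleus-core-3.2.10.jar
sparkJob.jars_3 = hdfs://${datanucleus_path}/datanucleus-rdbms-3.2.9.jar
sparkJob.uri = http://${livy_host}:8998/batches
EOF
}
```

Calling `render_spark_job_properties "<griffin measure path>" "<griffin env path>" "<datanucleus path>" "<your IP>"` prints the six lines to standard output, which keeps the three jar paths and the Livy URL consistent with each other when the environment changes.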

Note: The front-end UI is still under development; currently you can only access some basic features.

Contributing

See CONTRIBUTING.md for details on how to contribute code, documentation, etc.
