raystack / meteor

Meteor is an easy-to-use, plugin-driven metadata collection framework to extract data from different sources and sink to any data catalog.

Home Page: https://raystack.github.io/meteor/

License: Apache License 2.0


meteor's Introduction

Meteor

Meteor is a plugin-driven agent for collecting metadata. Meteor has plugins to source metadata from a variety of data stores, services, and message queues. It also has sink plugins to send metadata to a variety of third-party APIs and catalog services.

Key Features

  • No Dependency: Written in Go. It compiles into a single binary with no external dependency.
  • Extensible: Plugin system allows new sources and sinks to be easily added.
  • Ecosystem: Extract metadata from many popular services with a wide range of service plugins.
  • Customizable: Add your own processors and sinks to suit your many use cases.
  • Runtime: Meteor can run inside VMs or containers with minimal memory footprint.

Documentation

Explore the following resources to get started with Meteor:

  • Usage Guides will help you get started on Meteor.
  • Concepts describes all important Meteor concepts.
  • Contribute contains resources for anyone who wants to contribute to Meteor.

Installation

Install Meteor on macOS, Windows, Linux, OpenBSD, FreeBSD, or any other machine.

Binary (Cross-platform)

Download the appropriate version for your platform from the releases page. Once downloaded, the binary can be run from anywhere. You don’t need to install it into a global location. This works well for shared hosts and other systems where you don’t have a privileged account. Ideally, you should install it somewhere in your PATH for easy use. /usr/local/bin is the most probable location.

Homebrew

# Install meteor (requires homebrew installed)
$ brew install raystack/tap/meteor

# Upgrade meteor (requires homebrew installed)
$ brew upgrade meteor

# Check for installed meteor version
$ meteor version

Usage

Meteor’s CLI is fully featured but simple to use, even for those who have very limited experience working from the command line. Run meteor --help to see a list of all available commands and instructions on how to use them.

# List of commands
$ meteor --help

# Print command reference
$ meteor reference

Running locally

# Clone the repo
$ git clone https://github.com/raystack/meteor.git

# Install all the golang dependencies
$ go mod tidy

# Build meteor binary file
$ make build

# Run meteor on a recipe file
$ ./meteor run sample-recipe.yaml

# Run meteor on multiple recipes in a directory
$ ./meteor run directory-path

Running tests

# Run all unit tests, excluding extractors
$ make test

# Run integration test for any extractor
$ cd plugins/extractors/<name-of-extractor>
$ go test -tags=integration

Contribute

Development of Meteor happens in the open on GitHub, and we are grateful to the community for contributing bugfixes and improvements. Read below to learn how you can take part in improving Meteor.

Read our contributing guide to learn about our development process, how to propose bugfixes and improvements, and how to build and test your changes to Meteor.

To help you get your feet wet and get you familiar with our contribution process, we have a list of good first issues that contain bugs which have a relatively limited scope. This is a great place to get started.

This project exists thanks to all the contributors.

License

Meteor is Apache 2.0 licensed.

meteor's People

Contributors

andrelsjunior, anjali9791, arujit, bsushmith, chief-rishab, grayflash, irainia, ishanarya0, kushsharma, lucapette, mabdh, maztohir, rahmatrhd, ravisuhag, sbchaos, scortier, srtpatil, stewartjingga, sudo-suhas, vianhazman, zearin

meteor's Issues

Add more helpful logging

Is your feature request related to a problem? Please describe.
When running multiple recipes, there is no visibility into what is going on until all of them have finished.

Describe the solution you'd like
Log when each recipe starts and when it finishes, without having to wait for all recipes to complete.

Describe alternatives you've considered
none

Additional context
none
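
The requested behaviour can be sketched as follows. This is only an illustration, not Meteor's actual runner: the Recipe type, the runAll function, and the errSimulated value are hypothetical names introduced for this example, and the run callback stands in for whatever performs the extraction.

```go
package main

import (
	"errors"
	"log"
	"time"
)

// Recipe is a hypothetical stand-in for a Meteor recipe; only the name is used here.
type Recipe struct{ Name string }

// errSimulated is a made-up error used to demonstrate the failure path.
var errSimulated = errors.New("simulated failure")

// runAll runs the recipes one by one and logs as soon as each recipe starts
// and ends, instead of reporting only after the whole batch is done.
// It returns the number of recipes that failed.
func runAll(recipes []Recipe, run func(Recipe) error, logf func(format string, args ...interface{})) int {
	failed := 0
	for _, r := range recipes {
		start := time.Now()
		logf("recipe %q: started", r.Name)
		if err := run(r); err != nil {
			failed++
			logf("recipe %q: failed after %s: %v", r.Name, time.Since(start), err)
			continue
		}
		logf("recipe %q: finished in %s", r.Name, time.Since(start))
	}
	return failed
}

func main() {
	recipes := []Recipe{{Name: "sample-recipe"}, {Name: "broken-recipe"}}
	run := func(r Recipe) error {
		if r.Name == "broken-recipe" {
			return errSimulated
		}
		return nil
	}
	failed := runAll(recipes, run, log.Printf)
	log.Printf("%d of %d recipes failed", failed, len(recipes))
}
```

Passing the logger as a function keeps the sketch independent of any particular logging library.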

Add metadata extractor for oracle

Deliverables

  • add unit tests
  • add extractor
  • add README.md in plugins/extractors/oracle, defining output
  • register your extractor in plugins/extractors/populate.go
  • add the extractor to the extractor list in docs/reference/extractor.md

Output must contain a Table

Table

Field Sample Value
urn my_database.my_table
name my_table
source oracle
description table description
profile.total_rows 2100
schema [][Column]

Column

Field Sample Value
name total_price
description item's total price
data_type decimal
is_nullable true
length 12,2
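
As a rough illustration of the output shape described above, the fields can be modelled as plain Go structs. The Table, Profile, and Column types below are hypothetical and are not Meteor's real proto-generated types. Storing length as a string such as "12,2" and treating schema as a flat list of columns are assumptions made for this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Column mirrors the Column fields listed in the issue (hypothetical type).
type Column struct {
	Name        string `json:"name"`
	Description string `json:"description"`
	DataType    string `json:"data_type"`
	IsNullable  bool   `json:"is_nullable"`
	Length      string `json:"length"` // assumption: kept as text, e.g. "12,2"
}

// Profile holds table statistics such as the row count.
type Profile struct {
	TotalRows int64 `json:"total_rows"`
}

// Table mirrors the Table fields listed in the issue (hypothetical type).
type Table struct {
	URN         string   `json:"urn"`
	Name        string   `json:"name"`
	Source      string   `json:"source"`
	Description string   `json:"description"`
	Profile     Profile  `json:"profile"`
	Schema      []Column `json:"schema"`
}

// sampleTable builds a Table from the sample values given in the issue.
func sampleTable(source string) Table {
	return Table{
		URN:         "my_database.my_table",
		Name:        "my_table",
		Source:      source,
		Description: "table description",
		Profile:     Profile{TotalRows: 2100},
		Schema: []Column{{
			Name:        "total_price",
			Description: "item's total price",
			DataType:    "decimal",
			IsNullable:  true,
			Length:      "12,2",
		}},
	}
}

func main() {
	out, err := json.MarshalIndent(sampleTable("oracle"), "", "  ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

The same structure applies to the other table-based extractor requests in this list; only the source value changes.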

Meteor installation setup

Deliverables:

  • Add an installation guide, docs/guides/installation.md
  • Add config for goreleaser
  • Update Makefile and Dockerfile
  • Update GitHub CI

Add metadata extractor for mssql

Deliverables

  • add unit tests
  • add extractor
  • add README.md in plugins/extractors/mssql, defining output
  • register your extractor in plugins/extractors/populate.go
  • add the extractor to the extractor list in docs/reference/extractor.md

Output must contain a Table

Table

Field Sample Value
urn my_database.my_table
name my_table
source mssql
description table description
profile.total_rows 2100
schema [][Column]

Column

Field Sample Value
name total_price
description item's total price
data_type decimal
is_nullable true
length 12,2

update kafka library to avoid enabling cgo

Is your feature request related to a problem? Please describe.
Meteor requires CGO_ENABLED to be set to true for cross-building because of its dependency on https://github.com/confluentinc/confluent-kafka-go. Enabling CGO requires the build machine to have all the required OS tools, or else the build will fail.

Describe the solution you'd like
Switch to a Kafka library that does not use the C client, e.g. https://github.com/segmentio/kafka-go

Describe alternatives you've considered
Create a Docker image with all the required OS toolchains.

Add metadata extractor for redshift

Deliverables

  • add unit tests
  • add extractor
  • add README.md in plugins/extractors/redshift, defining output
  • register your extractor in plugins/extractors/populate.go
  • add the extractor to the extractor list in docs/reference/extractor.md

Output must contain a Table

The struct fields may vary for different databases; choose the best-suited ones from proto/odpf/meta.

Table

Field Sample Value
urn my_database.my_table
name my_table
source redshift
description table description
profile.total_rows 2100
schema [][Column]

Column

Field Sample Value
name total_price
description item's total price
data_type decimal
is_nullable true
length 12,2

Add metadata extractor for mongodb

Deliverables

  • add unit tests
  • add extractor
  • add README.md in plugins/extractors/mongodb, defining output
  • register your extractor in plugins/extractors/populate.go
  • add the extractor to the extractor list in docs/reference/extractor.md

Outputs

Field Sample Value
urn my_database.my_collection
name my_collection
source mongodb
description table description
profile.total_rows 2100

List command for plugins

Meteor should provide a list command for checking available plugins

$ meteor list
$ meteor list extractors  
$ meteor list sinks
$ meteor list processors 
  • Add list command
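
One way such a command could resolve its argument is a lookup in a plugin registry. The sketch below assumes a hypothetical in-memory registry and a hypothetical listPlugins function. The plugin names in it are placeholders for illustration, not an inventory of what Meteor ships.

```go
package main

import (
	"fmt"
	"sort"
)

// registry is a hypothetical in-memory plugin registry keyed by plugin kind.
// The names are illustrative placeholders only.
var registry = map[string][]string{
	"extractors": {"postgres", "mysql"},
	"sinks":      {"console"},
	"processors": {"enrich"},
}

// listPlugins returns the sorted plugin names for one kind,
// or an error when the kind is not known.
func listPlugins(kind string) ([]string, error) {
	names, ok := registry[kind]
	if !ok {
		return nil, fmt.Errorf("unknown plugin kind %q", kind)
	}
	sorted := append([]string(nil), names...) // copy so the registry stays untouched
	sort.Strings(sorted)
	return sorted, nil
}

func main() {
	for _, kind := range []string{"extractors", "sinks", "processors"} {
		names, err := listPlugins(kind)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: %v\n", kind, names)
	}
}
```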

Add metadata extractor for clickhouse

Deliverables

  • add unit tests
  • add extractor
  • add README.md in plugins/extractors/clickhouse, defining output
  • register your extractor in plugins/extractors/populate.go
  • add the extractor to the extractor list in docs/reference/extractor.md

Output must contain an array of Tables of struct meta.Table

Table

Field Sample Value
urn my_database.my_table
name my_table
source clickhouse
description table description
profile.total_rows 2100
schema []Column

Column

Field Sample Value
name total_price
description item's total price
data_type String

Allow plugins test/dry run before running extractors

Is your feature request related to a problem? Please describe.
Running an extraction task can take a long time, and an error may only occur at the sink stage, in which case the time spent on extraction is wasted.

Describe the solution you'd like
Enable validation or test run of all plugins before running everything, especially processors and sink plugins

Describe alternatives you've considered
None

Additional context
None
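
A minimal sketch of the idea, assuming each plugin can check its own configuration before any extraction starts. The Plugin interface, the fakePlugin type, the validateAll function, and the errMissingURL value are hypothetical names for this example and do not describe Meteor's real plugin API.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Plugin is a hypothetical interface: every extractor, processor, and sink
// would expose a cheap Validate step that checks config and connectivity.
type Plugin interface {
	Name() string
	Validate() error
}

// fakePlugin is a test double used to demonstrate the flow.
type fakePlugin struct {
	name string
	err  error
}

func (p fakePlugin) Name() string    { return p.name }
func (p fakePlugin) Validate() error { return p.err }

// errMissingURL is a made-up configuration error.
var errMissingURL = errors.New("missing url")

// validateAll validates every plugin and reports all failures at once,
// so a broken sink is found before a long extraction is started.
func validateAll(plugins []Plugin) error {
	var msgs []string
	for _, p := range plugins {
		if err := p.Validate(); err != nil {
			msgs = append(msgs, fmt.Sprintf("%s: %v", p.Name(), err))
		}
	}
	if len(msgs) == 0 {
		return nil
	}
	return fmt.Errorf("validation failed: %s", strings.Join(msgs, "; "))
}

func main() {
	plugins := []Plugin{
		fakePlugin{name: "mysql"},
		fakePlugin{name: "console", err: errMissingURL},
	}
	if err := validateAll(plugins); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("all plugins valid, starting extraction")
}
```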

Add metadata extractor for caraml-store

  • Add unit tests
  • Add extractor
  • Add README.md in plugins/extractors/caramlstore, defining output
  • Register your extractor in plugins/extractors/populate.go
  • Add the extractor to the extractor list in docs/reference/extractor.md

Add Tableau metadata extractor

Deliverables

  • add unit tests
  • add extractor
  • add README.md in plugins/extractors/tableau, defining output
  • register your extractor in plugins/extractors/populate.go
  • add the extractor to the extractor list in docs/reference/extractor.md

Outputs

Field Sample Value
urn tableau.dashboard_name
name dashboard_name
source tableau
description table description
schema []Chart

Chart

Field Sample Value
urn tableau.dashboard_name.chart_name
source tableau
dashboard_urn tableau.dashboard_name
description chart description

Add metadata extractor for snowflake

Deliverables

  • add unit tests
  • add extractor
  • add README.md in plugins/extractors/snowflake, defining output
  • register your extractor in plugins/extractors/populate.go
  • add the extractor to the extractor list in docs/reference/extractor.md

Output may contain a Table or any field from Table.pb.go

Table

Field Sample Value
urn my_database.my_table
name my_table
source snowflake
description table description
profile.total_rows 2100
schema [][Column]

Column

Field Sample Value
name total_price
description item's total price
data_type decimal
is_nullable true
length 12,2

Add metadata extractor for GCS

Proposal

  • We should create a Bucket type proto to support cloud storage metadata, such as Google Cloud Storage, AWS S3, and Aliyun Storage.
  • We should create a Blobs facet to enrich the Bucket proto with the metadata of the blobs inside a bucket.
  • We should extract metadata from Google Cloud Storage and map it to the created Bucket type.
  • The metadata extracted from Google Cloud Storage can be at the bucket and blob levels.

Outputs

Field Sample Value
urn project_id/bucket_name
name bucket_name
source googlecloudstorage
location ASIA
storage_type STANDARD
tags []{key:value}
timestamps.created_at.seconds 1551082913
timestamps.created_at.nanos 1551082913

Blob

Field Sample Value
urn project_id/bucket_name/blob_path
name blob_path
size 311
deleted_at.seconds 1551082913
expired_at.seconds 1551082913
tags []{key:value}
ownership.owners []{name:[email protected]}
timestamps.created_at.seconds 1551082913
timestamps.created_at.nanos 1551082913
timestamps.updated_at.seconds 1551082913
timestamps.updated_at.nanos 1551082913

Deliverables

  • add unit tests
  • add extractor
  • add README.md in plugins/extractors/gcs, defining output
  • register your extractor in plugins/extractors/populate.go
  • add the extractor to the extractor list in docs/reference/extractor.md

Use emitter in extractor interface

Instead of using a channel in the extractor interface, we can use an emitter. This simplifies writing plugins for end users, while the emitter takes care of streaming data to processors and sinks.

emitter.Emit(interface{})
emitter.Close()
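
The two calls above could be backed by a thin wrapper around a channel. The following is a sketch under that assumption: the Emitter type, NewEmitter, Records, and the extract function are hypothetical, and only Emit and Close come from the proposal.

```go
package main

import "fmt"

// Emitter is a hypothetical wrapper that hides the channel from plugin authors.
type Emitter struct {
	out chan interface{}
}

// NewEmitter creates an emitter with the given channel buffer size.
func NewEmitter(buffer int) *Emitter {
	return &Emitter{out: make(chan interface{}, buffer)}
}

// Emit sends one record downstream; it blocks while the buffer is full.
func (e *Emitter) Emit(v interface{}) { e.out <- v }

// Close signals that the extractor has no more records.
func (e *Emitter) Close() { close(e.out) }

// Records exposes the stream to the consuming side (processors and sinks).
func (e *Emitter) Records() <-chan interface{} { return e.out }

// extract plays the role of an extractor plugin: it only calls Emit and Close.
func extract(e *Emitter) {
	defer e.Close()
	for _, name := range []string{"orders", "customers"} {
		e.Emit(name)
	}
}

func main() {
	e := NewEmitter(0)
	go extract(e)
	for rec := range e.Records() {
		fmt.Println("received:", rec)
	}
}
```

With this shape the plugin never touches a channel directly, so the streaming mechanism can change later without breaking plugin code.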

Add metadata extractor for hive

Deliverables

  • add unit tests
  • add extractor
  • add README.md in plugins/extractors/hive, defining output
  • register your extractor in plugins/extractors/populate.go
  • add the extractor to the extractor list in docs/reference/extractor.md

Output must contain a Table

Table

Field Sample Value
urn my_database.my_table
name my_table
source hive
description table description
profile.total_rows 2100
schema [][Column]

Column

Field Sample Value
name total_price
description item's total price
data_type decimal
is_nullable true
length 12,2

Add metadata extractor for postgres

Deliverables

  • add unit tests
  • add extractor
  • add README.md in plugins/extractors/postgres, defining output
  • register your extractor in plugins/extractors/populate.go
  • add the extractor to the extractor list in docs/reference/extractor.md

Output must contain a Table

Table

Field Sample Value
urn my_database.my_table
name my_table
source postgres
description table description
profile.total_rows 2100
schema [][Column]

Column

Field Sample Value
name total_price
description item's total price
data_type decimal
is_nullable true
length 12,2

Add metadata extractor for druid

Deliverables

  • add unit tests
  • add extractor
  • add README.md in plugins/extractors/druid, defining output
  • register your extractor in plugins/extractors/populate.go
  • add the extractor to the extractor list in docs/reference/extractor.md

Output must contain a Table

Explore the Table Data Model and add as many features as possible.

Table

Field Sample Value
urn table.urn
name my_table
source druid
description table description

Add version support in recipes

Is your feature request related to a problem? Please describe.
If Meteor introduces breaking changes or features in recipes, older recipes will stop working.

Describe the solution you'd like
Meteor should add a version field in recipes to detect which API version is used for that given recipe.

name: sample-recipe
version: v1beta1
source:
...

Describe alternatives you've considered
None
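
Once a recipe carries a version field, the runner can reject recipes it does not understand before doing any work. The sketch below assumes a hypothetical Recipe struct that has already been parsed from YAML, and a hypothetical supportedVersions set. Only the v1beta1 value comes from the example above.

```go
package main

import "fmt"

// Recipe holds the two fields relevant to version detection.
// Parsing the YAML file into this struct is out of scope for this sketch.
type Recipe struct {
	Name    string
	Version string
}

// supportedVersions is an assumed set of recipe API versions the runner understands.
var supportedVersions = map[string]bool{"v1beta1": true}

// checkVersion rejects recipes with a missing or unsupported version.
func checkVersion(r Recipe) error {
	if r.Version == "" {
		return fmt.Errorf("recipe %q: missing version field", r.Name)
	}
	if !supportedVersions[r.Version] {
		return fmt.Errorf("recipe %q: unsupported version %q", r.Name, r.Version)
	}
	return nil
}

func main() {
	for _, r := range []Recipe{
		{Name: "sample-recipe", Version: "v1beta1"},
		{Name: "old-recipe"},
	} {
		if err := checkVersion(r); err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Printf("recipe %q uses %s\n", r.Name, r.Version)
	}
}
```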

Improve documentation

Add detailed documentation of each metadata model and all their properties for a given version.

Add a feature matrix for extractor
Create a feature matrix of extractors showing which metadata models they collect. For example, show all table extractors in one table, with the features supported by each extractor.

Add concept about recipe
Add complete details about what recipes are, what they do, how to write them, etc.

Add metadata extractor for glue

Deliverables

  • add unit tests
  • add extractor
  • add README.md in plugins/extractors/glue, defining output
  • register your extractor in plugins/extractors/populate.go
  • add the extractor to the extractor list in docs/reference/extractor.md

Output must contain a Table

Explore the Table Data Model and add as many features as possible.

Table

Field Sample Value
urn my_database.my_table
name my_table
source glue
description table description
schema [][Column]

Column

Field Sample Value
name total_price
description item's total price
data_type decimal

Version command for meteor CLI

Meteor should provide a version command that reports the current version of the CLI and whether a newer version is available.

$ meteor version

Add metadata extractor for mysql

Deliverables

  • add unit tests
  • add extractor
  • add README.md in plugins/extractors/mysql, defining output
  • register your extractor in plugins/extractors/populate.go
  • add the extractor to the extractor list in docs/reference/extractor.md

Output must contain a Table

Table

Field Sample Value
urn my_database.my_table
name my_table
source mysql
description table description
profile.total_rows 2100
schema [][Column]

Column

Field Sample Value
name total_price
description item's total price
data_type decimal
is_nullable true
length 12,2

Connect sql databases with connection strings

Currently, we expect the user to pass a UserID and Password in the recipe, and we use a static URI to set up the connection to the database. For example, for mysql it is fmt.Sprintf("%s:%s@tcp(%s)/", config.UserID, config.Password, config.Host).

These values are also mandatory, and we return an error if they are absent. But many databases and dashboards can be used with a default username and no password at all, e.g. root@tcp(localhost:3306)/. So, I think we should allow it.

Update the extractor config to make Username and Password optional, and make the connection URI dynamic.
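
A dynamic URI along these lines could be built as in the sketch below. The Config struct reuses the field names from the snippet above, while the mysqlDSN function and the fallback to the root user are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Config uses the field names from the issue; UserID and Password are optional here.
type Config struct {
	UserID   string
	Password string
	Host     string
}

// mysqlDSN builds the connection string dynamically.
// Assumption: an empty UserID falls back to "root", and the ":password"
// part is left out entirely when no password is configured.
func mysqlDSN(c Config) (string, error) {
	if c.Host == "" {
		return "", errors.New("host is required")
	}
	user := c.UserID
	if user == "" {
		user = "root"
	}
	if c.Password == "" {
		return fmt.Sprintf("%s@tcp(%s)/", user, c.Host), nil
	}
	return fmt.Sprintf("%s:%s@tcp(%s)/", user, c.Password, c.Host), nil
}

func main() {
	for _, c := range []Config{
		{Host: "localhost:3306"},
		{UserID: "admin", Password: "secret", Host: "db:3306"},
	} {
		dsn, err := mysqlDSN(c)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(dsn)
	}
}
```

An alternative design is to accept a full connection string in the recipe and skip the assembly step altogether.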

Add metadata extractor for dbt

Write a metadata extractor for dbt

Deliverables

  • describe the metadata that can be returned, in the comments here.
  • describe the data model that can be used for output, i.e., whether the metadata is in the form of a Table, Dashboard, User, etc.
  • add unit tests
  • add extractor
  • add README.md in plugins/extractors/dbt, defining output
  • register your extractor in plugins/extractors/populate.go
  • add the extractor to the extractor list in docs/reference/extractor.md

Add metadata extractor for superset

Deliverables

  • add unit tests ( TODO: Create user in superset )
  • add extractor
  • add README.md in plugins/extractors/superset, defining output
  • register your extractor in plugins/extractors/populate.go
  • add the extractor to the extractor list in docs/reference/extractor.md

Outputs

Field Sample Value
resource.urn superset.dashboard_name
resource.name dashboard_name
resource.service superset
resource.url dashboard_url

Chart

Field Sample Value
name chart_name
dashboard_source superset
description chart_description
url chart_url
datasource chart_datasource
dashboard_urn dashboard:dashboard_id

Generate command for recipes

Meteor should provide a generate command for recipes.

$ meteor gen recipe --source=name --sink=name --processor=name

Add metadata extractor for looker

Deliverables

  • add unit tests
  • add extractor
  • add README.md in plugins/extractors/looker, defining output
  • register your extractor in plugins/extractors/populate.go
  • add the extractor to the extractor list in docs/reference/extractor.md

Outputs

Field Sample Value
urn looker.dashboard_name
name dashboard_name
source looker
description table description
schema []Chart

Chart

Field Sample Value
urn looker.chart_name
dashboard_urn looker.dashboard_name
description chart description

Add metadata extractor for dremio

Deliverables

  • add unit tests
  • add extractor
  • add README.md in plugins/extractors/dremio, defining output
  • register your extractor in plugins/extractors/populate.go
  • add the extractor to the extractor list in docs/reference/extractor.md

Output must contain a Table

Table

Field Sample Value
urn my_database.my_table
name my_table
source dremio
description table description
profile.total_rows 2100
schema [][Column]

Column

Field Sample Value
name total_price
description item's total price
data_type decimal
is_nullable true
length 12,2

Add metadata extractor for grafana

Feature: Allows meteor to extract metadata from Grafana.

Deliverables

  • add unit tests
  • add extractor
  • add README.md in plugins/extractors/grafana, defining output
  • register your extractor in plugins/extractors/populate.go
  • add the extractor to the extractor list in docs/reference/extractor.md

Outputs

Field Sample Value
urn grafana.HzK8qNW7z
name new-dashboard-copy
source grafana
url http://localhost:3000/d/HzK8qNW7z/new-dashboard-copy
charts []chart

Chart

Field Sample Value
urn 5WsKOvW7z.4
name Panel Random
type table
source grafana
description random description for this panel
url http://localhost:3000/d/5WsKOvW7z/test-dashboard-updated?viewPanel=4
data_source postgres
raw_query SELECT\n urn,\n created_at AS \"time\"\nFROM resources\nORDER BY 1
dashboard_urn grafana.5WsKOvW7z
dashboard_source grafana

Add metadata extractor for bigtable

We should show basic metadata for all Google Bigtable instances belonging to a Google Cloud Project. For each table in an instance, we can show Column Family names and their GC Policy.

Add metadata extractor for csv

Deliverables

  • add unit tests
  • add extractor
  • add README.md in plugins/extractors/csv, defining output
  • register your extractor in plugins/extractors/populate.go
  • add the extractor to the extractor list in docs/reference/extractor.md

Outputs

Field Sample Value
urn filename.csv
name filename.csv
source csv
schema.columns []Column

Column

Field  Sample Value  Description
name   order_id      CSV header, e.g. the first row
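
Reading column names from the first row can be done with Go's standard encoding/csv package. The sketch below is illustrative only: the Table and Column types and the extractCSV and extractFromString functions are hypothetical and do not reflect Meteor's real extractor interface.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// Column and Table are hypothetical types matching the output fields above.
type Column struct{ Name string }

type Table struct {
	URN     string
	Name    string
	Source  string
	Columns []Column
}

// extractCSV reads only the header row and maps each header cell to a column.
func extractCSV(filename string, r io.Reader) (Table, error) {
	header, err := csv.NewReader(r).Read()
	if err != nil {
		return Table{}, fmt.Errorf("read header of %s: %w", filename, err)
	}
	t := Table{URN: filename, Name: filename, Source: "csv"}
	for _, h := range header {
		t.Columns = append(t.Columns, Column{Name: strings.TrimSpace(h)})
	}
	return t, nil
}

// extractFromString is a convenience wrapper for in-memory CSV content.
func extractFromString(filename, content string) (Table, error) {
	return extractCSV(filename, strings.NewReader(content))
}

func main() {
	t, err := extractFromString("filename.csv", "order_id,total_price\n1,9.99\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", t)
}
```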

Add metadata extractor for presto

Write a metadata extractor for Presto view

Deliverables

  • add unit tests
  • add extractor
  • add README.md in plugins/extractors/presto, defining output
  • register your extractor in plugins/extractors/populate.go
  • add the extractor to the extractor list in docs/reference/extractor.md

Output must contain a Table

Table

Field Sample Value
urn my_database.my_table
name my_table
source presto
description table description
profile.total_rows 2100
schema [][Column]

Column

Field Sample Value
name total_price
description item's total price
data_type decimal
length 12,2

Add metabase metadata extractor

Deliverables

  • add unit tests
  • add extractor
  • add README.md in plugins/extractors/metabase, defining output
  • register your extractor in plugins/extractors/populate.go
  • add the extractor to the extractor list in docs/reference/extractor.md

Outputs

Field Sample Value
urn metabase.dashboard_name
name dashboard_name
source metabase
description table description
schema []Chart

Chart (Basically the cards in metabase)

Field Sample Value
urn metabase.dashboard_name.card_name
source metabase
dashboard_urn metabase.dashboard_name
description card description

Support custom processors

Custom processors are needed to:

  • fetch data from other services to enrich metadata
  • perform custom calculations or aggregations

Every team will have different use cases and it is not ideal for them to submit a new PR for every processor that is only applicable to their use cases.
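
One possible shape for this is a small processor interface that teams implement in their own code. The Record type, the Processor interface, the ProcessorFunc adapter, and the addOwner example below are hypothetical names for this sketch, not Meteor's actual processor API.

```go
package main

import "fmt"

// Record is a hypothetical, loosely typed metadata record.
type Record map[string]interface{}

// Processor is the assumed extension point a team would implement.
type Processor interface {
	Process(Record) (Record, error)
}

// ProcessorFunc lets a plain function satisfy Processor.
type ProcessorFunc func(Record) (Record, error)

func (f ProcessorFunc) Process(r Record) (Record, error) { return f(r) }

// runProcessors applies the processors in order and stops at the first error.
func runProcessors(r Record, ps ...Processor) (Record, error) {
	for _, p := range ps {
		var err error
		if r, err = p.Process(r); err != nil {
			return nil, err
		}
	}
	return r, nil
}

// addOwner is an example of a team-specific enrichment step.
func addOwner(owner string) Processor {
	return ProcessorFunc(func(r Record) (Record, error) {
		r["owner"] = owner
		return r, nil
	})
}

func main() {
	out, err := runProcessors(Record{"name": "my_table"}, addOwner("data-team"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```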

Calculate total data count on a recipe run

Is your feature request related to a problem? Please describe.
As a user, I want to be able to check and monitor how much metadata my recipe is processing/extracting.

Describe the solution you'd like

  1. Gather additional metrics for recipe run total data (e.g. runDataCount)
  2. Print out a run report in a tabular format after all recipes are finished running

Describe alternatives you've considered
None
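
The two steps could look roughly like this. The RunResult type and the totalCount and printReport functions are hypothetical. The RunDataCount field borrows its name from the runDataCount example above, and the tabular layout uses Go's standard text/tabwriter package.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"text/tabwriter"
)

// RunResult is a hypothetical per-recipe summary collected during a run.
type RunResult struct {
	Recipe       string
	RunDataCount int
	Success      bool
}

// totalCount sums the records processed across all recipes.
func totalCount(results []RunResult) int {
	total := 0
	for _, r := range results {
		total += r.RunDataCount
	}
	return total
}

// printReport writes an aligned table with one row per recipe and a total row.
func printReport(w io.Writer, results []RunResult) error {
	tw := tabwriter.NewWriter(w, 0, 4, 2, ' ', 0)
	fmt.Fprintln(tw, "RECIPE\tRECORDS\tSUCCESS")
	for _, r := range results {
		fmt.Fprintf(tw, "%s\t%d\t%t\n", r.Recipe, r.RunDataCount, r.Success)
	}
	fmt.Fprintf(tw, "TOTAL\t%d\t\n", totalCount(results))
	return tw.Flush()
}

func main() {
	results := []RunResult{
		{Recipe: "mysql-recipe", RunDataCount: 2100, Success: true},
		{Recipe: "csv-recipe", RunDataCount: 3, Success: true},
	}
	if err := printReport(os.Stdout, results); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
	}
}
```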

Lint command for recipes

Meteor should provide a lint command to detect recipe issues for a given file path or folder.

$ meteor lint [path]

Add metadata extractor for elastic search

Deliverables

  • add unit tests
  • add extractor
  • add README.md in plugins/extractors/elastic, defining output
  • register your extractor in plugins/extractors/populate.go
  • add the extractor to the extractor list in docs/reference/extractor.md

Info command for plugin information

Meteor should provide a command to give detailed info about a plugin

$ meteor info extractor [name]
$ meteor info sink [name]
$ meteor info processor [name]

Add metadata extractor for delta_lake

Deliverables

  • add unit tests
  • add extractor
  • add README.md in plugins/extractors/deltalake, defining output
  • register your extractor in plugins/extractors/populate.go
  • add the extractor to the extractor list in docs/reference/extractor.md

Output must contain a Table

Table

Field Sample Value
urn my_database.my_table
name my_table
source delta_lake
description table description
profile.total_rows 2100
schema [][Column]

Column

Field Sample Value
name total_price
description item's total price
data_type decimal
is_nullable true
length 12,2

Add metadata extractor for cassandra

Deliverables

  • add unit tests
  • add extractor
  • add README.md in plugins/extractors/cassandra, defining output
  • register your extractor in plugins/extractors/populate.go
  • add the extractor to the extractor list in docs/reference/extractor.md

Output must contain a Table

Explore the Table Data Model and add as many features as possible.

Table

Field Sample Value
urn SSTable
name my_table
source cassandra
description table description

need more context here

Add metadata extractor for github

Deliverables

  • add extractor
  • add README.md in plugins/extractors/github, defining output
  • add the extractor to the extractor list in docs/reference/extractor.md

Outputs

Field Sample Value
urn https://github.com/ravisuhag
email [email protected]
username ravisuhag
full_name Ravi Suhag
is_active true

Add a tour for using meteor

Add a tour for users to use meteor end to end.

  1. Install meteor
  2. Give an overview of plugins and explain list plugins commands.
  3. Create and lint recipe
  4. Run meteor
  5. Deploy meteor
