Topic: data-pipeline
Something interesting about data-pipeline
data-pipeline,A list of useful resources to learn Data Engineering from scratch
User: adilkhash
data-pipeline,Data pipeline for crawling PDFs from the Web and transforming their contents into structured data using AWS Textract. Built with AWS CDK + TypeScript
User: aeksco
data-pipeline,Pythonic tool for orchestrating machine-learning/high performance/quantum-computing workflows in heterogeneous compute environments.
Organization: agnostiqhq
Home Page: https://www.covalent.xyz
data-pipeline,The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Organization: airbytehq
Home Page: https://airbyte.com
data-pipeline,An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
User: airscholar
Home Page: https://www.youtube.com/watch?v=GqAcTrqKcrY
data-pipeline,Serverless Data Pipeline powered by Kinesis Firehose, API Gateway, Lambda, S3, and Athena
User: alexcasalboni
data-pipeline,Flink CDC is a streaming data integration tool
Organization: apache
Home Page: https://nightlies.apache.org/flink/flink-cdc-docs-stable
data-pipeline,SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
Organization: apache
Home Page: https://seatunnel.apache.org/
data-pipeline,ingestr is a CLI tool to copy data between any databases with a single command seamlessly.
Organization: bruin-data
Home Page: https://bruin-data.github.io/ingestr/
data-pipeline,BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.
Organization: bytedance
Home Page: https://bytedance.github.io/bitsail/
data-pipeline,Conduit streams data between data stores. Kafka Connect replacement. No JVM required.
Organization: conduitio
Home Page: https://conduit.io
data-pipeline,Learn the basics of Apache Kafka® from leaders in the Kafka community with these video courses covering the Kafka ecosystem and hands-on exercises.
Organization: confluentinc
Home Page: https://developer.confluent.io/
data-pipeline,Use SQL to build ELT pipelines on a data lakehouse.
Organization: cuebook
Home Page: https://cuelake.cuebook.ai
data-pipeline,Example end-to-end data engineering project.
User: damklis
data-pipeline,Performance Observability for Apache Spark
Organization: dataflint
data-pipeline,A Data Engineering project. Repository for backend infrastructure and Streamlit app files for a Premier League Dashboard.
User: digitalghost-dev
Home Page: https://streamlit.digitalghost.dev/
data-pipeline,The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
Organization: elementary-data
Home Page: https://www.elementary-data.com/
data-pipeline,Feldera Continuous Analytics Platform
Organization: feldera
Home Page: https://feldera.com
data-pipeline,Source code accompanying the book Data Science on the Google Cloud Platform, by Valliappa Lakshmanan (O'Reilly, 2017)
Organization: googlecloudplatform
data-pipeline,Watchmen Platform is a low-code data platform for data pipelines, metadata management, analysis, and quality management
Organization: indexical-metrics-measure-advisory
Home Page: https://imma-watchmen.com/
data-pipeline,A list about Apache Kafka
User: infoslack
data-pipeline,Code review for data in dbt
Organization: infuseai
Home Page: https://www.piperider.io/
data-pipeline,Jayvee is a domain-specific language and runtime for automated processing of data pipelines
Organization: jvalue
Home Page: https://jvalue.github.io/jayvee/
data-pipeline,Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
Organization: kestra-io
Home Page: https://kestra.io
data-pipeline,(Project & tutorial) DAG pipeline tests + CI/CD setup
User: marcosmarxm
data-pipeline,Memphis.dev is a highly scalable and effortless data streaming platform
Organization: memphisdev
Home Page: https://memphis.dev
data-pipeline,Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
User: msamogh
data-pipeline,🔥🔥🔥 Open Source Hightouch, Census, and RudderStack Alternative
Organization: multiwoven
Home Page: https://multiwoven.com
data-pipeline,Fluent data pipelines for Python and your shell
User: olirice
data-pipeline,Bulk Stash is a Docker rclone service to sync, or copy, files between different storage services. For example, you can copy files from one remote storage service to another, such as from Amazon S3 to Google Cloud Storage, or from your laptop to a remote storage service.
Organization: openbridge
Home Page: https://www.openbridge.com
data-pipeline,Data pipelines from re-usable components
Organization: patterns-app
data-pipeline,Making DAG construction easier
Organization: pipeline-tools
Home Page: https://pipeline-tools.github.io/gusty-docs/
data-pipeline,task management & automation tool
Organization: pydoit
Home Page: http://pydoit.org
data-pipeline,A lightweight stream processing library for Go
User: reugn
Home Page: https://pkg.go.dev/github.com/reugn/go-streams
data-pipeline,Privacy and Security focused Segment-alternative, in Golang and React
Organization: rudderlabs
Home Page: https://www.rudderstack.com/
data-pipeline,A Clojure machine learning library
Organization: scicloj
data-pipeline,The enterprise-grade behavioral data engine (web, mobile, server-side, webhooks), running cloud-natively on AWS and GCP
Organization: snowplow
Home Page: http://snowplowanalytics.com
data-pipeline,Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
Organization: sparkfish
Home Page: https://github.com/sparkfish/augraphy
data-pipeline,Smarter data pipelines for audio.
Organization: spotify
Home Page: https://docs.klio.io
data-pipeline,Practical Data Engineering: A Hands-On Real-Estate Project Guide
Organization: sspaeti-com
Home Page: https://ssp.sh/blog/data-engineering-project-in-twenty-minutes
data-pipeline,Streaming reactive and dataflow graphs in Python
Organization: streamlet-dev
data-pipeline,The Data Engineering Book: a data engineering book by Thais, for Thais
Organization: thedataengineeringbook
Home Page: https://thedataengineeringbook.online
data-pipeline,Automated Tool for Optimized Modelling
User: tvdboom
Home Page: https://tvdboom.github.io/ATOM/
data-pipeline,Tool to automate data quality checks on data pipelines
Organization: ubisoft
Home Page: https://ubisoft.github.io/mobydq/
data-pipeline,Content for architecting a data science platform for products using Luigi, Spark & Flask.
Organization: unnati-xyz
Home Page: http://www.unnati.xyz
data-pipeline,Build and deploy a serverless data pipeline on AWS with no effort.
User: vincentclaes
Home Page: https://pypi.org/project/datajob/
data-pipeline,An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
Organization: whylabs
Home Page: https://whylogs.readthedocs.io/
data-pipeline,Tools for ASR Corpus Generation from Online Video
User: yc9701