dvryaboy
Type: User
Apache Incubator Proposal for Parquet Format
A curated list of awesome big data frameworks, resources and other awesomeness.
Prototype Bud runtime (Bloom Under Development)
Cascading is a feature rich API for defining and executing complex and fault tolerant data processing workflows on a Hadoop cluster.
Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, and HBase code.
Elephant Twin is a framework for creating indexes in Hadoop
Elephant Twin LZO uses Elephant Twin to create LZO block indexes
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. The system is centrally managed and allows for intelligent dynamic management. It uses a simple extensible data model that allows for online analytic applications.
Mirror of Apache Giraph
The GitBook documentation for Aqueduct
Patched, refactored version of code.google.com/hadoop-gpl-compression for Hadoop 0.20
This document attempts to capture useful patterns and warn about subtle gotchas when it comes to designing and evolving schemas for long-term serialized data. It is not intended as a guide for how to best represent a particular dataset or process.
Source examples to support the "Cascading for the Impatient" blog post series
Mirror of Apache Parquet
Mirror of Apache Parquet
lakeFS - Data version control for your data lake | Git for data
The Mass Spec Query Language (MassQL) is a domain specific language meant to be a succinct way to express a query in a mass spectrometry centric fashion.
As we are moving to Apache, please open your pull requests on: https://github.com/apache/incubator-parquet-format
Plugin for Pentaho Data Integration allowing reading and writing of Google Spreadsheets
Mirror of Apache Pig
Eclipse plugin for Apache Pig
PigLatin mode for Emacs.
an anagram
A Scala API for Cascading
Scribe is a server for aggregating log data streamed in real time from a large number of servers. It is designed to be scalable, extensible without client-side modification, and robust to failure of the network or any specific machine.
Java library relying on semver.org principles to check binary code compatibility
Vertica Hadoop Connector