dvryaboy

followers: 61 following: 1 repos: 27 gists: 6

Type: User

dvryaboy's Projects

apache-proposal

Apache Incubator Proposal for Parquet Format

awesome-bigdata

A curated list of awesome big data frameworks, resources and other awesomeness.

bud

Prototype Bud runtime (Bloom Under Development)

cascading

Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing workflows on a Hadoop cluster.

elephant-bird

Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, and HBase code.

elephant-twin

Elephant Twin is a framework for creating indexes in Hadoop

elephant-twin-lzo

Elephant Twin LZO uses Elephant Twin to create LZO block indexes

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. The system is centrally managed and allows for intelligent dynamic management. It uses a simple extensible data model that allows for online analytic applications.

giraph

Mirror of Apache Giraph

gitbook

The GitBook documentation for Aqueduct

hadoop-lzo

Patched, refactored version of code.google.com/hadoop-gpl-compression for Hadoop 0.20

idl_storage_guidelines

This document attempts to capture useful patterns and warn about subtle gotchas when it comes to designing and evolving schemas for long-term serialized data. It is not intended as a guide for how to best represent a particular dataset or process.

impatient

source examples to support the "Cascading for the Impatient" blog post series

incubator-parquet-format

Mirror of Apache Parquet

incubator-parquet-mr

Mirror of Apache Parquet

lakefs

lakeFS - Data version control for your data lake | Git for data

massquerylanguage

The Mass Spec Query Language (MassQL) is a domain specific language meant to be a succinct way to express a query in a mass spectrometry centric fashion.

parquet-format-1

As we are moving to Apache, please open your pull requests on: https://github.com/apache/incubator-parquet-format

pdi-google-spreadsheet-plugin

Plugin for Pentaho Data Integration allowing reading and writing of Google Spreadsheets

pig

Mirror of Apache Pig

pigeditor

Eclipse plugin for Apache Pig

piglatin-mode

PigLatin mode for Emacs.

redelm

an anagram

scalding

A Scala API for Cascading

scribe

Scribe is a server for aggregating log data streamed in real time from a large number of servers. It is designed to be scalable, extensible without client-side modification, and robust to failure of the network or any specific machine.

semantic-versioning

Java library relying on semver.org principles to check binary code compatibility

vertica-hadoop-connector

Vertica Hadoop Connector
