hive-orc-mr

Applications based on Hadoop & HBase

Env

  • Hive-0.11.0
  • Hadoop-1.1.2
  • JDK-1.6.0_35 or later

ORC File Format

The Optimized Row Columnar (ORC) file format provides a highly efficient way to store Hive data. It was designed to overcome limitations of the other Hive file formats. Using ORC files improves performance when Hive is reading, writing, and processing data.

Compared with the RCFile format, for example, the ORC file format has many advantages, such as:

  • a single file as the output of each task, which reduces the NameNode's load
  • Hive type support including datetime, decimal, and the complex types (struct, list, map, and union)
  • light-weight indexes stored within the file
    • skip row groups that don't pass predicate filtering
    • seek to a given row
  • block-mode compression based on data type
    • run-length encoding for integer columns
    • dictionary encoding for string columns
  • concurrent reads of the same file using separate RecordReaders
  • ability to split files without scanning for markers
  • a bound on the amount of memory needed for reading or writing
  • metadata stored using Protocol Buffers, which allows addition and removal of fields
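To make the encoding and index bullets above more concrete, here is a minimal Java sketch. It is an illustration under stated assumptions, not ORC's implementation: the class `OrcIdeasSketch` and its methods are hypothetical, the sample values are made up (borrowing the `stat` and `city` column names from the table below), and ORC's real integer and string encodings are more elaborate than the plain (value, run length) pairs and first-appearance dictionary ids used here. The index part assumes each row group records the min and max of a column, so a reader can skip any group whose range cannot satisfy the predicate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified illustrations of three ideas from the ORC feature
 * list. These are NOT ORC's on-disk encodings or reader logic.
 */
class OrcIdeasSketch {

    /** Run-length encodes ints as {value, runLength} pairs (simplified RLE). */
    static List<int[]> runLengthEncode(int[] values) {
        List<int[]> runs = new ArrayList<int[]>();
        int i = 0;
        while (i < values.length) {
            int j = i;
            // Extend the run while the same value repeats.
            while (j < values.length && values[j] == values[i]) {
                j++;
            }
            runs.add(new int[] {values[i], j - i});
            i = j;
        }
        return runs;
    }

    /**
     * Dictionary-encodes strings: each distinct value is appended to
     * {@code dictionary} once, and each row is replaced by its index there.
     */
    static int[] dictionaryEncode(String[] values, List<String> dictionary) {
        Map<String, Integer> ids = new HashMap<String, Integer>();
        int[] encoded = new int[values.length];
        for (int k = 0; k < values.length; k++) {
            Integer id = ids.get(values[k]);
            if (id == null) {
                // First time this string appears: assign the next id.
                id = dictionary.size();
                dictionary.add(values[k]);
                ids.put(values[k], id);
            }
            encoded[k] = id;
        }
        return encoded;
    }

    /**
     * Returns the row groups whose [min, max] statistics could contain
     * {@code target}; all other groups can be skipped for "col = target".
     */
    static List<Integer> rowGroupsToRead(int[] mins, int[] maxs, int target) {
        List<Integer> groups = new ArrayList<Integer>();
        for (int g = 0; g < mins.length; g++) {
            if (mins[g] <= target && target <= maxs[g]) {
                groups.add(g);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical values for an int column such as "stat".
        int[] stat = {200, 200, 200, 200, 404, 200, 200};
        for (int[] run : runLengthEncode(stat)) {
            System.out.println(run[0] + " x" + run[1]);
        }

        // Hypothetical values for a string column such as "city".
        List<String> dict = new ArrayList<String>();
        int[] ids = dictionaryEncode(new String[] {"beijing", "shanghai", "beijing"}, dict);
        System.out.println(dict + " " + Arrays.toString(ids));

        // Three hypothetical row groups with min/max stats; predicate: col = 250.
        System.out.println(rowGroupsToRead(new int[] {0, 100, 500}, new int[] {99, 499, 900}, 250));
    }
}
```

Running `main` prints `200 x4`, `404 x1`, `200 x2`, then `[beijing, shanghai] [0, 1, 0]`, then `[1]`: seven integers collapse into three runs, repeated strings are stored once, and only the middle row group's range can hold 250, so the other two would not need to be read.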

ORC-related HQL


CREATE EXTERNAL TABLE test.test_20130728_orc(
  stat_date string, 
  stat_hour string, 
  ip string, 
  logdate string, 
  method string, 
  url string, 
  uid string, 
  pid string, 
  aid int, 
  wid int, 
  vid int, 
  type int, 
  stat int, 
  mtime float, 
  ptime float, 
  channel string, 
  boxver string, 
  bftime int, 
  country string, 
  province string, 
  city string, 
  isp string, 
  ditchid int, 
  drm int, 
  charge int, 
  ad int, 
  adclick int, 
  groupid int, 
  client int, 
  usertype int, 
  ptolemy int, 
  fixedid string, 
  userid string) 
STORED AS ORC 
LOCATION "/data/test/20130728_orc"
tblproperties ("orc.compress"="ZLIB");

INSERT OVERWRITE TABLE test.test_20130728_orc SELECT * FROM test.test_20130728;

Contributors

  • mayanhui
