
pageman / big-data-hadoop-commands


This project forked from linda-ikechukwu/big-data-hadoop-commands


A list of important commands for some Apache hadoop tasks


Big Data Hadoop Commands

This file lists commonly used commands for basic tasks in the Hadoop open-source big data framework and its major components.

HDFS

HDFS (Hadoop Distributed File System) is the primary storage layer of the Hadoop ecosystem.

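N.B.: HDFS is navigated with the familiar Linux command-line commands (ls, mkdir, cat, rm, ...) run through hdfs dfs, with a '-' prefixed to the command name, e.g. hdfs dfs -ls /. Replace each {placeholder} below with a real value and leave out the braces.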
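Because the '-' prefix convention is mechanical, it can be scripted. The sketch below is illustrative only: `to_hdfs_argv` and `run_hdfs` are hypothetical helper names, and actually executing a command assumes a Hadoop client with `hdfs` on the `PATH`. By default it only prints the translated command (a dry run), so it can be tried without a cluster.

```python
import shlex
import subprocess


def to_hdfs_argv(linux_command: str) -> list[str]:
    """Turn e.g. "ls -R /data" into ["hdfs", "dfs", "-ls", "-R", "/data"].

    Hypothetical helper: it prefixes the first word with '-' and passes
    every other token through unchanged.
    """
    name, *args = shlex.split(linux_command)
    return ["hdfs", "dfs", f"-{name}", *args]


def run_hdfs(linux_command: str, dry_run: bool = True) -> int:
    """Print the translated command (dry run) or execute it (needs Hadoop)."""
    argv = to_hdfs_argv(linux_command)
    if dry_run:
        print(shlex.join(argv))
        return 0
    return subprocess.run(argv, check=False).returncode


if __name__ == "__main__":
    run_hdfs("mkdir -p /user/demo")  # prints: hdfs dfs -mkdir -p /user/demo
    run_hdfs("ls -R /user/demo")     # prints: hdfs dfs -ls -R /user/demo
```

Not every Linux command or option has an HDFS counterpart (recursive listing, for example, is `hdfs dfs -ls -R`), so check `hdfs dfs -help` before relying on a translated command.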

  1. Upload a local file to an HDFS directory
    hdfs dfs -put {local-source-file-path} {hdfs-destination-path}

  2. Download a file from HDFS to a local directory
    hdfs dfs -get {hdfs-source-file-path} {local-destination-path}

  3. Append the contents of a local file to a file on HDFS
    hdfs dfs -appendToFile {local-source-file-path} {hdfs-destination-file-path}

  4. Merge the contents of multiple files in an HDFS directory into one local file, then view its contents to confirm
    hdfs dfs -getmerge {hdfs-source-directory or hdfs-source-files-separated-by-spaces} {local-destination-file}
    cat {local-destination-file}

  5. Merge multiple files on HDFS into one single file on HDFS
    hadoop fs -cat {hdfs-source-files-separated-by-spaces or hdfs-source-directory/*} | hadoop fs -put - {hdfs-destination-file}
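The cat-into-put pipe can also be driven from a script. This is a sketch under stated assumptions: `merge_commands` and `merge_on_hdfs` are hypothetical names, a Hadoop client must be on the `PATH` when `merge_on_hdfs` is called, and `-put` is expected to fail if the destination file already exists. Because no local shell is involved, a glob such as `/dir/*` is passed through unchanged and Hadoop expands it against HDFS.

```python
import subprocess


def merge_commands(sources: list[str], destination: str) -> tuple[list[str], list[str]]:
    """Build the two halves of `hadoop fs -cat SRC... | hadoop fs -put - DST`."""
    cat_cmd = ["hadoop", "fs", "-cat", *sources]
    # "-" tells -put to read the file contents from standard input.
    put_cmd = ["hadoop", "fs", "-put", "-", destination]
    return cat_cmd, put_cmd


def merge_on_hdfs(sources: list[str], destination: str) -> None:
    """Stream the source files through a pipe into one new HDFS file."""
    cat_cmd, put_cmd = merge_commands(sources, destination)
    cat = subprocess.Popen(cat_cmd, stdout=subprocess.PIPE)
    put = subprocess.Popen(put_cmd, stdin=cat.stdout)
    # Close our copy of the pipe so cat receives SIGPIPE if put exits early.
    cat.stdout.close()
    put_rc = put.wait()
    cat_rc = cat.wait()
    if cat_rc != 0 or put_rc != 0:
        raise RuntimeError(f"merge failed: cat exit={cat_rc}, put exit={put_rc}")
```

The `Popen` pair mirrors the shell pipeline: both processes run concurrently, so large inputs are streamed rather than buffered in Python.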

HBase

HBase is a NoSQL, column-oriented database for the Hadoop big data ecosystem.

  1. Create a table (in the default namespace)
    create 'table-name','column-family-name'

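A column family groups related columns that are likely to be queried together. Column families are declared when the table is created, while individual columns within a family (written as 'column-family-name:column-name') can be added at any time by put.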
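To make the layout concrete, here is a toy in-memory model. The class `ToyHBaseTable` is hypothetical and is not an HBase client; it ignores timestamps and versions. It mirrors two real HBase properties: column families are fixed when the table is created, while qualifiers inside a family come into existence whenever `put` writes them, and the cells of a row come back sorted by family and qualifier.

```python
from collections import defaultdict


class ToyHBaseTable:
    """Toy model of an HBase table's layout; not a real client, no versions."""

    def __init__(self, *families: str) -> None:
        # Column families are declared up front, like create 'table', 'cf1', 'cf2'.
        self.families = set(families)
        # row key -> column family -> qualifier -> value
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key: str, column: str, value: str) -> None:
        # column is 'family:qualifier'; a new qualifier needs no schema change.
        family, _, qualifier = column.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][qualifier] = value

    def get(self, row_key: str) -> dict:
        # Return {'family:qualifier': value}, sorted by family then qualifier.
        row = self.rows.get(row_key, {})
        return {
            f"{family}:{qualifier}": row[family][qualifier]
            for family in sorted(row)
            for qualifier in sorted(row[family])
        }


if __name__ == "__main__":
    table = ToyHBaseTable("personal", "office")
    table.put("emp1", "personal:name", "Ada")
    table.put("emp1", "office:phone", "555-0100")
    print(table.get("emp1"))  # {'office:phone': '555-0100', 'personal:name': 'Ada'}
```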

  1. Create a Namespace
    create_namespace 'namespace-name'

A namespace is a logical grouping of tables, analogous to a database in a relational database system. It is used to avoid conflicts among common table names.

  1. Create a table in a namespace
    create 'namespace-name:table-name', 'column-family-name'

For every command that takes a table name, a table created in a namespace must be referenced with its namespace prefix, i.e. 'namespace-name:table-name'.
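When HBase shell commands are generated from a script, the prefix rule can live in one place. `qualified_table_name` is a hypothetical helper; it relies on the fact that tables created without a namespace belong to HBase's built-in `default` namespace, where the short name is enough.

```python
from typing import Optional


def qualified_table_name(table: str, namespace: Optional[str] = None) -> str:
    """Return the table name to pass to shell commands such as put, get or scan."""
    # Tables without an explicit namespace live in the 'default' namespace.
    if namespace in (None, "", "default"):
        return table
    return f"{namespace}:{table}"


if __name__ == "__main__":
    name = qualified_table_name("orders", "sales")
    print(f"get '{name}', 'row1'")  # get 'sales:orders', 'row1'
```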

  1. Add data to a table
    put 'table-name', 'RowKey', 'column-family-name:column-name', 'value'

  2. Query all entries in a row
    get 'table-name', 'RowKey'

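  3. Query the entries in a column (across all rows, or for a single row)
    scan 'table-name', {COLUMNS => 'column-family-name:column-name'}
    get 'table-name', 'RowKey', 'column-family-name:column-name'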

  4. Query a table for a particular value
    scan 'table-name', {FILTER => "ValueFilter(=, 'binary:value')"}

  5. Query a particular column for a particular value
    scan 'table-name', {FILTER => "ColumnPrefixFilter('column-name') AND ValueFilter(=, 'binary:value')"}

  6. Delete a value from a column
    delete 'table-name', 'RowKey', 'column-family-name:column-name'

  7. Delete all data in a row across different columns
    deleteall 'table-name', 'RowKey'

  8. Delete a whole table
    disable 'table-name'
    drop 'table-name'

  9. Create a table pre-split into regions, specifying the start row key of each region after the first
    create 'table-name', 'column-family-name', SPLITS => ['first-split-key', 'second-split-key', ...]

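Specifying n split keys creates n+1 regions: the first region has an empty start key and ends just before the first split key, each split key is the (inclusive) start key of the next region, and the last region has an empty end key.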
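The split rule can be checked with a short sketch. The helpers `region_ranges` and `region_index` are hypothetical; they assume HBase's conventions that a region includes its start key and excludes its end key, and that row keys are compared byte by byte.

```python
from bisect import bisect_right


def region_ranges(split_keys: list[bytes]) -> list[tuple[bytes, bytes]]:
    """Return [start, end) row-key ranges for a table pre-split on split_keys.

    b"" stands for "unbounded": the first region has an empty start key and
    the last region has an empty end key.
    """
    keys = sorted(split_keys)
    return list(zip([b""] + keys, keys + [b""]))


def region_index(row_key: bytes, split_keys: list[bytes]) -> int:
    """Index of the region that holds row_key (start keys are inclusive)."""
    return bisect_right(sorted(split_keys), row_key)


if __name__ == "__main__":
    splits = [b"g", b"p"]
    print(region_ranges(splits))  # [(b'', b'g'), (b'g', b'p'), (b'p', b'')]
    print(region_index(b"g", splits))  # 1
```

Because comparison is byte-wise, b"10" sorts before b"5"; numeric row keys usually need zero-padding (for example b"05") for split points to fall where intended.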

Hive

Hive is a data warehouse system used to query and analyze data stored in HDFS and in other databases and file systems that integrate with Hadoop, using an SQL-like interface (HiveQL).

  1. Get the current date and time
    select from_unixtime(unix_timestamp(), 'dd-MM-yyyy HH:mm');

  2. Create an internal table
    create table <table-name> (<column-name> <data-type>, <column-name> <data-type>, ...) row format delimited fields terminated by ',' stored as textfile;

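Hive manages both the metadata and the data of internal (managed) tables, so dropping one deletes its data files. For external tables Hive manages only the metadata, so the data stays in place when the table is dropped and remains usable by tools outside Hive.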
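The practical difference shows up when a table is dropped. The toy model below (`ToyWarehouse` is hypothetical, and the paths are only examples) records the rule: dropping a managed table removes its metadata and its data files, while dropping an external table removes only the metadata.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWarehouse:
    """Toy model of Hive's DROP TABLE behaviour; not a real Hive API."""

    # table name -> (storage location, is_external)
    metastore: dict = field(default_factory=dict)
    # storage location -> rows stored there
    files: dict = field(default_factory=dict)

    def create(self, name: str, location: str, external: bool = False) -> None:
        self.metastore[name] = (location, external)
        self.files.setdefault(location, [])

    def drop(self, name: str) -> None:
        location, external = self.metastore.pop(name)
        if not external:
            # Managed (internal) table: Hive owns the data, so it is deleted too.
            self.files.pop(location, None)
        # External table: only the metadata entry is removed; the files stay.


if __name__ == "__main__":
    wh = ToyWarehouse()
    wh.create("sales_managed", "/user/hive/warehouse/sales_managed")
    wh.create("sales_ext", "/data/raw/sales", external=True)
    wh.files["/data/raw/sales"].append("2024-01-01,42")
    wh.drop("sales_managed")
    wh.drop("sales_ext")
    print(sorted(wh.files))  # ['/data/raw/sales']
```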

  1. Create an external table
    create external table <table-name> (<column-name> <data-type>, <column-name> <data-type>, ...) row format delimited fields terminated by ',' stored as textfile;

  2. Load data from a local file to hive
    load data local inpath '<path-to-local-file>' into table <table-name>;

  3. Load data from a file on HDFS (Hive moves, rather than copies, the file into the table's directory)
    load data inpath '<path-to-hdfs-file>' into table <table-name>;

  4. Point a new table at existing data when creating it (LOCATION must be an HDFS directory, not a single file)
    create table <table-name> (<column-name> <data-type>, <column-name> <data-type>, ...) row format delimited fields terminated by ',' stored as textfile location '<path-to-hdfs-source-directory>';

  5. Copy only the rows that contain a given column value into another existing table
    insert into table <destination-table-name> select * from <source-table-name> where <column-name> = <value>;

  6. Create a new table from the contents of an existing Hive table (create table as select)
    create table <new-table-name> as select * from <source-table-name>;

  7. Create an empty table with the same schema as an existing table
    create table <new-table-name> like <existing-table-name>;

  8. Query a table for all rows where a column has a particular value
    select * from <table-name> where <column-name> = '<value>';

  9. Query all entries in a table
    select * from <table-name>;

  10. Associate a Hive table with an existing HBase table on table creation
    create external table <external-table-name> (key <row-key-data-type>, <column-name> map<string,<value-data-type>>) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties ("hbase.columns.mapping" = ":key,<hbase-column-family-name>:") tblproperties ("hbase.table.name" = "<hbase-table-name>");

The first entry in hbase.columns.mapping (:key) maps the Hive key column to the HBase row key; an entry of the form 'column-family-name:' with no column name maps every column in that family into the Hive map, keyed by column name.
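Generating this statement from a script helps keep the Hive column list and the HBase mapping in step. `hbase_backed_table_ddl` is a hypothetical helper; it encodes the Hive HBase-integration rules that each Hive column has one entry in `hbase.columns.mapping`, that exactly one entry is `:key`, and that an entry ending in `:` maps a whole column family to a Hive `map`. Running the generated DDL assumes the Hive HBase handler is available to Hive.

```python
def hbase_backed_table_ddl(hive_table: str, hbase_table: str,
                           columns: list[tuple[str, str, str]]) -> str:
    """Build CREATE EXTERNAL TABLE DDL for a Hive table backed by HBase.

    columns holds (hive_column, hive_type, hbase_mapping) triples, where the
    mapping is ':key' for the row key, 'family:qualifier' for one column, or
    'family:' to map a whole column family to a Hive map.
    """
    mappings = [mapping for _, _, mapping in columns]
    if mappings.count(":key") != 1:
        raise ValueError("exactly one column must be mapped to :key")
    column_list = ", ".join(f"{name} {hive_type}" for name, hive_type, _ in columns)
    return (
        f"CREATE EXTERNAL TABLE {hive_table} ({column_list}) "
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' "
        f'WITH SERDEPROPERTIES ("hbase.columns.mapping" = "{",".join(mappings)}") '
        f'TBLPROPERTIES ("hbase.table.name" = "{hbase_table}");'
    )


if __name__ == "__main__":
    print(hbase_backed_table_ddl(
        "emp_hive", "emp",
        [("key", "string", ":key"), ("personal", "map<string,string>", "personal:")],
    ))
```

Using an external table means Hive attaches to an HBase table that already exists, so with default settings dropping the Hive table leaves the HBase data in place.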

Contributors: linda-ikechukwu
