GithubHelp home page GithubHelp logo

webhdfs's Introduction

webhdfs - A client library implementation for Hadoop WebHDFS, and HttpFs, for Ruby

The webhdfs gem is to access Hadoop WebHDFS (EXPERIMENTAL: and HttpFs). WebHDFS::Client is a client class, and WebHDFS::FileUtils is utility like 'fileutils'.

Installation

gem install webhdfs

Usage

WebHDFS::Client

For client object interface:

require 'webhdfs'
client = WebHDFS::Client.new(hostname, port)
# or with pseudo username authentication
client = WebHDFS::Client.new(hostname, port, username)

To create/append/read files:

client.create('/path/to/file', data)
client.create('/path/to/file', data, :overwrite => false, :blocksize => 268435456, :replication => 5, :permission => '0666')

#This does not require whole data in memory, and it can be read chunk by chunk, ex: File data
client.create('/path/to/file', file_IO_handle, :overwrite => false, :permission => 0666)

client.append('/path/to/existing/file', data)

client.read('/path/to/target') #=> data
client.read('/path/to/target' :offset => 2048, :length => 1024) #=> data

To mkdir/rename/delete directories or files:

client.mkdir('/hdfs/dirname')
client.mkdir('/hdfs/dirname', :permission => '0777')

client.rename(original_path, dst_path)

client.delete(path)
client.delete(dir_path, :recursive => true)

To get status or list of files and directories:

client.stat(file_path) #=> key-value pairs for file status
client.list(dir_path)  #=> list of key-value pairs for files in dir_path

And, 'content_summary', 'checksum', 'homedir', 'chmod', 'chown', 'replication' and 'touch' methods available.

For known errors, automated retries are available. Set retry_known_errors option as true.

#### To retry for LeaseExpiredException automatically
client.retry_known_errors = true

# client.retry_interval = 1 # [sec], default: 1
# client.retry_times = 1 # [times], default: 1

WebHDFS::FileUtils

require 'webhdfs/fileutils'
WebHDFS::FileUtils.set_server(host, port)
# or
WebHDFS::FileUtils.set_server(host, port, username, doas)

WebHDFS::FileUtils.copy_from_local(localpath, hdfspath)
WebHDFS::FileUtils.copy_to_local(hdfspath, localpath)

WebHDFS::FileUtils.append(path, data)

For HttpFs

For HttpFs instead of WebHDFS:

client = WebHDFS::Client.new('hostname', 14000)
client.httpfs_mode = true

client.read(path) #=> data

# or with webhdfs/filetuils
WebHDFS::FileUtils.set_server('hostname', 14000)
WebHDFS::FileUtils.set_httpfs_mode
WebHDFS::FileUtils.copy_to_local(remote_path, local_path)

For HTTP Proxy servers

client = WebHDFS::Client.new('hostname', 14000, 'proxy.server.local', 8080)
client.proxy_user = 'jack'   # if needed
client.proxy_pass = 'secret' # if needed

For SSL

Note that net/https and openssl libraries must be available:

client = WebHDFS::Client.new('hostname', 4443)
client.ssl = true
client.ssl_ca_file = "/path/to/ca_file.pem" # if needed
client.ssl_varify_mode = :peer # if needed (:none or :peer)

For Kerberos Authentication

Note that gssapi library must be available:

client = WebHDFS::Client.new('hostname', 14000)
client.kerberos = true

AUTHORS

LICENSE

  • Copyright: Copyright (c) 2012- Fluentd Project
  • License: Apache License, Version 2.0

webhdfs's People

Contributors

kovyrin avatar kzk avatar scauglog avatar skaji avatar soam-verticloud avatar tagomoris avatar uijin avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.