
cephfs-hadoop's Introduction

CephFS Hadoop Plugin!

In a hurry?

  • Install VirtualBox and Vagrant.
  • Make sure they are working correctly.

Then just run:

  • cd ./resources/vagrant
  • vagrant up

Wow! How did you do that? Vagrant

This repository contains the source code for the Hadoop FileSystem (HCFS) implementation on Ceph.

In addition, for developers, it includes a Vagrant recipe for spinning up a single-node Ceph cluster to test the plugin.

The vagrant recipe

  • installs ceph-deploy, ceph, ceph-fuse, etc.
  • installs the ceph java bindings
  • configures and sets up a single node cluster
  • creates a fuse mount in /mnt/ceph
  • installs maven
  • creates a shared directory for development (/ceph-hadoop)
  • creates a shared directory for vagrant setup (/vagrant)
  • installs custom HCFS jars for HADOOP-9361
  • finally runs the entire build, creates the jar, and runs unit tests.

Learning the details

To grok the details, just check out the Vagrantfile. It invokes four provisioning scripts via config.vm.provision; the Java steps boil down to downloading Maven and running mvn clean package.
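As a rough illustration of that flow (this is not the repository's actual Vagrantfile; the box and script names are assumed), the provisioning wiring looks something like:

```ruby
# Hypothetical sketch of the provisioning flow described above.
# Box and script names are assumed, not taken from the repository.
Vagrant.configure("2") do |config|
  config.vm.box = "centos/7"                      # base box (assumed)
  config.vm.synced_folder ".", "/vagrant"         # vagrant setup share
  config.vm.synced_folder "../..", "/ceph-hadoop" # development share

  # Four provisioning scripts, run in order (names hypothetical):
  config.vm.provision "shell", path: "install-ceph.sh"   # ceph-deploy, ceph, ceph-fuse, java bindings
  config.vm.provision "shell", path: "setup-cluster.sh"  # single-node cluster, fuse mount at /mnt/ceph
  config.vm.provision "shell", path: "install-maven.sh"  # maven download
  config.vm.provision "shell", path: "build-and-test.sh" # mvn clean package + unit tests
end
```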

Publishing, deployment, and continuous integration

This is all TBD. For now, we manually publish this jar to Maven Central; see pom.xml for details.

cephfs-hadoop's People

Contributors

dengquan, dotnwat, gregsfortytwo, hellertime, jayunit100, kdunn926, rootfs, shangzhong


cephfs-hadoop's Issues

Does the hadoop plugin access the data as a cluster over the network or locally?

Dear developers,

I'm new to Hadoop.

We have been running CephFS for almost two years, loading it with large files (4 GB to 4 TB in size).

Now we need to process these files, and we are looking at the Hadoop plugin as a way to use MapReduce.

Does the Hadoop plugin access CephFS over the network like a normal cluster, or can I install Hadoop's processing daemons on every Ceph node and process the data locally?

Thanks and regards,
Aristeu

Connection failed

After some configuration steps, things seemed to be working...

hadoop fs -ls /
I got these messages

Loading libcephfs-jni from default path: /usr/local/hadoop/lib/native
Loading libcephfs-jni: Success!
ls: Invalid argument

Hadoop 2.7.3, Ceph 10.2.5

These are basic, unchanged operations, so I don't think the fs API changed. Any help? Or is there some conflict in the newer versions? Thanks!

Vagrant Up Fails

==> default: [INFO] 6 errors
==> default: [INFO] -------------------------------------------------------------
==> default: [INFO] ------------------------------------------------------------------------
==> default: [INFO] BUILD FAILURE
==> default: [INFO] ------------------------------------------------------------------------
==> default: [INFO]
==> default: Total time: 1:41.463s
==> default: [INFO]
==> default: Finished at: Sun Mar 08 07:46:59 UTC 2015
==> default: [INFO] Final Memory: 22M/52M
==> default: [INFO] ------------------------------------------------------------------------
==> default: [ERROR]
==> default: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project cephfs-hadoop: Compilation failure: Compilation failure:
==> default: [ERROR]
==> default: /ceph-hadoop/src/main/java/org/apache/hadoop/fs/ceph/CephInputStream.java:[46,11] cannot find symbol
==> default: [ERROR]
==> default: symbol : class CephFS
==> default: [ERROR]
==> default: location: class org.apache.hadoop.fs.ceph.CephInputStream
==> default: [ERROR] /ceph-hadoop/src/main/java/org/apache/hadoop/fs/ceph/CephInputStream.java:[60,46] cannot find symbol
==> default: [ERROR] symbol : class CephFS
==> default: [ERROR] location: class org.apache.hadoop.fs.ceph.CephInputStream
==> default: [ERROR] /ceph-hadoop/src/main/java/org/apache/hadoop/fs/ceph/CephOutputStream.java:[50,11] cannot find symbol
==> default: [ERROR] symbol : class CephFS
==> default: [ERROR] location: class org.apache.hadoop.fs.ceph.CephOutputStream
==> default: [ERROR] /ceph-hadoop/src/main/java/org/apache/hadoop/fs/ceph/CephOutputStream.java:[62,47] cannot find symbol
==> default: [ERROR] symbol : class CephFS
==> default: [ERROR] location: class org.apache.hadoop.fs.ceph.CephOutputStream
==> default: [ERROR] /ceph-hadoop/src/main/java/org/apache/hadoop/fs/ceph/CephTalker.java:[43,26] cannot find symbol
==> default: [ERROR] symbol: class CephFS
==> default: [ERROR] /ceph-hadoop/src/main/java/org/apache/hadoop/fs/ceph/CephFileSystem.java:[71,11] cannot find symbol
==> default: [ERROR] symbol : class CephFS
==> default: [ERROR] location: class org.apache.hadoop.fs.ceph.CephFileSystem

hadoop & ceph version

Is there a version-compatibility list for Hadoop and Ceph? Does the current package support Hadoop 3.0 and above?

The "ceph.root.dir" config does not seem to work

I think that if "ceph.root.dir" is configured, Hadoop should use it as the root directory. For example, with this configuration:
<property>
<name>fs.default.name</name>
<value>ceph://192.0.0.1:6789/</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>ceph://192.0.0.1:6789/</value>
</property>
<property>
<name>ceph.root.dir</name>
<value>/hadoop-dir</value>
</property>

I think that when visiting the path "ceph://192.0.0.1:6789/123", it should resolve to /hadoop-dir/123 on CephFS.
But it does not seem to work like this.
When we run HBase on CephFS with ceph.root.dir set to anything other than "/", it throws an exception.
Is this a bug, or am I misunderstanding it?
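The expected behavior described in this report can be sketched in a few lines (illustrative only, not the plugin's actual code):

```python
def resolve(path, root_dir="/hadoop-dir"):
    # Expected behavior per the report: a path from a ceph:// URI is
    # prefixed with ceph.root.dir before being looked up on CephFS.
    return root_dir.rstrip("/") + "/" + path.lstrip("/")

print(resolve("/123"))  # /hadoop-dir/123
```

With ceph.root.dir left at "/", resolve("/123") simply returns /123, which matches the one configuration that reportedly works.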

CephFS HDFS clients failing to respond to cache pressure

I've been running a CephFS system for a while now (currently Ceph v0.94.7). This cluster is primarily used for HDFS access via Apache Spark using the cephfs-hadoop shim.

I've encountered frequent cases where the cephfs-hadoop based clients put the cluster into a HEALTH_WARN state with messages about the clients failing to respond to cache pressure.

I've only begun debugging this issue, but I wanted to start here and get an idea of where to focus my search. What can cause a CephFS client to misbehave like this? Are there some CephFS messages that might not be handled properly in this HDFS shim?

Exception in namenode join java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): ceph://dellnode1:6789/ is not of scheme 'hdfs'.

I get the following error when I run the start-all.sh command with a configured core-site.xml.
Error:
2017-10-10 15:35:52,615 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2017-10-10 15:35:52,618 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: createNameNode []
2017-10-10 15:35:52,909 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2017-10-10 15:35:53,029 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2017-10-10 15:35:53,029 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system started
2017-10-10 15:35:53,311 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): ceph://dellnode1:6789/ is not of scheme 'hdfs'.
at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:371)
at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:353)
at org.apache.hadoop.hdfs.server.namenode.NameNode.getRpcServerAddress(NameNode.java:406)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loginAsNameNodeUser(NameNode.java:483)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:503)
at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:670)
at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:655)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1304)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1370)

2017-10-10 15:35:53,315 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2017-10-10 15:35:53,316 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at dellnode1/10.10.1.1
************************************************************/

and the contents of my core-site.xml file are listed as follows:

<property>
	<name>hadoop.tmp.dir</name>
	<value>file:///home/hadoop/temp</value>
</property>

<property>
	<name>fs.defaultFS</name>
	<value>ceph://dellnode1:6789/</value>
</property>

<property>
	<name>ceph.conf.file</name>
	<value>/etc/ceph/ceph.conf</value>
</property>

<property>
	<name>ceph.auth.id</name>
	<value>admin</value>
</property>

<property>
	<name>ceph.auth.keyring</name>
	<value>/etc/ceph/ceph.client.admin.keyring</value>
</property>

<property>
	<name>ceph.data.pools</name>
	<value>hadoop1</value>
</property>

<property>
	<name>ceph.root.dir</name>
	<value>/mnt/cephfs</value>
</property>

<property>
	<name>ceph.object.size</name>
	<value>67108864</value>
</property>

<property>
	<name>fs.ceph.impl</name>
	<value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
</property>

<property>
	<name>fs.AbstractFileSystem.ceph.impl</name>
	<value>org.apache.hadoop.fs.ceph.CephFs</value>
</property>

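For context, here is a rough Python analogy (not Hadoop's actual code) of how Hadoop maps a URI scheme to a FileSystem implementation class via the fs.&lt;scheme&gt;.impl key. Client-side ceph:// paths resolve through the plugin this way, but the HDFS NameNode separately requires fs.defaultFS to use the hdfs scheme, which is exactly what the exception above reports.

```python
from urllib.parse import urlparse

# Analogy only: Hadoop looks up the implementation class for a URI
# scheme under the config key "fs.<scheme>.impl".
conf = {
    "fs.ceph.impl": "org.apache.hadoop.fs.ceph.CephFileSystem",
}

def impl_for(uri):
    scheme = urlparse(uri).scheme
    return conf.get("fs.%s.impl" % scheme)

print(impl_for("ceph://dellnode1:6789/"))  # org.apache.hadoop.fs.ceph.CephFileSystem
```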

I have added the related *.jar files to Hadoop's lib directory, and I have also added their paths to $HADOOP_CLASSPATH in hadoop-env.sh.

`
[root@dellnode1 hadoop]# ls /usr/local/hadoop-2.4.0/lib
cephfs-hadoop-0.80.6.jar libcephfs.jar native

[root@dellnode1 hadoop]# ls -l /usr/local/hadoop-2.4.0/lib/native/
total 4412
lrwxrwxrwx 1 root root 22 Oct 10 12:21 libcephfs_jni.so -> libcephfs_jni.so.1.0.0
-rwxr-xr-x 1 root root 106272 Oct 10 12:18 libcephfs_jni.so.1
-rwxr-xr-x 1 root root 106272 Oct 10 12:18 libcephfs_jni.so.1.0.0
-rw-r--r-- 1 root root 1084656 Oct 9 22:05 libhadoop.a
-rw-r--r-- 1 root root 1487268 Oct 9 22:05 libhadooppipes.a
lrwxrwxrwx 1 root root 18 Oct 9 21:35 libhadoop.so -> libhadoop.so.1.0.0
-rwxr-xr-x 1 root root 640963 Oct 9 22:05 libhadoop.so.1.0.0
-rw-r--r-- 1 root root 582048 Oct 9 22:05 libhadooputils.a
-rw-r--r-- 1 root root 298218 Oct 9 22:05 libhdfs.a
lrwxrwxrwx 1 root root 16 Oct 9 21:35 libhdfs.so -> libhdfs.so.0.0.0
-rwxr-xr-x 1 root root 200058 Oct 9 22:05 libhdfs.so.0.0.0
`

Content of hadoop-env.sh:

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/hadoop-2.4.0/lib/cephfs-hadoop-0.80.6.jar:/usr/local/hadoop-2.4.0/lib/libcephfs.jar

I am curious how to solve this issue. Thank you.
