dicl / veloxdfs

DHT-based Distributed File System for MapReduce Jobs

Home Page: http://dicl.github.io/VeloxDFS/index.html

License: Apache License 2.0

Shell 1.09% C++ 90.57% Makefile 1.11% M4 5.75% Vim Script 0.05% Java 1.07% Python 0.36%
dfs parallel-computing distributed-computing hdfs boost-libraries file-system hpc

veloxdfs's Introduction

VeloxDFS


VeloxDFS is a decentralized distributed file system based on ChordDHT and HDFS.

This distributed file system is the foundation of the Velox Big Data Framework (VBDF). However, it can be used independently of the other components, such as VeloxMR (the MapReduce framework).

Key features of the current VeloxDFS implementation include:

  • No central directory service, unlike the NameNode in HDFS.
  • Logical block representation, where the block size adapts to the current workload.
  • A Hadoop API, so that VeloxDFS can be used in place of HDFS.
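
The first point can be illustrated with a minimal sketch of a Chord-style lookup. Each node owns the arc of a hash ring that ends at its own position, so any node can compute the owner of a key locally instead of asking a central directory. The `Ring` class, its method names, and the idea of passing a precomputed numeric key are assumptions made for illustration; they are not taken from the VeloxDFS sources.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical illustration of a hash ring; not the VeloxDFS implementation.
class Ring {
 public:
  // Place a node at a fixed position on the ring.
  void add_node(std::uint32_t position, const std::string& name) {
    nodes_[position] = name;
  }

  // The owner of a key is its successor: the first node at or after the
  // key's position, wrapping around to the lowest node when none follows.
  const std::string& owner_of(std::uint32_t key) const {
    assert(!nodes_.empty());
    auto it = nodes_.lower_bound(key);
    if (it == nodes_.end()) it = nodes_.begin();
    return it->second;
  }

 private:
  std::map<std::uint32_t, std::string> nodes_;  // position -> node name
};
```

In a real deployment the key would presumably be a hash of the file or block name, and every node would hold enough of the ring to route the request.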

VELOX ECOSYSTEM

Related projects

| Project | Description | URL |
|---|---|---|
| VeloxMR | Experimental MapReduce engine based on VeloxDFS | https://github.com/DICL/VeloxMR |
| eclipsed | Deployment/debugging helper script written in Ruby | https://github.com/DICL/eclipsed |
| velox-hadoop | VeloxDFS Java library for Hadoop | https://github.com/DICL/velox-hadoop |
| velox-deploy-ansible | Ansible playbook to automate Velox/Hadoop installation | https://github.com/DICL/velox-deploy-ansible |
| hadoop-etc | Hadoop configuration files to use VeloxDFS | https://github.com/vicentebolea/hadoop-etc |
| velox-report | Project to benchmark and log VeloxDFS performance | https://github.com/vicentebolea/velox-report |

USAGE

To deploy the system, use the veloxd command with one of the following options:

$ veloxd up|down|restart|status

The command line API has the following options:

$ veloxdfs put|get|cat|ls|rm|format|show

The C++ API is declared in the vdfs.hh and DFS.hh files. The Java API can be found in the src/java directory.

COMPILING & INSTALLING

Further information can be found in: Installation

Compiling requirements

  • C++14 support, that is: GCC >= 4.9, Clang >= 3.4, or ICC >= 16.0.
  • Boost library >= 1.56.
  • SQLite3 library.
  • GNU Autotools (Autoconf, Automake, Libtool).
  • UnitTest++ [optional].

Single-user installation for developers

$ mkdir -p local_eclipse/{tmp,sandbox}                 # Create the build and sandbox directories
$ cd local_eclipse                                     # Enter the directory
$ git clone git@github.com:DICL/VeloxDFS.git           # Clone the project from GitHub
$ cd VeloxDFS
$ sh autogen.sh                                        # Generate the configure script
$ cd ../tmp                                            # Go to the build folder
$ sh ../VeloxDFS/configure --prefix=`pwd`/../sandbox   # Check requirements and generate the Makefile

# If you get a Boost error, go to the FAQ section of the README

### The following command is needed whenever you want to recompile the source
$ make [-j#] install                                   # Compile & install; add the -j flag to speed up

Now add the following lines to your ~/.bashrc or ~/.profile:

export PATH="/home/*..PATH/To/eclipse/..*/sandbox/bin":$PATH
export LIBRARY_PATH="/home/*..PATH/To/eclipse/..*/sandbox/lib"
export C_INCLUDE_PATH="/home/*..PATH/To/eclipse/..*/sandbox/include"

Default settings for VELOXDFS

"log" : {
  "type" : "LOG_LOCAL6",
  "name" : "ECLIPSE",
  "mask" : "DEBUG"
},

"cache" : {
  "numbin"      : 100,
  "size"        : 200000,
  "concurrency" : 1
},

"filesystem" : {
  "block"    : 137438953,
  "buffer"   : 512,
  "replica"  : 1
}

Further information can be found in: Conf reference

FAQ

  • Question : configure stops with errors related to the Boost library.

  • Answer : This probably means that the Boost library is not installed in the default location. In that case, specify the Boost library location:

      sh ../VeloxDFS/configure --prefix ~/sandbox --with-boost=/usr/local --with-boost-libdir=/usr/local/lib
    

    In this example we assume that the boost headers are in /usr/local/include while the library files are inside /usr/local/lib.

AUTHORS

veloxdfs's People

Contributors

cristalcho, deukyeon, kbjin, moohnam, olzhabay, vicentebolea, wonbaekimys

veloxdfs's Issues

Change usage of std::string to movable object

Constructing a std::string from existing data always copies that data; the buffer cannot be adopted without a copy.
The exception is when a pointer to std::string is passed around, but that is rarely done in the project.
This leads to a lot of unnecessary copying of data, which can be as large as a block (64 MB).

Proposal: use a custom class that moves the data instead of copying it. It would also track the number of owners and be destroyed when owners <= 0.

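#include <atomic>
#include <cstdint>

class Slice {
  char* ptr;                // start of the shared buffer
  uint64_t size;            // buffer length in bytes
  std::atomic<int> owners;  // number of current owners
};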

Another way is to use std::shared_ptr<std::string>.

Solving this should improve performance.
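
As a minimal sketch of the std::shared_ptr option mentioned above: a block can be wrapped once in a shared pointer and then handed to several owners without copying the payload, and moving the string into the pointer avoids the initial copy as well. The names `Block` and `make_block` are hypothetical and do not come from the project.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical alias: an immutable block payload shared by several owners.
using Block = std::shared_ptr<const std::string>;

// Take the string by value and move it into the shared object, so a caller
// that passes an rvalue (or uses std::move) does not copy the bytes.
Block make_block(std::string data) {
  return std::make_shared<std::string>(std::move(data));
}
```

Copying a `Block` only increments the reference count, and the string is freed when the last owner goes away, which covers the owner counting that the proposed Slice class would do by hand.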

DFSput removes \n between the chunks.

Here I illustrate how the error occurs. This is the diff of a file called config.h before inserting it with dfsput and after retrieving it with dfsget. I believe that the bug is in dfsput.cc.

--- config.h    2016-03-09 15:34:48.967514994 +0900
+++ build/config.h      2016-03-09 15:31:52.664229099 +0900
@@ -29,6 +29,7 @@

 /* Define to 1 if you have the <memory.h> header file. */
 #define HAVE_MEMORY_H 1
+
 /* Define to 1 if you have the <netdb.h> header file. */
 #define HAVE_NETDB_H 1

@@ -65,6 +66,7 @@

 /* Name of package */
 #define PACKAGE "eclipse"
+
 /* Define to the address where bug reports for this package should be sent. */
 #define PACKAGE_BUGREPORT "[email protected]"

@@ -94,7 +96,8 @@
    #define below would cause a syntax error. */
 /* #undef _UINT32_T */

-/* Define for Solaris 2.5.1 so the uint8_t typedef from <sys/synch.h>,   <pthread.h>, or <semaphore.h> is not used. If the typedef were allowed, the
+/* Define for Solaris 2.5.1 so the uint8_t typedef from <sys/synch.h>,
+   <pthread.h>, or <semaphore.h> is not used. If the typedef were allowed, the
    #define below would cause a syntax error. */
 /* #undef _UINT8_T */
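
For comparison, here is a minimal sketch of a line-aware split that keeps every byte, so that joining the chunks reproduces the input exactly. It is not the code in dfsput.cc; the function name `split_on_lines` and the rule of cutting after the last newline that fits are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split `text` into chunks of at most `max_size` bytes,
// cutting after the last '\n' that fits. The newline stays at the end of its
// chunk, so concatenating the chunks is lossless.
std::vector<std::string> split_on_lines(const std::string& text,
                                        std::size_t max_size) {
  assert(max_size > 0);
  std::vector<std::string> chunks;
  std::size_t start = 0;
  while (start < text.size()) {
    std::size_t end = start + max_size;
    if (end >= text.size()) {
      end = text.size();  // the last chunk takes the remainder
    } else {
      // Look for the last newline inside [start, end).
      std::size_t nl = text.rfind('\n', end - 1);
      if (nl != std::string::npos && nl >= start) end = nl + 1;
      // Otherwise the line is longer than max_size: cut mid-line.
    }
    chunks.push_back(text.substr(start, end - start));
    start = end;
  }
  return chunks;
}
```

A round-trip check (split, join, compare with the original) is a cheap regression test for this kind of bug.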

Adding doxygen

Doxygen is the de facto standard tool for generating documentation from C++ code. We have to start commenting our functions anyway, and writing those comments in Doxygen syntax is trivial.

GitHub also supports hosting a website with our project documentation, class diagrams, and so on.
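
A minimal sketch of what such a comment looks like, using the standard Doxygen commands `@brief`, `@param`, and `@return`. The function `blocks_needed` is a hypothetical helper invented for this example and is not part of the code base.

```cpp
#include <cassert>
#include <cstdint>

/**
 * @brief Number of blocks needed to store a file.
 *
 * Hypothetical helper used only to show the comment syntax.
 *
 * @param file_size  Size of the file in bytes.
 * @param block_size Size of one block in bytes; must be greater than zero.
 * @return The block count, rounded up so that a partial block counts as one.
 */
std::uint64_t blocks_needed(std::uint64_t file_size, std::uint64_t block_size) {
  assert(block_size > 0);
  return file_size / block_size + (file_size % block_size != 0 ? 1 : 0);
}
```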

Getting more done in GitHub with ZenHub

Hola! @olzhabay has created a ZenHub account for the DICL organization. ZenHub is the leading team collaboration and project management solution built for GitHub.


How do I use ZenHub?

To get set up with ZenHub, all you have to do is download the browser extension and log in with your GitHub account. Once you do, you’ll get access to ZenHub’s complete feature-set immediately.

What can ZenHub do?

ZenHub adds a series of enhancements directly inside the GitHub UI:

  • Real-time, customizable task boards for GitHub issues;
  • Burndown charts, estimates, and velocity tracking based on GitHub Milestones;
  • Personal to-do lists and task prioritization;
  • “+1” button for GitHub issues and comments;
  • Drag-and-drop file sharing;
  • Time-saving shortcuts like a quick repo switcher.

Add ZenHub to GitHub

Still curious? See more ZenHub features or read user reviews. This issue was written by your friendly ZenHub bot, posted by request from @olzhabay.

ZenHub Board

Problems linking with ASIO serialization

@nammh and I found that on some of our servers Eclipse does not compile because it cannot find the Boost libraries.

For GCC, you must specify the location of the libraries in LIBRARY_PATH. Normally, adding the following line to your .bashrc will do the trick:

export LIBRARY_PATH=/usr/local/lib

As for the Clang++ compiler, things are a bit harder since it does not read LIBRARY_PATH. In that case, write in your .bashrc:

export CONFIG_SITE="$HOME/.clang_fix"   # "~" is not expanded inside double quotes

Then create that ~/.clang_fix file and write inside:

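#!/bin/bash
# Append with a separating space; configure sources this file with sh, so avoid the bash-only "+=".
export LDFLAGS="$LDFLAGS -L/usr/local/lib"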

Release 1.1.0 task list

There are a few changes to be made for the next minor release:

  • #51 : |compatible| Add doxygen to document the code
  • #50 : |compatible| Change raw pointers to smart pointers (Many bugs were caused by that #46)
  • #52 : |compatible| Remove commented code and adopt unified coding style
  • #49 : |compatible| Decide open source license [Critical]
  • #41 : |incompatible| Change the way the EclipseDFS initializes [Critical]
  • #55 : |incompatible| Change dfsput/dfsget... to dfs [put|get|format|ls]
  • #59 : |incompatible| dfsformat [Critical]
  • #56 : |incompatible| Create dfs [show] to display blocks locations of a file.

Note that previous releases were pre-releases; version 1.1.0 aims to be a proper release.
Compatible changes will be released in versions 1.0.*, while incompatible changes will wait until
all of them are done and then be released as 1.1.0.

Release 1.2.0 task list

  • : |compatible| Add replica support [Critical]
  • : |compatible| Add Persistence of the network in case of failure [Critical]

About dfsget and retrieve operation in DFS

I am implementing the logic for dfsget inside the eclipse network. The main idea is that the leader node requests file blocks from its neighbors, and the file blocks arrive asynchronously. To send them to the client in order, I have to wait until I have collected all of the blocks. So should I store the blocks on disk?
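
One possible answer, sketched under assumptions: the leader only has to wait for the next block in sequence, not for every block. A small in-memory reorder buffer can hold the blocks that arrive early and release them as soon as the gap before them is filled. `ReorderBuffer` and its interface are hypothetical, and the sketch is not thread-safe. Whether early blocks should spill to disk depends on memory limits that are not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: buffers out-of-order blocks and releases them in order.
class ReorderBuffer {
 public:
  // Accept block `index` and return every block that can now be sent to the
  // client, in order. Blocks that arrive early stay buffered.
  std::vector<std::string> push(std::size_t index, std::string data) {
    assert(index >= next_);  // already-sent blocks must not arrive again
    pending_.emplace(index, std::move(data));
    std::vector<std::string> ready;
    auto it = pending_.find(next_);
    while (it != pending_.end()) {
      ready.push_back(std::move(it->second));
      pending_.erase(it);
      it = pending_.find(++next_);
    }
    return ready;
  }

  // Number of blocks currently held back.
  std::size_t buffered() const { return pending_.size(); }

 private:
  std::size_t next_ = 0;                        // next index to send
  std::map<std::size_t, std::string> pending_;  // early arrivals
};
```

With this approach the amount of buffered data depends on how far out of order the blocks arrive rather than on the file size.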

Thread pool for std::future

Calling std::async typically creates a new thread for each call. It would be better to have a thread pool and enqueue functions and lambdas on it, with each enqueue returning a std::future.
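
A minimal sketch of such a pool, assuming a fixed number of workers and tasks that take no arguments. The class name `ThreadPool` and its `enqueue` method are hypothetical; this is not code from the project.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool whose enqueue() returns a std::future.
class ThreadPool {
 public:
  explicit ThreadPool(std::size_t workers) {
    assert(workers > 0);
    for (std::size_t i = 0; i < workers; ++i)
      threads_.emplace_back([this] { run(); });
  }

  // Finish the queued tasks, then join every worker.
  ~ThreadPool() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    wake_.notify_all();
    for (auto& t : threads_) t.join();
  }

  // Queue a callable taking no arguments; the future yields its result.
  template <class F>
  auto enqueue(F&& f) -> std::future<decltype(f())> {
    using R = decltype(f());
    // packaged_task is move-only, so share it to fit inside std::function.
    auto task = std::make_shared<std::packaged_task<R()>>(std::forward<F>(f));
    std::future<R> result = task->get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push([task] { (*task)(); });
    }
    wake_.notify_one();
    return result;
  }

 private:
  // Worker loop: sleep until a task is queued or the pool is shutting down.
  void run() {
    for (;;) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        wake_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stopping and nothing left to do
        job = std::move(tasks_.front());
        tasks_.pop();
      }
      job();
    }
  }

  std::mutex mutex_;
  std::condition_variable wake_;
  std::queue<std::function<void()>> tasks_;
  std::vector<std::thread> threads_;
  bool stopping_ = false;
};
```

A caller would write something like `auto f = pool.enqueue([] { return 6 * 7; });` and later call `f.get()`, which mirrors how the result of std::async is consumed.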

Logical block configuration

How can I configure the logical block features?
After just adding the --enable-lblocks option at configure time, /eclipse_node aborts as soon as I execute the program.
Are there any additional options that I should set?

LocalIO Update

In LocalIO, the update function should take the value variable by reference so that it is not copied twice.
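
The difference can be shown with a small sketch that counts copies. `Counted`, `update_by_value`, and `update_by_ref` are hypothetical stand-ins; the real signature of update in the project may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical payload type that counts how often it is copied.
struct Counted {
  std::string bytes;
  static int copies;
  explicit Counted(std::string b) : bytes(std::move(b)) {}
  Counted(const Counted& other) : bytes(other.bytes) { ++copies; }
};
int Counted::copies = 0;

// Pass by value: the argument is copied on every call with an lvalue.
std::size_t update_by_value(Counted value) { return value.bytes.size(); }

// Pass by const reference: the caller's object is used directly.
std::size_t update_by_ref(const Counted& value) { return value.bytes.size(); }
```

For a payload the size of a block, the by-value call pays for a full copy of the data, while the by-reference call only passes an address.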

Iterative map workflow discussion

We have already decided how to implement the iterative map workflow. However, @kbjin mentioned that not having a shuffling phase between the map iterations might lead to inconsistency, since the key-value pairs created by the previous iteration might be scattered across different nodes. @kbjin, can you explain your question here?
