purush7 / golang-distributed-filesystem

This project forked from ligfx/golang-distributed-filesystem

HDFS-alike in Go. Written in 2014 to learn the language and get a job.

Go 99.68% Makefile 0.32%

golang-distributed-filesystem's Introduction

Writing an HDFS clone in Go to learn more about Go and the nitty-gritty of distributed systems.

Features/TODO

  • MetaDataNode/DataNode handle uploads
  • MetaDataNode/DataNode handle downloads
  • DataNode dynamically registers with MetaDataNode
  • DataNode tells MetaDataNode its blocks on startup
  • MetaDataNode persists file->blocklist map
  • DataNode pipelines uploads to other DataNodes
  • MetaDataNode can restart and DataNode will re-register (heartbeats)
  • Tell DataNodes to re-register if MetaDataNode doesn't recognize them
  • Drop DataNodes when they go down (heartbeats)
  • DataNode sends size of data directory (heartbeat)
  • MetaDataNode obeys replication factor
  • MetaDataNode balances based on current reported space
  • MetaDataNode balances based on expected new blocks
  • MetaDataNode handles not enough DataNodes for replication
  • Have the MetaDataNode manage block sizes (in HDFS, clients can change this per file)
  • Re-replicate blocks when a DataNode disappears
  • Drop over-replicated blocks when a DataNode comes up
  • Looking at DataNode utilization should take into account the DeletionIntents and ReplicationIntents
  • Grace period for replicating new blocks
  • MetaDataNode balances blocks as it runs!
  • Record hash digest of blocks, reject send if hash is wrong
  • DataNode needs to keep track of blocks it's receiving / deleting / checking so that the integrity checker can run only on real blocks
  • Remove blocks if checksum doesn't match
  • Run a cluster in a single process for testing
  • Structure things better
  • Resiliency to weird protocol stuff (run the RPC loop manually?)
  • Command line parser doesn't work that well (try "main datanode -help")
  • Events from servers for testing
  • Better configuration handling (defaults)
  • Allow decommissioning nodes
  • Better logging, so that warnings can be made fatal in tests (two levels: warn that this process broke, and warn that somebody we're communicating with broke)
  • Don't need to wait around to delete blocks, just prevent any new reads and we'll come back to them
  • DataNode should do stuff on startup, and then spawn workers, not just spawn everybody (race conditions with address and data directories)
  • Support multiple MetaDataNodes somehow (DHT? Raft? Get rid of MetaDataNodes and use Gossip?)
  • Keep track of MoveIntents (subtract from predicted utilization of node), might fix the volatility when re-balancing
  • HashiCorp claims heartbeats are inefficient (work is linear in the number of nodes). Use Gossip?
  • Don't force a long-running connection for creating a file, give the client a lease and let them re-connect
  • If a client tries to upload a block and every DataNode in its list is down, it needs to get more from the MetaDataNode.
  • Keep track of blocks as we're creating a file, if the client bails before committing then delete the blocks.
