
bnclabs / gostore


Storage algorithms.

License: MIT License

Languages: Go 97.14%, Makefile 0.44%, Python 0.48%, Java 1.94%
Topics: balanced-tree, btree, llrb, golang, malloc, mvcc, multithreading, lsm

gostore's Introduction

Storage algorithms in golang


Package storage implements a collection of storage algorithms and the necessary tools and libraries. Applications wishing to use this package should check out the interfaces defined under api/.

As of now, three data structures are available for indexing key,value entries:

  • llrb, an in-memory left-leaning red-black tree.
  • bubt, an immutable, durable bottoms-up btree.
  • bogn, a multi-leveled, LSM-based, ACID-compliant storage.

There are some sub-packages that are common to all storage algorithms:

  • flock read-write mutex locks across processes.
  • lib collections of helper functions.
  • lsm implements log-structured merge.
  • malloc custom memory allocator, which can be used instead of golang's memory allocator or the OS allocator.

How to contribute


  • Pick an issue, or create a new issue. Provide adequate documentation for the issue.
  • Assign the issue or get it assigned.
  • Work on the code, once finished, raise a pull request.
  • Gostore is written in golang, and is expected to follow the global guidelines for writing go programs.
  • If the changeset is more than a few lines, please generate a report card.
  • As of now, branch master is the development branch.

gostore's People

Contributors

prataprc



gostore's Issues

BUBT: Settings parameter for min/max key/value.

Similar to llrb. Add settings parameters:

  • MinKeysize, below which keys won't be accepted.
  • MaxKeysize, above which keys won't be accepted.
  • MinValsize, below which values won't be accepted.
  • MaxValsize, above which values won't be accepted.

Add validation logic that checks these limits and fails the operation otherwise.
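
A minimal sketch of such validation, assuming hypothetical field names lifted from the bullets above (not the actual bubt settings API):

package bubt

import "fmt"

// Limits mirrors the proposed settings parameters; the field names are
// illustrative, not the actual bubt API.
type Limits struct {
    MinKeysize, MaxKeysize int64
    MinValsize, MaxValsize int64
}

// validate fails an entry whose key or value size falls outside the
// configured limits.
func (l Limits) validate(key, value []byte) error {
    if n := int64(len(key)); n < l.MinKeysize || n > l.MaxKeysize {
        return fmt.Errorf("key size %d outside [%d, %d]", n, l.MinKeysize, l.MaxKeysize)
    }
    if n := int64(len(value)); n < l.MinValsize || n > l.MaxValsize {
        return fmt.Errorf("value size %d outside [%d, %d]", n, l.MinValsize, l.MaxValsize)
    }
    return nil
}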

BUBT: Cache all intermediate nodes.

When opening a well-formed bubt index, cache the intermediate nodes (m-blocks) in memory, either using a golang map, an llrb tree, or some other fast lookup mechanism. It would typically be a map of fpos -> m-block buffer.

Implement this as a configurable feature.
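
A minimal sketch of the cache, using a plain golang map guarded by a RWMutex; the type and method names are illustrative, not the bubt implementation:

package bubt

import "sync"

// mblockCache maps fpos -> m-block buffer; illustrative only.
type mblockCache struct {
    mu     sync.RWMutex
    blocks map[int64][]byte
}

func newMblockCache() *mblockCache {
    return &mblockCache{blocks: make(map[int64][]byte)}
}

// get returns the cached m-block for fpos, if present.
func (c *mblockCache) get(fpos int64) ([]byte, bool) {
    c.mu.RLock()
    defer c.mu.RUnlock()
    block, ok := c.blocks[fpos]
    return block, ok
}

// put caches the m-block read from disk at fpos.
func (c *mblockCache) put(fpos int64, block []byte) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.blocks[fpos] = block
}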

BUBT: Implement validate.

Tree validation. This would involve walking the entire tree.

  • Check whether keys are in sorted order.
  • Check that the metadata information is consistent with the configured settings.
  • Check node utilisation.
  • Check tree depth.
  • Check block alignment.
  • Check the value file's sanity.
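
As an illustration of the first check, a hedged sketch of a sorted-order verifier that could be fed each key during the tree walk (the callback shape is assumed, not the bubt API):

package bubt

import (
    "bytes"
    "fmt"
)

// checkSorted returns a callback that is fed every key during a full tree
// walk and fails if keys are not in strictly ascending order.
func checkSorted() func(key []byte) error {
    var prev []byte
    return func(key []byte) error {
        if prev != nil && bytes.Compare(prev, key) >= 0 {
            return fmt.Errorf("key %q not greater than previous key %q", key, prev)
        }
        prev = append(prev[:0], key...) // remember a copy of the key
        return nil
    }
}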

Use piecewise Iteration for a full table scan.

A full table scan on an active tree has the following issues:

  • With MVCC disabled, an Iteration will lock the entire tree until it
    completes.
  • With MVCC enabled, Iterations won't block the writer and won't
    interfere with other concurrent readers. But if there are hundreds
    or thousands of mutations happening in the background while the
    iterator is holding the snapshot, it can lead to huge memory
    pressure.

To avoid this, implement a PiecewiseIterator that scans the tree
part by part and stitches them together to simulate a full-table scan.
Here are some implementation guidelines.

  • Can be implemented only when the application maintains a monotonically
    increasing seqno for every mutation (CREATE, UPDATE, DELETE).
  • LLRB should be created with metadata.bornseqno and metadata.deadseqno
    enabled.
  • PiecewiseIterators will repeat a large number of small Iterations on
    the tree until a full table scan is complete.
  • The application should supply tillseqno as a form of timestamp, to
    filter out mutations that happen after the point in time at which
    the PiecewiseIteration started.
  • The first iteration will have startkey as nil and endkey as nil.
  • Each iteration will read only 1000 entries, and the last key will be
    remembered as the startkey for the next iteration with inclusion
    set to high.
  • For every iteration, the entries read from the tree will be filtered
    and then returned to the application.
    • Get the node's timestamp by picking the larger value between
      bornseqno and deadseqno.
    • The node's timestamp should be less than or equal to tillseqno;
      else skip the entry.

The key difference between Iterator and PiecewiseIterator is that
PiecewiseIterator does not give a point-in-time view of the tree snapshot.
For example, if an entry that has not yet been read by the piecewise-iterator
is updated after the iteration has started, then the entry might be skipped
and won't be part of the final output.

This implies:

  • The final output won't contain the full sample set of entries.
  • The output cannot be queried with point-in-time correctness.

Hopefully it can still be used with LSM reads.
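
A minimal sketch of the piecewise loop described above; the entry and range-scan types are illustrative stand-ins, not the actual LLRB API:

package llrb

// scanEntry and rangeFn are illustrative stand-ins for the LLRB node and
// range-scan API.
type scanEntry struct {
    key                  []byte
    bornseqno, deadseqno uint64
}

// rangeFn returns at most limit entries starting after startkey
// (startkey == nil means start from the beginning of the tree).
type rangeFn func(startkey []byte, limit int) []scanEntry

// piecewiseScan repeats small iterations of 1000 entries each, remembering
// the last key as the startkey for the next iteration, and filters out
// entries mutated after tillseqno.
func piecewiseScan(scan rangeFn, tillseqno uint64, emit func(scanEntry)) {
    var startkey []byte
    for {
        entries := scan(startkey, 1000)
        if len(entries) == 0 {
            return // full table scan complete
        }
        for _, e := range entries {
            // the node's timestamp is the larger of bornseqno and deadseqno
            ts := e.bornseqno
            if e.deadseqno > ts {
                ts = e.deadseqno
            }
            if ts <= tillseqno { // skip mutations that happened after the scan started
                emit(e)
            }
        }
        startkey = entries[len(entries)-1].key
    }
}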

BUBT: Implement Stats().

The returned statistics map should include only pre-computed statistics; avoid walking over the tree.
This must be a cheap call. Some of the statistics to include:

  • keymem: total payload of keys.
  • valmem: total payload of values.
  • paddingmem: bytes wasted, carrying neither payload nor overhead fields.
  • n_mblocks: number of m-blocks in m-index file.
  • n_zblocks: number of z-blocks in z-index file.
  • n_vblocks: number of blocks in value file.
  • seqno: highest seqno associated with a key,value entry.
  • epoch: timestamp since January 1, 1970 UTC.
  • n_deleted: number of entries marked as deleted.

All of them can be part of the infoblock.
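
A minimal sketch of such a call, assuming the values are already decoded from the infoblock (field and key names mirror the list above and are illustrative):

package bubt

// infoblock holds pre-computed statistics persisted at build time; the
// fields shown are illustrative, not the actual bubt layout.
type infoblock struct {
    keymem, valmem, paddingmem      int64
    n_mblocks, n_zblocks, n_vblocks int64
    seqno                           uint64
    epoch                           int64
    n_deleted                       int64
}

// Stats returns only pre-computed statistics; it never walks the tree,
// so the call stays cheap.
func (ib *infoblock) Stats() map[string]interface{} {
    return map[string]interface{}{
        "keymem":     ib.keymem,
        "valmem":     ib.valmem,
        "paddingmem": ib.paddingmem,
        "n_mblocks":  ib.n_mblocks,
        "n_zblocks":  ib.n_zblocks,
        "n_vblocks":  ib.n_vblocks,
        "seqno":      ib.seqno,
        "epoch":      ib.epoch,
        "n_deleted":  ib.n_deleted,
    }
}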

BUBT: Optimise memory allocation.

  • z-block and m-block allocations can be recycled once they are flushed to disk.
  • Benchmark block-level functions and entry-level functions.
  • Memory-profile BUBT for allocations and frees.
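
One possible way to recycle flushed z-blocks and m-blocks is a sync.Pool keyed to a fixed block size; a minimal sketch, with blocksize as a placeholder constant:

package bubt

import "sync"

const blocksize = 4096 // placeholder block size, not the actual setting

// blockpool recycles z-block / m-block buffers once they are flushed.
var blockpool = sync.Pool{
    New: func() interface{} { return make([]byte, blocksize) },
}

// getblock returns a reusable block buffer.
func getblock() []byte { return blockpool.Get().([]byte) }

// putblock recycles a block after it has been flushed to disk.
func putblock(b []byte) { blockpool.Put(b) }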

LSM: Testcase with deleted entries.

Add a test case that has deleted entries and entries that have not suffered deletes. Add another case where deleted entries are created once again.

LSM: Simple test cases.

Create a local data structure, and a list of the same, to test LSM logic. The purpose of this test case, compared to the existing LLRB based merge index, is to make corner-case tests easier to create and maintain.

LLRB: Optimize go_writer.go:respch.

go_writer.go exports several APIs that internally use an unbuffered
chan []interface{} as a response channel, purely for
synchronisation.

Maybe this can be changed to chan struct{}.
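
A minimal sketch of the proposed change, with hypothetical command/response plumbing standing in for the go_writer.go internals:

package llrb

// cmd is an illustrative write command sent to the writer go-routine;
// respch is used purely for synchronisation, so chan struct{} is enough.
type cmd struct {
    op     string
    key    []byte
    respch chan struct{}
}

// apply hands the command to the writer go-routine and blocks until the
// writer signals completion on respch.
func apply(reqch chan cmd, op string, key []byte) {
    respch := make(chan struct{})
    reqch <- cmd{op: op, key: key, respch: respch}
    <-respch // wait for the writer go-routine to acknowledge
}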

BUBT: Implement Log.

Should provide two variants.

  • One for when the index is actively used and periodically logged.
  • Another with involved details on the index, before the index becomes active and/or before it is about to be destroyed.

LSM: Validation and benchmark.

Create a command-line tool, maybe outside the gostore repository, to:

  • Validate the LSMRange and LSMMerge APIs.
  • Benchmark the performance of LSMRange and LSMMerge against single-index iteration.
  • Memory- and CPU-profile LSMRange and LSMMerge.
  • Try with 1, 2, 4, 8 iterators. Even better, make the number of indexes to merge configurable via the command line.

Make this easily repeatable, and add command-line options to integrate the tool with gostore CI.

Statistics: Overview and detailed descriptions.

Statistics are vital for debugging and characterisation. Adequate stats are implemented for the malloc/, bubt/ and llrb/ packages. Create a page with a short overview of storage and memory stats, along with a detailed description of how to interpret them and how they relate to each other.

Memory utilization: Improve and refactor.

Package malloc/ has a global variable MEMUtilization used for
Blocksize calculation. Once the Slab struct is created and
Blocksizes and SuitableSize are localized to Slab, we can
configure MEMUtilization per slab instance instead of keeping
it global.

Package llrb/ has a config parameter called memutilization, used
while validating the llrb tree. There is also an API, ExpectedUtilization,
that makes the config parameter redundant. I suppose we can avoid
both the config parameter and the API, and instead add an argument
to Validate for the expected memory-utilization ratio.
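
A minimal sketch of the per-slab configuration; the Slab type and method bodies are illustrative, not the malloc/ implementation:

package malloc

// Slab localizes Blocksizes and SuitableSize; memutilization is configured
// per instance instead of via the global MEMUtilization.
type Slab struct {
    memutilization float64 // target utilisation ratio, e.g. 0.95
}

// Blocksizes generates block sizes from min to max such that stepping from
// one size to the next keeps utilisation above the configured ratio.
func (s *Slab) Blocksizes(min, max int64) []int64 {
    sizes := []int64{}
    for size := min; size <= max; {
        sizes = append(sizes, size)
        next := int64(float64(size) / s.memutilization)
        if next <= size {
            next = size + 1
        }
        size = next
    }
    return sizes
}

// SuitableSize picks the smallest generated block size that can hold size.
func (s *Slab) SuitableSize(sizes []int64, size int64) int64 {
    for _, bsize := range sizes {
        if size <= bsize {
            return bsize
        }
    }
    return sizes[len(sizes)-1]
}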

README page.

Start a proper README page for gostore. Every sub-package should have a README as well. Let the README content follow this template:

  • State the goals in clear bullet points.
  • Current status of the project; make it informative for potential users of gostore.
  • Quicklinks.
  • Settings are important in gostore. Add an introduction and relevant links to default-settings for bubt, llrb, malloc.
  • Introduction to ideas and concepts.
  • List panic cases, and their recovery.
  • Projects using gostore.
  • Links to external articles, papers, news, blogs.
  • How to contribute.

For sub-package READMEs, keep the original details in the sub-package's doc.go. Include an introduction to basic concepts and ideas, and a getting-started guide.

Cleanup panic and recovery.

  • Panic only when absolutely required.
  • If a function / go-routine panics, document the same.
  • Every go-routine should be coded in a separate file; document exit/panic/recover cases for all go-routines.
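
A small sketch of that convention for a single go-routine, with its exit/panic/recover cases documented next to the recover block (names are illustrative):

package lib

import "log"

// runWriter shows the convention: one go-routine per file, with its
// exit, panic and recover cases documented where they are handled.
func runWriter(reqch chan func(), killch chan struct{}) {
    go func() {
        // Exit: when killch is closed.
        // Panic: only on programming errors inside a request callback.
        // Recover: log the panic and exit the go-routine cleanly.
        defer func() {
            if r := recover(); r != nil {
                log.Printf("writer go-routine recovered: %v", r)
            }
        }()
        for {
            select {
            case fn := <-reqch:
                fn()
            case <-killch:
                return
            }
        }
    }()
}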

LLRB: Read-lock for Clone()

Right now the Clone() implementation acquires a write lock while cloning the
llrb tree. I think this can be converted into a read lock.
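
A minimal sketch of the change, assuming the tree guards its root with a sync.RWMutex (types and names are illustrative, not the llrb implementation):

package llrb

import "sync"

// tree is an illustrative stand-in for the LLRB index.
type tree struct {
    rw   sync.RWMutex
    root *node
}

type node struct {
    key, value  []byte
    left, right *node
}

// Clone takes only a read lock: cloning never mutates the source tree,
// so concurrent readers need not be blocked.
func (t *tree) Clone() *tree {
    t.rw.RLock() // previously a full write lock
    defer t.rw.RUnlock()
    return &tree{root: clonenode(t.root)}
}

func clonenode(nd *node) *node {
    if nd == nil {
        return nil
    }
    newnd := *nd
    newnd.left, newnd.right = clonenode(nd.left), clonenode(nd.right)
    return &newnd
}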

BUBT: Implement Fullstats().

The returned statistics map should include all statistics that are relevant to the btree index.

  • Include build time statistics.
  • Include IndexReader / IndexSnapshot related statistics.
  • If there is a need to walk the entire tree to compute statistics, this is the place to do it. If possible, try to move this computation into Build().

LLRB: Stats review and writeup.

  • Review stats counting in LLRB storage.
  • Create a write-up on stats accounting, what each stat means, and how
    the stats relate to each other. This must be a base document
    for everyone who does characterization.
  • Check whether stats values need to be protected by atomic operations.

Slack channel.

Start a slack channel for the following projects - golog, gosettings, gson, gofast, gostore. If possible, start it under gophers.slack.com.

Reverse-link it from golog, gosettings, gson, gofast, and gostore.

LLRB: Block diagram of go-routines.

A picture can say a thousand words. Create a block diagram of the go-routines, showing how they interact with application logic and the underlying socket. Mostly it is important to trace the execution path and its exceptional cases.

LLRB: Testcases with lsm enabled.

Log-Structured-Merge

Log-Structured-Merge (LSM) is available off-the-shelf with LLRB store.
Just enable lsm via settings while creating the LLRB tree. Enabling
LSM will have the following effects:

  • DeleteMin, DeleteMax and Delete will simply mark entries as deleted,
    and their deadseqno will be updated to currseqno.
  • For a Delete operation, if the entry is missing in the index, an entry
    will be inserted and then marked as deleted, with its deadseqno
    updated to currseqno.
  • When a key marked as deleted is Upserted again, its deadseqno will
    be set to ZERO and the deleted flag cleared.
  • In case of UpsertCAS, the CAS should match before the entry is cleared
    from the delete log.
  • All of the above behaviour is equally applicable with MVCC enabled.

NOTE: DeleteMin and DeleteMax are not useful when the LLRB index is only
holding a subset, called the working-set, of the full index.
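
A minimal sketch of the delete-marking rules above, with illustrative entry fields rather than the actual LLRB node layout:

package llrb

// lsmEntry is an illustrative stand-in for an LLRB node with LSM enabled.
type lsmEntry struct {
    value     []byte
    bornseqno uint64
    deadseqno uint64
    deleted   bool
}

// lsmDelete marks the entry as deleted and stamps deadseqno with currseqno;
// the entry is retained in the index as a delete marker.
func lsmDelete(e *lsmEntry, currseqno uint64) {
    e.deleted = true
    e.deadseqno = currseqno
}

// lsmUpsert clears the delete marker: deadseqno goes back to zero and the
// deleted flag is cleared.
func lsmUpsert(e *lsmEntry, value []byte, currseqno uint64) {
    e.value = value
    e.bornseqno = currseqno
    e.deadseqno = 0
    e.deleted = false
}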

Malloc: from fair-pools to optimal-pools

At present malloc uses a fair model to allocate a pool from the OS. There
are some considerations while allocating an entire pool from the OS:

  • A pool, for any slab, cannot contain more chunks than Maxchunks.
  • If the pool size is too small, the allocator will end up with too many
    pools for the same chunk-size.
  • If the pool size is too large, then partially utilised pools will
    lead to poor memory utilisation.

Right now, the allocator assumes that the number of chunks allocated by the
application in each slab will be the same for all slab sizes. This is
probably a good way to start a new arena instance, but with every
new allocation we learn more of the chunk-size histogram for
each slab, and with that information we can pick an optimal size
while allocating the next pool from the OS.
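
A minimal sketch of picking the next pool's chunk count from a per-slab allocation histogram, capped by Maxchunks (names are illustrative, not the malloc/ API):

package malloc

// histogram counts, per slab size, how many chunks the application has
// allocated so far; illustrative only.
type histogram map[int64]int64

// nextPoolChunks picks how many chunks the next pool for slabsize should
// hold: proportional to observed demand, but never more than maxchunks
// and never fewer than a small fair minimum.
func nextPoolChunks(h histogram, slabsize, maxchunks int64) int64 {
    const fairMin = 64 // fair default for slabs with little history
    chunks := h[slabsize]
    if chunks < fairMin {
        chunks = fairMin
    }
    if chunks > maxchunks {
        chunks = maxchunks
    }
    return chunks
}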

BUBT: Block diagram of go-routines.

A picture can say a thousand words. Create a block diagram of the go-routines, showing how they interact with application logic and the underlying socket. Mostly it is important to trace the execution path and its exceptional cases.

Cleanup dummy imports.

There might be imports of the form

import _ "fmt"

or

import "fmt"

var _ = fmt.Sprintf(...)

Remove them. Search the net and ask the community for alternate ideas.
