
idea: layers and column sets (dat issue, closed)

dominictarr commented on August 16, 2024
idea: layers and column sets

from dat.

Comments (7)

max-mapper commented on August 16, 2024

Interesting! So here is how dat indexes data in a table right now:

[image: dat-index-diagram]

log is the by-sequence index used for live replication. The middle column is where all versions of all rows are stored, and the latest column makes it efficient to export the latest version of every row (e.g. exporting the DB as a CSV is quicker than searching the middle column).

(In reality I do database-friendly things like encoding the integer keys so that they sort lexicographically.)

It seems like this proposal would change this a bit. Would you mind sketching out an index table that shows how you think we should store data using this scheme?
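As a rough illustration of the three indexes and the sortable integer keys, here is a minimal sketch. The key prefixes (`log/`, `versions/`, `latest/`), the function names, and the zero-padded decimal encoding are all assumptions made for this example; dat's real key layout and encoding may well differ.

```javascript
// Hypothetical sketch: not dat's actual key format or encoding.
// Zero-pad integers so that string order matches numeric order.
function encodeInt(n) {
  return String(n).padStart(16, '0')
}

// Build the three index entries written for one version of one row.
// `seq` is the change sequence number, `key` the row's primary key,
// `version` the row's revision number.
function indexEntries(seq, key, version, row) {
  return [
    { key: 'log/' + encodeInt(seq), value: { key: key, version: version } },
    { key: 'versions/' + key + '/' + encodeInt(version), value: row },
    { key: 'latest/' + key, value: row }
  ]
}

// Unpadded integers sort incorrectly as strings; padded ones do not.
var plain = ['9', '10'].sort()
var padded = [encodeInt(9), encodeInt(10)].sort()
```

With this layout, exporting the latest rows would be a single range scan over the `latest/` prefix, while replication would scan `log/` from the last sequence number seen.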


darelf commented on August 16, 2024

I have something very similar to dominictarr's workflow description.

One question to ponder: how would you tell whether a "fix" to a particular row, kept in some sort of layer, should or should not "overlay" after the underlying data is refreshed?

There are instances where a "fix" is needed but is later corrected or even superseded by a refresh of the underlying data, and other instances where it will always need to be applied in every future version, even when the underlying data changes.


max-mapper commented on August 16, 2024

@darelf that touches on a thing I've been referring to as 'transforms'. It's yet to be implemented, but the idea is to have configurable pre- and post-save hooks that are chainable and optionally reversible. An example of a transform might be to geocode the address column of any row that doesn't have a latitude or longitude yet. You could register the transform in a pre-hook, so the data before geocoding would not get written, or in a post-hook, to write the incoming data first and then make a subsequent revision that adds the lat/lon columns.
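Since transforms were not implemented at the time, the following is only a sketch of how chainable pre- and post-save hooks might behave. `createStore`, `pre`, `post`, `save` and `fakeGeocode` are hypothetical names, the geocoder is a stand-in that returns fixed coordinates, and reversibility is left out.

```javascript
// Hypothetical sketch of chainable pre/post save hooks; none of these
// names come from dat.
function createStore() {
  var pre = [], post = [], revisions = []

  function save(row) {
    // Pre-hooks run in order, each receiving the previous hook's output,
    // so only the fully transformed row is written.
    var current = pre.reduce(function (r, fn) { return fn(r) }, row)
    revisions.push(current)
    // Post-hooks run after the write; a changed result becomes a new revision.
    post.forEach(function (fn) {
      var next = fn(current)
      if (next !== current) {
        revisions.push(next)
        current = next
      }
    })
    return current
  }

  return {
    pre: function (fn) { pre.push(fn); return this },
    post: function (fn) { post.push(fn); return this },
    save: save,
    revisions: revisions
  }
}

// Stand-in for a real geocoder: fills lat/lon only when they are missing.
function fakeGeocode(row) {
  if (!row.address || row.latitude != null) return row
  return Object.assign({}, row, { latitude: 0, longitude: 0 })
}

// Pre-hook: the un-geocoded row is never written.
var preStore = createStore().pre(fakeGeocode)
preStore.save({ address: '1 Main St' })

// Post-hook: the raw row is written first, then a geocoded revision.
var postStore = createStore().post(fakeGeocode)
postStore.save({ address: '1 Main St' })
```

In this sketch `preStore.revisions` ends up with a single geocoded entry, while `postStore.revisions` holds the raw row followed by the geocoded one.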


brycebaril commented on August 16, 2024

@dominictarr This description reminds me of how Cassandra does its partitioned row store. Is that similar to what you had in mind?


dominictarr commented on August 16, 2024

@maxogden I'm thinking you can keep that (pretty much) the same, but just have 3 of those that you overlay on top of each other, so that you can replicate each independently.

Here is some pseudocode:

function primaryKey(row) {
  return row[2] + row[1] // or whatever uniquely identifies a row
}

// pass in the prototype table (read-only)
var vt = new VirtualTable(rawTable, {key: primaryKey, readonly: true})

// overlay a table that holds local edits
var edits = new Table()
vt.overlay(edits)

// join another table to this one along the primary key; this adds new columns
vt.join(remoteTable, {key: primaryKey, readonly: true})

// return a row from the above 3 combined
vt.get(pk, cb)

// update a row: the write is split between the tables that own those columns, and goes to the top overlay first
vt.update(pk, row, cb)
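To make the pseudocode concrete, here is one possible in-memory sketch. `Table` and `VirtualTable` borrow their names from the pseudocode but are not an existing dat API. Rows are plain objects keyed by an already-computed primary key, and the `columns` option on `join` is an assumption added so the sketch can tell which table owns a column.

```javascript
// Hypothetical in-memory sketch, not an existing dat API.
function Table(rows) {
  this.rows = rows || {} // primary key -> row object
}

function VirtualTable(base) {
  this.base = base   // read-only prototype table
  this.edits = null  // overlay that receives local edits
  this.joined = []   // entries of { table, columns, readonly }
}

VirtualTable.prototype.overlay = function (table) {
  this.edits = table
  return this
}

VirtualTable.prototype.join = function (table, opts) {
  this.joined.push({
    table: table,
    columns: opts.columns || [],
    readonly: !!opts.readonly
  })
  return this
}

// Combine layers: base row, then joined columns, then overlay edits on top.
VirtualTable.prototype.get = function (pk, cb) {
  var row = Object.assign({}, this.base.rows[pk])
  this.joined.forEach(function (j) {
    Object.assign(row, j.table.rows[pk])
  })
  if (this.edits) Object.assign(row, this.edits.rows[pk])
  cb(null, row)
}

// Split a write by column ownership: a column owned by a writable joined
// table goes there; everything else goes to the overlay.
VirtualTable.prototype.update = function (pk, row, cb) {
  if (!this.edits) return cb(new Error('no writable overlay'))
  var self = this
  Object.keys(row).forEach(function (col) {
    var owner = self.joined.find(function (j) {
      return !j.readonly && j.columns.indexOf(col) !== -1
    })
    var target = owner ? owner.table : self.edits
    if (!target.rows[pk]) target.rows[pk] = {}
    target.rows[pk][col] = row[col]
  })
  cb(null)
}

var raw = new Table({ a1: { name: 'Ada', city: 'Londn' } })
var remote = new Table({ a1: { population: 8000000 } })
var vt = new VirtualTable(raw).overlay(new Table())
vt.join(remote, { columns: ['population'], readonly: true })
vt.update('a1', { city: 'London' }, function (err) { if (err) throw err })
vt.get('a1', function (err, row) { console.log(row) })
```

Because the fix to `city` lives only in the overlay, the raw table stays untouched, and each of the three tables could in principle be replicated on its own.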

@darelf you'd need to match on the original primary key. If the original data didn't have primary keys and was refreshed in a different order, it would be pretty much impossible, but that is life.

@brycebaril I'm not aware of Cassandra's row store - link?


dominictarr commented on August 16, 2024

Oh - I've just remembered that I implemented a basic version of this a while back in my dat-table module.

You can open multiple files and join them on a key, and also filter a table down to a few columns. This was pretty basic, but it was what I needed at the time. It could be rewritten to be streaming, which would be more useful for large data.
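For readers who want a feel for those two operations, here is a small non-streaming sketch. It is not taken from dat-table: `joinOnKey` and `pick` are hypothetical helpers, and the rows are assumed to be already parsed into plain objects.

```javascript
// Hypothetical helpers, not the dat-table implementation.
// Join several arrays of row objects on a shared key column.
function joinOnKey(tables, key) {
  var byKey = new Map()
  tables.forEach(function (rows) {
    rows.forEach(function (row) {
      var merged = byKey.get(row[key]) || {}
      byKey.set(row[key], Object.assign(merged, row))
    })
  })
  return Array.from(byKey.values())
}

// Keep only the named columns of each row.
function pick(rows, columns) {
  return rows.map(function (row) {
    var out = {}
    columns.forEach(function (c) {
      if (c in row) out[c] = row[c]
    })
    return out
  })
}

var people = [{ id: 1, name: 'Ada' }, { id: 2, name: 'Bob' }]
var cities = [{ id: 1, city: 'London' }]
var joinedRows = joinOnKey([people, cities], 'id')
var slim = pick(joinedRows, ['name', 'city'])
```

This version holds every row in memory, which is the limitation a streaming rewrite would address.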

https://github.com/dominictarr/dat-table/blob/master/bin.js


okdistribute commented on August 16, 2024

After a whole year of feedback from the community, we recently published a new version of dat. Would you mind trying out the new dat with npm install -g dat and seeing how it fits into this discussion? In the meantime, I'm going to close this issue. You can read about the new dat announcement on the website and how it works in the docs.
