I've recently had to work with some data that seems very much dat-data.

idea: layers and column sets about dat HOT 7 CLOSED

dominictarr commented on August 16, 2024

idea: layers and column sets

from dat.

Comments (7)

max-mapper commented on August 16, 2024

Interesting! So here is how dat indexes data in a table right now:

log is the by-sequence index used for live replication, the middle column is where all versions of all rows are stored, and the latest column is to efficiently export the latest version of all rows (e.g. export DB as a CSV more quickly than searching the middle column)

(in reality I do database friendly things like encode the integer keys in a lexicographically sorting encoding)
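As a rough illustration of the three indexes and the sortable integer encoding mentioned above, here is a minimal sketch. The key prefixes (`log/`, `row/`, `latest/`), the zero-padding scheme, and the helper names `encodeInt` and `indexKeys` are assumptions made for the example, not dat's actual key layout.

```javascript
// Zero-padded, fixed-width encoding so that string order matches numeric order.
// (Assumed scheme; dat's real encoding may differ.)
function encodeInt(n, width = 12) {
  return String(n).padStart(width, '0');
}

// Hypothetical key layouts for the three indexes described above.
function indexKeys(rowKey, version, seq) {
  return {
    log: 'log/' + encodeInt(seq),                        // by-sequence, for live replication
    version: 'row/' + rowKey + '/' + encodeInt(version), // every version of every row
    latest: 'latest/' + rowKey                           // newest version only, for fast export
  };
}

const keys = indexKeys('foo', 2, 10);
console.log(keys.log, keys.version, keys.latest);

// Plain decimal strings sort incorrectly ('9' > '10'); padded ones do not.
console.log('9' < '10', encodeInt(9) < encodeInt(10));
```

With keys shaped like this, a range scan over the `latest/` prefix would read one entry per row, which is why an export can avoid walking every stored version.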

It seems like this proposal would change this a bit. Would you mind sketching out an index table that shows how you think we should store data using this scheme?

from dat.

darelf commented on August 16, 2024

I have something very similar to dominictarr's workflow description.

One question to ponder: how would you tell whether a "fix" to a particular row, kept in some sort of layer, should or should not be overlaid again after the underlying data is refreshed?

There are instances where a "fix" is needed but then is corrected or even superseded by the underlying data refresh at a later time, and other instances where it will always need to be fixed in every future version even when the underlying data changes.

from dat.

max-mapper commented on August 16, 2024

@darelf that touches on a thing I've been referring to as 'transforms'. It's yet to be implemented, but the idea is to have configurable pre- and post-save hooks that are chainable and optionally reversible. An example of a transform might be to geocode any row that has an address column but no latitude or longitude values yet. You could register the transform as a pre-hook so the data before geocoding would not get written, or as a post-hook to write the incoming data first and then make a subsequent revision that adds the lat/lon columns.
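To make the pre-hook versus post-hook difference concrete, here is a minimal sketch. Since transforms were not implemented at the time, everything in it is hypothetical: the `TransformPipeline` class, its `usePre`/`usePost` methods, and the stand-in `geocode` function with placeholder coordinates. Hooks are synchronous for brevity, and reversibility is left out.

```javascript
// Hypothetical hook runner; not an existing dat API.
class TransformPipeline {
  constructor() {
    this.pre = [];  // applied before the row is written
    this.post = []; // applied after the write, each yielding a follow-up revision
  }
  usePre(fn) { this.pre.push(fn); return this; }
  usePost(fn) { this.post.push(fn); return this; }

  // Returns the revisions that would be written, in order.
  save(row) {
    let current = row;
    for (const fn of this.pre) current = fn(current);
    const revisions = [current];
    for (const fn of this.post) {
      current = fn(current);
      revisions.push(current);
    }
    return revisions;
  }
}

// Stand-in geocoder: a real one would call an external service.
function geocode(row) {
  if (!row.address || (row.lat != null && row.lon != null)) return row;
  return { ...row, lat: 0, lon: 0 }; // placeholder coordinates
}

const asPre = new TransformPipeline().usePre(geocode);
const asPost = new TransformPipeline().usePost(geocode);

console.log(asPre.save({ address: '1 Main St' }).length);  // one revision, already geocoded
console.log(asPost.save({ address: '1 Main St' }).length); // raw revision plus geocoded revision
```

Registered as a pre-hook, the un-geocoded row never reaches storage. Registered as a post-hook, the raw row is kept as its own revision and the geocoded row follows it.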

from dat.

brycebaril commented on August 16, 2024

@dominictarr This description reminds me of how Cassandra does its partitioned row store. Is that similar to what you had in mind?

from dat.

dominictarr commented on August 16, 2024

@maxogden I'm thinking you can keep that (pretty much) the same, but just have 3 of those that you overlay on top of each other, so that you can replicate each independently.

Here is some pseudocode:

function primaryKey(row) {
  return row[2] + row[1] // or whatever identifies a row
}

var edits = new Table() // writable table that holds local edits
var vt = new VirtualTable(rawTable, {key: primaryKey, readonly: true}) // pass in the prototype table
vt.overlay(edits) // overlay the edits table on top
vt.join(remoteTable, {key: primaryKey, readonly: true}) // join another table to this along the primary key; this adds new columns

vt.get(pk, cb) // return a row combined from the 3 tables above
vt.update(pk, row, cb) // update a row: writes are split between the tables that own those columns, and go to the top overlay first
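For readers who want something executable, the pseudocode can be approximated with in-memory tables. This is only a sketch under several assumptions: `Table` and `VirtualTable` are hypothetical classes backed by a `Map`, calls are synchronous instead of taking callbacks, rows are plain objects looked up by a precomputed primary key, and `update` writes to the top overlay only instead of splitting writes by column ownership.

```javascript
// In-memory stand-in for a table: primary key -> row object.
class Table {
  constructor(rows = {}, { readonly = false } = {}) {
    this.rows = new Map(Object.entries(rows));
    this.readonly = readonly;
  }
  get(pk) { return this.rows.get(pk); }
  put(pk, row) {
    if (this.readonly) throw new Error('table is read-only');
    this.rows.set(pk, { ...this.rows.get(pk), ...row });
  }
}

class VirtualTable {
  constructor(base) {
    this.base = base;   // raw data, never written through this view
    this.joined = [];   // tables that contribute extra columns
    this.overlays = []; // writable layers; later layers win
  }
  join(table) { this.joined.push(table); return this; }
  overlay(table) { this.overlays.push(table); return this; }

  // Merge base, then joined columns, then overlays.
  get(pk) {
    const layers = [this.base, ...this.joined, ...this.overlays];
    const found = layers.map(t => t.get(pk)).filter(Boolean);
    return found.length ? Object.assign({}, ...found) : undefined;
  }

  // Simplification: every write goes to the top overlay.
  update(pk, row) {
    const top = this.overlays[this.overlays.length - 1];
    if (!top) throw new Error('no writable overlay');
    top.put(pk, row);
  }
}

const raw = new Table({ a1: { name: 'Alice', city: 'Oakland' } }, { readonly: true });
const remote = new Table({ a1: { lat: 37.8, lon: -122.27 } }, { readonly: true });
const edits = new Table();
const vt = new VirtualTable(raw).join(remote).overlay(edits);

vt.update('a1', { city: 'Berkeley' });
console.log(vt.get('a1'));  // merged view includes the edit
console.log(raw.get('a1')); // raw table is unchanged
```

Because each layer is a separate table, the raw data, the joined columns, and the edits could each be replicated independently, which is the point of the proposal.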

@darelf you'd need to match on the original primary key. If the original data didn't have primary keys, and refreshed in a different order it would be pretty much impossible, but that is life.

@brycebaril I'm not aware of Cassandra's row store - link?

from dat.

dominictarr commented on August 16, 2024

Oh - I've just remembered that I implemented a basic version of this a while back in my dat-table module.

You can open multiple files and join them on a key, and also filter a table down to a few columns. This was pretty basic - but was what I needed at the time. This could be rewritten to be streaming, which would be more useful for large data.

https://github.com/dominictarr/dat-table/blob/master/bin.js
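The two operations described here, joining on a key and filtering down to a few columns, can be sketched in a few lines. This is not dat-table's code or API; `joinOnKey` and `pickColumns` are hypothetical helpers that work on in-memory arrays of row objects, so they are not streaming either.

```javascript
// Inner join of two lists of row objects on a shared key column.
function joinOnKey(left, right, key) {
  const index = new Map(right.map(row => [row[key], row]));
  return left
    .filter(row => index.has(row[key]))
    .map(row => ({ ...row, ...index.get(row[key]) }));
}

// Keep only the named columns of each row.
function pickColumns(rows, columns) {
  return rows.map(row =>
    Object.fromEntries(columns.filter(c => c in row).map(c => [c, row[c]])));
}

const people = [{ id: 1, name: 'Alice' }, { id: 2, name: 'Bob' }];
const places = [{ id: 1, city: 'Oakland' }];

const joined = joinOnKey(people, places, 'id');
console.log(pickColumns(joined, ['name', 'city']));
```

A streaming version would read one input row at a time and look it up in an index of the other table, so only the smaller table would need to fit in memory.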

from dat.

okdistribute commented on August 16, 2024

After a whole year of feedback from the community, we recently published a new version of dat. Would you mind trying out the new dat with npm install -g dat and seeing how it fits into this discussion? In the meantime, I'm going to close this issue. You can read about the new dat announcement on the website and how it works in the docs.

from dat.
