Comments (7)
Interesting! So here is how dat indexes data in a table right now:
log is the by-sequence index used for live replication, the middle column is where all versions of all rows are stored, and the latest column is there to efficiently export the latest version of all rows (e.g. export the DB as a CSV more quickly than by searching the middle column).
(In reality I do database-friendly things, like encoding the integer keys so that they sort lexicographically.)
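To illustrate the lexicographic encoding point, here is a minimal sketch. It assumes fixed-width, zero-padded decimal strings, which is only one of several possible encodings and not necessarily the one dat uses. The helper name `encodeInt` and the default width of 16 are made up for this example.

```javascript
// Hypothetical helper (not dat's actual encoding): zero-pad a non-negative
// integer to a fixed width so that string order matches numeric order.
function encodeInt(n, width) {
  width = width || 16 // 16 digits covers Number.MAX_SAFE_INTEGER
  if (!Number.isSafeInteger(n) || n < 0) {
    throw new RangeError('expected a non-negative safe integer')
  }
  var s = String(n)
  if (s.length > width) throw new RangeError('integer does not fit in width')
  return s.padStart(width, '0')
}

var seqs = [10, 9, 2, 100]

// Plain string keys sort in the wrong order for a by-sequence index.
var naive = seqs.map(String).sort() // ['10', '100', '2', '9']

// Padded keys sort the same way the integers do.
var padded = seqs.map(function (n) { return encodeInt(n) }).sort()
```

A key-value store that iterates keys in byte order would then return the padded keys in sequence order, which is what a by-sequence index needs.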
It seems like this proposal would change this a bit, would you mind sketching out an index table that shows how you think we should store data using this scheme?
from dat.
I have something very similar to dominictarr's workflow description.
One question to ponder: how would you tell whether a "fix" to a particular row, kept in some sort of layer, should or should not "overlay" after the underlying data has been refreshed?
There are instances where a "fix" is needed but then is corrected or even superseded by the underlying data refresh at a later time, and other instances where it will always need to be fixed in every future version even when the underlying data changes.
@darelf that touches on a thing I've been referring to as 'transforms'. It's yet to be implemented, but the idea is to have configurable pre- and post-save hooks that are chainable and optionally reversible. An example of a transform might be to geocode any column called address that doesn't have a row for latitude or longitude yet. You could register the transform in a pre-hook, so the data before geocoding would not get written, or in a post-hook, to write the incoming data first and then make a subsequent revision that adds the lat/lon columns.
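Since transforms are not implemented yet, the following is only a sketch of how the pre-hook and post-hook behaviours could differ. Every name here (`createStore`, `pre`, `post`, `save`, `geocodeTransform`, `fakeGeocode`) is hypothetical, the geocoder is a stub that returns fixed coordinates, and reversibility is left out.

```javascript
// Stub geocoder (assumption): a real one would call a geocoding service.
function fakeGeocode(address) {
  return { latitude: 0, longitude: 0 }
}

// Transform: add lat/lon to rows that have an address but no coordinates yet.
function geocodeTransform(row) {
  if (!row.address || ('latitude' in row && 'longitude' in row)) return row
  return Object.assign({}, row, fakeGeocode(row.address))
}

// Hypothetical in-memory store with chainable pre- and post-save hooks.
function createStore() {
  var preHooks = []
  var postHooks = []
  var revisions = []

  function run(hooks, row) {
    return hooks.reduce(function (current, hook) { return hook(current) }, row)
  }

  return {
    revisions: revisions,
    pre: function (hook) { preHooks.push(hook); return this },
    post: function (hook) { postHooks.push(hook); return this },
    save: function (row) {
      // Pre-hooks run before anything is written.
      var incoming = run(preHooks, row)
      revisions.push(incoming)
      // Post-hooks run after the write; a changed row becomes a new revision.
      var derived = run(postHooks, incoming)
      if (derived !== incoming) revisions.push(derived)
    }
  }
}

// Pre-hook: only the geocoded row is stored (one revision).
var preStore = createStore().pre(geocodeTransform)
preStore.save({ address: '1 Example St' })

// Post-hook: the raw row is stored first, then a geocoded revision follows.
var postStore = createStore().post(geocodeTransform)
postStore.save({ address: '1 Example St' })
```

With the pre-hook the un-geocoded data never reaches storage, while the post-hook keeps the incoming row as its own revision before the lat/lon columns are added.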
@dominictarr This description reminds me of how Cassandra does its partitioned row store. Is that similar to what you had in mind?
@maxogden I'm thinking you can keep that (pretty much) the same, but just have 3 of those that you overlay on top of each other, so that you can replicate each independently.
Here is some pseudocode:
function primaryKey(row) {
  return row[2] + row[1] // or whatever
}
// pass in the prototype table
var vt = new VirtualTable(rawTable, {key: primaryKey, readonly: true})
// overlay a table that holds the edits
var edits = new Table()
vt.overlay(edits)
// join another table to this one along the primary key; this adds new columns
vt.join(remoteTable, {key: primaryKey, readonly: true})
// return a row combined from the 3 tables above
vt.get(pk, cb)
// update a row: writes are split across the tables that own those columns,
// going to the top overlay first
vt.update(pk, row, cb)
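For anyone who wants to poke at the idea, here is a rough runnable take on the pseudocode above. It makes several simplifying assumptions that are not part of the proposal: tables are in-memory Maps, calls are synchronous instead of taking callbacks, rows are plain objects keyed by a primary key that is passed in directly, and every write goes to the top overlay instead of being split between the tables that own each column.

```javascript
// Minimal in-memory table (assumption: rows are plain objects keyed by pk).
function Table(entries) {
  this.rows = new Map(entries || [])
}
Table.prototype.get = function (pk) { return this.rows.get(pk) }
Table.prototype.put = function (pk, row) { this.rows.set(pk, row) }

// Read-only base table, plus joined tables and overlays layered on top.
function VirtualTable(base) {
  this.base = base
  this.joins = []
  this.overlays = []
}
VirtualTable.prototype.join = function (table) {
  this.joins.push(table)
  return this
}
VirtualTable.prototype.overlay = function (table) {
  this.overlays.push(table)
  return this
}
// Combine the row from every layer; later layers win on shared columns.
VirtualTable.prototype.get = function (pk) {
  var row = Object.assign({}, this.base.get(pk))
  this.joins.forEach(function (t) { Object.assign(row, t.get(pk)) })
  this.overlays.forEach(function (t) { Object.assign(row, t.get(pk)) })
  return row
}
// Simplification: all changes land in the top overlay, never in the base.
VirtualTable.prototype.update = function (pk, changes) {
  var top = this.overlays[this.overlays.length - 1]
  if (!top) throw new Error('no writable overlay')
  top.put(pk, Object.assign({}, top.get(pk), changes))
}

var rawTable = new Table([['a', { name: 'Ann', city: 'Oslo' }]])
var remoteTable = new Table([['a', { score: 7 }]])
var edits = new Table()

var vt = new VirtualTable(rawTable).join(remoteTable).overlay(edits)
vt.update('a', { city: 'Bergen' })
var combined = vt.get('a')
```

Because the edit lives only in the overlay, the raw table can be refreshed or replicated independently, and the fix is reapplied on read as long as the primary key still matches.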
@darelf you'd need to match on the original primary key. If the original data didn't have primary keys and was refreshed in a different order, it would be pretty much impossible, but that is life.
@brycebaril I'm not aware of Cassandra's row store - link?
Oh - I've just remembered that I implemented a basic version of this a while back in my dat-table module.
You can open multiple files and join them on a key, and also filter a table down to a few columns. This was pretty basic - but was what I needed at the time. It could be rewritten to be streaming, which would be more useful for large data.
https://github.com/dominictarr/dat-table/blob/master/bin.js
After a whole year of feedback from the community, we recently published a new version of dat. Would you mind trying out the new dat with npm install -g dat and seeing how it fits into this discussion? In the meantime, I'm going to close this issue. You can read about the new dat announcement on the website and how it works in the docs.