Comments (12)
What's the alternative to Redis? Is it inventing your own framing protocol? Or is it simply not supporting a command line pipe chain?
from dat.
All of your points and upsides make sense to me. Are there any downsides? This doesn't introduce a dependency on Redis, just a framing protocol that shares its name. (Sort of equivalent to making use of MongoDB's BSON, which some projects do.)
You say you don't think it'll add much downside, and I haven't looked at binary-csv's implementation, but is it already inspecting the contents of lines to handle cases like newlines inside of quoted strings?
Also, if possible, I would recommend integrating binary-csv into dat so that you can cut out the double pipe. Doing `curl http://some-website.com/huge_data.csv | dat` is a much simpler introduction for people cutting their teeth on dat and the command line generally.
Also, if possible, I would recommend integrating binary-csv into dat so that you can cut out the double pipe.
👍
The only downside I can really see is there are features in the protocol it doesn't look like you're using, e.g. multibulk replies & errors.
This weekend I wrote multibuffer for similar purposes. I also based it on the Redis protocol, but changed a couple of things from experience working with that protocol. Rather than having to scan through the buffer and read chunks, I optimized it for Node's Buffer (and bops) interactions by using a fixed-width length prefix for each buffer segment.
I left off the initial frame count because I didn't need it, but it would be easy to add in.
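A minimal sketch of the fixed-width length-prefix idea described above. This is not multibuffer's actual implementation; the 4-byte big-endian prefix is an assumption here, chosen to show why no scanning of the payload is needed:

```javascript
// Sketch of fixed-width length-prefixed framing in the spirit of
// multibuffer -- not the real module. Assumes a 4-byte big-endian
// length prefix per segment.

function encode(buffers) {
  const parts = [];
  for (const buf of buffers) {
    const len = Buffer.alloc(4);
    len.writeUInt32BE(buf.length, 0); // fixed-width prefix: no delimiter scanning
    parts.push(len, buf);
  }
  return Buffer.concat(parts);
}

function decode(packed) {
  const out = [];
  let offset = 0;
  while (offset < packed.length) {
    const len = packed.readUInt32BE(offset);
    offset += 4;
    out.push(packed.slice(offset, offset + len)); // payload is binary safe
    offset += len;
  }
  return out;
}
```

Because the prefix declares the exact byte count, `decode(encode([Buffer.from('hey\nthere')]))` round-trips even when segments contain newlines.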
The only downside that sprang to mind is being cut off from other newline-delimited text processing tools. This could be mitigated by having converters in the pipe chain, but I would be wary of encouraging people to build "plugins" in this ecosystem as opposed to general purpose unix utilities.
Like @brycebaril says, Redis has a lot of extra stuff in there that you probably don't need. However, there are already multiple implementations of a Redis client in node, so the extra stuff might not be a burden.
In your Unix pipe use case, the actual Redis protocol includes responses, but Unix pipes are one-way. So perhaps you can use TCP at that layer.
If you are using TCP, something to keep in mind with the Redis protocol is that massive performance gains can be achieved by allowing pipelining. That is, keeping a window of requests that have been sent but whose responses have not yet been read. However, the current Redis protocol does not support responding to commands out of order, or even reliably identifying which command a response is from. The downside then is that, when there are bugs, you might end up mixing up the commands and their replies. If you are doing write-only, then it probably doesn't matter, but as soon as you start reading data with this protocol, the lack of command id is a serious design consideration.
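The pairing problem can be sketched minimally. Since RESP replies carry no command id, a pipelining client has to match replies to commands purely by position, e.g. with a FIFO of pending callbacks (an illustrative simplification, not any real client's code):

```javascript
// Why RESP pipelining relies on strict ordering: replies carry no command
// id, so the client pairs them positionally with a FIFO of pending
// callbacks. If this queue ever desynchronizes, replies attach to the
// wrong commands.
const pending = [];

function send(command, callback) {
  pending.push(callback); // remember who is waiting, in send order
  // socket.write(encode(command)) would go here in a real client
}

function onReply(reply) {
  const callback = pending.shift(); // pair strictly by arrival order
  callback(reply);
}
```

The correctness of every pairing depends on the server answering in exactly the order commands were sent, which is why a lost or duplicated reply corrupts every subsequent pairing.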
@konklone @waldoj I am most likely going to put CSV and JSON support into dat
core as both of those formats are a) really prevalent and b) non-trivial to parse in a streaming fashion. There will also be a generic newline separated data parser as well. I'd like to limit it to these three, though. Anything else can be built on top and use the redis protocol if it wants to be fast.
This means that you'll have to tell dat what type of data you're importing. The data will be split into rows and the raw rows will be stored. This will make importing super fast. On read the data will be converted to JSON.
Rough ideas for API (optional, longhand and shorthand versions are shown on same line for brevity):
cat data.csv | dat --csv --sep="\n" -s "\n" --delim="," -d ","
cat newline-separated-data | dat --sep="\n" -s "\n" --preview -p
cat stream-of-json | dat --json --path "rows.*"
So basically,
- JSON or CSV parsers built in to dat
- Newline separated raw data or redis-protocol delimited raw data parsers built in to dat
- All data replication/more complex use cases in dat should dogfood the above parsers
@jden Great point. For the case of CSV it's actually impossible to write sane modular command line workflows using newlines because of the special use of newlines within the CSV spec. I think CSV is an outlier though, and shouldn't technically be considered a newline delimited format.
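To illustrate why newline splitting fails for CSV, here is a small sketch. The quote-toggling splitter is a simplification (it ignores many RFC 4180 edge cases), but it shows the minimum state a record splitter must track:

```javascript
// RFC 4180 allows newlines inside quoted fields, so splitting on "\n"
// breaks records apart.
const csv = 'name,bio\n"Ada","line one\nline two"\n';

// Naive newline split: yields 3 partial "rows" plus a trailing empty
// string, for what is really a header plus one record.
const naive = csv.split('\n');

// A quote-aware splitter must track whether it is inside quotes.
function splitRecords(text) {
  const records = [];
  let current = '';
  let inQuotes = false;
  for (const ch of text) {
    if (ch === '"') inQuotes = !inQuotes;
    if (ch === '\n' && !inQuotes) {
      if (current) records.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  if (current) records.push(current);
  return records;
}
```

`splitRecords(csv)` yields two records, with the embedded newline preserved inside the quoted field, where the naive split yields four fragments.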
@brycebaril @mranney the extra features are a good point to bring up. For a lot of bulk loading use cases I don't actually care about the response; for example, if I am a command line utility piping data to another command line utility on the same machine (from a file into a local database), I just want to send data, not receive confirmation that everything went okay. The responses seem more for remote operations. Do you see any issues posed here?
Also, regarding the Redis protocol: I was trying to figure out, but couldn't find anything conclusive, whether you have to escape newlines with the Redis protocol.
For example, will this break?
*1
$9
hey
there
According to the logic of the parser in node_redis it won't break, but I wasn't sure if other parsers are too strict.
I'd like to limit it to these three, though.
I totally support this decision. Maybe in a few years, JSON will be old and busted, and something else will be the new hotness, but right now, you've got 95% of the bases covered with maybe 20% of the effort that would be required to support something like XML.
Redis protocol is binary safe, so if you say that 9 bytes are to follow, they can all be newlines, or a JPEG, or whatever.
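To make that binary safety concrete, here is a hedged sketch of parsing a single RESP bulk string. The declared byte count drives the read, so the `hey`/`there` payload from the example above survives intact without any escaping:

```javascript
// Minimal sketch of parsing a RESP (Redis protocol) bulk string.
// Because the "$N" header declares the exact byte count, the payload
// can contain newlines (or any bytes) without escaping.
function parseBulkString(buf, offset) {
  const headerEnd = buf.indexOf('\r\n', offset);           // end of "$N" line
  const length = parseInt(buf.slice(offset + 1, headerEnd).toString(), 10);
  const start = headerEnd + 2;
  const payload = buf.slice(start, start + length);        // read exactly N bytes
  return { payload, next: start + length + 2 };            // skip trailing \r\n
}

const msg = Buffer.from('$9\r\nhey\nthere\r\n');
const { payload } = parseBulkString(msg, 0);
// payload is the 9 bytes 'hey\nthere' -- the embedded newline survives
```

A stricter parser could additionally verify that the two bytes after the payload are `\r\n`; node_redis's leniency here is what the earlier comment was asking about.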
I'm going to go with this for now https://github.com/brycebaril/multibuffer-stream
Thanks @brycebaril. I'll keep this issue open as I still want to consider Redis protocol support -- I just don't know yet if I need the response semantics. If I do then it makes sense, if I don't then multibuffer-stream will do just fine
Just a quick thought — what about something like find(1)'s -print0 and xargs(1)'s -0 options? That is, using a literal NUL (\0) to separate records in the pipeline. See:
http://en.wikipedia.org/wiki/Xargs#The_separator_problem
Not sure how the CSV spec handles a literal \0, but it might be a simple way to handle this.
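A quick sketch of the NUL-separator idea, with made-up illustrative data. NUL can't appear in Unix filenames, which is why it is safe for find/xargs; whether it is safe for arbitrary CSV payloads depends on the data, since CSV itself doesn't reserve \0:

```javascript
// NUL-delimited records, in the spirit of find -print0 | xargs -0.
// Records may freely contain newlines; only \0 is reserved.
const records = ['a,b\nc', 'd,"e\nf"'];
const stream = records.join('\0') + '\0';

// Splitting on \0 recovers the records exactly, newlines intact.
const parsed = stream.split('\0').filter((r) => r.length > 0);
```

This is delimiter-based rather than length-prefixed, so it stays greppable by NUL-aware tools but can't carry truly arbitrary binary the way a framed format can.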
In case anyone is interested, I wrote up a bit on multibuffers, which I'll be referring to within dat as .buff:
https://github.com/maxogden/dat/blob/master/notes.md#the-buff-format
@phred thanks for the link, I didn't know that's how print0 and xargs do it. I think the above buff format will work a little bit better as it is a nice compromise between a delimited and a framed format