caolan / highland
High-level streams library for Node.js and the browser
Home Page: https://caolan.github.io/highland
License: Apache License 2.0
How can I have multiple consumers receiving data from one object stream in parallel? parallel only works on streams of streams.
Right now we are limited to only using what highland provides; it would be great if we could extend highland somehow, either by highland exposing the stream class so we can add our own things to it, or by providing a utility method like _.add('uniq', function(){})
Currently, highland will not really let you take apart the beautiful streaming creations you've created without running into memory leaks if you have used the pipe method.
I've prototyped a destroy method as something like:
Stream.prototype.destroy = function () {
    // `this` inside the each callback is not the stream, so capture it first
    var self = this;
    self._consumers.forEach(function (consumer) {
        self._removeConsumer(consumer);
    });
    if (this.source) {
        this.source._removeConsumer(this);
    }
};
The issue is that at https://github.com/caolan/highland/blob/master/lib/index.js#L667, an anonymous function is registered on the drain event of the target stream. Because the callback is anonymous, it cannot be removed via dest.removeListener('drain', this._onDrainCallback); (the _onDrainCallback doesn't exist yet but might be worth creating).
Hi,
I am trying to pipe stream from an array to stdout
var _ = require('highland');
_([1, 2, 3, 4]).pipe(process.stdout);
I tried different node versions, but always get errors
Running node v0.8.26
net.js:497
throw new TypeError('First argument must be a buffer or a string.');
Running node v0.10.25
net.js:612
throw new TypeError('invalid data');
Should I be able to pipe the stream into process.stdout?
See this example:
var s1 = highland([1]);
var s2 = highland([2]);
var out = s1.concat(s2);
out.toArray(function(arr) { console.log(arr); }); // [ 1, 2 ]
console.log(s1.ended); // true
console.log(s2.ended); // true
console.log(out.ended); // undefined - expected true?
I was expecting that out.ended would also be true, since both its sources have ended?
What's the proper way to protect a limited resource (like the number of open fds/sockets) in a map?
_(key_stream, 'string').map(function (key) {
// operation that performs an HTTP request and returns a Promise
}).each(function (res) {
res.then(console.log);
});
The use case here is for each key in an S3 bucket I'd like to perform an operation. I'm treating the key list as a stream of values and then I perform some request, like perhaps retrieving their headers as stored in the S3 bucket.
Is parallel the appropriate way to prevent the following error?
Possibly unhandled Error: connect EMFILE
at errnoException (net.js:901:11)
at connect (net.js:764:19)
at net.js:842:9
at asyncCallback (dns.js:68:16)
at Object.onanswer [as oncomplete] (dns.js:121:9)
I think with async I would use async.eachLimit, since the stuff inside the map returns right away and allows for the next socket to connect before the request is finished.
I'm working on a highland stream based system for processing asset transformation pipelines. The resulting structure can result in a graph, where a source stream may be forked, processed differently in parallel, and merged back together using concat.
Here is a reduced test case:
var s = _([1,2]);
var o1 = s.fork().map(function(x){ return x * 2; });
var o2 = s.fork().map(function(x){ return x * 3; });
var out = o1.concat(o2);
out.each(function(x) {
console.log(x);
});
Which I would have expected should be (more or less) equivalent to:
var o1 = _([1,2]).map(function(x){ return x * 2; });
var o2 = _([1,2]).map(function(x){ return x * 3; });
var out = o1.concat(o2);
out.each(function(x) {
console.log(x);
});
This second example outputs "2 4 3 6" as expected, but the first one doesn't output anything.
Is this a bug, or am I missing something?
The docs specify that you can use pipeline as follows:
var through2 = _.pipeline(function (s) {
return s.map(parseJSON).filter(isBlogpost); // etc.
});
this does not seem to be implemented
This issue was spawned by #40
Following along @ http://highlandjs.org/#arrays, I try the first example.
var _ = require("highland")
var shouty = _(['foo', 'bar', 'baz']).map(toUpperCase);
I get the following error on node 0.10.24:
ReferenceError: toUpperCase is not defined
at Object.<anonymous> (/home/jake/code/highland/arrays.js:2:43)
at Module._compile (module.js:456:26)
at Object.Module._extensions..js (module.js:474:10)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:312:12)
at Function.Module.runMain (module.js:497:10)
at startup (node.js:119:16)
at node.js:902:3
Same as find() - see #14 - but using the where() syntax to match against properties instead of using a test function.
Hey, Highland looks nice, I'm trying it out currently. What bothers me now is the fact that a stream that wraps an EventEmitter instance passes only the first argument with it. It prevents me from doing stuff like
var server = http.createServer()
hls = hl('request', server)
hls.each(function(req, res){
console.log('consuming ')
res.end('Holla') // this fails, since res is undefined
})
Is it possible to wrap the params in some way and pass them along?
I'm almost certain I'm doing something wrong here, but with the following in ExpressJS, no result is ever sent to the client and the connection never closes:
app.get( '/whatever', function ( req, res ) {
highlandStream.map( function (v) {
console.log( "Value:", v );
return v;
}).pipe(res);
});
The value will spit out to the console, but nothing happens after that. It would seem the response stream never gets any data (or at least never does anything with it).
Note: The reason I am filing this here is that Express's response is supposed to be a writeable stream and there are several examples out in the ether that pipe values from other streams to it. I just can't get it to work with a Highland stream.
It seems that highland uses the setImmediate node function that was only introduced in node >= 0.10.
It's not a problem per se, but it would be nice to mention it somewhere (AFAIK, I couldn't find any trace of this information).
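For what it's worth, if node 0.8 support mattered, a crude shim could be loaded before highland. This is only a sketch; setTimeout(fn, 0) has coarser timing than a real setImmediate:

```javascript
// Minimal setImmediate shim for environments without it (node < 0.10).
if (typeof setImmediate === 'undefined') {
    global.setImmediate = function (fn) {
        // forward any extra arguments to the callback
        var args = Array.prototype.slice.call(arguments, 1);
        return setTimeout(function () { fn.apply(null, args); }, 0);
    };
}
```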
Same as Stream.take(1)
Returns a new stream with unique values.
This is probably not really a good thing to put in a GH issue, but I figure it will get some attention this way. I recently submitted a question on SO, but I was not able to make a highland.js tag because I don't have enough SO karma points. http://stackoverflow.com/questions/22034943/how-to-use-errors-in-highland-js-map
Just letting you know, feel free to close this issue.
I can imagine that in some cases with combinations of large and slow streams, this can lead to very large memory consumption. It would be useful to have an option to forgo the buffering and just receive the streams in whatever order they arrive, especially with object streams.
I'm getting strange behavior when I do e.g. _(stream).pipe(_.map(function () { /* ... */ })).pipe(response). But then, I realized this is not documented anywhere, so maybe I just made it up. Is this supposed to work?
If I wrap a stream that emits an error, the error is uncaught.
This is the stream I'm wrapping: https://github.com/brian-gates/cypher-stream/blob/master/index.js#L36
My syntax is this: _(source).errors(fn). I would expect the errors to be caught, but they are not.
Any help is appreciated.
So I've got this gist, which works fine, all up until: https://gist.github.com/balupton/9187991c53d801f654e3#file-index-js-L120
The .map(log) before that writes a series of JSON objects to the console as expected:
{ type: 'Feature',
properties: { githubUsername: 'timaschew' },
geometry: { type: 'Point', coordinates: [ 13.395648, 52.506902 ] } }
{ type: 'Feature',
properties: { githubUsername: 'vjpr' },
geometry: { type: 'Point', coordinates: [ 151.059617, -33.838197 ] } }
{ type: 'Feature',
properties: { githubUsername: 'yrassoulli' },
geometry: { type: 'Point', coordinates: [ -79.456961, 43.677633 ] } }
I then expect the .collect() to merge these JSON objects into an array, which I can then wrap with the map here: https://gist.github.com/balupton/9187991c53d801f654e3#file-index-js-L123
However nothing after the .collect() is reached.
Make highland a subclass of a Transform stream, using the readable-stream module. When called with generators they could be used as the _transform and _flush functions; an array could be a shortcut for
xs.forEach(function(item){
this.write(item);
},this);
this.end();
with Stream.prototype._transform defaulting to
Stream.prototype._transform = function(chunk, _, next) {
this.push(chunk);
next();
};
This will allow much of the queuing and back pressure logic to be removed, but some of the redirection logic will still need to get implemented.
I'm thinking this is mainly a need to document the end event somewhere here, but:
Here's a very simple async stream that generates [0,1,2,3,4,5,6,7,8,9,10]:
var val = 0;
function generator(push, next) {
process.nextTick(function() {
push(null, val++);
if(val > 10) {
push(null, highland.nil);
}
next();
});
}
var stream = highland(generator);
Now I want to write a consumer which counts the number of elements in this stream:
var count = 0;
stream
.stopOnError(function(err) {console.log("Bleugh", err.stack);})
.each(function(val) {count++;});
Ok... Now, how do I know when it's safe to read the count? How do I know when the stream is done? This seems to work, so I'm guessing this is how I'm supposed to do it:
var count = 0;
stream
.on("end", function() {console.log("Count", count);})
.stopOnError(function(err) {console.log("Bleugh", err.stack);})
.each(function(val) {count++;});
But since this is undocumented, it seems like something I shouldn't rely on? Is there some method that I missed somewhere? It would be slick if there was a function for this, like say then(), and if each() returned this, then you could:
var count = 0;
stream
.each(function(val) {count++;})
.then(function() {console.log("Count", count);})
.stopOnError(function(err) {console.log("Bleugh", err.stack);});
Which reads nicely... Although would then() get called even if we stopped on an error?
Or am I using your library completely wrong? :P
Returns the first item which passes a test function, then ends the Stream.
var docs = _([
{type: 'blogpost', title: 'foo'},
{type: 'comment', text: 'wibble'},
{type: 'blogpost', title: 'bar'},
{type: 'comment', text: 'wobble'}
]);
var first_comment = docs.find(function (doc) {
return doc.type === 'comment';
});
first_comment.toArray(function (xs) {
// xs is now [{type: 'comment', text: 'wibble'}]
});
Both async and underscore currently suffer from being a "kitchen sink" of anything possibly useful; these modules didn't start out that way, but more and more stuff was added to them.
Projects like lodash or underbar have mitigated this effect by splitting the module out across multiple modules / packages. @jdalton & @Matt-Esch have done good work to allow usage of the functionality piece by piece.
This allows consumers of the module to use a tighter subset of the functionality.
@Gozala also had a similar approach with splitting out reducers and reducible, and then a set of optional functions all over npm like buffer-reduce, tree-reduce, dom-reduce and others.
There probably is still value in having a module that combines everything together in a "kitchen sink" for people that prefer that, just like lodash allows you to get it all or get the functions one by one.
Like reduce1 is to reduce, only with scan :)
I was thinking that since highland can wrap event emitters, highland could be used to make a powerful web server abstraction where middleware would be stream handlers. What do you think?
Return a stream with all values sorted.
Should accept a comparator function and do a lexicographic sort by default.
When using _.pipeline(_.group(...)), the group function is not called.
A quick fix is to use _.pipeline(_.pipeline(), _.group(...))
If no function is passed to toArray(), should it just default to function (x) { return x; }? Or add the above as a utility method _.id()?
Convert JavaScript objects into lines of text, like https://github.com/dominictarr/event-stream#stringify.
Returns a new stream with all falsy values removed... essentially: filter(function (x) { return x; })
Hello,
I'm having issues using Promises with highland when trying to use JS Errors: it seems that the errors are always captured by a Promise. Please have a look at the following gist.
In the first example, I create a Stream from a resolved Promise, turn it into an Array (which works as expected) then throw an exception. For some reason, the error bubbles through the Promise (which, depending on the library used, will emit a warning or silence the problem).
In the second example, I use a generator function that generates 10 elements, the first one being a resolved Promise. Then again, for some reason the exception is captured by the promise.
I really can't wrap my head around this issue, I'm probably missing something here.
Thanks (and thanks for a great library).
Well, I just thought about using the stream approach for database results, as in most cases you might need to use the map function to transform the results.
There are many database wrappers that have added stream support. I have not tested their stream implementations, this is just an idea:
MongoDB https://github.com/mafintosh/mongojs#streaming-cursors
PostgreSQL https://github.com/brianc/node-pg-query-stream
I tried the mongojs stream approach, but the result is empty somehow:
var _ = require('highland'),
mongojs = require('mongojs'),
db = mongojs('test', ['my']);
// db.my.insert({
// 'version': process.version,
// 'dt': new Date().getTime()
// });
// For testing get the results in
// db.my.find().toArray(function (err, docs) {
// console.log(err, docs);
// });
var find = _(db.my.find());
console.log(find);
_(find)
.map(function (res) {
console.log(res);
})
.errors(function (err, errors) {
console.log(err, errors);
});
The 'async' chapter has the following example:
var getData = _.wrapCallback(fs.readFile);
getData('myfile').map(toUpperCase).map(function (x) {
return {name: x};
});
It seems like:
It's missing var toUpperCase = function(string){return String.prototype.toUpperCase.call(string)}
Once a valid toUpperCase is added, the function never returns anything, eg:
getData('myfile').map(toUpperCase).map(function (x) {
console.log({name: x})
});
Never logs anything. 'myfile' exists and has contents.
It's also unclear what x is: is it the file's data? Or a chunk? Unless there's a reason to minify the examples, data or chunk etc. might be a better name. Sorry if this sounds like a grumble, just trying to get Highland working so I can check it out.
Parse JSON chunks, like here: https://github.com/dominictarr/event-stream#parse
Using highland for the first time today, it is really awesome. However, I did struggle with the API and spent a lot of time trying to figure out which thing to use, and where to use it, and how to contort them into what I want.
I've taken some time to write down what my expected usage of highland was: https://gist.github.com/balupton/9187991c53d801f654e3#file-desired-js
And what the actual result is so far:
https://gist.github.com/balupton/9187991c53d801f654e3#file-highland-js
It appears that consume can't support asynchronous tasks. Here's some code that just hangs and does nothing:
var _ = require('highland');
_([1,2,3,4]).consume(function(error, x, push, next){
if(x === _.nil){
push(null, _.nil);
} else {
setTimeout(function(){
push(null, x*10);
next();
}, 20);
}
})
.each(function(x){
console.log(x);
});
It works as expected if we remove the setTimeout:
var _ = require('highland');
_([1,2,3,4]).consume(function(error, x, push, next){
if(x === _.nil){
push(null, _.nil);
} else {
push(null, x*10);
next();
}
})
.each(function(x){
console.log(x);
});
Is this a bug or is there a different way for doing asynchronous tasks like this?
I have found that the examples on highlandjs.org quickly lose me. On sites like jquery.com, lodash.com (or underscorejs.org) and the like I like to pop open a console and play around with the library to figure things out. I can't do this on highlandjs.org because it's not included in the page when I'm looking at the docs.
It'd be really handy to have it included there for playing around and learning.
Like take(n), only instead of returning the first n values, it ignores the first n values and emits the rest.
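The proposed behaviour can be pinned down with plain arrays (the drop helper below is hypothetical, just illustrating the contract):

```javascript
// Hypothetical drop(n): the complement of take(n).
function drop(xs, n) {
    return xs.slice(n); // ignore the first n values, keep the rest
}

console.log(drop([1, 2, 3, 4, 5], 2)); // [ 3, 4, 5 ]
```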
There's a lot of ambiguity in the examples that slowed me down in understanding what exactly was going on. I think if the examples were fully functional, it would help a lot in quickly learning what this library is capable of.
For example, readFile here is left to the users' imagination. A concrete, working example that I can use out of the box would be more helpful:
var data = _(filenames).map(readFile).parallel(4);
Maybe add some non-trivial examples to the docs, as to many folks it will not be immediately apparent what a powerful concept streams are.
Something like fetching data from one or more sources (xhr, json file?) then forking, parsing, combining, maybe generating some html elements.
I add this issue partially as a note, hopefully I will get some time to contribute some examples.
Related to #28
The highland equivalent of event-stream#split-matcher.
As this is such a common operation, maybe it is worthwhile to add it into the core, with full laziness, utf8 multi-byte support, CRLFs etc.
Note some implementations allow passing an optional RegExp to use as the splitter.
There is also the reverse, event-stream#join-separator.
Would be helpful for new contributors to know project requirements including details such as:
I often need to use node-csv and JSONStream for processing large files. What is the best way to use these node-style streams with highland so that back-pressure is managed properly?
highlandStream.pipe(csvStream) returns the destination stream, which isn't a highland stream, so I can't continue chaining.
I was finally able to get it to work, but it wasn't easy:
// Use _(source) to create a generator
function getFeatures(filename){
var _push, _next;
// Setup JSONStream that generates many events
var featureStream = fs.createReadStream(path.join(sourceDir, filename))
.pipe(jsonStream.parse(['features',true]))
.on('data', function(feature){
// Pause the stream until the generator is called again
// to manage back-pressure properly
featureStream.pause();
_push(null, feature);
_next();
})
.on('end', function(){
_push(null, _.nil);
_next();
});
return _(function(push, next){
_push = push;
_next = next;
// Resume the stream to get the next data event from the json stream
featureStream.resume();
});
};
Besides being a little difficult to setup, it can only be used at the beginning of a stream. If I want to process multiple files this way then I have to concoct another hairy beast that enables each event to spawn a new stream and only thunk when that new stream is done.
_(filenames).consume(function(error, filename, outerPush, outerNext){
if(filename === _.nil){
outerPush(null, _.nil);
outerNext();
} else {
getFeatures(filename)
.consume(function(error, feature, innerPush, innerNext){
if(feature === _.nil){
innerPush(null, _.nil);
innerNext();
// Push the filename out so that we can thunk
// and get the data moving
outerPush(null, filename);
// Let the outer stream know we're done
outerNext();
} else {
innerPush(null, feature);
innerNext();
}
})
.each(function(feature){
// Need to call this to thunk and get data moving
});
}
})
.each(function(filename){
// Need to call this to thunk and get data moving
});
Is there a better way to handle these situations with the current highland api?
Either way, highland is still making our life a lot easier. Thanks for creating it.
Same as drop(1) - see #11
var mystream = _.pipeline(stream1, stream2, stream3);
Where mystream will write to stream1 and emit values from stream3. This should work with both Highland and standard Node streams. It should also support partially applied functions:
var mystream = _.pipeline(stream1, _.map(doStuff), stream2, _.filter(etc));
Which actually means that any function which takes a Stream and returns a Stream should work:
var stringifier = _.pipeline(function (s) {
return s.invoke('toString', []);
});
data.pipe(stringifier).pipe(output);
Of course, in the case of Highland's invoke, partial application would achieve the same thing.
This is mostly an idea to explore:
I use Promises a lot with a (synchronous) lazy evaluation library (bluebird + lazyjs) and the result is amazing. But I'd like this to be more streamy and work asynchronously:
So wouldn't it be nice to combine Highland with Promises? Could Highland return a Promise-like Thenable?
I see #54 suggests this in passing, but I think it merits its own ticket.
I'm not quite sure what the then-value would be. Probably a thunk? Or maybe a synchronous stream? Maybe it depends on a call you do before then()?
The handlers would be .then(onComplete, onError) and .catch(onError), to match the ES6 Promise spec.
Given a property name and a stream of objects, return a new stream of those property values.
var docs = _([
{type: 'blogpost', title: 'foo'},
{type: 'blogpost', title: 'bar'},
{type: 'asdf', title: 'baz'}
]);
docs.pluck('title').toArray(function (xs) {
// xs is now ['foo', 'bar', 'baz']
});
I was struggling with the current API with my use case:
My stream is made up of rows from a spreadsheet. The first row has the column headers which I want to skip. It seems so simple but I could not figure out how to do it.
Slice is a general way to get the functionality of head, take and last while making the api smaller, more powerful and more intuitive for JavaScript developers.
slice(1) would solve my use case and stream all rows after the first
slice(0,1) == head()
slice(-1) == last()
slice(0,n) == take(n)
would allow further:
slice(-n) take n elements from end
slice(m,n) take m-n elements
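The proposal mirrors Array.prototype.slice, so the expected results can be checked against plain arrays:

```javascript
var xs = [1, 2, 3, 4, 5];

console.log(xs.slice(1));    // [ 2, 3, 4, 5 ] -> all rows after the first
console.log(xs.slice(0, 1)); // [ 1 ]          -> head()
console.log(xs.slice(-1));   // [ 5 ]          -> last()
console.log(xs.slice(0, 3)); // [ 1, 2, 3 ]    -> take(3)
console.log(xs.slice(-2));   // [ 4, 5 ]       -> take 2 from the end
console.log(xs.slice(1, 3)); // [ 2, 3 ]       -> the elements between m and n
```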