caolan / highland
High-level streams library for Node.js and the browser
Home Page: https://caolan.github.io/highland
License: Apache License 2.0
How can I have multiple consumers receiving data from one object stream in parallel? parallel only works on streams of streams.
Right now we are limited to only using what highland provides; it would be great if we could extend highland somehow, either by highland exposing the stream class so we can add our own things to it, or by providing a utility method like _.add('uniq', function(){})
Currently, highland will not really let you take apart the beautiful streaming creations you've created without running into memory leaks if you have used the pipe method.
I've prototyped a destroy method as something like:
Stream.prototype.destroy = function () {
    // `this` inside the each callback is not the stream, so capture it first
    var self = this;
    self._consumers.forEach(function (consumer) {
        self._removeConsumer(consumer);
    });
    if (this.source) {
        this.source._removeConsumer(this);
    }
};
The issue is that at https://github.com/caolan/highland/blob/master/lib/index.js#L667, an anonymous function is registered on the drain event of the target stream. Because the callback is anonymous, it cannot be removed via dest.removeListener('drain', this._onDrainCallback); (the _onDrainCallback doesn't exist yet but might be worth creating).
Hi,
I am trying to pipe stream from an array to stdout
var _ = require('highland');
_([1, 2, 3, 4]).pipe(process.stdout);
I tried different node versions, but always get errors
Running node v0.8.26
net.js:497
throw new TypeError('First argument must be a buffer or a string.');
Running node v0.10.25
net.js:612
throw new TypeError('invalid data');
Should I be able to pipe the stream into process.stdout?
See this example:
var s1 = highland([1]);
var s2 = highland([2]);
var out = s1.concat(s2);
out.toArray(function(arr) { console.log(arr); }); // [ 1, 2 ]
console.log(s1.ended); // true
console.log(s2.ended); // true
console.log(out.ended); // undefined - expected true?
I was expecting that out.ended would also be true, since both its sources have ended?
What's the proper way to protect a limited resource (like the number of open fds/sockets) in a map?
_(key_stream, 'string').map(function (key) {
// operation that performs an HTTP request and returns a Promise
}).each(function (res) {
res.then(console.log);
});
The use case here is for each key in an S3 bucket I'd like to perform an operation. I'm treating the key list as a stream of values and then I perform some request, like perhaps retrieving their headers as stored in the S3 bucket.
Is parallel the appropriate way to prevent the following error?
Possibly unhandled Error: connect EMFILE
at errnoException (net.js:901:11)
at connect (net.js:764:19)
at net.js:842:9
at asyncCallback (dns.js:68:16)
at Object.onanswer [as oncomplete] (dns.js:121:9)
I think with async I would use async.eachLimit, since the stuff inside the map returns right away and allows for the next socket to connect before the request is finished.
I'm working on a highland stream based system for processing asset transformation pipelines. The resulting structure can result in a graph, where a source stream may be forked, processed differently in parallel, and merged back together using concat.
Here is a reduced test case:
var s = _([1,2]);
var o1 = s.fork().map(function(x){ return x * 2; });
var o2 = s.fork().map(function(x){ return x * 3; });
var out = o1.concat(o2);
out.each(function(x) {
console.log(x);
});
Which I would have expected should be (more or less) equivalent to:
var o1 = _([1,2]).map(function(x){ return x * 2; });
var o2 = _([1,2]).map(function(x){ return x * 3; });
var out = o1.concat(o2);
out.each(function(x) {
console.log(x);
});
This second example outputs "2 4 3 6" as expected, but the first one doesn't output anything.
Is this a bug, or am I missing something?
The docs specify that you can use pipeline as follows:
var through2 = _.pipeline(function (s) {
return s.map(parseJSON).filter(isBlogpost); // etc.
});
this does not seem to be implemented
This issue was spawned by #40
Following along @ http://highlandjs.org/#arrays, I try the first example.
var _ = require("highland")
var shouty = _(['foo', 'bar', 'baz']).map(toUpperCase);
I get the following error on node 0.10.24:
ReferenceError: toUpperCase is not defined
at Object.<anonymous> (/home/jake/code/highland/arrays.js:2:43)
at Module._compile (module.js:456:26)
at Object.Module._extensions..js (module.js:474:10)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:312:12)
at Function.Module.runMain (module.js:497:10)
at startup (node.js:119:16)
at node.js:902:3
Same as find() - see #14 - but using the where() syntax to match against properties instead of using a test function.
Hey, Highland looks nice, I'm trying it out currently. What bothers me now is the fact that a stream that wraps an EventEmitter instance passes only the first argument with it. It prevents me from doing stuff like
var server = http.createServer()
hls = hl('request', server)
hls.each(function(req, res){
console.log('consuming ')
res.end('Holla') // this fails, since res is undefined
})
Is it possible to wrap the params in some way and pass them along?
I'm almost certain I'm doing something wrong here, but with the following in ExpressJS, no result is ever sent to the client and the connection never closes:
app.get( '/whatever', function ( req, res ) {
highlandStream.map( function (v) {
console.log( "Value:", v );
return v;
}).pipe(res);
});
The value will spit out to the console, but nothing happens after that. It would seem the response stream never gets any data (or at least never does anything with it).
Note: The reason I am filing this here is that Express's response is supposed to be a writeable stream and there are several examples out in the ether that pipe values from other streams to it. I just can't get it to work with a Highland stream.
It seems that highland uses the setImmediate node function that was only introduced in node >= 0.10.
It's not a problem per se, but it would be nice to mention it somewhere (AFAIK, I couldn't find any trace of this information).
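For what it's worth, if node 0.8 support mattered, a crude shim could be loaded before highland. This is only a sketch; setTimeout(fn, 0) has coarser timing than a real setImmediate:

```javascript
// Minimal setImmediate shim for environments without it (node < 0.10).
if (typeof setImmediate === 'undefined') {
    global.setImmediate = function (fn) {
        // forward any extra arguments to the callback
        var args = Array.prototype.slice.call(arguments, 1);
        return setTimeout(function () { fn.apply(null, args); }, 0);
    };
}
```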
Same as Stream.take(1)
Returns a new stream with unique values.
This is probably not really a good thing to put in a GH issue, but I figure it will get some attention this way. I recently submitted a question on SO, but I was not able to make a highland.js tag because I don't have enough SO karma points. http://stackoverflow.com/questions/22034943/how-to-use-errors-in-highland-js-map
Just letting you know, feel free to close this issue.
I can imagine that in some cases with combinations of large and slow streams, this can lead to very large memory consumption. It would be useful to have an option to forgo the buffering and just receive the streams in whatever order they arrive, especially with object streams.
I'm getting strange behavior when I do e.g. _(stream).pipe(_.map(function () { /* ... */ })).pipe(response). But then, I realized this is not documented anywhere, so maybe I just made it up. Is this supposed to work?
If I wrap a stream that emits an error, the error is uncaught.
This is the stream I'm wrapping: https://github.com/brian-gates/cypher-stream/blob/master/index.js#L36
My syntax is this: _(source).errors(fn). I would expect the errors to be caught, but they are not.
Any help is appreciated.
So I've got this gist, which works fine, all up until: https://gist.github.com/balupton/9187991c53d801f654e3#file-index-js-L120
The .map(log) before that writes a series of JSON objects to the console as expected:
{ type: 'Feature',
properties: { githubUsername: 'timaschew' },
geometry: { type: 'Point', coordinates: [ 13.395648, 52.506902 ] } }
{ type: 'Feature',
properties: { githubUsername: 'vjpr' },
geometry: { type: 'Point', coordinates: [ 151.059617, -33.838197 ] } }
{ type: 'Feature',
properties: { githubUsername: 'yrassoulli' },
geometry: { type: 'Point', coordinates: [ -79.456961, 43.677633 ] } }
I then expect the .collect() to merge these JSON objects into an array, which I can then wrap with the map here: https://gist.github.com/balupton/9187991c53d801f654e3#file-index-js-L123
However nothing after the .collect() is reached.
Make highland a subclass of a Transform stream, using the readable-stream module. When called with generators they could be used as the _transform and _flush functions; an array could be a shortcut for
xs.forEach(function(item){
this.write(item);
},this);
this.end();
with Stream.prototype._transform defaulting to
Stream.prototype._transform = function(chunk, _, next) {
this.push(chunk);
next();
};
This will allow much of the queuing and back pressure logic to be removed, but some of the redirection logic will still need to get implemented.
I'm thinking this is mainly a need to document the end event somewhere here, but:
Here's a very simple async stream that generates [0,1,2,3,4,5,6,7,8,9,10]:
var val = 0;
function generator(push, next) {
process.nextTick(function() {
push(null, val++);
if(val > 10) {
push(null, highland.nil);
}
next();
});
}
var stream = highland(generator);
Now I want to write a consumer which counts the number of elements in this stream:
var count = 0;
stream
.stopOnError(function(err) {console.log("Bleugh", err.stack);})
.each(function(val) {count++;});
Ok... Now, how do I know when it's safe to read the count? How do I know when the stream is done? This seems to work, so I'm guessing this is how I'm supposed to do it:
var count = 0;
stream
.on("end", function() {console.log("Count", count);})
.stopOnError(function(err) {console.log("Bleugh", err.stack);})
.each(function(val) {count++;});
But since this is undocumented, it seems like something I shouldn't rely on? Is there some method that I missed somewhere? It would be slick if there was a function for this, like say then(), and if each() returned this, then you could:
var count = 0;
stream
.each(function(val) {count++;})
.then(function() {console.log("Count", count);})
.stopOnError(function(err) {console.log("Bleugh", err.stack);});
Which reads nicely... Although would then() get called even if we stopped on an error?
Or am I using your library completely wrong? :P
Returns the first item which passes a test function, then ends the Stream.
var docs = _([
{type: 'blogpost', title: 'foo'},
{type: 'comment', text: 'wibble'},
{type: 'blogpost', title: 'bar'},
{type: 'comment', text: 'wobble'}
]);
var first_comment = docs.find(function (doc) {
return doc.type === 'comment';
});
first_comment.toArray(function (xs) {
// xs is now [{type: 'comment', text: 'wibble'}]
});
Both async and underscore currently suffer from being a "kitchen sink" of anything possibly useful; these modules didn't start out that way, but more and more stuff was added to them.
Projects like lodash or underbar have mitigated this effect by splitting the module out across multiple modules / packages. @jdalton & @Matt-Esch have done good work to allow usage of the functionality piece by piece.
This allows consumers of the module to use a tighter subset of the functionality.
@Gozala also had a similar approach with splitting out reducers and reducible, and then a set of optional functions all over npm like buffer-reduce, tree-reduce, dom-reduce and others.
There probably is still value in having a module that combines everything together in a "kitchen sink" for people that prefer that, just like lodash allows you to get it all or get the functions one by one.
Like reduce1 is to reduce, only with scan :)
I was thinking that since highland can wrap event emitters, highland could be used to make a powerful web server abstraction where middleware would be stream handlers. What do you think?
Return a stream with all values sorted.
Should accept a comparator function and do a lexicographic sort by default.
When using _.pipeline(_.group(...)), the group function is not called.
A quick fix is to use _.pipeline(_.pipeline(), _.group(...))
If no function is passed to toArray(), should it just default to function (x) { return x; }? Or add the above as a utility method _.id()?
Convert JavaScript objects into lines of text, like https://github.com/dominictarr/event-stream#stringify.
Returns a new stream with all falsy values removed... essentially: filter(function (x) { return x; })
Hello,
I'm having issues using Promises with highland when trying to use JS Errors: it seems that the errors are always captured by a Promise. Please have a look at the following gist.
In the first example, I create a Stream from a resolved Promise, turn it into an Array (which works as expected) then throw an exception. For some reason, the error bubbles through the Promise (which, depending on the library used, will emit a warning or silence the problem).
In the second example, I use a generator function that generates 10 elements, the first one being a resolved Promise. Then again, for some reason the exception is captured by the promise.
I really can't wrap my head around this issue, I'm probably missing something here.
Thanks (and thanks for a great library).
Well, I just thought about using the stream approach for database results, as in most cases you might need to use the map function to transform the results.
There are many database wrappers that have added stream support. I have not tested their stream implementations, this is just an idea:
MongoDB https://github.com/mafintosh/mongojs#streaming-cursors
PostgreSQL https://github.com/brianc/node-pg-query-stream
I tried the mongojs stream approach, but the result is empty somehow:
var _ = require('highland'),
mongojs = require('mongojs'),
db = mongojs('test', ['my']);
// db.my.insert({
// 'version': process.version,
// 'dt': new Date().getTime()
// });
// For testing get the results in
// db.my.find().toArray(function (err, docs) {
// console.log(err, docs);
// });
var find = _(db.my.find());
console.log(find);
_(find)
.map(function (res) {
console.log(res);
})
.errors(function (err, errors) {
console.log(err, errors);
});
The 'async' chapter has the following example:
var getData = _.wrapCallback(fs.readFile);
getData('myfile').map(toUpperCase).map(function (x) {
return {name: x};
});
It seems like:
It's missing var toUpperCase = function(string){return String.prototype.toUpperCase.call(string)}
Once a valid toUpperCase is added, the function never returns anything, eg:
getData('myfile').map(toUpperCase).map(function (x) {
console.log({name: x})
});
Never logs anything. 'myfile' exists and has contents.
It's also unclear what x is: is it the file's data? Or a chunk? Unless there's a reason to minify the examples, data or chunk etc. might be a better name. Sorry if this sounds like a grumble, just trying to get Highland working so I can check it out.
Parse JSON chunks, like here: https://github.com/dominictarr/event-stream#parse
Using highland for the first time today, it is really awesome. However, I did struggle with the API and spent a lot of time trying to figure out which thing to use, and where to use it, and how to contort them into what I want.
I've taken some time to write down what my expected usage of highland was: https://gist.github.com/balupton/9187991c53d801f654e3#file-desired-js
And what the actual result is so far:
https://gist.github.com/balupton/9187991c53d801f654e3#file-highland-js
It appears that consume can't support asynchronous tasks. Here's some code that just hangs and does nothing:
var _ = require('highland');
_([1,2,3,4]).consume(function(error, x, push, next){
if(x === _.nil){
push(null, _.nil);
} else {
setTimeout(function(){
push(null, x*10);
next();
}, 20);
}
})
.each(function(x){
console.log(x);
});
It works as expected if we remove the setTimeout:
var _ = require('highland');
_([1,2,3,4]).consume(function(error, x, push, next){
if(x === _.nil){
push(null, _.nil);
} else {
push(null, x*10);
next();
}
})
.each(function(x){
console.log(x);
});
Is this a bug or is there a different way for doing asynchronous tasks like this?
I have found that the examples on highlandjs.org quickly lose me. On sites like jquery.com, lodash.com (or underscorejs.org) and the like I like to pop open a console and play around with the library to figure things out. I can't do this on highlandjs.org because it's not included in the page when I'm looking at the docs.
It'd be really handy to have it included there for playing around and learning.
Like take(n), only instead of returning the first n values, it ignores the first n values and emits the rest.
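The proposed behaviour can be pinned down with plain arrays (the drop helper below is hypothetical, just illustrating the contract):

```javascript
// Hypothetical drop(n): the complement of take(n).
function drop(xs, n) {
    return xs.slice(n); // ignore the first n values, keep the rest
}

console.log(drop([1, 2, 3, 4, 5], 2)); // [ 3, 4, 5 ]
```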
There's a lot of ambiguity in the examples that slowed me down in understanding what exactly was going on. I think if the examples were fully functional, it would help a lot in quickly learning what this library is capable of.
For example, readFile here is left to the users' imagination. A concrete, working example that I can use out of the box would be more helpful:
var data = _(filenames).map(readFile).parallel(4);
Maybe add some non-trivial examples to the docs, as to many folks it will not be immediately apparent what a powerful concept streams are.
Something like fetching data from one or more sources (xhr, json file?) then forking, parsing, combining, maybe generating some html elements.
I add this issue partially as a note, hopefully I will get some time to contribute some examples.
Related to #28
The highland equivalent of event-stream#split-matcher.
As this is such a common operation, maybe it is worthwhile to add it into the core, with full laziness, utf8 multi-byte support, CRLFs etc.
Note some implementations allow passing an optional RegExp to use as the splitter.
There is also the reverse, event-stream#join-separator.
Would be helpful for new contributors to know project requirements including details such as:
I often need to use node-csv and JSONStream for processing large files. What is the best way to use these node-style streams with highland so that back-pressure is managed properly?
highlandStream.pipe(csvStream) returns the destination stream, which isn't a highland stream, so I can't continue chaining.
I was finally able to get it to work, but it wasn't easy:
// Use _(source) to create a generator
function getFeatures(filename){
var _push, _next;
// Setup JSONStream that generates many events
var featureStream = fs.createReadStream(path.join(sourceDir, filename))
.pipe(jsonStream.parse(['features',true]))
.on('data', function(feature){
// Pause the stream until the generator is called again
// to manage back-pressure properly
featureStream.pause();
_push(null, feature);
_next();
})
.on('end', function(){
_push(null, _.nil);
_next();
});
return _(function(push, next){
_push = push;
_next = next;
// Resume the stream to get the next data event from the json stream
featureStream.resume();
});
};
Besides being a little difficult to setup, it can only be used at the beginning of a stream. If I want to process multiple files this way then I have to concoct another hairy beast that enables each event to spawn a new stream and only thunk when that new stream is done.
_(filenames).consume(function(error, filename, outerPush, outerNext){
if(filename === _.nil){
outerPush(null, _.nil);
outerNext();
} else {
getFeatures(filename)
.consume(function(error, feature, innerPush, innerNext){
if(feature === _.nil){
innerPush(null, _.nil);
innerNext();
// Push the filename out so that we can thunk
// and get the data moving
outerPush(null, filename);
// Let the outer stream know we're done
outerNext();
} else {
innerPush(null, feature);
innerNext();
}
})
.each(function(feature){
// Need to call this to thunk and get data moving
});
}
})
.each(function(filename){
// Need to call this to thunk and get data moving
});
Is there a better way to handle these situations with the current highland api?
Either way, highland is still making our life a lot easier. Thanks for creating it.
Same as drop(1) - see #11
var mystream = _.pipeline(stream1, stream2, stream3);
Where mystream will write to stream1 and emit values from stream3. This should work with both Highland and standard Node streams. It should also support partially applied functions:
var mystream = _.pipeline(stream1, _.map(doStuff), stream2, _.filter(etc));
Which actually means that any function which takes a Stream and returns a Stream should work:
var stringifier = _.pipeline(function (s) {
return s.invoke('toString', []);
});
data.pipe(stringifier).pipe(output);
Of course, in the case of Highland's invoke, partial application would achieve the same thing.
This is mostly an idea to explore:
I use Promises a lot with a (synchronous) lazy evaluation library (bluebird + lazyjs) and the result is amazing. But I'd like this to be more streamy and work asynchronously:
So wouldn't it be nice to combine Highland with Promises? Could Highland return a Promise-like Thenable?
I see #54 suggests this in passing, but I think it merits its own ticket.
I'm not quite sure what the then-value would be. Probably a thunk? Or maybe a synchronous stream? Maybe it depends on a call you do before then()?
The handlers would be .then(onComplete, onError) and .catch(onError), to match the ES6 Promise spec.
Given a property name and a stream of objects, return a new stream of those property values.
var docs = _([
{type: 'blogpost', title: 'foo'},
{type: 'blogpost', title: 'bar'},
{type: 'asdf', title: 'baz'}
]);
docs.pluck('title').toArray(function (xs) {
// xs is now ['foo', 'bar', 'baz']
});
I was struggling with the current API with my use case:
My stream is made up of rows from a spreadsheet. The first row has the column headers which I want to skip. It seems so simple but I could not figure out how to do it.
Slice is a general way to get the functionality of head, take and last while making the api smaller, more powerful and more intuitive for JavaScript developers.
slice(1) would solve my use case and stream all rows after the first
slice(0,1) == head()
slice(-1) == last()
slice(0,n) == take(n)
would allow further:
slice(-n) take n elements from end
slice(m,n) take m-n elements
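The proposal mirrors Array.prototype.slice, so the expected results can be checked against plain arrays:

```javascript
var xs = [1, 2, 3, 4, 5];

console.log(xs.slice(1));    // [ 2, 3, 4, 5 ] -> all rows after the first
console.log(xs.slice(0, 1)); // [ 1 ]          -> head()
console.log(xs.slice(-1));   // [ 5 ]          -> last()
console.log(xs.slice(0, 3)); // [ 1, 2, 3 ]    -> take(3)
console.log(xs.slice(-2));   // [ 4, 5 ]       -> take 2 from the end
console.log(xs.slice(1, 3)); // [ 2, 3 ]       -> the elements between m and n
```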