scramjetorg / framework-js

Simple yet powerful live data computation framework.

Home Page: https://www.scramjet.org

License: MIT License


framework-js's Introduction

Scramjet Framework TypeScript


โญ Star us on GitHub โ€” it motivates us a lot! ๐Ÿš€

Scramjet Framework

Scramjet is a simple reactive stream programming framework. You write code by chaining functions that transform the streamed data, including the well-known map, filter and reduce.

The main advantage of Scramjet is that it runs asynchronous operations on your data streams concurrently. It allows you to perform transformations both synchronously and asynchronously using the same API - so you can "map" your stream from whatever source and call any number of APIs consecutively.
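The idea of concurrent asynchronous mapping can be illustrated with a plain-TypeScript sketch (this is not Scramjet's internal implementation, and `concurrentMap`/`double` are made-up names used only for illustration): every async transform starts immediately, and the results come back in their original order.

```typescript
// Sketch of concurrent async mapping: start all transforms at once
// instead of awaiting each one before starting the next.
async function concurrentMap<T, U>(
    items: T[],
    transform: (item: T) => Promise<U>
): Promise<U[]> {
    // Kick off every transform immediately; they now run concurrently.
    const pending = items.map(transform);
    // Promise.all awaits them together and preserves the input order.
    return Promise.all(pending);
}

// A made-up async transform used only for illustration.
const double = async (n: number): Promise<number> => n * 2;

const doubled = await concurrentMap([1, 2, 3], double); // [2, 4, 6]
```

A Scramjet stream's .map() gives you this kind of per-chunk concurrency while the stream takes care of ordering and flow control for you.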

This is a pre-release of the next major version (v5) of the JavaScript Scramjet Framework.

We are open to your feedback! We encourage you to report issues with any ideas, suggestions and features you would like to see in this version. You can also upvote (+1) existing ones to show us the direction we should take in developing Scramjet Framework.

Not interested in the JavaScript/TypeScript version? Check out the Scramjet Framework in Python!


Installation

Simply run:

npm i @scramjet/framework

Then you can import it in your JS/TS code like this:

sample-file.ts

import { DataStream } from "@scramjet/framework";

You can also use the nightly build as an npm dependency by referring to the nightly branch (which holds the latest build) of this repository:

package.json

{
    "dependencies": {
        "scramjet": "scramjetorg/framework-js#nightly"
    }
}

After adding Scramjet Framework as a dependency, install it via npm (or a similar package manager):

npm i

You can also build Scramjet Framework yourself. Please refer to Development Setup section for more details.

Usage

Scramjet streams are similar, and behave similarly, to native nodejs streams and to streams in any programming language in general. They allow operating on streams of data (where each separate data part is called a chunk) and processing it in any way through transforms like mapping or filtering.

Let's take a look at how to create and operate on Scramjet streams.

If you would like to dive deeper, please refer to streams source files.

Creating Scramjet streams

The basic method for creating Scramjet streams is the from() static method. It accepts iterables (both sync and async) and native nodejs streams. As for iterables, it can be a simple array, a generator or anything else iterable:

import { DataStream } from "scramjet";

const stream = DataStream.from(["foo", "bar", "baz"]);

Scramjet streams are asynchronous iterables themselves, which means one stream can be created from another:

import { DataStream } from "scramjet";

const stream1 = DataStream.from(["foo", "bar", "baz"]);
const stream2 = DataStream.from(stream1);

They can also be created from native nodejs Readables:

import { createReadStream } from "fs";
import { DataStream } from "scramjet";

const stream = DataStream.from(createReadStream("path/to/file"));

The more "manual" approach is creating streams using the constructor:

import { DataStream } from "scramjet";

const stream = new DataStream();

Such an approach is useful when one needs to manually write data to a stream or use it as a pipe destination:

import { DataStream } from "scramjet";

const stream = new DataStream();
stream.write("foo");

const stream2 = new DataStream();
stream.pipe(stream2);

Getting data from Scramjet streams

Similar to creating Scramjet streams, there are specific methods which allow getting data out of them. Those are sometimes called sink methods, as they allow data to flow through and out of the stream. As those methods need to wait for the stream to end, they return a Promise which needs to be awaited and is resolved when all data from the source has been processed.

import { DataStream } from "scramjet";

const stream1 = DataStream.from(["foo", "bar", "baz"]);
await stream1.toArray(); // ["foo", "bar", "baz"]

const stream2 = DataStream.from(["foo", "bar", "baz"]);
await stream2.toFile("path/to/file"); // Writes to a file, resolves when done.

const stream3 = DataStream.from(["foo", "bar", "baz"]);
await stream3.reduce(
    (prev, curr) => `${ prev }-${ curr }`,
    ""
); // "foo-bar-baz"

As Scramjet streams are asynchronous iterables they can be iterated too:

import { DataStream } from "scramjet";

const stream = DataStream.from(["foo", "bar", "baz"]);

for await (const chunk of stream) {
    console.log(chunk);
}
// Logs:
// "foo"
// "bar"
// "baz"

Similar to writing, there is also a more "manual" way of reading from streams, using the .read() method:

import { DataStream } from "scramjet";

const stream = DataStream.from(["foo", "bar", "baz"]);

await stream.read(); // "foo"
await stream.read(); // "bar"

.read() returns a Promise which waits until there is something ready to be read from the stream.

Basic operations

The whole idea of stream processing is the ability to quickly and efficiently transform data which flows through the stream. Let's take a look at the basic operations (called transforms) and what they do:

Mapping

Mapping stream data is basically the same as mapping an array: it allows mapping each chunk to a new value:

import { DataStream } from "scramjet";

DataStream
    .from(["foo", "bar", "baz"])
    .map(chunk => chunk.repeat(2))
    .toArray(); // ["foofoo", "barbar", "bazbaz"]

The result of the map transform can be of a different type than the initial chunks:

import { DataStream } from "scramjet";

DataStream
    .from(["foo", "bar", "baz"])
    .map(chunk => chunk.charCodeAt(0))
    .toArray(); // [102, 98, 98]

DataStream
    .from(["foo", "bar", "baz"])
    .map(chunk => chunk.split(""))
    .toArray(); // [["f", "o", "o"], ["b", "a", "r"], ["b", "a", "z"]]

Filtering

Filtering allows you to filter out any unnecessary chunks:

import { DataStream } from "scramjet";

DataStream
    .from([1, 2, 3, 4, 5, 6])
    .filter(chunk => chunk % 2 === 0)
    .toArray(); // [2, 4, 6]

Grouping

Batching allows grouping chunks into arrays, effectively changing the number of chunks flowing through the stream:

import { DataStream } from "scramjet";

DataStream
    .from([1, 2, 3, 4, 5, 6, 7, 8])
    .batch(chunk => chunk % 2 === 0)
    .toArray(); // [[1, 2], [3, 4], [5, 6], [7, 8]]

Whenever the callback function passed to the .batch() call returns true, a new group is emitted.
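The grouping rule can be sketched in plain TypeScript (this illustrates the semantics only; `batchBy` is a made-up helper, not the actual stream implementation):

```typescript
// Sketch of the .batch(callback) rule: collect chunks into the current
// group and emit the group whenever the callback returns true.
function batchBy<T>(chunks: T[], shouldEmit: (chunk: T) => boolean): T[][] {
    const groups: T[][] = [];
    let current: T[] = [];
    for (const chunk of chunks) {
        current.push(chunk);
        if (shouldEmit(chunk)) {
            groups.push(current); // callback returned true: emit the group
            current = [];
        }
    }
    if (current.length > 0) groups.push(current); // trailing partial group
    return groups;
}

batchBy([1, 2, 3, 4, 5, 6, 7, 8], n => n % 2 === 0);
// [[1, 2], [3, 4], [5, 6], [7, 8]]
```

How a trailing partial group is handled at stream end is a detail of the real implementation; this sketch simply emits it.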

Flattening

The operation opposite to batching is flattening. At the moment, Scramjet streams provide the .flatMap() method, which allows first mapping chunks and then flattening the resulting arrays:

import { DataStream } from "scramjet";

DataStream
    .from(["foo", "bar", "baz"])
    .flatMap(chunk => chunk.split(""))
    .toArray(); // ["f", "o", "o", "b", "a", "r", "b", "a", "z"]

But it can also be used only to flatten the stream, by providing a callback which simply passes values through:

import { DataStream } from "scramjet";

DataStream
    .from([1, 2, 3, 4, 5, 6, 7, 8])
    .batch(chunk => chunk % 2 === 0)
    .flatMap(chunk => chunk)
    .toArray(); // [1, 2, 3, 4, 5, 6, 7, 8]

Piping

Piping is essential for operating on streams. Scramjet streams can be used both as a pipe source and as a destination. They can also be combined with native nodejs streams, having native streams as a pipe source or destination.

import { DataStream } from "scramjet";

const stream1 = DataStream.from([1, 2, 3, 4, 5, 6, 7, 8]);
const stream2 = new DataStream();

stream1.pipe(stream2); // All data flowing through "stream1" will be passed to "stream2".

import { createReadStream } from "fs";
import { DataStream } from "scramjet";

const readStream = createReadStream("path/to/file");
const scramjetStream = new DataStream();

readStream.pipe(scramjetStream); // All file contents read by the native nodejs stream will be passed to "scramjetStream".

import { createWriteStream } from "fs";
import { DataStream } from "scramjet";

const scramjetStream = DataStream.from([1, 2, 3, 4, 5, 6, 7, 8]);

scramjetStream.pipe(createWriteStream("path/to/file")); // All data flowing through "scramjetStream" will be written to a file via a native nodejs stream.

Requesting Features

Anything missing? Or maybe there is something that would make using Scramjet Framework easier or more efficient? Don't hesitate to file a new feature request! We really appreciate all feedback.

Reporting Bugs

If you have found a bug, or inconsistent or confusing behavior, please file a new bug report.

Contributing

You can contribute to this project by giving us feedback (reporting bugs and requesting features) and also by writing code yourself! We have some introductory issues labeled with good first issue which should be a perfect starter.

The easiest way is to create a fork of this repository and then create a pull request with all your changes. In most cases, you should branch from and target main branch.

Please refer to the Development Setup section on how to set up this project.

Development Setup

Project setup

  1. Install nodejs (14.x).

     Refer to the official docs. Alternatively you may use a Node version manager like nvm.

  2. Clone this repository:

     git clone git@github.com:scramjetorg/framework-js.git

  3. Install project dependencies:

     npm i

Commands

There are multiple npm commands available which help to run tests, build the project and assist during development.

Running tests

npm run test

Runs all tests from test directory. It runs build internally so it doesn't have to be run manually.

npm run test:unit[:w]

Runs all unit tests (test/unit directory). It runs build internally so it doesn't have to be run manually. When run with :w it will watch for changes, rebuild and rerun tests automatically. To run unit tests without rebuilding the project use npm run test:run:unit.

npm run test:unit:d -- build/test/.../test.js [--host ...] [--port ...]

Runs the specified test file in debug mode. It runs build internally so it doesn't have to be run manually. This is the same as running npm run build && npx ava debug --break build/test/.../test.js [--host ...] [--port ...]. The process can then be inspected, e.g. via the Chrome inspector by going to chrome://inspect.

npm run test:bdd

Runs all BDD tests (test/bdd directory). It runs build internally so it doesn't have to be run manually. To run BDD tests without rebuilding the project use npm run test:run:bdd.

Running single test file or specific tests

A single test file can be run by passing its path to the test command:

npm run test:unit -- build/test/ifca/common.spec.js

While specific test cases can be run using the -m (match) option:

npm run test:unit -- -m "*default*"

Both can be mixed to run specific tests from a given file or folder:

npm run test:unit -- build/test/ifca/common.spec.js -m "*default*"

Building the project

npm run build[:w]

Transpiles .ts sources and tests (src and test directories) and outputs JS files to the build directory. When run with :w it will watch for changes and rebuild automatically.

npm run dist

Builds dist files - similar to build, but it skips the test directory and additionally generates source maps.

Miscellaneous

npm run lint

Lints src and test directories. Used as a pre-commit hook.

npm run lint:f

Fixes lint warnings/errors in src and test files.

npm run coverage

Checks code coverage, generates an HTML report and serves it on port 8080.

npm run coverage:check

Checks code coverage. Will fail if coverage is below the threshold defined in package.json. Useful as a CI job.

npm run coverage:generate

Generates the code coverage report.

framework-js's People

Contributors

daro1337, f1ames, iaforek, michalcz


framework-js's Issues

Introduce MultiStream/MuxStream

Feature description

Introduce MultiStream/MuxStream similar to MultiStream from v4.

The idea behind a MultiStream is being able to mux and demux streams when needed. A MultiStream is an object consisting of multiple streams that can be refined or muxed.

Use case

Merging streams from various sources so all data can be processed by the same pipeline. Distributing data into multiple streams.


If you are interested in this issue or this stream type fits your use case, please upvote with 👍.

Introduce DataStream.flatten() method

Feature description

Introduce DataStream.flatten() method similar to v4:

A shorthand for streams of arrays or iterables to flatten them.

A more efficient equivalent of .flatMap(i => i); works on streams of async iterables too.


If you are interested in this issue or this type of transform fits your use case, please upvote with 👍.

Introduce DataStream.assign() method

Feature description

Introduce DataStream.assign() method similar to v4:

Transforms stream objects by assigning the properties from the returned data along with data from original ones.

The original objects are unaltered.


If you are interested in this issue or this type of transform fits your use case, please upvote with 👍.
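Until .assign() lands, its per-chunk semantics can be approximated today with .map(). Here is a plain-TypeScript sketch of those semantics (`assignChunk` is a made-up helper, not part of the framework):

```typescript
// Sketch of the proposed .assign() semantics for a single chunk:
// merge the properties returned by the callback into a copy of the
// chunk, leaving the original object unaltered.
function assignChunk<T extends object, U extends object>(
    chunk: T,
    fn: (chunk: T) => U
): T & U {
    // Spread copies the original first, then overlays the new properties.
    return { ...chunk, ...fn(chunk) };
}

const original = { id: 1, name: "foo" };
const extended = assignChunk(original, c => ({ upper: c.name.toUpperCase() }));
// extended is { id: 1, name: "foo", upper: "FOO" }; original is unchanged
```

On a stream, the same effect is roughly `stream.map(chunk => ({ ...chunk, ...fn(chunk) }))`.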

Introduce DataStream.empty() method

Feature description

Introduce DataStream.empty() method similar to v4:

Called only before the stream ends without passing any items.


If you are interested in this issue or this type of transform fits your use case, please upvote with 👍.

Introduce WindowStream

Feature description

Introduce WindowStream similar to WindowStream from v4:

In essence, it's a stream of Arrays, each containing a list of items - a window.

It would be a stream for moving window calculation with some simple methods.

Use case

Any data calculation which requires a moving window (e.g. calculating moving averages).


If you are interested in this issue or this stream type fits your use case, please upvote with 👍.

Introduce FileStream

Feature description

Introduce FileStream similar to FSStream proposal from v4.

FileStream items would point to filesystem items like files, directories, etc. and should provide methods like stat and contents.

The concept is similar to Vinyl objects, so the stream should allow filesystem item manipulation (similar to how it is done in Gulp, Grunt, etc.).

Use case

Manipulating files in batches, listing the filesystem, file pipelines.


If you are interested in this issue or this stream type fits your use case, please upvote with 👍.

Batching by amount should allow adjusting the step size

Feature description

In v4 .batch() is a simple transform which groups chunks by a given number:

// input: [1,2,3,4,5,6,7,8,9,10]
// batch(amount)
.batch(2)  -> [1,2], [3,4], [5,6], ...

and in pre-v5 it accepts a callback, but we plan to introduce batching by amount, the same as in v4, too.

Now, it could be extended by adding an optional step param which would allow changing how chunks are grouped:

// input: [1,2,3,4,5,6,7,8,9,10]
// batch(amount, step = amount)
.batch(2)    -> [1,2], [3,4], [5,6], ...
.batch(2, 2) -> [1,2], [3,4], [5,6], ...
.batch(2, 1) -> [1,2], [2,3], [3,4], ...
.batch(3, 2) -> [1, 2, 3], [3, 4, 5], [5, 6, 7], ...

Use case

The use case is creating a WindowStream out of any other stream, with the ability to decide how each window frame's content is grouped. In the context of the above proposal it could be:

anyStream.batch(3, 2).as(WindowStream)
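The proposed amount/step grouping can be sketched in plain TypeScript (`batchWithStep` is a made-up helper illustrating the table above; how a trailing partial window should be handled is left open here):

```typescript
// Sketch of batch(amount, step): each group holds `amount` chunks and the
// window advances by `step` (step < amount produces overlapping groups).
function batchWithStep<T>(chunks: T[], amount: number, step: number = amount): T[][] {
    const groups: T[][] = [];
    // Only full windows are emitted in this sketch.
    for (let i = 0; i + amount <= chunks.length; i += step) {
        groups.push(chunks.slice(i, i + amount));
    }
    return groups;
}

batchWithStep([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3, 2);
// [[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9]]
```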

Introduce DataStream.all() method

Feature description

Introduce DataStream.all() method similar to v4:

Processes a number of functions in parallel, returns a stream of arrays of results.

This method would allow running multiple asynchronous operations and receiving all the results at once, just like Promise.all behaves.


If you are interested in this issue or this type of transform fits your use case, please upvote with 👍.
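For a single chunk, the Promise.all-like behavior could look like this plain-TypeScript sketch (`allForChunk` is a made-up helper, not the proposed API itself):

```typescript
// Sketch of the proposed .all() behavior on one chunk: run several async
// functions in parallel and collect their results into a single array.
async function allForChunk<T, U>(
    chunk: T,
    fns: Array<(chunk: T) => Promise<U>>
): Promise<U[]> {
    // Every function starts immediately; Promise.all awaits them together.
    return Promise.all(fns.map(fn => fn(chunk)));
}

const results = await allForChunk(2, [
    async n => n + 1,
    async n => n * 10,
]); // [3, 20]
```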

Introduce BinaryStream/BufferStream

Feature description

Introduce BinaryStream similar to BufferStream from v4.

By default, each chunk flowing through the stream would be a single byte. The stream should allow manipulating binary data (grouping, splitting, parsing, shifting, etc.). It should also be possible to convert such a stream into other types (most notably StringStream and DataStream).

Use case

Various file type transformations, parsing binary formats, and binary data manipulation (images, videos, etc.).


If you are interested in this issue or this stream type fits your use case, please upvote with 👍.

Unstable tests

I have noticed recently that CI is unstable and some tests fail from time to time. It seems to mostly affect the pipe tests:

  streams › data › native-interface › Piped DataStream can be unpiped via '.unpipe(instance)' #2

  build/test/unit/streams/data/native-interface.spec.js:109

   108:             await stream.end();
   109:             t.deepEqual(await stream.toArray(), ["fo", "o\n"]);
   110:             resolve();

  Difference:

    [
      'fo',
  +   `o␊
  +   `,
    ]

  › ReadStream.<anonymous> (build/test/unit/streams/data/native-interface.spec.js:109:15)



  streams › data › native-interface › Piped DataStream can be unpiped via '.unpipe()' #2

  build/test/unit/streams/data/native-interface.spec.js:145

   144:             await stream.end();
   145:             t.deepEqual(await stream.toArray(), ["fo", "o\n"]);
   146:             resolve();

  Difference:

    [
      'fo',
  +   `o␊
  +   `,
    ]

  › ReadStream.<anonymous> (build/test/unit/streams/data/native-interface.spec.js:145:15)

  ─

  2 tests failed
  5 tests skipped

This may mean there is some issue with how it works internally.
