If you’re not familiar enough with browserify options and features, check out my previous post explaining browserify in-depth.
I love streaming
Gulp is a stream-based project builder: basically, a bunch of modules that each take an input and produce an output, which is consumed by the next module, and so on until the whole process is done.
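For instance, a typical build is just a chain of pipes; note that compile() and minify() below are hypothetical placeholder processors used to illustrate the chaining, not real plugins:

gulp.src('src/**/*.js')      // read the source files as a stream
  .pipe(compile())           // hypothetical processor: transforms the stream
  .pipe(minify())            // another hypothetical processor
  .pipe(gulp.dest('dist'));  // write the final result to disk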
First thing to do: create our gulpfile.js
at the root of our project and require()
what we need: gulp and browserify.
There was once a gulp-browserify plugin. It's still downloadable, but it's not maintained anymore, so don't use it. Use browserify itself directly, that's it.
var browserify = require('browserify');
var gulp = require('gulp');

gulp.task('default', function() {
  console.log('todo');
});
> gulp
[14:16:44] Starting 'default'...
todo
[14:16:44] Finished 'default' after 85 µs
Before throwing lines of code and browserify at you, let's explain a bit more how this stream management works, with some examples.
If you already know well the streaming process and want to switch directly to how browserify works in Gulp, check out my next blog post.
stream.Readable
Gulp has some functions to create streams, such as .src()
and .dest()
, which you have probably seen somewhere.
gulp.src('test.js')
That returns a stream.Readable
object, a stream that anything can read from. To read/consume from it, you can listen to the data event. To know when you have consumed everything, you can listen to the end event. Check out the Node documentation for more info: stream.Readable.
Let's listen to it and log the incoming data:
gulp.src('test.js')
  .on('data', function(chunk) {
    console.log(chunk);
  })
  .on('end', function() {
    console.log('no more data');
  });
> no more data
These are the ASCII codes of each letter of the file test.js
; then it displays no more data because the end event was triggered. You can see the data is encapsulated in a Buffer, itself encapsulated in a File (which is a vinyl).
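To give an idea of what that File object carries, here is a small sketch (path and contents are part of the vinyl API; contents is the Buffer we just saw):

gulp.src('test.js').on('data', function(file) {
  // file is a vinyl File wrapping the path and the contents (a Buffer)
  console.log(file.path);                 // absolute path of test.js
  console.log(file.contents.toString());  // the raw source, as a string
});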
The data event is triggered for each file coming from .src()
:
gulp.src(['test.js', 'test2.js'])
  .on('data', function(chunk) {
    console.log('more data incoming:', chunk);
  })
  .on('end', function() {
    console.log('no more data');
  });
more data incoming:
more data incoming:
no more data
As we can see, the data event is triggered several times, then end once and for all.
What’s your stdin?
process.stdin
is a readable stream listening to the standard input of the program (by default: the keyboard!). To terminate it, you need to send an End Of Transmission signal (EOT) by typing Ctrl+D
on a Unix machine (on Windows, it seems to be Ctrl+Z
, but that does not work). Or you can directly send some data by running echo TEST | gulp
.
Let's try to use it by replacing gulp.src()
with process.stdin
:
process.stdin
  .on('data', function(chunk) {
    console.log('you typed: ', chunk);
  })
  .on('end', function() {
    console.log('no more typing');
  });
hey buddy
you typed:
^D
no more typing
That works the same way as with src()
, because both are simply readable streams.
Put it out with pipe()
Let's focus on the pipe
function that is used all over the place in any gulp file. It acts simply as a... pipe, between a readable stream (which implements this function) and a writable stream (which is the function argument). The content is transmitted between the source and the target as a plain string or a Buffer (or as a JavaScript object if the stream is using Object Mode). This is why it's used so much: gulp is just a bunch of stream readers/writers that are mostly used to transform your source code into what you want.
Let's do something simple first: redirect the input into the output (stdin to stdout).
gulp.task('default', function() {
  process.stdin.pipe(process.stdout);
});
> gulp
hey
hey
stop repeating me
stop repeating me
You start the program, then you type something, and it is echoed on the console. Behind the scenes, pipe()
listens to the readable stream events such as the ones we just used, data and end, and writes the incoming data into the writable stream.
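To make that concrete, here is a naive, simplified sketch of roughly what pipe() does under the hood (it ignores backpressure and error handling, so don't use it for real):

function naivePipe(readable, writable) {
  readable.on('data', function(chunk) {
    writable.write(chunk); // push every incoming chunk into the target
  });
  readable.on('end', function() {
    writable.end();        // no more data: close the writable side
  });
}

naivePipe(process.stdin, process.stdout);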
Transform me
Let's go back to the src()
function. We said it returns a stream of vinyl files (in Object Mode). To use it, you need to connect another processor that can handle this kind of stream. For instance, let's create a simple FileToString
to console.log
the content of each file passing through the stream. To create it, you have to add var stream = require('stream');
(and var util = require('util');
for util.inherits), both standard Node modules. To create a transform, you need to inherit from stream.Transform and implement the function _transform
(it has an underscore because you should NOT call it yourself, it's kind of private), as shown in this example:
function FileToString() {
  if (!(this instanceof FileToString)) return new FileToString(); // can be called without new
  stream.Transform.call(this, {
    objectMode: true // mandatory to work with a vinyl file stream
  });
}
util.inherits(FileToString, stream.Transform);

FileToString.prototype._transform = function(chunk, encoding, callback) {
  console.log("| FileToString", chunk);
  var buf = chunk.contents;  // refers to the internal Buffer of the File
  this.push(buf.toString()); // push it back for the next processor
  callback();                // or callback(null, buf.toString()) : that does the .push()
};

...

gulp.src(['test.js', 'test2.js'])
  .pipe(FileToString())
That displays something like:
| FileToString
| FileToString
If we don’t enable objectMode
, we’ll get this error:
events.js:72
        throw er; // Unhandled 'error' event
              ^
TypeError: Invalid non-string/buffer chunk
    at validChunk (_stream_writable.js:153:14)
    at FileToString.Writable.write (_stream_writable.js:182:12)
We don't even pass through _transform
; the error occurs internally before.
Don’t do buffer.toString()
You could see I used buf.toString()
to get the content of the buffer, but that's not the right way. If you deal with the UTF-8 charset, you could potentially have a non-terminated UTF-8 character at the end of your buffer (which will be completed in the next buffer, check this example), therefore toString()
would render odd things. You always need to use a StringDecoder
(var StringDecoder = require('string_decoder').StringDecoder;) and handle the _flush
method in your transform for the leftover.
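Here is a tiny illustration of the problem: the two-byte sequence 0xC3 0xA9 is the UTF-8 encoding of 'é'; imagine those two bytes arriving in two different chunks:

var StringDecoder = require('string_decoder').StringDecoder;
var decoder = new StringDecoder('utf8');

console.log(new Buffer([0xc3]).toString());     // '�' : toString() renders a replacement character
console.log(decoder.write(new Buffer([0xc3]))); // ''  : the decoder keeps the byte and waits
console.log(decoder.write(new Buffer([0xa9]))); // 'é' : the character is now complete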
Here is a more complete transform example:
function StringDecoderTransform() {
  if (!(this instanceof StringDecoderTransform)) return new StringDecoderTransform();
  stream.Transform.call(this, {
    objectMode: true
  });
  this.decoder = new StringDecoder('utf8');
}
util.inherits(StringDecoderTransform, stream.Transform);

StringDecoderTransform.prototype._transform = function(chunk, encoding, callback) {
  console.log("| StringDecoderTransform", chunk);
  var buf = chunk.contents;
  this.push(this.decoder.write(buf)); // decoder.write returns a string
  callback();
};

StringDecoderTransform.prototype._flush = function(callback) {
  var leftOver = this.decoder.end();
  console.log("| StringDecoderTransform flush", leftOver);
  this.push(leftOver);
  callback();
};
| StringDecoderTransform
| StringDecoderTransform
| StringDecoderTransform
| StringDecoderTransform flush
As you can see, _flush()
is called only once, when the stream ends.
A transform is a Duplex Stream
A transform has an input and an output. It reads and writes; it's a duplex stream. We were saying objectMode
needs to be true
when working with objects instead of strings/buffers, but that's not entirely true. objectMode
is actually the combination of 2 properties (be careful of your Node version, you need at least 0.12.0): readableObjectMode
and writableObjectMode
. You can read in objects, and output strings.
Let's write multiple processors, each reading/writing in different modes:
gulp.src(['test.js'])
  .pipe(FileToUppercaseStringArray()) // input: vinyl stream (object) | output: arrays of uppercase strings (object)
  .pipe(StringsJoinerTransform())     // input: arrays (object)       | output: string (non-object)
  .pipe(ExpectsStringTransform())     // input: string (non-object)
// input: vinyl stream (object) | output: arrays of uppercase strings (object)
function FileToUppercaseStringArray() {
  if (!(this instanceof FileToUppercaseStringArray)) return new FileToUppercaseStringArray();
  stream.Transform.call(this, {
    writableObjectMode: true, // write in me with objects
    readableObjectMode: true  // read objects (arrays) from me
  });
  this.decoder = new StringDecoder();
}
util.inherits(FileToUppercaseStringArray, stream.Transform);

FileToUppercaseStringArray.prototype._transform = function(chunk, encoding, callback) {
  console.log('| FileToUppercaseStringArray input', chunk.contents);
  var buf = chunk.contents; // the internal Buffer of the vinyl File
  this.push(this.decoder.write(buf).toUpperCase().split(/\r?\n/));
  callback();
};

// -----------------------------------------------------------------------------------

// input: arrays (object) | output: string (non-object)
function StringsJoinerTransform() {
  if (!(this instanceof StringsJoinerTransform)) return new StringsJoinerTransform();
  stream.Transform.call(this, {
    writableObjectMode: true,  // write in me with arrays, so objects
    readableObjectMode: false  // read buffer/string from me
  });
}
util.inherits(StringsJoinerTransform, stream.Transform);

StringsJoinerTransform.prototype._transform = function(chunk, encoding, callback) {
  console.log('| StringsJoinerTransform input:', chunk);
  callback(null, chunk.join(' | ')); // outputs a simple string
};

// -----------------------------------------------------------------------------------

// input: string (non-object)
function ExpectsStringTransform() {
  if (!(this instanceof ExpectsStringTransform)) return new ExpectsStringTransform();
  stream.Transform.call(this, {
    writableObjectMode: false // write in me with buffer/string, no object
  });
}
util.inherits(ExpectsStringTransform, stream.Transform);

ExpectsStringTransform.prototype._transform = function(chunk, encoding, callback) {
  console.log('| ExpectsStringTransform input:', chunk);
  callback();
};
Here is its output:
| FileToUppercaseStringArray input
| StringsJoinerTransform input: [ 'VAR _ = REQUIRE(\'LODASH\');',
  'VAR ARR = [3, 43, 24, 10];',
  'CONSOLE.LOG(_.FIND(ARR, FUNCTION(ITEM) {',
  '  RETURN ITEM > 10;',
  '}));',
  '' ]
| ExpectsStringTransform input:
We can see the different input of each processor: vinyl File > array of strings > Buffer.
Because we declared that you can read Buffers/strings from StringsJoinerTransform
, its next processor, ExpectsStringTransform
, gets a Buffer. But we could just send a plain string by setting readableObjectMode: true
for StringsJoinerTransform
and writableObjectMode: false
for ExpectsStringTransform
, with this result:
| ExpectsStringTransform input: VAR _ = REQUIRE('LODASH'); | VAR ARR = [3, 43, 24, 10]; | ...
As we can see, we got a string instead of a Buffer.
To check what's under the hood and see more examples, the official documentation is the perfect place, but don't lose yourself in it.
Unidirectional file stream
When you need to create a file whose content comes from a stream, you can use the File System module available in Node core: var fs = require('fs');
It contains a bunch of methods, sync and async, to do any kind of operation on files/folders/paths. Let's focus on the streaming ones.
Let's create the default read stream first, and check what the events are:
var reader = fs.createReadStream('big.js');
reader.on('open', function() { console.log('stream is opened'); });
reader.on('close', function() { console.log('stream is closed'); });
reader.on('readable', function() { console.log('stream is readable'); });
reader.on('data', function() { console.log('stream has data'); });
reader.on('end', function() { console.log('stream is ending'); });
reader.on('error', function() { console.log('stream is in error'); });
stream is opened
stream has data
stream has data
...
stream has data
stream is readable
stream is ending
stream is closed
Let's pipe it into a writable stream, copying a file for instance:
var reader = fs.createReadStream('big.js');
reader.on('open', function() { console.log('reader is opened'); });
reader.on('readable', function() { console.log('reader is readable'); });
reader.on('data', function(chunk) { console.log('reader has data:', chunk.length, 'bytes'); });
reader.on('end', function() { console.log('reader is ending'); });
reader.on('close', function() { console.log('reader is closed'); });
reader.on('error', function() { console.log('reader is in error'); });

var writer = fs.createWriteStream('big_copy.js');

var originalWrite = writer._write.bind(writer); // just keep a ref to the original _write
writer._write = function(chunk, enc, cb) {
  console.log('-- writer is writing', chunk.length, 'bytes');
  originalWrite(chunk, enc, cb);
};

writer.on('open', function() { console.log('-- writer is opened | total bytes written:', this.bytesWritten); });
writer.on('drain', function() { console.log('-- writer is drained | total bytes written:', this.bytesWritten); });
writer.on('finish', function() { console.log('-- writer has finished | total bytes written:', this.bytesWritten); });
writer.on('pipe', function(readable) { console.log('-- writer is being piped by a readable stream'); });
writer.on('unpipe', function(readable) { console.log('-- writer is not more being piped by a readable stream'); });
writer.on('error', function(err) { console.log('-- writer is in error:', err); });

reader.pipe(writer);
The output reveals when the events are triggered:
-- writer is being piped by a readable stream
reader is opened
-- writer is opened | total bytes written: 0
reader has data: 65536 bytes
-- writer is writing 65536 bytes
-- writer is drained | total bytes written: 65536
reader has data: 65536 bytes
-- writer is writing 65536 bytes
-- writer is drained | total bytes written: 131072
reader has data: 65536 bytes
-- writer is writing 65536 bytes
-- writer is drained | total bytes written: 196608
reader has data: 65536 bytes
-- writer is writing 65536 bytes
reader is readable // reader calling readable for the first time
-- writer is drained | total bytes written: 262144
...
reader has data: 65536 bytes
-- writer is writing 65536 bytes
reader is readable
-- writer is drained | total bytes written: 524288
reader has data: 54398 bytes
-- writer is writing 54398 bytes
reader is readable
-- writer is drained | total bytes written: 578686
reader is ending
-- writer has finished | total bytes written: 578686
-- writer is not more being piped by a readable stream
reader is closed
Note: if you have the content already in memory (and if it's not too big), don't use the streaming methods; fs.writeFile()
or fs.readFile()
do the job without streams.
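For instance, a non-streaming version of the copy above could look like this (the whole file is held in memory between the read and the write):

fs.readFile('big.js', function(err, buffer) {
  if (err) throw err;
  fs.writeFile('big_copy.js', buffer, function(err) {
    if (err) throw err;
    console.log('copied', buffer.length, 'bytes without streams');
  });
});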
To manually send data over a writable stream without pipe()
, you can use the write()
method (read()
for a reader):
var crypto = require('crypto');
var writer = fs.createWriteStream('writer.js');
...
writer.write("it's a trap");            // takes a string
writer.write(crypto.randomBytes(1000)); // randomBytes returns a Buffer
writer.close();                         // need to ensure everything is flushed
You can see that there is a double write of the same sequence in this output:
-- writer is writing 11 bytes
-- writer is opened | total bytes written: 0
-- writer is writing 11 bytes
-- writer is writing 1000 bytes
-- writer has finished | total bytes written: 1011
It's because the stream was not even opened when the first write was executed, so nothing was actually written. Once it's opened, the writer catches up and sends it again (that wouldn't happen if the write()
calls were issued after the opening). Moreover, you need to call close()
yourself to ensure all the data is flushed.
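As a sketch of that last remark, you could wait for the open event before writing anything, so nothing has to be buffered and replayed (this uses end(), which flushes and closes the stream, as an alternative to the manual close() above):

var writer = fs.createWriteStream('writer.js');
writer.on('open', function() {
  writer.write("it's a trap"); // the stream is already opened: written once, directly
  writer.end();                // flush the remaining data and close the stream
});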
Applications with browserify
Now that we understand the streaming piece much better, let's refocus on how browserify works with gulp (that's what I started the post with :-)), using some famous processors.
Check out my next blog post!