If you’re not familiar enough with browserify options and features, check out my previous post explaining browserify in-depth.
I love streaming
Gulp is a stream-based project builder: basically, a bunch of modules that each take an input and produce an output, which is consumed by the next module, and so on until the whole process is done.
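For instance, a typical build is just a chain of pipes; note that compile() and minify() below are hypothetical placeholder processors used to illustrate the chaining, not real plugins:

gulp.src('src/**/*.js')      // read the source files as a stream
  .pipe(compile())           // hypothetical processor: transforms the stream
  .pipe(minify())            // another hypothetical processor
  .pipe(gulp.dest('dist'));  // write the final result to disk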
First thing to do: create our gulpfile.js
at the root of our project and require()
what we need: gulp and browserify.
There was once a gulp-browserify plugin. It's still downloadable, but it's not maintained anymore, so don't use it. Use browserify itself directly, that's it.
var browserify = require('browserify');
var gulp = require('gulp');

gulp.task('default', function() {
  console.log('todo');
});
> gulp
[14:16:44] Starting 'default'...
todo
[14:16:44] Finished 'default' after 85 µs
Before throwing lines of code and browserify at you, let's explain a bit more how this stream management works, with some examples.
If you already know well the streaming process and want to switch directly to how browserify works in Gulp, check out my next blog post.
stream.Readable
Gulp has some functions to create streams, such as .src()
and .dest()
, which you have probably seen somewhere.
gulp.src('test.js')
That returns a stream.Readable
object, a stream that anything can read from. To read/consume from it, you can listen to the data event. To know when you have consumed everything, you can listen to the end event. Check out the Node documentation for more info: stream.Readable.
Let's listen to it and log the incoming data:
gulp.src('test.js')
  .on('data', function(chunk) {
    console.log(chunk);
  })
  .on('end', function() {
    console.log('no more data');
  });
> no more data
These are the ASCII codes of each letter of the file test.js
; then it displays no more data because the end event was triggered. You can see the data is encapsulated in a Buffer, itself encapsulated in a File (which is a vinyl).
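To give an idea of what that File object carries, here is a small sketch (path and contents are part of the vinyl API; contents is the Buffer we just saw):

gulp.src('test.js').on('data', function(file) {
  // file is a vinyl File wrapping the path and the contents (a Buffer)
  console.log(file.path);                 // absolute path of test.js
  console.log(file.contents.toString());  // the raw source, as a string
});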
The data event is triggered for each file coming from .src()
:
gulp.src(['test.js', 'test2.js'])
  .on('data', function(chunk) {
    console.log('more data incoming:', chunk);
  })
  .on('end', function() {
    console.log('no more data');
  });
more data incoming:
more data incoming:
no more data
As we can see, the data event is triggered several times, then end once and for all.
What’s your stdin?
process.stdin
is a readable stream listening to the standard input of the program (by default: the keyboard!). To terminate it, you need to send an End Of Transmission signal (EOT) by typing Ctrl+D
on a Unix machine (on Windows, it seems to be Ctrl+Z
, but that does not work). Or you can directly send some data by running echo TEST | gulp
.
Let's try to use it by replacing gulp.src()
with process.stdin
:
process.stdin
  .on('data', function(chunk) {
    console.log('you typed: ', chunk);
  })
  .on('end', function() {
    console.log('no more typing');
  });
hey buddy
you typed:
^D
no more typing
That works the same way as with src()
, because both are simply readable streams.
Put it out with pipe()
Let's focus on the pipe
function that is used all over the place in any gulp file. It acts simply as a... pipe, between a readable stream (which implements this function) and a writable stream (which is the function argument). The content is transmitted between the source and the target as a plain string or a Buffer (or as a JavaScript object if the stream is using Object Mode). This is why it's used so much: gulp is just a bunch of stream readers/writers that are mostly used to transform your source code into what you want.
Let's do something simple first: redirect the input into the output (stdin to stdout).
gulp.task('default', function() {
  process.stdin.pipe(process.stdout);
});
> gulp
hey
hey
stop repeating me
stop repeating me
You start the program, then you type something, and it is echoed on the console. Behind the scenes, pipe()
listens to the readable stream events such as the ones we just used, data and end, and writes the incoming data into the writable stream.
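To make that concrete, here is a naive, simplified sketch of roughly what pipe() does under the hood (it ignores backpressure and error handling, so don't use it for real):

function naivePipe(readable, writable) {
  readable.on('data', function(chunk) {
    writable.write(chunk); // push every incoming chunk into the target
  });
  readable.on('end', function() {
    writable.end();        // no more data: close the writable side
  });
}

naivePipe(process.stdin, process.stdout);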
Transform me
Let's go back to the src()
function. We said it returns a stream of vinyl files (in Object Mode). To use it, you need to connect another processor that can handle this kind of stream. For instance, let's create a simple FileToString
to console.log
the content of each file passing through the stream. To create it, you have to add var stream = require('stream');
(and var util = require('util');
for util.inherits), both standard Node modules. To create a transform, you need to inherit from stream.Transform and implement the function _transform
(it has an underscore because you should NOT call it yourself, it's kind of private), as shown in this example:
function FileToString() {
  if (!(this instanceof FileToString)) return new FileToString(); // can be called without new
  stream.Transform.call(this, {
    objectMode: true // mandatory to work with a vinyl file stream
  });
}
util.inherits(FileToString, stream.Transform);

FileToString.prototype._transform = function(chunk, encoding, callback) {
  console.log("| FileToString", chunk);
  var buf = chunk.contents;  // refers to the internal Buffer of the File
  this.push(buf.toString()); // push it back for the next processor
  callback();                // or callback(null, buf.toString()) : that does the .push()
};

...

gulp.src(['test.js', 'test2.js'])
  .pipe(FileToString())
That displays something like:
| FileToString
| FileToString
If we don’t enable objectMode
, we’ll get this error:
events.js:72
        throw er; // Unhandled 'error' event
              ^
TypeError: Invalid non-string/buffer chunk
    at validChunk (_stream_writable.js:153:14)
    at FileToString.Writable.write (_stream_writable.js:182:12)
We don't even pass through _transform
; the error occurs internally before.
Don’t do buffer.toString()
You could see I used buf.toString()
to get the content of the buffer, but that's not the right way. If you deal with the UTF-8 charset, you could potentially have a non-terminated UTF-8 character at the end of your buffer (which will be completed in the next buffer, check this example), therefore toString()
would render odd things. You always need to use a StringDecoder
(var StringDecoder = require('string_decoder').StringDecoder;) and handle the _flush
method in your transform for the leftover.
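Here is a tiny illustration of the problem: the two-byte sequence 0xC3 0xA9 is the UTF-8 encoding of 'é'; imagine those two bytes arriving in two different chunks:

var StringDecoder = require('string_decoder').StringDecoder;
var decoder = new StringDecoder('utf8');

console.log(new Buffer([0xc3]).toString());     // '�' : toString() renders a replacement character
console.log(decoder.write(new Buffer([0xc3]))); // ''  : the decoder keeps the byte and waits
console.log(decoder.write(new Buffer([0xa9]))); // 'é' : the character is now complete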
Here is a more complete transform example:
function StringDecoderTransform() {
  if (!(this instanceof StringDecoderTransform)) return new StringDecoderTransform();
  stream.Transform.call(this, {
    objectMode: true
  });
  this.decoder = new StringDecoder('utf8');
}
util.inherits(StringDecoderTransform, stream.Transform);

StringDecoderTransform.prototype._transform = function(chunk, encoding, callback) {
  console.log("| StringDecoderTransform", chunk);
  var buf = chunk.contents;
  this.push(this.decoder.write(buf)); // decoder.write returns a string
  callback();
};

StringDecoderTransform.prototype._flush = function(callback) {
  var leftOver = this.decoder.end();
  console.log("| StringDecoderTransform flush", leftOver);
  this.push(leftOver);
  callback();
};
| StringDecoderTransform
| StringDecoderTransform
| StringDecoderTransform
| StringDecoderTransform flush
As you can see, _flush()
is called only once, when the stream ends.
A transform is a Duplex Stream
A transform has an input and an output. It reads and writes; it's a duplex stream. We were saying objectMode
needs to be true
when working with objects instead of strings/buffers, but that's not entirely true. objectMode
is actually the combination of 2 properties (be careful of your Node version, you need at least 0.12.0): readableObjectMode
and writableObjectMode
. You can read in objects, and output strings.
Let's write multiple processors, each reading/writing in different modes:
gulp.src(['test.js'])
  .pipe(FileToUppercaseStringArray()) // input: vinyl stream (object) | output: arrays of uppercase strings (object)
  .pipe(StringsJoinerTransform())     // input: arrays (object)       | output: string (non-object)
  .pipe(ExpectsStringTransform())     // input: string (non-object)
// input: vinyl stream (object) | output: arrays of uppercase strings (object)
function FileToUppercaseStringArray() {
  if (!(this instanceof FileToUppercaseStringArray)) return new FileToUppercaseStringArray();
  stream.Transform.call(this, {
    writableObjectMode: true, // write in me with objects
    readableObjectMode: true  // read objects (arrays) from me
  });
  this.decoder = new StringDecoder();
}
util.inherits(FileToUppercaseStringArray, stream.Transform);

FileToUppercaseStringArray.prototype._transform = function(chunk, encoding, callback) {
  console.log('| FileToUppercaseStringArray input', chunk.contents);
  var buf = chunk.contents; // the internal Buffer of the vinyl File
  this.push(this.decoder.write(buf).toUpperCase().split(/\r?\n/));
  callback();
};

// -----------------------------------------------------------------------------------

// input: arrays (object) | output: string (non-object)
function StringsJoinerTransform() {
  if (!(this instanceof StringsJoinerTransform)) return new StringsJoinerTransform();
  stream.Transform.call(this, {
    writableObjectMode: true,  // write in me with arrays, so objects
    readableObjectMode: false  // read buffer/string from me
  });
}
util.inherits(StringsJoinerTransform, stream.Transform);

StringsJoinerTransform.prototype._transform = function(chunk, encoding, callback) {
  console.log('| StringsJoinerTransform input:', chunk);
  callback(null, chunk.join(' | ')); // outputs a simple string
};

// -----------------------------------------------------------------------------------

// input: string (non-object)
function ExpectsStringTransform() {
  if (!(this instanceof ExpectsStringTransform)) return new ExpectsStringTransform();
  stream.Transform.call(this, {
    writableObjectMode: false // write in me with buffer/string, no object
  });
}
util.inherits(ExpectsStringTransform, stream.Transform);

ExpectsStringTransform.prototype._transform = function(chunk, encoding, callback) {
  console.log('| ExpectsStringTransform input:', chunk);
  callback();
};
Here is its output:
| FileToUppercaseStringArray input
| StringsJoinerTransform input: [ 'VAR _ = REQUIRE(\'LODASH\');',
  'VAR ARR = [3, 43, 24, 10];',
  'CONSOLE.LOG(_.FIND(ARR, FUNCTION(ITEM) {',
  '  RETURN ITEM > 10;',
  '}));',
  '' ]
| ExpectsStringTransform input:
We can see the different input of each processor: vinyl File > array of strings > Buffer.
Because we declared that you can read Buffers/strings from StringsJoinerTransform
, its next processor, ExpectsStringTransform
, gets a Buffer. But we could just send a plain string by setting readableObjectMode: true
for StringsJoinerTransform
and writableObjectMode: false
for ExpectsStringTransform
, with this result:
| ExpectsStringTransform input: VAR _ = REQUIRE('LODASH'); | VAR ARR = [3, 43, 24, 10]; | ...
As we can see, we got a string instead of a Buffer.
To check what's under the hood and see more examples, the official documentation is the perfect place, but don't lose yourself in it.
Unidirectional file stream
When you need to create a file whose content comes from a stream, you can use the File System module available in Node core: var fs = require('fs');
It contains a bunch of methods, sync and async, to do any kind of operation on files/folders/paths. Let's focus on the streaming ones.
Let's create the default read stream first, and check what the events are:
var reader = fs.createReadStream('big.js');
reader.on('open', function() { console.log('stream is opened'); });
reader.on('close', function() { console.log('stream is closed'); });
reader.on('readable', function() { console.log('stream is readable'); });
reader.on('data', function() { console.log('stream has data'); });
reader.on('end', function() { console.log('stream is ending'); });
reader.on('error', function() { console.log('stream is in error'); });
stream is opened
stream has data
stream has data
...
stream has data
stream is readable
stream is ending
stream is closed
Let's pipe it into a writable stream, copying a file for instance:
var reader = fs.createReadStream('big.js');
reader.on('open', function() { console.log('reader is opened'); });
reader.on('readable', function() { console.log('reader is readable'); });
reader.on('data', function(chunk) { console.log('reader has data:', chunk.length, 'bytes'); });
reader.on('end', function() { console.log('reader is ending'); });
reader.on('close', function() { console.log('reader is closed'); });
reader.on('error', function() { console.log('reader is in error'); });

var writer = fs.createWriteStream('big_copy.js');

var originalWrite = writer._write.bind(writer); // just keep a ref to the original _write
writer._write = function(chunk, enc, cb) {
  console.log('-- writer is writing', chunk.length, 'bytes');
  originalWrite(chunk, enc, cb);
};

writer.on('open', function() { console.log('-- writer is opened | total bytes written:', this.bytesWritten); });
writer.on('drain', function() { console.log('-- writer is drained | total bytes written:', this.bytesWritten); });
writer.on('finish', function() { console.log('-- writer has finished | total bytes written:', this.bytesWritten); });
writer.on('pipe', function(readable) { console.log('-- writer is being piped by a readable stream'); });
writer.on('unpipe', function(readable) { console.log('-- writer is not more being piped by a readable stream'); });
writer.on('error', function(err) { console.log('-- writer is in error:', err); });

reader.pipe(writer);
The output reveals when the events are triggered:
-- writer is being piped by a readable stream
reader is opened
-- writer is opened | total bytes written: 0
reader has data: 65536 bytes
-- writer is writing 65536 bytes
-- writer is drained | total bytes written: 65536
reader has data: 65536 bytes
-- writer is writing 65536 bytes
-- writer is drained | total bytes written: 131072
reader has data: 65536 bytes
-- writer is writing 65536 bytes
-- writer is drained | total bytes written: 196608
reader has data: 65536 bytes
-- writer is writing 65536 bytes
reader is readable // reader calling readable for the first time
-- writer is drained | total bytes written: 262144
...
reader has data: 65536 bytes
-- writer is writing 65536 bytes
reader is readable
-- writer is drained | total bytes written: 524288
reader has data: 54398 bytes
-- writer is writing 54398 bytes
reader is readable
-- writer is drained | total bytes written: 578686
reader is ending
-- writer has finished | total bytes written: 578686
-- writer is not more being piped by a readable stream
reader is closed
Note: if you have the content already in memory (and if it's not too big), don't use the streaming methods; fs.writeFile()
or fs.readFile()
do the job without streams.
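For instance, a non-streaming version of the copy above could look like this (the whole file is held in memory between the read and the write):

fs.readFile('big.js', function(err, buffer) {
  if (err) throw err;
  fs.writeFile('big_copy.js', buffer, function(err) {
    if (err) throw err;
    console.log('copied', buffer.length, 'bytes without streams');
  });
});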
To manually send data over a writable stream without pipe()
, you can use the write()
method (read()
for a reader):
var crypto = require('crypto');
var writer = fs.createWriteStream('writer.js');
...
writer.write("it's a trap");            // takes a string
writer.write(crypto.randomBytes(1000)); // randomBytes returns a Buffer
writer.close();                         // need to ensure everything is flushed
You can see that there is a double write of the same sequence in this output:
-- writer is writing 11 bytes
-- writer is opened | total bytes written: 0
-- writer is writing 11 bytes
-- writer is writing 1000 bytes
-- writer has finished | total bytes written: 1011
It's because the stream was not even opened when the first write was executed, so nothing was actually written. Once it's opened, the writer catches up and sends it again (that wouldn't happen if the write()
calls were issued after the opening). Moreover, you need to call close()
yourself to ensure all the data is flushed.
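As a sketch of that last remark, you could wait for the open event before writing anything, so nothing has to be buffered and replayed (this uses end(), which flushes and closes the stream, as an alternative to the manual close() above):

var writer = fs.createWriteStream('writer.js');
writer.on('open', function() {
  writer.write("it's a trap"); // the stream is already opened: written once, directly
  writer.end();                // flush the remaining data and close the stream
});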
Applications with browserify
Now that we understand the streaming piece much better, let's refocus on how browserify works with gulp (that's what I started the post with :-)), using some famous processors.
Check out my next blog post!