Why use file streams
Imagine this scenario: I want to process a 10 GB file, but I only have 2 GB of memory. What can I do?
We could read the file in five passes, reading only 2 GB at a time. This process of reading in segments is a stream!
In Node, the stream module encapsulates the basic operations of streams, and file streams are built directly on top of this module. Here we use file streams to understand stream in depth.
file readable stream
A readable stream reads the contents of a file into memory bit by bit.
How to use
Let's take a look at the basic usage first.
const fs = require('fs')
const rs = fs.createReadStream('./w-test.js')
rs.on('data', (chunk) => {
  console.log(chunk)
})
rs.on('close', () => {
  console.log('close')
})
As shown in the code above, we use fs.createReadStream() to create a readable stream that reads the file w-test.js.
Once a 'data' listener is attached, the file is read automatically, 64 KB at a time by default. You can change the size of each chunk with the highWaterMark option.
The 'close' event is triggered automatically once the file has been fully read.
The following are the options that createReadStream accepts:
const rs = fs.createReadStream('./w-test.js', {
  flags: 'r', // filesystem flag; 'r' opens the file for reading
  encoding: null, // encoding of the chunks
  autoClose: false, // whether to emit 'close' automatically after reading
  start: 0, // position to start reading from
  end: 2, // position to stop reading at
  highWaterMark: 2 // number of bytes read per chunk
})
Note: start and end are both inclusive, i.e. [start, end].
In fact, fs.createReadStream returns an instance of the fs.ReadStream class, so the code above is equivalent to:
const rs = new fs.ReadStream('./w-test.js', {
  flags: 'r', // filesystem flag; 'r' opens the file for reading
  encoding: null, // encoding of the chunks
  autoClose: false, // whether to emit 'close' automatically after reading
  start: 0, // position to start reading from
  end: 2, // position to stop reading at
  highWaterMark: 2 // number of bytes read per chunk
})
Implementation of file-readable streams
Now that we understand the usage, let's get to the principle: next, we write a readable stream by hand.
fs.read / fs.open
The essence of a readable stream is reading file data in batches, and fs.read() lets us control how much of the file is read each time:
fs.read(fd, buffer, offset, length, position, callback)
fd: the file descriptor to read from
buffer: the buffer where the data is to be written (the file content read is written into this buffer)
offset: the offset in the buffer at which to start writing (write starting from this index of the buffer)
length: the number of bytes to read (read this many bytes from the file)
position: the position in the file to start reading from (read starting from this byte of the file)
callback: callback function, with parameters:
- err: the error, if any
- bytesRead: the number of bytes actually read
Reading a file requires a file descriptor, which we obtain with fs.open:
fs.open(path, flags[, mode], callback)
path: file path
flags: filesystem flags, default 'r', indicating what operation to perform on the file; common values are:
- r: open the file for reading
- w: open the file for writing
- a: open the file for appending
mode: file permissions, default 0o666 (readable and writable).
callback: callback function, with parameters:
- err: the reason for the error, if opening fails
- fd (number): the file descriptor, used when reading and writing the file
initialization
First of all, ReadStream is a class. Judging from its behavior, instances can listen for events such as on('data'), so it should inherit from EventEmitter, as in the following code:
class ReadStream extends EventEmitter {
  constructor() {
    super()
  }
}
Then we initialize the parameters and open the file, as in the code below (the key lines are commented):
class ReadStream extends EventEmitter {
  constructor(path, options = {}) {
    super()
    // parse the options
    this.path = path
    this.flags = options.flags ?? 'r'
    this.encoding = options.encoding ?? 'utf8'
    this.autoClose = options.autoClose ?? true
    this.start = options.start ?? 0
    this.end = options.end ?? undefined
    this.highWaterMark = options.highWaterMark ?? 16 * 1024
    // current offset into the file
    this.offset = this.start
    // whether the stream is flowing; used by pause()/resume(), discussed below
    this.flowing = false
    // open the file
    this.open()
    // 'newListener' fires whenever a new listener is bound;
    // binding a 'data' listener starts the read automatically
    this.on('newListener', (type) => {
      if (type === 'data') {
        // mark the stream as flowing
        this.flowing = true
        // start reading the file
        this.read()
      }
    })
  }
}
Before reading the file, we must open it, i.e. this.open().
'newListener' is an EventEmitter event that fires whenever a new listener is bound. For example, when we call on('data'), the 'newListener' event fires with type 'data'.
So when a 'data' listener is bound (i.e. on('data')), we start reading the file with this.read(), which is our core method.
open
The open method is as follows:
open() {
  fs.open(this.path, this.flags, (err, fd) => {
    if (err) {
      // emit 'error' if the file fails to open
      this.emit('error', err)
      return
    }
    // record the file descriptor
    this.fd = fd
    // emit 'open' once the file is opened
    this.emit('open')
  })
}
When the file is opened, we record the file descriptor, i.e. this.fd.
read
The read method is as follows:
read() {
  // fs.open is asynchronous, so the file may not be open yet
  // when read() is called; wait for the 'open' event, then retry
  if (typeof this.fd !== 'number') {
    this.once('open', () => this.read())
    return
  }
  // allocate a buffer of highWaterMark bytes
  // to hold the content read from the file
  const buf = Buffer.alloc(this.highWaterMark)
  // read the file, tracking the offset after every read
  fs.read(this.fd, buf, 0, buf.length, this.offset, (err, bytesRead) => {
    this.offset += bytesRead
    // bytesRead is the number of bytes actually read;
    // 0 means nothing was read, i.e. we reached the end
    if (bytesRead) {
      // emit 'data' for every chunk read
      this.emit('data', buf.slice(0, bytesRead))
      // keep reading while the stream is flowing;
      // pause() sets this.flowing to false
      this.flowing && this.read()
    } else {
      // emit 'end' when reading is finished
      this.emit('end')
      // if autoClose is set, close the file and emit 'close'
      this.autoClose && fs.close(this.fd, () => this.emit('close'))
    }
  })
}
Every line above is commented, so it should not be hard to follow. Here are a few key points:
- You must wait for the file to open before you can start reading it, but opening is asynchronous and we don't know when it completes, so we listen for the 'open' event and call read() again when it fires.
- The fs.read() method was covered earlier; it is the core of the handwritten stream.
- The this.flowing property records whether the stream is flowing and is controlled by the pause() and resume() methods. Let's look at these two methods.
pause
pause() {
  this.flowing = false
}
resume
resume() {
  if (!this.flowing) {
    this.flowing = true
    this.read()
  }
}
full code
const { EventEmitter } = require('events')
const fs = require('fs')

class ReadStream extends EventEmitter {
  constructor(path, options = {}) {
    super()
    this.path = path
    this.flags = options.flags ?? 'r'
    this.encoding = options.encoding ?? 'utf8'
    this.autoClose = options.autoClose ?? true
    this.start = options.start ?? 0
    this.end = options.end ?? undefined
    this.highWaterMark = options.highWaterMark ?? 16 * 1024
    this.offset = this.start
    this.flowing = false
    this.open()
    this.on('newListener', (type) => {
      if (type === 'data') {
        this.flowing = true
        this.read()
      }
    })
  }
  open() {
    fs.open(this.path, this.flags, (err, fd) => {
      if (err) {
        this.emit('error', err)
        return
      }
      this.fd = fd
      this.emit('open')
    })
  }
  pause() {
    this.flowing = false
  }
  resume() {
    if (!this.flowing) {
      this.flowing = true
      this.read()
    }
  }
  read() {
    if (typeof this.fd !== 'number') {
      this.once('open', () => this.read())
      return
    }
    const buf = Buffer.alloc(this.highWaterMark)
    fs.read(this.fd, buf, 0, buf.length, this.offset, (err, bytesRead) => {
      this.offset += bytesRead
      if (bytesRead) {
        this.emit('data', buf.slice(0, bytesRead))
        this.flowing && this.read()
      } else {
        this.emit('end')
        this.autoClose && fs.close(this.fd, () => this.emit('close'))
      }
    })
  }
}
file writable stream
As the name suggests, a writable stream writes content into a file bit by bit.
fs.write
fs.write(fd, buffer, offset, length, position, callback)
fd: the file descriptor to write to
buffer: the buffer whose contents are written to the file
offset: the offset in the buffer to start reading from (content from this index of the buffer is written to the file)
length: the number of bytes to write
position: the offset in the file (write starting from this byte of the file)
callback: callback function; its bytesWritten parameter is the number of bytes actually written
How to use
// Usage 1:
const ws = fs.createWriteStream('./w-test.js')
// Usage 2:
const ws = new fs.WriteStream('./w-test.js', {
  flags: 'w',
  encoding: 'utf8',
  autoClose: true,
  highWaterMark: 2
})
// write to the file
const flag = ws.write('2')
ws.on('drain', () => console.log('drain'))
ws.write() writes to the file. Its return value indicates whether the internal cache has reached its limit. When write() is called multiple times, each call does not write to the file immediately: only one write can be in progress at a time, so the rest is placed in the cache and, once the current write completes, taken from the cache and executed in order. The cache therefore has a size threshold, 16 KB by default. The return value tells you whether you can keep writing, i.e. whether the threshold has been reached; true means writing can continue.
ws.on('drain'): if a ws.write() call returns false, the 'drain' event fires once data can be written to the stream again.
Implementation of a file-writable stream
initialization
First define the WriteStream class, inherit from EventEmitter, and initialize the parameters. _Pay attention to the code comments._
const { EventEmitter } = require('events')
const fs = require('fs')
class WriteStream extends EventEmitter {
  constructor(path, options = {}) {
    super()
    // parse the options
    this.path = path
    this.flags = options.flags ?? 'w'
    this.encoding = options.encoding ?? 'utf8'
    this.autoClose = options.autoClose ?? true
    this.highWaterMark = options.highWaterMark ?? 16 * 1024
    this.offset = 0 // file write offset
    this.cache = [] // cached chunks waiting to be written
    // total length pending write, including everything in the cache
    this.writtenLen = 0
    // whether a write is in progress;
    // if so, later writes go into this.cache
    this.writing = false
    // whether a 'drain' event should be emitted
    this.needDrain = false
    // open the file
    this.open()
  }
}
open()
Same code as ReadStream.
open() {
  fs.open(this.path, this.flags, (err, fd) => {
    if (err) {
      this.emit('error', err)
      return
    }
    this.fd = fd
    this.emit('open')
  })
}
write()
The write method performs the write operation:
write(chunk, encoding, cb = () => {}) {
  // normalize the content to be written:
  // strings are converted to buffers
  chunk = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk, encoding)
  // add to the total pending length
  this.writtenLen += chunk.length
  // check whether highWaterMark has been reached
  const hasLimit = this.writtenLen >= this.highWaterMark
  // a 'drain' event is needed once highWaterMark is exceeded
  this.needDrain = hasLimit
  // if no write is in progress, call _write to start writing directly;
  // otherwise push the chunk into the cache.
  // after each write completes, clearBuffer takes
  // the next cached chunk and writes it
  if (!this.writing) {
    this.writing = true
    this._write(chunk, () => {
      cb()
      this.clearBuffer()
    })
  } else {
    this.cache.push({
      chunk: chunk,
      cb
    })
  }
  return !hasLimit
}
// the actual write operation
_write(chunk, cb) {
  if (typeof this.fd !== 'number') {
    this.once('open', () => this._write(chunk, cb))
    return
  }
  // write to the file
  fs.write(this.fd, chunk, 0, chunk.length, this.offset, (err, bytesWritten) => {
    if (err) {
      this.emit('error', err)
      return
    }
    // advance the offset
    this.offset += bytesWritten
    // subtract what was just written from the pending length
    this.writtenLen -= bytesWritten
    cb()
  })
}
First, normalize the content to be written. Only buffers and strings are supported; a string is converted directly to a buffer.
Then compute the total pending length, i.e. this.writtenLen += chunk.length, check whether it exceeds highWaterMark, and use that to decide whether 'drain' should fire.
Finally, check whether a write is already in progress. If not, call _write() to write directly; if so, push the chunk into the cache. When _write() finishes, clearBuffer() is called to take the next cached chunk from this.cache and write it. The clearBuffer method looks like this:
clearBuffer()
clearBuffer() {
  // take the next chunk out of the cache
  const data = this.cache.shift()
  if (data) {
    const { chunk, cb } = data
    // keep writing
    this._write(chunk, () => {
      cb()
      this.clearBuffer()
    })
    return
  }
  // emit 'drain' if needed
  this.needDrain && this.emit('drain')
  // all writes are done; reset writing
  this.writing = false
  // reset needDrain
  this.needDrain = false
}
full code
const { EventEmitter } = require('events')
const fs = require('fs')

class WriteStream extends EventEmitter {
  constructor(path, options = {}) {
    super()
    this.path = path
    this.flags = options.flags ?? 'w'
    this.encoding = options.encoding ?? 'utf8'
    this.autoClose = options.autoClose ?? true
    this.highWaterMark = options.highWaterMark ?? 16 * 1024
    this.offset = 0
    this.cache = []
    this.writtenLen = 0
    this.writing = false
    this.needDrain = false
    this.open()
  }
  open() {
    fs.open(this.path, this.flags, (err, fd) => {
      if (err) {
        this.emit('error', err)
        return
      }
      this.fd = fd
      this.emit('open')
    })
  }
  clearBuffer() {
    const data = this.cache.shift()
    if (data) {
      const { chunk, cb } = data
      this._write(chunk, () => {
        cb()
        this.clearBuffer()
      })
      return
    }
    this.needDrain && this.emit('drain')
    this.writing = false
    this.needDrain = false
  }
  write(chunk, encoding, cb = () => {}) {
    chunk = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk, encoding)
    this.writtenLen += chunk.length
    const hasLimit = this.writtenLen >= this.highWaterMark
    this.needDrain = hasLimit
    if (!this.writing) {
      this.writing = true
      this._write(chunk, () => {
        cb()
        this.clearBuffer()
      })
    } else {
      this.cache.push({
        chunk: chunk,
        cb
      })
    }
    return !hasLimit
  }
  _write(chunk, cb) {
    if (typeof this.fd !== 'number') {
      this.once('open', () => this._write(chunk, cb))
      return
    }
    fs.write(this.fd, chunk, 0, chunk.length, this.offset, (err, bytesWritten) => {
      if (err) {
        this.emit('error', err)
        return
      }
      this.offset += bytesWritten
      this.writtenLen -= bytesWritten
      cb()
    })
  }
}
- END -
About Qi Wu Troupe
Qi Wu Troupe is the largest front-end team of 360 Group, and participates in the work of W3C and ECMA (TC39) on behalf of the group. Qi Wu Troupe attaches great importance to talent development, offering tracks such as engineer, lecturer, translator, business liaison, and team leader, along with corresponding technical, professional, general, and leadership training courses. Qi Wu Troupe welcomes outstanding talents of all kinds to follow and join us with an open, talent-seeking attitude.