An in-depth look at file streams in Node

Why use file streams

Imagine a scenario: you need to process a 10 GB file, but you only have 2 GB of memory. What do you do?

You can read the file in five passes, 2 GB at a time. That process of reading in segments is exactly what a stream is!

In Node, the stream module encapsulates the basic operations of streams, and file streams are built directly on top of it. Here we use file streams to understand stream in depth.

file readable stream

A readable stream reads a file's contents into memory bit by bit.

How to use

Let's take a look at the basic usage first.

const fs = require('fs')

const rs = fs.createReadStream('./w-test.js')

rs.on('data', (chunk) => {
  console.log(chunk)
})

rs.on('close', () => {
  console.log('close')
})

As the code above shows, fs.createReadStream() creates a readable stream that reads the w-test.js file.

When a 'data' listener is bound, the file is read automatically, 64 KB at a time by default. You can change the size of each chunk through the highWaterMark option.

The close event fires automatically once the file has been fully read.

The options that createReadStream accepts are shown below:

const rs = fs.createReadStream('./w-test.js', {
  flags: 'r', // filesystem flag: open the file for reading
  encoding: null, // encoding of the emitted chunks
  autoClose: false, // whether to auto-close the file (and emit 'close') when reading finishes
  start: 0, // position to start reading from
  end: 2, // position to stop reading at
  highWaterMark: 2 // number of bytes to read on each pass
})

Note: start and end are both inclusive, i.e. the range is [start, end].

In fact, fs.createReadStream returns an instance of the fs.ReadStream class, so the above code is equivalent to:

const rs = new fs.ReadStream('./w-test.js', {
  flags: 'r', // filesystem flag: open the file for reading
  encoding: null, // encoding of the emitted chunks
  autoClose: false, // whether to auto-close the file (and emit 'close') when reading finishes
  start: 0, // position to start reading from
  end: 2, // position to stop reading at
  highWaterMark: 2 // number of bytes to read on each pass
})

Implementation of file-readable streams

Now that we understand the usage, let's tackle the principle: next we'll write a readable stream by hand.

fs.read / fs.open

The essence of a readable stream is reading file data in batches, and fs.read() lets us control how much of the file is read each time.

fs.read(fd, buffer, offset, length, position, callback)
  • fd: the file descriptor to read

  • buffer: the buffer where the data is to be written (write the read file content into this buffer)

  • offset: the offset to start writing in the buffer (from the index of the buffer to start writing)

  • length: the number of bytes read (read several bytes from the file)

  • position: the position in the file to start reading from (i.e. read starting at this byte of the file)

  • callback: callback function

    • err

    • bytesRead: the actual number of bytes read

Reading a file requires a file descriptor, which we obtain with fs.open(path[, flags[, mode]], callback):

  • path: file path

  • flags: filesystem flags, default: 'r'. It means what operation to do on the file, the common ones are as follows:

    • r: open the file for reading

    • w: open the file for writing

    • a: open the file for appending

  • mode: file operation permission, default value: 0o666 (readable and writable).

  • callback: callback function. The parameters carried on the function are as follows:

    • err: If it fails, the value is the reason for the error

    • fd (number): file descriptor, this value is used when reading and writing files

initialization

First of all, ReadStream is a class. Judging from its behavior — the class can listen for events such as on('data') — it should inherit from EventEmitter, as in the following code:

class ReadStream extends EventEmitter {
  constructor() {
    super();
  }
}

Then we initialize the options and open the file, as in the following code (the key lines are commented):

class ReadStream extends EventEmitter {
  constructor(path, options = {}) {
    super()

    // parse the options
    this.path = path
    this.flags = options.flags ?? 'r'
    this.encoding = options.encoding ?? 'utf8'
    this.autoClose = options.autoClose ?? true
    this.start = options.start ?? 0
    this.end = options.end ?? undefined
    this.highWaterMark = options.highWaterMark ?? 16 * 1024

    // current offset into the file
    this.offset = this.start

    // whether the stream is flowing;
    // used by pause() and resume(), covered below
    this.flowing = false

    // open the file
    this.open()

    // 'newListener' fires whenever a new listener is bound;
    // when a 'data' listener is bound, start reading automatically
    this.on('newListener', (type) => {
      if (type === 'data') {
        // mark the stream as flowing
        this.flowing = true
        // start reading the file
        this.read()
      }
    })
  }
}
  • Before reading the file, we have to open it, i.e. this.open().

  • newListener is an event of EventEmitter that fires whenever a new listener is bound. For example, when we call on('data'), the newListener event fires with type 'data'.

  • So when a 'data' listener is bound (i.e. on('data')), we start reading the file with this.read(). this.read() is our core method.

open

The open method is as follows:

open() {
  fs.open(this.path, this.flags, (err, fd) => {
    if (err) {
      // emit 'error' if the file fails to open
      this.emit('error', err)
      return
    }

    // record the file descriptor
    this.fd = fd
    // emit 'open' once the file is opened successfully
    this.emit('open')
  })
}

When the file opens successfully, record the file descriptor, i.e. this.fd.

read

The read method is as follows:

read() {
  // fs.open is asynchronous, so the file may not be
  // open yet when read() is called;
  // wait for the 'open' event, then call read() again
  if (typeof this.fd !== 'number') {
    this.once('open', () => this.read())
    return
  }

  // allocate a buffer of highWaterMark bytes
  // to hold the data read from the file
  const buf = Buffer.alloc(this.highWaterMark)

  // start reading the file,
  // advancing the file offset after every read
  fs.read(this.fd, buf, 0, buf.length, this.offset, (err, bytesRead) => {
    this.offset += bytesRead

    // bytesRead is the number of bytes actually read;
    // 0 means nothing was read, i.e. we have reached the end
    if (bytesRead) {
      // emit 'data' for every chunk read
      this.emit('data', buf.slice(0, bytesRead))
      // keep reading while the stream is flowing;
      // pause() sets this.flowing to false
      this.flowing && this.read()
    } else {
      // emit 'end' when reading finishes
      this.emit('end')

      // if autoClose is set, close the file and emit 'close'
      this.autoClose && fs.close(this.fd, () => this.emit('close'))
    }
  })
}

Every line above is commented, so it should not be hard to follow. A few key points deserve attention:

  • We must wait for the file to open before reading it, but opening is asynchronous and we don't know when it completes, so we wait for the 'open' event and call read() again once it fires.

  • The fs.read() method was covered earlier; it is the core of this handwritten read.

  • The this.flowing property tracks whether the stream is flowing and is controlled by the pause() and resume() methods. Let's look at those two methods.

pause

pause() {
  this.flowing = false
}

resume

resume() {
  if (!this.flowing) {
    this.flowing = true
    this.read()
  }
}

full code

const { EventEmitter } = require('events')
const fs = require('fs')

class ReadStream extends EventEmitter {
  constructor(path, options = {}) {
    super()

    this.path = path
    this.flags = options.flags ?? 'r'
    this.encoding = options.encoding ?? 'utf8'
    this.autoClose = options.autoClose ?? true
    this.start = options.start ?? 0
    this.end = options.end ?? undefined
    this.highWaterMark = options.highWaterMark ?? 16 * 1024
    this.offset = this.start
    this.flowing = false

    this.open()

    this.on('newListener', (type) => {
      if (type === 'data') {
        this.flowing = true
        this.read()
      }
    })
  }

  open() {
    fs.open(this.path, this.flags, (err, fd) => {
      if (err) {
        this.emit('error', err)
        return
      }

      this.fd = fd
      this.emit('open')
    })
  }

  pause() {
    this.flowing = false
  }

  resume() {
    if (!this.flowing) {
      this.flowing = true
      this.read()
    }
  }

  read() {
    if (typeof this.fd !== 'number') {
      this.once('open', () => this.read())
      return
    }

    const buf = Buffer.alloc(this.highWaterMark)
    fs.read(this.fd, buf, 0, buf.length, this.offset, (err, bytesRead) => {
      this.offset += bytesRead
      if (bytesRead) {
        this.emit('data', buf.slice(0, bytesRead))
        this.flowing && this.read()
      } else {
        this.emit('end')
        this.autoClose && fs.close(this.fd, () => this.emit('close'))
      }
    })
  }
}

file writable stream

As the name suggests, a writable stream writes content to a file bit by bit.

fs.write

fs.write(fd, buffer, offset, length, position, callback)
  • fd: the file descriptor to write to

  • buffer: write the contents of the specified buffer to the file

  • offset: Specify the write position of the buffer (write the content read from the offset index of the buffer to the file)

  • length: specifies the number of bytes to write

  • position: the offset of the file (write from the position byte of the file)

How to use

// usage 1:
const ws = fs.createWriteStream('./w-test.js')

// usage 2:
const ws = new fs.WriteStream('./w-test.js', {
  flags: 'w',
  encoding: 'utf8',
  autoClose: true,
  highWaterMark: 2
})

// write to the file
const flag = ws.write('2')

ws.on('drain', () => console.log('drain'))
  • ws.write() writes to the file and returns a boolean indicating whether writing can continue, i.e. whether the internal cache has reached its threshold. When write() is called multiple times, each call does not write immediately: only one write operation runs at a time, so the remaining chunks are queued in a cache until the current write completes, then executed from the cache in order. The cache therefore has a size threshold, 64 KB by default. A return value of true means writing can continue; false means the threshold has been reached.

  • ws.on('drain'): if ws.write() returns false, the 'drain' event fires once data can be written to the stream again.

Implementation of a file-writable stream

initialization

First define the WriteStream class, inherit from EventEmitter, then initialize the options. _Pay attention to the code comments._

const { EventEmitter } = require('events')
const fs = require('fs')

class WriteStream extends EventEmitter {
  constructor(path, options = {}) {
    super()

    // initialize the options
    this.path = path
    this.flags = options.flags ?? 'w'
    this.encoding = options.encoding ?? 'utf8'
    this.autoClose = options.autoClose ?? true
    this.highWaterMark = options.highWaterMark ?? 16 * 1024

    this.offset = 0 // file write offset
    this.cache = [] // cache of chunks waiting to be written

    // total length waiting to be written,
    // including the chunks in the cache
    this.writtenLen = 0

    // whether a write is in progress;
    // if so, subsequent chunks go into this.cache
    this.writing = false

    // whether the 'drain' event should be emitted
    this.needDrain = false

    // open the file
    this.open()
  }
}

open()

Same code as ReadStream.

open() {
  fs.open(this.path, this.flags, (err, fd) => {
    if (err) {
      this.emit('error', err)
      return
    }

    this.fd = fd
    this.emit('open')
  })
}

write()

Performs the write operation:

write(chunk, encoding, cb = () => {}) {
  // normalize the chunk to be written:
  // if it is a string, convert it to a buffer
  chunk = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk, encoding)
  // add the chunk's length to the pending total
  this.writtenLen += chunk.length
  // check whether highWaterMark has been reached
  const hasLimit = this.writtenLen >= this.highWaterMark

  // if highWaterMark has been reached,
  // 'drain' should fire once the cache empties
  this.needDrain = hasLimit

  // if no write is in progress, call _write to start writing immediately;
  // otherwise push the chunk into the cache.
  // after each write completes, clearBuffer takes the next cached chunk and writes it
  if (!this.writing) {
    this.writing = true
    this._write(chunk, () => {
      cb()
      this.clearBuffer()
    })
  } else {
    this.cache.push({
      chunk: chunk,
      cb
    })
  }

  return !hasLimit
}

// the actual write operation
_write(chunk, cb) {
  if (typeof this.fd !== 'number') {
    this.once('open', () => this._write(chunk, cb))
    return
  }

  // write to the file
  fs.write(this.fd, chunk, 0, chunk.length, this.offset, (err, bytesWritten) => {
    if (err) {
      this.emit('error', err)
      return
    }

    // advance the file offset
    this.offset += bytesWritten
    // the chunk is written; subtract its length from the pending total
    this.writtenLen -= bytesWritten
    cb()
  })
}
  1. First, normalize the content to be written. Only buffers and strings are supported; a string is converted to a buffer directly.

  2. Add the chunk's length to the total pending length, i.e. this.writtenLen += chunk.length.

  3. Check whether highWaterMark has been reached.

  4. Decide whether 'drain' should be triggered later.

  5. Check whether a write is already in progress. If not, call _write() to write directly; if so, push the chunk into the cache. When _write() finishes, call clearBuffer() to take the first cached chunk from this.cache and write it. The clearBuffer method looks like this:

clearBuffer()

clearBuffer() {
  // take the next chunk from the cache
  const data = this.cache.shift()
  if (data) {
    const { chunk, cb } = data
    // keep writing
    this._write(chunk, () => {
      cb()
      this.clearBuffer()
    })
    return
  }

  // the cache is empty: emit 'drain' if needed
  this.needDrain && this.emit('drain')
  // all writes are done; reset writing
  this.writing = false
  // reset needDrain
  this.needDrain = false
}

full code

const { EventEmitter } = require('events')
const fs = require('fs')

class WriteStream extends EventEmitter {
  constructor(path, options = {}) {
    super()

    this.path = path
    this.flags = options.flags ?? 'w'
    this.encoding = options.encoding ?? 'utf8'
    this.autoClose = options.autoClose ?? true
    this.highWaterMark = options.highWaterMark ?? 16 * 1024

    this.offset = 0
    this.cache = []
    this.writtenLen = 0
    this.writing = false
    this.needDrain = false

    this.open()
  }

  open() {
    fs.open(this.path, this.flags, (err, fd) => {
      if (err) {
        this.emit('error', err)
        return
      }

      this.fd = fd
      this.emit('open')
    })
  }

  clearBuffer() {
    const data = this.cache.shift()
    if (data) {
      const { chunk, cb } = data
      this._write(chunk, () => {
        cb()
        this.clearBuffer()
      })
      return
    }

    this.needDrain && this.emit('drain')
    this.writing = false
    this.needDrain = false
  }

  write(chunk, encoding, cb = () => {}) {
    chunk = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk, encoding)
    this.writtenLen += chunk.length
    const hasLimit = this.writtenLen >= this.highWaterMark
    this.needDrain = hasLimit

    if (!this.writing) {
      this.writing = true
      this._write(chunk, () => {
        cb()
        this.clearBuffer()
      })
    } else {
      this.cache.push({
        chunk: chunk,
        cb
      })
    }

    return !hasLimit
  }

  _write(chunk, cb) {
    if (typeof this.fd !== 'number') {
      this.once('open', () => this._write(chunk, cb))
      return
    }

    fs.write(this.fd, chunk, 0, chunk.length, this.offset, (err, bytesWritten) => {
      if (err) {
        this.emit('error', err)
        return
      }

      this.offset += bytesWritten
      this.writtenLen -= bytesWritten
      cb()
    })
  }
}

- END -

About Qi Wu Troupe

Qi Wu Troupe is the largest front-end team at 360 Group and participates in W3C and ECMA (TC39) work on behalf of the group. Qi Wu Troupe attaches great importance to talent development, offering career tracks such as engineer, lecturer, translator, business liaison, and team leader for employees to choose from, along with corresponding training courses in technical, professional, general, and leadership skills. Qi Wu Troupe welcomes outstanding talents of all kinds to follow and join us with an open and talent-seeking attitude.



Origin: blog.csdn.net/qiwoo_weekly/article/details/130537865