Kotlin lazy collection operations: Sequence

Collection Operation Functions and Sequences

Before looking at Kotlin's lazy collections, let's review some of the collection operation functions in the Kotlin standard library.

Define two data classes, Person and Book, as the data model:

data class Person(val name: String, val age: Int) 

data class Book(val title: String, val authors: List<String>)

filter and map operations:

val people = listOf<Person>(
    Person("xiaowang", 30),
    Person("xiaozhang", 32),
    Person("xiaoli", 28)
)
// List of the names of people aged 30 or older
people.filter { it.age >= 30 }.map(Person::name)

count operation:

val people = listOf<Person>(
    Person("xiaowang", 30),
    Person("xiaozhang", 32),
    Person("xiaoli", 28)
)
// Number of people younger than 30
people.count { it.age < 30 }

flatMap operation:

val books = listOf<Book>(
    Book("Programming in Java", arrayListOf("xiaowang", "xiaozhang")),
    Book("Programming in Kotlin", arrayListOf("xiaoli", "xiaomao")),
)
// List of the authors of all books
books.flatMap { it.authors }.toList()

In the functions above, every step creates an intermediate collection: the result of each step is stored in a temporary list.

filter function source code:

public inline fun <T> Iterable<T>.filter(predicate: (T) -> Boolean): List<T> {
    // Creates a new list
    return filterTo(ArrayList<T>(), predicate)
}

public inline fun <T, C : MutableCollection<in T>> Iterable<T>.filterTo(destination: C, predicate: (T) -> Boolean): C {
    for (element in this) if (predicate(element)) destination.add(element)
    return destination
}

Source code of map function:

public inline fun <T, R> Iterable<T>.map(transform: (T) -> R): List<R> {
    // Creates a new list
    return mapTo(ArrayList<R>(collectionSizeOrDefault(10)), transform)
}

public inline fun <T, R, C : MutableCollection<in R>> Iterable<T>.mapTo(destination: C, transform: (T) -> R): C {
    for (item in this)
        destination.add(transform(item))
    return destination
}

If there are many elements to process, for example more than 50 or 100 people or books, a chained call such as filter {}.map {} becomes inefficient and wastes memory, because every step allocates a new list.
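To make the cost concrete, the following minimal sketch records the order in which the lambdas run in an eager list chain. The helper name eagerCallOrder and the log list are introduced here only for illustration.

```kotlin
// Hypothetical helper: records the order in which an eager chain calls its lambdas.
fun eagerCallOrder(): List<String> {
    val log = mutableListOf<String>()
    listOf(1, 2, 3, 4)
        .map {
            log.add("map$it")
            it * it          // map builds a complete intermediate list first
        }
        .filter {
            log.add("filter$it")
            it % 2 == 0      // filter then walks that list and builds another one
        }
    return log
}

fun main() {
    // prints [map1, map2, map3, map4, filter1, filter4, filter9, filter16]
    println(eagerCallOrder())
}
```

Every map call finishes before the first filter call starts, which is only possible because the mapped values are held in an intermediate list.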

To solve this problem, Kotlin provides the Sequence interface for lazy collection operations. This interface represents a series of elements that can be enumerated one by one. Sequence declares only one method, iterator, which returns the Iterator used to get values from the sequence.

public interface Sequence<out T> {
    /**
     * Returns an [Iterator] that returns the values from the sequence.
     *
     * Throws an exception if the sequence is constrained to be iterated once and `iterator` is invoked the second time.
     */
    public operator fun iterator(): Iterator<T>
}

public inline fun <T> Sequence(crossinline iterator: () -> Iterator<T>): Sequence<T> = object : Sequence<T> {
    override fun iterator(): Iterator<T> = iterator()
}

/**
 * Creates a sequence that returns all elements from this iterator. The sequence is constrained to be iterated only once.
 *
 * @sample samples.collections.Sequences.Building.sequenceFromIterator
 */
public fun <T> Iterator<T>.asSequence(): Sequence<T> = Sequence { this }.constrainOnce()
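The "iterated only once" constraint mentioned in the comment above can be observed directly. This is a minimal sketch; the function name consumeTwice is hypothetical, and it assumes the second iteration fails with an IllegalStateException, which is how the standard library's constrainOnce wrapper behaves.

```kotlin
// Hypothetical helper: consumes a sequence built from an Iterator twice.
// Returns the result of the first pass and whether the second pass failed.
fun consumeTwice(): Pair<List<Int>, Boolean> {
    val once = listOf(1, 2, 3).iterator().asSequence()
    val first = once.toList()          // first pass works normally
    val secondFailed = try {
        once.toList()                  // second pass asks for iterator() again
        false
    } catch (e: IllegalStateException) {
        true
    }
    return first to secondFailed
}

fun main() {
    println(consumeTwice()) // prints ([1, 2, 3], true)
}
```

A sequence created from a collection with asSequence() does not have this restriction, because the collection can hand out a fresh Iterator for every pass.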

Elements of a sequence are evaluated lazily. Sequences can therefore perform chained operations on collection elements more efficiently, without creating additional collections to hold the intermediate results. How this laziness comes about is explained in detail later.

You can call the extension function asSequence to convert any collection into a sequence, and call toList to do the reverse conversion.

val people = listOf<Person>(
    Person("xiaowang", 30),
    Person("xiaozhang", 32),
    Person("xiaoli", 28)
)
people.asSequence().filter { it.age >= 30 }.map(Person::name).toList()

val books = listOf<Book>(
    Book("Programming in Java", arrayListOf("xiaowang", "xiaozhang")),
    Book("Programming in Kotlin", arrayListOf("xiaoli", "xiaomao")),
)
books.asSequence().flatMap { it.authors }.toList()

Intermediate and terminal sequence operations

Sequence operations fall into two categories: intermediate and terminal. An intermediate operation returns another sequence that knows how to transform the elements of the original sequence. A terminal operation returns a result, which may be a collection, an element, a number, or any other object obtained from the chain of transformations applied to the initial collection.

Intermediate operations are always lazy.
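The difference between the two categories shows up in the return types. The following is a small sketch; doubledSequence is a hypothetical name used only for this illustration.

```kotlin
// Intermediate operation: the result is still a Sequence and nothing has been computed yet.
fun doubledSequence(): Sequence<Int> = sequenceOf(1, 2, 3).map { it * 2 }

fun main() {
    val doubled = doubledSequence()

    // Terminal operations: each one iterates the sequence and returns a plain value.
    val asList: List<Int> = doubled.toList()
    val total: Int = doubled.sum()
    val firstLarge: Int? = doubled.firstOrNull { it > 2 }

    println(asList)     // prints [2, 4, 6]
    println(total)      // prints 12
    println(firstLarge) // prints 4
}
```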

Let's look at this laziness through an example:

listOf(1, 2, 3, 4).asSequence().map {
            println("map${it}")
            it * it
        }.filter {
            println("filter${it}")
            it % 2 == 0
        }

The above code will not output anything to the console (because there is no terminal operation).

listOf(1, 2, 3, 4).asSequence().map {
            println("map${it}")
            it * it
        }.filter {
            println("filter${it}")
            it % 2 == 0
        }.toList()
 
Console output:
map1
filter1
map2
filter4
map3
filter9
map4
filter16

The map and filter transformations run only when toList() is called, and they are applied element by element. It is not the case that map first runs over all elements and filter runs afterward.
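Element-by-element processing also means that a terminal operation that needs only part of the data can stop early. The sketch below uses first() instead of toList(); the function name firstEvenSquare and the log parameter are assumptions made for this illustration.

```kotlin
// Hypothetical helper: returns the first even square and records which lambdas ran.
fun firstEvenSquare(log: MutableList<String>): Int =
    listOf(1, 2, 3, 4).asSequence()
        .map {
            log.add("map$it")
            it * it
        }
        .filter {
            log.add("filter$it")
            it % 2 == 0
        }
        .first() // terminal operation that stops at the first match

fun main() {
    val log = mutableListOf<String>()
    println(firstEvenSquare(log)) // prints 4
    println(log)                  // prints [map1, filter1, map2, filter4]
}
```

The elements 3 and 4 are never mapped or filtered, because first() stops pulling from the chain as soon as it has a result.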

To see why elements are processed one by one, first look at the toList() method:

public fun <T> Sequence<T>.toList(): List<T> {
    return this.toMutableList().optimizeReadOnlyList()
}

public fun <T> Sequence<T>.toMutableList(): MutableList<T> {
    return toCollection(ArrayList<T>())
}

public fun <T, C : MutableCollection<in T>> Sequence<T>.toCollection(destination: C): C {
    for (item in this) {
        destination.add(item)
    }
    return destination
}

In the final toCollection method, for (item in this) actually calls the Sequence's iterator() and uses the returned Iterator to iterate over the elements. Here, this is the sequence returned by filter, so the iteration goes through filter's Iterator. Take a look at filter:

public fun <T> Sequence<T>.filter(predicate: (T) -> Boolean): Sequence<T> {
    return FilteringSequence(this, true, predicate)
}

internal class FilteringSequence<T>(
    private val sequence: Sequence<T>,
    private val sendWhen: Boolean = true,
    private val predicate: (T) -> Boolean
) : Sequence<T> {

    override fun iterator(): Iterator<T> = object : Iterator<T> {
        val iterator = sequence.iterator()
        var nextState: Int = -1 // -1 for unknown, 0 for done, 1 for continue
        var nextItem: T? = null

        private fun calcNext() {
            while (iterator.hasNext()) {
                val item = iterator.next()
                if (predicate(item) == sendWhen) {
                    nextItem = item
                    nextState = 1
                    return
                }
            }
            nextState = 0
        }

        override fun next(): T {
            if (nextState == -1)
                calcNext()
            if (nextState == 0)
                throw NoSuchElementException()
            val result = nextItem
            nextItem = null
            nextState = -1
            @Suppress("UNCHECKED_CAST")
            return result as T
        }

        override fun hasNext(): Boolean {
            if (nextState == -1)
                calcNext()
            return nextState == 1
        }
    }
}

filter iterates over the elements using sequence.iterator() of the previous Sequence. Now look at map:

public fun <T, R> Sequence<T>.map(transform: (T) -> R): Sequence<R> {
    return TransformingSequence(this, transform)
}

internal class TransformingSequence<T, R>
constructor(private val sequence: Sequence<T>, private val transformer: (T) -> R) : Sequence<R> {
    override fun iterator(): Iterator<R> = object : Iterator<R> {
        val iterator = sequence.iterator()
        override fun next(): R {
            return transformer(iterator.next())
        }

        override fun hasNext(): Boolean {
            return iterator.hasNext()
        }
    }

    internal fun <E> flatten(iterator: (R) -> Iterator<E>): Sequence<E> {
        return FlatteningSequence<T, R, E>(sequence, transformer, iterator)
    }
}

map also iterates over the elements using sequence.iterator() of the previous Sequence. By the same reasoning, the chain ends at the iterator() of the Sequence that asSequence() created from the source collection.

Let's write a custom Sequence to verify this conjecture:

listOf(1, 2, 3, 4).asSequence().mapToString {
            Log.d("TestSequence","mapToString${it}")
            it.toString()
        }.toList()

    fun <T> Sequence<T>.mapToString(transform: (T) -> String): Sequence<String> {
        return TransformingStringSequence(this, transform)
    }

    class TransformingStringSequence<T>
    constructor(private val sequence: Sequence<T>, private val transformer: (T) -> String) : Sequence<String> {
        override fun iterator(): Iterator<String> = object : Iterator<String> {
            val iterator = sequence.iterator()
            override fun next(): String {
                val next = iterator.next()
                Log.d("TestSequence","next:${next}")
                return transformer(next)
            }

            override fun hasNext(): Boolean {
                return iterator.hasNext()
            }
        }
    }

Console output:
2023-01-01 20:43:43.899 21797-21797/com.wangjiang.example D/TestSequence: next:1
2023-01-01 20:43:43.899 21797-21797/com.wangjiang.example D/TestSequence: mapToString1
2023-01-01 20:43:43.899 21797-21797/com.wangjiang.example D/TestSequence: next:2
2023-01-01 20:43:43.899 21797-21797/com.wangjiang.example D/TestSequence: mapToString2
2023-01-01 20:43:43.899 21797-21797/com.wangjiang.example D/TestSequence: next:3
2023-01-01 20:43:43.899 21797-21797/com.wangjiang.example D/TestSequence: mapToString3
2023-01-01 20:43:43.899 21797-21797/com.wangjiang.example D/TestSequence: next:4
2023-01-01 20:43:43.899 21797-21797/com.wangjiang.example D/TestSequence: mapToString4

This is why the operations of a Sequence are applied only when the result is requested, that is, when the terminal operation is called, and why each element is then processed in turn. It is also why this is called a lazy collection operation.

Each element passes through the whole chain one by one, and each sequence pulls its elements from the previous sequence's sequence.iterator(). Putting filter first can therefore reduce the total number of transformations, because elements rejected by filter never reach the later steps.
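The effect of the ordering can be measured by counting how often the map lambda runs. This is a minimal sketch; the two function names are hypothetical, and the predicate is chosen so that both orderings produce the same result (a square is even exactly when the number is even).

```kotlin
// Counts map calls when map comes before filter.
fun mapCallsWhenMapFirst(): Int {
    var calls = 0
    listOf(1, 2, 3, 4).asSequence()
        .map { calls++; it * it }
        .filter { it % 2 == 0 }
        .toList()
    return calls
}

// Counts map calls when filter comes before map.
fun mapCallsWhenFilterFirst(): Int {
    var calls = 0
    listOf(1, 2, 3, 4).asSequence()
        .filter { it % 2 == 0 }
        .map { calls++; it * it }
        .toList()
    return calls
}

fun main() {
    println(mapCallsWhenMapFirst())    // prints 4
    println(mapCallsWhenFilterFirst()) // prints 2
}
```

Both chains yield [4, 16], but with filter first only the two even numbers are ever squared.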

Creating a sequence

For a collection, you can call asSequence() directly to convert it to a Sequence. But what if the data is not a collection and you still want collection-like transformations? In that case you can create a sequence yourself.

The following is an example of finding the sum of all natural numbers from 1 to 100:

val naturalNumbers = generateSequence(0) { it + 1 }
val numbersTo100 = naturalNumbers.takeWhile { it <= 100 }
val sum = numbersTo100.sum()
println(sum)
Console output:
5050

First look at the generateSequence source code:

public fun <T : Any> generateSequence(seed: T?, nextFunction: (T) -> T?): Sequence<T> =
    if (seed == null)
        EmptySequence
    else
        GeneratorSequence({ seed }, nextFunction)

private class GeneratorSequence<T : Any>(private val getInitialValue: () -> T?, private val getNextValue: (T) -> T?) : Sequence<T> {
    override fun iterator(): Iterator<T> = object : Iterator<T> {
        var nextItem: T? = null
        var nextState: Int = -2 // -2 for initial unknown, -1 for next unknown, 0 for done, 1 for continue

        private fun calcNext() {
            // getInitialValue returns the first argument of generateSequence, which is 0 here
            // getNextValue is the second argument of generateSequence, { it + 1 }; its `it` is nextItem!!
            nextItem = if (nextState == -2) getInitialValue() else getNextValue(nextItem!!)
            nextState = if (nextItem == null) 0 else 1
        }

        override fun next(): T {
            if (nextState < 0)
                calcNext()

            if (nextState == 0)
                throw NoSuchElementException()
            val result = nextItem as T
            // Do not clean nextItem (to avoid keeping reference on yielded instance) -- need to keep state for getNextValue
            nextState = -1
            return result
        }

        override fun hasNext(): Boolean {
            if (nextState < 0)
                calcNext()
            return nextState == 1
        }
    }
}

The code above creates a class that implements the Sequence interface; its iterator method returns an Iterator that computes each value on demand.
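In calcNext, a null result from getNextValue sets nextState to 0, which ends the iteration. A generator can use this to terminate on its own, without takeWhile. The following is a small sketch; the function name powersOfTwoUpTo is hypothetical.

```kotlin
// Hypothetical helper: a finite sequence that ends when the lambda returns null.
fun powersOfTwoUpTo(limit: Int): List<Int> =
    generateSequence(1) { current ->
        val next = current * 2
        if (next <= limit) next else null // null tells the generator to stop
    }.toList()

fun main() {
    println(powersOfTwoUpTo(100)) // prints [1, 2, 4, 8, 16, 32, 64]
}
```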

public fun <T> Sequence<T>.takeWhile(predicate: (T) -> Boolean): Sequence<T> {
    return TakeWhileSequence(this, predicate)
}

internal class TakeWhileSequence<T>
constructor(
    private val sequence: Sequence<T>,
    private val predicate: (T) -> Boolean
) : Sequence<T> {
    override fun iterator(): Iterator<T> = object : Iterator<T> {
        val iterator = sequence.iterator()
        var nextState: Int = -1 // -1 for unknown, 0 for done, 1 for continue
        var nextItem: T? = null

        private fun calcNext() {
            if (iterator.hasNext()) {
                // iterator.next() calls the next method of the preceding GeneratorSequence, which returns its current value (it + 1)
                val item = iterator.next()
                // Evaluate the predicate: it <= 100 -> item <= 100
                if (predicate(item)) {
                    nextState = 1
                    nextItem = item
                    return
                }
            }
            nextState = 0
        }

        override fun next(): T {
            if (nextState == -1)
                calcNext() // will change nextState
            if (nextState == 0)
                throw NoSuchElementException()
            @Suppress("UNCHECKED_CAST")
            val result = nextItem as T

            // Clean next to avoid keeping reference on yielded instance
            nextItem = null
            nextState = -1
            return result
        }

        override fun hasNext(): Boolean {
            if (nextState == -1)
                calcNext() // will change nextState
            return nextState == 1
        }
    }
}

In the next method of TakeWhileSequence, the internal calcNext method is called first, and calcNext in turn calls the next method of GeneratorSequence. That call yields the current value (the seed 0 first, then 0 + 1, then 1 + 1, and so on). Once the value is obtained, the predicate is evaluated: it <= 100 -> item <= 100. The first value that fails the predicate sets nextState to 0 and ends the sequence.
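Because calcNext uses if rather than while, takeWhile ends at the first element that fails the predicate, whereas filter skips that element and keeps going. The sketch below contrasts the two on a small finite sequence; the function name takeWhileVsFilter is hypothetical.

```kotlin
// Hypothetical helper: applies the same predicate with takeWhile and with filter.
fun takeWhileVsFilter(): Pair<List<Int>, List<Int>> {
    val numbers = sequenceOf(1, 2, 5, 1, 2)
    val taken = numbers.takeWhile { it < 3 }.toList()  // stops for good at 5
    val filtered = numbers.filter { it < 3 }.toList()  // skips 5 and continues
    return taken to filtered
}

fun main() {
    println(takeWhileVsFilter()) // prints ([1, 2], [1, 2, 1, 2])
}
```

This is why takeWhile can cut off the infinite naturalNumbers sequence, while filter with the same predicate would keep requesting values from the generator and never finish.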

public fun Sequence<Int>.sum(): Int {
    var sum: Int = 0
    for (element in this) {
        sum += element
    }
    return sum
}

The sum method is the terminal operation of the sequence; it produces the result. Its for (element in this) loop calls the Iterator of the previous Sequence to iterate over the elements, which in turn calls the Iterator of the Sequence before it, and so on, back to the Iterator of the source Sequence.

Summary

The collection operation functions provided by the Kotlin standard library, such as filter, map, and flatMap, create a temporary list to store the intermediate result of each step. When a collection has many elements, such chained operations become inefficient. To solve this problem, Kotlin provides the Sequence interface for lazy collection operations. The elements of a sequence are processed only when a terminal operation is called, that is, when the result is requested, and they are processed one by one: the second element is processed only after the first has passed through the whole chain, so the intermediate operations are deferred. Because each element is processed in turn, applying the filter transformation before the map transformation helps reduce the total number of transformations.

Origin blog.csdn.net/wangjiang_qianmo/article/details/128513130