Java Stream `generate()` how to "include" the first "excluded" element

Luca Abbati :

Assume the following usage scenario for a Java stream, where data is added from a data source. The data source can be a list of values, as in the example below, or a paginated REST API; which one does not matter at the moment.

import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class Main {
    public static void main(String[] args) {
        final List<Boolean> dataSource = List.of(true, true, true, false, false, false, false);
        final AtomicInteger index = new AtomicInteger();
        Stream
            .generate(() -> {
                boolean value = dataSource.get(index.getAndIncrement());
                System.out.format("--> Executed expensive operation to retrieve data: %b\n", value);
                return value;
            })
            .takeWhile(value -> value == true)
            .forEach(data -> System.out.printf("--> Using: %b\n", data));
    }
}

If you run this code, the output will be:

--> Executed expensive operation to retrieve data: true
--> Using: true
--> Executed expensive operation to retrieve data: true
--> Using: true
--> Executed expensive operation to retrieve data: true
--> Using: true
--> Executed expensive operation to retrieve data: false

As you can see, the last element, the one that evaluated to false, was not added to the stream, as expected.

Now assume that the generate() method loads pages of data from a REST API. In that case, the true/false value is a field on page N indicating whether page N + 1 exists, something like a has_more field. I want the last page returned by the API to be added to the stream, but I do not want to perform another expensive operation to read an empty page, because I already know that there are no more pages.

What is the most idiomatic way to do this using the Java Stream API? Every workaround I can think of requires an extra call to the API.


UPDATE

In addition to the approaches listed in "Inclusive takeWhile() for Streams", there is another ugly way to achieve this.

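// In addition to the imports above, this needs java.util.Objects
// and java.util.concurrent.atomic.AtomicBoolean.
public static void main(String[] args) {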
    final List<Boolean> dataSource = List.of(true, true, true, false, false, false, false);
    final AtomicInteger index = new AtomicInteger();
    final AtomicBoolean hasMore = new AtomicBoolean(true);
    Stream
        .generate(() -> {
            if (!hasMore.get()) {
                return null;
            }
            boolean value = dataSource.get(index.getAndIncrement());
            hasMore.set(value);
            System.out.format("--> Executed expensive operation to retrieve data: %b\n", value);
            return value;
        })
        .takeWhile(Objects::nonNull)
        .forEach(data -> System.out.printf("--> Using: %b\n", data));
}

Holger :

You are using the wrong tool for the job. As is already noticeable in your code example, the Supplier passed to Stream.generate has to go to great lengths to maintain the index it needs for fetching pages.

What makes matters worse is that Stream.generate creates an unordered Stream:

Returns an infinite sequential unordered stream where each element is generated by the provided Supplier. This is suitable for generating constant streams, streams of random elements, etc.

You’re not returning constant or random values nor anything else that would be independent of the order.

This has a significant impact on the semantics of takeWhile:

Otherwise returns, if this stream is unordered, a stream consisting of a subset of elements taken from this stream that match the given predicate.

This makes sense if you think about it. If there is at least one element rejected by the predicate, it could be encountered at an arbitrary position for an unordered stream, so an arbitrary subset of elements encountered before it, including the empty set, would be a valid prefix.

But since there is no “before” or “after” for an unordered stream, even elements produced by the generator after the rejected one could be included in the result.

In practice, you are unlikely to encounter such effects for a sequential stream, but it doesn’t change the fact that Stream.generate(…).takeWhile(…) is semantically wrong for your task.
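The ordering difference can be checked directly. The following is a minimal sketch rather than part of the original answer; the class name OrderedCheck and the helper isOrdered are made up for illustration. It asks each stream's source spliterator whether it reports the ORDERED characteristic. Going by the Javadoc, Stream.generate should report false, while Stream.iterate, which is documented as ordered, should report true.

```java
import java.util.Spliterator;
import java.util.stream.Stream;

class OrderedCheck {
    // Returns true if the stream's spliterator reports the ORDERED characteristic.
    // Note: spliterator() is a terminal operation, so the stream is consumed by this check.
    static boolean isOrdered(Stream<?> stream) {
        return stream.spliterator().hasCharacteristics(Spliterator.ORDERED);
    }

    public static void main(String[] args) {
        System.out.println("generate ordered: " + isOrdered(Stream.generate(() -> 1)));
        System.out.println("iterate ordered:  " + isOrdered(Stream.iterate(1, i -> i + 1)));
    }
}
```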


From your example code, I conclude that pages do not contain their own number nor a "getNext" method, so we have to maintain the number and the "hasNext" state for creating a stream.

Assuming an example setup like

class Page {
    private String data;
    private boolean hasNext;

    public Page(String data, boolean hasNext) {
        this.data = data;
        this.hasNext = hasNext;
    }

    public String getData() {
        return data;
    }

    public boolean hasNext() {
        return hasNext;
    }
}

private static String[] SAMPLE_PAGES = { "foo", "bar", "baz" };
public static Page getPage(int index) {
    Objects.checkIndex(index, SAMPLE_PAGES.length);
    return new Page(SAMPLE_PAGES[index], index + 1 < SAMPLE_PAGES.length);
}

You can implement a correct stream like

Stream.iterate(Map.entry(0, getPage(0)), Objects::nonNull,
        e -> e.getValue().hasNext()? Map.entry(e.getKey()+1, getPage(e.getKey()+1)): null)
    .map(Map.Entry::getValue)
    .forEach(page -> System.out.println(page.getData()));

Note that Stream.iterate creates an ordered stream:

Returns a sequential ordered Stream produced by iterative application of the given next function to an initial element, conditioned on satisfying the given hasNext predicate.
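Put together as a runnable program, the approach might look like the sketch below. The class name PagedStreamDemo, the fetchCount counter, and the pages() and readAll() helpers are additions for illustration only; Page, getPage and SAMPLE_PAGES follow the example setup above. The counter stands in for the expensive call, so it can be confirmed that the last page is delivered without a further fetch.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PagedStreamDemo {
    static final String[] SAMPLE_PAGES = { "foo", "bar", "baz" };

    // Illustration only: counts how often the "expensive" getPage call runs.
    static int fetchCount = 0;

    static final class Page {
        private final String data;
        private final boolean hasNext;

        Page(String data, boolean hasNext) {
            this.data = data;
            this.hasNext = hasNext;
        }

        String getData() { return data; }
        boolean hasNext() { return hasNext; }
    }

    static Page getPage(int index) {
        Objects.checkIndex(index, SAMPLE_PAGES.length);
        fetchCount++;
        return new Page(SAMPLE_PAGES[index], index + 1 < SAMPLE_PAGES.length);
    }

    // Streams every page, including the last one. The next page is fetched
    // only if the current page says there is one; otherwise null ends the stream.
    static Stream<Page> pages() {
        return Stream.iterate(Map.entry(0, getPage(0)), Objects::nonNull,
                e -> e.getValue().hasNext()
                        ? Map.entry(e.getKey() + 1, getPage(e.getKey() + 1))
                        : null)
            .map(Map.Entry::getValue);
    }

    static List<String> readAll() {
        return pages().map(Page::getData).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(readAll() + " fetched with " + fetchCount + " calls");
    }
}
```

With the three sample pages, this prints [foo, bar, baz] fetched with 3 calls. One thing to keep in mind is that the seed is evaluated eagerly, so getPage(0) runs as soon as pages() is called, before any terminal operation.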

Of course, things would be much easier if the page knew its own number, e.g.

Stream.iterate(getPage(0), Objects::nonNull,
               p -> p.hasNext()? getPage(p.getPageNumber()+1): null)
    .forEach(page -> System.out.println(page.getData()));

or if there was a method to get from an existing Page to the next Page, e.g.

Stream.iterate(getPage(0), Objects::nonNull, p -> p.hasNext()? p.getNextPage(): null)
    .forEach(page -> System.out.println(page.getData()));
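For completeness, here is a sketch of what the first of these two variants could look like end to end. It assumes a hypothetical Page type that stores its own number and exposes it through getPageNumber(); that method, the class name NumberedPageDemo and the readAll() helper do not come from the question and are only there to make the example self-contained.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NumberedPageDemo {
    static final String[] SAMPLE_PAGES = { "foo", "bar", "baz" };

    // Hypothetical page type that knows its own number.
    static final class Page {
        private final int pageNumber;
        private final String data;
        private final boolean hasNext;

        Page(int pageNumber, String data, boolean hasNext) {
            this.pageNumber = pageNumber;
            this.data = data;
            this.hasNext = hasNext;
        }

        int getPageNumber() { return pageNumber; }
        String getData() { return data; }
        boolean hasNext() { return hasNext; }
    }

    static Page getPage(int index) {
        Objects.checkIndex(index, SAMPLE_PAGES.length);
        return new Page(index, SAMPLE_PAGES[index], index + 1 < SAMPLE_PAGES.length);
    }

    // No Map.entry bookkeeping is needed, because each page carries its index.
    static List<String> readAll() {
        return Stream.iterate(getPage(0), Objects::nonNull,
                p -> p.hasNext() ? getPage(p.getPageNumber() + 1) : null)
            .map(Page::getData)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        readAll().forEach(System.out::println);
    }
}
```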
