Java 8 New Features: Stream

Stream

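Streams are a new addition to the Java API that let you process collections of data declaratively. You can think of a stream as a high-level iterator over a data set; more precisely, it is a sequence of elements that supports sequential and parallel aggregate operations. Streams can also be processed in parallel transparently.

Streams and a simple example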

public class Dish {

    private final String name;
    private final boolean vegetarian;
    private final int calories;
    private final Type type;

    public Dish(String name, boolean vegetarian, int calories, Type type) {
        super();
        this.name = name;
        this.vegetarian = vegetarian;
        this.calories = calories;
        this.type = type;
    }

    public String getName() {
        return name;
    }

    public boolean isVegetarian() {
        return vegetarian;
    }

    public int getCalories() {
        return calories;
    }

    public Type getType() {
        return type;
    }

    public enum Type {
        MEAT, FISH, OTHER;
    }
}

public static List<Dish> menu = Arrays.asList(
            new Dish("pork", false, 800, Dish.Type.MEAT),
            new Dish("beaf", false, 700, Dish.Type.MEAT), 
            new Dish("chicken", false, 400, Dish.Type.MEAT),
            new Dish("french fries", true, 530, Dish.Type.OTHER), 
            new Dish("rice", true, 350, Dish.Type.OTHER),
            new Dish("season fruit", true, 120, Dish.Type.OTHER), 
            new Dish("pizza", true, 550, Dish.Type.OTHER),
            new Dish("prawns", false, 300, Dish.Type.FISH), 
            new Dish("salmon", false, 450, Dish.Type.FISH));

Goal: get the names of three dishes with more than 300 calories

public class Client {

    public static List<Dish> menu = Arrays.asList(
            new Dish("pork", false, 800, Dish.Type.MEAT),
            new Dish("beaf", false, 700, Dish.Type.MEAT), 
            new Dish("chicken", false, 400, Dish.Type.MEAT),
            new Dish("french fries", true, 530, Dish.Type.OTHER), 
            new Dish("rice", true, 350, Dish.Type.OTHER),
            new Dish("season fruit", true, 120, Dish.Type.OTHER), 
            new Dish("pizza", true, 550, Dish.Type.OTHER),
            new Dish("prawns", false, 300, Dish.Type.FISH), 
            new Dish("salmon", false, 450, Dish.Type.FISH));

    public static List<String> filterDishName() {
        List<String> dishNames = 
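                menu.stream()                               // obtain a stream
                .filter(dish -> dish.getCalories() > 300)   // keep dishes above 300 calories
                .map(Dish::getName)                         // extract the dish names
                .limit(3)                                   // take the first three
                .collect(Collectors.toList());              // terminal operation: collect into a list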
        return dishNames;
    }

    public static void main(String[] args){
        filterDishName().forEach(System.out::println);
    }

}

// Console output
pork
beaf
chicken

With streams, a task that used to require a sizable block of code can be expressed in a few lines.
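
For comparison, here is a minimal sketch of the same task written both with an explicit loop and with a stream. The reduced `Dish` class (name and calories only), the shortened menu, and the class name `LoopVersusStream` are assumptions made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class LoopVersusStream {

    // Minimal stand-in for the article's Dish class (name and calories only).
    static class Dish {
        final String name;
        final int calories;

        Dish(String name, int calories) {
            this.name = name;
            this.calories = calories;
        }
    }

    static final List<Dish> MENU = Arrays.asList(
            new Dish("pork", 800), new Dish("rice", 350),
            new Dish("season fruit", 120), new Dish("pizza", 550),
            new Dish("salmon", 450));

    // Pre-Java 8 style: explicit iteration, a mutable result list, a manual cut-off.
    static List<String> withLoop() {
        List<String> names = new ArrayList<>();
        for (Dish dish : MENU) {
            if (dish.calories > 300) {
                names.add(dish.name);
                if (names.size() == 3) {
                    break;
                }
            }
        }
        return names;
    }

    // Stream style: the same steps expressed declaratively.
    static List<String> withStream() {
        return MENU.stream()
                .filter(dish -> dish.calories > 300)
                .map(dish -> dish.name)
                .limit(3)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(withLoop());   // [pork, rice, pizza]
        System.out.println(withStream()); // [pork, rice, pizza]
    }
}
```

Both methods return the same list; the stream version states what to compute and leaves the iteration to the library.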

Building streams

Creating a stream from values

Stream<String> stream = Stream.of("hello","world");

Creating a stream from an array

String[] s = {"hello","world"};
Arrays.stream(s);

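Generating a stream from a file

// Count the distinct words in a file
// try-with-resources closes the underlying file once the stream has been consumed
try (Stream<String> lines = Files.lines(Paths.get("data.txt"))) {
    long uniqueWords = lines
            .flatMap(line -> Arrays.stream(line.split(" ")))
            .distinct()
            .count();
} catch (IOException e) {
    e.printStackTrace();
}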

Generating infinite streams from functions

Stream.iterate(0, i -> i+2)
            .limit(10);
Stream.generate(Math::random)
            .limit(10);
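
Neither of the two pipelines above does any work until a terminal operation is attached. The following sketch collects the results so they can be inspected; the class and method names are illustrative assumptions.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class InfiniteStreams {

    // iterate applies the function to the previous value: 0, 2, 4, ...
    // limit(n) is what makes the otherwise infinite stream finite.
    static List<Integer> firstEvens(int n) {
        return Stream.iterate(0, i -> i + 2)
                .limit(n)
                .collect(Collectors.toList());
    }

    // generate calls the supplier once per element; values are independent of each other.
    static List<Double> randoms(int n) {
        return Stream.generate(Math::random)
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstEvens(5));     // [0, 2, 4, 6, 8]
        System.out.println(randoms(3).size()); // 3
    }
}
```

Without `limit`, a terminal operation such as `collect` would never finish on these streams.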

Using streams

Filtering

menu.stream()
            .filter(dish -> dish.getCalories() > 300)   // keep elements that match the predicate
            .distinct()                                 // keep only distinct elements
            .skip(2L)                                   // skip the first two
            .limit(3L);                                 // keep at most three
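
To make the effect of each step visible, here is a small sketch that runs the same four operations on a list of integers. The input data and the names `SlicingExample` and `slice` are assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SlicingExample {

    static List<Integer> slice(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> n % 2 == 0)   // keep even numbers
                .distinct()                // drop duplicates
                .skip(1)                   // skip the first remaining element
                .limit(2)                  // keep at most two elements
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // evens: 2, 2, 4, 4, 6, 8 -> distinct: 2, 4, 6, 8 -> skip 1: 4, 6, 8 -> limit 2: 4, 6
        System.out.println(slice(Arrays.asList(1, 2, 2, 3, 4, 4, 5, 6, 7, 8))); // [4, 6]
    }
}
```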

Mapping

Mapping a stream

<R> Stream<R> map(Function<? super T, ? extends R> mapper);

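The parameter is a Function; each element is mapped to a value of another type, and a stream of the mapped values is returned.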

menu.stream()
            .map(Dish::getName)
            .limit(3)
            .collect(Collectors.toList());

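Flattening streams

Flattening takes the multiple streams produced by a mapping step and merges them into a single stream for further processing.

Example: extract all distinct letters from an array of strings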

public static void main(String[] args){
    String[] arrays = {"hello", "world"};
    List<String> list = Arrays.stream(arrays)
            .map(word -> word.split(""))        // turn each word into an array of letters
            .flatMap(Arrays::stream)            // merge the per-word arrays into a single stream of letters
            .distinct()                         // keep unique letters only
            .collect(Collectors.toList());
    list.forEach(e ->  System.out.print(e));
}

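The stream transformation process: Stream<String> (words) → map → Stream<String[]> (one letter array per word) → flatMap → Stream<String> (all letters) → distinct → Stream<String> (unique letters).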

Finding

Any match

boolean anyMatch(Predicate<? super T> predicate);

All match

boolean allMatch(Predicate<? super T> predicate);

None match

boolean noneMatch(Predicate<? super T> predicate);

Find any element
```
Optional<T> findAny();
```
Find the first element
```
Optional<T> findFirst();
```
The difference from findAny: if the stream runs in parallel and you do not care about encounter order, use findAny; if order matters, use findFirst.
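
The matching and finding operations can be tried out with a short sketch like the one below. The list of calorie values and the helper method names are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class MatchAndFind {

    // Illustrative calorie values.
    static final List<Integer> CALORIES = Arrays.asList(800, 700, 400, 530, 350, 120);

    static boolean anyBelow(int limit) {
        return CALORIES.stream().anyMatch(c -> c < limit);
    }

    static boolean allBelow(int limit) {
        return CALORIES.stream().allMatch(c -> c < limit);
    }

    static boolean noneAbove(int limit) {
        return CALORIES.stream().noneMatch(c -> c > limit);
    }

    // findFirst respects encounter order, so the result is deterministic for a list source.
    static Optional<Integer> firstBelow(int limit) {
        return CALORIES.stream().filter(c -> c < limit).findFirst();
    }

    public static void main(String[] args) {
        System.out.println(anyBelow(200));   // true  (120 qualifies)
        System.out.println(allBelow(500));   // false (800 does not qualify)
        System.out.println(noneAbove(1000)); // true
        System.out.println(firstBelow(500)); // Optional[400]
        System.out.println(firstBelow(100)); // Optional.empty
    }
}
```

Both find operations return an `Optional`, because the stream may contain no matching element.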

Reduction

The lambda repeatedly combines the elements until the stream is reduced to a single value.

T reduce(T identity, BinaryOperator<T> accumulator);

Example: join the names of the dishes with more than 300 calories into a single string

public static void main(String[] args){
        String dishNames = menu.stream()
            .filter(dish -> dish.getCalories() > 300)
            .map(Dish::getName)
            .reduce("", (a,b) -> {
                if("".equals(a)){
                    return b;
                } else {
                    return a+","+b;
                }
            });
        System.out.println(dishNames);
    }
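
Reduction is also commonly used for numeric aggregation. The sketch below, with hypothetical class and method names, sums a list with the two-argument `reduce` and finds the maximum with the one-argument overload, which returns an `Optional` because there is no identity value to fall back on.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

class ReduceExamples {

    // Starts from the identity 0 and adds each element in turn.
    static int sum(List<Integer> values) {
        return values.stream().reduce(0, Integer::sum);
    }

    // No identity is given, so an empty stream yields Optional.empty().
    static Optional<Integer> max(List<Integer> values) {
        return values.stream().reduce(Integer::max);
    }

    public static void main(String[] args) {
        List<Integer> calories = Arrays.asList(800, 700, 400);
        System.out.println(sum(calories));                             // 1900
        System.out.println(max(calories));                             // Optional[800]
        System.out.println(max(Collections.<Integer>emptyList()));     // Optional.empty
    }
}
```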

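Numeric streams

When a stream processes numeric data, each value has to be boxed into an Integer and then unboxed to an int for computation, which wastes performance. For this reason there are three primitive specializations of the stream interface: IntStream, DoubleStream, and LongStream.

Mapping to a numeric stream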

menu.stream()
.mapToInt(Dish::getCalories);

Converting back to an object stream

menu.stream()
.mapToInt(Dish::getCalories)
.boxed();

Numeric ranges
```
IntStream.rangeClosed(1, 100);
```
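
The three numeric-stream operations above can be combined as in the following sketch. The class and method names are assumptions; the sketch also contrasts `rangeClosed`, which includes the upper bound, with `range`, which excludes it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class NumericStreams {

    // mapToInt yields an IntStream, which offers sum() without any boxing.
    static int totalLength(List<String> words) {
        return words.stream().mapToInt(String::length).sum();
    }

    // boxed() converts the IntStream back into a Stream<Integer>.
    static List<Integer> boxedRange(int from, int to) {
        return IntStream.rangeClosed(from, to).boxed().collect(Collectors.toList());
    }

    static long countEvens(IntStream numbers) {
        return numbers.filter(n -> n % 2 == 0).count();
    }

    public static void main(String[] args) {
        System.out.println(totalLength(Arrays.asList("pork", "rice", "pizza"))); // 13
        System.out.println(boxedRange(1, 5));                                    // [1, 2, 3, 4, 5]
        System.out.println(countEvens(IntStream.rangeClosed(1, 100)));           // 50
        System.out.println(countEvens(IntStream.range(1, 100)));                 // 49
    }
}
```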

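Collecting data with streams

The previous sections covered how to operate on a stream. Those intermediate operations do not consume the stream; their purpose is to build a pipeline. The terminal operation at the end consumes the stream and produces a result.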

<R, A> R collect(Collector<? super T, A, R> collector);

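Setting this method aside for a moment, let's first look at some of the commonly used summarizing collectors that the JDK provides:

Reducing and summarizing

Numeric summarization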

Collectors.summingInt(ToIntFunction<? super T> mapper)

int sumCalories = 
                menu.stream()
                .filter(dish -> dish.getCalories() > 300)
                .limit(3)
                .collect(Collectors.summingInt(Dish::getCalories));

Joining strings

Collector<CharSequence, ?, String> joining(CharSequence delimiter,
                                                             CharSequence prefix,
                                                             CharSequence suffix)

String dishNames = 
                menu.stream()
                .filter(dish -> dish.getCalories() > 300)
                .map(Dish::getName)
                .limit(3)
                .collect(Collectors.joining(","));
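
As a sketch of two further options in this family, the example below uses the three-argument `joining` (delimiter, prefix, suffix) and `Collectors.summarizingInt`, which gathers count, sum, min, max, and average in one pass. The sample names and the class `SummaryCollectors` are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

class SummaryCollectors {

    static final List<String> NAMES = Arrays.asList("pork", "rice", "pizza");

    // Delimiter between elements, plus a prefix and suffix around the whole result.
    static String bracketed() {
        return NAMES.stream().collect(Collectors.joining(", ", "[", "]"));
    }

    // One pass over the stream produces several statistics at once.
    static IntSummaryStatistics lengthStats() {
        return NAMES.stream().collect(Collectors.summarizingInt(String::length));
    }

    public static void main(String[] args) {
        System.out.println(bracketed());              // [pork, rice, pizza]
        IntSummaryStatistics stats = lengthStats();
        System.out.println(stats.getCount());         // 3
        System.out.println(stats.getSum());           // 13
        System.out.println(stats.getMin());           // 4
        System.out.println(stats.getMax());           // 5
    }
}
```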

Grouping

The idea of grouping is straightforward: it resembles using GROUP BY in a database to group the stream's elements by a property.

Map<Type, List<Dish>> maps = 
            menu.stream()
            .collect(Collectors.groupingBy(Dish::getType));

This groups the dishes by type and yields a Map.

Custom grouping:

Map<String, List<Dish>> maps = 
                    menu.stream().collect(Collectors.groupingBy(dish -> {
                        if( dish.getCalories() >500) 
                            return "big";
                        else 
                            return "ok";
            }));

Multi-level grouping:

Map<Type, Map<String,List<Dish>>> maps = 
                    menu.stream().collect(Collectors.groupingBy(Dish::getType,
                            Collectors.groupingBy(dish -> {
                                if( dish.getCalories() >500) 
                                    return "big";
                                else 
                                    return "ok";
                            })

                            ));
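
The second argument of `groupingBy` does not have to be another `groupingBy`; any collector can serve as the downstream. The following sketch groups plain strings by length and then either counts or joins each group. The sample data and names are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingDownstream {

    static final List<String> NAMES =
            Arrays.asList("pork", "beef", "rice", "pizza", "salmon", "prawns");

    // Downstream collector counting(): how many names fall into each group.
    static Map<Integer, Long> countByLength() {
        return NAMES.stream()
                .collect(Collectors.groupingBy(String::length, Collectors.counting()));
    }

    // Downstream collector joining(): one string per group instead of a list.
    static Map<Integer, String> joinedByLength() {
        return NAMES.stream()
                .collect(Collectors.groupingBy(String::length, Collectors.joining("/")));
    }

    public static void main(String[] args) {
        System.out.println(countByLength().get(4));  // 3
        System.out.println(joinedByLength().get(4)); // pork/beef/rice
        System.out.println(joinedByLength().get(6)); // salmon/prawns
    }
}
```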

Partitioning

Partitioning differs from grouping in that the keys are always true and false, as decided by a predicate.

Map<Boolean, List<Dish>> maps = 
                    menu.stream().collect(Collectors.partitioningBy(Dish::isVegetarian));
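
A small sketch of partitioning on integers follows; the class and method names are assumptions. It also shows that the resulting map contains both keys even when one side, or the whole input, is empty.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class PartitionExample {

    // Splits 1..upTo into even (true) and odd (false) numbers.
    static Map<Boolean, List<Integer>> evenOdd(int upTo) {
        return IntStream.rangeClosed(1, upTo)
                .boxed()
                .collect(Collectors.partitioningBy(n -> n % 2 == 0));
    }

    // partitioningBy also accepts a downstream collector, here counting().
    static Map<Boolean, Long> countEvenOdd(int upTo) {
        return IntStream.rangeClosed(1, upTo)
                .boxed()
                .collect(Collectors.partitioningBy(n -> n % 2 == 0, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(evenOdd(6).get(true));      // [2, 4, 6]
        System.out.println(evenOdd(6).get(false));     // [1, 3, 5]
        System.out.println(evenOdd(0).get(true));      // []
        System.out.println(countEvenOdd(7).get(false)); // 4
    }
}
```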

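Custom collectors

Although the JDK provides implementations of the commonly used collectors, you still need to write your own when you need custom behavior. Let's look at how to implement the Collector interface.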

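/**
* T is the type of the items in the stream being collected
* A is the type of the accumulator
* R is the type of the object produced by the collect operation
*/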
public interface Collector<T, A, R> {

    Supplier<A> supplier();

    BiConsumer<A, T> accumulator();

    BinaryOperator<A> combiner();

    Function<A, R> finisher();

    Set<Characteristics> characteristics();
}

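Next, we implement a simple collector that gathers the stream's elements into a list. Once it is done, you can compare it with Collectors.toList() to see how the two implementations differ.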

public class ToListCollector<T> implements Collector<T, List<T>, List<T>> {

    // Creates an empty accumulator instance that is used while collecting the data.
    // This collector gathers elements into a list, so it creates a list.
    @Override
    public Supplier<List<T>> supplier() {
        // return () -> new ArrayList<>();
        return ArrayList::new;
    }

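    // Performs the reduction step: when the n-th element is visited, the arguments are
    // the accumulator that already holds the first n-1 elements, and the n-th element.
    // For this collector, adding the element to the list is all that is needed.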
    @Override
    public BiConsumer<List<T>, T> accumulator() {
        // return (list, t) -> list.add(t);
        return List::add;
    }

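    // Defines how two partial results are merged when the stream is processed in parallel.
    // Here it is enough to add one list to the other.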
    @Override
    public BinaryOperator<List<T>> combiner() {
        return (list1, list2) -> {
            list1.addAll(list2);
            return list1;
        };
    }

    // The conversion applied after all elements have been reduced.
    // This collector already produces a list, so no conversion is needed.
    @Override
    public Function<List<T>, List<T>> finisher() {
        // return Function.identity();
        return (list) -> list;
    }

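    // Describes the collector's behavior, i.e. which optimizations the reduction may apply
    // UNORDERED: the result does not depend on the order in which items are traversed and accumulated
    // CONCURRENT: the accumulator may be called from several threads on the same container;
    //             this requires a thread-safe container, so it is not declared here (ArrayList is not thread-safe)
    // IDENTITY_FINISH: the finisher is the identity function, so the final conversion can be skipped
    @Override
    public Set<java.util.stream.Collector.Characteristics> characteristics() {
        return Collections.unmodifiableSet(
                EnumSet.of(Collector.Characteristics.IDENTITY_FINISH));
    }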

}

Test example:

public static void main(String[] args){
        List<Dish> dishes = 
        menu.stream()
            .filter(dish -> dish.getCalories() > 300)
            .collect(new ToListCollector<>());
        dishes.forEach(System.out::println);
    }

The output is as expected, so the implementation works.
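
For simple cases, a full class is not strictly necessary. As a sketch of an alternative, the static factory `Collector.of` builds a collector from the same functions; the names `CollectorOfExample`, `toListCollector`, and `collectRange` are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.IntStream;

class CollectorOfExample {

    // Supplier, accumulator, combiner, and characteristics passed directly to Collector.of.
    // With IDENTITY_FINISH and no explicit finisher, the accumulator list is the result.
    static <T> Collector<T, List<T>, List<T>> toListCollector() {
        return Collector.<T, List<T>>of(
                ArrayList::new,
                List::add,
                (left, right) -> {
                    left.addAll(right);
                    return left;
                },
                Collector.Characteristics.IDENTITY_FINISH);
    }

    // Collects 1..upTo, optionally in parallel, to exercise the combiner as well.
    static List<Integer> collectRange(int upTo, boolean parallel) {
        IntStream range = IntStream.rangeClosed(1, upTo);
        if (parallel) {
            range = range.parallel();
        }
        return range.boxed().collect(toListCollector());
    }

    public static void main(String[] args) {
        System.out.println(collectRange(5, false));         // [1, 2, 3, 4, 5]
        System.out.println(collectRange(1000, true).size()); // 1000
    }
}
```

Because the collector is neither CONCURRENT nor UNORDERED, a parallel run gives each thread its own list and merges the lists in encounter order.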

Parallel streams

A simple conversion

    // Sequential sum of 1..n
    public static long one(long n){
        return LongStream.rangeClosed(1L, n)
                .reduce(0L, Long::sum);
    }

    // Same computation; parallel() switches the pipeline to parallel execution
    public static long parallel(long n){
        return LongStream.rangeClosed(1L, n)
                .parallel()
                .reduce(0L, Long::sum);
    }

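The two methods compare computing the sum with a parallel stream against computing it with a sequential stream.

In practice, however, you need to consider whether the stream's source is suited to being split for parallel computation; only then can a parallel stream deliver its full benefit.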
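
The point about the source can be illustrated with a rough sketch. `LongStream.rangeClosed` knows its bounds and works on primitives, so it splits easily; `Stream.iterate` produces each boxed element from the previous one and is much harder to split. The class name, method names, and the naive `System.nanoTime` timing are assumptions; this is not a rigorous benchmark, and actual timings depend on the machine and JVM warm-up.

```java
import java.util.stream.LongStream;
import java.util.stream.Stream;

class ParallelSumSketch {

    static long rangeSum(long n) {
        return LongStream.rangeClosed(1L, n).sum();
    }

    // Easy to split: the range knows its bounds and works on primitive longs.
    static long parallelRangeSum(long n) {
        return LongStream.rangeClosed(1L, n).parallel().sum();
    }

    // Hard to split: each boxed element depends on the previous one.
    static long parallelIterateSum(long n) {
        return Stream.iterate(1L, i -> i + 1)
                .limit(n)
                .parallel()
                .reduce(0L, Long::sum);
    }

    // Naive wall-clock timing in milliseconds; for real measurements use a benchmark harness.
    static long millis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        System.out.println("sequential range: " + millis(() -> rangeSum(n)) + " ms");
        System.out.println("parallel range:   " + millis(() -> parallelRangeSum(n)) + " ms");
        System.out.println("parallel iterate: " + millis(() -> parallelIterateSum(n)) + " ms");
    }
}
```

All three methods return the same sum; only the cost of getting there differs.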