Common Android Data Structures, Read from the Source (Part 2: List)

Overview

The official doc comment on the List interface:

A {@code List} is a collection which maintains an ordering for its elements. Every
element in the {@code List} has an index. Each element can thus be accessed by its
index, with the first index being zero. Normally, {@code List}s allow duplicate
elements, as compared to Sets, where elements have to be unique.

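In short: a List is a collection that keeps its elements in order. Every element has an index, counted from 0, and can be accessed through it.
Lists normally allow duplicate elements, whereas Sets require every element to be unique.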

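The List implementations used most often in day-to-day work are:
ArrayList, LinkedList (which also implements Queue), Vector, CopyOnWriteArrayList, Stack.

What follows is a rough comparison of how they differ; the details are in the per-class analysis below.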

ArrayList

"What, ArrayList needs analyzing?"
That is what I thought too, until I read the source and found a few interesting spots. Here are some excerpts:

public class ArrayList<E> extends AbstractList<E> implements Cloneable, Serializable, RandomAccess {
    private static final int MIN_CAPACITY_INCREMENT = 12;
    transient Object[] array;

    @Override public boolean add(E object) {
        Object[] a = array;
        int s = size;
        if (s == a.length) {
            Object[] newArray = new Object[s +
                    (s < (MIN_CAPACITY_INCREMENT / 2) ?
                     MIN_CAPACITY_INCREMENT : s >> 1)];
            System.arraycopy(a, 0, newArray, 0, s);
            array = a = newArray;
        }
        a[s] = object;
        size = s + 1;
        modCount++;
        return true;
    }

    @Override public E remove(int index) {
        Object[] a = array;
        int s = size;
        if (index >= s) {
            throwIndexOutOfBoundsException(index, s);
        }
        @SuppressWarnings("unchecked") E result = (E) a[index];
        System.arraycopy(a, index + 1, a, index, --s - index);
        a[s] = null;  // Prevent memory leak
        size = s;
        modCount++;
        return result;
    }

    private void writeObject(ObjectOutputStream stream) throws IOException {
        stream.defaultWriteObject();
        stream.writeInt(array.length);
        for (int i = 0; i < size; i++) {
            stream.writeObject(array[i]);
        }
    }

    private void readObject(ObjectInputStream stream) throws IOException, ClassNotFoundException {
        stream.defaultReadObject();
        int cap = stream.readInt();
        if (cap < size) {
            throw new InvalidObjectException(
                    "Capacity: " + cap + " < size: " + size);
        }
        array = (cap == 0 ? EmptyArray.OBJECT : new Object[cap]);
        for (int i = 0; i < size; i++) {
            array[i] = stream.readObject();
        }
    }   

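    private class ArrayListIterator implements Iterator<E> {
        private int remaining = size;
        // Index of the element that remove() would remove, or -1 if there is none
        private int removalIndex = -1;
        private int expectedModCount = modCount;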

        public boolean hasNext() {
            return remaining != 0;
        }

        @SuppressWarnings("unchecked") public E next() {
            ArrayList<E> ourList = ArrayList.this;
            int rem = remaining;
            if (ourList.modCount != expectedModCount) {
                throw new ConcurrentModificationException();
            }
            if (rem == 0) {
                throw new NoSuchElementException();
            }
            remaining = rem - 1;
            return (E) ourList.array[removalIndex = ourList.size - rem];
        }

        public void remove() {
            Object[] a = array;
            int removalIdx = removalIndex;
            if (modCount != expectedModCount) {
                throw new ConcurrentModificationException();
            }
            if (removalIdx < 0) {
                throw new IllegalStateException();
            }
            System.arraycopy(a, removalIdx + 1, a, removalIdx, remaining);
            a[--size] = null;  // Prevent memory leak
            removalIndex = -1;
            expectedModCount = ++modCount;
        }
    }
}

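A few things in the source are worth noting:

1. ArrayList stores its data in an Object[], so get is a plain array read and very fast.
2. add grows the array only when it is full: if the current size is below 6 the capacity grows by 12, otherwise it grows by half the current size (s >> 1). Every growth step runs System.arraycopy over all existing elements, which is relatively expensive, so pass an initial capacity to the constructor whenever you can estimate the size, and keep the number of growth steps down.
3. Every remove(int) runs System.arraycopy to close the gap, so if you add and remove elements frequently, look for another structure.
4. The Object[] field is marked transient, which excludes it from default serialization. That looks odd for the field that holds the actual data, until you notice writeObject/readObject further down: when a Serializable class declares these two methods, the serialization machinery calls them and they decide what is written. Why go to the trouble? Because of growth: after the array grows it usually has many empty slots, and serializing it as-is would write all of them out. The custom methods write only the size elements actually stored. Note also that writeObject/readObject are private; they still take effect because the serialization machinery finds and calls them via reflection. (Serializable as a whole is built on reflection, so for your own classes on Android it is usually better to use Parcelable.)
5. The ArrayList iterator tracks a remaining count to implement next, which is fairly efficient, but it still does more work per element than a bare array[index]. So looping over an ArrayList with for(int i=0; i<list.size(); i++) is somewhat faster than for(Object o : list).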
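To make the growth rule concrete, here is a small sketch that replays the capacity arithmetic from add() and counts how many grow-and-copy steps it takes to append n elements. It is an illustration, not code from the Android source: the class GrowthSketch and the methods nextCapacity and countGrowths are made-up names, and it assumes an empty list starts with capacity 0.

```java
public class GrowthSketch {
    private static final int MIN_CAPACITY_INCREMENT = 12;

    /** Same expression as in add(): the new capacity for a full array of size s. */
    static int nextCapacity(int s) {
        return s + (s < (MIN_CAPACITY_INCREMENT / 2) ? MIN_CAPACITY_INCREMENT : s >> 1);
    }

    /** Counts the grow-and-copy steps needed to append n elements (hypothetical helper). */
    static int countGrowths(int initialCapacity, int n) {
        int capacity = initialCapacity;
        int growths = 0;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {          // array is full, as in add()
                capacity = nextCapacity(size);
                growths++;
            }
        }
        return growths;
    }

    public static void main(String[] args) {
        System.out.println(nextCapacity(5));          // 17: below 6, grow by 12
        System.out.println(nextCapacity(6));          // 9: otherwise grow by half
        System.out.println(countGrowths(0, 1000));    // 12 copies when starting empty
        System.out.println(countGrowths(1000, 1000)); // 0 copies when pre-sized
    }
}
```

Under these assumptions, appending 1000 elements to an empty list triggers 12 array copies, while new ArrayList<>(1000) triggers none.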

LinkedList

LinkedList is an implementation of {@link List}, backed by a doubly-linked list.
All optional operations including adding, removing, and replacing elements are supported.
<p>All elements are permitted, including null.
<p>This class is primarily useful if you need queue-like behavior. It may also be useful
as a list if you expect your lists to contain zero or one element, but still require the
ability to scale to slightly larger numbers of elements. In general, though, you should
probably use {@link ArrayList} if you don't need the queue-like behavior.

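In short: LinkedList is a List implementation backed by a doubly linked list. It supports every optional operation, including adding, removing, and replacing elements, and it accepts any element, null included.
It is mainly useful when you need queue-like behavior. It can also be useful when a list is expected to hold zero or one element but must still be able to grow to a slightly larger size.
In general, though, if you don't need queue-like behavior you should use ArrayList.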

As a linked list, LinkedList is built around the Link inner class:

    private static final class Link<ET> {
        ET data;

        Link<ET> previous, next;

        Link(ET o, Link<ET> p, Link<ET> n) {
            data = o;
            previous = p;
            next = n;
        }
    }

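Each Link is one node of the list. It holds the element's data plus references to the elements before and after it (the list can be walked in both directions, hence "doubly linked").
In linked storage every node is allocated on its own and does not have to sit next to its neighbors in memory the way elements of an array-backed list do. That avoids the array copying that makes insertion and removal slow in an array-backed list.
So how is the list actually implemented? Here is the source (only construction and the basic add/remove/get/set operations are excerpted):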

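    /**
     * Size of the list
     */
    transient int size = 0;
    /**
     * Sentinel node; every operation starts from it
     */
    transient Link<E> voidLink;

    /**
     * Constructor: creates voidLink and points its previous/next at itself,
     * which forms an empty circular doubly linked list
     */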
    public LinkedList() {
        voidLink = new Link<E>(null, null, null);
        voidLink.previous = voidLink;
        voidLink.next = voidLink;
    }

    /**
     * Append: no growth step is needed, only two previous/next references change, so it is cheap
     */
    @Override
    public boolean add(E object) {
        Link<E> oldLast = voidLink.previous;
        Link<E> newLink = new Link<E>(object, oldLast, voidLink);
        voidLink.previous = newLink;
        oldLast.next = newLink;
        size++;
        modCount++;
        return true;
    }

    @Override
    public E remove(int location) {
        if (location >= 0 && location < size) {
            Link<E> link = voidLink;
            // Walk from whichever end is closer, following next or previous
            if (location < (size / 2)) {
                for (int i = 0; i <= location; i++) {
                    link = link.next;
                }
            } else {
                for (int i = size; i > location; i--) {
                    link = link.previous;
                }
            }
            Link<E> previous = link.previous;
            Link<E> next = link.next;
            previous.next = next;
            next.previous = previous;
            size--;
            modCount++;
            return link.data;
        }
        throw new IndexOutOfBoundsException();
    }  

    @Override
    public E get(int location) {
        if (location >= 0 && location < size) {
            Link<E> link = voidLink;
            // Walk from whichever end is closer, following next or previous
            if (location < (size / 2)) {
                for (int i = 0; i <= location; i++) {
                    link = link.next;
                }
            } else {
                for (int i = size; i > location; i--) {
                    link = link.previous;
                }
            }
            return link.data;
        }
        throw new IndexOutOfBoundsException();
    }

    @Override
    public E set(int location, E object) {
        if (location >= 0 && location < size) {
            Link<E> link = voidLink;
            // Walk from whichever end is closer, following next or previous
            if (location < (size / 2)) {
                for (int i = 0; i <= location; i++) {
                    link = link.next;
                }
            } else {
                for (int i = size; i > location; i--) {
                    link = link.previous;
                }
            }
            E result = link.data;
            link.data = object;
            return result;
        }
        throw new IndexOutOfBoundsException();
    }  

    private static final class LinkIterator<ET> implements ListIterator<ET> {
        ...

        public boolean hasNext() {
            return link.next != list.voidLink;
        }

        public ET next() {
            if (expectedModCount == list.modCount) {
                LinkedList.Link<ET> next = link.next;
                if (next != list.voidLink) {
                    lastLink = link = next;
                    pos++;
                    return link.data;
                }
                throw new NoSuchElementException();
            }
            throw new ConcurrentModificationException();
        }
    }

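As the code shows, a lookup first has to loop to the requested position, starting from whichever end is closer. This is a linear walk, not a binary search, so reads by index are slow compared with ArrayList.
Adding and removing in a LinkedList avoids array copying, which pays off when the list is large. Keep in mind that remove(int) still has to walk to the position first; only the unlinking itself is cheap.
Because it is a linked list, iterating with for(Object o : list) is much more efficient than for(int i=0; i<list.size(); i++).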
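To see why an index-based loop hurts on a linked list, here is a stripped-down sketch of the same sentinel-based structure with a hop counter bolted on. Treat it as an illustration under assumptions: MiniLinkedList, the hops field, and hopsForIndexedLoop are made-up names, and only add and get are reproduced.

```java
public class MiniLinkedList<E> {
    private static final class Link<T> {
        T data;
        Link<T> previous, next;

        Link(T data, Link<T> previous, Link<T> next) {
            this.data = data;
            this.previous = previous;
            this.next = next;
        }
    }

    private final Link<E> voidLink;   // sentinel, as in the excerpt above
    private int size;
    int hops;                         // hypothetical counter: node-to-node steps taken by get()

    public MiniLinkedList() {
        voidLink = new Link<>(null, null, null);
        voidLink.previous = voidLink;
        voidLink.next = voidLink;
    }

    public void add(E element) {
        Link<E> oldLast = voidLink.previous;
        Link<E> newLink = new Link<>(element, oldLast, voidLink);
        voidLink.previous = newLink;
        oldLast.next = newLink;
        size++;
    }

    public E get(int location) {
        if (location < 0 || location >= size) {
            throw new IndexOutOfBoundsException();
        }
        Link<E> link = voidLink;
        if (location < size / 2) {            // closer to the head: walk forward
            for (int i = 0; i <= location; i++) {
                link = link.next;
                hops++;
            }
        } else {                              // closer to the tail: walk backward
            for (int i = size; i > location; i--) {
                link = link.previous;
                hops++;
            }
        }
        return link.data;
    }

    /** Reads every element by index and returns the total number of hops. */
    static int hopsForIndexedLoop(int n) {
        MiniLinkedList<Integer> list = new MiniLinkedList<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        for (int i = 0; i < n; i++) {
            list.get(i);
        }
        return list.hops;
    }

    public static void main(String[] args) {
        System.out.println(hopsForIndexedLoop(1000)); // 250500 hops for 1000 elements
    }
}
```

An iterator moves one node per element, so it would need about 1000 hops for the same list; the indexed loop needs 250500, and the gap grows quadratically with the size.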

Vector

Vector is an implementation of {@link List}, backed by an array and synchronized.
All optional operations including adding, removing, and replacing elements are supported.
<p>All elements are permitted, including null.
<p>This class is equivalent to {@link ArrayList} with synchronized operations. This has a 
performance cost, and the synchronization is not necessarily meaningful to your application:
synchronizing each call to {@code get}, for example, is not equivalent to synchronizing on the
list and iterating over it (which is probably what you intended). If you do need very highly 
concurrent access, you should also consider {@link java.util.concurrent.CopyOnWriteArrayList}.

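In short: Vector is a List implementation that is backed by an array and synchronized. It supports every optional operation, including adding, removing, and replacing elements, and it accepts any element, null included.
The class is equivalent to an ArrayList with synchronized operations. The synchronization has a performance cost and is not necessarily meaningful: synchronizing each individual get call
(for{synchronized{get}}) is not the same as synchronizing on the list and then iterating over it (synchronized{for{get}}, which is probably what you wanted).
If you really need highly concurrent access, you should also consider CopyOnWriteArrayList.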
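The difference between locking each get and locking the whole loop can be sketched as follows. This is a minimal illustration rather than code from the source; the class VectorIterationSketch and its two methods are made-up names. It relies on the fact that java.util.Vector's synchronized methods lock the Vector instance itself.

```java
import java.util.Vector;

public class VectorIterationSketch {
    /**
     * for { synchronized { get } }: every size() and get() call locks on its own,
     * so another thread can remove elements between two calls and get(i) may
     * throw ArrayIndexOutOfBoundsException.
     */
    static int sumPerCall(Vector<Integer> numbers) {
        int sum = 0;
        for (int i = 0; i < numbers.size(); i++) {
            sum += numbers.get(i);
        }
        return sum;
    }

    /**
     * synchronized { for { get } }: the vector's monitor is held for the whole
     * loop, so no other thread can modify the vector until the loop is done.
     */
    static int sumAtomically(Vector<Integer> numbers) {
        synchronized (numbers) {
            int sum = 0;
            for (int i = 0; i < numbers.size(); i++) {
                sum += numbers.get(i);
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        Vector<Integer> numbers = new Vector<>();
        numbers.add(1);
        numbers.add(2);
        numbers.add(3);
        System.out.println(sumPerCall(numbers));     // 6 (single-threaded, so both agree)
        System.out.println(sumAtomically(numbers));  // 6
    }
}
```

In a single thread the two methods behave the same; the difference only shows up when another thread modifies the vector while the loop is running.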

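Vector differs from ArrayList in the following ways:
1. Besides capacity, Vector's constructor takes a capacityIncrement parameter: public Vector(int capacity, int capacityIncrement). capacityIncrement controls how many slots are added on each growth step. If capacityIncrement <= 0 the capacity doubles, whereas ArrayList grows by half in most cases.
2. Vector's element array is declared without transient (protected Object[] elementData), so it goes through default serialization.
3. All of Vector's methods, including add, remove, update, and lookup, are synchronized, which is a coarse and relatively slow way to get thread safety.
4. Vector uses the SimpleListIterator from AbstractList and does not provide an iterator of its own.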

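From points 2 and 3: Vector pays for synchronization on every call, so when you don't need thread safety it is slower than ArrayList.
Even when you do need thread safety, Collections.synchronizedList() and CopyOnWriteArrayList are usually a better fit than Vector, as explained later.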
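The effect of capacityIncrement from point 1 is easy to observe through Vector.capacity(). The sketch below is only an illustration; VectorGrowthSketch and capacityAfterOverflow are made-up names.

```java
import java.util.Vector;

public class VectorGrowthSketch {
    /** Adds one element more than the current capacity and reports the new capacity. */
    static int capacityAfterOverflow(Vector<Integer> vector) {
        int initial = vector.capacity();
        for (int i = 0; i <= initial; i++) {
            vector.add(i);
        }
        return vector.capacity();
    }

    public static void main(String[] args) {
        // capacityIncrement = 2: the capacity grows by 2 slots per growth step
        System.out.println(capacityAfterOverflow(new Vector<>(4, 2))); // 6
        // no capacityIncrement (treated as 0): the capacity doubles
        System.out.println(capacityAfterOverflow(new Vector<>(4)));    // 8
    }
}
```

A small fixed increment keeps memory tight but causes many copies on a growing list; doubling wastes more slots but copies far less often.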

CopyOnWriteArrayList

 A thread-safe random-access list.
 In short: a list that is thread-safe and supports random access.

 <p>Read operations (including {@link #get}) do not block and may overlap with
 update operations. Reads reflect the results of the most recently completed
 operations. Aggregate operations like {@link #addAll} and {@link #clear} are
 atomic; they never expose an intermediate state.
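 Reads, get included, never block and may run at the same time as updates; a read sees the result of the most recently completed operation.
 Bulk operations such as addAll and clear are atomic and never expose an intermediate state. (Note: this is because the write methods are synchronized and swap in the new array with a single assignment.)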

 <p>Iterators of this list never throw {@link ConcurrentModificationException}. 
 When an iterator is created, it keeps a copy of the list's contents. It is 
 always safe to iterate this list, but iterations may not reflect the latest 
 state of the list.
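 Iterators of this list never throw ConcurrentModificationException. An iterator keeps a copy of the contents as they were when it was created.
 Iterating this list is therefore always safe, but the result may not reflect the latest state.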

 <p>Iterators returned by this list and its sub lists cannot modify the
 underlying list. In particular, {@link Iterator#remove}, {@link
 ListIterator#add} and {@link ListIterator#set} all throw {@link
 UnsupportedOperationException}.
 Iterators of this list and of its sub lists cannot modify the underlying list. In particular, the iterator's remove/add/set operations
 all throw UnsupportedOperationException.

 <p>This class offers extended API beyond the {@link List} interface. It
 includes additional overloads for indexed search ({@link #indexOf} and {@link
 #lastIndexOf}) and methods for conditional adds ({@link #addIfAbsent} and
 {@link #addAllAbsent}).
 Beyond the List interface, the class adds extra API: additional overloads of the index search methods indexOf/lastIndexOf,
 and the conditional add methods addIfAbsent/addAllAbsent.
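A quick sketch of the extended API mentioned last, using java.util.concurrent.CopyOnWriteArrayList. The class name AddIfAbsentSketch, the method buildTags, and the sample strings are made up for the example.

```java
import java.util.Arrays;
import java.util.concurrent.CopyOnWriteArrayList;

public class AddIfAbsentSketch {
    /** Builds a small list using only the conditional add methods. */
    static CopyOnWriteArrayList<String> buildTags() {
        CopyOnWriteArrayList<String> tags = new CopyOnWriteArrayList<>();
        tags.addIfAbsent("android");   // added, returns true
        tags.addIfAbsent("android");   // already present, returns false
        // Adds only the elements that are missing and returns how many were added
        tags.addAllAbsent(Arrays.asList("android", "java", "kotlin"));
        return tags;
    }

    public static void main(String[] args) {
        CopyOnWriteArrayList<String> tags = buildTags();
        System.out.println(tags);                           // [android, java, kotlin]
        // Extra overloads: search forward from an index, or backward from an index
        System.out.println(tags.indexOf("java", 2));        // -1: not found at index 2 or later
        System.out.println(tags.lastIndexOf("android", 2)); // 0
    }
}
```

Because the check and the add happen inside one call, addIfAbsent avoids the race that a separate contains() followed by add() would have.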

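Now let's go through the source and look at its characteristics.

Like ArrayList, the element array is marked transient (private transient volatile Object[] elements), which reduces the overhead of serialization.
Reads take no lock and writes do, which keeps the list thread-safe while limiting the performance cost: the writers are declared public synchronized E remove(int index) and public synchronized boolean add(E e), while the readers are plain public E get(int index) and public boolean contains(Object o).

The iterator is read-only; it does not allow writes: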

static class CowIterator<E> implements ListIterator<E> {
    public void add(E object) {
        throw new UnsupportedOperationException();
    }
    @SuppressWarnings("unchecked")
    public E next() {
        if (index < to) {
            return (E) snapshot[index++];
        } else {
            throw new NoSuchElementException();
        }
    }
    public void set(E object) {
        throw new UnsupportedOperationException();
    }
}
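The snapshot behavior of this iterator can be observed from the outside. The following sketch uses java.util.concurrent.CopyOnWriteArrayList; the class SnapshotIteratorSketch and its method names are made up for the example.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.CopyOnWriteArrayList;

public class SnapshotIteratorSketch {
    /**
     * Appends to the list while iterating over it and returns how many elements
     * the loop visited. On an ArrayList the same loop would fail with
     * ConcurrentModificationException.
     */
    static int iterateWhileAdding(CopyOnWriteArrayList<Integer> list) {
        int seen = 0;
        for (Integer value : list) {      // iterates over the snapshot taken here
            list.add(value + 100);        // goes into a new array, invisible to this loop
            seen++;
        }
        return seen;
    }

    /** Returns true if the iterator refuses to remove, as the excerpt above suggests. */
    static boolean iteratorRejectsRemove(CopyOnWriteArrayList<Integer> list) {
        Iterator<Integer> iterator = list.iterator();
        iterator.next();
        try {
            iterator.remove();
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        CopyOnWriteArrayList<Integer> list = new CopyOnWriteArrayList<>(Arrays.asList(1, 2, 3));
        System.out.println(iterateWhileAdding(list));    // 3: only the snapshot was visited
        System.out.println(list.size());                 // 6: the adds did reach the list
        System.out.println(iteratorRejectsRemove(list)); // true
    }
}
```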

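Reads are very fast, while writes perform a clone or a series of arraycopy calls and are therefore slow (similar to growth in ArrayList, except that here it happens on every write):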

public E get(int index) {
    return (E) elements[index];
}
public synchronized E set(int index, E e) {
    Object[] newElements = elements.clone();
    @SuppressWarnings("unchecked")
    E result = (E) newElements[index];
    newElements[index] = e;
    elements = newElements;
    return result;
}
/**
* Copy-on-write: every add allocates a new array of exactly the new size
*/
public synchronized boolean add(E e) {
    Object[] newElements = new Object[elements.length + 1];
    System.arraycopy(elements, 0, newElements, 0, elements.length);
    newElements[elements.length] = e;
    elements = newElements;
    return true;
}
public synchronized void add(int index, E e) {
    Object[] newElements = new Object[elements.length + 1];
    System.arraycopy(elements, 0, newElements, 0, index);
    newElements[index] = e;
    System.arraycopy(elements, index, newElements, index + 1, elements.length - index);
    elements = newElements;
}
private void removeRange(int from, int to) {
    Object[] newElements = new Object[elements.length - (to - from)];
    System.arraycopy(elements, 0, newElements, 0, from);
    System.arraycopy(elements, to, newElements, from, elements.length - to);
    elements = newElements;
}

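CopyOnWriteArrayList is a textbook example of the COW (copy-on-write) mechanism. Reading is cheap: a statement like Object[] snapshot = elements; only copies a reference, whether one thread or many are reading. A write never touches the array that readers may be holding. It copies the array, applies the change to the copy, and only then points elements at the new array. The benefit is that reads and writes on a copy-on-write container can run concurrently without locking the reads, because an array that a reader has obtained is never modified afterwards. In that sense a copy-on-write container is a form of read/write separation: readers and writers work on different arrays. Making the new array visible to other threads relies on the volatile modifier on elements, and therefore on the JMM (Java Memory Model); for background on the JMM, see my other blog post on volatile, instruction reordering, and synchronized.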

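As shown above, CopyOnWriteArrayList achieves thread safety at little cost to reads, which makes it a very good fit for multi-threaded scenarios with frequent reads and only occasional writes.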

One more thing: Collections.synchronizedList(), mentioned above. This method wraps any List so that its basic common methods run under a lock. Here is the source:

     SynchronizedList(List<E> l) {
         super(l);
         list = l;
     }
     SynchronizedList(List<E> l, Object mutex) {
         super(l, mutex);
         list = l;
     }
     @Override public void add(int location, E object) {
          synchronized (mutex) {
              list.add(location, object);
          }
      }
      ...
      @Override public E get(int location) {
          synchronized (mutex) {
              return list.get(location);
          }
      }

Compared with CopyOnWriteArrayList, Collections.synchronizedList(ArrayList) performs better on writes, while CopyOnWriteArrayList wins on reads; Vector has no real advantage over either. Reference: https://blog.csdn.net/zljjava/article/details/48139465
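As with Vector, the wrapper only makes each single call atomic. A compound action such as "add if missing" still needs an explicit lock on the wrapper, which is the mutex used by the single-argument form of Collections.synchronizedList. The sketch below shows one way to do it; SynchronizedListSketch, addIfMissing, and runDemo are made-up names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SynchronizedListSketch {
    /** Check-then-add as one atomic step: both calls run under the wrapper's lock. */
    static boolean addIfMissing(List<String> syncList, String value) {
        synchronized (syncList) {
            if (syncList.contains(value)) {
                return false;
            }
            return syncList.add(value);
        }
    }

    /** Two threads try to add the same 1000 values; returns the final list size. */
    static int runDemo() {
        List<String> shared = Collections.synchronizedList(new ArrayList<>());
        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                addIfMissing(shared, "item-" + i);
            }
        };
        Thread first = new Thread(task);
        Thread second = new Thread(task);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // 1000: no value was added twice
    }
}
```

Without the synchronized block, both threads could pass the contains() check for the same value before either one adds it, and the list could end up with duplicates.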

Stack

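A stack is a collection that offers push/pop operations under the LIFO (last in, first out) rule.
There are two Stack classes in the source: public class Stack<E> extends Vector<E> and public class Stack<T> extends ArrayList<T>. They are much the same, except that the methods of the Vector-based Stack are synchronized. Its three core methods are excerpted below: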

    /**
     * Returns the element on top of the stack without removing it
     */
    @SuppressWarnings("unchecked")
    public synchronized E peek() {
        try {
            return (E) elementData[elementCount - 1];
        } catch (IndexOutOfBoundsException e) {
            throw new EmptyStackException();
        }
    }

    /**
     * Pops the element on top of the stack; it is no longer kept in the collection
     */
    @SuppressWarnings("unchecked")
    public synchronized E pop() {
        if (elementCount == 0) {
            throw new EmptyStackException();
        }
        final int index = --elementCount;
        final E obj = (E) elementData[index];
        elementData[index] = null;
        modCount++;
        return obj;
    }

    /**
     * Pushes an element onto the top of the stack
     */
    public E push(E object) {
        addElement(object);
        return object;
    }
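As a usage example of these three methods, here is a small bracket checker built on java.util.Stack. It is only a sketch to show push/pop/peek in action; StackSketch and isBalanced are made-up names.

```java
import java.util.EmptyStackException;
import java.util.Stack;

public class StackSketch {
    /** Returns true if every ( and [ in the text is closed in the right order. */
    static boolean isBalanced(String text) {
        Stack<Character> open = new Stack<>();
        for (char c : text.toCharArray()) {
            if (c == '(' || c == '[') {
                open.push(c);                       // remember the opening bracket
            } else if (c == ')' || c == ']') {
                if (open.isEmpty()) {
                    return false;                   // nothing left to close
                }
                char top = open.pop();              // LIFO: most recent opener first
                if ((c == ')' && top != '(') || (c == ']' && top != '[')) {
                    return false;
                }
            }
        }
        return open.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("([a] b)")); // true
        System.out.println(isBalanced("(]"));      // false
        try {
            new Stack<String>().peek();            // peek on an empty stack
        } catch (EmptyStackException e) {
            System.out.println("empty stack");
        }
    }
}
```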