03 A Look at the String Class Through the JDK Source (1)

Overview

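Java uses the String class to represent character strings. The value of a String object is in fact a constant: once the object has been created, its value cannot be changed. Precisely because it is immutable, a String is thread-safe and can be shared freely among multiple threads.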

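Most readers already use String every day, so here we look at how the JDK actually implements it. Most listings below come from JDK 8, where the characters are stored in a char array. Where a listing comes from JDK 9 or later, which stores them in a byte array plus an encoding flag, this is pointed out.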

Inheritance Structure

Class Definition

public final class String implements java.io.Serializable, Comparable<String>, CharSequence

String is declared final, so it cannot be subclassed. It implements three interfaces: Serializable, Comparable<String> and CharSequence. Serializable means String objects can be serialized. Comparable<String> gives strings a natural ordering through compareTo, which compares two strings lexicographically, character by character. CharSequence is the read-only abstraction of a sequence of characters (length, charAt, subSequence, toString) that String shares with classes such as StringBuilder and StringBuffer.
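As a quick illustration of what the Comparable and CharSequence interfaces buy in practice, the sketch below sorts strings by their natural order and passes both a String and a StringBuilder to a method that only asks for a CharSequence. The class StringInterfacesDemo and its helpers countUpperCase and sortedCopy are hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; not part of the JDK.
final class StringInterfacesDemo {

    // Accepts any CharSequence, so String and StringBuilder both work.
    static int countUpperCase(CharSequence cs) {
        int count = 0;
        for (int i = 0; i < cs.length(); i++) {
            if (Character.isUpperCase(cs.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    // Relies on String implementing Comparable<String> (lexicographic natural order).
    static List<String> sortedCopy(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(countUpperCase("Hello World"));            // 2
        System.out.println(countUpperCase(new StringBuilder("JDK"))); // 3
        // Upper-case letters have smaller char values, so "Banana" sorts first.
        System.out.println(sortedCopy(Arrays.asList("pear", "apple", "Banana"))); // [Banana, apple, pear]
    }
}
```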

Main Fields

/** The value is used for character storage. */
private final char value[];

/** Cache the hash code for the string */
private int hash; // Default to 0

/** use serialVersionUID from JDK 1.0.2 for interoperability */
private static final long serialVersionUID = -6849794470754667710L;

/**
 * Class String is special cased within the Serialization Stream Protocol.
 *
 * A String instance is written into an ObjectOutputStream according to
 * <a href="{@docRoot}/../platform/serialization/spec/output.html">
 * Object Serialization Specification, Section 6.2, "Stream Elements"</a>
 */
private static final ObjectStreamField[] serialPersistentFields =
    new ObjectStreamField[0];

public static final Comparator<String> CASE_INSENSITIVE_ORDER
        = new CaseInsensitiveComparator();

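The main fields are:

value stores the characters of the string (a char array in this JDK 8 listing).
hash caches the string's hash code. It defaults to 0, meaning "not computed yet", and is filled in the first time hashCode() is called.
CASE_INSENSITIVE_ORDER is a comparator that orders strings while ignoring case.
serialVersionUID and serialPersistentFields support serialization.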
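To make the role of the hash field concrete, here is a minimal sketch of lazy hash caching. It uses the formula documented for String.hashCode(), s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic, and treats 0 as "not computed yet". The class CachedHashString is hypothetical and only imitates the idea; it is not the JDK's code.

```java
import java.util.Arrays;

// Hypothetical class that imitates how String caches its hash code.
final class CachedHashString {
    private final char[] value;
    private int hash; // 0 means "not computed yet", like String.hash

    CachedHashString(String s) {
        this.value = s.toCharArray();
    }

    // Formula documented for String.hashCode():
    // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], with int overflow wrapping around
    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            for (char c : value) {
                h = 31 * h + c;
            }
            hash = h; // cache the result so later calls skip the loop
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CachedHashString
                && Arrays.equals(value, ((CachedHashString) o).value);
    }

    public static void main(String[] args) {
        CachedHashString s = new CachedHashString("abc");
        System.out.println(s.hashCode());     // 96354
        System.out.println("abc".hashCode()); // 96354
    }
}
```

For "abc" the arithmetic is 97*31*31 + 98*31 + 99 = 96354. Because the string is immutable, computing the value once and reusing it is safe.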

Inner Class

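The inner class CaseInsensitiveComparator provides the comparator used for case-insensitive ordering. It implements the Comparator interface and its compare method, and its readResolve method replaces the deserialized object with the CASE_INSENSITIVE_ORDER instance. The listing below is from JDK 9 or later, where value is a byte array and each string records its encoding (coder) as either Latin1 or UTF16. The core of compare is to check whether the two strings use the same encoding. If they do, it calls the Latin1 or the UTF16 case-insensitive comparison. If they do not, a Latin1 s1 is compared against a UTF16 s2 with StringLatin1.compareToCI_UTF16, and a UTF16 s1 against a Latin1 s2 with StringUTF16.compareToCI_Latin1.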

private static class CaseInsensitiveComparator
        implements Comparator<String>, java.io.Serializable {
    private static final long serialVersionUID = 8575799808933029326L;

    public int compare(String s1, String s2) {
        byte v1[] = s1.value;
        byte v2[] = s2.value;
        if (s1.coder() == s2.coder()) {
            return s1.isLatin1() ? StringLatin1.compareToCI(v1, v2)
                                 : StringUTF16.compareToCI(v1, v2);
        }
        return s1.isLatin1() ? StringLatin1.compareToCI_UTF16(v1, v2)
                             : StringUTF16.compareToCI_Latin1(v1, v2);
    }
    private Object readResolve() { return CASE_INSENSITIVE_ORDER; }
}

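Looking at the JDK 8 source confirms that the class likewise implements compare and readResolve, but there compare works directly on chars. It takes the smaller of the two lengths and compares the characters of the two strings position by position from the start. When two characters differ, both are converted to upper case and compared again; if they still differ, both are converted to lower case and compared once more, and if they are still different, the difference c1 - c2 is returned. If every position within the shorter length matches, the result is the difference of the two lengths, n1 - n2.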

private static class CaseInsensitiveComparator
        implements Comparator<String>, java.io.Serializable {
    // use serialVersionUID from JDK 1.2.2 for interoperability
    private static final long serialVersionUID = 8575799808933029326L;

    public int compare(String s1, String s2) {
        int n1 = s1.length();
        int n2 = s2.length();
        int min = Math.min(n1, n2);
        for (int i = 0; i < min; i++) {
            char c1 = s1.charAt(i);
            char c2 = s2.charAt(i);
            if (c1 != c2) {
                c1 = Character.toUpperCase(c1);
                c2 = Character.toUpperCase(c2);
                if (c1 != c2) {
                    c1 = Character.toLowerCase(c1);
                    c2 = Character.toLowerCase(c2);
                    if (c1 != c2) {
                        // No overflow because of numeric promotion
                        return c1 - c2;
                    }
                }
            }
        }
        return n1 - n2;
    }

    /** Replaces the de-serialized object. */
    private Object readResolve() { return CASE_INSENSITIVE_ORDER; }
}
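The comparator is reachable from outside the class as String.CASE_INSENSITIVE_ORDER. The sketch below, using a hypothetical helper sortIgnoringCase, shows the two outcomes of the JDK 8 logic just described: a character difference that disappears after case conversion, and a length difference when the shorter string is a case-insensitive prefix of the longer one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of the JDK.
final class CaseInsensitiveDemo {

    // Sorts a copy of the list ignoring case; List.sort is stable,
    // so strings that compare equal keep their original relative order.
    static List<String> sortIgnoringCase(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        // Every position matches after case conversion, lengths equal: 0
        System.out.println(String.CASE_INSENSITIVE_ORDER.compare("Hello", "hELLO"));
        // Common prefix matches, so the result is n1 - n2 = 3 - 4
        System.out.println(String.CASE_INSENSITIVE_ORDER.compare("abc", "ABCD"));
        System.out.println(sortIgnoringCase(Arrays.asList("b", "A", "a", "B"))); // [A, a, b, B]
    }
}
```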

Constructors

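There are many constructors, so we look at the main ones. The no-argument constructor creates an empty string. Depending on the JDK version, it either reuses the value array of the empty literal "" (JDK 8 writes this.value = "".value;) or, as in the source below, assigns a newly created zero-length char array to value.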

    public String() {
        this.value = new char[0];
    }

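The constructor that takes another String object simply assigns value and hash from the original. Because strings are immutable, the new object can share the original's character array instead of copying it: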

public String(String original) {
    this.value = original.value;
    this.hash = original.hash;
}
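Whether the value array is shared cannot be observed through the public API, so the sketch below (hypothetical class ConstructorDemo) only shows the visible consequences of the two constructors above: new String() is equal to "", and new String(original) is a distinct object with the same characters and hash code.

```java
// Hypothetical demo class; not part of the JDK.
final class ConstructorDemo {
    public static void main(String[] args) {
        String empty = new String();                 // no-argument constructor
        System.out.println(empty.isEmpty());         // true
        System.out.println(empty.equals(""));        // true

        String literal = "hello";
        String copy = new String(literal);           // String(String original)
        System.out.println(copy == literal);         // false: a separate String object
        System.out.println(copy.equals(literal));    // true: same characters
        System.out.println(copy.hashCode() == literal.hashCode()); // true
        System.out.println(copy.intern() == literal); // true: intern() returns the pooled literal
    }
}
```

Since strings are immutable, new String(someString) is rarely needed in everyday code; it mostly matters when object identity (==) is being examined.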

The constructors that take a char array look like this:

public String(char value[]) {
    this.value = Arrays.copyOf(value, value.length);
}

public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count < 0) {
        throw new StringIndexOutOfBoundsException(count);
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.value = Arrays.copyOfRange(value, offset, offset+count);
}

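As the source shows, when a char array is passed in, value is created with the Arrays.copyOf utility method, so the string gets its own copy of the array. Of course, three arguments can also be passed: the array, a starting offset and a count. This version first performs some basic argument checks: a negative offset or count, or a range running past the end of the array, raises StringIndexOutOfBoundsException. The range check is written as offset > value.length - count rather than offset + count > value.length so that it cannot overflow when offset or count is close to Integer.MAX_VALUE (-1>>>1). It then creates value with Arrays.copyOfRange.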
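Because both constructors copy the array, changing the array afterwards does not affect strings already built from it. The sketch below, with the hypothetical class CharArrayConstructorDemo and helpers build and rejects, shows this defensive copy and the range check.

```java
// Hypothetical demo class; not part of the JDK.
final class CharArrayConstructorDemo {

    // Builds strings from a char array, then mutates the array afterwards.
    static String[] build() {
        char[] chars = {'j', 'a', 'v', 'a'};
        String whole = new String(chars);      // String(char[]): copies the whole array
        String part = new String(chars, 1, 2); // String(char[], offset, count): copies chars[1] and chars[2]
        chars[0] = 'J';                        // mutate the source array after construction
        return new String[] {whole, part, new String(chars)};
    }

    // Returns true if the constructor rejects the (offset, count) range.
    static boolean rejects(char[] chars, int offset, int count) {
        try {
            new String(chars, offset, count);
            return false;
        } catch (IndexOutOfBoundsException e) { // StringIndexOutOfBoundsException is a subclass
            return true;
        }
    }

    public static void main(String[] args) {
        String[] r = build();
        System.out.println(r[0]); // java  (not affected by the later mutation)
        System.out.println(r[1]); // av
        System.out.println(r[2]); // Java
        System.out.println(rejects(new char[4], 3, 2));  // true: 3 > 4 - 2
        System.out.println(rejects(new char[4], 2, 2));  // false: the last two chars
        System.out.println(rejects(new char[4], 0, -1)); // true: negative count
    }
}
```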

The constructor that takes a byte array looks like this:

public String(byte bytes[], int offset, int length, Charset charset) {
    if (charset == null)
        throw new NullPointerException("charset");
    checkBounds(bytes, offset, length);
    this.value =  StringCoding.decode(charset, bytes, offset, length);
}

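It takes a byte array, a starting offset, a length and a character set (Charset). After rejecting a null charset and checking the bounds, it decodes the selected bytes with StringCoding.decode and assigns the result to value.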
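The sketch below (hypothetical class ByteArrayConstructorDemo with helper decodeRange) calls this constructor with StandardCharsets.UTF_8 to decode only part of a byte array. It also shows that the number of bytes and the number of chars can differ: two Chinese characters take 6 bytes in UTF-8 but only 2 chars in the resulting String.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the JDK.
final class ByteArrayConstructorDemo {

    // Decodes only bytes[offset .. offset + length - 1] as UTF-8.
    static String decodeRange(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = "xxHello".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeRange(bytes, 2, 5)); // Hello

        // Two Chinese characters: 3 bytes each in UTF-8
        byte[] utf8 = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length);                                // 6
        System.out.println(decodeRange(utf8, 0, utf8.length).length()); // 2
    }
}
```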

Main Methods

length method

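The length of a string is the number of characters, not the length of a byte array. In this JDK 8 version value is a char array, so length() simply returns value.length, the number of UTF-16 code units, which can be larger than the number of Unicode code points. (In JDK 9 and later, where value is a byte array, length() returns value.length >> coder() so that UTF16 strings are not counted twice.) The source is: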

    public int length() {
        return value.length;
    }
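A small check of that distinction, using a hypothetical helper counts that pairs length() with the standard codePointCount method: a character outside the Basic Multilingual Plane, such as the emoji U+1F600, is stored as a surrogate pair and therefore counts as 2 chars but 1 code point.

```java
// Hypothetical demo class; not part of the JDK.
final class LengthDemo {

    // Returns {number of chars (UTF-16 code units), number of Unicode code points}.
    static int[] counts(String s) {
        return new int[] {s.length(), s.codePointCount(0, s.length())};
    }

    public static void main(String[] args) {
        // ASCII text, two Chinese characters, and U+1F600 written as a surrogate pair
        String[] samples = {"hello", "\u4e2d\u6587", "\uD83D\uDE00"};
        for (String s : samples) {
            int[] c = counts(s);
            System.out.println(c[0] + " chars, " + c[1] + " code points");
        }
        // Prints: 5 chars, 5 code points / 2 chars, 2 code points / 2 chars, 1 code points
    }
}
```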

isEmpty method

A string is considered empty when the length of its value array (a char array here) is 0.

    public boolean isEmpty() {
        return value.length == 0;
    }

charAt method

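Returns the character at the given index. A negative index, or one that is not less than the length, raises StringIndexOutOfBoundsException.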

public char charAt(int index) {
    if ((index < 0) || (index >= value.length)) {
        throw new StringIndexOutOfBoundsException(index);
    }
    return value[index];
}
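The valid indexes are therefore 0 through length() - 1. The sketch below, with hypothetical helpers count and isRejected, walks every valid index and probes the boundary cases. It catches IndexOutOfBoundsException, the superclass of the StringIndexOutOfBoundsException thrown in the source above.

```java
// Hypothetical demo class; not part of the JDK.
final class CharAtDemo {

    // Counts occurrences of ch by visiting every valid index 0..length()-1.
    static int count(String s, char ch) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ch) {
                n++;
            }
        }
        return n;
    }

    // True if charAt rejects the index.
    static boolean isRejected(String s, int index) {
        try {
            s.charAt(index);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(count("banana", 'a'));  // 3
        System.out.println(isRejected("abc", 2));  // false: last valid index
        System.out.println(isRejected("abc", 3));  // true: index == length()
        System.out.println(isRejected("abc", -1)); // true: negative index
    }
}
```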

codePointAt method

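Returns the Unicode code point at the given index. If the char at that index is a high surrogate and the next char is a low surrogate, the two are combined into one supplementary code point; otherwise the char itself is returned. The JDK 8 version below delegates to Character.codePointAtImpl, while JDK 9 and later handle Latin1 and UTF16 strings separately according to the encoding.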

public int codePointAt(int index) {
    if ((index < 0) || (index >= value.length)) {
        throw new StringIndexOutOfBoundsException(index);
    }
    return Character.codePointAtImpl(value, index, value.length);
}
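The sketch below shows how codePointAt behaves around a surrogate pair, and how it is typically combined with the standard Character.charCount to iterate a string by code points rather than by chars. The class CodePointDemo and its helper codePoints are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of the JDK.
final class CodePointDemo {

    // Walks a string by code points: advance 2 chars after a supplementary code point, else 1.
    static List<Integer> codePoints(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            result.add(cp);
            i += Character.charCount(cp);
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "A\uD83D\uDE00"; // 'A' followed by U+1F600 stored as a surrogate pair
        System.out.println(s.codePointAt(0));                      // 65
        System.out.println(Integer.toHexString(s.codePointAt(1))); // 1f600: high and low surrogate combined
        System.out.println(Integer.toHexString(s.codePointAt(2))); // de00: a lone low surrogate is returned as is
        System.out.println(Integer.toHexString(s.charAt(1)));      // d83d: charAt sees a single UTF-16 unit
        System.out.println(codePoints(s));                         // [65, 128512]
    }
}
```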