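String Source Code Study Notes

For Java programmers, the String class could hardly be more familiar, but do you really understand it? Which constructors are commonly used? What does the intern() method do? How is string concatenation with "+" implemented? We answer these questions one by one by reading the source.

This chapter uses the String source from Java 1.8.

The following properties of String make it immutable:

1. `String` is a `final` class and cannot be subclassed.

2. The char array field `value` is declared `final`, so it is effectively a constant. A big benefit of constants is thread safety, so `String` needs no extra handling for thread safety. Note that `final` fixes only the reference; the guarantee also relies on String never modifying the array's contents (see point 3).

Of course, reflection can still modify the contents of value. If the string lives in the constant pool, you will then find that the pooled string itself has been changed to a different value.

3. The `value` char array must be exclusively owned: no other code (apart from the String class itself and reflection) can operate on the `value` array.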
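A minimal sketch of what immutability looks like from the caller's side: a method that seems to modify a string returns a new object and leaves the original untouched. The class and method names (ImmutableStringDemo, appendCopy) are illustrative only and are not part of the JDK.

```java
class ImmutableStringDemo {
    // Returns a new String with "def" appended; the argument is never modified.
    static String appendCopy(String s) {
        return s.concat("def");
    }

    public static void main(String[] args) {
        String original = "abc";
        String appended = appendCopy(original);
        System.out.println(original); // abc
        System.out.println(appended); // abcdef
    }
}
```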

String constructors

Default constructor String()

This constructor is rarely used; at any rate, I have never seen it used.

It is simple, and the source says it all:

public String() {
    this.value = new char[0];
}

It is just a char array of length 0, in other words the empty string "". Running the following example makes this clear:

 String empty = "";
 String str = new String();
 System.out.println(empty == (str));//false
 System.out.println(empty == (str).intern());//true; intern() is covered in a later section
 System.out.println(empty.equals(str));//true

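String constructor String(String original)

This one is also used infrequently.

This constructor is essentially a clone, although the new object shares the original's value array rather than copying it. Since String is immutable, such a copy is rarely necessary.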

public String(String original) {
    this.value = original.value;
    this.hash = original.hash;
}

The source is self-explanatory, so no further explanation is needed.
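As a small illustration of the point above, the sketch below shows that the copy is a distinct object with equal contents. CopyConstructorDemo and copyOf are hypothetical names chosen for this example.

```java
class CopyConstructorDemo {
    // Creates a new String object with the same characters as the argument.
    static String copyOf(String original) {
        return new String(original);
    }

    public static void main(String[] args) {
        String a = "hello";
        String b = copyOf(a);
        System.out.println(a == b);      // false: two distinct objects
        System.out.println(a.equals(b)); // true: same characters
    }
}
```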

Char array constructor String(char value[])

The char array constructor has three overloads:

First:

   public String(char value[]) {
        this.value = Arrays.copyOf(value, value.length);
    }

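It takes a char array and assigns a copy of that array to value. The copy is there to guarantee String's immutability. Without it, the caller could still change the elements of the array: in the code below, the first element of string's value field would become '1', which would break String's immutability. The third char array constructor is implemented exactly that way, without a copy.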

char[] value = {'a','b','c'};
String string = new String(value);
value[0] = '1';
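The snippet above does not print anything, so here is a runnable sketch of the same idea. Because the public constructor copies the array, the string keeps its original contents after the array is modified. DefensiveCopyDemo and buildThenMutate are hypothetical names.

```java
class DefensiveCopyDemo {
    // Builds a String from the array, then mutates the array afterwards.
    static String buildThenMutate(char[] chars) {
        String s = new String(chars);
        chars[0] = '1'; // changes the caller's array, not the String's copy
        return s;
    }

    public static void main(String[] args) {
        char[] value = {'a', 'b', 'c'};
        String string = buildThenMutate(value);
        System.out.println(string);            // abc
        System.out.println(new String(value)); // 1bc
    }
}
```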

Second:

public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count < 0) {
        throw new StringIndexOutOfBoundsException(count);
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.value = Arrays.copyOfRange(value, offset, offset+count);
}

It takes a portion of the passed-in char array. Compared with the first overload, it adds bounds checks on the offset and count arguments.
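A short sketch of how the offset and count arguments behave, including the bounds check. SubArrayConstructorDemo and slice are hypothetical names; the catch clause uses IndexOutOfBoundsException, the documented supertype of the exception thrown in the source above.

```java
class SubArrayConstructorDemo {
    // Builds a String from count chars starting at offset.
    static String slice(char[] chars, int offset, int count) {
        return new String(chars, offset, count);
    }

    public static void main(String[] args) {
        char[] chars = {'a', 'b', 'c', 'd', 'e'};
        System.out.println(slice(chars, 1, 3)); // bcd
        try {
            slice(chars, 3, 5); // 3 + 5 exceeds the array length of 5
        } catch (IndexOutOfBoundsException e) {
            System.out.println("rejected: range is out of bounds");
        }
    }
}
```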

Third:

/*
    * Package private constructor which shares value array for speed.
    * this constructor is always expected to be called with share==true.
    * a separate constructor is needed because we already have a public
    * String(char[]) constructor that makes a copy of the given char[].
    */
String(char[] value, boolean share) {
    // assert share : "unshared not supported";
    this.value = value;
}

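As you can see, it assigns the passed-in char array directly to the value field. The share parameter exists only to distinguish this overload from the first constructor and is never used in the body. Skipping the copy saves memory and also runs faster.

This constructor has no access modifier, so it is accessible only from within the same package. It is effectively a constructor for JDK-internal use, for example in java.lang.Integer#toHexString and java.lang.Long#toUnsignedString(long, int).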
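Since the package-private constructor cannot be called from application code, the sketch below uses a hypothetical class, CharBox, that mirrors the two constructor styles. It is not JDK code; it only illustrates why sharing is safe solely when the caller promises not to touch the array afterwards.

```java
import java.util.Arrays;

// Hypothetical class mirroring the copy and share constructor styles; not part of the JDK.
final class CharBox {
    private final char[] value;

    // Copying style: callers cannot alter our state through their array.
    public CharBox(char[] value) {
        this.value = Arrays.copyOf(value, value.length);
    }

    // Sharing style: keeps the caller's array; the flag only selects the overload.
    CharBox(char[] value, boolean share) {
        this.value = value;
    }

    char first() {
        return value[0];
    }

    public static void main(String[] args) {
        char[] buf = {'a', 'b', 'c'};
        CharBox copied = new CharBox(buf);
        CharBox shared = new CharBox(buf, true);
        buf[0] = '1';
        System.out.println(copied.first()); // a
        System.out.println(shared.first()); // 1
    }
}
```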

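Byte array constructor String(byte bytes[])

This is the constructor we use most in everyday development, for example when converting the contents of a text file to a String, or converting binary data from network I/O to a String.

It has the following overloads: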

@Deprecated
public String(byte ascii[], int hibyte, int offset, int count)

@Deprecated
public String(byte ascii[], int hibyte)

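These two are deprecated and are not covered here.

Before introducing the remaining constructors, here is what their parameters mean:

bytes: the byte array to convert into a string

offset: the index of the first byte to decode

length: the number of bytes to decode

charsetName: the name of the charset as a string, e.g. UTF-8

charset: the charset as a java.nio.charset.Charset, e.g. Charset.forName("UTF-8");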

public String(byte bytes[], int offset, int length, String charsetName)
            throws UnsupportedEncodingException {
        if (charsetName == null)
            throw new NullPointerException("charsetName");
        checkBounds(bytes, offset, length);
        this.value = StringCoding.decode(charsetName, bytes, offset, length);
}

It decodes length bytes of the bytes array, starting at index offset, using the charset named by charsetName. For example:

try {
    byte[] bytes = "1234567890".getBytes("UTF-8");
    String str = new  String(bytes, 2, 5, "UTF-8");// 34567
    System.out.println(str);
} catch (UnsupportedEncodingException e) {
    e.printStackTrace();
}

public String(byte bytes[], int offset, int length, Charset charset) {
        if (charset == null)
            throw new NullPointerException("charset");
        checkBounds(bytes, offset, length);
        this.value =  StringCoding.decode(charset, bytes, offset, length);
}

This is much the same as the String(byte bytes[], int offset, int length, String charsetName) constructor, except that charsetName is replaced by charset.
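A sketch of the Charset overload, which needs no try/catch because it does not throw UnsupportedEncodingException. It also shows that offset and length count bytes, not characters. The names ByteDecodeDemo and decode are hypothetical, and the sample text is an assumption chosen to include one two-byte character.

```java
import java.nio.charset.StandardCharsets;

class ByteDecodeDemo {
    // Decodes length bytes starting at offset as UTF-8.
    static String decode(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "h", e with acute accent (2 bytes in UTF-8), "llo": 5 chars, 6 bytes
        byte[] bytes = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(bytes.length);                  // 6
        System.out.println(decode(bytes, 0, 3).length());  // 2 chars from 3 bytes
        System.out.println(decode(bytes, 3, 3));           // llo
    }
}
```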

public String(byte bytes[], String charsetName)throws UnsupportedEncodingException {
        this(bytes, 0, bytes.length, charsetName);
}

A facade-style convenience overload that delegates to the String(byte bytes[], int offset, int length, String charsetName) constructor.

public String(byte bytes[], Charset charset){
        this(bytes, 0, bytes.length, charset);
}

A facade-style convenience overload that delegates to the String(byte bytes[], int offset, int length, Charset charset) constructor.

public String(byte bytes[], int offset, int length) {
        checkBounds(bytes, offset, length);
        this.value = StringCoding.decode(bytes, offset, length);
}

This constructor does not specify a charset; it uses the default one, Charset.defaultCharset().
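Because the default charset depends on the platform, the same bytes can decode differently from machine to machine. The sketch below, with hypothetical names DefaultCharsetDemo, decodeWithDefault and decodeWith, compares decoding the same UTF-8 bytes with two explicit charsets; passing the charset explicitly is usually the safer habit.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetDemo {
    // Decodes with the platform default charset.
    static String decodeWithDefault(byte[] bytes) {
        return new String(bytes);
    }

    // Decodes with an explicitly chosen charset.
    static String decodeWith(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8); // 2 bytes
        System.out.println(Charset.defaultCharset());            // platform dependent
        System.out.println(decodeWithDefault(utf8).length());    // platform dependent
        System.out.println(decodeWith(utf8, StandardCharsets.UTF_8).length());      // 1
        System.out.println(decodeWith(utf8, StandardCharsets.ISO_8859_1).length()); // 2
    }
}
```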

public String(byte bytes[]) {
        this(bytes, 0, bytes.length);
}

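A facade-style convenience overload that delegates to the String(byte bytes[], int offset, int length) constructor.

String(StringBuffer buffer) and String(StringBuilder builder)

There is not much to say about these two constructors. They simply convert a StringBuffer or StringBuilder to a string, though in practice we usually call toString() on the buffer or builder instead.

That completes the analysis of String's constructors. In practice we mainly use the char array and byte array constructors, choosing the overload whose parameters fit the situation.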

String.intern()

Many readers will already know something about this method. It is a native method.

public native String intern();

Here is its definition in the official documentation (Java 8):

public String intern()

A pool of strings, initially empty, is maintained privately by the class String.

When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.

It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.

All literal strings and string-valued constant expressions are interned. String literals are defined in section 3.10.5 of the The Java™ Language Specification.

Returns:

a string that has the same contents as this string, but is guaranteed to be from a pool of unique strings.

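This passage says quite a lot:

1. The string constant pool is initially empty and is maintained privately by the String class.

2. When intern is called, if the pool already contains an equal string (as determined by the equals method), the string from the pool is returned. Otherwise this string is added to the pool and a reference to it is returned.

In short: return it if it exists, otherwise add it and return it.

It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.

3. All string literals and string-valued constant expressions are interned. String literals are defined in section 3.10.5, String Literals, of The Java™ Language Specification.

Put plainly, every string written in double quotes "" is placed in the string constant pool. For more on literals, see the String literals section.

An example: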

char[] value = {'1','a','2','b'};
String str = "1a2b";// created in the constant pool
String s = new String(value);// a String object created on the heap
String intern = s.intern();// fetched from the constant pool
System.out.println(str==s);//false
System.out.println(str==intern);//true
System.out.println(s==intern);//false
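The "if and only if" property quoted from the documentation can be checked with strings built at run time. This is a minimal sketch; InternDemo and sameAfterIntern are hypothetical names.

```java
class InternDemo {
    // True exactly when both strings map to the same pooled instance.
    static boolean sameAfterIntern(String s, String t) {
        return s.intern() == t.intern();
    }

    public static void main(String[] args) {
        // Two equal strings created at run time as separate heap objects.
        String s = new String(new char[] {'j', 'v', 'm'});
        String t = new StringBuilder("jv").append('m').toString();
        System.out.println(s == t);                    // false
        System.out.println(s.equals(t));               // true
        System.out.println(sameAfterIntern(s, t));     // true
        System.out.println(sameAfterIntern(s, "jdk")); // false: not equal
    }
}
```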

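Why does intern exist, and what should you watch out for?

The string constant pool relies on sharing (equal strings are stored once), which improves efficiency and reduces memory usage.

To put a String object created on the heap into the constant pool, simply call the intern method.

Note

Only intern strings that are used frequently. Do not use intern to add rarely used or very long strings to the pool. Doing so fills the pool with useless strings and strings that occupy a lot of memory, which can cause a memory leak and, in severe cases, an out-of-memory error.

String concatenation with "+"

In everyday development we often concatenate strings with "+". How is it implemented? Let's use a Java decompiler to find out. Download the decompiler cfr_0_132.

Common examples of String concatenation with +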

private static void demo1() {
    String a = "123";
    String b = "123" + "456";
    String c = a + "456";

    int intd = 123;
    String stre = "456";
    String f = intd + stre;
    System.out.println(f);
    String h = intd + "456";
    System.out.println(h);
}

Decompile the code above with cfr to see how javac handles +. Run the decompile command:

java -jar cfr_0_132.jar /src/Projects/self/string/target/classes/com/example/StringPlusSign.class --methodname demo1 --stringbuilder false

private static void demo1() {
    String a = "123";
    String b = "123456";//"123" + "456"
    String c = new StringBuilder().append(a).append("456").toString();//a + "456";
    int intd = 123;
    String stre = "456";
    String f = new StringBuilder().append(intd).append(stre).toString();//intd + stre;
    System.out.println(f);
    String h = new StringBuilder().append(intd).append("456").toString();//intd + "456";
    System.out.println(h);
}

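Comparing this with the original source makes things clear:

1. When an operand is a variable, the Java 8 compiler replaces + with StringBuilder calls.

2. For variable b, "123" + "456" is compiled directly to "123456", because both operands are constants.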
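The difference between the two cases is observable with ==, because a folded constant is interned while a string built at run time is a new object. The sketch below is illustrative; ConcatDemo and compare are hypothetical names, and the final variable is an addition not present in the original demo (a final variable initialized with a constant also counts as a constant).

```java
class ConcatDemo {
    // Returns { folded == literal, runtime == literal, foldedFinal == literal }.
    static boolean[] compare() {
        String a = "123";
        final String fa = "123";
        String folded = "123" + "456";   // constant expression, folded by javac
        String runtime = a + "456";      // a is not a constant, built at run time
        String foldedFinal = fa + "456"; // fa is a constant variable, also folded
        return new boolean[] {
            folded == "123456",
            runtime == "123456",
            foldedFinal == "123456"
        };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println(r[0]); // true
        System.out.println(r[1]); // false
        System.out.println(r[2]); // true
    }
}
```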

String literals

The official description:

A string literal consists of zero or more characters enclosed in double quotes. Characters may be represented by escape sequences (§3.10.6) - one escape sequence for characters in the range U+0000 to U+FFFF, two escape sequences for the UTF-16 surrogate code units of characters in the range U+010000 to U+10FFFF.

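In other words:

A string literal is zero or more characters enclosed in double quotes. A character may be written as an escape sequence (§3.10.6): one escape sequence for a character in the range U+0000 to U+FFFF, and two escape sequences, one per UTF-16 surrogate code unit, for a character in the range U+010000 to U+10FFFF.

Here are some examples of string literals: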

""                    // the empty string
"\""                  // a string containing " alone
"This is a string"    // a string containing 16 characters
"This is a " +        // actually a string-valued constant expression,
    "two-line string"    // formed from two string literals
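A small sketch of the one-escape versus two-escape rule. It assumes the characters are written with Unicode escapes (a backslash, the letter u, and four hex digits), which is the usual way to spell such characters in source code. LiteralDemo and codePoints are hypothetical names.

```java
class LiteralDemo {
    // Number of Unicode code points in s (a surrogate pair counts as one).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String a = "\u0041";           // one escape for U+0041, the letter A
        String face = "\uD83D\uDE00";  // two escapes: surrogate pair for U+1F600
        System.out.println(a.equals("A"));               // true
        System.out.println(face.length());               // 2 UTF-16 code units
        System.out.println(codePoints(face));            // 1 code point
        System.out.println("".length());                 // 0
        System.out.println("This is a string".length()); // 16
    }
}
```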

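So how does Java add string literals to the string constant pool?

Storing string references: the string constant pool

When a .java file is compiled into a .class file, each string literal is recorded in a special way, just like all other constants.

When a .class file is loaded (note that loading happens before initialization), the JVM looks for string literals in the .class file.

When it finds one, the JVM checks whether the constant pool already holds a reference to an equal string on the heap.

If not, it creates an object on the heap and stores a reference to it in a constant table in the pool.

Once a reference to a string object has been created in the constant pool, every literal reference to that string in the program is replaced by the reference that already exists in the pool.

A string literal reference therefore actually refers to a string object that already exists in the string constant pool, and this is done when the JVM loads the .class file.

Next, let's look at how the references relate, using a concrete example.

Java code: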

private static void demo2() {
    String a = "abc";
    String b = new String("abc");
    String c = new String("abc").intern();

}

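Based on the code above, the references can be pictured as follows:

Variable a is a literal and refers directly to "abc" in the constant pool. This happens at class-load time.

Variable b creates a new String, stored on the heap. This happens at run time.

Variable c creates a new String, stored on the heap, then calls intern(), which finds that "abc" already exists in the constant pool and returns that reference. This happens at run time.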
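The relationships described above can be confirmed by comparing references. This is a runnable variant of demo2; PoolReferenceDemo and compare are hypothetical names.

```java
class PoolReferenceDemo {
    // Returns { a == b, a == c, b == c } for the three variables from demo2.
    static boolean[] compare() {
        String a = "abc";                       // reference to the pooled literal
        String b = new String("abc");           // new object on the heap
        String c = new String("abc").intern();  // pooled reference, same as a
        return new boolean[] { a == b, a == c, b == c };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println(r[0]); // false
        System.out.println(r[1]); // true
        System.out.println(r[2]); // false
    }
}
```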

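Summary

This chapter covered the constructors of the String class (which are commonly used, which are not, and how to use them), what the intern method does and what to watch out for when using it, how string concatenation with + is handled by the compiler, and what a literal is and when it is loaded into the string constant pool.

That concludes this study of the String source code.

Closing notes

If you find any mistakes or anything that is unclear, please point it out in the comments and I will correct it promptly.

Reposting is welcome; please credit the source.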