Source Code Walkthrough: the substring Method of Java's String Class

Starting the discussion

First, define a string and call its substring method:

String s="dsajhfkjhfsa";
String s1=s.substring(0,2);

In the IDE, Ctrl + left-click on substring to jump into its source.

substring

The first thing we see is the Javadoc comment:

 /**
     * Returns a string that is a substring of this string. The
     * substring begins at the specified {@code beginIndex} and
     * extends to the character at index {@code endIndex - 1}.
     * Thus the length of the substring is {@code endIndex-beginIndex}.
     * <p>
     * Examples:
     * <blockquote><pre>
     * "hamburger".substring(4, 8) returns "urge"
     * "smiles".substring(1, 5) returns "mile"
     * </pre></blockquote>
     *
     * @param      beginIndex   the beginning index, inclusive.
     * @param      endIndex     the ending index, exclusive.
     * @return     the specified substring.
     * @exception  IndexOutOfBoundsException  if the
     *             {@code beginIndex} is negative, or
     *             {@code endIndex} is larger than the length of
     *             this {@code String} object, or
     *             {@code beginIndex} is larger than
     *             {@code endIndex}.
     */

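In plain terms

The method returns a substring of the current string. The substring starts at beginIndex (inclusive) and extends to the character at index endIndex - 1, so its length is endIndex - beginIndex.

Examples:

"hamburger".substring(4, 8) returns "urge"
"smiles".substring(1, 5) returns "mile"

Exception: an IndexOutOfBoundsException is thrown if beginIndex is negative, if endIndex is greater than the length of this string, or if beginIndex is greater than endIndex.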
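The index rules above can be checked with a small sketch. The class name SubstringBasics and the helper slice are made up for illustration; only String.substring itself comes from the JDK.

```java
class SubstringBasics {
    // Hypothetical helper: returns the half-open range [beginIndex, endIndex) of s.
    static String slice(String s, int beginIndex, int endIndex) {
        return s.substring(beginIndex, endIndex);
    }

    public static void main(String[] args) {
        String s = "dsajhfkjhfsa";
        String s1 = slice(s, 0, 2);
        System.out.println(s1);                        // ds
        System.out.println(s1.length());               // 2, i.e. endIndex - beginIndex
        System.out.println(slice("hamburger", 4, 8));  // urge
        System.out.println(slice("smiles", 1, 5));     // mile
    }
}
```

The character at beginIndex is included and the character at endIndex is not, which is why the length always equals endIndex - beginIndex.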

Source code

public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > value.length) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    int subLen = endIndex - beginIndex;
    if (subLen < 0) {
        throw new StringIndexOutOfBoundsException(subLen);
    }
    return ((beginIndex == 0) && (endIndex == value.length)) ? this
            : new String(value, beginIndex, subLen);
}

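Analysis: the method first checks whether beginIndex and endIndex are out of range. beginIndex must not be negative (0 is allowed), and endIndex must not exceed the length of the string, that is, the length of the value array; otherwise an exception is thrown.
It then checks endIndex - beginIndex. If the difference is negative, the range is not a valid substring and a StringIndexOutOfBoundsException is thrown, built by the following constructor: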

public StringIndexOutOfBoundsException(int index) {
    super("String index out of range: " + index);
}
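The three failure cases can be exercised with a short sketch. SubstringBounds and throwsOutOfBounds are hypothetical names. The exact message text may differ between JDK versions, so the sketch checks only the exception type and not the message.

```java
class SubstringBounds {
    // Hypothetical helper: true if s.substring(begin, end) rejects the range.
    static boolean throwsOutOfBounds(String s, int begin, int end) {
        try {
            s.substring(begin, end);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "hello"; // length 5
        System.out.println(throwsOutOfBounds(s, -1, 2)); // true: beginIndex is negative
        System.out.println(throwsOutOfBounds(s, 0, 6));  // true: endIndex exceeds the length
        System.out.println(throwsOutOfBounds(s, 3, 2));  // true: beginIndex is greater than endIndex
        System.out.println(throwsOutOfBounds(s, 0, 5));  // false: the whole string is a valid range
    }
}
```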

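When returning, if beginIndex is at the start of the original string and endIndex is at its end, the method returns the original string, this, directly. In every other case it creates a new string through a constructor. The new string includes the character at beginIndex but not the one at endIndex, so its length is endIndex - beginIndex.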

return ((beginIndex == 0) && (endIndex == value.length)) ? this
                : new String(value, beginIndex, subLen);
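A minimal sketch of this shortcut follows. SubstringIdentity and returnsSameInstance are hypothetical names. The Javadoc does not promise reference identity, so this reflects the implementation shown here and not a documented guarantee.

```java
class SubstringIdentity {
    // Hypothetical helper: true if substring handed back the very same object.
    static boolean returnsSameInstance(String s, int begin, int end) {
        return s.substring(begin, end) == s;
    }

    public static void main(String[] args) {
        String s = "hello";
        System.out.println(returnsSameInstance(s, 0, s.length())); // true: full range returns this
        System.out.println(returnsSameInstance(s, 0, 4));          // false: a new String is created
        System.out.println(returnsSameInstance(s, 1, s.length())); // false: a new String is created
    }
}
```

Application code should still compare strings with equals; the identity check here only makes the shortcut visible.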

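Let's first look at where value comes from. As the code below shows, value is a member field declared as a private final char array. Note that final only prevents the field from being reassigned to another array; String is immutable because the class never modifies the array's contents after construction and never exposes the array itself.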

  /** The value is used for character storage. */
    private final char value[];
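To see what final does and does not protect on an array field, here is a small sketch. FinalArrayDemo is a hypothetical class written for this illustration and is not part of the JDK.

```java
class FinalArrayDemo {
    // Hypothetical field mirroring the declaration style used in String.
    private final char[] value = {'a', 'b', 'c'};

    // Legal: final protects the reference, not the elements it points to.
    void overwriteFirst(char c) {
        value[0] = c;
    }

    String contents() {
        return new String(value);
    }

    public static void main(String[] args) {
        FinalArrayDemo demo = new FinalArrayDemo();
        demo.overwriteFirst('z');
        System.out.println(demo.contents()); // zbc
        // demo.value = new char[3]; // would not compile: value is final
    }
}
```

String avoids this kind of mutation by keeping the field private and copying any array it receives or hands out.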

How value gets its contents

Now let's look at how value gets its contents.

 String s="dsajhfkjhfsa";

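Allocating a string, like allocating any other object, costs time and memory, and strings are used very heavily. To improve performance and reduce memory overhead, the JVM optimizes how string literals are instantiated: it uses a string constant pool. Whenever a string literal is created, the JVM first checks the pool. If the string is already there, it returns a reference to the pooled instance. If not, it instantiates the string and puts it into the pool. Because strings are immutable, sharing them is safe, and the pool never holds two distinct entries with the same content.

So for the statement above, the JVM first looks in the constant pool. If the string is found, no new object is created and the existing reference is assigned to s. If it is not found, a new object is created and its reference is assigned to s.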
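The pooling behavior can be observed with reference comparisons. StringPoolDemo and sameReference are hypothetical names; String.intern is the standard JDK method that returns the pooled instance.

```java
class StringPoolDemo {
    // Hypothetical helper: reference comparison, not content comparison.
    static boolean sameReference(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        String a = "dsajhfkjhfsa";
        String b = "dsajhfkjhfsa";             // same literal, same pooled instance
        String c = new String("dsajhfkjhfsa"); // explicitly creates a separate object

        System.out.println(sameReference(a, b));          // true
        System.out.println(sameReference(a, c));          // false
        System.out.println(sameReference(a, c.intern())); // true: intern returns the pooled instance
        System.out.println(a.equals(c));                  // true: the contents are equal
    }
}
```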

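Suppose the pool does not yet contain "dsajhfkjhfsa". A String object then has to be instantiated, and instantiation means running a constructor. String has many constructors, so which one is called?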

public final class String implements java.io.Serializable, Comparable<String>, CharSequence {
    private final char value[];

    public String() {
        this.value = "".value; // empty
    }

    public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }

    public String(char value[]) {
        this.value = Arrays.copyOf(value, value.length);
    }
}

The code above shows that:

The String class is declared final.
String stores its contents in a char array.
The char array field is declared final.

* <blockquote><pre>
 *     String str = "abc";
 * </pre></blockquote><p>
 * is equivalent to:
 * <blockquote><pre>
 *     char data[] = {'a', 'b', 'c'};
 *     String str = new String(data);
 

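So, following the Javadoc above, String s = "dsajhfkjhfsa"; behaves as if it went through the following constructor, whose parameter type is char[]. The JVM builds literal strings internally, so read this as the equivalence the Javadoc describes and not as a call that appears in your code.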

  public String(char value[]) {
        this.value = Arrays.copyOf(value, value.length);
    }

Next, let's look at how copyOf is implemented.
Javadoc comment:

    /**
     * Copies the specified array, truncating or padding with null characters (if necessary)
     * so the copy has the specified length.  For all indices that are valid
     * in both the original array and the copy, the two arrays will contain
     * identical values.  For any indices that are valid in the copy but not
     * the original, the copy will contain <tt>'\\u000'</tt>.  Such indices
     * will exist if and only if the specified length is greater than that of
     * the original array.
     *
     * @param original the array to be copied
     * @param newLength the length of the copy to be returned
     * @return a copy of the original array, truncated or padded with null characters
     *     to obtain the specified length
     * @throws NegativeArraySizeException if <tt>newLength</tt> is negative
     * @throws NullPointerException if <tt>original</tt> is null
     * @since 1.6
     */

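In plain terms
The method copies the given array, truncating it or padding it with null characters (if necessary) so that the copy has the specified length. For every index that is valid in both the original array and the copy, the two arrays contain the same value. For every index that is valid in the copy but not in the original, the copy contains the null character (written '\\u000' in the Javadoc). Such indices exist if and only if the specified length is greater than the length of the original array.

Returns: a copy of the original array, truncated or padded with null characters to obtain the specified length.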

 public static char[] copyOf(char[] original, int newLength) {
        char[] copy = new char[newLength];
        System.arraycopy(original, 0, copy, 0,
                         Math.min(original.length, newLength));
        return copy;
    }
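A quick sketch of the truncating and padding behavior follows. CopyOfDemo and resize are hypothetical names; Arrays.copyOf(char[], int) is the JDK method shown above.

```java
import java.util.Arrays;

class CopyOfDemo {
    // Hypothetical helper that simply delegates to Arrays.copyOf.
    static char[] resize(char[] original, int newLength) {
        return Arrays.copyOf(original, newLength);
    }

    public static void main(String[] args) {
        char[] abc = {'a', 'b', 'c'};

        char[] shorter = resize(abc, 2);
        System.out.println(Arrays.toString(shorter)); // [a, b]

        char[] longer = resize(abc, 5);
        System.out.println(longer.length);            // 5
        System.out.println((int) longer[3]);          // 0: padded with the null character
        System.out.println(longer == abc);            // false: always a new array
    }
}
```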

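As we can see, copyOf returns a char[]. This settles how value is assigned: when the JVM constant pool does not contain the literal "dsajhfkjhfsa", the string is instantiated as if through the constructor String(char value[]). value is now a char array holding {'d','s','a','j','h','f','k','j','h','f','s','a'}.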
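Because the constructor copies the array, later changes to the caller's array do not leak into the string. A minimal sketch, with DefensiveCopyDemo and buildThenMutate as hypothetical names:

```java
class DefensiveCopyDemo {
    // Hypothetical helper: builds a String from data, then mutates data.
    static String buildThenMutate(char[] data) {
        String str = new String(data); // the constructor copies the array
        data[0] = 'X';                 // changes only the caller's array
        return str;
    }

    public static void main(String[] args) {
        char[] data = {'a', 'b', 'c'};
        String str = buildThenMutate(data);
        System.out.println(str);     // abc
        System.out.println(data[0]); // X
    }
}
```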

Now back to the substring source:

 public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > value.length) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        int subLen = endIndex - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return ((beginIndex == 0) && (endIndex == value.length)) ? this
                : new String(value, beginIndex, subLen);
    }

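As we know, except when the requested range covers the whole original string, every call ends up in new String(value, beginIndex, subLen);

Implementation of String(char value[], int offset, int count)

Now let's look at the constructor public String(char value[], int offset, int count).

Javadoc comment: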

 /**
     * Allocates a new {@code String} that contains characters from a subarray
     * of the character array argument. The {@code offset} argument is the
     * index of the first character of the subarray and the {@code count}
     * argument specifies the length of the subarray. The contents of the
     * subarray are copied; subsequent modification of the character array does
     * not affect the newly created string.
     *
     * @param  value
     *         Array that is the source of characters
     *
     * @param  offset
     *         The initial offset
     *
     * @param  count
     *         The length
     *
     * @throws  IndexOutOfBoundsException
     *          If the {@code offset} and {@code count} arguments index
     *          characters outside the bounds of the {@code value} array
     */

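In plain terms
The constructor allocates a new String that contains the characters of a subarray of the char array argument. The offset argument is the index of the first character of the subarray, and the count argument specifies the length of the subarray. The contents of the subarray are copied; later modification of the char array does not affect the newly created string.

This is a parameterized constructor that takes a char array, an offset (the starting position), and a count (the number of characters).

  • Starting from the char array, it takes count characters beginning at position offset and builds a new String from them.

  • In effect, it cuts out a run of count characters and wraps it in a new String object.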
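The two points above can be tried directly against the public constructor. SubarrayConstructorDemo and fromRange are hypothetical names used only in this sketch.

```java
class SubarrayConstructorDemo {
    // Hypothetical helper: builds a String from count chars starting at offset.
    static String fromRange(char[] chars, int offset, int count) {
        return new String(chars, offset, count);
    }

    public static void main(String[] args) {
        char[] chars = "hamburger".toCharArray();
        String urge = fromRange(chars, 4, 4); // takes indices 4, 5, 6, 7
        System.out.println(urge);             // urge

        chars[4] = 'X';                       // the string holds its own copy
        System.out.println(urge);             // urge
    }
}
```

Note that the third argument is a length and not an end index, which is why substring passes subLen instead of endIndex.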

public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count <= 0) {
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        if (offset <= value.length) {
            this.value = "".value;
            return;
        }
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.value = Arrays.copyOfRange(value, offset, offset+count);
}

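As the code shows, everything before the last line is validation that throws on bad arguments. The important part is the last line, which extracts the characters, again with an array copy.
Note which object this refers to here. We defined the string object s, and s called substring, which created a new String object through new String(char value[], int offset, int count). So this refers to that newly created object, and this.value is the new object's value array.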

 if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= value.length) {
                this.value = "".value;
                return;
            }
        }

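Look at the code above. When the constructor is reached indirectly through substring, count < 0 can never hold: substring passes subLen as count, and subLen < 0 has already caused an exception there, so subLen is greater than or equal to 0. The count < 0 check is still needed because other code may call this constructor directly.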

 return ((beginIndex == 0) && (endIndex == value.length)) ? this
                : new String(value, beginIndex, subLen);

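So when count == 0 (and offset <= value.length), this.value = "".value; runs: the new object's value is set to the empty string's array, and the constructor returns.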
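The zero-length case is easy to observe from the outside. EmptyRangeDemo and emptyAt are hypothetical names in this sketch.

```java
class EmptyRangeDemo {
    // Hypothetical helper: requests a zero-length range starting at index.
    static String emptyAt(String s, int index) {
        return s.substring(index, index);
    }

    public static void main(String[] args) {
        String s = "hello";
        System.out.println(emptyAt(s, 2).isEmpty()); // true: count is 0
        System.out.println(emptyAt(s, 5).isEmpty()); // true: offset == length is still allowed

        // Calling the constructor directly with count == 0 also gives an empty string.
        String direct = new String(s.toCharArray(), 5, 0);
        System.out.println(direct.isEmpty());        // true
    }
}
```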

Now let's look at the following odd-looking code:

// Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }

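Why write offset > value.length - count instead of offset + count > value.length? The comment gives the reason: offset or count might be near -1>>>1.

-1>>>1 == 2147483647, which is Integer.MAX_VALUE, the largest int value in Java.
In other words, offset and count may each be close to the maximum, so adding them could overflow and wrap around to a negative number, which would slip past the check.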

int a = -1;
a = a >>> 1;
System.out.println(a); // prints 2147483647, which is Integer.MAX_VALUE
System.out.println(Integer.toBinaryString(-1 >>> 1)); // prints 31 consecutive '1' digits
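The difference between the two forms of the check shows up as soon as one argument is near Integer.MAX_VALUE. In this sketch, OverflowSafeCheck, naiveOutOfRange, and safeOutOfRange are hypothetical names; the safe form mirrors the comparison in the constructor and assumes, as the constructor has already verified by that point, that offset and count are not negative.

```java
class OverflowSafeCheck {
    // Naive form: offset + count may wrap around to a negative number.
    static boolean naiveOutOfRange(int offset, int count, int length) {
        return offset + count > length;
    }

    // Form used by the constructor: length - count cannot overflow
    // when both values are non-negative.
    static boolean safeOutOfRange(int offset, int count, int length) {
        return offset > length - count;
    }

    public static void main(String[] args) {
        int length = 12;
        int offset = Integer.MAX_VALUE; // same value as -1 >>> 1
        int count = 1;

        System.out.println(naiveOutOfRange(offset, count, length)); // false: the sum wrapped, error missed
        System.out.println(safeOutOfRange(offset, count, length));  // true: correctly rejected
        System.out.println(safeOutOfRange(2, 3, length));           // false: a valid range
    }
}
```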

Now let's look at the implementation of copyOfRange:

  this.value = Arrays.copyOfRange(value, offset, offset+count);

Javadoc comment:

 /**
     * Copies the specified range of the specified array into a new array.
     * The initial index of the range (<tt>from</tt>) must lie between zero
     * and <tt>original.length</tt>, inclusive.  The value at
     * <tt>original[from]</tt> is placed into the initial element of the copy
     * (unless <tt>from == original.length</tt> or <tt>from == to</tt>).
     * Values from subsequent elements in the original array are placed into
     * subsequent elements in the copy.  The final index of the range
     * (<tt>to</tt>), which must be greater than or equal to <tt>from</tt>,
     * may be greater than <tt>original.length</tt>, in which case
     * <tt>'\\u000'</tt> is placed in all elements of the copy whose index is
     * greater than or equal to <tt>original.length - from</tt>.  The length
     * of the returned array will be <tt>to - from</tt>.
     *
     * @param original the array from which a range is to be copied
     * @param from the initial index of the range to be copied, inclusive
     * @param to the final index of the range to be copied, exclusive.
     *     (This index may lie outside the array.)
     * @return a new array containing the specified range from the original array,
     *     truncated or padded with null characters to obtain the required length
     * @throws ArrayIndexOutOfBoundsException if {@code from < 0}
     *     or {@code from > original.length}
     * @throws IllegalArgumentException if <tt>from &gt; to</tt>
     * @throws NullPointerException if <tt>original</tt> is null
     * @since 1.6
     */

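Explanation
The method copies the specified range of the given array into a new array. The initial index of the range (from) must lie between zero and original.length, inclusive. The value at original[from] is placed into the first element of the copy (unless from == original.length or from == to). Values from subsequent elements of the original array are placed into subsequent elements of the copy. The final index of the range (to), which must be greater than or equal to from, may be greater than original.length; in that case the null character is placed in every element of the copy whose index is greater than or equal to original.length - from. The length of the returned array is to - from.

original: the array from which a range is to be copied
from: the initial index of the range to be copied, inclusive
to: the final index of the range to be copied, exclusive. This index may lie outside the array.
return: a new array containing the specified range from the original array, truncated or padded with null characters to obtain the required length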

 public static char[] copyOfRange(char[] original, int from, int to) {
        int newLength = to - from;
        if (newLength < 0)
            throw new IllegalArgumentException(from + " > " + to);
        char[] copy = new char[newLength];
        System.arraycopy(original, from, copy, 0,
                         Math.min(original.length - from, newLength));
        return copy;
    }
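A short sketch of copyOfRange, including the padding case where to lies beyond the end of the array. CopyOfRangeDemo and range are hypothetical names; Arrays.copyOfRange(char[], int, int) is the JDK method shown above.

```java
import java.util.Arrays;

class CopyOfRangeDemo {
    // Hypothetical helper that simply delegates to Arrays.copyOfRange.
    static char[] range(char[] original, int from, int to) {
        return Arrays.copyOfRange(original, from, to);
    }

    public static void main(String[] args) {
        char[] chars = "hamburger".toCharArray(); // length 9

        System.out.println(Arrays.toString(range(chars, 4, 8))); // [u, r, g, e]

        char[] padded = range(chars, 7, 11); // to is past the end of the array
        System.out.println(padded.length);   // 4, i.e. to - from
        System.out.println(padded[0]);       // e
        System.out.println(padded[1]);       // r
        System.out.println((int) padded[2]); // 0: null-character padding
    }
}
```

The String constructor never reaches the padding case, because it has already rejected any range that extends past value.length.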

With that, we have walked through the entire flow of substring(int beginIndex, int endIndex).

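So, when substring(int beginIndex, int endIndex) is called with beginIndex == 0 and endIndex - beginIndex equal to the length of the original string, we get the original string back. Otherwise, provided no exception is thrown, we get a newly created String object that is a substring of the original, and its value is obtained by array copy.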
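The whole flow can be condensed into a simplified re-implementation. This is a sketch under stated assumptions and not the JDK code: SimpleSubstring is a hypothetical class, and since the private value field is not accessible from outside String, toCharArray() stands in for it.

```java
import java.util.Arrays;

class SimpleSubstring {
    // Hypothetical re-implementation that follows the same steps as the source above.
    static String substring(String source, int beginIndex, int endIndex) {
        char[] value = source.toCharArray(); // assumption: stand-in for the private value field
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > value.length) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        int subLen = endIndex - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        if (beginIndex == 0 && endIndex == value.length) {
            return source; // full range: hand back the original object
        }
        // Any other range: copy the requested characters into a new String.
        return new String(Arrays.copyOfRange(value, beginIndex, endIndex));
    }

    public static void main(String[] args) {
        String s = "hamburger";
        System.out.println(substring(s, 4, 8));      // urge
        System.out.println(substring(s, 0, 9) == s); // true
    }
}
```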

The substring overload

The String source has one more overload of substring.
Javadoc comment:

 /**
     * Returns a string that is a substring of this string. The
     * substring begins with the character at the specified index and
     * extends to the end of this string. <p>
     * Examples:
     * <blockquote><pre>
     * "unhappy".substring(2) returns "happy"
     * "Harbison".substring(3) returns "bison"
     * "emptiness".substring(9) returns "" (an empty string)
     * </pre></blockquote>
     *
     * @param      beginIndex   the beginning index, inclusive.
     * @return     the specified substring.
     * @exception  IndexOutOfBoundsException  if
     *             {@code beginIndex} is negative or larger than the
     *             length of this {@code String} object.
     */

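In plain terms
The method returns the substring that starts at beginIndex (inclusive) and extends to the end of the string. If beginIndex == 0, the original string is returned; otherwise, provided no exception is thrown, a newly created String object is returned. That new object is created in the same way as described above.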

public String substring(int beginIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        int subLen = value.length - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return (beginIndex == 0) ? this : new String(value, beginIndex, subLen);
    }
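The Javadoc examples for the one-argument overload can be run as a quick check. SubstringFromDemo and tail are hypothetical names in this sketch.

```java
class SubstringFromDemo {
    // Hypothetical helper: everything from beginIndex to the end of s.
    static String tail(String s, int beginIndex) {
        return s.substring(beginIndex);
    }

    public static void main(String[] args) {
        System.out.println(tail("unhappy", 2));             // happy
        System.out.println(tail("Harbison", 3));            // bison
        System.out.println(tail("emptiness", 9).isEmpty()); // true: beginIndex == length is allowed

        String s = "hello";
        System.out.println(tail(s, 0) == s);                // true: beginIndex 0 returns the same object
    }
}
```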

References

Meituan interview question: how many objects does String s = new String("111") create?

The memory leak problem of the substring method in String

https://stackoverflow.com/questions/16054289/offset-or-count-might-be-near-11-what-does-it-mean

Reposted from blog.csdn.net/ningmengshuxiawo/article/details/115247532