String: public String(int[] codePoints, int offset, int count), the constructor that builds a String from an int array of code points

Here is the source code first:

public String(int[] codePoints, int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= codePoints.length) {
                this.value = "".value;
                return;
            }
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > codePoints.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }

        final int end = offset + count;

        // Pass 1: Compute precise size of char[]
        int n = count;
        for (int i = offset; i < end; i++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                continue;
            else if (Character.isValidCodePoint(c))
                n++;
            else throw new IllegalArgumentException(Integer.toString(c));
        }

        // Pass 2: Allocate and fill in char[]
        final char[] v = new char[n];

        for (int i = offset, j = 0; i < end; i++, j++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                v[j] = (char)c;
            else
                Character.toSurrogates(c, v, j++);
        }

        this.value = v;
    }
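
The range check `offset > codePoints.length - count` in the constructor above is written that way because, as the source comment notes, offset or count may be close to Integer.MAX_VALUE. A minimal sketch of the difference follows; the class and method names (BoundsCheckDemo, naiveOutOfRange, safeOutOfRange) are made up for illustration and are not part of the JDK.

```java
class BoundsCheckDemo {
    // Naive form: offset + count can overflow int and wrap to a negative value.
    static boolean naiveOutOfRange(int length, int offset, int count) {
        return offset + count > length;
    }

    // Form used by the constructor: with offset >= 0 and count >= 0 already
    // checked, length - count cannot overflow.
    static boolean safeOutOfRange(int length, int offset, int count) {
        return offset > length - count;
    }

    public static void main(String[] args) {
        int length = 10, offset = 5, count = Integer.MAX_VALUE;
        // The sum wraps around, so the naive check misses the bad range.
        System.out.println(naiveOutOfRange(length, offset, count)); // false
        // The subtraction form reports it correctly.
        System.out.println(safeOutOfRange(length, offset, count));  // true
    }
}
```

For ordinary small values both forms agree; they only differ when the sum would exceed the int range.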

Running the constructor:

 public static void main(String[] args) {
        int[] a = {100, 2312, 12313, 54545, 23432, 22, 65, 78, 99};
        String b = new String(a, 0, a.length);
        System.out.println(b);
    }
    // Output (code point 22 is a non-printing control character, so nothing visible appears between 守 and A)
    dई〙픑守ANc
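  • The valid range of Unicode code points now extends over 0x0000-0x10FFFF, which takes 21 bits. The maximum in binary is
    1 0000 1111 1111 1111 1111
  • A Java char is two bytes, i.e. 16 bits. Its maximum value is 0xFFFF, which in binary is
    1111 1111 1111 1111
  • In Unicode, 0x0000-0xFFFF is called the BMP (Basic Multilingual Plane); a single char can only represent a BMP code point
  • Characters with a value above 0xFFFF are called supplementary characters
  • While a char only covers the BMP, the range of int goes beyond even the valid Unicode range, so the values in the int array have to be validated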
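
To make the points above concrete, here is a small sketch showing that a BMP code point occupies one char and a supplementary code point occupies two. The class name CodePointLengthDemo and the helper charsNeeded are illustrative only; Character.charCount and String.codePointCount are standard JDK methods.

```java
class CodePointLengthDemo {
    // Number of UTF-16 chars needed for one code point: 1 in the BMP, 2 above it.
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        int bmp = 0x5B88;            // '守', inside the BMP
        int supplementary = 0x1F600; // an emoji, above 0xFFFF

        String s = new String(new int[] {bmp, supplementary}, 0, 2);
        System.out.println(charsNeeded(bmp));                 // 1
        System.out.println(charsNeeded(supplementary));       // 2
        System.out.println(s.length());                       // 3 (chars)
        System.out.println(s.codePointCount(0, s.length()));  // 2 (code points)
    }
}
```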
Character.isBmpCodePoint(c)
 public static boolean isBmpCodePoint(int codePoint) {
        return codePoint >>> 16 == 0;
        // Optimized form of:
        //     codePoint >= MIN_VALUE && codePoint <= MAX_VALUE
        // We consistently use logical shift (>>>) to facilitate
        // additional runtime optimizations.
    }

This checks whether the code point is in the BMP. If it is, a single char can hold it, so no extra space is needed.
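
As a quick check of the comment in isBmpCodePoint, the sketch below compares the shift form with the range comparison it replaces, including negative inputs. BmpCheckDemo and its two methods are hypothetical names used only for this illustration.

```java
class BmpCheckDemo {
    // The range comparison mentioned in the JDK comment.
    static boolean isBmpByRange(int codePoint) {
        return codePoint >= Character.MIN_VALUE && codePoint <= Character.MAX_VALUE;
    }

    // The shift form: true only when the upper 16 bits are all zero.
    static boolean isBmpByShift(int codePoint) {
        return codePoint >>> 16 == 0;
    }

    public static void main(String[] args) {
        int[] samples = {0, 0x41, 0xFFFF, 0x10000, 0x10FFFF, -1, Integer.MIN_VALUE};
        for (int cp : samples) {
            // Both columns should always agree.
            System.out.println(Integer.toHexString(cp) + " "
                    + isBmpByRange(cp) + " " + isBmpByShift(cp));
        }
    }
}
```

Because >>> is an unsigned shift, a negative int keeps its sign bit in the shifted result, so it is never mistaken for a BMP code point.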

Character.isValidCodePoint(c)
public static boolean isValidCodePoint(int codePoint) {
        // Optimized form of:
        //     codePoint >= MIN_CODE_POINT && codePoint <= MAX_CODE_POINT
        int plane = codePoint >>> 16;
        return plane < ((MAX_CODE_POINT + 1) >>> 16);
    }

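This checks whether the value is within the valid code point range. At this point the code point is already known not to be in the BMP, so if it is valid, one char cannot hold it and one more char is reserved (n++). If it is outside the valid range, an IllegalArgumentException is thrown.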
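
The plane arithmetic in isValidCodePoint can be traced by hand: (MAX_CODE_POINT + 1) >>> 16 is 0x110000 >>> 16, which is 17, so a code point is valid exactly when its plane number is 0 through 16. The sketch below prints the plane for a few inputs; PlaneDemo and plane are illustrative names, not JDK API.

```java
class PlaneDemo {
    // Plane number as computed inside isValidCodePoint (unsigned shift by 16).
    static int plane(int codePoint) {
        return codePoint >>> 16;
    }

    public static void main(String[] args) {
        System.out.println((Character.MAX_CODE_POINT + 1) >>> 16); // 17
        System.out.println(plane(0x10FFFF));   // 16, valid
        System.out.println(plane(0x110000));   // 17, invalid
        System.out.println(plane(0x7fffffff)); // 32767, invalid
        System.out.println(plane(-1));         // 65535, invalid
    }
}
```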

  • This example goes out of range and throws the exception
 public static void main(String[] args) {
        int[] a = {100, 99,0x7fffffff};
        String b = new String(a, 0, a.length);
        System.out.println(b);
    }

// Exception in thread "main" java.lang.IllegalArgumentException: 2147483647
    at java.lang.String.<init>(String.java:266)
    at main.java.Test.main(Test.java:11)
Character.toSurrogates(c, v, j++);

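toSurrogates takes an int that is above the BMP range but still a valid Unicode code point and splits it into two chars, a high surrogate and a low surrogate. The Character class has matching methods to test whether a char is a surrogate, a high surrogate, or a low surrogate, whether two chars form a surrogate pair, and to convert a surrogate pair back into a single code point.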

 static void toSurrogates(int codePoint, char[] dst, int index) {
        // We write elements "backwards" to guarantee all-or-nothing
        dst[index+1] = lowSurrogate(codePoint);
        dst[index] = highSurrogate(codePoint);
    }


public static char lowSurrogate(int codePoint) {
        return (char) ((codePoint & 0x3ff) + MIN_LOW_SURROGATE);
    }


public static char highSurrogate(int codePoint) {
        return (char) ((codePoint >>> 10)
            + (MIN_HIGH_SURROGATE - (MIN_SUPPLEMENTARY_CODE_POINT >>> 10)));
    }
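
As a worked example of the two formulas above, take U+1F600. Its low 10 bits are 0x200, so the low surrogate is 0x200 + 0xDC00 = 0xDE00; shifting right by 10 gives 0x7D, and 0x7D + (0xD800 - 0x40) = 0xD83D is the high surrogate. The sketch below repeats that arithmetic and round-trips it through the public Character methods; SurrogateDemo, high and low are illustrative names.

```java
class SurrogateDemo {
    // Same arithmetic as Character.highSurrogate.
    static char high(int codePoint) {
        return (char) ((codePoint >>> 10)
                + (Character.MIN_HIGH_SURROGATE
                   - (Character.MIN_SUPPLEMENTARY_CODE_POINT >>> 10)));
    }

    // Same arithmetic as Character.lowSurrogate.
    static char low(int codePoint) {
        return (char) ((codePoint & 0x3ff) + Character.MIN_LOW_SURROGATE);
    }

    public static void main(String[] args) {
        int cp = 0x1F600;
        char h = high(cp);
        char l = low(cp);
        System.out.println(String.format("%04X %04X", (int) h, (int) l)); // D83D DE00
        System.out.println(Character.isSurrogatePair(h, l));              // true
        // Converting the pair back yields the original code point.
        System.out.println(Character.toCodePoint(h, l) == cp);            // true
    }
}
```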


Reposted from blog.csdn.net/qq_39477410/article/details/82469104