Java Character Encoding: String and char

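Copyright notice: This is an original article by the author, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.
Article link: https://blog.csdn.net/u013501457/article/details/102661433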

Java character encoding

[Figure: Java character encoding]
Consider the figure above together with the following program:

import java.io.UnsupportedEncodingException;

public class EncodingDemo {

    public static void main(String[] args) {
        // A single supplementary character (U+1D56B), written as a UTF-16 surrogate pair
        String s = "\uD835\uDD6B";
        print(s);
    }

    private static void print(String s) {
        System.out.println("s: " + s);
        System.out.println("s.length(): " + s.length());
        System.out.println("s.codePointCount(0,s.length()): " + s.codePointCount(0, s.length()));
        printBytes(s);
        printBytes(s, "GBK");
        printBytes(s, "UTF-8");
        printBytes(s, "UTF-16");
        printChars(s);
    }

    // Encode with the platform default charset and print the bytes in hex
    private static void printBytes(String s) {
        byte[] bytes = s.getBytes();
        System.out.print("default charset: ");
        for (byte b : bytes) {
            System.out.printf("%02x ", b);
        }
        System.out.println();
    }

    // Encode with the named charset and print the bytes in hex
    private static void printBytes(String s, String charset) {
        byte[] bytes = new byte[0];
        try {
            bytes = s.getBytes(charset);
        } catch (UnsupportedEncodingException e) {
            // Unknown charset name: report it and print an empty byte list
            e.printStackTrace();
        }
        System.out.print(charset + ": ");
        for (byte b : bytes) {
            System.out.printf("%02x ", b);
        }
        System.out.println();
    }

    // Print the UTF-16 code units that make up the String
    private static void printChars(String s) {
        char[] chars = s.toCharArray();
        System.out.print("underlying chars: ");
        for (char c : chars) {
            System.out.printf("%04x ", (int) c);
        }
        System.out.println();
    }
}

Result:
[Figure: screenshot of the program output]
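
The length and code point count printed for this string follow directly from how UTF-16 surrogate pairs work. The following is a minimal sketch of that relationship; the class name SurrogateDemo and the helper combine are illustrative and do not appear in the original program.

```java
public class SurrogateDemo {

    // Combine a high and a low surrogate into one code point by hand.
    // Character.toCodePoint does the same in the standard library.
    static int combine(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        String s = "\uD835\uDD6B";
        char high = s.charAt(0);
        char low = s.charAt(1);

        System.out.println(Character.isHighSurrogate(high));          // true
        System.out.println(Character.isLowSurrogate(low));            // true
        System.out.println(Integer.toHexString(combine(high, low)));  // 1d56b
        System.out.println(Integer.toHexString(s.codePointAt(0)));    // 1d56b
        System.out.println(s.length());                               // 2 (UTF-16 code units)
        System.out.println(s.codePointCount(0, s.length()));          // 1 (code point)
    }
}
```

The two chars d835 and dd6b are therefore one character, U+1D56B, which is why length() reports 2 while codePointCount() reports 1.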

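  1. The encoding of the .java source file does not affect the character encoding the program uses at run time. It only has to match what the compiler expects (the -encoding option of javac) so that string literals are read correctly.
  2. Whatever the encoding of the source file, string constants are stored in the compiled .class file as modified UTF-8 bytes. Unlike standard UTF-8, this form encodes a supplementary character as two 3-byte sequences, one per surrogate.
  3. At run time, when the class is loaded, the modified UTF-8 constants are converted to UTF-16, and String exposes them as a sequence of UTF-16 char values. Up to Java 8 the backing store is literally a char[]; since Java 9 it is a byte[] holding either Latin-1 or UTF-16, but the API behaves the same.
  4. String.length() returns the number of UTF-16 code units (char values). UTF-16 encodes a character in either 2 or 4 bytes: a 2-byte character takes 1 char and a 4-byte character takes 2 chars (a surrogate pair), so length() is not necessarily the number of characters.
  5. String.codePointCount() returns the number of Unicode code points represented by the UTF-16 char sequence, which is the character count in the Unicode sense.
  6. String.getBytes() converts the UTF-16 char sequence into a byte array in the specified encoding. If no encoding is specified, it uses the platform default charset (Charset.defaultCharset()), which is UTF-8 by default only since JDK 18; on older versions it depends on the operating system, the locale, and the file.encoding property.
  7. In Java, a char is a 16-bit UTF-16 code unit, so a char[] always holds UTF-16.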
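
Points 2 and 6 can be checked with a small sketch. It assumes that DataOutputStream.writeUTF, which is documented to write modified UTF-8 after a 2-byte length prefix, is a fair stand-in for the way string constants are stored in a class file. The class name CharsetSketch and its helper methods are illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetSketch {

    // Render bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    // Modified UTF-8 as produced by writeUTF, with the 2-byte length prefix removed.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buffer.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) {
        String s = "\uD835\uDD6B";

        // getBytes() with no argument uses the platform default charset,
        // so this line may differ between machines and JDK versions.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("default bytes:   " + hex(s.getBytes()));

        // Standard UTF-8: one 4-byte sequence for the supplementary character.
        System.out.println("UTF-8:           " + hex(s.getBytes(StandardCharsets.UTF_8)));   // f0 9d 95 ab

        // Modified UTF-8: each surrogate is encoded separately in 3 bytes.
        System.out.println("modified UTF-8:  " + hex(modifiedUtf8(s)));                      // ed a0 b5 ed b5 ab
    }
}
```

Passing an explicit charset such as StandardCharsets.UTF_8 to getBytes avoids the dependence on the platform default and also removes the need to handle UnsupportedEncodingException.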


Reposted from blog.csdn.net/u013501457/article/details/102661433