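Copyright notice: This is an original article by the author, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.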
Java character encoding
Consider the figure above together with the following program:
import java.io.UnsupportedEncodingException;

public class Main {
    public static void main(String[] args) {
        // U+1D56B, a supplementary character written as a UTF-16 surrogate pair
        String s = "\uD835\uDD6B";
        print(s);
    }

    private static void print(String s) {
        System.out.println("s: " + s);
        System.out.println("s.length(): " + s.length());
        System.out.println("s.codePointCount(0,s.length()): " + s.codePointCount(0, s.length()));
        printBytes(s);
        printBytes(s, "GBK");
        printBytes(s, "UTF-8");
        printBytes(s, "UTF-16");
        printChars(s);
    }

    // Prints the bytes produced by the platform default charset.
    private static void printBytes(String s) {
        byte[] bytes = s.getBytes();
        System.out.print("default charset: ");
        for (byte b : bytes) {
            System.out.printf("%02x ", b);
        }
        System.out.println();
    }

    // Prints the bytes produced by the named charset.
    private static void printBytes(String s, String charset) {
        byte[] bytes = new byte[0];
        try {
            bytes = s.getBytes(charset);
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        System.out.print(charset + ": ");
        for (byte b : bytes) {
            System.out.printf("%02x ", b);
        }
        System.out.println();
    }

    // Prints the underlying UTF-16 code units (chars) in hex.
    private static void printChars(String s) {
        char[] chars = s.toCharArray();
        System.out.print("underlying chars: ");
        for (char c : chars) {
            System.out.print(Integer.toHexString(c) + " ");
        }
        System.out.println();
    }
}
Results:
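- The encoding of the .java source file does not affect the character encoding used at run time, provided the compiler decodes the source correctly (for example via javac's -encoding option). The literal here is written with \u escapes, so the source is pure ASCII anyway.
- Whatever encoding the .java file uses, string constants are stored in the compiled .class file as modified UTF-8 bytes. Modified UTF-8 differs from standard UTF-8 in that a supplementary character is stored as two 3-byte encoded surrogates rather than one 4-byte sequence.
- At run time, when the class is loaded, these constants are converted to UTF-16 and held in memory as a sequence of char values. Since JDK 9 the backing store is a byte[] that holds either Latin-1 or UTF-16, but the String API still behaves as a sequence of UTF-16 code units.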
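- String.length() returns the number of UTF-16 code units (char values). UTF-16 encodes a character in either 2 or 4 bytes: a 2-byte character takes 1 char, and a 4-byte character takes 2 chars (a surrogate pair). So length() is not necessarily the number of characters.
- String.codePointCount() returns the number of Unicode code points in the given range of the UTF-16 sequence, which is the actual number of characters in the Unicode sense.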
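- String.getBytes() converts the UTF-16 sequence into a byte array in the specified charset. If no charset is specified, it uses the platform default charset (Charset.defaultCharset()), which is UTF-8 by default since JDK 18 and depends on the OS locale and the file.encoding property on earlier versions.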
- In Java, a char is a UTF-16 code unit, so a char[] always holds UTF-16 data.
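
As a rough illustration of the points in the list, the sketch below compares the code-unit count with the code-point count for the same string and shows the bytes produced by explicitly named charsets. The class name CodePointDemo and the helpers hexBytes and codePoints are made up for this example and are not part of the original program.

```java
import java.nio.charset.Charset;

class CodePointDemo {
    // Hypothetical helper: encode s with the named charset and return the bytes as hex.
    static String hexBytes(String s, String charsetName) {
        byte[] bytes = s.getBytes(Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Hypothetical helper: list the code points of s in U+XXXX notation.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // One supplementary character, stored as a surrogate pair.
        String s = "\uD835\uDD6B";

        System.out.println("length(): " + s.length());                              // 2
        System.out.println("codePointCount(): " + s.codePointCount(0, s.length())); // 1
        System.out.println("code points: " + codePoints(s));                        // U+1D56B

        // Naming the charset makes the result independent of the platform default.
        System.out.println("UTF-8: " + hexBytes(s, "UTF-8"));   // f0 9d 95 ab
        System.out.println("UTF-16: " + hexBytes(s, "UTF-16")); // fe ff d8 35 dd 6b (BOM first)

        // The charset that the no-argument getBytes() would use on this machine.
        System.out.println("default charset: " + Charset.defaultCharset());
    }
}
```

Passing the charset explicitly, as hexBytes does, is usually the safer choice, since the result of the no-argument getBytes() can vary between machines and JDK versions.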