How Java recognizes and reads text files with different encodings

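As most people know, Windows Notepad can save a .txt file in four encodings: ANSI (GBK on a Chinese-language system), UTF-8, Unicode (UTF-16 little endian) and Unicode big endian (UTF-16BE). Apart from ANSI, each of them writes a byte order mark (BOM) at the start of the file: EF BB BF for UTF-8, FF FE for Unicode and FE FF for Unicode big endian. GBK has no BOM, and neither does UTF-8 saved without one (which newer Notepad versions also offer), so such files cannot be recognized this way. To avoid garbled characters, read these header bytes before reading the text and pick the decoding charset accordingly. The method codeString below reads the first two bytes and maps them to a charset name.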
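To see where those header bytes come from, the sketch below encodes the BOM character U+FEFF with each of Java's standard UTF charsets and prints the resulting bytes. The class BomBytes and its hex helper are illustrative names, not part of any library.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative helper (hypothetical name): shows the byte order mark each UTF encoding writes
final class BomBytes {

    // Encode U+FEFF (the BOM character) with the given charset and render the bytes as hex
    static String hex(Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : "\uFEFF".getBytes(cs)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("UTF-8    : " + hex(StandardCharsets.UTF_8));    // EF BB BF
        System.out.println("UTF-16LE : " + hex(StandardCharsets.UTF_16LE)); // FF FE
        System.out.println("UTF-16BE : " + hex(StandardCharsets.UTF_16BE)); // FE FF
    }
}
```

The UTF-16LE and UTF-16BE charsets are used here rather than plain UTF-16, because Java's UTF-16 encoder prepends its own BOM and would print FE FF twice.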
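import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Determine the encoding of a text file from its first two bytes (the BOM).
 * @param fileName path of the file
 * @return a charset name that Java can use to decode the file
 * @throws IOException if the file cannot be read
 */
public static String codeString(String fileName) throws IOException {
    // try-with-resources closes the stream, which the original version never did
    try (InputStream bin = new BufferedInputStream(new FileInputStream(fileName))) {
        // Combine the first two bytes into one 16-bit value: first byte high, second byte low.
        // For an empty or one-byte file read() returns -1 and p matches no case below.
        int p = (bin.read() << 8) + bin.read();
        String code;

        switch (p) {
            case 0xefbb: // start of the UTF-8 BOM (full BOM: EF BB BF)
                code = "UTF-8";
                break;
            case 0xfffe: // FF FE: Notepad's "Unicode"; Java treats "Unicode" as an alias of UTF-16,
                         // whose decoder reads this BOM as little endian and drops it
                code = "Unicode";
                break;
            case 0xfeff: // FE FF: Notepad's "Unicode big endian"
                code = "UTF-16BE";
                break;
            default:     // no recognized BOM: assume the ANSI code page (GBK)
                code = "GBK";
        }

        return code;
    }
}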
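The method above checks only two bytes and sends every BOM-less file to GBK, so a UTF-8 file saved without a BOM comes out garbled. A minimal sketch of a stricter variant, assuming you can afford to read the whole file into memory: it checks the full three-byte UTF-8 BOM, and when there is no BOM it tries a strict UTF-8 decode and falls back only if that fails. The class EncodingSniffer and its methods are hypothetical names, and the strict-UTF-8 test is a heuristic added here, not part of the original method.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not from the original article): BOM check plus a strict UTF-8 test
final class EncodingSniffer {

    // Decide the charset from raw bytes; `fallback` is used when nothing else matches
    static Charset detect(byte[] b, Charset fallback) {
        // Full 3-byte UTF-8 BOM: EF BB BF
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (b.length >= 2) {
            int p = ((b[0] & 0xFF) << 8) | (b[1] & 0xFF);
            if (p == 0xFFFE) return StandardCharsets.UTF_16LE; // Notepad "Unicode"
            if (p == 0xFEFF) return StandardCharsets.UTF_16BE; // Notepad "Unicode big endian"
        }
        // No BOM: accept UTF-8 only if every byte sequence is valid UTF-8
        return isStrictUtf8(b) ? StandardCharsets.UTF_8 : fallback;
    }

    // A REPORT decoder throws on malformed input instead of substituting U+FFFD
    static boolean isStrictUtf8(byte[] b) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(b));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // File convenience wrapper; GBK as the fallback mirrors the original method's default
    static Charset detect(Path file) throws IOException {
        return detect(Files.readAllBytes(file), Charset.forName("GBK"));
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.println(detect(Paths.get(args[0])));
        }
    }
}
```

A pure-ASCII file passes the UTF-8 test, which is harmless because ASCII bytes mean the same thing in GBK. Typical GBK text, such as the bytes D6 D0 for 中, is not valid UTF-8 and falls through to the fallback. Note that UTF_16LE and UTF_16BE, like UTF-8, keep the BOM as U+FEFF at the start of the decoded text.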
Then read the text using the detected encoding:
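// Requires java.io.BufferedReader, java.io.FileInputStream and java.io.InputStreamReader.
// file is the file to read; code is the charset name returned by codeString() above.
StringBuilder sBuffer = new StringBuilder();
try (BufferedReader in = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), code))) {
    String strTmp;
    // Read line by line; readLine() strips the line terminator, so add "\n" back.
    // With "UTF-8" or "UTF-16BE" the first line still starts with the BOM (U+FEFF).
    while ((strTmp = in.readLine()) != null) {
        sBuffer.append(strTmp).append('\n');
    }
}
return sBuffer.toString();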
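Java's UTF-8, UTF-16BE and UTF-16LE decoders do not remove the byte order mark, so the first line read above begins with an invisible U+FEFF character (only the "Unicode"/UTF-16 charset consumes it). A minimal sketch that reads a file line by line and drops that leading character; the class BomAwareReader and method readText are hypothetical names, and the charset is assumed to come from a detection step like the one described earlier.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: read a text file line by line and drop a leading BOM character
final class BomAwareReader {

    static String readText(Path file, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        // Files.newBufferedReader throws MalformedInputException on bytes that are invalid
        // in `cs`, instead of silently substituting U+FFFD like InputStreamReader does
        try (BufferedReader in = Files.newBufferedReader(file, cs)) {
            String line;
            boolean first = true;
            while ((line = in.readLine()) != null) {
                // The BOM survives decoding as U+FEFF at the very start of the first line
                if (first && !line.isEmpty() && line.charAt(0) == '\uFEFF') {
                    line = line.substring(1);
                }
                first = false;
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java BomAwareReader <file> <charset-name>
        if (args.length > 1) {
            System.out.print(readText(Paths.get(args[0]), Charset.forName(args[1])));
        }
    }
}
```

Stripping only at the start of the first line matters: U+FEFF elsewhere in a file is a legitimate (if rare) zero-width no-break space and should be left alone.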
