A UTF-8 character may take 1 to 4 bytes, but MySQL's utf8 character set supports at most 3 bytes per character, while emoji sent from mobile devices are 4-byte characters. If you insert emoji directly into a database that uses utf8, the Java program reports an SQL exception:
java.sql.SQLException: Incorrect string value: '\xF0\x9F\x92\x94' for column 'name' at row 1
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3593)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3525)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1986)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2140)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2620)
    at com.mysql.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1662)
    at com.mysql.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1581)
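The bytes in the error message are the UTF-8 encoding of a single emoji (U+1F494, broken heart). The following is a minimal Java sketch, not part of the original program, that reproduces those bytes and detects strings containing 4-byte characters; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class EmojiBytes {
    // Formats the UTF-8 bytes of s as \xNN escapes, the notation used in the MySQL error.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // True if s contains a code point above U+FFFF, which needs 4 bytes in UTF-8.
    static boolean hasFourByteChar(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        String brokenHeart = new String(Character.toChars(0x1F494));
        System.out.println(utf8Hex(brokenHeart));            // prints \xF0\x9F\x92\x94
        System.out.println(hasFourByteChar(brokenHeart));    // prints true
        System.out.println(hasFourByteChar("\u4E2D\u6587")); // prints false (3-byte characters)
    }
}
```

A check like hasFourByteChar can tell you in advance which strings a 3-byte utf8 column would reject.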
One workaround is to encode 4-byte characters before storing them and decode them after reading them back. But then encoding and decoding must be done everywhere the characters are used.
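To illustrate this workaround, here is a minimal sketch that uses Base64 as the encoding. Base64 is only an assumption chosen for the example; the source does not say which encoding is meant, and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical codec, for illustration only; Base64 is an assumed choice of encoding.
class EmojiCodec {
    // Encode before INSERT/UPDATE: the result is plain ASCII, which a utf8 column accepts.
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Decode after SELECT to get the original text, including emoji, back.
    static String decode(String stored) {
        return new String(Base64.getDecoder().decode(stored), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "hi " + new String(Character.toChars(0x1F494));
        String stored = encode(original);
        System.out.println(stored);
        System.out.println(decode(stored).equals(original)); // prints true
    }
}
```

Every read and write path has to call these two methods, and the stored value can no longer be searched or sorted as text, which is the cost described above.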
utf8mb4 encoding is a superset of utf8 encoding, compatible with utf8, and can store 4-byte emoji characters.
The advantage of using utf8mb4 encoding is that when storing and retrieving data, there is no need to consider the encoding and decoding of emoji characters.
Change database encoding to utf8mb4
1. MySQL version
utf8mb4 requires MySQL 5.5.3 or later. If your version is older, please upgrade to a newer one.
2. MySQL driver
Version 5.1.34 of the MySQL JDBC driver works; the driver version must not be lower than 5.1.13.
3. Modify the MySQL configuration file
Modify the MySQL configuration file my.cnf (my.ini on Windows).
my.cnf is usually located at /etc/mysql/my.cnf. Once you have found it, add the following settings to these three sections:
[client]
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
[mysqld]
character-set-client-handshake = FALSE #Ignore client handshake encoding
character-set-server = utf8mb4
# collation-server = utf8mb4_unicode_ci (optional; may be left unconfigured)
# init_connect='SET NAMES utf8mb4' (optional; may be left unconfigured)
4. Restart the database and check the variables
SHOW VARIABLES WHERE Variable_name LIKE 'character_set_%' OR Variable_name LIKE 'collation%';
Variable_name | Value |
character_set_client | utf8mb4 |
character_set_connection | utf8mb4 |
character_set_database | utf8mb4 |
character_set_filesystem | binary |
character_set_results | utf8mb4 |
character_set_server | utf8mb4 |
character_set_system | utf8 |
collation_connection | utf8mb4_unicode_ci |
collation_database | utf8mb4_unicode_ci |
collation_server | utf8mb4_unicode_ci |
The values of collation_connection, collation_database, and collation_server do not matter.
But the following must be guaranteed:
System Variable | Description |
character_set_client | (Character set used by client source data) |
character_set_connection | (connection layer character set) |
character_set_database | (The default character set of the currently selected database) |
character_set_results | (Query result character set) |
character_set_server | (default internal operation character set) |
These variables must be utf8mb4.
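The check above can be automated. The following is a rough sketch that takes the variables as a map (for example, filled from the result of the SHOW VARIABLES query) and reports which of the five required ones are not utf8mb4. The class name and the way the map is populated are assumptions; the values here are hard-coded from the sample output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical checker, for illustration only.
class CharsetCheck {
    // The five variables that must be utf8mb4.
    static final List<String> REQUIRED = Arrays.asList(
            "character_set_client",
            "character_set_connection",
            "character_set_database",
            "character_set_results",
            "character_set_server");

    // Returns the required variables whose value is missing or not utf8mb4.
    static List<String> notUtf8mb4(Map<String, String> variables) {
        List<String> bad = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!"utf8mb4".equalsIgnoreCase(variables.get(name))) {
                bad.add(name);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        Map<String, String> vars = new LinkedHashMap<>();
        for (String name : REQUIRED) {
            vars.put(name, "utf8mb4");
        }
        vars.put("character_set_system", "utf8"); // not required, so it is ignored
        System.out.println(notUtf8mb4(vars)); // prints []

        vars.put("character_set_results", "utf8");
        System.out.println(notUtf8mb4(vars)); // prints [character_set_results]
    }
}
```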
5. Configuration of database connection
In the JDBC connection parameters:
characterEncoding=utf8 is automatically recognized as utf8mb4; if this parameter is omitted, the encoding is detected automatically.
autoReconnect=true must also be added.
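As a small sketch, a connection URL with these two parameters could be assembled like this. The host, port, and class name are placeholders chosen for the example, not values from the original setup.

```java
// Hypothetical URL builder, for illustration only.
class JdbcUrl {
    // Builds a MySQL JDBC URL carrying the two parameters discussed above.
    static String build(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?characterEncoding=utf8&autoReconnect=true";
    }

    public static void main(String[] args) {
        // localhost and 3306 are assumed placeholder values.
        System.out.println(build("localhost", 3306, "caitu99"));
        // prints jdbc:mysql://localhost:3306/caitu99?characterEncoding=utf8&autoReconnect=true
    }
}
```

The resulting string is what you would pass to java.sql.DriverManager.getConnection(url, user, password), or put into your connection pool configuration.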
6. Convert the database and already built tables to utf8mb4
Change the database encoding:
ALTER DATABASE caitu99 CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
Change the table encoding:
ALTER TABLE TABLE_NAME CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
Also change the encoding of individual columns if necessary.
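When there are many existing tables, the ALTER TABLE statement can be generated once per table instead of being typed by hand. The following is a minimal sketch; the table names and the class name are invented for the example, and the generated statements should be reviewed before being run.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical statement generator, for illustration only.
class ConvertSql {
    // Builds the conversion statement for one table.
    static String alterTable(String table) {
        // Reject backticks so the quoted identifier cannot be broken out of.
        if (table.isEmpty() || table.contains("`")) {
            throw new IllegalArgumentException("invalid table name: " + table);
        }
        return "ALTER TABLE `" + table
                + "` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;";
    }

    public static void main(String[] args) {
        // Placeholder table names, not taken from the original database.
        List<String> tables = Arrays.asList("user", "comment");
        for (String table : tables) {
            System.out.println(alterTable(table));
        }
    }
}
```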
See: https://www.cnblogs.com/shihaiming/p/5855616.html
See: http://blog.csdn.net/woslx/article/details/49685111