Firebird Chinese character set problems and the "malformed string" error



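By default, Firebird (fb) creates databases with the NONE character set. NONE is not really a character set at all: strings are treated as raw byte streams (byte arrays). Because of how multi-byte encodings work, a search for "%王%" may return rows that do not contain 王 at all (sorting is affected for the same reason, so the order of the data may look odd). In addition, the database does no validation on strings when it stores them (character set validation is, after all, a correctness check). A stored string can therefore be in any encoding, which looks flexible, but when you read the string back you have no way of knowing its character set, so it may be decoded incorrectly and come out garbled.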
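
The false match for "%王%" can be reproduced outside the database. Below is a minimal Python sketch, assuming the stored text is GBK-encoded. The helper names are made up for illustration, and a byte-level substring test stands in for the way a NONE column compares strings.

```python
# Sketch: why byte-wise matching on GBK data can report a character that is not there.
# Assumption: the column holds GBK bytes and is compared byte by byte (as with NONE).

NEEDLE = "王".encode("gbk")  # two bytes in GBK


def bytewise_contains(stored: bytes, needle: bytes) -> bool:
    """Substring test on raw bytes, ignoring character boundaries."""
    return needle in stored


def charwise_contains(stored: bytes, needle: bytes, encoding: str = "gbk") -> bool:
    """Substring test after decoding, so matches respect character boundaries."""
    return needle.decode(encoding) in stored.decode(encoding)


# Two GBK characters whose inner bytes happen to spell NEEDLE:
# the trail byte of the first is NEEDLE[0], the lead byte of the second is NEEDLE[1].
stored = bytes([0xB0, NEEDLE[0], NEEDLE[1], 0xB0])

print(bytewise_contains(stored, NEEDLE))  # True: a false positive
print(charwise_contains(stored, NEEDLE))  # False: neither character is 王
```

With a real character set on the column, the server compares characters rather than bytes, which corresponds to the second helper.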


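NONE is nothing to be afraid of, though. From a "black cat, white cat" pragmatic point of view (whatever works is fine), it is a practical choice for both the system and its developers. You just need to understand the potential problems and how to deal with them. As far as I can tell, the character set of an existing fb database cannot be changed. So you may need to check whether a newer version addresses this, or write your own data migration tool to move data between two databases with different character sets.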

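On Simplified Chinese Windows, the default encoding for files and for keyboard input is GBK. Things get a little more complicated if your program is web-based or has Linux clients, because the web and Linux may use UTF encoding for Chinese, which leads to garbled display or wrong query results. If the database is set to GBK, the server validates Chinese text and rejects Chinese input encoded as UTF (malformed string), which also gives you a timely reminder that the encoding needs to be set correctly.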
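
The rejection described above can be imitated with a strict decode. This is only an analogy in Python, not Firebird's own validation code: `is_well_formed` is a hypothetical helper, and Python's `gbk` codec stands in for the server-side check on a GBK database.

```python
# Sketch: UTF-8 bytes sent where GBK is expected fail a strict well-formedness check.
# Assumption: Python's "gbk" codec approximates the check a GBK database performs.


def is_well_formed(data: bytes, encoding: str) -> bool:
    """Return True if data decodes cleanly in the given encoding."""
    try:
        data.decode(encoding)  # strict mode raises on invalid byte sequences
        return True
    except UnicodeDecodeError:
        return False


utf8_bytes = "王".encode("utf-8")  # 3 bytes: e7 8e 8b
gbk_bytes = "王".encode("gbk")     # 2 bytes

print(is_well_formed(gbk_bytes, "gbk"))   # True
print(is_well_formed(utf8_bytes, "gbk"))  # False: the last byte starts a pair that never completes
```

Not every UTF-8 byte sequence fails such a check. Some may happen to form valid GBK pairs and silently turn into the wrong characters, so a string that is accepted does not prove the client encoding is right.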

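Setting the encoding involves two parts:
1. The database character set. This one is simple: set it in the CREATE DATABASE statement (the DEFAULT CHARACTER SET clause).
2. The character set the database client declares when it connects. How you set it depends on the connection method you use. For example, in Java:
jdbc:firebirdsql://localhost//home/databases/sample.gdb?lc_ctype=gb2312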

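The connection encoding tells the server which encoding the client's strings use. So a Simplified Chinese Windows client would use gb2312/gbk, but if your database is set to NONE there is no need to tell the server.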





Malformed string


If you see this error, it usually means the client is not able to transform the string you wrote into a character set acceptable to the Firebird server. Here's a short explanation of how this works:

In your client program, you type in some text, which is then shown on the screen. This text has the same character set as the rest of your environment. Typically it is UTF8 on Linux, and one of the various WIN-xxxx charsets on Windows (WIN-1252 in Western Europe, WIN-1250 in Eastern Europe, etc.).

Your client tool should then transliterate this to the connection character set of your Firebird connection, and the Firebird server transliterates that to the character set of the actual columns when it stores the string into the database. If your client tool is not 'smart' enough to do the transliteration, you should set the connection character set to be the same as your environment character set. For example, if you're using isql on a Western European Windows system, you should type this at the beginning of your isql session:

SET NAMES WIN1252;

With this, you can even work with UTF8 databases while using isql. Some tools (like, for example, FlameRobin) are more advanced and do the needed transliteration for you, so you can use some other character set for connection (for example UTF8) without problems.
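
The transliteration step can be sketched in a few lines of Python. This is an illustration only: `transliterate` is a hypothetical helper, Python's `cp1252` and `utf-8` codecs stand in for the WIN1252 and UTF8 character sets, and real client tools and the Firebird server do this work internally.

```python
# Sketch: transliterating text from the environment charset to the connection charset.
# Assumption: the environment is WIN1252 (Python codec "cp1252") and the connection is UTF8.


def transliterate(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from the source charset into the target charset."""
    return data.decode(source).encode(target)


typed = "café".encode("cp1252")  # what a WIN1252 environment hands over: b'caf\xe9'

# Passing the bytes through unchanged on a UTF8 connection yields a malformed string.
try:
    typed.decode("utf-8")
    passed_through_ok = True
except UnicodeDecodeError:
    passed_through_ok = False

sent = transliterate(typed, "cp1252", "utf-8")

print(passed_through_ok)  # False
print(sent)               # b'caf\xc3\xa9'
```

Declaring the connection as WIN1252 with SET NAMES approaches the same problem from the other side: the bytes are left alone and the server is told how to interpret them.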



Reposted from blog.csdn.net/sq8706/article/details/7098567