40. Installing the libiconv library on Linux: GB2312 and UTF-8 transcoding test

One: Introduction to the libiconv library:

The libiconv library provides the iconv() function, which lets an application convert text from one character encoding to another.

With the advent of the Internet era, more and more text is exchanged over the network, and character encoding conversion becomes particularly important, for example when browsing foreign websites. The problem is that many characters do not exist in a given encoding. Unicode was created to resolve this confusion: it is a superset that contains the characters of all these encodings, which is why some newer text formats such as XML use Unicode by default.

However, many older computers still use traditional local character encodings, and some programs, such as mail clients and browsers, must be able to convert between these different encodings. Other programs support Unicode internally to handle internationalization smoothly, but still need to convert between Unicode and the traditional encodings. GNU libiconv is a code conversion library designed for both kinds of application.

Two: Download:

Official website download:

http://www.gnu.org/software/libiconv/

Three: Compile:

sudo tar -zxvf libiconv-1.15.tar.gz -C .
sudo chown -R aston libiconv-1.15/
cd libiconv-1.15
mkdir install_lib
sudo ./configure --prefix=/home/aston/huawei/libiconv-1.15/install_lib
sudo make
sudo make install

Four: Test:

1. Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <iconv.h>

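/* Convert inlen bytes of inbuf from from_charset to to_charset.
   The result is written to outbuf (outlen bytes) and NUL-terminated.
   Returns 0 on success, -1 on failure. */
int code_convert(char *from_charset, char *to_charset, char *inbuf, size_t inlen, char *outbuf, size_t outlen)
{
	iconv_t cd;
	char **pin = &inbuf;
	char **pout = &outbuf;

	if (outlen == 0)
	{
		return -1;
	}

	/* iconv_open() reports failure as (iconv_t)-1, not 0. */
	cd = iconv_open(to_charset, from_charset);
	if (cd == (iconv_t)-1)
	{
		return -1;
	}

	memset(outbuf, 0, outlen);
	outlen--;	/* keep one byte free for the terminating NUL */
	if (iconv(cd, pin, &inlen, pout, &outlen) == (size_t)-1)
	{
		iconv_close(cd);	/* do not leak the descriptor on error */
		return -1;
	}

	iconv_close(cd);
	**pout = '\0';	/* iconv() advanced *pout to the end of the output */

	return 0;
}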

/* UTF-8 -> GB2312 */
int u2g(char *inbuf, size_t inlen, char *outbuf, size_t outlen)
{
	return code_convert((char *)"utf-8", (char *)"gb2312", inbuf, inlen, outbuf, outlen);
}

/* GB2312 -> UTF-8 */
int g2u(char *inbuf, size_t inlen, char *outbuf, size_t outlen)
{
	return code_convert((char *)"gb2312", (char *)"utf-8", inbuf, inlen, outbuf, outlen);
}

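int main()
{
	/* For buf to hold the GB2312 bytes D4 C1, this source file has to be saved in GB2312. */
	char buf[16] = "粤";
	char buf2[16] = {0};

	int i = 0;

	/* buf[i] is a plain char: where char is signed, bytes >= 0x80 are printed sign-extended (e.g. 0XFFFFFFD4). */
	for (i = 0; i < strlen(buf); i ++)
	{
		printf("[%s:%d]:[yang]  buf[%d] = 0X%02X\n",__FUNCTION__,__LINE__,i ,buf[i]);
	}

	printf("[%s:%d]:[yang] buf = %s\n",__FUNCTION__,__LINE__,buf);

	/* Convert the GB2312 bytes in buf to UTF-8 in buf2. */
	if (g2u(buf, strlen(buf), buf2, sizeof(buf2)) != 0)
	{
		printf("[%s:%d]:[yang] g2u failed\n",__FUNCTION__,__LINE__);
		return 1;
	}

	for (i = 0; i < strlen(buf2); i ++)
	{
		printf("[%s:%d]:[yang]  buf2[%d] = 0X%02X\n",__FUNCTION__,__LINE__,i ,buf2[i]);
	}
	printf("[%s:%d]:[yang] buf2 = %s\n",__FUNCTION__,__LINE__,buf2);

	return 0;
}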
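
When a conversion fails, iconv() sets errno, which tells you whether the output buffer was too small or the input was invalid. The following is only a minimal sketch of how that could be reported; convert_checked is a hypothetical helper name, not part of libiconv, and it assumes the input is a NUL-terminated string.

```c
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of libiconv): converts the NUL-terminated
   string `in` from charset `from` to charset `to`.
   Returns the number of bytes written (without the terminator), or -1. */
long convert_checked(const char *from, const char *to,
                     const char *in, char *out, size_t outsize)
{
	iconv_t cd;
	char *pin = (char *)in;	/* iconv() takes char **, the input bytes are not modified */
	char *pout = out;
	size_t inleft = strlen(in);
	size_t outleft;

	if (outsize == 0)
		return -1;
	outleft = outsize - 1;	/* reserve room for the terminating NUL */

	cd = iconv_open(to, from);
	if (cd == (iconv_t)-1) {
		perror("iconv_open");	/* typically EINVAL: conversion not supported */
		return -1;
	}

	if (iconv(cd, &pin, &inleft, &pout, &outleft) == (size_t)-1) {
		switch (errno) {
		case E2BIG:
			fprintf(stderr, "output buffer too small\n");
			break;
		case EILSEQ:
			fprintf(stderr, "invalid byte sequence in input\n");
			break;
		case EINVAL:
			fprintf(stderr, "incomplete byte sequence at end of input\n");
			break;
		default:
			perror("iconv");
			break;
		}
		iconv_close(cd);
		return -1;
	}

	iconv_close(cd);
	*pout = '\0';
	return (long)(pout - out);
}
```

A call such as convert_checked("gb2312", "utf-8", buf, buf2, sizeof(buf2)) would play the same role as g2u() in the test program, with the difference that a failure prints its cause to stderr.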

Compile:

aston@ubuntu:/mnt/hgfs/share/source_insight/main_135/test_libiconv$ make
g++ test_libiconv.cpp -g -I./include -L./lib -liconv -lcharset -lcharset -ldl -lpthread -lz -o app.out

Print: Transcode the character "粤" from GB2312 to UTF-8. Xshell is configured as UTF-8 here, so buf2 (UTF-8) displays correctly, while buf (GB2312) shows up as garbage:

aston@ubuntu:/mnt/hgfs/share/source_insight/main_135/test_libiconv$ ./app.out 
[main:49]:[yang]  buf[0] = 0XFFFFFFD4
[main:49]:[yang]  buf[1] = 0XFFFFFFC1
[main:52]:[yang] buf = Ձ
[main:58]:[yang]  buf2[0] = 0XFFFFFFE7
[main:58]:[yang]  buf2[1] = 0XFFFFFFB2
[main:58]:[yang]  buf2[2] = 0XFFFFFFA4
[main:60]:[yang] buf2 =
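
The 0XFFFFFFD4 style values in the output above appear because the bytes are passed to printf() as plain char; where char is signed, a byte of 0x80 or above is sign-extended to int before %02X formats it. A small sketch of one way to print the bare byte value, using a hypothetical helper named format_byte:

```c
#include <stdio.h>

/* Hypothetical helper: format one byte as "0XNN".
   Casting to unsigned char first prevents sign extension, so the byte
   0xD4 is formatted as 0XD4 rather than 0XFFFFFFD4. */
int format_byte(char c, char *out, size_t outsize)
{
	return snprintf(out, outsize, "0X%02X", (unsigned char)c);
}
```

In the test program the same effect could be had by passing (unsigned char)buf[i] instead of buf[i] to printf().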


2. Configure Xshell as GB2312: now buf displays "粤" and buf2 displays incorrectly:

aston@ubuntu:/mnt/hgfs/share/source_insight/main_135/test_libiconv$ ./app.out 
[main:49]:[yang]  buf[0] = 0XFFFFFFD4
[main:49]:[yang]  buf[1] = 0XFFFFFFC1
[main:52]:[yang] buf =
[main:58]:[yang]  buf2[0] = 0XFFFFFFE7
[main:58]:[yang]  buf2[1] = 0XFFFFFFB2
[main:58]:[yang]  buf2[2] = 0XFFFFFFA4
[main:60]:[yang] buf2 =    // this should be garbled; by coincidence, E7B2 is the GB2312 code of the Chinese character "绮"


3. The encodings of the Chinese character "粤" (Unicode code point U+7CA4): the GB2312 code is D4 C1 and the UTF-8 code is E7 B2 A4, which matches the bytes printed for buf and buf2.
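
The UTF-8 bytes can be derived from the code point by hand. The sketch below is a generic UTF-8 encoder written for illustration (utf8_encode is a hypothetical name, not a libiconv function); feeding it U+7CA4 yields the three bytes E7 B2 A4 that the test program prints for buf2.

```c
#include <stddef.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Returns the number of bytes written to out (1 to 4), or 0 if cp
   is a surrogate or lies outside the Unicode range. */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
	if (cp < 0x80) {
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp < 0x10000) {
		if (cp >= 0xD800 && cp <= 0xDFFF)
			return 0;	/* surrogates are not valid code points */
		out[0] = (unsigned char)(0xE0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp < 0x110000) {
		out[0] = (unsigned char)(0xF0 | (cp >> 18));
		out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
		out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[3] = (unsigned char)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;
}
```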
4. When Xshell is configured as GB2312, buf2 displays "绮" because the first two UTF-8 bytes, E7 B2, happen to be the GB2312 code of "绮"; the remaining byte, A4, does not form a complete GB2312 character on its own.
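
To make the pairing concrete, here is a rough sketch of how a double-byte GB2312 decoder walks a byte string. The function names are hypothetical, and the byte ranges used (lead byte 0xA1 to 0xF7, trail byte 0xA1 to 0xFE) are the usual GB2312 (EUC-CN) ranges rather than anything taken from Xshell itself. For the bytes E7 B2 A4 it finds one complete character (E7 B2) and one leftover byte (A4).

```c
#include <stddef.h>

/* Hypothetical helper: scan bytes the way a double-byte GB2312 decoder
   would. Bytes below 0x80 are single-byte ASCII; a lead byte in
   0xA1..0xF7 followed by a trail byte in 0xA1..0xFE forms one character.
   Returns the number of complete double-byte characters and stores the
   number of bytes that could not be paired in *dangling. */
size_t gb2312_scan(const unsigned char *buf, size_t len, size_t *dangling)
{
	size_t i = 0;
	size_t chars = 0;

	*dangling = 0;
	while (i < len) {
		if (buf[i] < 0x80) {
			i++;			/* ASCII byte */
		} else if (i + 1 < len &&
		           buf[i] >= 0xA1 && buf[i] <= 0xF7 &&
		           buf[i + 1] >= 0xA1 && buf[i + 1] <= 0xFE) {
			chars++;		/* complete double-byte character */
			i += 2;
		} else {
			(*dangling)++;		/* byte with no valid partner */
			i++;
		}
	}
	return chars;
}

/* Row and column number of a GB2312 byte pair within the 94x94 table. */
int gb2312_row(unsigned char lead)  { return lead - 0xA0; }
int gb2312_col(unsigned char trail) { return trail - 0xA0; }
```

By the same arithmetic, the pair D4 C1 printed for buf sits at row 52, column 33 of the GB2312 table.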

Origin blog.csdn.net/yanghangwww/article/details/113101230