A detailed look at the size_t type in C [Reprint]

Many C library functions have parameters of type size_t in their prototypes, yet we rarely use size_t in our own programs. So what exactly is this type for?

Using size_t may improve the portability, efficiency, or readability of your code, or perhaps all three at the same time.

　　Many functions in the standard C library take parameters or return values that represent object sizes in bytes. For example, the parameter n of malloc(n) specifies how many bytes to allocate, and the last parameter of memcpy(s1, s2, n) specifies how many bytes to copy. The return value of strlen(s) is the length of the string terminated by '\0', not counting the '\0' itself, so it is one less than the number of bytes the string actually occupies.
　　You might expect these parameters and return values to be declared as int (or perhaps long or unsigned), but they are not. The C standard declares them as size_t. The standard's declaration of malloc appears in <stdlib.h> as:

void *malloc(size_t n);

　　The declarations of memcpy and strlen appear in <string.h>:

void *memcpy(void *s1, void const *s2, size_t n);
size_t strlen(char const *s);
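
　　As a minimal sketch of how these three declarations fit together, the helper below duplicates a string. The name copy_string is hypothetical and is not part of the standard library; the point is only that the length returned by strlen, the byte count passed to malloc, and the byte count passed to memcpy are all size_t values.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a standard function): returns a heap-allocated
   copy of s, or NULL if the allocation fails. The caller frees the result. */
char *copy_string(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);        /* length without the terminating '\0' */
    char *copy = malloc(len + 1);  /* malloc takes a size_t byte count */
    if (copy != NULL) {
        memcpy(copy, s, len + 1);  /* copy the characters plus the '\0' */
    }
    return copy;
}
```

　　Because len is a size_t, no conversion is needed between the three calls.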

　　size_t also appears often in the C++ standard library. In addition, the C++ library often uses a related type, size_type, possibly even more than size_t.
　　As far as I can tell, most C and C++ programmers are uneasy about the use of size_t in these libraries, because they do not know what size_t represents or why the libraries need it. Ultimately, they do not know when and where they should use it themselves.

Portability issues

　　The early C language (described by Brian Kernighan and Dennis Ritchie in The C Programming Language, Prentice-Hall, 1978) did not provide the size_t type. The C standards committee introduced size_t to solve a portability problem. For example, let's try to write a portable declaration of the standard memcpy function. We will look at several candidate declarations and at how well they work on platforms with different address-space sizes.
　　Recall that memcpy(s1, s2, n) copies n bytes from the address pointed to by s2 to the address pointed to by s1, and returns s1. The function can copy objects of any type, so the pointer parameters and the return value should be void *, a pointer to any type. The source object should not be modified, so the second parameter s2 should be const void *. None of this is a problem.
　　The real problem is how we declare the third parameter, which represents the size of the source object. I believe most programmers will choose int:

void *memcpy(void *s1, void const *s2, int n);

　　Using int works in most cases, but not all. int is signed and can represent negative numbers, whereas a size can never be negative. So we can use unsigned int instead, which lets the third parameter represent a larger range.
　　On most machines, the maximum value of unsigned int is about twice the maximum value of int (twice plus one). For example, on a 16-bit machine, the maximum value of unsigned int is 65535 and the maximum value of int is 32767.
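
　　The relationship between the two ranges can be checked with the macros in <limits.h>. The sketch below is only an illustration; the function name is hypothetical, and the result of 1 is what mainstream platforms give, not something the C standard promises for every implementation.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical check: returns 1 if UINT_MAX == 2 * INT_MAX + 1 here. */
int unsigned_max_is_double_plus_one(void)
{
    /* int and unsigned int always occupy the same amount of storage. */
    assert(sizeof(int) == sizeof(unsigned int));

    /* Do the arithmetic in unsigned int so that it cannot overflow int. */
    return (unsigned int)INT_MAX * 2u + 1u == UINT_MAX;
}
```

　　On a 16-bit int this is the 32767 and 65535 pair mentioned above.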
　　Although the size of int depends on the C implementation, on any given platform an int object and an unsigned int object have the same size. Therefore declaring the third parameter as unsigned int costs no more than declaring it as int:

void *memcpy(void *s1, void const *s2, unsigned int n);

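　　This seems to solve the problem: unsigned int can now represent the size of the largest object. But that holds only when int and pointers have the same size. For example, on IP16 both int and pointers occupy 2 bytes (16 bits), and on IP32 both occupy 4 bytes (32 bits). (See the C data model notation below.)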

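C data model notation
　　Recently I came across several articles that use a concise notation to describe how C data types are implemented on different target platforms. I have not found the origin of the notation, a formal syntax for it, or even a name, but it seems simple enough to use without a formal definition. Its general form is:
　　I nI L nL LL nLL P nP
　　Each capital letter (or pair of letters) stands for a C data type, and each corresponding n is the number of bits in that type. I stands for int, L for long, LL for long long, and P for pointer (to data, not to function). Every letter and number is optional.
　　For example, an I16P32 architecture supports 16-bit int and 32-bit pointers, and says nothing about whether it supports long or long long. If two consecutive types have the same size, the first number is usually omitted. For example, you can write I16L32P32 as I16LP32, an architecture with 16-bit int, 32-bit long, and 32-bit pointers.
　　The notation usually groups the letters so that their numbers appear in ascending order. For example, IL32LL64P32 denotes an architecture with 32-bit int, 32-bit long, 64-bit long long, and 32-bit pointers; however, it is usually written ILP32LL64.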
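
　　To see which data model a given compiler uses, the widths can be computed from sizeof and CHAR_BIT. The function below is a hypothetical sketch, not an established utility. It writes the long form of the notation with every number spelled out, and does not group letters or omit repeated numbers. It assumes a C99 library, for the %zu conversion.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: writes a string such as "I32L64LL64P64" into buf
   and returns the value reported by snprintf. */
int describe_data_model(char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "I%zuL%zuLL%zuP%zu",
                    sizeof(int) * CHAR_BIT,        /* bits in int */
                    sizeof(long) * CHAR_BIT,       /* bits in long */
                    sizeof(long long) * CHAR_BIT,  /* bits in long long */
                    sizeof(void *) * CHAR_BIT);    /* bits in a data pointer */
}
```

　　The exact string depends on the platform and compiler, so no particular output is assumed here.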

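　　Unfortunately, this declaration of memcpy falls short on an I16LP32 architecture (16-bit int, 32-bit long and pointers), such as the first-generation Motorola 68000 processor. There the processor may need to copy objects larger than 65535 bytes, but the third parameter n cannot represent such a size.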
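
　　The effect of a byte count that is too narrow can be imitated on any machine. In the sketch below, uint16_t stands in for the 16-bit unsigned int of an I16LP32 platform. That substitution and the function name are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration: what a byte count becomes when it is passed
   through a 16-bit unsigned parameter. */
uint16_t as_16_bit_count(unsigned long n)
{
    uint16_t truncated = (uint16_t)n;
    /* Conversion to an unsigned type is performed modulo 2^16. */
    assert(truncated == n % 65536UL);
    return truncated;
}
```

　　A request to copy 70000 bytes would silently arrive as 4464 bytes, because 70000 - 65536 = 4464.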
　　What? You say that's easy to fix? Just change the type of memcpy's third parameter:

void *memcpy(void *s1, void const *s2, unsigned long n);

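　　Now the function works on an I16LP32 target, where it can handle larger objects. It also works on IP16 and IP32 platforms, so it is indeed a reasonably portable declaration of memcpy. However, on an IP16 platform, using unsigned long instead of unsigned int can make your code noticeably less efficient (larger and slower).
　　Standard C requires long (signed or unsigned) to occupy at least 32 bits, so an IP16 platform that supports standard C must really be an IP16L32 platform. Such platforms typically implement a 32-bit long as a pair of 16-bit words. Moving a long then takes two machine instructions, each moving one 16-bit chunk. In fact, most 32-bit operations on these platforms take at least two instructions.
　　So declaring memcpy's third parameter as unsigned long in the name of portability, at the price of performance on some platforms, is not what we want. Using size_t avoids this trade-off.
　　size_t is a typedef for some unsigned integer type, typically unsigned int or unsigned long, or possibly even unsigned long long. Each standard C implementation is supposed to choose an unsigned integer type large enough to represent the size of the largest possible object on that platform, but no larger than needed.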
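
　　With size_t available, a portable memcpy can be sketched as follows. This is a simple byte-by-byte illustration under the name my_memcpy, which is hypothetical; real library implementations are usually far more optimized.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte-by-byte copy with the same shape as memcpy. */
void *my_memcpy(void *s1, const void *s2, size_t n)
{
    unsigned char *dst = s1;
    const unsigned char *src = s2;

    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; ++i) {  /* the index has the same type as n */
        dst[i] = src[i];
    }
    return s1;
}
```

　　The same source compiles on IP16, I16LP32, and IP32 targets, and on each one n is exactly as wide as the platform needs.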

Using size_t

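　　size_t is defined in the standard C headers <stddef.h>, <stdio.h>, <stdlib.h>, <string.h>, <time.h>, and <wchar.h>. It also appears in the corresponding C++ headers, <cstddef>, <cstdio>, and so on. You should include at least one of these headers before using size_t.
　　Including any of the C headers (in a program compiled as either C or C++) declares size_t as a global name. Including any of the C++ headers (something you can do only in C++) defines size_t as a member of namespace std.
　　By definition, size_t is the type of the result of the sizeof operator (note: sizeof is an operator that is spelled as a keyword; it is not a function). So n should be declared appropriately for an assignment such as: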

n = sizeof(thing);

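　　For portability and efficiency, n should be declared as size_t. Similarly, the parameter of the function foo in the call below should also be declared as size_t: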

foo(sizeof(thing));
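
　　Put together, the two fragments above might look like the sketch below. The struct thing type, the body given to foo, and the make_thing wrapper are all hypothetical; the original text names only n, thing, and foo.

```c
#include <assert.h>
#include <stdlib.h>

struct thing { int id; double value; };  /* hypothetical type */

/* Hypothetical foo: allocates n zero-initialized bytes. The parameter is
   size_t because callers pass it the result of sizeof. */
void *foo(size_t n)
{
    assert(n > 0);
    return calloc(1, n);
}

struct thing *make_thing(void)
{
    size_t n = sizeof(struct thing);  /* sizeof yields a size_t */
    return foo(n);                    /* no conversion at the call */
}
```

　　The caller owns the returned memory and releases it with free.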

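　　Functions with size_t parameters often have local variables that count up to or index into an array; size_t is a good type for those variables as well.
　　Using size_t appropriately also makes your code somewhat self-documenting. When you see an object declared as size_t, you know immediately that it represents a size in bytes or an array index, rather than an error code or a general arithmetic value.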
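
　　As a small illustration of both points, the hypothetical function below takes an element count as size_t and uses size_t for its local indices. Nothing here comes from a real library; it is only a sketch of the style being recommended.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: returns the index of the first largest element.
   count must be greater than zero. */
size_t index_of_max(const int *values, size_t count)
{
    assert(values != NULL && count > 0);

    size_t best = 0;                  /* an index, so size_t */
    for (size_t i = 1; i < count; ++i) {
        if (values[i] > values[best]) {
            best = i;
        }
    }
    return best;
}
```

　　A caller can pass sizeof v / sizeof v[0] for an array v directly, since that expression is already a size_t. The return type also signals that the result is an index rather than an error code.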