The Efficiency of Swapping Two Variables

First of all, now that object-oriented programming is everywhere, I will use the word "object" to refer to variables in the broadest sense. A variable today is not necessarily an integer or a floating-point number, or even a basic data type at all. We will discuss exchanging objects in this broader sense.

In the previous article, "Questions about the exchange of two objects" (note that the title has been changed), we discussed several ways of exchanging two variables and gave a general formula. In this article we discuss efficiency and feasibility. (Note: this topic was prompted mainly by the comments farproc left on the previous article.)

Intermediate variable method

First, let's look at the code for the simplest and most direct way to swap:

{
        int tmp;
        tmp = a;
        a = b;
        b = tmp;
}

Going by the semantics of the language itself, this code does the following:

Allocate space on the stack for the integer variable tmp;
Put the value of a into tmp;
Put the value of b into a;
Put the value of tmp into b;
Release the stack space allocated for tmp.

But what actually happens? Let's take a look at the generated assembly code:

        movl b, %eax; load b from memory to register eax 
        movl a, %edx; load a from memory to register edx 
        movl %eax, a; store the contents of eax into memory a 
        xorl %eax, %eax; clear eax 
        movl %edx, b; store the content of edx in memory b

The assembly turns out to be simpler than we expected. Since a variable must be loaded from memory into a register before it can take part in any operation, it is enough to load the two variables and store them back in the opposite order. There are just four instructions moving data between memory and registers, and no explicit exchange operation at all. And why is eax cleared here? Because eax holds the function's return value, and our test function is very simple: apart from the operations above, all it does is return 0;, so that instruction has nothing to do with the swap. As this shows, the compiler does more work for us than we might think.
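For anyone who wants to reproduce the listing, here is a minimal sketch of what the test function might look like. The original test file is not shown in this article, so the global variables a and b (suggested by the direct memory operands in the assembly) and the function name swap_test are my assumptions.

```c
/* Hypothetical test file: a, b and swap_test are assumed names,
   not taken from the original test program. */
int a = 1;
int b = 2;

int swap_test(void)
{
    {
        int tmp;
        tmp = a;
        a = b;
        b = tmp;
    }
    return 0;   /* the return value is what the cleared eax carries */
}
```

Compiling this with gcc at -O2 and with -S should give something close to the listing above, although the exact instructions will depend on the GCC version and the target platform.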

XOR

Next, let's look at the code based on XOR exchange:

{
        a ^= b;
        b ^= a;
        a ^= b;
}

This code looks very clean: not a single statement is wasted (every operation contributes to the exchange, and no space is allocated for a temporary variable as in the previous example), and the code maps directly onto the operations: three exclusive ORs. Intuitively, it should be the most efficient. Its side effect is that readability drops considerably (and readability matters a great deal), though some people consider that a fair price for the efficiency gained. Let's see whether it is worth it.
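Readability is not the only practical cost. As a small aside on feasibility, the sketch below wraps the three XORs in a pointer-based helper; xor_swap is a hypothetical name of mine, not code from the original test. If both pointers refer to the same object, the first XOR would zero it, so the helper needs a guard that the temporary-variable version does not.

```c
/* xor_swap is a hypothetical helper used only for illustration. */
void xor_swap(int *x, int *y)
{
    if (x == y)     /* without this guard, *x ^= *x would set the value to 0 */
        return;
    *x ^= *y;
    *y ^= *x;
    *x ^= *y;
}
```

The rest of this article looks only at the plain three-statement block on two distinct variables, not at this helper.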

The following is the assembly code generated for the three XOR statements:

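        movl        b, %eax       ; load b from memory into register eax
        movl        a, %ecx       ; load a from memory into register ecx
        movl        %eax, %edx    ; copy the value of eax into edx
        xorl        %ecx, %edx    ; XOR ecx into edx
        xorl        %edx, %eax    ; XOR edx into eax
        xorl        %eax, %edx    ; XOR eax into edx
        movl        %eax, b       ; store the value of eax into memory b
        xorl        %eax, %eax    ; set eax to 0: the return value, as in the previous example
        movl        %edx, a       ; store the value of edx into memory a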

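Oh, this is getting a little dizzying.
In total it uses four data moves between memory and registers, one register-to-register assignment, and three XOR operations.
I was surprised that the compiler would generate assembly like this and suspected a problem with the compilation options (this is the result under -O2), so I tried -O3, and the result was exactly the same. Even more unexpectedly, the output under -O1 turned out to be the most concise. But first let's see what the code above actually does, and whether every operation is necessary.

Analysis of the "unexpected" result

First, let's rewrite the C code above (in hindsight, the C code is just as confusing: I don't really know which steps it goes through, only that it swaps the values of two integer variables):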

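{
        int tmp;

        tmp = a ^ b;      // intermediate XOR result: XORing it with either
                          // a or b yields the other (compare the add and
                          // multiply cases discussed in the first article)
        b = tmp ^ b;      // final value of b: b=(a^b)^b=a^(b^b)=a
        a = tmp ^ a;      // final value of a: a=(a^b)^a=b^(a^a)=b
}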

Now let's translate the assembly code back into C line by line (ignoring the data moves between memory and registers):

        int tmp;        // register edx corresponds to the variable tmp

        tmp = b;
        tmp = a ^ tmp;  // corresponds to tmp = a ^ b;
        
        b = tmp ^ b;
        
        tmp = b ^ tmp;
        a = tmp;        // corresponds to a = tmp ^ b; (b now holds the old a)

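Compared with our rewritten code, the compiler seems a little confused by this one. Our code clearly uses no intermediate variable, yet the compiler not only uses one but also adds two extra assignments.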
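To make the register-level flow easier to follow, here is a rough model of the -O2 listing in C, one statement per instruction. The function name xor_swap_o2_model and the pointer parameters are assumptions of mine; the local variables are simply named after the registers they stand for.

```c
/* A hand-written model of the -O2 assembly, not compiler output.
   Locals are named after the registers they stand for. */
void xor_swap_o2_model(int *pa, int *pb)
{
    int eax = *pb;      /* movl b, %eax                              */
    int ecx = *pa;      /* movl a, %ecx                              */
    int edx = eax;      /* movl %eax, %edx                           */
    edx ^= ecx;         /* xorl %ecx, %edx : edx = a ^ b             */
    eax ^= edx;         /* xorl %edx, %eax : eax = b ^ (a ^ b) = a   */
    edx ^= eax;         /* xorl %eax, %edx : edx = (a ^ b) ^ a = b   */
    *pb = eax;          /* movl %eax, b    : b receives the old a    */
    *pa = edx;          /* movl %edx, a    : a receives the old b    */
}
```

Read this way, both values are already sitting in registers after the first two loads, so the three XORs only recompute values the machine already had.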
Next, let's look at the result produced under -O1:

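        movl        b, %eax       ; load b into register eax
        movl        %eax, %edx    ; copy the value of eax into edx
        xorl        a, %edx       ; XOR memory a into edx, giving the intermediate result
        xorl        %edx, %eax    ; XOR edx into eax, giving the final value of b, i.e. a
        movl        %eax, b       ; store into memory b
        xorl        %eax, %edx    ; XOR eax into edx, giving the final value of a, i.e. b
        movl        %edx, a       ; store into memory a
        movl        $0, %eax      ; set the return value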

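This result is similar to our hand-converted code. But it performs not only four data moves between memory and registers (as in the intermediate-variable case), but also one register-to-register assignment, two register-to-register XORs, and one XOR between a register and memory (which should involve an implicit load from memory plus an XOR). So the code produced at -O1 really is less efficient than the code produced at -O2, and the compiler was not confused after all.

Conclusion

Clearly, the XOR approach is much less efficient than expected, and worse than the intermediate-variable approach. In hindsight, had we thought about it from the start in terms of assembly and how the CPU executes, we could have reached this conclusion easily. From the machine's point of view, swapping the values of two integer variables (that is, of their memory locations) only requires loading both values into registers and then using them in the opposite roles, or storing them back to memory in the opposite order; no intermediate computation is needed at all. The XOR approach needs, on top of those memory-register moves, three XOR operations (plus any moves they entail). The conclusion is obvious.
With XOR we sacrifice not only readability but also efficiency, so it is not worth using.
As for other approaches such as addition or multiplication, it takes little thought to guess the outcome, so we will not discuss them further.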
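For completeness, here is a rough sketch of the addition-based variant, just to show how much extra arithmetic it carries. The name add_swap is hypothetical. I use unsigned operands on purpose: signed overflow is undefined behavior in C, whereas unsigned arithmetic wraps around, so the swap still comes out right when the sum does not fit.

```c
/* add_swap is a hypothetical helper used only for illustration. */
void add_swap(unsigned *x, unsigned *y)
{
    if (x == y)         /* same object: the subtractions would zero it */
        return;
    *x = *x + *y;       /* x = a + b (may wrap around)  */
    *y = *x - *y;       /* y = (a + b) - b = a          */
    *x = *x - *y;       /* x = (a + b) - a = b          */
}
```

Like the XOR version, it needs a guard against swapping an object with itself, and it only works for types that support the arithmetic.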

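Caveats

The results above compare efficiency only by the number of lines of assembly generated from the C code and by the number of data moves between memory and registers. The C code is also a very simple, pure integer swap that differs considerably from real-world situations. Most importantly, the actual running times were never measured, so the conclusions are not necessarily correct.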
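Since no timings were taken, here is a minimal sketch of how one might start measuring, using clock() from the standard library. Everything in it (the names time_tmp_swap, time_xor_swap, va and vb, and the use of volatile) is my own assumption, not part of the original test. It is a crude harness: volatile stops the compiler from deleting the loops, but it also forces every access through memory, which itself shapes the result.

```c
#include <time.h>

/* Hypothetical harness: names and structure are assumptions.
   volatile keeps the loops from being optimized away, at the price of
   forcing every read and write to go through memory. */
volatile int va = 1, vb = 2;

/* Seconds of CPU time spent on `iterations` temporary-variable swaps. */
double time_tmp_swap(long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; i++) {
        int tmp = va;
        va = vb;
        vb = tmp;
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Seconds of CPU time spent on `iterations` XOR swaps. */
double time_xor_swap(long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; i++) {
        va ^= vb;
        vb ^= va;
        va ^= vb;
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling each function with a large iteration count and comparing the two return values gives a first impression only; the numbers will vary with the compiler, the options, and the machine.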

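This article discusses only the swapping of integer variables, whereas in practice the objects to be swapped are of many kinds. In C++, for example, the most common case is probably swapping class objects, or even two objects of unknown type (consider class templates).

Not all objects support XOR, addition, or multiplication, so these methods are essentially ruled out, although the ideas behind them still deserve attention (they can still be applied in such cases, but doing so is dangerous; see note 1). The intermediate-variable approach also calls for care: some objects must provide a suitable copy constructor and assignment operator for the swap to be semantically correct, for example class objects that contain pointer members.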
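As a rough C analogue of swapping objects of unknown type, the sketch below applies the intermediate-variable idea to raw bytes, with a heap buffer playing the role of tmp. The name generic_swap is hypothetical. It copies bytes only, so it knows nothing about copy constructors or assignment operators; I would only trust it for plain C data, and the two objects must not partially overlap.

```c
#include <stdlib.h>
#include <string.h>

/* generic_swap is a hypothetical helper used only for illustration.
   Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int generic_swap(void *x, void *y, size_t size)
{
    void *tmp;

    if (x == y || size == 0)    /* nothing to do */
        return 0;

    tmp = malloc(size);         /* the "intermediate variable" */
    if (tmp == NULL)
        return -1;

    memcpy(tmp, x, size);       /* tmp = x */
    memcpy(x, y, size);         /* x = y   */
    memcpy(y, tmp, size);       /* y = tmp */

    free(tmp);
    return 0;
}
```

A call such as generic_swap(&p, &q, sizeof p) works for two objects p and q of the same type, whatever that type is.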

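A broader conclusion

Overall, swapping two objects through an intermediate variable is the most general and most readable approach, and a fairly efficient one. I recommend using it in ordinary cases. (Note 2)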

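Notes

[1] We can treat an object as an array of char-typed variables and then apply XOR and similar methods. But this does not guarantee correct semantics, especially in C++. It is fair to say that in practice such operations almost always lead to errors.

[2] In the end, it might have been better never to have learned this method in the first place :)

[n] My platform is Debian 4.1.1 with GCC 4.1.2; all code was compiled with -O2 by default, and -S was used to generate the assembly.

[n+1] farproc's assembly results show a different situation: the data is already in registers before the swap, so only register operations are involved. Here is his comment:

In my tests (vc2005 release), swapping with a temporary variable is still the most efficient. Bitwise XOR comes second, and addition or multiplication is the slowest.
A look at the generated assembly makes this clear.
Temporary-variable version: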

     mov eax,edi
     mov edi,esi
     mov esi,eax

Bitwise XOR version:

     xor edi,esi
     xor esi,edi
     xor edi,esi

Add/subtract version:

     add edi,esi
     mov ecx,edi
     sub ecx,esi
     mov esi,ecx
     sub edi,esi

[n+2] Ideas spark in discussion: kebing.zh at gmail dot com
Reposted from: http://hi.baidu.com/bellgrade/blog/item/07664e5801deed202934f02f.html