The switch (x) branch is very common in everyday code, and it is also relatively time-consuming, so optimizing it can noticeably improve performance.
1. Branches of the form 0 <= x < N
In this case N must not be too large. Consider the following C code:
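/* Case handlers, assumed to be defined elsewhere. */
extern int method_0(void), method_1(void), method_2(void), method_3(void),
           method_4(void), method_5(void), method_6(void), method_7(void),
           method_d(void);

int ref_switch(int x)
{
    switch (x) {
    case 0: return method_0();
    case 1: return method_1();
    case 2: return method_2();
    case 3: return method_3();
    case 4: return method_4();
    case 5: return method_5();
    case 6: return method_6();
    case 7: return method_7();
    default: return method_d();
    }
}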
We can implement this as a jump table, using the pc register as the base address and x as the index. The optimized assembly code is as follows:
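x RN 0
; int switch_relative(int x)
switch_relative
CMP x, #8 ; unsigned test below: negative x counts as a large value
ADDLO pc, pc, x, LSL#2 ; pc reads as this instruction + 8 (the B method_0 entry), so jump to entry x
B method_d ; reached only when x < 0 or x >= 8
B method_0
B method_1
B method_2
B method_3
B method_4
B method_5
B method_6
B method_7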
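For comparison, here is a sketch of the same jump-table idea in portable C, assuming parameterless handlers that return int. The handler bodies, handler_t, table and switch_table are hypothetical names for illustration, not part of the original code. Compilers often emit a comparable table on their own for a dense switch like ref_switch, so it is worth checking the generated code before writing assembly by hand.

```c
#include <assert.h>

/* Hypothetical handlers standing in for method_0 .. method_7 and method_d;
   each simply returns its case number (-1 for the default). */
static int method_0(void) { return 0; }
static int method_1(void) { return 1; }
static int method_2(void) { return 2; }
static int method_3(void) { return 3; }
static int method_4(void) { return 4; }
static int method_5(void) { return 5; }
static int method_6(void) { return 6; }
static int method_7(void) { return 7; }
static int method_d(void) { return -1; }

typedef int (*handler_t)(void);

/* One handler per case value, indexed directly by x. */
static const handler_t table[] = {
    method_0, method_1, method_2, method_3,
    method_4, method_5, method_6, method_7,
};
#define TABLE_SIZE (sizeof table / sizeof table[0])
static_assert(TABLE_SIZE == 8, "one entry per case 0..7");

int switch_table(int x)
{
    /* Casting to unsigned folds x < 0 and x >= 8 into one comparison:
       negative values become large and fail the test. */
    if ((unsigned)x < TABLE_SIZE)
        return table[x]();
    return method_d();
}
```

The single unsigned comparison is what keeps negative inputs from indexing before the start of the table.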
2. Branches on arbitrary values of x
If x does not fall in a range of the form 0 <= x < N, or N is very large, the method above clearly does not apply. In that case we can use a hash function y = f(x) that maps x into a range of the form 0 <= y < N and branch on y instead of x, so that the method above can be used again. Because different values of x may share the same y, the code at each table entry must still check x against the value it handles.
For example, suppose that method_k must be called when x = 2^k, that is, for x = 1, 2, 4, 8, 16, 32, 64 or 128, and the default function method_d for every other value. We look for a hash function that multiplies x by factors of the form 2^n - 1, because on ARM each such multiplication takes a single instruction with a shifted operand (for example, x * 15 = (x << 4) - x). Experimenting shows that in x * 15 * 31, bits 9 to 11 (counting from bit 0) are different for each of the eight values above, so these three bits can be extracted with bit operations and used as the branch index.
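The claim about bits 9 to 11 is easy to check mechanically. The following is a minimal C sketch, assuming 32-bit unsigned arithmetic; hash_pow2, slot_for_case and hashes_are_distinct are hypothetical helper names. It reproduces the two shift-and-subtract steps and confirms that the eight powers of 2 land in eight different slots.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 9..11 of x*15*31, computed with the same shift-and-subtract steps
   as the two RSB instructions. Unsigned arithmetic keeps wrap-around defined. */
static unsigned hash_pow2(uint32_t x)
{
    uint32_t h = (x << 4) - x;   /* x * 15 */
    h = (h << 5) - h;            /* x * 15 * 31 = x * 465 */
    return (h >> 9) & 7u;        /* keep bits 9..11 */
}

/* Slot that case k (x = 2^k) lands in. */
static unsigned slot_for_case(unsigned k)
{
    assert(k < 8);               /* only cases 0..7 exist */
    return hash_pow2(1u << k);
}

/* Returns 1 if cases 0..7 all land in different slots, else 0. */
static int hashes_are_distinct(void)
{
    unsigned seen = 0;           /* bit h is set once slot h is taken */
    for (unsigned k = 0; k < 8; k++) {
        unsigned bit = 1u << slot_for_case(k);
        if (seen & bit)
            return 0;
        seen |= bit;
    }
    return 1;
}
```

With these definitions, x = 1, 2, 4, 8, 16, 32, 64, 128 map to slots 0, 1, 3, 7, 6, 5, 2, 4, which matches the order of the TEQ/BEQ pairs in the switch_hash code that follows.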
The following is the optimized assembly code:
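x RN 0
hash RN 1
; int switch_hash(int x)
switch_hash
RSB hash, x, x, LSL#4 ; hash = x*15
RSB hash, hash, hash, LSL#5 ; hash = x*15*31
AND hash, hash, #7 << 9 ; keep bits 9-11: hash = index*512
ADD pc, pc, hash, LSR#6 ; pc reads as this instruction + 8; each TEQ/BEQ pair is 8 bytes (index*512 >> 6 = index*8)
NOP ; never executed: entry 0 starts right after this instruction
TEQ x, #0x01 ; index 0
BEQ method_0
TEQ x, #0x02 ; index 1
BEQ method_1
TEQ x, #0x40 ; index 2
BEQ method_6
TEQ x, #0x04 ; index 3
BEQ method_2
TEQ x, #0x80 ; index 4
BEQ method_7
TEQ x, #0x20 ; index 5
BEQ method_5
TEQ x, #0x10 ; index 6
BEQ method_4
TEQ x, #0x08 ; index 7
BEQ method_3
B method_d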
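A C rendering of the same dispatch can help when checking the table order. The sketch below assumes 32-bit unsigned arithmetic and uses hypothetical names (hash8, slot_key, slot_case, switch_hash_case). To stay self-contained, it returns the case number instead of calling method_k. Unlike the assembly, which falls through the remaining TEQ tests when the selected one fails, it checks only the selected slot. The result is the same, because any x equal to another slot's value would have hashed to that slot.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 9..11 of x*15*31, computed with the same shift-and-subtract steps
   as the two RSB instructions (unsigned, so any wrap-around is defined). */
static unsigned hash8(uint32_t x)
{
    uint32_t h = (x << 4) - x;   /* x * 15 */
    h = (h << 5) - h;            /* x * 15 * 31 */
    return (h >> 9) & 7u;        /* bits 9..11 select one of 8 slots */
}

/* slot_key[h] is the only case value whose hash is h, in the same order as
   the TEQ/BEQ pairs; slot_case[h] is the case number k it stands for. */
static const uint32_t slot_key[]  = { 0x01, 0x02, 0x40, 0x04, 0x80, 0x20, 0x10, 0x08 };
static const int      slot_case[] = {    0,    1,    6,    2,    7,    5,    4,    3 };
static_assert(sizeof slot_key / sizeof slot_key[0] == 8 &&
              sizeof slot_case / sizeof slot_case[0] == 8, "one entry per slot");

/* Returns k when x == 2^k for 0 <= k < 8, otherwise -1 (the default case).
   A real dispatcher would call method_k or method_d here instead. */
int switch_hash_case(int x)
{
    uint32_t ux = (uint32_t)x;
    unsigned h = hash8(ux);
    /* Different x values can share a hash, so confirm the exact value. */
    if (slot_key[h] == ux)
        return slot_case[h];
    return -1;
}
```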
The above is only one particular example. When x takes values other than powers of 2, a similar approach still works with a suitably chosen hash function; only the general idea is given here.
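The experiments used to find such a hash can be automated. The sketch below assumes the same hash shape: x multiplied by two factors of the form 2^a - 1 and 2^b - 1, followed by a bit field of a given width. It brute-forces a, b and the bit position for an arbitrary set of keys. The names struct hash_params, hash_apply and find_hash are hypothetical. The search may report a different combination than 15 and 31, and for some key sets no combination in this family works. In that case, a different hash form or a larger table is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* hash(x) = ((x * (2^a - 1) * (2^b - 1)) >> shift) & (2^bits - 1) */
struct hash_params { unsigned a, b, shift; };

/* Hash computed as the assembly would: two shift-and-subtract steps, then a bit field. */
static unsigned hash_apply(uint32_t x, struct hash_params p, unsigned bits)
{
    uint32_t h = (x << p.a) - x;          /* x * (2^a - 1) */
    h = (h << p.b) - h;                   /* ... * (2^b - 1) */
    return (h >> p.shift) & ((1u << bits) - 1u);
}

/* Brute-force search for a, b, shift that give every key its own slot among
   2^bits slots. Returns 1 and fills *out on success, 0 if nothing in range works. */
static int find_hash(const uint32_t *keys, size_t n, unsigned bits, struct hash_params *out)
{
    /* seen below is a 32-bit mask, so at most 2^5 = 32 slots */
    assert(bits >= 1 && bits <= 5 && n <= (1u << bits));
    for (unsigned a = 1; a < 32; a++)
        for (unsigned b = 1; b < 32; b++)
            for (unsigned shift = 0; shift + bits <= 32; shift++) {
                struct hash_params p = { a, b, shift };
                uint32_t seen = 0;        /* bit i set once slot i is taken */
                size_t i;
                for (i = 0; i < n; i++) {
                    uint32_t bit = 1u << hash_apply(keys[i], p, bits);
                    if (seen & bit)
                        break;            /* collision: try the next combination */
                    seen |= bit;
                }
                if (i == n) {
                    *out = p;
                    return 1;
                }
            }
    return 0;
}
```

For the eight powers of 2 with bits = 3, find_hash always succeeds, since a = 4, b = 5, shift = 9 (the 15 * 31 hash above) lies in the search range. It may, however, report another combination that it reaches first.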
3. Unaligned data access
Avoid unaligned data accesses wherever possible; they hurt both portability and performance.
- The simplest approach is to read and write one byte (or one halfword) at a time. This approach is recommended, although it is relatively slow.
p RN 0
x RN 1
t0 RN 2
t1 RN 3
t2 RN 12
; int load_32_little(char *p)
load_32_little
LDRB x, [p]
LDRB t0, [p, #1]
LDRB t1, [p, #2]
LDRB t2, [p, #3]
ORR x, x, t0, LSL#8 ; x = p[0] | p[1]<<8
ORR x, x, t1, LSL#16 ; x |= p[2]<<16
ORR r0, x, t2, LSL#24 ; r0 = x | p[3]<<24 (return value)
MOV pc, lr
; int load_32_big(char *p)
load_32_big
LDRB x, [p]
LDRB t0, [p, #1]
LDRB t1, [p, #2]
LDRB t2, [p, #3]
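ORR x, t0, x, LSL#8 ; x = p[0]<<8 | p[1]
ORR x, t1, x, LSL#8 ; x = p[0]<<16 | p[1]<<8 | p[2]
ORR r0, t2, x, LSL#8 ; r0 = p[0]<<24 | p[1]<<16 | p[2]<<8 | p[3] (return value)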
MOV pc, lr
; void store_32_little(char *p, int x)
store_32_little
STRB x, [p]
MOV t0, x, LSR#8
STRB t0, [p, #1]
MOV t0, x, LSR#16
STRB t0, [p, #2]
MOV t0, x, LSR#24
STRB t0, [p, #3]
MOV pc, lr
; void store_32_big(char *p, int x)
store_32_big
MOV t0, x, LSR#24
STRB t0, [p]
MOV t0, x, LSR#16
STRB t0, [p, #1]
MOV t0, x, LSR#8
STRB t0, [p, #2]
STRB x, [p, #3]
MOV pc, lr
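For code that does not need hand-written assembly, the same byte-at-a-time approach can be written in portable C. This is a minimal sketch, assuming 8-bit bytes (checked at compile time) and a uint32_t value. The function names mirror the assembly labels above but are otherwise illustrative. Because every access is a single byte, the functions work at any alignment and on hosts of either byte order.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* The byte-wise approach assumes 8-bit bytes. */
static_assert(CHAR_BIT == 8, "byte-wise access assumes 8-bit bytes");

/* Read 4 bytes at p as a little-endian value (least significant byte first). */
static uint32_t load_32_little(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Read 4 bytes at p as a big-endian value (most significant byte first). */
static uint32_t load_32_big(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write x to p[0..3], least significant byte first. */
static void store_32_little(unsigned char *p, uint32_t x)
{
    p[0] = (unsigned char)x;
    p[1] = (unsigned char)(x >> 8);
    p[2] = (unsigned char)(x >> 16);
    p[3] = (unsigned char)(x >> 24);
}

/* Write x to p[0..3], most significant byte first. */
static void store_32_big(unsigned char *p, uint32_t x)
{
    p[0] = (unsigned char)(x >> 24);
    p[1] = (unsigned char)(x >> 16);
    p[2] = (unsigned char)(x >> 8);
    p[3] = (unsigned char)x;
}
```

Modern compilers often recognize these patterns and merge them into a single word access (plus a byte swap, if needed) on targets that permit unaligned loads, so the portable form need not cost four separate memory operations.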