870 lines
40 KiB
HTML
870 lines
40 KiB
HTML
<!-- saved from url=(0022)http://internet.e-mail -->
|
|
<!--startcut ==============================================-->
|
|
<!-- *** BEGIN HTML header *** -->
|
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
|
|
<HTML><HEAD>
|
|
<title>Code Optimization Using the GNU C Compiler LG #71</title>
|
|
</HEAD>
|
|
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#0000AF"
|
|
ALINK="#FF0000">
|
|
<!-- *** END HTML header *** -->
|
|
|
|
<CENTER>
|
|
<A HREF="http://www.linuxgazette.com/">
|
|
<IMG ALT="LINUX GAZETTE" SRC="../gx/lglogo.png"
|
|
WIDTH="600" HEIGHT="124" border="0"></A>
|
|
<BR>
|
|
|
|
<!-- *** BEGIN navbar *** -->
|
|
<IMG ALT="" SRC="../gx/navbar/left.jpg" WIDTH="14" HEIGHT="45" BORDER="0" ALIGN="bottom"><A HREF="arndt.html"><IMG ALT="[ Prev ]" SRC="../gx/navbar/prev.jpg" WIDTH="16" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="index.html"><IMG ALT="[ Table of Contents ]" SRC="../gx/navbar/toc.jpg" WIDTH="220" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../index.html"><IMG ALT="[ Front Page ]" SRC="../gx/navbar/frontpage.jpg" WIDTH="137" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="http://www.linuxgazette.com/cgi-bin/talkback/all.py?site=LG&article=http://www.linuxgazette.com/issue71/joshi.html"><IMG ALT="[ Talkback ]" SRC="../gx/navbar/talkback.jpg" WIDTH="121" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../faq/index.html"><IMG ALT="[ FAQ ]" SRC="./../gx/navbar/faq.jpg"WIDTH="62" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="orr.html"><IMG ALT="[ Next ]" SRC="../gx/navbar/next.jpg" WIDTH="15" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><IMG ALT="" SRC="../gx/navbar/right.jpg" WIDTH="15" HEIGHT="45" ALIGN="bottom">
|
|
<!-- *** END navbar *** -->
|
|
<P>
|
|
</CENTER>
|
|
|
|
<!--endcut ============================================================-->
|
|
|
|
<H4 ALIGN="center">
|
|
"Linux Gazette...<I>making Linux just a little more fun!</I>"
|
|
</H4>
|
|
|
|
<P> <HR> <P>
|
|
<!--===================================================================-->
|
|
|
|
<center>
|
|
<H1><font color="maroon">Code Optimization Using the GNU C Compiler</font></H1>
|
|
<H4>By <a href="mailto:rahulj@veritas.com">Rahul U Joshi</a></H4>
|
|
</center>
|
|
<P> <HR> <P>
|
|
|
|
<!-- END header -->
|
|
|
|
|
|
|
|
|
|
<p>
|
|
<i>
|
|
This article describes some of the code optimization techniques used by the
|
|
GNU C Compiler, in order to give the reader a feel of what code optimization is
|
|
and how it can increase the efficiency of the generated object code.
|
|
</i></p>
|
|
|
|
<h2>Contents</h2>
|
|
<OL>
|
|
<LI><a href="#sect1">Introduction</a>
|
|
<LI><a href="#sect2">Assembly Language Code for a C Program</a>
|
|
<LI><a href="#sect3">Constant Folding</a>
|
|
<LI><a href="#sect4">Common Subexpression Elimination</a>
|
|
<LI><a href="#sect5">Dead Code Elimination</a>
|
|
<LI><a href="#sect6">Strength Reduction using Induction Variable</a>
|
|
<LI><a href="#sect7">Conclusion</a>
|
|
<LI><a href="#sect8">Acknowledgments</a>
|
|
<LI><a href="#sect9">References</a>
|
|
</OL>
|
|
|
|
<h2><A NAME="sect1">1. Introduction</A></h2>
|
|
<p>
|
|
As we all know, a compiler is a program that reads the source program in
|
|
a high-level language and translates it into (typically) machine language. This
|
|
is a complicated process involving a number of stages. If the compiler is
|
|
an <i>optimizing compiler,</i> one of these stages "optimizes" the
|
|
machine language code so that it either takes less time to run or occupies
|
|
less memory or sometimes both. Of course, whatever optimizations the compiler
|
|
does, it must not affect the logic of the program i.e. the optimization must
|
|
preserve the meaning of the program. One might wonder what type of optimizations
|
|
the compiler uses to produce efficient machine code? Since in no case the
|
|
meaning of the program being compiled should be changed, the compiler must
|
|
inspect the program very thoroughly and find out the suitable optimizations
|
|
that can be applied. As we may wonder, such a thorough analysis of the program
|
|
and then finding and applying suitable optimizations is a complex and time
|
|
consuming process, the details of which are beyond the scope of this article.
|
|
</p>
|
|
|
|
<p>
|
|
What I am going to describe in this article are a few type of code
|
|
optimizations that the GNU C Compiler uses so that we can understand how the
|
|
code is optimized and appreciate the complexity of the optimization process.
|
|
The GNU C Compiler is a sophisticated optimizing C compiler that uses a large
|
|
array of optimization techniques to produce efficient code. It may not be
|
|
possible to describe all of them, so I have chosen some that are interesting
|
|
and also easy to understand. A complete list of the optimization techniques
|
|
that are used by the GNU C compiler is available at <a href="http://www.redhat.com/products/support/gnupro/gnupro_gcc.html">http://www.redhat.com/products/support/gnupro/gnupro_gcc.html</a>.
|
|
Instead of simply describing what a particular optimization technique does,
|
|
I will describe them using the assembly language code that is generated by
|
|
the GNU C Compiler. This method will also enable the readers to further
|
|
explore code optimization carried out by the compiler and also the advanced
|
|
optimization techniques, in case they are interested.
|
|
</p>
|
|
|
|
|
|
<h2><A name="sect2">2. Assembly Language Code for a C Program</A></h2>
|
|
<p>
|
|
The GNU C Compiler (called GCC henceforth) reads a C program source file and
|
|
translates it into an object program that contains the machine code for the
|
|
program in binary form. However, it can also produce assembly language source
|
|
code for the program instead of the object code, so that we can read it and
|
|
understand how the assembly language program looks. We will be generating such
|
|
assembly language files to see the optimizations being used by the compiler,
|
|
so it would be beneficial if we see how the assembly language code for a simple
|
|
C program looks like. However, also note that we need not understand <i>each</i>
|
|
assembly language statement in the generated code, so some statements that are
|
|
not crucial towards understanding the code optimization will not be explained
|
|
for simplicity. To generate the assembly language code, create a file
|
|
<tt>test1.c</tt> as shown below and give the following command:</p>
|
|
|
|
<PRE>$ gcc -c -S test1.c</PRE>
|
|
|
|
<p>
|
|
This will generate a file <tt>test1.s</tt>, which has the assembly language listing of
|
|
the generated code for the C program. The file <tt>test1.c</tt> along with the assembly
|
|
language code is shown below.</p>
|
|
|
|
<PRE><TT>
|
|
1 : /* test1.c */
|
|
2 : /* This first file simply demonstrates how the assembly program that the
|
|
3 : compiler produces looks like and some peculiarities of the GNU
|
|
4 : assembler that follows some different conventions from MASM/TASM.
|
|
5 : */
|
|
6 : #include <stdio.h>
|
|
7 :
|
|
8 : int main()
|
|
9 : {
|
|
10 : printf("Hello, World\n");
|
|
11 : return 0;
|
|
12 : }
|
|
13 :
|
|
14 : /* end test1.c */
|
|
15 : /* ----------------------------------------------------------------- */
|
|
16 : /* generated assembly language file */
|
|
17 : .file "test1.c" /* some assembler directives to be */
|
|
18 : .version "01.01" /* ignored */
|
|
19 : gcc2_compiled.:
|
|
20 : .section .rodata /* this segment has read-only data */
|
|
21 : .LC0:
|
|
22 : .string "Hello, World\n"
|
|
23 : .text
|
|
24 : .align 4
|
|
25 : .globl main
|
|
26 : .type main,@function
|
|
27 : main: /* main function begins */
|
|
28 : pushl $.LC0 /* push parameter for printf() on stack */
|
|
29 : call printf /* call the function */
|
|
30 : addl $4,%esp /* clear the stack */
|
|
31 :
|
|
32 : xorl %eax,%eax /* make EAX = 0, functions use register */
|
|
33 : jmp .L1 /* EAX to return values */
|
|
34 : .p2align 4,,7 /* this is an alignment directive */
|
|
35 : .L1:
|
|
36 : ret /* return from main, done */
|
|
37 : .Lfe1:
|
|
38 : /* other assembler directives to be ignored */
|
|
39 : .size main,.Lfe1-main
|
|
40 : .ident "GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)"
|
|
41 : /* end of the generated assembly language file */
|
|
42 : /* ---------------------------------------------------------------- */
|
|
</TT></PRE>
|
|
|
|
<p>
|
|
In case you know about the architecture
|
|
of the 80386 or higher microprocessors and have experience with assembly
|
|
language programming, you will find that the assembly language that is produced
|
|
is only partly familiar to you. This assembly language follows the AT&T
|
|
assembler syntax that is different from the Intel/Microsoft
|
|
Assembler/Turbo Assembler syntax that most of us are probably familiar with.
|
|
Let us go through the assembly language code. We will ignore the assembler
|
|
directives as they are not crucial towards understanding the generated code.
|
|
On line 20 a read-only data segment has been defined with the string <tt>"Hello,
|
|
World\n"</tt> in it. A label <tt>.LC0</tt> has been assigned to the string. On line 27, the code for the <tt>main()</tt>
|
|
function begins. As we know, in C language the parameters to functions are
|
|
passed by pushing them on the stack, and the parameters are pushed in the
|
|
reverse order i.e. the last parameter is pushed first on the stack. In this
|
|
case, the <tt>printf()</tt> function is being called with a single parameter, the
|
|
string <tt>"Hello, World\n"</tt>. The statement <tt>pushl $.LC0</tt> pushes the address of the
|
|
string <tt>"Hello, World\n"</tt> on the stack. The "<tt>l</tt>" in <tt>pushl</tt> stands for "long" as we
|
|
are dealing with 32 bit variables. In all the assembly language programs that
|
|
we will see, the mnemonics will be followed by an "<tt>l</tt>" to indicate that we are
|
|
dealing with 32 bit variables. The "<tt>$</tt>" preceding <tt>.LC0</tt> means the address of the
|
|
string. The next statement calls the <tt>printf()</tt> function. After the <tt>printf()</tt>
|
|
function finishes execution, we need to cleanup the stack, so we need to add 4
|
|
to ESP. (Why 4? That's because we pushed 4 bytes on the stack before calling
|
|
<tt>printf()</tt>.)
|
|
To do so, we would normally write, <tt>ADD ESP, 4</tt>. The Intel convention is
|
|
<tt><instruction> dest, src</tt>. However, the AT&T convention is <tt><instruction> src, dest</tt>, so you can
|
|
see that the instruction on line 30 is <tt>addl $4, %esp</tt>. Immediate operands like
|
|
4 are prefixed by a $ and register names are prefixed by a %. This convention
|
|
was followed to maintain compatibility with the BSD assembler. The next
|
|
statement XOR's EAX with itself so that EAX = 0 afterwards. This is because
|
|
the return value from a function is stored in the EAX register. After that,
|
|
you will see a jump instruction and an alignment directive. These will not be
|
|
explained and it will suffice to know that <tt>main()</tt> function will return back at
|
|
this point. Now that we understand how the assembly language code looks like,
|
|
let us move towards the optimizations.</p>
|
|
|
|
<h2><A name="sect3">3. Constant Folding</a></h2>
|
|
<p>
|
|
Constant folding is the simplest code optimization to understand. Let
|
|
us suppose that you write the statement <tt>x = 45 * 88;</tt> in your C program. A non-optimizing
|
|
compiler will generate code to multiply 45 by 88 and store the
|
|
value in x. An optimizing compiler will detect that both 45 and 88 are
|
|
constants, so their product will also be a constant. Hence it will find 45 *
|
|
88 = 3960 and generate code that simply copies 3960 into x. This is
|
|
<i>constant folding</i>, and means the calculation is done just once, at
|
|
compile time, rather than every time the program is run. To illustrate this,
|
|
create the file <tt>test2.c</tt> as shown below. Generate the assembly code
|
|
for this program as was shown earlier. Since by default, GCC does not
|
|
optimize the program, you will see that the assembly code is a
|
|
straightforward translation of the C program (Lines 18 to 62).</p>
|
|
|
|
<PRE><TT>
|
|
1 : /* test2.c */
|
|
2 : /* Demonstration of constant propagation */
|
|
3 : #include <stdio.h>
|
|
4 :
|
|
5 : int main()
|
|
6 : {
|
|
7 : int x, y, z;
|
|
8 : x = 10;
|
|
9 : y = x + 45;
|
|
10 : z = y + 4;
|
|
11 : printf("The value of z = %d", z);
|
|
12 : return 0;
|
|
13 : }
|
|
14 : /* end of test2.c */
|
|
15 :
|
|
16 : /* ---------------------------------------------------------------- */
|
|
17 : /* assembly language file without any optimizations */
|
|
18 : .file "test2.c"
|
|
19 : .version "01.01"
|
|
20 : gcc2_compiled.:
|
|
21 : .section .rodata
|
|
22 : .LC0:
|
|
23 : .string "The value of z = %d"
|
|
24 : .text
|
|
25 : .align 4
|
|
26 : .globl main
|
|
27 : .type main,@function
|
|
28 : main:
|
|
29 : pushl %ebp /* save EBP register on stack */
|
|
30 : movl %esp,%ebp /* EBP = ESP */
|
|
31 : subl $12,%esp /* Create stack frame. 3 variables x 4 bytes */
|
|
32 :
|
|
33 : /* x = 10; */
|
|
34 : movl $10,-4(%ebp) /* x = 10. x is topmost on the stack */
|
|
35 :
|
|
36 : /* y = x + 45; */
|
|
37 : movl -4(%ebp),%edx /* EDX = x */
|
|
38 : addl $45,%edx /* EDX = EDX + 45 */
|
|
39 : movl %edx,-8(%ebp) /* y = EDX. y is second from top of stack */
|
|
40 :
|
|
41 : /* z = y + 4 */
|
|
42 : movl -8(%ebp),%edx /* EDX = y */
|
|
43 : addl $4,%edx /* EDX = EDX + 4 */
|
|
44 : movl %edx,-12(%ebp) /* z = EDX. z is third from top of stack */
|
|
45 :
|
|
46 : /* printf("The value of z = ", z); */
|
|
47 : movl -12(%ebp),%eax /* EAX = z */
|
|
48 : pushl %eax /* push EAX(=z) as first parameter of printf */
|
|
49 : pushl $.LC0 /* second parameter for printf */
|
|
50 : call printf
|
|
51 : addl $8,%esp /* clear stack */
|
|
52 :
|
|
53 : /* return 0; */
|
|
54 : xorl %eax,%eax /* for return 0 */
|
|
55 : jmp .L1
|
|
56 : .p2align 4,,7
|
|
57 : .L1:
|
|
58 : leave
|
|
59 : ret
|
|
60 : .Lfe1:
|
|
61 : .size main,.Lfe1-main
|
|
62 : .ident "GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)"
|
|
63 : /* end of assembly language code */
|
|
64 : /* ---------------------------------------------------------------- */
|
|
65 :
|
|
66 : /* ---------------------------------------------------------------- */
|
|
67 : /* generated assembly language code after optimization */
|
|
68 :
|
|
69 : .file "test2.c"
|
|
70 : .version "01.01"
|
|
71 : gcc2_compiled.:
|
|
72 : .section .rodata
|
|
73 : .LC0:
|
|
74 : .string "The value of z = %d"
|
|
75 : .text
|
|
76 : .align 4
|
|
77 : .globl main
|
|
78 : .type main,@function
|
|
79 : main:
|
|
80 : pushl %ebp /* Save EBP register on stack */
|
|
81 : movl %esp,%ebp /* EBP = ESP */
|
|
82 :
|
|
83 : /* by constant propagation, z will always be 59 */
|
|
84 : /* printf("The value of z = %d", z); */
|
|
85 : pushl $59 /* first printf parameter */
|
|
86 : pushl $.LC0 /* second printf parameter */
|
|
87 : call printf
|
|
88 : /* no need of cleanup, we are exiting anyway */
|
|
89 : /* return 0; */
|
|
90 : xorl %eax,%eax
|
|
91 : leave
|
|
92 : ret
|
|
93 : .Lfe1:
|
|
94 : .size main,.Lfe1-main
|
|
95 : .ident "GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)"
|
|
96 : /* end of assembly language code */
|
|
97 : /* ----------------------------------------------------------------- */
|
|
</TT></PRE>
|
|
|
|
<p>
|
|
There are some new things in the assembly code that we need to
|
|
understand. First of all, the <tt>main()</tt> function defines 3 local variables for
|
|
whom space will be allocated on the stack. Also, to access these variables, we
|
|
will be using the indexed addressing mode and we will be using EBP for that.
|
|
So the first statement saves the current value of EBP onto the stack. Then
|
|
the stack pointer ESP is copied into EBP (note that the source ESP is first)
|
|
so that EBP can be used for accessing the local variables. Finally, we need
|
|
to create space for three 4-byte variables on the stack, so we subtract 12
|
|
from ESP; i.e., we grow the stack by 12 bytes. <tt>x</tt> will be the
|
|
bottom-most stack element and at an offset of -4 from EBP. See the diagram
|
|
below.</p>
|
|
|
|
<pre><tt>
|
|
EBP-----> ---------------
|
|
| x | 4 Bytes
|
|
---------------
|
|
| y | 4 Bytes
|
|
---------------
|
|
| z | 4 Bytes
|
|
ESP-----> ---------------
|
|
| |
|
|
| |
|
|
^ | |
|
|
| ...
|
|
Address increases | | |
|
|
as we go up | 0 ---------------
|
|
</tt></pre>
|
|
|
|
<p>
|
|
Similarly, the offset of y from EBP is -8 and that of z is -12. Now consider the
|
|
assignment <tt>x = 10;</tt>. The statement to implement it on line 34 is
|
|
<tt>movl $10, -4(%ebp)</tt>. The indexed addressing mode is written as
|
|
<tt><offset>(<base register>)</tt>. Thus the above statement will copy the constant 10
|
|
to the address (EBP - 4) i.e. the address of <tt>x</tt> on the stack. In a similar
|
|
manner, <tt>-8(%ebp)</tt> is used to access <tt>y</tt> and <tt>-12(%ebp)</tt> is used to access <tt>z</tt>. Lines
|
|
37, 38 and 39 evaluate <tt>x + 45</tt> and assign the result to <tt>y</tt>. To do so, the value
|
|
of <tt>x</tt> is first copied into the EDX register, 45 is added to it and the result
|
|
which is in EDX is copied to <tt>y</tt>. The code for <tt>z = y + 4</tt> is similar. In line 47
|
|
the parameters for <tt>printf()</tt> are pushed on the stack. The last parameter (<tt>z</tt>) is
|
|
pushed first and then the address of the string is pushed. In line 51, we
|
|
cleanup the stack by adding 8 to ESP. I guess that since we pushed two 4-byte
|
|
parameters, we had to add 8 to ESP. In the first example, we pushed only one
|
|
parameter and hence we had to add just 4 to ESP. The rest of the code is easy
|
|
to understand.</p>
|
|
|
|
<p>
|
|
Now if we look at the above program, when we evaluate <tt>x + 45</tt>, the
|
|
value of <tt>x</tt> is always going to be 10 because of the assignment <tt>x = 10</tt>.
|
|
Therefore <tt>x + 45</tt> will always evaluate to 55. So the statement always assigns
|
|
55 to <tt>y</tt>. Similarly, when evaluating <tt>y + 4</tt>, the result will always be 59. If
|
|
the optimization is enabled, the compiler will be in a position to detect
|
|
this. To enable optimization, use the -O2 option. Enter the following command:</p>
|
|
|
|
<PRE>gcc -c -S -O2 test2.c</PRE>
|
|
|
|
<p>
|
|
The assembly code generated with optimization enabled is shown in lines 69 to
|
|
95. Notice that on line 85, the compiler directly pushes the constant 59 on
|
|
the stack followed by the address of the string and calls <tt>printf()</tt>. Thus, by
|
|
constant folding, the compiler evaluates the various expressions in the
|
|
program only once and plugs the final value into the generated code. One more
|
|
interesting thing to observe is that after constant folding, there is no need
|
|
of the variables <tt>x</tt>, <tt>y</tt> and <tt>z</tt>. Therefore no space for them is allocated on the
|
|
stack, thus reducing the memory requirement of the program. This brings out
|
|
the fact that one optimization may lead to another one. In the above case
|
|
constant folding lead to a decrease in run time (since all the computations
|
|
are done at compile time) and also to a decrease in the space requirements.</p>
|
|
|
|
<h2><a name="sect4">4. Common Subexpression Elimination</a></h2>
|
|
<p>
|
|
Many a times, it happens that the same expression is evaluated at
|
|
different places in the same program and the values of the operands in the
|
|
expression do not change in between the two evaluations of that expression.
|
|
For example, the program may evaluate say <tt>a * b</tt> at the beginning and at the
|
|
end. If the values of <tt>a</tt> and <tt>b</tt> do not change in between these two evaluations
|
|
of <tt>a * b</tt>, then instead of evaluating <tt>a * b</tt> again at the end, we can save the
|
|
result of the evaluation of <tt>a * b</tt> at the beginning in some temporary variable
|
|
and use it at the end. This will help eliminate redundant computations in the
|
|
program. This optimization is called as <i>common subexpression elimination</i>.
|
|
Consider the following program that demonstrates it.</p>
|
|
|
|
<PRE><TT>
|
|
1 : /* test3.c */
|
|
2 : /* common subexpression elimination, and also of constant propagation */
|
|
3 : #include <stdio.h>
|
|
4 :
|
|
5 : int main()
|
|
6 : {
|
|
7 : int a, b;
|
|
8 : int x, y, z;
|
|
9 : scanf("%d %d", &a, &b);
|
|
10 : x = a * b;
|
|
11 :
|
|
12 : if(b >= 4)
|
|
13 : {
|
|
14 : y = a * b;
|
|
15 : z = 0;
|
|
16 : }
|
|
17 : else
|
|
18 : {
|
|
19 : z = a * b * 4;
|
|
20 : y = 0;
|
|
21 : }
|
|
22 :
|
|
23 : printf("x = %d, y = %d, z = %d\n", x, y, z);
|
|
24 : return 0;
|
|
25 : }
|
|
26 : /* end of test3.c */
|
|
27 :
|
|
28 : /* ----------------------------------------------------------------- */
|
|
29 : /* generated unoptimized assembly language code */
|
|
30 : .file "test3.c"
|
|
31 : .version "01.01"
|
|
32 : gcc2_compiled.:
|
|
33 : .section .rodata
|
|
34 : .LC0:
|
|
35 : .string "%d %d"
|
|
36 : .LC1:
|
|
37 : .string "x = %d, y = %d, z = %d\n"
|
|
38 : .text
|
|
39 : .align 4
|
|
40 : .globl main
|
|
41 : .type main,@function
|
|
42 : main:
|
|
43 : pushl %ebp /* save EBP */
|
|
44 : movl %esp,%ebp /* EBP = ESP */
|
|
45 : subl $20,%esp /* Create space for 5 variables */
|
|
46 :
|
|
47 : /* scanf("%d %d". &a, &b); */
|
|
48 : leal -8(%ebp),%eax
|
|
49 : pushl %eax /* push address of b on stack */
|
|
50 : leal -4(%ebp),%eax
|
|
51 : pushl %eax /* push address of a on stack */
|
|
52 : pushl $.LC0 /* push address of string */
|
|
53 : call scanf
|
|
54 : addl $12,%esp /* stack cleanup after scanf */
|
|
55 :
|
|
56 : /* x = a * b; */
|
|
57 : movl -4(%ebp),%eax /* EAX = a */
|
|
58 : imull -8(%ebp),%eax /* EAX = EAX * b = a * b */
|
|
59 : movl %eax,-12(%ebp) /* x = EAX = a * b */
|
|
60 :
|
|
61 : /* if( b >= 4)... */
|
|
62 : cmpl $3,-8(%ebp) /* compare b with 3 */
|
|
63 : jle .L2 /* else part at label .L2, if follows */
|
|
64 :
|
|
65 : /* y = a * b; */
|
|
66 : movl -4(%ebp),%eax /* EAX = a */
|
|
67 : imull -8(%ebp),%eax /* EAX = EAX * b = a * b */
|
|
68 : movl %eax,-16(%ebp) /* y = EAX = a * b */
|
|
69 : /* z = 0; */
|
|
70 : movl $0,-20(%ebp)
|
|
71 : jmp .L3 /* jump over the else part */
|
|
72 :
|
|
73 : .p2align 4,,7
|
|
74 : .L2:
|
|
75 : /* else part begins here */
|
|
76 :
|
|
77 : /* z = a * b * 4; */
|
|
78 : movl -4(%ebp),%eax /* EAX = a */
|
|
79 : imull -8(%ebp),%eax /* EAX = EAX * b = a * b */
|
|
80 : leal 0(,%eax,4),%edx /* EDX = EAX*4 + 0 */
|
|
81 : movl %edx,-20(%ebp) /* z = EDX */
|
|
82 : /* y = 0; */
|
|
83 : movl $0,-16(%ebp)
|
|
84 : .L3:
|
|
85 : /* if..else is over here */
|
|
86 :
|
|
87 : /* printf("x = %d, y = %d, z = %d\n", x, y, x); */
|
|
88 : movl -20(%ebp),%eax
|
|
89 : pushl %eax /* push address of z on stack */
|
|
90 : movl -16(%ebp),%eax
|
|
91 : pushl %eax /* push address of y on stack */
|
|
92 : movl -12(%ebp),%eax
|
|
93 : pushl %eax /* push address of x on stack */
|
|
94 : pushl $.LC1 /* address of string */
|
|
95 : call printf
|
|
96 : addl $16,%esp /* stack cleanup after printf */
|
|
97 :
|
|
98 : /* return 0 */
|
|
99 : xorl %eax,%eax
|
|
100 : jmp .L1
|
|
101 : .p2align 4,,7
|
|
102 : .L1:
|
|
103 : leave
|
|
104 : ret
|
|
105 : .Lfe1:
|
|
106 : .size main,.Lfe1-main
|
|
107 : .ident "GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)"
|
|
108 : /* end of unoptimized assembly language code */
|
|
109 : /* --------------------------------------------------------------- */
|
|
110 :
|
|
111 : /* --------------------------------------------------------------- */
|
|
112 : /* generated optimized assembly language code */
|
|
113 : .file "test3.c"
|
|
114 : .version "01.01"
|
|
115 : gcc2_compiled.:
|
|
116 : .section .rodata
|
|
117 : .LC0:
|
|
118 : .string "%d %d"
|
|
119 : .LC1:
|
|
120 : .string "x = %d, y = %d, z = %d\n"
|
|
121 : .text
|
|
122 : .align 4
|
|
123 : .globl main
|
|
124 : .type main,@function
|
|
125 : main:
|
|
126 : pushl %ebp /* save EBP */
|
|
127 : movl %esp,%ebp /* EBP = ESP */
|
|
128 : subl $8,%esp /* space for just 2 variables on stack */
|
|
129 :
|
|
130 : /* scanf("%d %d", &a, &b); */
|
|
131 : leal -4(%ebp),%eax
|
|
132 : pushl %eax /* push address of b on stack */
|
|
133 : leal -8(%ebp),%eax
|
|
134 : pushl %eax /* push address of a on stack */
|
|
135 : pushl $.LC0 /* address of string */
|
|
136 : call scanf
|
|
137 :
|
|
138 : /* x = a * b; */
|
|
139 : movl -4(%ebp),%eax /* EAX = b */
|
|
140 : movl %eax,%edx /* EDX = EAX = b */
|
|
141 : imull -8(%ebp),%edx /* EDX = EDX * a = b * a = a * b */
|
|
142 :
|
|
143 : addl $12,%esp /* delayed stack cleanup */
|
|
144 : /* if( b >= 4).... */
|
|
145 : cmpl $3,%eax /* compare EAX = b with 3 */
|
|
146 : jle .L17 /* else part from label .L17 */
|
|
147 :
|
|
148 : /* y stored in ECX, z in EAX, x in EDX */
|
|
149 : /* y = a * b; */
|
|
150 : movl %edx,%ecx
|
|
151 : /* z = 0; */
|
|
152 : xorl %eax,%eax
|
|
153 : jmp .L18 /* jump over the else part */
|
|
154 : .p2align 4,,7
|
|
155 : .L17:
|
|
156 : /* z = a * b * 4; */
|
|
157 : leal 0(,%edx,4),%eax /* LEA EAX, [EDX*4]+0 */
|
|
158 : /* y = 0; */
|
|
159 : xorl %ecx,%ecx
|
|
160 : .L18:
|
|
161 : pushl %eax /* push value of z */
|
|
162 : pushl %ecx /* push value of y */
|
|
163 : pushl %edx /* push value of x */
|
|
164 : pushl $.LC1 /* push address of string */
|
|
165 : call printf
|
|
166 : /* stack cleanup after printf not necessary */
|
|
167 :
|
|
168 : /* return 0; */
|
|
169 : xorl %eax,%eax
|
|
170 : leave
|
|
171 : ret
|
|
172 : .Lfe1:
|
|
173 : .size main,.Lfe1-main
|
|
174 : .ident "GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)"
|
|
175 : /* end optimized assembly code */
|
|
176 : /* -------------------------------------------------------------- */
|
|
</TT></PRE>
|
|
|
|
<p>
|
|
This program has an example of a common subexpression. On line 10, the expression
|
|
<tt>a * b</tt> is evaluated the first time, and then again on lines 14 and 19. The last two evaluations
|
|
of <tt>a * b</tt> are redundant since the value of neither <tt>a</tt> nor <tt>b</tt> changes after the first evaluation.
|
|
Thus these two evaluations are common subexpressions that can be eliminated. Lines 30 to 108
|
|
show the unoptimized assembly code that is a straightforward translation of the C code and
|
|
should be easy to understand. Now consider the optimized assembly language code from lines 113
|
|
to 176. The first thing to notice is that now only variables <tt>a</tt> and <tt>b</tt> are stored on the stack,
|
|
hence a stack of just 8 bytes is required as opposed to a 20 byte stack for the unoptimized
|
|
version. You may wonder what's so special about variables <tt>a</tt> and <tt>b</tt>. The specialty is that the
|
|
address of variables <tt>a</tt> and <tt>b</tt> is used in the program (for <tt>scanf()</tt>) and variables that reside in
|
|
the registers cannot have a memory address. Hence the compiler is unable to put them inside
|
|
the registers. The registers that the compiler chooses for other variables are as follows:
|
|
<tt>x</tt> in EDX, <tt>y</tt> in ECX and <tt>z</tt> in EAX. Statements 139 to 141 evaluate <tt>a * b</tt> and the value is stored
|
|
in EDX (which holds <tt>x</tt>). Also, the value of <tt>b</tt> is available in the register EAX.
|
|
One more thing to notice is the delayed cleanup of stack after the call to
|
|
<tt>scanf()</tt> on line 143. May be there is some advantage in doing so, I don't know.</p>
|
|
|
|
<p>
|
|
Next starts the <tt>if</tt> statement. In the <tt>if</tt> part, the expression <tt>a * b</tt> is reevaluated and
|
|
we expect that the compiler will use the value stored in EDX. That is exactly
|
|
what the compiler does. For the statement <tt>y = a * b</tt>, the generated code on
|
|
line 150 simply copies the value of <tt>a * b</tt> in register EDX into the register
|
|
ECX (which holds <tt>y</tt>). In the <tt>else</tt> part again, there is a common subexpression
|
|
in the statement <tt>z = a * b * 4</tt>. Thus we expect that the compiler will take
|
|
the value of <tt>a * b</tt> in EDX, multiply it by 4 and then store the result in EAX
|
|
(which holds <tt>z</tt>). Now there are many ways in which this can be done. One is
|
|
to use a sequence of <tt>movl</tt>, <tt>imull</tt> instructions as in</p>
|
|
<pre><tt>
|
|
movl %edx, %eax /* EAX = EDX = a * b */
|
|
imull $4, %eax /* EAX = EAX * 4 */
|
|
</tt></pre>
|
|
<p>
|
|
The second choice is to use a shift (<tt>sall</tt>) instead of <tt>imull</tt> in the above
|
|
sequence of instructions. However, it turns out that the GCC uses an obscure
|
|
speed trick to optimize the code. It uses the instruction</p>
|
|
<pre><tt>
|
|
leal 0(,%edx,4), %eax
|
|
</tt></pre>
|
|
<p>
|
|
This instruction uses the scaling and indexing capabilities of the 80386
|
|
processors. The above instruction takes EDX as the index register, 4 as the
|
|
scale and 0 as the offset, calculates the effective address and stores it
|
|
in the EAX register. Thus the register EAX gets a value EDX(=a*b)*4 + 0 i.e.
|
|
<tt>a * b * 4</tt>. The <tt>leal</tt> instruction always executes in 2 clock cycles on an 80386,
|
|
whereas a simple shift will take around 7 clock cycles. We see
|
|
that the compiler knows about processor peculiarities and uses
|
|
it when it generates the code.</p>
|
|
|
|
<p>
|
|
Thus it can be seen that common subexpressions save a lot of computations and also
|
|
space that is used up by the code for these redundant computations. At this stage, one can
|
|
understand that the compiler when analyzing the program for optimization must keep a track
|
|
of how and when the variables change, the expressions that are evaluated and also
|
|
which registers are allocated to variables. In general such a kind of analysis that the
|
|
compiler performs on the program is called as <i>data flow analysis</i>.</p>
|
|
|
|
|
|
<h2><a name="sect5">5. Dead Code Elimination</a></h2>
|
|
<p>
|
|
Dead code is the code in the program that will never be executed for any input or other
|
|
conditions. A common example is an <tt>if</tt> statement. If the compiler finds out that the condition
|
|
inside the <tt>if</tt> is never going to be true, then the body of the <tt>if</tt> statement will never be
|
|
executed. In that case, the compiler can completely eliminate this dead code, thus saving the
|
|
memory space occupied by the code. The following program demonstrates dead code elimination.</p>
|
|
|
|
<pre><tt>
|
|
1 : /* test4.c */
|
|
2 : /* demonstration of dead code elimination */
|
|
3 : #include <stdio.h>
|
|
4 :
|
|
5 : int main()
|
|
6 : {
|
|
7 : int x;
|
|
8 :
|
|
9 : scanf("%d", &x);
|
|
10 :
|
|
11 : if(x < 0 && x > 0)
|
|
12 : {
|
|
13 : x = 99;
|
|
14 : printf("Hello. Inside the if!!!");
|
|
15 : }
|
|
16 :
|
|
17 : return 0;
|
|
18 : }
|
|
19 : /* end of test4.c */
|
|
20 :
|
|
21 : /* --------------------------------------------------------------- */
|
|
22 : /* optimized assembly code */
|
|
23 : .file "test4.c"
|
|
24 : .version "01.01"
|
|
25 : gcc2_compiled.:
|
|
26 : .section .rodata
|
|
27 : .LC0:
|
|
28 : .string "%d"
|
|
29 : .LC1:
|
|
30 : .string "Hello. Inside the if!!!"
|
|
31 : .text
|
|
32 : .align 4
|
|
33 : .globl main
|
|
34 : .type main,@function
|
|
35 : main:
|
|
36 : pushl %ebp /* save EBP on stack */
|
|
37 : movl %esp,%ebp /* EBP = ESP */
|
|
38 : subl $4,%esp /* create space for x on the stack */
|
|
39 :
|
|
40 : /* scanf("%d", &x); */
|
|
41 : leal -4(%ebp),%eax
|
|
42 : pushl %eax /* push address of a on stack */
|
|
43 : pushl $.LC0 /* push string on stack */
|
|
44 : call scanf
|
|
45 :
|
|
46 : /* the entire body of the if and the condition checking is dead code */
|
|
47 : /* return 0; */
|
|
48 : xorl %eax,%eax /* no stack cleanup, we are exiting anyway */
|
|
49 : leave
|
|
50 : ret
|
|
51 : .Lfe1:
|
|
52 : .size main,.Lfe1-main
|
|
53 : .ident "GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)"
|
|
54 : /* end optimized assembly code */
|
|
55 : /* ---------------------------------------------------------------- */
|
|
</tt></pre>
|
|
|
|
<p>
|
|
Since the unoptimized assembly language code serves no purpose and also its easy
|
|
to understand, I have omitted it. In the program, the condition on line 11 is <tt>x < 0 && x > 0</tt>,
|
|
which can never be true. The compiler finds this and concludes that the body of the <tt>if</tt>
|
|
statement forms dead code, so it does not generate any code for that. One interesting
|
|
thing to notice is that after the body of <tt>if</tt> has been eliminated, the string
|
|
<tt>"Hello. Inside the if!!!"</tt> is no longer needed and hence can be eliminated from the read-only
|
|
data section. However, detection of such things probably will increase the complexity of the
|
|
compiler and hence is not done.</p>
|
|
|
|
<h2><a name="sect6">6. Strength Reduction using Induction Variable</a></h2>
|
|
<p>
|
|
One type of code optimization is strength reduction in which a "costly" operation is
|
|
replaced by a less expensive one. For example, the evaluation of x<SUP>2</SUP> is much more efficient if
|
|
we multiply x by x rather than call the exponentiation routine. One place where this optimization
|
|
can be applied is in loops. Many times in a loop, one variable changes in synchronization with
|
|
the loop variable. Such a variable is called as <i>induction variable</i>. Induction variables give
|
|
the compiler a chance to apply strength reduction, as can be seen from the following program.</p>
|
|
|
|
<PRE><TT>
|
|
1 : /* test5.c */
|
|
2 : /* demonstration of induction variable elimination */
|
|
3 : int main()
|
|
4 : {
|
|
5 : int i, j;
|
|
6 :
|
|
7 : for(i = 0; i < 10; i++)
|
|
8 : {
|
|
9 : j = i * 7;
|
|
10 : printf("i = %d, j = %d", i, j);
|
|
11 : }
|
|
12 : return 0;
|
|
13 : }
|
|
14 : /* end test5.c */
|
|
15 :
|
|
16 : /* --------------------------------------------------------------- */
|
|
17 : /* optimized assembly language code */
|
|
18 : .file "test5.c"
|
|
19 : .version "01.01"
|
|
20 : gcc2_compiled.:
|
|
21 : .section .rodata
|
|
22 : .LC0:
|
|
23 : .string "i = %d, j = %d"
|
|
24 : .text
|
|
25 : .align 4
|
|
26 : .globl main
|
|
27 : .type main,@function
|
|
28 : main:
|
|
29 : pushl %ebp /* save EBP on stack */
|
|
30 : movl %esp,%ebp /* ESP = EBP */
|
|
31 :
|
|
32 : pushl %esi /* ESI will hold 'j' */
|
|
33 : pushl %ebx /* EBX will hold 'i' */
|
|
34 : xorl %ebx,%ebx /* both are initialized to zero */
|
|
35 : xorl %esi,%esi
|
|
36 : .p2align 4,,7
|
|
37 : .L5:
|
|
38 : /* printf("i = %d, j = %d", i, j); */
|
|
39 : pushl %esi /* push value of j */
|
|
40 : pushl %ebx /* push value of i */
|
|
41 : pushl $.LC0 /* push address of string */
|
|
42 : call printf
|
|
43 : addl $12,%esp /* stack cleanup */
|
|
44 :
|
|
45 : /* instead of saying j = i * 7, its efficient to say j = j + 7 */
|
|
46 : addl $7,%esi
|
|
47 : incl %ebx /* i++ */
|
|
48 : cmpl $9,%ebx /* is i <= 9, then repeat the loop */
|
|
49 : jle .L5
|
|
50 :
|
|
51 : /* return 0; */
|
|
52 : xorl %eax,%eax
|
|
53 : leal -8(%ebp),%esp
|
|
54 : popl %ebx
|
|
55 : popl %esi
|
|
56 : leave
|
|
57 : ret
|
|
58 : .Lfe1:
|
|
59 : .size main,.Lfe1-main
|
|
60 : .ident "GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)"
|
|
61 :
|
|
62 : /* end optimized assembly code */
|
|
63 : /* ----------------------------------------------------------------- */
|
|
</TT></PRE>
|
|
|
|
<p>
|
|
Here <tt>i</tt> is the loop variable whereas <tt>j</tt> is the induction variable as <tt>j</tt> always changes
|
|
in lock step with <tt>i</tt>. In the generated assembly language code, the compiler has decided to
|
|
use registers to store both <tt>i</tt> and <tt>j</tt> with <tt>i</tt> in EBX and <tt>j</tt> in ESI. Line 34 initializes EBX(=<tt>i</tt>) to
|
|
zero, as is indicated in the initialization part of the for loop. The compiler sees that during
|
|
the first pass, <tt>j</tt> will be assigned a value of 0, so it also initialized ESI to 0 in line 35.
|
|
The loop starts at line 37 and continues till line 49. Since <tt>j</tt> already has its value assigned
|
|
to it, the first thing the compiler does inside the loop is to call the <tt>printf()</tt> function. Here,
|
|
the compiler has adjusted the order of the statements in the loop so as to enable it to use
|
|
strength reduction. After analyzing the program, the compiler sees that inside the loop,
|
|
the value of <tt>i</tt> is always going to increase by 1 and since <tt>j</tt> is an induction variable, its
|
|
value is always going to increase by 7. Therefore, instead of multiplying <tt>i</tt> by 7 (which is costly),
|
|
it adds 7 to the old value of <tt>j</tt> before going on to the next iteration. Thus a costly operation
|
|
of multiplication has been replaced by addition which is cheaper (in terms of time).</p>
|
|
|
|
<h2><a name="sect7">7. Conclusion</a></h2>
|
|
|
|
<p>
|
|
In this article we have seen some very basic code optimization techniques that the
|
|
GNU C Compiler uses to optimize the generated code. From these optimization techniques and
|
|
the various pieces of information that are required to apply them, the reader can appreciate
|
|
the type of sophisticated and complex analysis that the compiler must carry out on the
|
|
program. It has to keep track of various things like variables, their storage places (memory
|
|
or registers), expression evaluations, constant expressions, dead code etc. It is because of
|
|
this high complexity of the process that the compiler takes much longer to compile a program
|
|
with optimization enabled. There are many details that are involved in code
|
|
optimization and code generation that I have not explained and that I don't know.
|
|
Code optimization is a field of active research and interested
|
|
readers can refer to [1] for additional information.</p>
|
|
|
|
<h2><a name="sect8">8. Acknowledgments</a></h2>
|
|
<p>
|
|
I would like to thank my Bachelor's project guide Dr. Uday Khedker for creating my
|
|
interest in compilers and code optimization. I would also like to thank the <i>Linux
|
|
Gazette, Linux Documentation Project, PC Quest</I> and the <i>Pune Linux User's Group</i>
|
|
(<a href="http://www.pluggies.org">http://www.pluggies.org</a>)
|
|
which have earlier accepted and published my contributions that inspires me to write more.
|
|
</p>
|
|
<h2><a name="sect9">9. References</a></h2>
|
|
|
|
<OL>
|
|
<LI><i>Compilers: Principles, Techniques, and Tools</i>, A.V.Aho, Ravi Sethi and J.D.Ullman, Addison Wesley.
|
|
<LI><i>Systems Programming and Operating Systems</i>, D.M.Dhamdhere, Tata McGraw-Hill.
|
|
<LI><A HREF="http://www.linuxdoc.org/HOWTO/Assembly-HOWTO/index.html"><i>Assembly HOWTO</i></A>, Franois-Ren Rideau, Linux Documentation Project.
|
|
<LI><i>Advanced 80386 Programming Techniques</i>, James L. Turley, Osborne McGraw-Hill.
|
|
</OL>
|
|
|
|
|
|
|
|
|
|
|
|
<!-- *** BEGIN bio *** -->
|
|
<SPACER TYPE="vertical" SIZE="30">
|
|
<P>
|
|
<H4><IMG ALIGN=BOTTOM ALT="" SRC="../gx/note.gif">Rahul U Joshi</H4>
|
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
|
|
<EM>I have recently joined Veritas Software India Pvt. Ltd. as an Associate
|
|
Software Engineer. I have worked on a research project entitled "Node Listing
|
|
Based Global Data Flow Analysis" during my Bachelor's degree. I am interested
|
|
in compilers, programming languages, operating systems and parallel/distributed
|
|
processing. I have contributed articles to Linux Gazette about
|
|
parallel processing and about optimizing C.</EM>
|
|
|
|
<!-- *** END bio *** -->
|
|
|
|
<!-- *** BEGIN copyright *** -->
|
|
<P> <hr> <!-- P -->
|
|
<H5 ALIGN=center>
|
|
|
|
Copyright © 2001, Rahul U Joshi.<BR>
|
|
Copying license <A HREF="../copying.html">http://www.linuxgazette.com/copying.html</A><BR>
|
|
Published in Issue 71 of <i>Linux Gazette</i>, October 2001</H5>
|
|
<!-- *** END copyright *** -->
|
|
|
|
<!--startcut ==========================================================-->
|
|
<HR><P>
|
|
<CENTER>
|
|
<!-- *** BEGIN navbar *** -->
|
|
<IMG ALT="" SRC="../gx/navbar/left.jpg" WIDTH="14" HEIGHT="45" BORDER="0" ALIGN="bottom"><A HREF="arndt.html"><IMG ALT="[ Prev ]" SRC="../gx/navbar/prev.jpg" WIDTH="16" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="index.html"><IMG ALT="[ Table of Contents ]" SRC="../gx/navbar/toc.jpg" WIDTH="220" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../index.html"><IMG ALT="[ Front Page ]" SRC="../gx/navbar/frontpage.jpg" WIDTH="137" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="http://www.linuxgazette.com/cgi-bin/talkback/all.py?site=LG&article=http://www.linuxgazette.com/issue71/joshi.html"><IMG ALT="[ Talkback ]" SRC="../gx/navbar/talkback.jpg" WIDTH="121" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../faq/index.html"><IMG ALT="[ FAQ ]" SRC="./../gx/navbar/faq.jpg"WIDTH="62" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="orr.html"><IMG ALT="[ Next ]" SRC="../gx/navbar/next.jpg" WIDTH="15" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><IMG ALT="" SRC="../gx/navbar/right.jpg" WIDTH="15" HEIGHT="45" ALIGN="bottom">
|
|
<!-- *** END navbar *** -->
|
|
</CENTER>
|
|
</BODY></HTML>
|
|
<!--endcut ============================================================-->
|