
<!doctype linuxdoc system>

<!-- $Id$ -->

<!--
	This is (probably) the last release of the HOWTO with linuxdoc dtd.
	Following releases (0.6+) will be in docbook dtd.
	Translators (if any), get ready.
-->

<article>

<title>Linux Assembly HOWTO

<author>
<url url="mailto:konst@linuxassembly.org" name="Konstantin Boldyshev"> and
<url url="mailto:fare@tunes.org" name="Francois-Rene Rideau">

<date>v0.5m, October 22, 2000

<abstract>
This is the Linux Assembly HOWTO.

This document describes how to program in assembly language
using FREE programming tools,
focusing on development for or from the Linux Operating System,
mostly on IA-32 (i386) platform.

Included material may or may not be applicable
to other hardware and/or software platforms.
Contributions about them are gladly accepted.

<bf/Keywords/:
	<tt/assembly, assembler, asm, inline asm, macroprocessor, preprocessor,
	32-bit, IA-32, i386, x86, nasm, gas, as86, OS, kernel, system, libc,
	system call, interrupt, small, fast, embedded, hardware, port/
</abstract>

<toc>

<sect>INTRODUCTION
<p>

You can skip this section if you are familiar with HOWTOs,
or just hate to read all this assembly-nonrelated crap.

<sect1>Legal Blurb
<p>Copyright &copy; 1999-2000 Konstantin Boldyshev.
<p>Copyright &copy; 1996-1999 Francois-Rene Rideau.

This document may be distributed only subject to the terms and conditions set
forth in the <url url="http://linuxdoc.org/COPYRIGHT.html" name="LDP License">.
It may be reproduced and distributed in whole or in part,
in any medium physical or electronic,
provided that this license notice is displayed in the reproduction.
Commercial redistribution is permitted and encouraged.

All modified documents, including translations, anthologies,
and partial documents, must meet the following requirements:

<itemize>
<item>The modified version must be labeled as such
<item>The person making the modifications must be identified
<item>Acknowledgement of the original author must be retained
<item>The location of the original unmodified document be identified
<item>The original author's (or authors') name(s) may not be used to
	assert or imply endorsement of the resulting document without
	the original author's (or authors') permission
</itemize>

The most recent official version of this document is available from
<url url="http://linuxassembly.org" name="Linux Assembly"> and
<url url="http://linuxdoc.org" name="LDP"> sites.
If you are reading a copy that is a few months old,
consider checking the URLs above for a new version.

<sect1>Foreword
<p>
This document aims to answer the questions of those
who program or want to program 32-bit x86 assembly using
<em><url url="http://www.gnu.org/philosophy/" name="free software"></em>,
particularly under the Linux operating system.
In many places, Uniform Resource Locators (URLs) are given for some
software or documentation repository.
This document also points to other documents about
non-free, non-x86, or non-32-bit assemblers,
although this is not its primary goal.
Also note that there are FAQs and docs about programming
on your favorite platform (whatever it is), which you should consult
for platform-specific issues, not related directly to assembly programming.

Because the main interest of assembly programming is to build
the guts of operating systems, interpreters, compilers, and games,
where a C compiler fails to provide the needed expressiveness
(performance is more and more seldom an issue),
we focus on the development of this kind of software.

If you don't know what <em/free/ software is,
please do read <em/carefully/ the GNU General Public License,
which is used in a lot of free software,
and is the model for most of their licenses.
It generally comes in a file named <tt/COPYING/ (or <tt/COPYING.LIB/).
Literature from the <url url="http://www.fsf.org" name="FSF">
(Free Software Foundation) might help you, too.
In particular, the interesting feature of free software
is that it comes with source code which you can consult and correct,
or sometimes even borrow from.
Read your particular license carefully and do comply with it.

<sect1>Contributions
<p>
This is an interactively evolving document: you are especially invited
to ask questions,
to answer questions,
to correct given answers,
to give pointers to new software,
to point the current maintainer to bugs or deficiencies in the pages.
In one word, contribute!

To contribute, please contact the Assembly-HOWTO maintainer.
At the time of this writing, it is
<url url="mailto:konst@linuxassembly.org" name="Konstantin Boldyshev">
and no longer
<url url="mailto:fare@tunes.org" name="Francois-Rene Rideau">.
I (Fare) had been looking for some time for a serious hacker
to replace me as maintainer of this document,
and am pleased to announce Konstantin as my worthy successor.

<sect1>Credits
<p>
I would like to thank the following persons, in order of appearance:
<itemize>
<item><url url="mailto:buried.alive@in.mail" name="Linus Torvalds">
	for Linux
<item><url url="mailto:bde@zeta.org.au" name="Bruce Evans">
	for bcc from which as86 is extracted
<item><url url="mailto:anakin@pobox.com" name="Simon Tatham"> and
      <url url="mailto:jules@earthcorp.com" name="Julian Hall">
	for NASM
<item><url url="mailto:gregh@metalab.unc.edu" name="Greg Hankins">
	and now
	<url url="mailto:linux-howto@metalab.unc.edu" name="Tim Bynum">
	for maintaining HOWTOs
<item><url url="mailto:raymoon@moonware.dgsys.com" name="Raymond Moon">
	for his FAQ
<item><url url="mailto:dumas@linux.eu.org" name="Eric Dumas">
	for his translation of the mini-HOWTO into French
	(sad thing for the original author to be French and write in English)
<item><url url="mailto:paul@geeky1.ebtech.net" name="Paul Anderson">
	and <url url="mailto:rahim@megsinet.net" name="Rahim Azizarab">
	for helping me, if not for taking over the HOWTO.
<item><url url="mailto:pcg@goof.com" name="Marc Lehman">
	for his insight on GCC invocation.
<item><url url="mailto:ams@wiw.org" name="Abhijit Menon-Sen">
	for helping me figure out the argument passing convention
<item>All the people who have contributed ideas, answers, remarks, and moral support.
</itemize>

<sect1>History
<p>
Each version includes a few fixes and minor corrections,
which need not be mentioned every time.
<descrip>
<tag/Version 0.5m	22 Oct 2000/
    Linux 2.4 system calls can have 6 args,
    Added ALD note to FAQ,
    fixed mailing list subscribe address

<tag/Version 0.5l	23 Aug 2000/
    Added TDASM, updates on NASM

<tag/Version 0.5k	11 Jul 2000/
    Few additions to FAQ

<tag/Version 0.5j	14 Jun 2000/
    Complete rearrangement of INTRODUCTION and RESOURCES;
    FAQ added to RESOURCES, misc cleanups and additions
    (and more to come)

<tag/Version 0.5i	04 May 2000/
    Added HLA, TALC;
    rearrangements in RESOURCES, QUICK START, ASSEMBLERS;
    few new pointers

<tag/Version 0.5h	09 Apr 2000/
    finally managed to state LDP license on document,
    new resources added, misc fixes

<tag/Version 0.5g	26 Mar 2000/
    new resources on different CPUs

<tag/Version 0.5f	02 Mar 2000/
    new resources, misc corrections

<tag/Version 0.5e	10 Feb 2000/
    url updates, changes in GAS example

<tag/Version 0.5d	01 Feb 2000/
    RESOURCES (former POINTERS) section completely redone,
    various url updates.

<tag/Version 0.5c	05 Dec 1999/
    New pointers, updates and some rearrangements.
    Rewrite of sgml source.

<tag/Version 0.5b	19 Sep 1999/
    Discussion about libc or not libc continues.
    New web pointers and overall updates.

<tag/Version 0.5a	01 Aug 1999/
    "QUICK START" section rearranged, added GAS example.
    Several new web pointers.

<tag/Version 0.5	25 July 1999/
    GAS has 16-bit mode.
    New maintainer (at last): Konstantin Boldyshev.
    Discussion about libc or not libc.
    Added section "QUICK START" with examples of using assembly.

<tag/Version 0.4q	22 June 1999/
    process argument passing (argc,argv,environ) in assembly.
    This is yet another
	"last release by Fare before new maintainer takes over".
    Nobody knows who might be the new maintainer.

<tag/Version 0.4p	6 June 1999/
  clean up and updates.

<tag/Version 0.4o	1 December 1998/ *

<tag/Version 0.4m	23 March 1998/
  corrections about gcc invocation

<tag/Version 0.4l	16 November 1997/
  release for LSL 6th edition.

<tag/Version 0.4k	19 October 1997/ *

<tag/Version 0.4j	7 September 1997/ *

<tag/Version 0.4i	17 July 1997/
  info on 16-bit mode access from Linux.

<tag/Version 0.4h	19 Jun 1997/
  still more on "how not to use assembly";
  updates on NASM, GAS.

<tag/Version 0.4g	30 Mar 1997/ *

<tag/Version 0.4f	20 Mar 1997/ *

<tag/Version 0.4e	13 Mar 1997/
  Release for DrLinux

<tag/Version 0.4d	28 Feb 1997/
  Vapor announce of a new Assembly-HOWTO maintainer.

<tag/Version 0.4c	9 Feb 1997/
  Added section "DO YOU NEED ASSEMBLY?"

<tag/Version 0.4b	3 Feb 1997/
  NASM moved: now is before AS86

<tag/Version 0.4a	20 Jan 1997/
  CREDITS section added

<tag/Version 0.4	20 Jan 1997/
  first release of the HOWTO as such.

<tag/Version 0.4pre1	13 Jan 1997/
  text mini-HOWTO transformed into a full linuxdoc-sgml HOWTO,
  to see what the SGML tools are like.

<tag/Version 0.3l	11 Jan 1997/ *

<tag/Version 0.3k	19 Dec 1996/
  What? I had forgotten to point to terse???

<tag/Version 0.3j	24 Nov 1996/
  point to French translated version

<tag/Version 0.3i	16 Nov 1996/
  NASM is getting pretty slick

<tag/Version 0.3h	6 Nov 1996/
  more about cross-compiling -- See on sunsite: devel/msdos/

<tag/Version 0.3g	2 Nov 1996/
  Created the History. Added pointers in cross-compiling section.
  Added section about I/O programming under Linux (particularly video).

<tag/Version 0.3f	17 Oct 1996/ *

<tag/Version 0.3c	15 Jun 1996/ *

<tag/Version 0.2	04 May 1996/ *

<tag/Version 0.1	23 Apr 1996/
  Francois-Rene "Fare" Rideau &lt;fare@tunes.org&gt;
  creates and publishes the first mini-HOWTO,
  because "I'm sick of answering ever the same questions
   on comp.lang.asm.x86"
</descrip>


<sect>DO YOU NEED ASSEMBLY?<label id="doyouneedasm">
<p>
Well, I wouldn't want to interfere with what you're doing,
but here is some advice from hard-earned experience.


<sect1>Pros and Cons
<p>

<sect2>The advantages of Assembly
<p>
Assembly can express very low-level things:
<itemize>
<item>you can access machine-dependent registers and I/O.
<item>you can control the exact behavior of code
	in critical sections that might otherwise involve deadlock
	between multiple software threads or hardware devices.
<item>you can break the conventions of your usual compiler,
	which might allow some optimizations
	(like temporarily breaking rules about memory allocation,
	threading, calling conventions, etc).
<item>you can build interfaces between code fragments
	that use incompatible conventions
	(e.g. produced by different compilers,
	or separated by a low-level interface).
<item>you can get access to unusual programming modes of your processor
	(e.g. 16-bit mode to interface startup, firmware, or legacy code
	on Intel PCs)
<item>you can produce reasonably fast code for tight loops
	to cope with a bad non-optimizing compiler
	(but then, there are free optimizing compilers available!)
<item>you can produce hand-optimized code
	perfectly tuned for your particular hardware setup,
	though not to anyone else's.
<item>you can write some code for your new language's
	optimizing compiler
	(that's something few will ever do, and even they, not often).
</itemize>
<p>


<sect2>The disadvantages of Assembly
<p>
Assembly is a very low-level language
(the lowest above hand-coding the binary instruction patterns).
This means
<itemize>
<item>it's long and tedious to write initially,
<item>it's quite bug-prone,
<item>your bugs can be very difficult to chase,
<item>it's very difficult to understand and modify,
	i.e. to maintain.
<item>the result is very non-portable to other architectures,
	existing or future,
<item>your code will be optimized only for a certain implementation
	of the same architecture:
	for instance, among Intel-compatible platforms,
	each CPU design and its variations
	(relative latency, throughput, and capacity,
	of processing units, caches, RAM, bus, disks,
	presence of FPU, MMX, 3DNOW, SIMD extensions, etc)
	implies potentially completely different optimization techniques.
	CPU designs already include:
	Intel 386, 486, Pentium, PPro, Pentium II, Pentium III;
	Cyrix 5x86, 6x86; AMD K5, K6 (K6-2, K6-III), K7 (Athlon).
	New designs keep popping up, so don't expect either this listing
	or your code to be up-to-date.
<item>you spend more time on a few details,
	and can't focus on small and large algorithmic design,
	which is known to bring the largest part of the speedup.
	&lsqb;e.g. you might spend some time building very fast
	list/array manipulation primitives in assembly;
	only a hash table would have sped up your program much more;
	or, in another context, a binary tree;
	or some high-level structure distributed over a cluster of CPUs&rsqb;
<item>a small change in algorithmic design might completely
	invalidate all your existing assembly code.
	So that either you're ready (and able) to rewrite it all,
	or you're tied to a particular algorithmic design;
<item>On code that ain't too far from what's in standard benchmarks,
	commercial optimizing compilers outperform hand-coded assembly
	(well, that's less true on the x86 architecture
	than on RISC architectures,
	and perhaps less true for widely available/free compilers;
	anyway, for typical C code, GCC is fairly good);
<item>And in any case, as moderator John Levine says on
	<url url="news:comp.compilers" name="comp.compilers">,
	"compilers make it a lot easier to use complex data structures,
	and compilers don't get bored halfway through
	and generate reliably pretty good code."
	They will also <em/correctly/ propagate code transformations
	throughout the whole (huge) program
	when optimizing code between procedures and module boundaries.
</itemize>
<p>

<sect2>Assessment
<p>
All in all, you might find that though using assembly is sometimes needed,
and might even be useful in a few cases where it is not, you'll want to:
<itemize>
<item>minimize the use of assembly code,
<item>encapsulate this code in well-defined interfaces
<item>have your assembly code automatically generated
	from patterns expressed in a higher-level language
	than assembly (e.g. GCC inline assembly macros).
<item>have automatic tools translate these programs
	into assembly code
<item>have this code be optimized if possible
<item>All of the above,
	i.e. write (an extension to) an optimizing compiler back-end.
</itemize>

Even in cases when assembly is needed (e.g. OS development),
you'll find that not so much of it is,
and that the above principles hold.
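
As a hedged illustration of these principles, the sketch below hides a single
x86 instruction behind an ordinary C function with a portable fallback.
The function name <tt/swap&lowbar;bytes32/ is made up for this example
and does not come from any library;
it assumes GCC-style extended asm and the <tt/bswap/ instruction,
which exists on the 486 and later.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value.
 * The assembly is confined to this one function; callers see plain C. */
static inline uint32_t swap_bytes32(uint32_t x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "+r": x is both read and written, in a general register chosen by GCC. */
    __asm__ ("bswap %0" : "+r" (x));
    return x;
#else
    /* Portable fallback for other compilers and architectures. */
    return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
#endif
}
```

Callers never see the assembly, so the fast path can be dropped or replaced
without touching the rest of the program.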

See the Linux kernel sources concerning this:
as little assembly as needed,
resulting in a fast, reliable, portable, maintainable OS.
Even a successful game like DOOM was written almost entirely in C,
with only a tiny part written in assembly for speed.


<sect1>How to NOT use Assembly
<p>

<sect2>General procedure to achieve efficient code
<p>
As Charles Fiterman says on
	<url url="news:comp.compilers" name="comp.compilers">
about human vs computer-generated assembly code,

&quot;
The human should always win and here is why.
<itemize>
<item>First the human writes the whole thing in a high level language.
<item>Second he profiles it to find the hot spots where it spends its time.
<item>Third he has the compiler produce assembly for those small
	sections of code.
<item>Fourth he hand tunes them looking for tiny improvements over
	the machine generated code.
</itemize>
The human wins because he can use the machine.
&quot;


<sect2>Languages with optimizing compilers
<p>
Languages like ObjectiveCAML, SML, CommonLISP, Scheme, ADA, Pascal, C, C++,
among others, all have free optimizing compilers
that will optimize the bulk of your programs,
and often do better than hand-coded assembly even for tight loops,
while allowing you to focus on higher-level details,
and without forbidding you to grab
a few percent of extra performance in the above-mentioned way,
once you've reached a stable design.
Of course, there are also commercial optimizing compilers
for most of these languages, too!

Some languages have compilers that produce C code,
which can be further optimized by a C compiler:
LISP, Scheme, Perl, and many others.
Speed is fairly good.

<sect2>General procedure to speed your code up
<p>
As for speeding code up,
you should do it only for parts of a program
that a profiling tool has consistently identified
as being a performance bottleneck.

Hence, if you identify some code portion as being too slow, you should
<itemize>
<item>first try to use a better algorithm;
<item>then try to compile it rather than interpret it;
<item>then try to enable and tweak optimization from your compiler;
<item>then give the compiler hints about how to optimize
(typing information in LISP; register usage with GCC;
lots of options in most compilers, etc).
<item>then possibly fall back to assembly programming
</itemize>

Finally, before you end up writing assembly,
you should inspect generated code,
to check that the problem really is with bad code generation,
as this might really not be the case:
compiler-generated code might be better than what you'd have written,
particularly on modern multi-pipelined architectures!
Slow parts of a program might be intrinsically so.
The biggest problems on modern architectures with fast processors
are due to delays from memory access, cache-misses, TLB-misses,
and page-faults;
register optimization becomes useless,
and you'll more profitably re-think data structures and threading
to achieve better locality in memory access.
Perhaps a completely different approach to the problem might help, then.
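
To make the locality point concrete, here is a small sketch
(the names and the matrix size are arbitrary, chosen for illustration).
Both functions compute the same sum,
but the first walks memory contiguously
while the second jumps a whole row ahead at each step,
which tends to cause more cache and TLB misses on large matrices.
Actual timings depend on your hardware,
so measure before drawing conclusions.

```c
#include <stddef.h>

#define ROWS 256
#define COLS 256

/* Row-major traversal: consecutive accesses touch adjacent addresses. */
long sum_by_rows(int m[ROWS][COLS])
{
    long total = 0;
    size_t r, c;

    for (r = 0; r < ROWS; r++)
        for (c = 0; c < COLS; c++)
            total += m[r][c];
    return total;
}

/* Column-major traversal: each access lands COLS * sizeof(int) bytes
 * away from the previous one, so far fewer accesses share a cache line. */
long sum_by_columns(int m[ROWS][COLS])
{
    long total = 0;
    size_t r, c;

    for (c = 0; c < COLS; c++)
        for (r = 0; r < ROWS; r++)
            total += m[r][c];
    return total;
}
```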


<sect2>Inspecting compiler-generated code
<p>
There are many reasons to inspect compiler-generated assembly code.
Here is what you can do with such code:
<itemize>
<item>check whether generated code
	can be obviously enhanced with hand-coded assembly
	(or by tweaking compiler switches)
<item>when that's the case,
	start from generated code and modify it
	instead of starting from scratch
<item>more generally, use generated code as stubs to modify,
	which at least gets right the way
	your assembly routines interface to the external world
<item>track down bugs in your compiler (hopefully rarer)
</itemize>

The standard way to have assembly code be generated
is to invoke your compiler with the <tt/-S/ flag.
This works with most Unix compilers,
including the GNU C Compiler (GCC), but YMMV.
As for GCC, it will produce more understandable assembly code with
the <tt/-fverbose-asm/ command-line option.
Of course, if you want to get good assembly code,
don't forget your usual optimization options and hints!
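
As a small sketch, suppose the function below lives in a file called
<tt/sum.c/ (a name invented for this example).
Running <tt/gcc -O2 -S -fverbose-asm sum.c/ should then write
the generated assembly to <tt/sum.s/,
where the loop can be compared against what you would have written by hand.
The exact output varies with the GCC version and target.

```c
#include <stddef.h>

/* Sum the first n elements of v: a tight loop that is easy to find
 * and read in the generated assembly. */
long sum_array(const int *v, size_t n)
{
    long total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

Compiling the same file with and without <tt/-O2/
is a quick way to see how much the optimizer changes the loop.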


<sect1>Linux and assembly
<p>
In general, you don't need to use assembly language in Linux programming.
Unlike DOS, you do not have to write Linux drivers in assembly
(well, actually you can do it if you really want).
And with modern optimizing compilers,
if you care about speed optimization for different CPUs,
it's much simpler to write in C.
However, if you're reading this,
you might have some reason to use assembly instead of C/C++.

You may <em/need/ to use assembly, or you may <em/want/ to use assembly.
In short, the main practical reasons why you may need to get into Linux assembly
are <em/small code/ and <em/libc independence/.
The non-practical (and most frequent) reason is being just an old crazy hacker
with a twenty-year-old habit of doing everything in assembly language.

Also, if you're porting Linux to some embedded hardware
	you can be quite short on space for the whole system:
you need to fit the kernel, libc
	and all that stuff of (file|find|text|sh|etc.) utils
	into several hundred kilobytes,
and every kilobyte costs a lot.
So, one of the ways you've got is to rewrite some
	(or all) parts of the system in assembly,
and this will really save you a lot of space.
For instance, a simple <tt/httpd/ written in assembly
	can take less than 600 bytes;
you can fit a webserver, consisting of kernel and httpd,
in 400 KB or less... Think about it.

<sect>ASSEMBLERS
<p>

<sect1>GCC Inline Assembly
<p>
The well-known GNU C/C++ Compiler (GCC),
an optimizing 32-bit compiler at the heart of the GNU project,
supports the x86 architecture quite well,
and includes the ability to insert assembly code in C programs,
in such a way that register allocation can be either specified or left to GCC.
GCC works on most available platforms,
notably Linux, *BSD, VSTa, OS/2, *DOS, Win*, etc.

<sect2>Where to find GCC
<p>
The original GCC site is the GNU FTP site
	<url url="ftp://prep.ai.mit.edu/pub/gnu/gcc/">
together with all released application software from the GNU project.
Linux-configured and precompiled versions can be found in
	<url url="ftp://metalab.unc.edu/pub/Linux/GCC/">
There are a lot of FTP mirrors of both sites,
everywhere around the world, as well as CD-ROM copies.

GCC development split into two branches some time ago (GCC 2.8 and EGCS),
but they have merged back, and the current GCC web page is <url url="http://gcc.gnu.org">.

Sources adapted to your favorite OS and precompiled binaries
should be found at your usual FTP sites.

The most popular DOS port of GCC is named DJGPP,
and can be found in directories of that name on FTP sites.
See <url url="http://www.delorie.com/djgpp/">.

There are two Win32 GCC ports:
<url url="http://sourceware.cygnus.com/cygwin/" name="cygwin"> and
<url url="http://www.mingw.org" name="mingw">

There is also a port of GCC to OS/2 named EMX,
that also works under DOS,
and includes lots of unix-emulation library routines.
See around the following site:
	<url url="ftp://ftp-os2.cdrom.com/pub/os2/emx09c/">.

<!-- broken url url="http://www.leo.org/pub/comp/os/os2/gnu/emx+gcc/"-->
<!-- broken url url="http://warp.eecs.berkeley.edu/os2/software/shareware/emx.html"-->

<sect2>Where to find docs for GCC Inline Asm
<p>
The documentation of GCC includes documentation files in TeXinfo format.
You can compile them with TeX and print the result,
or convert them to .info, and browse them with emacs,
or convert them to .html, or (with the right tools) to nearly whatever you like,
or just read them as is.
The .info files are generally found on any good installation for GCC.

The right section to look for is <tt/C Extensions::Extended Asm::/

Section	<tt/Invoking GCC::Submodel Options::i386 Options::/ might help too.
Particularly, it gives the i386 specific constraint names for registers:
<tt/abcdSDB/ correspond to
<tt/&percnt;eax/,
<tt/&percnt;ebx/,
<tt/&percnt;ecx/,
<tt/&percnt;edx/,
<tt/&percnt;esi/,
<tt/&percnt;edi/
and
<tt/&percnt;ebp/
respectively (no letter for <tt/&percnt;esp/).

The DJGPP Games resource (not only for game hackers) had a page
specifically about assembly, but it's down.
Its data have nonetheless been recovered on the
	<url url="http://www.delorie.com/djgpp/" name="DJGPP site">,
	that contains a mine of other useful information:
	<url url="http://www.delorie.com/djgpp/doc/brennan/">,
	and in the <url url="http://www.castle.net/&tilde;avly/djasm.html"
	name="DJGPP Quick ASM Programming Guide">.

<!-- broken url url="http://www.rt66.com/&tilde;brennan/djgpp/djgpp&lowbar;asm.html"-->

GCC depends on GAS for assembling, and follows its syntax (see below);
do mind that inline asm needs percent characters to be quoted
so that they are passed to GAS
(in extended asm, write <tt/&percnt;&percnt;eax/ to get <tt/&percnt;eax/).
See the section about GAS below.

Find <em/lots/ of useful examples in the <tt>linux/include/asm-i386/</tt>
subdirectory of the sources for the Linux kernel.
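
The following is a minimal sketch of extended asm,
not taken from the GCC manual or from the kernel.
It uses the <tt/c/ constraint to force the rotate count into <tt/&percnt;ecx/
(the instruction needs it in <tt/&percnt;cl/),
lets GCC pick any general register for the value,
and doubles the percent sign on the explicitly named register
so that a single one reaches GAS.
The function name <tt/rotl32/ is an assumption made for the example.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by count bits (count is taken modulo 32). */
static inline uint32_t rotl32(uint32_t value, uint8_t count)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("roll %%cl, %0"   /* %%cl reaches GAS as %cl               */
             : "+r" (value)    /* %0: read-write, any general register  */
             : "c" (count)     /* input pinned to the count register    */
             : "cc");          /* the rotate modifies the flags         */
    return value;
#else
    /* Portable fallback for other compilers and architectures. */
    count &= 31;
    return (value << count) | (value >> ((32 - count) & 31));
#endif
}
```

Only the count is tied to a specific register;
leaving the value to GCC gives the register allocator more freedom.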


<sect2>Invoking GCC to build proper inline assembly code
<p>
Because assembly routines from the kernel headers
(and most likely your own headers,
if you try making your assembly programming as clean
as it is in the linux kernel)
are embedded in <tt/extern inline/ functions,
GCC must be invoked with the <tt/-O/ flag
	(or <tt/-O2/, <tt/-O3/, etc),
for these routines to be available.
If not, your code may compile, but not link properly,
since it will be looking for non-inlined <tt/extern/ functions
in the libraries against which your program is being linked!
Another way is to link against libraries that include fallback
versions of the routines.
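
If you write such wrappers yourself,
one way to sidestep the link problem is to declare them
<tt/static inline/ instead:
GCC then keeps a private out-of-line copy in the object file
whenever it decides not to inline,
so the code links at any optimization level.
The sketch below is an illustration under that assumption;
the name <tt/lowest&lowbar;set&lowbar;bit/ is hypothetical.

```c
/* Return the index (0..31) of the lowest set bit of x, or -1 if x is 0. */
static inline int lowest_set_bit(unsigned int x)
{
    if (x == 0)
        return -1;      /* bsf leaves its destination undefined for 0 */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    {
        unsigned int index;
        /* bsfl source, destination: scan for the lowest set bit. */
        __asm__ ("bsfl %1, %0" : "=r" (index) : "r" (x) : "cc");
        return (int) index;
    }
#else
    {
        /* Portable fallback: shift until the lowest bit is set. */
        int index = 0;
        while ((x & 1u) == 0) {
            x >>= 1;
            index++;
        }
        return index;
    }
#endif
}
```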

Inline assembly can be disabled with <tt/-fno-asm/,
which will have the compiler die when using extended inline asm syntax,
or else generate calls to an external function named <tt/asm()/
that the linker can't resolve.
To counter such a flag, <tt/-fasm/ restores treatment
	of the <tt/asm/ keyword.

More generally, good compile flags for GCC on the x86 platform are
<code>
	gcc -O2 -fomit-frame-pointer -W -Wall
</code>

<tt/-O2/ is a good optimization level in most cases.
Optimizing beyond it takes longer, and yields code that is a lot larger,
but only a bit faster;
such overoptimization might be useful for tight loops only (if any),
which you may be doing in assembly anyway.
In cases when you need really strong compiler optimization for a few files,
do consider using up to <tt/-O6/.

<tt/-fomit-frame-pointer/ allows generated code to skip the stupid
frame pointer maintenance, which makes code smaller and faster,
and frees a register for further optimizations.
It precludes the easy use of debugging tools (<tt/gdb/),
but when you use these,
you just don't care about size and speed anymore anyway.

<tt/-W -Wall/ enables all warnings
and helps you catch obvious stupid errors.

You can add some CPU-specific <tt/-m486/ or such flag so that
GCC will produce code that is more adapted to your precise computer.
Note that modern GCC has <tt/-mpentium/ and such flags
(and <url url="http://goof.com/pcg/" name="PGCC"> has even more),
whereas GCC 2.7.x and older versions do not.
A good choice of CPU-specific flags should be in the Linux kernel.
Check the TeXinfo documentation of your current GCC installation for more.

<tt/-m386/ will help optimize for size,
hence also for speed on computers whose memory is tight and/or loaded,
since big programs cause swap, which more than counters
any "optimization" intended by the larger code.
In such settings, it might be useful to stop using C,
and use instead a language that favors code factorization,
such as a functional language and/or FORTH,
and use a bytecode- or wordcode- based implementation.

Note that you can vary code generation flags from file to file,
so performance-critical files will use maximum optimization,
whereas other files will be optimized for size.

To optimize even more, option <tt/-mregparm=2/
and/or corresponding function attribute might help,
but might pose lots of problems when linking to foreign code,
<em/including libc/.
There are ways to correctly declare foreign functions
so the right call sequences be generated,
or you might want to recompile the foreign libraries
to use the same register-based calling convention...
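
As a hedged sketch of the function-attribute route,
GCC on i386 accepts a <tt/regparm/ attribute on individual declarations.
The macro and function names below are invented for the example,
and the attribute is compiled out on other targets, where it does not apply.
The important point is that the declaration seen by every caller
carries the same attribute as the definition.

```c
/* regparm only makes sense on 32-bit x86; elsewhere expand to nothing. */
#if defined(__GNUC__) && defined(__i386__)
#define REGPARM2 __attribute__((regparm(2)))
#else
#define REGPARM2
#endif

/* Declaration, as it would appear in a header shared by all callers. */
REGPARM2 int scaled_add(int a, int b);

/* Definition: on i386 the two integer arguments arrive in
 * %eax and %edx instead of on the stack. */
REGPARM2 int scaled_add(int a, int b)
{
    return a + 2 * b;
}
```

A caller compiled against a declaration without the attribute
would push the arguments on the stack and get garbage back,
which is the linking hazard described above.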

Note that you can make these flags the default by editing the file
	<tt>/usr/lib/gcc-lib/i486-linux/2.7.2.3/specs</tt>
or wherever that is on your system
	(better not add <tt/-W -Wall/ there, though).
The exact location of the GCC specs files on <em/your/ system
can be found by asking <tt/gcc -v/.


<sect1>GAS
<p>
GAS is the GNU Assembler, which GCC relies upon.


<sect2>Where to find it
<p>
Find it at the same place where you found GCC,
in a package named binutils.

The latest version is available from HJLu at
	<url url="ftp://ftp.varesearch.com/pub/support/hjl/binutils/">.


<sect2>What is this AT&amp;T syntax
<p>
Because GAS was invented to support a 32-bit unix compiler,
it uses standard AT&amp;T syntax,
which closely resembles the syntax for standard m68k assemblers,
and is standard in the UNIX world.
This syntax is no worse, no better than the Intel syntax.
It's just different.
When you get used to it,
you find it much more regular than the Intel syntax,
though a bit boring.

Here are the major caveats about GAS syntax:
<itemize>
<item>
Register names are prefixed with <tt/&percnt;/, so that
registers are <tt/&percnt;eax/, <tt/&percnt;dl/ and so on,
instead of just <tt/eax/, <tt/dl/, etc.
This makes it possible to include external C symbols directly
in assembly source, without any risk of confusion, or any need
for ugly underscore prefixes.
<item>
The order of operands is source(s) first, and destination last,
as opposed to the Intel convention of destination first and sources last.
Hence, what in Intel syntax is <tt/mov ax,dx/ (move contents of
register <tt/dx/ into register <tt/ax/) will be in GAS syntax
<tt/mov &percnt;dx, &percnt;ax/.
<item>
The operand length is specified as a suffix to the instruction name.
The suffix is <tt/b/ for (8-bit) byte,
	<tt/w/ for (16-bit) word,
and <tt/l/ for (32-bit) long.
For instance, the correct syntax for the above instruction
	would have been <tt/movw &percnt;dx,&percnt;ax/.
However, gas does not require strict AT&amp;T syntax,
so the suffix is optional when length can be guessed from register operands,
and otherwise defaults to 32-bit (with a warning).
<item>
Immediate operands are marked with a <tt/&dollar;/ prefix,
as in <tt/addl &dollar;5,&percnt;eax/
(add immediate long value 5 to register <tt/&percnt;eax/).
<item>
No prefix to an operand indicates it is a memory-address;
hence <tt/movl &dollar;foo,&percnt;eax/
puts the <em/address/ of variable <tt/foo/
		in register <tt/&percnt;eax/,
but <tt/movl foo,&percnt;eax/
puts the <em/contents/ of variable <tt/foo/
	in register <tt/&percnt;eax/.
<item>
Indexing or indirection is done by enclosing the index register
or indirection memory cell address in parentheses,
as in <tt/testb &dollar;0x80,17(&percnt;ebp)/
(test the high bit of the byte value at offset 17
from the cell pointed to by <tt/&percnt;ebp/).
</itemize>
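
The rules above can be seen together in a short sketch
using GCC inline assembly, whose text is passed on to GAS.
The function names are made up for illustration;
inside extended asm, <tt/&percnt;0/ and <tt/&percnt;1/ are operand placeholders
that GCC replaces with register names such as <tt/&percnt;eax/
before GAS sees the text.

```c
/* addl $5, reg: '$' marks an immediate, the 'l' suffix means 32-bit,
 * and the source comes first, the destination last. */
static inline int add_five(int x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("addl $5, %0" : "+r" (x) : : "cc");
    return x;
#else
    return x + 5;       /* portable fallback */
#endif
}

/* movl 4(reg), reg: parentheses dereference the base register and
 * 4 is a byte offset, so with 4-byte ints this loads p[1]. */
static inline int load_second(const int *p)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int out;
    /* "memory" tells GCC the asm reads memory not listed as an operand. */
    __asm__ ("movl 4(%1), %0" : "=r" (out) : "r" (p) : "memory");
    return out;
#else
    return p[1];        /* portable fallback */
#endif
}
```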


A program exists to help you convert programs
from TASM syntax to AT&amp;T syntax. See
	<url url="ftp://x2ftp.oulu.fi/pub/msdos/programming/convert/ta2asv08.zip">.
<!--
	(Since the original x2ftp site is closing (no more?), use a
	<url url="ftp://ftp.lip6.fr/pub/pc/x2ftp/README.mirror_sites"
		name="mirror site">).
-->
There also exists a program for the reverse conversion:
	<url url="http://www.multimania.com/placr/a2i.html">.


GAS has comprehensive documentation in TeXinfo format,
which comes at least with the source distribution.
Browse extracted .info pages with Emacs or whatever.
There used to be a file named gas.doc or as.doc
around the GAS source package, but it was merged into the TeXinfo docs.
Of course, in case of doubt, the ultimate documentation
is the sources themselves!
A section that will particularly interest you is
	<tt/Machine Dependencies::i386-Dependent::/


Again, the sources for Linux (the OS kernel) come in as excellent examples;
see under <tt>linux/arch/i386/</tt> the following files:
<tt>kernel/*.S</tt>, <tt>boot/compressed/*.S</tt>, <tt>math-emu/*.S</tt>.

If you are writing some kind of language, a thread package, etc.,
you might as well see how other languages
	(<url url="http://para.inria.fr/" name="OCaml">,
	<url url="http://www.jwdt.com/~paysan/gforth.html" name="Gforth">,
	etc.),
or thread packages (QuickThreads, MIT pthreads, LinuxThreads, etc),
or whatever, do it.

Finally, just compiling a C program to assembly
might show you the syntax for the kind of instructions you want.
See section <ref id="doyouneedasm" name="Do you need Assembly?"> above.


<sect2>16-bit mode
<p>
The current stable release of binutils (2.9.1.0.25)
now fully supports 16-bit mode (registers <em/and/ addressing) on i386 PCs.
Still with its peculiar AT&amp;T syntax, of course.
Use <tt/.code16/ and <tt/.code32/
	to switch between assembly modes.

Also, a neat trick used by some (including the oskit authors)
is to have GCC produce code for 16-bit real mode,
using an inline assembly statement
	<tt/asm(&dquot;.code16&bsol;n&dquot;)/.
GCC will still emit only 32-bit addressing modes,
	but GAS will insert proper 32-bit prefixes for them.


<sect2>GASP
<p>
GASP is the GAS Preprocessor.
It adds macros and some nice syntax to GAS.
GASP comes together with GAS in the GNU binutils archive.
It works as a filter, much like cpp and the like.
I have no idea about the details, but it comes with its own texinfo documentation,
so just browse it (in .info), print it, grok it.
GAS with GASP looks like a regular macro-assembler to me.


<sect1>NASM
<p>
The Netwide Assembler project provides a cool i386 assembler,
written in C, that should be modular enough
to eventually support all known syntaxes and object formats.

<sect2>Where to find NASM<label id="findnasm">
<p>
	<url url="http://nasm.sourceforge.net">

Binary release on your usual metalab mirror in <tt>devel/lang/asm/</tt>.
Should also be available as .rpm or .deb in your usual RedHat/Debian
distributions' contrib.

<sect2>What it does
<p>
The syntax is Intel-style.
Fairly good macroprocessing support is integrated.

Supported object file formats are
<tt/bin, aout, coff, elf, as86,/ (DOS) <tt/obj, win32,/ (their own format) <tt/rdf/.

NASM can be used as a backend for the free LCC compiler
(support files included).

Unless you're using BCC as a 16-bit compiler
(which is out of scope of this 32-bit HOWTO),
you should definitely use NASM instead of say AS86 or MASM,
because it is actively supported online,
and runs on all platforms.

Note: NASM also comes with a disassembler, NDISASM.

Its hand-written parser makes it much faster than GAS,
though of course, it doesn't support three bazillion different architectures.
If you like Intel-style syntax, as opposed to GAS syntax,
then it should be the assembler of choice...

Note: There are <ref id="res" name="converters between GAS AT&amp;T and Intel assembler syntax">,
which perform conversion in both directions.

<sect1>AS86
<p>
AS86 is an 80x86 assembler, both 16-bit and 32-bit,
part of Bruce Evans' C Compiler (BCC).
It has mostly Intel syntax,
though it differs slightly in its addressing modes.


<sect2>Where to get AS86
<p>
A completely outdated version of AS86 is distributed by HJLu
just to compile the Linux kernel,
in a package named bin86 (current version 0.4),
available in any Linux GCC repository.
But I advise no one to use it for anything else but compiling Linux.
This version supports only a hacked minix object file format,
which is not supported by the GNU binutils or anything,
and it has a few bugs in 32-bit mode,
so you really had better keep it only for compiling Linux.

The most recent versions by Bruce Evans (bde@zeta.org.au)
are published together with the FreeBSD distribution.
Well, they were: I could not find the sources from distribution 2.1 on :(
Hence, I put the sources at my place:
<url url="http://www.tunes.org/~fare/files/asm/bcc-95.3.12.src.tgz">

The Linux/8086 (aka ELKS) project is somehow maintaining bcc
(though I don't think they included the 32-bit patches).
See around
<url url="http://www.linux.org.uk/ELKS-Home/">
(or <url url="http://www.elks.ecs.soton.ac.uk">)
and <url url="ftp://linux.mit.edu/pub/linux/ELKS/">.
I haven't followed these developments,
and would appreciate a reader contributing on this topic.

Among other things, these more recent versions, unlike HJLu's,
support the Linux GNU a.out format,
so you can link your code to Linux programs, and/or use the usual
tools from the GNU binutils package to manipulate your data.
These versions can co-exist without any harm with the previous one
(see the corresponding question below).

BCC versions from 12 March 1995 and earlier have a misfeature
that makes all segment pushing/popping 16-bit,
which is quite annoying when programming in 32-bit mode.
I wrote a patch at a time when the TUNES Project used as86:
	<url url="http://www.tunes.org/~fare/files/asm/as86.bcc.patch.gz">.
Bruce Evans accepted this patch,
but since as far as I know he hasn't published a new release of bcc,
the ones to ask about integrating it (if not done yet)
are the ELKS developers.


<sect2>How to invoke the assembler?
<p>
Here's the GNU Makefile entry for using bcc
to transform <tt/.s/ asm
into both GNU a.out <tt/.o/ object
and <tt/.l/ listing:

<code>
&percnt;.o &percnt;.l:        &percnt;.s
        bcc -3 -G -c -A-d -A-l -A&dollar;*.l -o &dollar;*.o &dollar;<
</code>

Remove the <tt/&percnt;.l/, <tt/-A-l/, and <tt/-A&dollar;*.l/,
if you don't want any listing.
If you want something else than GNU a.out,
you can see the docs of bcc about the other supported formats,
and/or use the objcopy utility from the GNU binutils package.


<sect2>Where to find docs
<p>
The docs are what is included in the bcc package.
I salvaged the man pages that used to be available from the FreeBSD site at
<url url="http://www.tunes.org/~fare/files/asm/bcc-95.3.12.src.tgz">.
Maybe ELKS developers know better.
When in doubt, the sources themselves are often good documentation:
they are not very well commented, but the programming style is straightforward.
You might try to see how as86 is used in ELKS or Tunes 0.0.0.25...


<sect2>What if I can't compile Linux anymore with this new version ?
<p>
Linus is buried alive in mail,
and since HJLu (official bin86 maintainer)
chose to write hacks around an obsolete version of as86
instead of building clean code around the latest version,
I don't think my patch for compiling Linux with a modern as86
has any chance to be accepted if resubmitted.
Now, this shouldn't matter: just keep your as86 from the bin86 package
in <tt>/usr/bin/</tt>, and let bcc install the good as86 as
<tt>/usr/local/libexec/i386/bcc/as</tt>
where it should be. You never need to call this "good" as86 explicitly,
because bcc does everything right, including conversion to Linux a.out,
when invoked with the right options;
so assemble files exclusively with bcc as a frontend, not directly with as86.

Since GAS now supports 16-bit code,
and since H. Peter Anvin, well-known linux hacker, works on NASM,
maybe Linux will get rid of AS86, anyway? Who knows!


<sect1>OTHER ASSEMBLERS
<p>
These are other non-regular options,
in case the previous didn't satisfy you (why?),
that I don't recommend in the usual (?) case,
but that could be quite useful if the assembler must be integrated
in the software you're designing (i.e. an OS or development environment).

<sect2>Win32Forth assembler
<p>
Win32Forth is a <em/free/ 32-bit ANS FORTH system
that successfully runs under Win32s, Win95, Win/NT.
It includes a free 32-bit assembler (either prefix or postfix syntax)
integrated into the reflective FORTH language.
Macro processing is done with
the full power of the reflective language FORTH;
however, the only supported input and output context is Win32For itself
(no dumping of .obj file, but you could add that feature yourself, of course).
Find it at
<url url="ftp://ftp.forth.org/pub/Forth/Compilers/native/windows/Win32For/">.


<sect2>TDASM
<p>
The Table Driven Assembler (TDASM) is a <em/free/ portable
cross assembler for any kind of assembly language.
It should be possible to use it as a compiler to any target microprocessor
using a table that defines the compilation process.

It is available from <url url="http://www.penguin.cz/~niki/tdasm/">.


<sect2>Terse
<p>
<url url="http://www.terse.com" name="Terse">
is a programming tool that provides
<em/THE/ most compact assembler syntax for the x86 family!
However, it is evil proprietary software.
It is said that there was a project for a free clone somewhere,
that was abandoned after worthless pretenses that the syntax
would be owned by the original author.
Thus, if you're looking for
	a nifty programming project related to assembly hacking,
I invite you to develop a terse-syntax frontend to NASM,
if you like that syntax.

As an interesting historic remark, on
	<url url="news:comp.compilers" name="comp.compilers">,
	 1999/07/11 19:36:51, the moderator wrote:
"There's no reason that assemblers have to have awful syntax.  About
30 years ago I used Niklaus Wirth's PL360, which was basically a S/360
assembler with Algol syntax and a little syntactic sugar like while
loops that turned into the obvious branches.  It really was an
assembler, e.g., you had to write out your expressions with explicit
assignments of values to registers, but it was nice.  Wirth used it to
write Algol W, a small fast Algol subset, which was a predecessor to
Pascal.  As is so often the case, Algol W was a significant
improvement over many of its successors. -John"

<sect2>HLA
<p>
<url url="http://webster.cs.ucr.edu" name="HLA">
is a <bf/H/igh <bf/L/evel <bf/A/ssembly language.
It uses a high level language like syntax
(similar to Pascal, C/C++, and other HLLs) for variable declarations,
procedure declarations, and procedure calls. It uses a modified
assembly language syntax for the standard machine instructions.
It also provides several high level language style control structures
(if, while, repeat..until, etc.) that help you write much more readable code.

HLA is free, but runs only under Win32.
You need MASM and a 32-bit version of MS-link,
because HLA produces MASM code and uses MASM for final
assembling and linking. However it comes with <tt/m2t/ (MASM to TASM)
post-processor program that converts the HLA MASM output to a form
that will compile under TASM. Unfortunately, NASM is not supported.


<sect2>TALC
<p>
<url url="http://www.cs.cornell.edu/talc/" name="TALC">
is another free MASM/Win32 based compiler
(however it supports ELF output, does it?).

TAL stands for <bf/T/yped <bf/A/ssembly <bf/L/anguage.
It extends traditional untyped assembly languages with typing annotations,
memory management primitives, and a sound set of typing rules, to guarantee
the memory safety, control flow safety, and type safety of TAL programs.
Moreover, the typing constructs are expressive enough to encode
most source language programming features including records and structures,
arrays, higher-order and polymorphic functions, exceptions, abstract data types,
subtyping, and modules.
Just as importantly, TAL is flexible enough to admit many low-level compiler optimizations.
Consequently, TAL is an ideal target platform for type-directed compilers
that want to produce verifiably safe code
for use in secure mobile code applications
or extensible operating system kernels.

<sect2>Non-free and/or Non-32bit x86 assemblers.
<p>
You may find more about them,
together with the basics of x86 assembly programming,
in <ref id="res-general" name="Raymond Moon's x86 assembly FAQ">.

Note that all DOS-based assemblers should work inside the Linux DOS Emulator,
as well as other similar emulators, so that if you already own one,
you can still use it inside a real OS.
Recent DOS-based assemblers also support COFF and/or other object file formats
that are supported by the GNU BFD library,
so that you can use them together with your free 32-bit tools,
perhaps using GNU objcopy (part of the binutils) as a conversion filter.


<sect>METAPROGRAMMING/MACROPROCESSING
<p>
Assembly programming is a bore,
except for critical parts of programs.

You should use the appropriate tool for the right task,
so don't choose assembly when it's not fit;
C, OCaml, perl, Scheme, might be a better choice for most
of your programming.

However, there are cases when these tools do not give
a fine enough control on the machine, and assembly is useful or needed.
In those cases, you'll appreciate a system of macroprocessing and
metaprogramming that'll allow recurring patterns to be factored
each into one indefinitely reusable definition,
which allows safer programming, automatic propagation of pattern modification,
etc.
Plain assembler often is not enough,
even when one is doing only small routines to link with C.


<sect1>What's integrated into the above
<p>

Yes I know this section does not contain much useful up-to-date information.
Feel free to contribute what you discover the hard way...


<sect2>GCC
<p>
GCC allows (and requires) you to specify register constraints
in your inline assembly code, so the optimizer always knows about it;
thus, inline assembly code is really made of patterns,
not necessarily exact code.

Thus, you can put your assembly into CPP macros and inline C functions,
	so anyone can use it like any C function/macro.
Inline functions resemble macros very much, but are sometimes cleaner to use.
Beware that in all those cases, code will be duplicated,
	so only local labels (of <tt/1:/ style)
	should be defined in that asm code.
However, a macro would allow the name for a non local defined label
	to be passed as a parameter
	(or else, you should use additional meta-programming methods).
Also, note that propagating inline asm code will spread potential bugs in them;
so watch out doubly for register constraints in such inline asm code.
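
As a minimal sketch of the above, here are two inline functions wrapping
inline assembly. The names <tt/rotl32/ and <tt/clamp&lowbar;nonneg/ are made up for
this illustration, and the plain C fallback is only an assumption added so
that the file still compiles on non-x86 targets. The constraints tell the
optimizer which registers are read and written, and the second function uses
a local label of the <tt/1:/ style, so it can be inlined many times without
symbol clashes.

```c
#include <stdint.h>

/* Rotate x left by n bits (the count is taken modulo 32). */
static inline uint32_t rotl32(uint32_t x, uint8_t n)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "+r": x is read and written, in any register GCC likes;
       "c" : n must live in %ecx, because roll takes its count in %cl;
       "cc": the flags are modified. */
    __asm__("roll %%cl, %0" : "+r"(x) : "c"(n) : "cc");
    return x;
#else
    /* Hypothetical portable fallback, not part of the technique itself. */
    n &= 31;
    return n ? (uint32_t)((x << n) | (x >> (32 - n))) : x;
#endif
}

/* Return x if it is non-negative, 0 otherwise. */
static inline int32_t clamp_nonneg(int32_t x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "1:" is a local label and "1f" means "the next 1: forward",
       so duplicating this code by inlining is harmless. */
    __asm__("testl %0, %0\n\t"
            "jns 1f\n\t"
            "xorl %0, %0\n"
            "1:"
            : "+r"(x) : : "cc");
    return x;
#else
    return x < 0 ? 0 : x;
#endif
}
```

Callers use these exactly like ordinary C functions, e.g.
<tt/rotl32(key, 8)/; GCC picks the registers at each call site.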

Lastly, the C language itself may be considered as a good abstraction
to assembly programming,
which relieves you from most of the trouble of assembling.


<sect2>GAS
<p>
GAS has some macro capability included, as detailed in the texinfo docs.
Moreover, while GCC recognizes .s files as raw assembly to send to GAS,
it also recognizes .S files as files to pipe through CPP
before feeding them to GAS.
Again and again, see Linux sources for examples.


<sect2>GASP
<p>
It adds all the usual macroassembly tricks to GAS.
See its texinfo docs.


<sect2>NASM
<p>
NASM has comprehensive macro support, too.
See the corresponding docs.
If you have some bright idea,
you might wanna contact the authors,
as they are actively developing it.
Meanwhile, see about external filters below.


<sect2>AS86
<p>
It has some simple macro support, but I couldn't find docs.
Now the sources are very straightforward,
so if you're interested, you should understand them easily.
If you need more than the basics, you should use an external filter
(see below).


<sect2>OTHER ASSEMBLERS
<p>
<itemize>
<item>
Win32FORTH:
CODE and END-CODE are normal words that do not switch from interpretation mode
to compilation mode, so you have access to the full power of FORTH
while assembling.
<item>
TUNES:
it doesn't work yet, but the Scheme language is a real high-level language
that allows arbitrary meta-programming.
</itemize>


<sect1>External Filters
<p>
Whatever macro support your assembler has,
or whatever language you use (even C!),
if the language is not expressive enough for you,
you can have files passed through an external filter
with a Makefile rule like this:

<code>
&percnt;.s:    &percnt;.S other&lowbar;dependencies
        &dollar;(FILTER) &dollar;(FILTER&lowbar;OPTIONS) < &dollar;< > &dollar;@
</code>


<sect2>CPP
<p>
CPP is truly not very expressive, but it's enough for easy things,
it's standard, and called transparently by GCC.

As an example of its limitations, you can't declare objects so that
destructors are automatically called at the end of the declaring block;
you don't have diversions or scoping, etc.

CPP comes with any C compiler.
However, considering how mediocre it is,
	stay away from it if by chance you can make do without C.


<sect2>M4
<p>
M4 gives you the full power of macroprocessing,
with a Turing equivalent language, recursion, regular expressions, etc.
You can do with it everything that CPP cannot.

See <url url="ftp://ftp.forth.org/pub/Forth/Compilers/native/unix/this4th.tar.gz" name="macro4th (this4th)">
or
<url url="ftp://ftp.tunes.org/pub/tunes/obsolete/dist/tunes.0.0.0/tunes.0.0.0.25.src.zip"
    name="the Tunes 0.0.0.25 sources">
as examples of advanced macroprogramming using m4.

However, its dysfunctional quoting and unquoting semantics force you to use
explicit continuation-passing tail-recursive macro style if
you want to do <em/advanced/ macro programming
(which is reminiscent of TeX -- BTW, has anyone tried to use TeX as
a macroprocessor for anything other than typesetting?).
This is NOT worse than CPP, which does not allow quoting and recursion anyway.

The right version of m4 to get is GNU m4 1.4 (or later, if one exists),
which has the most features and the least bugs or limitations of all.
m4 is designed to be slow for anything but the simplest uses,
which might still be ok for most assembly programming
(you're not writing million-lines assembly programs, are you?).


<sect2>Macroprocessing with your own filter
<p>
You can write your own simple macro-expansion filter
with the usual tools: perl, awk, sed, etc.
That's quick to do, and you control everything.
But of course, any power in macroprocessing must be earned the hard way.


<sect2>Metaprogramming
<p>
Instead of using an external filter that expands macros,
one way to do things is to write programs that write part
or all of other programs.

For instance, you could use a program outputting source code
<itemize>
<item>
to generate sine/cosine/whatever lookup tables,
<item>
to extract a source-form representation of a binary file,
<item>
to compile your bitmaps into fast display routines,
<item>
to extract documentation, initialization/finalization code,
description tables, as well as normal code from the same source files,
<item>
to have customized assembly code, generated from a perl/shell/scheme script
that does arbitrary processing,
<item>
to propagate data defined at one point only
into several cross-referencing tables and code chunks.
<item>
etc.
</itemize>

Think about it!
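
To make the lookup-table case concrete, here is a small sketch of a C
generator that writes GAS source instead of you typing the table by hand.
The names <tt/bitrev8/ and <tt/emit&lowbar;bitrev&lowbar;table/, the label passed in,
and the choice of a bit-reversal table (rather than sine values) are all
assumptions made for this illustration.

```c
#include <stdio.h>

/* Reverse the bit order of one byte (bit 0 becomes bit 7, and so on). */
unsigned bitrev8(unsigned b)
{
    unsigned r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 1) | (b & 1);
        b >>= 1;
    }
    return r;
}

/* Write a 256-entry byte table as GAS source to `out`,
   eight entries per line. Returns the number of entries written. */
int emit_bitrev_table(FILE *out, const char *label)
{
    fprintf(out, "\t.section .rodata\n\t.globl %s\n%s:\n", label, label);
    for (unsigned i = 0; i < 256; i += 8) {
        fprintf(out, "\t.byte ");
        for (unsigned j = 0; j < 8; j++)
            fprintf(out, "0x%02x%s", bitrev8(i + j), j < 7 ? ", " : "\n");
    }
    return 256;
}
```

A tiny <tt/main/ calling <tt/emit&lowbar;bitrev&lowbar;table(stdout, "bitrev&lowbar;table")/
can then be run from a Makefile rule, with its output redirected into
a <tt/.s/ file that gets assembled with the rest of the project.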


<sect3>Backends from compilers
<p>
Compilers like GCC, SML/NJ, Objective CAML, MIT-Scheme, CMUCL, etc,
do have their own generic assembler backend,
which you might choose to use,
if you intend to generate code semi-automatically
from the according languages,
or from a language you hack:
rather than write great assembly code,
you may instead modify a compiler so that it dumps great assembly code!


<sect3>The New-Jersey Machine-Code Toolkit
<p>
There is a project, using the programming language Icon
(with an experimental ML version),
to build a basis for producing assembly-manipulating code.
See around
	<url url="http://www.eecs.harvard.edu/&tilde;nr/toolkit/">


<sect3>TUNES<p>

The <url url="http://www.tunes.org" name="TUNES Project">
	for a Free Reflective Computing System
is developing its own assembler
as an extension to the Scheme language,
as part of its development process.
It doesn't run at all yet, though help is welcome.

The assembler manipulates abstract syntax trees,
so it could equally serve as the basis for an assembly syntax translator,
a disassembler, a common assembler/compiler back-end, etc.
Also, the full power of a real language, Scheme,
makes it unchallenged for macroprocessing/metaprogramming.


<sect>CALLING CONVENTIONS
<p>


<sect1>Linux
<p>

<sect2>Linking to GCC
<p>
This is the preferred way if you are developing a mixed C-asm project.
Check GCC docs and examples from Linux kernel <tt/.S/ files
that go through gas (not those that go through as86).

32-bit arguments are pushed down stack in reverse syntactic order
(hence accessed/popped in the right order),
above the 32-bit near return address.
<tt/&percnt;ebp, &percnt;esi, &percnt;edi, &percnt;ebx/ are callee-saved,
other registers are caller-saved;
<tt/&percnt;eax/ is to hold the result,
or <tt/&percnt;edx:&percnt;eax/ for 64-bit results.
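
The convention just described can be sketched with a tiny routine written in
assembly and called from C. This is only an illustration under stated
assumptions: the name <tt/asm&lowbar;sub/ is made up, the assembly is embedded as
a top-level <tt/asm/ statement to keep everything in one file, it is only
used on i386 ELF targets (no leading underscore on symbols), and on any
other target a plain C function stands in for it.

```c
/* cdecl routine: returns a - b. */
#if defined(__GNUC__) && defined(__i386__) && defined(__ELF__)
int asm_sub(int a, int b) __attribute__((cdecl));

__asm__(
    ".pushsection .text\n"
    ".globl asm_sub\n"
    ".type asm_sub, @function\n"
    "asm_sub:\n"
    "    movl 4(%esp), %eax\n"   /* first argument, just above the return address */
    "    subl 8(%esp), %eax\n"   /* second argument, pushed before the first */
    "    ret\n"                  /* result is left in %eax */
    ".size asm_sub, .-asm_sub\n"
    ".popsection\n"
);
#else
/* Stand-in for targets where the i386 convention does not apply. */
int asm_sub(int a, int b)
{
    return a - b;
}
#endif
```

The routine only touches <tt/&percnt;eax/, which is caller-saved, so it has
nothing to save or restore; a routine using <tt/&percnt;ebx/, <tt/&percnt;esi/,
<tt/&percnt;edi/ or <tt/&percnt;ebp/ would have to preserve them.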

FP stack: I'm not sure,
but I think it's result in <tt/st(0)/, whole stack caller-saved.

Note that GCC has options to modify the calling conventions
by reserving registers, having arguments in registers,
not assuming the FPU, etc. Check the i386 .info pages.

Beware that you must then declare the <tt/cdecl/ or <tt/regparm(0)/
attribute for a function that will follow standard GCC calling conventions.
See in the GCC info pages the section:
	<tt/C Extensions::Extended Asm::/.
See also how Linux defines its asmlinkage macro...


<sect2>ELF vs a.out problems
<p>
Some C compilers prepend an underscore before every symbol,
while others do not.

Particularly, Linux a.out GCC does such prepending,
while Linux ELF GCC does not.

If you need to cope with both behaviors at once,
see how existing packages do.
For instance, get an old Linux source tree,
the Elk, qthreads, or OCaml...

You can also override the implicit C&rarr;asm renaming
by inserting statements like
<code>
	void foo(void) asm(&dquot;bar&dquot;);
</code>
to be sure that the C function <tt/foo/
	will really be called <tt/bar/ in assembly.
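
Here is a small sketch of that renaming at work with GCC. The second
declaration, <tt/bar&lowbar;from&lowbar;asm/, is a made-up name that merely stands in
for a reference coming from a separate assembly source file: it is bound to
the same assembly-level symbol, so calling it reaches the C function
<tt/foo/.

```c
/* The C name is foo, but the symbol emitted in the object file is bar,
   whatever underscore-prepending rule the platform would otherwise apply. */
int foo(int x) __asm__("bar");

int foo(int x)
{
    return x + 1;
}

/* Hypothetical second C-level name for the very same symbol. */
extern int bar_from_asm(int x) __asm__("bar");
```

Running <tt/nm/ on the resulting object file should list <tt/bar/
and no <tt/foo/.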

Note that the utility <tt/objcopy/, from the <tt/binutils/ package,
should allow you to transform your a.out objects into ELF objects,
and perhaps the contrary too, in some cases.
More generally, it will do lots of file format conversions.


<sect2>Direct Linux syscalls
<p>
Often you will be told that using libc is the only way,
and that direct system calls are bad. This is true. To some extent.
You should know that libc is not sacred, and that in <em/most/ cases
libc only does some checks, then calls the kernel, and then sets errno.
You can easily do this in your program as well (if you need to),
and your program will be a dozen times smaller;
this will also improve performance, simply because
you're not using shared libraries (static binaries are faster).
Whether or not to use libc in assembly programming is more a question of
taste/belief than something practical.
Remember, Linux is aiming to be POSIX compliant, and so
is libc. This means that the syntax of almost all libc "system calls" exactly
matches the syntax of the real kernel system calls (and vice versa). Besides, modern
libc becomes slower and slower, and eats more and more memory, so
direct system calls are becoming quite common.
But the main drawback of throwing libc away is that you may need to
implement several libc-specific functions (those that are not just syscall wrappers)
on your own (printf and Co.)... and you are ready for that, aren't you? :)


Here is a summary of the pros and cons of direct system calls.

Pros:
<itemize>
<item>smallest possible size; squeezing the last byte out of the system.
<item>highest possible speed; squeezing cycles out of your favorite benchmark.
<item>full control: you can adapt your program/library
	to your specific language or memory requirements or whatever
<item>no pollution by libc cruft.
<item>no pollution by C calling conventions
	(if you're developing your own language or environment).
<item>static binaries make you independent from libc upgrades or crashes,
	or from dangling <tt/&num;!/ path to an interpreter (and are faster).
<item>just for the fun out of it
	(don't you get a kick out of assembly programming?)
</itemize>

Cons:
<itemize>
<item>If any other program on your computer uses the libc,
	then duplicating the libc code will actually
	waste memory, not save it.
<item>Services redundantly implemented in many static binaries
	are a waste of memory.
	But you can make your libc replacement a shared library.
<item>Size is much better saved by having some kind
	of bytecode, wordcode, or structure interpreter
	than by writing everything in assembly.
	(the interpreter itself could be written either in C or assembly.)
	The best way to keep multiple binaries small is
	to not have multiple binaries, but instead
	to have an interpreter process files with <tt/&num;!/ prefix.
	This is how OCaml works when used in wordcode mode
	(as opposed to optimized native code mode),
	and it is compatible with using the libc.
	This is also how Tom Christiansen's
	<url name="Perl PowerTools"
		url="http://language.perl.com/ppt/">
	reimplementation of unix utilities works.
	Finally, one last way to keep things small,
	that doesn't depend on an external file with a hardcoded path,
	be it library or interpreter,
	is to have only one binary,
	and have multiply-named hard or soft links to it:
	the same binary will provide everything you need in an optimal space,
	with no redundancy of subroutines or useless binary headers;
	it will dispatch its specific behavior
	according to its <tt/argv[0]/;
	in case it isn't called with a recognized name,
	it might default to a shell,
	and be possibly thus also usable as an interpreter!
<item>You cannot benefit from the many functionalities that libc provides
	besides mere linux syscalls:
	that is, functionality described in section 3 of the manual pages,
	as opposed to section 2,
	such as malloc, threads, locale, password,
	high-level network management, etc.
<item>Consequently, you might have to reimplement large parts of libc,
	from <tt/printf/ to <tt/malloc/ and <tt/gethostbyname/.
	It's redundant with the libc effort,
		and can be <em/quite/ boring sometimes.
	Note that some people have already reimplemented &quot;light&quot;
	replacements for parts of the libc -- check them out!
	(Redhat's minilibc,
	Rick Hohensee's <url url="ftp://linux01.gwdg.de/pub/cLIeNUX/interim/libsys.tgz" name="libsys">,
	Felix von Leitner's <url url="http://www.fefe.de/dietlibc/" name="dietlibc">,
	Christian Fowelin's <ref id="res" name="libASM">,
	<ref id="res" name="asmutils"> project is working on pure assembly libc)

<item>Static libraries prevent your benefitting from libc upgrades
	as well as from libc add-ons such as the <tt/zlibc/ package,
	that does on-the-fly transparent decompression
	of gzip-compressed files.
<item>The few instructions added by the libc are
	a <em/ridiculously/ small speed overhead as compared
	to the cost of a system call.
	If speed is a concern, your main problem is in
	your usage of system calls, not in their wrapper's implementation.
<item>Using the standard assembly API for system calls is much slower
	than using the libc API when running in micro-kernel versions
	of Linux such as L4Linux,
	that have their own faster calling convention,
	and pay high convention-translation overhead
	when using the standard one
	(L4Linux comes with libc recompiled with their syscall API;
	of course, you could recompile your code with their API, too).
<item>See previous discussion for general speed optimization issue.
<item>If syscalls are too slow for you,
	you might want to hack the kernel sources (in C)
	instead of staying in userland.
</itemize>
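
The single-binary trick mentioned in the list above (one executable with
several hard or soft links, dispatching on <tt/argv[0]/) can be sketched as
follows. The applet names and the <tt/dispatch/ function are made up for this
illustration, and returning -1 for an unknown name is just one possible
policy (a real tool might fall back to a shell, as suggested above).

```c
#include <stddef.h>
#include <string.h>

typedef int (*applet_fn)(int argc, char **argv);

/* Two toy applets: each returns its exit status. */
static int applet_true(int argc, char **argv)  { (void)argc; (void)argv; return 0; }
static int applet_false(int argc, char **argv) { (void)argc; (void)argv; return 1; }

static const struct {
    const char *name;
    applet_fn   fn;
} applets[] = {
    { "true",  applet_true  },
    { "false", applet_false },
};

/* Strip any leading directory part from the invocation name. */
static const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Run the applet selected by argv[0]; return its status,
   or -1 if the name is not recognized. */
int dispatch(int argc, char **argv)
{
    const char *name = base_name(argv[0]);
    for (size_t i = 0; i < sizeof applets / sizeof applets[0]; i++)
        if (strcmp(name, applets[i].name) == 0)
            return applets[i].fn(argc, argv);
    return -1;
}
```

A <tt/main/ that simply returns <tt/dispatch(argc, argv)/, installed once and
linked under each applet name, gives you every tool at the cost of a
single binary.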

If you've pondered the above pros and cons,
and still want to use direct syscalls
(as documented in section 2 of the manual pages),
then here is some advice.

<itemize>
<item>You can easily define your system calling functions
	in a portable way in C (as opposed to unportable using assembly),
	by including <tt>&lt;asm/unistd.h&gt;</tt>,
	and using provided macros.
<item>Since you're trying to replace it,
	go get the sources for the libc, and grok them.
	(And if you think you can do better,
	then send feedback to the authors!)
<item>As an example of pure assembly code that does everything you want,
examine <ref id="res" name="Linux Assembly resources">.
</itemize>

Basically, you issue an <tt/int 0x80/,
with the <tt/&lowbar;&lowbar;NR&lowbar;/syscallname number
	(from <tt>asm/unistd.h</tt>)
in <tt/eax/, and parameters (up to six [*]) in
<tt/ebx, ecx, edx, esi, edi, ebp [*]/ respectively ([*] - Linux 2.4 only,
previous versions have only 5 parameters).
Result is returned in <tt/eax/, with a negative result being an error,
whose opposite is what libc would put in <tt/errno/.
The user-stack is not touched,
so you needn't have a valid one when doing a syscall.
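
As a hedged sketch of this convention, here is a <tt/write/ done without the
libc wrapper, plus the small amount of work a wrapper adds on top. The
function names <tt/raw&lowbar;write/ and <tt/wrapped&lowbar;write/ are made up for the
example. The <tt/int 0x80/ path is only compiled on Linux/i386; on anything
else the sketch falls back to the libc <tt/write/ and converts its result to
the same "negative errno" form, which is an assumption added purely so that
the example builds everywhere.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

#if defined(__linux__) && defined(__i386__)
#include <asm/unistd.h>          /* __NR_write */
#endif

/* Returns the number of bytes written, or a negative errno value,
   exactly as the kernel hands it back in eax. */
long raw_write(int fd, const void *buf, size_t len)
{
#if defined(__linux__) && defined(__i386__)
    long ret;
    __asm__ volatile("int $0x80"
                     : "=a"(ret)                    /* result in eax */
                     : "0"((long)__NR_write),       /* syscall number in eax */
                       "b"((long)fd),               /* 1st parameter in ebx */
                       "c"(buf),                    /* 2nd parameter in ecx */
                       "d"(len)                     /* 3rd parameter in edx */
                     : "memory");
    return ret;
#else
    /* Fallback for other targets: emulate the raw return convention. */
    ssize_t ret = write(fd, buf, len);
    return ret < 0 ? -(long)errno : (long)ret;
#endif
}

/* What a libc-style wrapper adds: turn a negative result into errno and -1. */
long wrapped_write(int fd, const void *buf, size_t len)
{
    long ret = raw_write(fd, buf, len);
    if (ret < 0) {
        errno = (int)-ret;
        return -1;
    }
    return ret;
}
```

For instance, <tt/raw&lowbar;write(-1, "x", 1)/ yields <tt/-EBADF/, whereas
<tt/wrapped&lowbar;write/ on the same arguments returns -1 with <tt/errno/ set
to <tt/EBADF/.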

<url url="http://www.linuxdoc.org/LDP/lki/" name="Linux Kernel Internals">,
and especially
<url url="http://www.linuxdoc.org/LDP/lki/Linux-Kernel-Internals-2.html#ss2.11"
name="How System Calls Are Implemented on i386 Architecture?">
chapter will give you a more thorough overview.

As for the invocation arguments passed to a process upon startup,
	the general principle is that the stack
	originally contains the number of arguments <tt/argc/,
	then the list of pointers that constitute <tt/*argv/,
	then a null-terminated sequence of null-terminated
		variable=value strings for the <tt/environ/ment.
For more details,
	do examine <ref id="res" name="Linux assembly resources">,
	read the sources of C startup code from your libc
	(<tt/crt0.S/ or <tt/crt1.S/),
	or those from the Linux kernel
	(<tt/exec.c/ and <tt/binfmt&lowbar;*.c/ in <tt>linux/fs/</tt>).
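
Seen from C, the startup layout described above means that the environment
pointers sit right after the null pointer that terminates <tt/argv/. The
following sketch relies on exactly that; the helper names are made up, and
the assumption only holds for the vectors as the kernel set them up at
process start, before anything has modified the environment.

```c
#include <stddef.h>

/* Given argc and argv as found at process startup, return the start of
   the environment vector: skip argv[0..argc-1] and the terminating NULL. */
char **env_from_argv(int argc, char **argv)
{
    return argv + argc + 1;
}

/* Count the entries of a NULL-terminated pointer vector. */
size_t count_strings(char **vec)
{
    size_t n = 0;
    while (vec[n] != NULL)
        n++;
    return n;
}
```

In pure assembly the same walk starts from the stack pointer at entry:
<tt/argc/ is at the top, the <tt/argv/ pointers follow, and the environment
pointers come after the null that ends them.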


<sect2>Hardware I/O under Linux
<p>
If you want to do direct I/O under Linux,
either it's something very simple that doesn't need OS arbitration,
and you should see the <tt/IO-Port-Programming/ mini-HOWTO;
or it needs a kernel device driver, and you should try to learn more about
kernel hacking, device driver development, kernel modules, etc,
for which there are other excellent HOWTOs and documents from the LDP.

Particularly, if what you want is Graphics programming,
then do join one of the
	<url url="http://www.ggi-project.org/" name="GGI">
or	<url url="http://www.XFree86.org/" name="XFree86">
projects.

Some people have even done better,
writing small and robust XFree86 drivers
	in an interpreted domain-specific language,
	<url url="http://www.irisa.fr/compose/gal/" name="GAL">,
and achieving the efficiency of hand-written C drivers
through partial evaluation (drivers not only not in asm, but not even in C!).
The problem is that the partial evaluator they used
	to achieve efficiency is not free software.
Any taker for a replacement?

Anyway, in all these cases, you'll be better off using GCC inline assembly
with the macros from <tt>linux/asm/*.h</tt>
	than writing full assembly source files.


<sect2>Accessing 16-bit drivers from Linux/i386
<p>
Such a thing is theoretically possible
(proof: see how <url url="http://www.dosemu.org" name="DOSEMU">
can selectively grant hardware port access to programs),
and I've heard rumors that someone somewhere did actually do it
(in the PCI driver? Some VESA access stuff? ISA PnP? dunno).
If you have some more precise information on that,
you'll be most welcome.
Anyway, good places to look for more information are the Linux kernel sources,
DOSEMU sources (and other programs in the
<url url="ftp://tsx-11.mit.edu/pub/linux/ALPHA/dosemu/"
     name="DOSEMU repository">),
and sources for various low-level programs under Linux...
(perhaps GGI if it supports VESA).

Basically, you must either use 16-bit protected mode or vm86 mode.

The first is simpler to set up, but only works with well-behaved code
that won't do any kind of segment arithmetic
or absolute segment addressing (particularly addressing segment 0),
unless by chance it happens that all segments used can be set up in advance
in the LDT.

The latter allows for more "compatibility" with vanilla 16-bit environments,
but requires more complicated handling.

In both cases, before you can jump to 16-bit code,
you must
<itemize>
<item>mmap any absolute address used in the 16-bit code
(such as ROM, video buffers, DMA targets, and memory-mapped I/O)
from <tt>/dev/mem</tt> to your process' address space,
<item>set up the LDT and/or vm86 mode monitor,
<item>grab proper I/O permissions from the kernel (see the above section).
</itemize>

Again, carefully read the source for the stuff contributed
	to the DOSEMU project,
particularly these mini-emulators
for running ELKS and/or simple .COM programs under Linux/i386.


<sect1>DOS
<p>
Most DOS extenders come with some interface to DOS services.
Read their docs about that,
but often, they just simulate <tt/int 0x21/ and such,
so you do "as if" you are in real mode
(I doubt they have more than stubs
and extend things to work with 32-bit operands;
they most likely will just reflect the interrupt
into the real-mode or vm86 handler).

Docs about DPMI (and much more) can be found on
	<url url="ftp://x2ftp.oulu.fi/pub/msdos/programming/">
(again, the original x2ftp site is closing (no more?), so use a
	<url url="ftp://ftp.lip6.fr/pub/pc/x2ftp/README.mirror_sites"
		name="mirror site">).

DJGPP comes with its own (limited) glibc derivative/subset/replacement, too.

It is possible to cross-compile from Linux to DOS,
see the devel/msdos/ directory of your local FTP mirror for metalab.unc.edu.
Also see the MOSS dos-extender from the
	<url url="http://www.cs.utah.edu/projects/flux/" name="Flux project">
at the University of Utah.

Other documents and FAQs are more DOS-centered.
We do not recommend DOS development.


<sect1>Windows and Co.
<p>
This HOWTO is not about Windows programming;
you can find lots of documents about it everywhere.
The thing you should know is that
<url url="http://www.cygnus.com" name="Cygnus Solutions">
developed the
<url url="http://sourceware.cygnus.com/cygwin/" name="cygwin32.dll library">,
for GNU programs to run on the Win32 platform; thus, you can use GCC, GAS,
all the GNU tools, and many other Unix applications.

<sect1>Your own OS
<p>
Control is what attracts many OS developers to assembly;
often it is what leads to or stems from assembly hacking.
Note that any system that allows self-development
could qualify as an &quot;OS&quot;,
even though it may run &quot;on top&quot; of an underlying system
(much like Linux over Mach or OpenGenera over Unix).

Hence, for easier debugging,
you might like to develop your &quot;OS&quot; first as a process running
on top of Linux (despite the slowness), then use the
<url url="http://www.cs.utah.edu/projects/flux/oskit/" name="Flux OS kit">
(which grants use of Linux and BSD drivers in your own OS)
to make it standalone.
When your OS is stable, it is time to write your own
hardware drivers if you really love that.

This HOWTO will not cover topics such as
boot loader code and getting into 32-bit mode,
handling interrupts,
the basics of Intel protected mode or V86/R86 braindeadness,
or defining your object format and calling conventions.

The main place to find reliable information about all of that
is the source code of existing OSes and bootloaders.
Lots of pointers are on the following webpage:
    <url url="http://www.tunes.org/Review/OSes.html">


<sect>QUICK START
<p>
Finally, if you still want to try this crazy idea and write something in
assembly (if you've reached this section, you're a real assembly fan),
here is what you will need to get started.

As you've read before, you can write for Linux in different ways;
I'll show an example that uses pure system calls.
This means that we will not use libc at all; the only thing required for
our program to run is the kernel.
Our code will not be linked to any library and will not use the ELF interpreter:
it will communicate directly with the kernel.

I will show the same sample program in two assemblers, <tt/nasm/ and <tt/gas/,
thus showing Intel and AT&amp;T syntax.

You may also want to read the <url url="http://linuxassembly.org/intro.html"
name="Introduction to UNIX assembly programming"> tutorial;
it contains sample code for other UNIX-like OSes.

<sect1>Tools you need
<p>
First of all you need an assembler (compiler): <tt/nasm/ or <tt/gas/.
Second, you need a linker, <tt/ld/, since the assembler produces only object code.
Almost all distributions include <tt/gas/ and <tt/ld/ in the binutils package.
As for <tt/nasm/, you may have to download and install binary packages
for Linux and docs from the <ref id="findnasm" name="nasm webpage">;
however, several distributions (Stampede, Debian, SuSE)
already include it, so check first.
<p>
If you are going to dig in, you should also install the kernel source.
I assume that you are using at least Linux 2.0 and ELF.

<sect1>Hello, world!
<p>
On IA-32, Linux is a 32-bit system with a flat memory model.
A program can be divided into sections.
The main sections are <em/.text/ for your code,
<em/.data/ for your initialized data, and <em/.bss/ for uninitialized data.
A program must have at least a <em/.text/ section.
<p>
Now we will write our first program. Here is the sample code:

<sect2>NASM (hello.asm)
<p>
<tscreen><code>
section .data				;section declaration

msg     db      "Hello, world!",0xa	;our dear string
len     equ     $ - msg                 ;length of our dear string

section .text				;section declaration

			;we must export the entry point to the ELF linker or
    global _start	;loader. They conventionally recognize _start as their
			;entry point. Use ld -e foo to override the default.

_start:

;write our string to stdout

        mov     edx,len ;third argument: message length
        mov     ecx,msg ;second argument: pointer to message to write
        mov     ebx,1   ;first argument: file handle (stdout)
        mov     eax,4   ;system call number (sys_write)
        int     0x80	;call kernel

;and exit

        mov     ebx,0   ;first syscall argument: exit code
        mov     eax,1   ;system call number (sys_exit)
        int     0x80	;call kernel

</code></tscreen>
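
The program above needs only <em/.data/ and <em/.text/.
As a minimal sketch of how <em/.bss/ fits in, the following variation
reserves an uninitialized buffer, reads one line from stdin into it
and writes it back to stdout.
It assumes that sys_read is system call number 3 on i386;
the file name <tt/echo.asm/, the labels and the 64-byte buffer size
are arbitrary choices made for this example.

<tscreen><code>
section .data                   ;initialized data: stored in the file

prompt  db      "Say something: "
plen    equ     $ - prompt      ;length of the prompt

section .bss                    ;uninitialized data: only the size is recorded

buf     resb    64              ;reserve 64 bytes for input

section .text

    global _start

_start:
        mov     edx,plen        ;third argument: prompt length
        mov     ecx,prompt      ;second argument: pointer to prompt
        mov     ebx,1           ;first argument: file handle (stdout)
        mov     eax,4           ;system call number (sys_write)
        int     0x80            ;call kernel

        mov     edx,64          ;third argument: buffer size
        mov     ecx,buf         ;second argument: pointer to buffer
        mov     ebx,0           ;first argument: file handle (stdin)
        mov     eax,3           ;system call number (sys_read)
        int     0x80            ;eax = bytes read, or a negative error code

        test    eax,eax
        jle     .exit           ;nothing read or error: skip the echo

        mov     edx,eax         ;third argument: number of bytes read
        mov     ecx,buf         ;second argument: pointer to buffer
        mov     ebx,1           ;first argument: file handle (stdout)
        mov     eax,4           ;system call number (sys_write)
        int     0x80            ;call kernel
.exit:
        mov     ebx,0           ;first argument: exit code
        mov     eax,1           ;system call number (sys_exit)
        int     0x80            ;call kernel
</code></tscreen>

Because <em/.bss/ holds no initial values, the buffer takes no space
in the object file; making it larger does not make the executable larger.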

<sect2>GAS (hello.S)
<p>
<tscreen><code>
.data					# section declaration

msg:
	.ascii	"Hello, world!\n"	# our dear string (no trailing NUL)
	len = . - msg			# length of our dear string

.text					# section declaration

			# we must export the entry point to the ELF linker or
    .global _start	# loader. They conventionally recognize _start as their
			# entry point. Use ld -e foo to override the default.

_start:

# write our string to stdout

	movl	$len,%edx	# third argument: message length
	movl	$msg,%ecx	# second argument: pointer to message to write
	movl	$1,%ebx		# first argument: file handle (stdout)
	movl	$4,%eax		# system call number (sys_write)
	int	$0x80		# call kernel

# and exit

	movl	$0,%ebx		# first argument: exit code
	movl	$1,%eax		# system call number (sys_exit)
	int	$0x80		# call kernel
</code></tscreen>

<sect1>Producing object code
<p>
The first step in building a binary is producing an object file from the source
by invoking the assembler; we must issue the following:
<p>
For the <tt/nasm/ example:

<tt/&dollar; nasm -f elf hello.asm/
<p>
For the <tt/gas/ example:

<tt/&dollar; as -o hello.o hello.S/
<p>
Either command will produce the <tt/hello.o/ object file.


<sect1>Producing executable
<p>
The second step is producing the executable file itself from the object file
by invoking the linker:
<p>
<tt/&dollar; ld -s -o hello hello.o/

This will finally build the <tt/hello/ executable.
<p>
Hey, try to run it... Works? That's it. Pretty simple.


<sect>RESOURCES<label id="res">
<p>

Your main resource for Linux/UNIX assembly programming material
is the <bf><url url="http://linuxassembly.org/resources.html"
	name="Linux Assembly resources page"></bf>.
Do visit it, and get plenty of pointers to assembly projects,
tools, tutorials, documentation, guides, etc.,
concerning different UNIX operating systems and CPUs.
Because it evolves quickly, I will no longer duplicate it in this HOWTO.

If you are new to assembly in general, here are a few starting pointers:

<label id="res-general">
<itemize>
<item><url url="http://webster.cs.ucr.edu/Page_asm/ArtOfAsm.html"
	name="The Art Of Assembly">
<item><url url="http://www2.dgsys.com/&tilde;raymoon/faq/"
	name="x86 assembly FAQ">
<item><url url="ftp://ftp.luth.se/pub/msdos/"
	name="ftp.luth.se"> mirrors the former hornet and x2ftp
	archives of MS-DOS assembly coding material
<item><url url="http://www.koth.org" name="CoreWars">,
	a fun way to learn assembly in general
<item>Usenet:
	<url url="news://comp.lang.asm.x86" name="comp.lang.asm.x86">;
	<url url="news://alt.lang.asm" name="alt.lang.asm">
</itemize>


<sect1>Mailing list<label id="res-list">
<p>
If you are interested in Linux/UNIX assembly programming
(or have questions, or are just curious),
I especially invite you to join the Linux assembly programming mailing list.

This is an open discussion of assembly programming under Linux, *BSD, BeOS,
or any other UNIX/POSIX-like OS; it is also not limited to x86 assembly
(Alpha, Sparc, PPC and other hackers are welcome too!).

To subscribe, send a blank message to <url url="mailto:linux-assembly-subscribe@egroups.com">.

The list address is <url url="mailto:linux-assembly@egroups.com">.

List archives are available at <url url="http://www.egroups.com/list/linux-assembly/">.

<sect1>Frequently asked questions (with answers)<label id="faq">
<p>
Here are frequently asked questions. Answers are taken
from the <ref id="res-list" name="linux-assembly mailing list">.

<sect2>How do I do graphics programming in Linux?
<p>
An answer from <url url="mailto:paulf@icom.co.za" name="Paul Furber">:

<verb>
Ok you have a number of options to graphics in Linux. Which one you use
depends on what you want to do. There isn't one Web site with all the
information but here are some tips:

SVGALib: This is a C library for console SVGA access.
Pros: very easy to learn, good coding examples, not all that different
from equivalent gfx libraries for DOS, all the effects you know from DOS
can be converted with little difficulty.
Cons: programs need superuser rights to run since they write directly to
the hardware, doesn't work with all chipsets, can't run under X-Windows.
Search for svgalib-1.4.x on http://ftp.is.co.za

Framebuffer: do it yourself graphics at SVGA res
Pros: fast, linear mapped video access, ASM can be used if you want :)
Cons: has to be compiled into the kernel, chipset-specific issues, must
switch out of X to run, relies on good knowledge of linux system calls
and kernel, tough to debug
Examples: asmutils (http://www.linuxassembly.org) and the leaves example
and my own site for some framebuffer code and tips in asm
(http://ma.verick.co.za/linux4k/)

Xlib: the application and development libraries for XFree86.
Pros: Complete control over your X application
Cons: Difficult to learn, horrible to work with and requires quite a bit
of knowledge as to how X works at the low level.
Not recommended but if you're really masochistic go for it. All the
include and lib files are probably installed already so you have what
you need.

Low-level APIs: include PTC, SDL, GGI and Clanlib
Pros: very flexible, run under X or the console, generally abstract away
the video hardware a little so you can draw to a linear surface, lots of
good coding examples, can link to other APIs like OpenGL and sound libs,
Windows DirectX versions for free
Cons: Not as fast as doing it yourself, often in development so versions
can (and do) change frequently.
Examples: PTC and GGI have excellent demos, SDL is used in sdlQuake,
Myth II, Civ CTP and Clanlib has been used for games as well.

High-level APIs: OpenGL - any others?
Pros: clean api, tons of functionality and examples, industry standard
so you can learn from SGI demos for example
Cons: hardware acceleration is normally a must, some quirks between
versions and platforms
Examples: loads - check out www.mesa3d.org under the links section.

To get going try looking at the svgalib examples and also install SDL
and get it working. After that, the sky's the limit.
</verb>

<sect2>How do I debug pure assembly code under Linux?
<p>

There's an early version of the
<url url="http://www.ellipse.magenet.com/ald.html"
name="Assembly Language Debugger">,
which is designed to work with assembly code,
and is portable enough to run on Linux and *BSD.
It is already functional and should be the right choice; check it out!

You can also try <tt/gdb/ ;).
Although it is a source-level debugger, it can be used to debug
pure assembly code, and with some trickery you can make <tt/gdb/ do what you need.
Here's an answer from <url url="mailto:dl@gazeta.ru" name="Dmitry Bakhvalov">:

<verb>
Personally, I use gdb for debugging asmutils. Try this:

1) Use the following stuff to compile:
   $nasm -f elf -g smth.asm
   $ld -o smth smth.o

2) Fire up gdb:
   $gdb smth

3) In gdb:
   (gdb) disassemble _start
   Place a breakpoint at <_start+1> (If placed at _start the breakpoint
   wouldnt work, dunno why)
   (gdb) b *0x8048075

   To step thru the code I use the following macro:
   (gdb)define n
   >ni
   >printf "eax=%x ebx=%x ...etc...",$eax,$ebx,...etc...
   >disassemble $pc $pc+15
   >end

   Then start the program with r command and debug with n.

   Hope this helps.
</verb>

An additional note from ???:

<verb>
    I have such a macro in my .gdbinit for quite some time now, and it
    for sure makes life easier. A small difference : I use "x /8i $pc",
    which guarantee a fixed number of disassembled instructions. Then,
    with a well chosen size for my xterm, gdb output looks like it is
    refreshed, and not scrolling.
</verb>

If you want to set breakpoints across your code, you can just use the
<tt/int 3/ instruction as a breakpoint (instead of entering the address
manually in <tt/gdb/).
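
As a minimal sketch of that technique, here is the hello program from the
Quick Start section with a breakpoint that is assembled in only on request.
The <tt/DEBUG/ symbol is a name invented for this example;
it is meant to be defined on the <tt/nasm/ command line with <tt/-DDEBUG/.

<tscreen><code>
section .data

msg     db      "Hello, world!",0xa     ;our dear string
len     equ     $ - msg                 ;length of our dear string

section .text

    global _start

_start:

%ifdef DEBUG
        int     3               ;breakpoint trap: the debugger stops here
%endif

        mov     edx,len         ;third argument: message length
        mov     ecx,msg         ;second argument: pointer to message to write
        mov     ebx,1           ;first argument: file handle (stdout)
        mov     eax,4           ;system call number (sys_write)
        int     0x80            ;call kernel

        mov     ebx,0           ;first syscall argument: exit code
        mov     eax,1           ;system call number (sys_exit)
        int     0x80            ;call kernel
</code></tscreen>

Build it with <tt/nasm -f elf -g -DDEBUG hello.asm/ and start it under <tt/gdb/:
the program stops with SIGTRAP right after the <tt/int 3/,
and you can continue or single-step from there.
Outside a debugger the same trap kills the process,
so leave <tt/DEBUG/ undefined for normal builds.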

If you're using <tt/gas/, you should consult the <tt/gas/- and <tt/gdb/-related
<url url="http://linuxassembly.org/resources.html#tutorials" name="tutorials">.


<sect2>Any other useful debugging tools?
<p>
<tt/strace/ can definitely help a lot (<tt/ktrace/ and <tt/kdump/ on FreeBSD);
it is used to trace system calls and signals.
Read its manual page (<tt/man strace/) and the <tt/strace --help/ output for details.


<sect2>How do I access BIOS functions from Linux (BSD, BeOS, etc)?
<p>
No way. This is protected mode; use OS services instead.
Again, you can't use <tt/int 0x10/, <tt/int 0x13/, etc.
Fortunately, almost everything can be implemented
by means of system calls or library functions.
In the worst case you may go through direct port access,
or write a kernel patch to implement the needed functionality.
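
As a sketch of the direct port access route on Linux/i386,
the following <tt/nasm/ program asks the kernel for access to ports 0x70 and 0x71
and reads the seconds register of the CMOS real-time clock,
something DOS code would typically ask the BIOS for.
It assumes that sys_ioperm is system call number 101 on i386,
that the machine has the standard PC CMOS/RTC ports,
and that it is run as root;
the file name <tt/rtcsec.asm/ and the exit code convention
are made up for this example.

<tscreen><code>
section .text

    global _start

_start:
        mov     edx,1           ;third argument: 1 = grant access
        mov     ecx,2           ;second argument: number of ports
        mov     ebx,0x70        ;first argument: first port
        mov     eax,101         ;system call number (sys_ioperm)
        int     0x80            ;call kernel
        test    eax,eax
        jnz     .fail           ;non-zero means the kernel refused

        mov     al,0            ;CMOS register 0: RTC seconds
        out     0x70,al         ;select the register
        in      al,0x71         ;read its value (usually BCD-encoded)

        movzx   ebx,al          ;exit code: raw register value
        jmp     .exit
.fail:
        mov     ebx,255         ;exit code: no port access
.exit:
        mov     eax,1           ;system call number (sys_exit)
        int     0x80            ;call kernel
</code></tscreen>

After running it as root, <tt/echo &dollar;?/ shows the raw value in decimal.
For anything beyond such an experiment, prefer the OS services:
the kernel may be using the same ports at the same time.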

<em/That's all for now, folks/.

$Id$

</article>