From: Linus Torvalds (torvalds@klaava.Helsinki.FI)
Date: 05/05/93


From: torvalds@klaava.Helsinki.FI (Linus Torvalds)
Subject: Re: GNU __asm__
Date: Wed, 5 May 1993 13:34:53 GMT

In article <C6ILp8.zB@oea.hobby.nl> dan@oea.hobby.nl (Dan Naas) writes:
>Charles Hannum (mycroft@hal.gnu.ai.mit.edu) wrote:
>
>: This really is all documented, but the documentation is not very clear.
>
> It is partially documented and I have read the documentation but
>even looking at it with hindsight, it is not complete. Especially as far
>as i386 architecture is concerned: the fact that the constraints a,b and
>I assume c and d stand for eax, ebx, ecx and edx is nowhere to be found.

Indeed. The best documentation I have found is "trial and error" and
reading the gcc sources (yes, the latter isn't exactly practical, but
you don't have to read it all - you can skim through it to pick out
details). I haven't checked the newest manuals, but the way to handle
floating point operations was especially lacking in some older manuals.
I can't very well complain, as everybody knows how eager I am to write
manuals for my own code :-p

You can look at the examples in the linux header files (and elsewhere),
although some of them are less than obvious due to a bug in the early
gcc 2.x releases (that didn't accept the "b" qualifier for '%ebx', so
they use explicit loads - this is true of the <linux/unistd.h> macros,
for example).

Note that doing things with inline assembly is not necessarily always a
good idea: gcc can't optimize the innards of the assembly code even if
there are constants etc being used, so it's sometimes actually better to
use normal C operations if you know that the operations often can be
optimized away for the constant case.
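
One way to get both behaviours is sketched below. It is only an
illustration, not code from the linux headers: the function name rol32
is made up, and __builtin_constant_p, the "+r" modifier and the "cc"
clobber are taken from later gcc documentation. Plain C is used when
the rotate count is a compile-time constant (so gcc can fold it), and
the asm is used otherwise. Non-x86 targets always take the C path.

```c
/* Rotate x left by n bits (n is taken modulo 32). */
static inline unsigned int rol32(unsigned int x, unsigned int n)
{
    n &= 31;
#if defined(__i386__) || defined(__x86_64__)
    if (!__builtin_constant_p(n)) {
        /* Count must be in %cl; x is both read and written. */
        __asm__("roll %%cl,%0"
                : "+r" (x)      /* output (and input) */
                : "c" (n)       /* input: count in %ecx */
                : "cc");        /* flags are modified */
        return x;
    }
#endif
    /* Plain C: gcc can evaluate this at compile time for constants. */
    return (x << n) | (x >> ((32 - n) & 31));
}
```

With a constant argument such as rol32(1, 31) gcc is free to reduce the
whole call to a constant, which it could never do with the asm version.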

The argument qualifiers used by gcc are roughly:

 - "a", "b", "c", "d" - %eax, %ebx, %ecx, %edx: size depends on the size
   of the argument (32, 16 or 8 bit)
 - "D", "S" - %edi, %esi: size again depends on the size of the argument
   (32 or 16 bit)
 - "r" - any free register (%eax, %ebx, %ecx, %edx, %esi, %edi, %ebp).
   Size 32 or 16 bits.
 - "q" - any free 8-bit register (%al, %ah, %bl, %bh etc). Note that
   gcc does know about the high registers, but seldom actually gets to
   use them. The support for them seems to be something of a hack.
 - "m" - memory
 - "i" - immediate integer constant
 - "g" - general - any general register, memory or immediate operand
 - "t" - top of floating point stack (%st)
 - "u" - next on floating point stack (%st(1))
 - "number" - the same as argument nr 'number'
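
As a rough sketch of a few of these qualifiers together, here is a
32x32 -> 64 bit multiply. The name mul32x32 is hypothetical, and the
"cc" clobber comes from later gcc documentation rather than from the
list above. 'mull' always multiplies %eax by its operand and leaves the
result in %edx:%eax, so "a" and "d" pin the outputs, "0" ties the first
input to argument 0, and "rm" ("r" + "m") lets gcc pick a register or
memory for the other factor.

```c
#include <stdint.h>

static inline uint64_t mul32x32(uint32_t x, uint32_t y)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;
    __asm__("mull %3"
            : "=a" (lo), "=d" (hi)  /* outputs: %eax, %edx */
            : "0" (x), "rm" (y)     /* inputs: x in %eax, y anywhere */
            : "cc");                /* flags are modified */
    return ((uint64_t) hi << 32) | lo;
#else
    /* Fallback for other targets: let the compiler do it. */
    return (uint64_t) x * y;
#endif
}
```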

In the case of registers, you can force the size by using a prefix in
the assembly string (eg %b0 means that the zero-argument is supposed to
be shown as a byte-wide register). This is useful mainly for special
instructions like IO.
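
A small user-space sketch of the size prefixes (the name swab16 is made
up for this example). It swaps the two low bytes of a value by
exchanging the low and high byte registers. The "Q" qualifier (one of
%eax, %ebx, %ecx, %edx, ie the registers that have a high byte half)
and the %h prefix are taken from later gcc documentation and are not in
the list above.

```c
/* Swap the two low-order bytes of x; the upper 16 bits are untouched. */
static inline unsigned int swab16(unsigned int x)
{
#if defined(__i386__) || defined(__x86_64__)
    /* %b0 = low byte register (eg %al), %h0 = high byte (eg %ah) */
    __asm__("xchgb %b0,%h0"
            : "+Q" (x));    /* x is read and written, in a/b/c/d */
#else
    x = (x & 0xffff0000u) | ((x & 0xffu) << 8) | ((x >> 8) & 0xffu);
#endif
    return x;
}
```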

Some simple examples:

(1)
        extern inline int inb(int port)
        {
                int result;
                __asm__ __volatile__("inb %w1,%b0"
                        :"=a" (result) /* outputs */
                        :"d" (port),"0" (0)); /* inputs */
                return result;
        }

Here we use 'int's throughout, as gcc often generates better code with
them (gcc's output easily gets a bit convoluted when handling character and
word data). But as the 'inb' instruction only operates on the low word
of the port number and the low byte of the %eax register, we use the
size prefix to get the correct assembly output. Also, we force gcc to
clear the full %eax register before generating the assembler output by
using the '.."0" (0)..' sequence - this just tells gcc that assembly
argument 0 (ie "=a") is supposed to have an input value of 0. We could
clear %eax explicitly in the __asm__ statement, but telling gcc to do it
for us gives gcc the possibility to notice that %eax was already zero
and leave the clearing out.

The __volatile__ keeps gcc from moving the inb around significantly, as
well as telling gcc not to optimize it away even if the result is never
used: IO instructions can be critical even when their result is
unimportant. So what this code sequence results in is:

 - gcc puts the port address in %edx (...'"d" (port)'...)
 - gcc clears %eax (...'"0" (0)'...)
 - gcc outputs
        inb %dx,%al (%w1 = 16-bit "d" = %dx, %b0 = 8-bit "a" = %al).
 - gcc knows that the result is in %eax (...'"=a" (result)'...)
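
As 'inb' needs IO privileges, the same steps can be tried out in a
normal user program with a byte move instead. This is only an analogue
of the above (low_byte is a made-up name): the "0" (0) input clears
%eax, the movb fills in just %al, and gcc then takes the whole of %eax
as the result.

```c
/* Return the low 8 bits of value, zero-extended to an int. */
static inline int low_byte(int value)
{
    int result;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movb %b1,%b0"
            : "=a" (result)             /* output: %eax */
            : "q" (value), "0" (0));    /* inputs: byte reg, %eax = 0 */
#else
    result = value & 0xff;
#endif
    return result;
}
```

No __volatile__ is used here, as the result depends only on the inputs
and gcc may safely drop the asm when the result is unused.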

The actual linux macros to do this look a bit different, but the above
is probably the best way to handle it. Note that there is no way this
will result in the simpler
        inb $0xXX,%al
instruction, as we told it to use only %edx. We could use the "id" ("i"
+ "d") qualifier to tell it to use either integer or %edx values for the
port number, but this won't work if the constant port would be over 255
due to x86 assembly range limitations.

(2) 128-bit integer addition (difficult to do efficiently in C):

        typedef struct {
                unsigned long value[4];
        } int128;

        extern inline void addx(int128 * a, int128 * b)
        {
                __asm__("movl (%1),%%eax\n\t"
                        "addl %%eax,(%2)\n\t"
                        "movl 4(%1),%%eax\n\t"
                        "adcl %%eax,4(%2)\n\t"
                        "movl 8(%1),%%eax\n\t"
                        "adcl %%eax,8(%2)\n\t"
                        "movl 12(%1),%%eax\n\t"
                        "adcl %%eax,12(%2)"
                        :"=m" (*b) /* output */
                        :"r" (a),"r" (b) /* input */
                        :"ax"); /* modified */
        }

This __asm__ statement has three arguments: %0 is never actually used,
but the argument tells gcc that we modify '*b'. %1 is the address of
the first 128-bit integer in a register that gcc can choose freely, and
%2 is the address of the second one (and the destination). Also, the
"ax" finally tells gcc that we modify %eax, as we use it for
intermediate results. That also means that neither %1 nor %2 will use
the %eax register.
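
For comparison, here is how the same routine might be written for a
current gcc. This is a sketch under assumptions, not the original code:
the names u128 and add128 are made up, uint32_t keeps the struct at 128
bits on 64-bit machines, "+m" tells gcc that '*b' is read as well as
written, the extra "m" (*a) input tells it that '*a' is read, and the
clobber list names "eax" and the flags ("cc"). The argument numbering
(%1 = a, %2 = b) is unchanged.

```c
#include <stdint.h>

typedef struct {
    uint32_t value[4];      /* least significant word first */
} u128;

/* *b += *a, with the carry propagated through all four words. */
static inline void add128(const u128 *a, u128 *b)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl (%1),%%eax\n\t"
            "addl %%eax,(%2)\n\t"
            "movl 4(%1),%%eax\n\t"
            "adcl %%eax,4(%2)\n\t"
            "movl 8(%1),%%eax\n\t"
            "adcl %%eax,8(%2)\n\t"
            "movl 12(%1),%%eax\n\t"
            "adcl %%eax,12(%2)"
            : "+m" (*b)                     /* read and written */
            : "r" (a), "r" (b), "m" (*a)    /* inputs */
            : "eax", "cc");                 /* modified */
#else
    /* Portable fallback: carry the overflow by hand. */
    uint64_t carry = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t sum = (uint64_t) a->value[i] + b->value[i] + carry;
        b->value[i] = (uint32_t) sum;
        carry = sum >> 32;
    }
#endif
}
```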

                Linus