From: Linus Torvalds (torvalds@klaava.Helsinki.FI)
Date: 02/11/93


From: torvalds@klaava.Helsinki.FI (Linus Torvalds)
Subject: Re: 8086 Assembler in SLS
Date: Thu, 11 Feb 1993 16:20:00 GMT

In article <C25sAJ.DwG@techbook.com> kosina@techbook.com (Martin Kosina) writes:
>
>I know this has been asked before but I have never seen any response:
>Is there *any* documentation for 'as' and 'as86' that come with the SLS
>distribution ? Man page, README, anything ?

The 'as' in SLS is the GNU assembler, often referred to as 'gas'. It
has some documentation: look for gas-doc.tar.Z on your favourite ftp
site (at least nic.funet.fi: pub/gnu had it at some time). The gas
documentation isn't good (or at least it wasn't the last time I looked,
which is a long time ago), mostly because it is written with the purpose
of assembling compiler output, not really for "normal" assemblies.

GNU as uses a very different syntax from "normal" x86 assemblers: if you
are used to the intel syntax you are in for a bad time. The best way to
learn the syntax is probably to read either gcc output (use -S to
produce assembly output) or to read the linux kernel assembly files.
The GNU documentation mentions most of the differences, but it's not
exhaustive.
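
As a rough illustration of the gcc route (a sketch, not something from
the gas documentation): put a trivial function like the one below into a
file, say add.c (the file and function names are made up for this
example), run "gcc -S add.c", and read the resulting add.s. In the gas
syntax the source operand comes first and the destination last,
registers are prefixed with '%' and immediate values with '$', which is
the opposite operand order from the intel syntax.

```c
/* add.c: hypothetical example file, only meant to be fed to "gcc -S". */

/* Small enough that the generated assembly is easy to read. */
int add(int a, int b)
{
    return a + b;
}
```

The exact instructions you get depend on the gcc version and the
optimization flags, so treat the output as reading material rather than
as a reference.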

Finally, GNU as only creates code for the 386 32-bit mode, which is why
linux needs another assembler for the 16-bit startup code required by
the kernel bootloader. This is where as86 comes into play...

as86 is an assembler for the x86 family written by Bruce Evans (along
with ld86 that is the linker associated with it). It is able to
generate code for both 16- and 32-bit segments, and is thus suitable for
the linux bootstrapping code. It's also fast, small, and has features
gas doesn't have (macros, I think, along with some more error checking).
Sadly, it uses an intel-like format (ie closer to MASM, TurboASM etc),
which I personally detest, and gcc is unable to write code for it.
Also, it doesn't do short jump optimizations (the user has to indicate
whether the jump is short or long), which is a pain.

The as86 syntax is not the "normal" MASM syntax, but is based on the
minix assembler syntax, which in turn is based on the PC/IX assembler
(some forgotten unix system for the 8086 - the one ast originally used
to write minix). The best way to find out the syntax is again to read
the linux sources: linux/boot/bootsect.S and linux/boot/setup.S are
written in as86 syntax.

> If there is no
>instructions at all, how different is it from TASM or MASM ? Again,
>simple stuff like how do you set up data and code segments, terminate
>function (I guess INT 21h won't work :-) ) and if there are any
>significant differences in the mnemonics. I tried something very trivial
>like 'mov AX,10', it assembled into a.out but dumped core when
>executed.

You generally don't want to mess with segments under linux: your process
has access to only two segments, both of which cover the same area (the
only reason for there being two segments is that the 386 wants a special
code segment for executable data). On startup, all segment registers
already point to this (I think.. at least %ds and %es do, and I haven't
cared about the others).

The way to write assembly language under linux is to use gas to produce
code: as86+ld86 won't be able to link to the correct linux executable
format. Also, I'd suggest you stick to C as far as possible: use
assembler for only small and important routines. This is not from any
philosophical reason, but for simple practical reasons: gcc usually
creates almost as good code as you would, and you can get away with
much less work. Also, don't mess with the startup code or the
libraries: let gcc handle those, and you can do the important routines
in assembly if you need to. It's often enough to use inline assembly
(print out the gcc documentation to see how it's done), and do it all
from C.
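
A minimal sketch of what inline assembly can look like, assuming gcc's
extended asm syntax and an x86 target (the function name add_asm is made
up for this example, and other processors just fall back to plain C):

```c
/* Hypothetical helper: adds b to a with a single inline x86 instruction. */
int add_asm(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    /*
     * gas syntax: "addl source, destination".
     * "+r"(a): a lives in a register and is both read and written.
     * "r"(b):  b is an input in a register.
     * "cc":    the instruction changes the condition flags.
     */
    __asm__ ("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    /* Plain C fallback so the example still builds on non-x86 targets. */
    return a + b;
#endif
}
```

Calling add_asm(2, 3) from ordinary C code gives 5. gcc picks the
registers itself, which is why the operands are written as %0 and %1
instead of fixed register names.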

I've included a small example program at the end of this post to show
how to make standalone programs under linux (no startup routine, no
libraries, no nuttin'), and you may get an idea how to start off from
there. No points for guessing what it does without compiling it.
Compile and run with:

        $ as -o asm.o asm.s
        $ ld -o asm asm.o
        $ ./asm

Maybe it works, maybe it doesn't. I should probably test these things
before posting, but my linux machine is at home, and I'm not. A few
clues: system calls are done using the "int $0x80" instruction, with the
arguments in the registers. %eax contains the system call number (1 is
exit, 2 is fork, 3 is read, and 4 is write etc - look them up in
<linux/unistd.h>) with the other registers containing arguments or
pointers to arguments.
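
For comparison, the same write call can be sketched from C. This assumes
the C library's syscall() wrapper rather than a hand-written "int $0x80",
and it uses the SYS_write constant instead of a literal 4, because the
system call numbers are not the same on every architecture. The function
name say_hello is made up for this example.

```c
#define _GNU_SOURCE        /* makes glibc declare syscall() */
#include <unistd.h>        /* syscall() */
#include <sys/syscall.h>   /* SYS_write */

/*
 * Hypothetical helper: writes a fixed message to stdout (file
 * descriptor 1) through the raw write system call.
 * Returns the number of bytes written, or -1 on error.
 */
long say_hello(void)
{
    static const char message[] = "Hello World\n";

    /* sizeof includes the terminating NUL, so subtract 1 to get 12 bytes. */
    return syscall(SYS_write, 1, message, sizeof(message) - 1);
}
```

The arguments line up with the registers described above: the system
call number first, then the file descriptor, the buffer address and the
length.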

                Linus

===== untested asm-program starts here =====
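.text
_entry:
        movl $4,%eax            # system call number 4: write
        movl $1,%ebx            # first argument: file descriptor 1 (stdout)
        movl $message,%ecx      # second argument: address of the string
        movl $12,%edx           # third argument: length in bytes
        int $0x80               # enter the kernel
        movl $1,%eax            # system call number 1: exit
        int $0x80               # %ebx is still 1, so the exit status is 1

message:
        .ascii "Hello World\n"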
===== untested asm-program ends here =====