From: Drew Eckhardt (drew@cs.colorado.edu)
Date: 03/15/92


From: drew@cs.colorado.edu (Drew Eckhardt)
Subject: Re: 'pklite' for Linux.
Date: 16 Mar 1992 02:16:52 GMT

In article <1992Mar16.003027.2064@athena.mit.edu> jyelon@cs.uiuc.edu writes:
>I considering writing binary file compressor for Linux, much like
>pklite for msdos. The salient characteristic will be that you
>can compress a program, and you won't have to decompress it
>in order to run it. (It would decompress itself automatically
>into RAM). I need ideas.
>
>First of all, how do I pull it off?
>
>Here is plan 1:
>
> 1. start the to-be-compressed executable using ptrace.
> 2. read out all of its memory into a file.
> 3. compress the file.
> 4. prepend a tiny in-memory decompression routine to the file,
> 5. prepend the proper headers for an executable program, and presto!
>
>The disadvantages:
>
> 1. I like shared pages. However, a text segment that gets
> decompressed at runtime isn't 'read-only', so that does
> it in for shared pages.

Disk is cheap, memory is expensive. If you can't share
pages, you're going to lose memory, and start swapping to disk
much sooner. I sometimes look at the memory allocation, and
see 100+ shared pages when I'm heavily loaded - that's
400K.

Putting the decompression code in the kernel eliminates this problem:
the pages can still be read / execute only, and clean, and still
be shared.

> 2. Shared libraries would probably get linked in as soon as
> ptrace ran the thing, so the libraries would become a part
> of the compressed file. Yuck!

Why use ptrace? You probably want to set up a new magic number,
and prepend that to a compressed a.out file.
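
A rough sketch of what the magic-number check might look like. The
magic value, the header layout, and the names zexec_header and
is_compressed_exec are all made up for illustration; nothing like
this exists in the kernel, and a real version would sit in the exec
code next to the existing a.out magic check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical magic value, picked only for this example ("ZEXE"). */
#define ZEXEC_MAGIC 0x5A455845u

/* Hypothetical header prepended to the compressed a.out image. */
struct zexec_header {
    uint32_t magic;      /* must equal ZEXEC_MAGIC */
    uint32_t orig_size;  /* size of the uncompressed a.out image */
};

/*
 * Return 1 if the first bytes of buf carry the compressed-executable
 * magic number, 0 otherwise (including when buf is too short to hold
 * a full header).
 */
int is_compressed_exec(const unsigned char *buf, size_t len)
{
    uint32_t magic;

    assert(buf != NULL);
    if (len < sizeof(struct zexec_header))
        return 0;
    /* memcpy avoids unaligned access on the raw buffer. */
    memcpy(&magic, buf, sizeof magic);
    return magic == ZEXEC_MAGIC;
}
```

Exec would read the first block of the file, call something like
this, and fall through to the normal a.out path when it returns 0.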

> 3. Thousands of copies of this decompression routine, one in
> every executable, like a virus. Gross. Plus, the kernel
> is constantly loading the same decompression routine from
> disk. Wasteful.

If it's in the kernel, there aren't "thousands of copies of this routine".

>The advantages:
>
> 1. Edit the kernel code for exec, to look for a new 'compressed
> executable' magic number.
> 2. Have it then decompress/load the file into ram, and
> 3. Have it then proceed as if it had just loaded a non-demand-paged
> executable.

Much better. You still have shared pages, etc. The only real problem
you still have is not being able to page from the file, and text gets
swapped like data.

>The advantages.
>
> 1. Its very clean.
> 2. The decompression routine is in the kernel, so it only
> needs to be loaded once.
> 3. It sounds more reliable than the above approach.
>
>The disadvantages.
>
> 1. The kernel is getting bigger (although not much,
> assuming that the decompression routine is small).
> 2. You have to have the decompressing-kernel to run
> compressed programs.
> 3. Again, compressed files are not demand-paged. (I, personally,
> could care less - I tend to think that demand-paging slows
> things down as much as it speeds things up).
>

Demand paging speeds things up in several ways :
1. Exec is faster because fewer pages are loaded at once.
2. Real memory use is lower, as the whole thing hasn't
        been loaded. Less real memory used = less
        swapping. Remember, disk is 1000 times slower
        than real memory.

You might look at what happens to the compression ratio, etc. if
the file is compressed in Xk hunks, X being some small multiple of
the page size (4K). This way you could seek to the right part of
the compressed image (using a directory at the beginning),
and still demand page / not swap text.
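
To make the directory idea concrete, here is a minimal sketch of the
lookup a page fault would need. The hunk size (4 pages), the entry
layout, and the names hunk_entry, hunk_index and locate_hunk are
assumptions for the example, not an existing format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE      4096u
#define PAGES_PER_HUNK 4u   /* hypothetical choice: 16K hunks */
#define HUNK_SIZE      (PAGE_SIZE * PAGES_PER_HUNK)

/* One directory entry per hunk, stored at the start of the file. */
struct hunk_entry {
    uint32_t file_offset;  /* where the compressed hunk starts */
    uint32_t comp_len;     /* length of the compressed hunk */
};

/* Index of the hunk holding the given uncompressed offset. */
uint32_t hunk_index(uint32_t offset)
{
    return offset / HUNK_SIZE;
}

/*
 * Find the compressed hunk that backs a faulting offset.
 * Returns 0 and fills *out on success, -1 if the offset lies
 * past the last hunk.
 */
int locate_hunk(const struct hunk_entry *dir, uint32_t nhunks,
                uint32_t fault_offset, struct hunk_entry *out)
{
    uint32_t i = hunk_index(fault_offset);

    assert(dir != NULL && out != NULL);
    if (i >= nhunks)
        return -1;
    *out = dir[i];
    return 0;
}
```

The fault handler would then seek to file_offset, read comp_len
bytes, and decompress just that hunk. Smaller hunks mean less work
per fault but probably worse compression, which is the tradeoff
worth measuring.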

>Also, about compression algorithms:
>

Kludge alert:
Why not, at least for prototyping purposes, do a
pipe(2), fork(2), and exec /<standard location>/uncompress?
That way the decompression code doesn't have to be in the kernel.
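
A user-space sketch of that pipe/fork/exec sequence, assuming the
decompressor reads stdin and writes stdout. The function names
spawn_filter and reap_filter are invented here, and the path to the
decompressor is left as a parameter since it depends on the system.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run prog with its stdin connected to in_fd and its stdout
 * connected to a new pipe. Returns the read end of that pipe, or
 * -1 on error. The child's pid is stored in *pid_out.
 */
int spawn_filter(const char *prog, int in_fd, pid_t *pid_out)
{
    int fds[2];
    pid_t pid;

    assert(prog != NULL && pid_out != NULL);
    if (pipe(fds) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: wire up stdin/stdout, then become the filter. */
        close(fds[0]);
        if (dup2(in_fd, STDIN_FILENO) < 0 ||
            dup2(fds[1], STDOUT_FILENO) < 0)
            _exit(127);
        if (fds[1] != STDOUT_FILENO)
            close(fds[1]);
        execl(prog, prog, (char *)NULL);
        _exit(127);  /* only reached if exec failed */
    }

    /* Parent: keep only the read end. */
    close(fds[1]);
    *pid_out = pid;
    return fds[0];
}

/* Wait for the child; return its exit status, or -1 on failure. */
int reap_filter(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The caller opens the compressed file, passes that descriptor as
in_fd, and reads the uncompressed image from the returned
descriptor. It costs a process per exec, but it is enough to try
the idea out before committing anything to the kernel.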

>PS: this might help alleviate some of the root-disk woes!

PPS : root disk woes :

1. Put a scaled down shell on.
2. Lose init, etc. I'd much rather have compress, tar on my root floppy than
        init, login, getty, and their ilk.