From: Josh Yelon (jyelon@suna0.cs.uiuc.edu)
Date: 03/19/92


From: Josh Yelon <jyelon@suna0.cs.uiuc.edu>
Subject: Pklite for Linux, end of project.
Date: Thu, 19 Mar 1992 15:01:09 GMT

About the PKLITE project that I had undertaken: I am no longer
interested. Here are my notes, for anyone who wishes to take over.

Implementation would be trivial if only we had this one
system call:

        execv_in_core(data, len, argv)
        char *data;
        int len;
        char *argv[];

It acts exactly like execv, except that rather than exec'ing
a file, it exec's a block of memory. In other words, these two
pieces of code should have basically the same effect:

        1. execv("/usr/josh/bin/foo", argv);
        
        2. progfile = open("/usr/josh/bin/foo",O_RDONLY);
           progsize = read(progfile, progbuffer, 9999999);
           execv_in_core(progbuffer, progsize, argv);

That's the only kernel modification you would need - the rest
would be user-level code. Pklite would work like this:

1. It compresses the executable.
2. It concatenates a decompression routine to the compressed data,
   and tacks on the appropriate ZMAGIC header.
3. When the new binary runs, it decompresses the data in memory,
   and then runs the data, using execv_in_core.

That's all there is to it! At least, in theory.
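
Steps 1 and 3 can be sketched with a toy run-length coder standing in
for the real compressor. This is only an illustration: the names
rle_compress and rle_decompress are made up here, a real pklite would
use something far stronger, and the final hand-off to execv_in_core
appears only as a comment because that call does not exist.

```c
#include <assert.h>
#include <stddef.h>

/* Step 1 (toy stand-in): encode the image as (count, byte) pairs,
 * with count in 1..255. `out` must have room for 2 * len bytes.
 * Returns the number of bytes written. */
size_t rle_compress(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t o = 0;
    assert(in != NULL || len == 0);
    for (size_t i = 0; i < len; ) {
        size_t run = 1;
        while (i + run < len && in[i + run] == in[i] && run < 255)
            run++;
        out[o++] = (unsigned char)run;
        out[o++] = in[i];
        i += run;
    }
    return o;
}

/* Step 3 (toy stand-in): expand the pairs back into memory.
 * Returns the number of bytes written, or 0 if the input is
 * malformed or does not fit in out_cap bytes. */
size_t rle_decompress(const unsigned char *in, size_t len,
                      unsigned char *out, size_t out_cap)
{
    size_t o = 0;
    if (len % 2 != 0)
        return 0;
    for (size_t i = 0; i < len; i += 2) {
        size_t run = in[i];
        if (run == 0 || o + run > out_cap)
            return 0;
        for (size_t k = 0; k < run; k++)
            out[o++] = in[i + 1];
    }
    /* The stub would now call the hypothetical
     * execv_in_core(out, o, argv) on the expanded image. */
    return o;
}
```

The decompressor is the part that would be linked into every packed
binary, so it is the half that has to stay small and fast.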

The problem is "execv_in_core". It looks trivial. In fact, it's
hopeless. Linux is absolutely infested with the assumption that
every process has an associated executable from which it can page.
execv_in_core would break that assumption, and everything
would fall to pieces. Even if we did somehow manage to patch things
up, we'd need to completely redo the implementation of shared pages in
order to get two execv_in_core programs to share pages.
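
For comparison, here is a rough user-level approximation that avoids
touching the kernel by giving the image a file to page from: it spills
the buffer to a temporary file and calls the ordinary execv on that.
This is a sketch under assumptions, not the proposed system call. The
names spill_image and execv_in_core_sketch and the /tmp path are made
up for illustration, the temporary file is left behind after a
successful exec, and the decompressed image temporarily takes up disk
space, which defeats part of the purpose.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write `len` bytes of `data` to a fresh temporary file and mark it
 * executable. `path_template` must end in "XXXXXX"; mkstemp fills in
 * the real name. Returns 0 on success, -1 on failure. */
int spill_image(const char *data, size_t len, char *path_template)
{
    assert(data != NULL && path_template != NULL);
    int fd = mkstemp(path_template);
    if (fd < 0)
        return -1;
    size_t off = 0;
    while (off < len) {
        ssize_t n = write(fd, data + off, len - off);
        if (n <= 0)
            goto fail;
        off += (size_t)n;
    }
    if (fchmod(fd, 0700) < 0)
        goto fail;
    if (close(fd) < 0) {
        unlink(path_template);
        return -1;
    }
    return 0;
fail:
    close(fd);
    unlink(path_template);
    return -1;
}

/* Approximation of execv_in_core(data, len, argv): on success it
 * does not return; on failure it cleans up and returns -1. */
int execv_in_core_sketch(const char *data, size_t len, char *const argv[])
{
    char path[] = "/tmp/incoreXXXXXX";   /* hypothetical location */
    if (spill_image(data, len, path) < 0)
        return -1;
    execv(path, argv);
    unlink(path);                        /* reached only if execv failed */
    return -1;
}
```

Note that this does nothing for the page-sharing problem: two runs of
the same packed program get two separate temporary files.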

I do hope that somebody tries: it seems very much worthwhile,
regardless of whether or not stacker exists. I don't disagree that
stacker is useful. However, a 'stacker' just can't get the same
level of compression that a program like pklite could achieve:

    - pklite can take all night to compress your files. Just
      start a batch file, and it'll be done in the morning.
      Stacker, instead, has to do a half-hearted job, since it's
      in a hurry not to slow down your filesystem.
    - pklite can make a useful assumption about your files:
      namely, that they contain mostly 386 instructions. That
      assumption should enable it to increase compression
      significantly.
    - pklite doesn't need to allow random access: there is
      no reason for pklite to chunk things into 1k, 2k, 3k,
      and 4k pages, thereby causing internal fragmentation,
      and reducing compression.
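
As one illustration of the second point, here is a sketch of a
preprocessing filter that exploits the "mostly 386 instructions"
assumption. It is a commonly used trick for x86 code, not something
taken from these notes: the 32-bit relative operand of each CALL
(opcode 0xE8) is rewritten as an absolute offset, so that repeated
calls to the same routine turn into identical byte strings, which a
general-purpose compressor can then match. The name x86_call_filter is
made up, and the filter assumes 32-bit code and treats every 0xE8 byte
it scans as a call, which is harmless because the inverse pass makes
exactly the same decisions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* encode != 0: relative -> absolute (before compression).
 * encode == 0: absolute -> relative (after decompression). */
void x86_call_filter(unsigned char *buf, size_t len, int encode)
{
    size_t i = 0;
    assert(buf != NULL || len == 0);
    while (i + 5 <= len) {
        if (buf[i] != 0xE8) {       /* not a CALL rel32 opcode */
            i++;
            continue;
        }
        /* Load the little-endian 32-bit operand. */
        uint32_t v = (uint32_t)buf[i + 1]
                   | (uint32_t)buf[i + 2] << 8
                   | (uint32_t)buf[i + 3] << 16
                   | (uint32_t)buf[i + 4] << 24;
        /* The operand is relative to the next instruction. */
        uint32_t next = (uint32_t)(i + 5);
        v = encode ? v + next : v - next;
        buf[i + 1] = (unsigned char)(v & 0xFF);
        buf[i + 2] = (unsigned char)((v >> 8) & 0xFF);
        buf[i + 3] = (unsigned char)((v >> 16) & 0xFF);
        buf[i + 4] = (unsigned char)((v >> 24) & 0xFF);
        i += 5;                     /* skip the operand bytes */
    }
}
```

A packer would run the filter with encode set before compressing, and
the stub would run it with encode clear after decompressing.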

These factors add up to one thing: significantly better
compression ratios for pklite.
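
To put rough numbers on the fragmentation point, here is a small
sketch. It assumes one reading of the "1k, 2k, 3k, and 4k" remark:
that a stacker compresses each 4k page separately and stores the
result in a whole number of 1k slots. The names stored_size and
total_waste and the sample sizes are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { SLOT = 1024, PAGE = 4096 };

/* Bytes occupied on disk by one page whose compressed form is
 * `clen` bytes (1..PAGE), rounded up to whole 1k slots. */
size_t stored_size(size_t clen)
{
    assert(clen > 0 && clen <= PAGE);
    return (clen + SLOT - 1) / SLOT * SLOT;
}

/* Total bytes lost to rounding over n compressed pages. */
size_t total_waste(const size_t *clens, size_t n)
{
    size_t waste = 0;
    for (size_t i = 0; i < n; i++)
        waste += stored_size(clens[i]) - clens[i];
    return waste;
}
```

For example, pages that compress to 1100, 2049, and 900 bytes occupy
2048, 3072, and 1024 bytes, so 2095 of the 6144 stored bytes are
padding. A whole-file compressor pays no such per-page rounding.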

Call me if you get the system call implemented, but need a hand
on the data compression part.

                                                  - Josh