From: Hongjiu Lu -- Graduate Student (hlu@decserv1.eecs.wsu.edu)
Date: 03/15/92


From: hlu@decserv1.eecs.wsu.edu (Hongjiu Lu -- Graduate Student)
Subject: Re: 'pklite' for Linux.
Date: 16 Mar 1992 01:09:13 GMT

In article <1992Mar16.003027.2064@athena.mit.edu>, jyelon@cs.uiuc.edu (Josh Yelon) writes:
|> I considering writing binary file compressor for Linux, much like
|> pklite for msdos. The salient characteristic will be that you
|> can compress a program, and you won't have to decompress it
|> in order to run it. (It would decompress itself automatically
|> into RAM). I need ideas.
|>
|> First of all, how do I pull it off?
|>
|> Here is plan 1:
|>
|> 1. start the to-be-compressed executable using ptrace.
|> 2. read out all of its memory into a file.
|> 3. compress the file.
|> 4. prepend a tiny in-memory decompression routine to the file,
|> 5. prepend the proper headers for an executable program, and presto!
|>
|> The disadvantages:
|>
|> 1. I like shared pages. However, a text segment that gets
|> decompressed at runtime isn't 'read-only', so that does
|> it in for shared pages.
|>
|> 2. Shared libraries would probably get linked in as soon as
|> ptrace ran the thing, so the libraries would become a part
|> of the compressed file. Yuck!
|>
|> 3. Thousands of copies of this decompression routine, one in
|> every executable, like a virus. Gross. Plus, the kernel
|> is constantly loading the same decompression routine from
|> disk. Wasteful.
|>
|> The advantages:
|>
|> 1. It's pretty easy, considering that I already have the
|> code for dumping a running process into an executable file.
|> 2. Nobody has to be aware of it. A compressed program would
|> run just like any other.
|>
|> Here's another plan:
|>
|> 1. Edit the kernel code for exec, to look for a new 'compressed
|> executable' magic number.
|> 2. Have it then decompress/load the file into ram, and
|> 3. Have it then proceed as if it had just loaded a non-demand-paged
|> executable.
|>
|> The advantages.
|>
|> 1. Its very clean.
|> 2. The decompression routine is in the kernel, so it only
|> needs to be loaded once.
|> 3. It sounds more reliable than the above approach.
|>
|> The disadvantages.
|>
|> 1. The kernel is getting bigger (although not much,
|> assuming that the decompression routine is small).
|> 2. You have to have the decompressing-kernel to run
|> compressed programs.
|> 3. Again, compressed files are not demand-paged. (I, personally,
|> could care less - I tend to think that demand-paging slows
|> things down as much as it speeds things up).
|>
|> Whaddaya think?
|>
|> Also, about compression algorithms:
|>
|> I have considered using 16-bit Huffman codes, but I'm only getting
|> a 70% compression ratio. Lempel-Ziv is okay (that's what compress
|> uses), but it takes gobs of ram to use it. Besides, I prefer a
|> static scheme- one where the decompression tables are fixed. (after
|> all, everything being compressed will be stylized gnu C 386 code...
|> why use an adaptive algorithm when there is nothing to adapt to?)
|> On that note, I was also considering inventing a specialized
|> algorithm for compressing 386 opcodes (it would still produce correct
|> output on other stuff, it just wouldn't be able to compress much).
|> Anybody know anything concrete about the feasibility of this?
|>
|> PS: In order to do this I would need sample data - namely, assembly
|> listings containing BOTH opcodes and mnemonics, so I can pore them
|> over and see what kinds of regular patterns the gnu C compiler
|> typically produces. I cannot obtain this, since I do not have
|> access to a 386 right now. Would anybody be able to mail me a 100K
|> listing from the assembler?
|> y
|> Thanks in advance 4 ideas!
|> - Josh
|>
|> PS: this might help alleviate some of the root-disk woes!

No. I don't like the idea. Without demand-paging, Linux will be just like
another DOS. I don't want to see any DOS-like program in Linux.

H.J.