From: Stephen Tweedie (sct@dcs.ed.ac.uk)
Date: 04/14/93


From: sct@dcs.ed.ac.uk (Stephen Tweedie)
Subject: ext2fs compressed filesystem *IS* *ALREADY* under development.
Date: 15 Apr 1993 00:09:43 GMT

Well, there certainly seems to be a lot of interest in this, so I
thought I'd post about work currently in progress on the ext2fs.

In outline:

    I am working on adding the option to transparently compress and
    uncompress ext2fs files. Compression will be selectable on a
    file-by-file basis.

    This will be an enhancement to the existing ext2fs. The new
    facility will be immediately useable by anybody with an ext2fs
    filesystem without requiring any reformatting.

In article <Afmu_xm00VopQZhEUo@andrew.cmu.edu>, fl0p+@andrew.cmu.edu (Frank T Lofaro) writes:

> It sounds like a useful idea for those pressed on diskspace, and seems
> worth implementing. Some problems that need to be addressed:

> 1. Speed of access.

The filesystem data structures and directory tree will remain
uncompressed for speed. Decompression of data should not be too slow;
the code is based on gzip, which is already reasonably fast at
decompression. Decompressed data will be buffer-cached on the same
terms as raw device blocks, which will significantly improve
performance.
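
As a rough illustration of why caching the decompressed data helps, here
is a small user-space sketch in Python. It is not the kernel code: the
class name, the (file_id, chunk_no) key and the LRU eviction policy are
all assumptions made for the example, and zlib stands in for the
gzip-derived code.

```python
import zlib
from collections import OrderedDict


class DecompressedBlockCache:
    """Tiny LRU cache of decompressed chunks (hypothetical, for illustration)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # (file_id, chunk_no) -> decompressed bytes
        self.decompressions = 0       # number of cache misses so far

    def get(self, file_id, chunk_no, compressed):
        key = (file_id, chunk_no)
        if key in self.blocks:
            self.blocks.move_to_end(key)      # mark as most recently used
            return self.blocks[key]
        data = zlib.decompress(compressed)    # pay the decompression cost once
        self.decompressions += 1
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)   # evict the least recently used
        return data
```

A second read of the same chunk is then served from the cache without
touching the decompressor, which is the effect described above.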

> 2. Demand-paging. It'll probably be really hard to get this to work
> with it.

Done. :-)

I have a set of kernel patches which implement an extension to the
standard kernel inode operations, to increase the flexibility of the
demand paging software. This will also incidentally allow demand
paging and caching of NFS data.

This work is already complete. As an example, it is now possible to
demand-page binaries and shared libraries from an msdos floppy using
sector (512-byte) alignment, even though such a filesystem cannot
support the bmap() call previously necessary for demand paging.
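
The idea can be sketched in user-space Python as follows. This is only
a model of the principle, not the actual patch: read_page, fault_in and
both file classes are hypothetical names invented for the example. The
pager asks the file for a whole page if the file offers such an
operation, and only falls back to block mapping when it does not.

```python
PAGE_SIZE = 4096
SECTOR_SIZE = 512


class PageAlignedFile:
    """File whose pages sit on page-sized device blocks, so bmap() works."""

    def __init__(self, first_block):
        self.first_block = first_block

    def bmap(self, page_no):
        # Map a page of the file to a page-sized block on the device.
        return self.first_block + page_no


class SectorAlignedFile:
    """File made of scattered 512-byte sectors; it cannot offer bmap()."""

    def __init__(self, device, sectors):
        self.device = device
        self.sectors = sectors  # sector numbers in file order

    def read_page(self, page_no):
        # Gather the sectors that make up this page, wherever they live.
        per_page = PAGE_SIZE // SECTOR_SIZE
        wanted = self.sectors[page_no * per_page:(page_no + 1) * per_page]
        data = b"".join(
            self.device[s * SECTOR_SIZE:(s + 1) * SECTOR_SIZE] for s in wanted
        )
        return data.ljust(PAGE_SIZE, b"\0")  # zero-fill a short final page


def fault_in(file, device, page_no):
    """Fetch one page: prefer the file's own page read, else use bmap()."""
    read_page = getattr(file, "read_page", None)
    if read_page is not None:
        return read_page(page_no)
    block = file.bmap(page_no)
    return device[block * PAGE_SIZE:(block + 1) * PAGE_SIZE]
```

The pager no longer cares how the filesystem lays out its data; it only
needs some way to obtain the contents of a given page.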

With this extension, demand paging just degenerates to randomly
accessing compressed data. I am storing a form of indirection data
within the file, similar to the inode's own direct/indirect block
pointers, which will allow random read access to the compressed file.

Incidentally, I don't intend (at least at first) to allow random
writes to the compressed file, but concurrent random reads will be
fully supported.
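
To show how such indirection data makes random reads possible, here is a
minimal Python sketch. It is an illustration under assumptions, not the
on-disk format: the 4096-byte chunk size, the (offset, length) index and
the function names are invented for the example, and zlib stands in for
the gzip-derived kernel code. The file is compressed one chunk at a
time, and the index records where each compressed chunk starts.

```python
import zlib

CHUNK_SIZE = 4096  # assumed logical chunk size; not taken from the design


def compress_file(data, chunk_size=CHUNK_SIZE):
    """Compress data chunk by chunk and return (index, blob).

    index[i] is the (offset, length) of compressed chunk i inside blob.
    """
    index = []
    pieces = []
    offset = 0
    for start in range(0, len(data), chunk_size):
        packed = zlib.compress(data[start:start + chunk_size])
        index.append((offset, len(packed)))
        pieces.append(packed)
        offset += len(packed)
    return index, b"".join(pieces)


def read_at(index, blob, pos, length, chunk_size=CHUNK_SIZE):
    """Read length bytes at logical offset pos, touching only the chunks needed."""
    out = bytearray()
    end = pos + length
    first = pos // chunk_size
    last = (end - 1) // chunk_size if length > 0 else first - 1
    for i in range(first, min(last, len(index) - 1) + 1):
        off, size = index[i]
        chunk = zlib.decompress(blob[off:off + size])
        lo = max(pos - i * chunk_size, 0)            # start within this chunk
        hi = min(end - i * chunk_size, len(chunk))   # end within this chunk
        out += chunk[lo:hi]
    return bytes(out)
```

A read in the middle of the file decompresses only the one or two chunks
it overlaps. Because each chunk is compressed independently, rewriting
part of a chunk would change its compressed length and shift everything
after it, which is one reason to leave random writes out at first.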

> 3. Backwards compatibility. Anyone out there know if there are any spare
> fields or bits (seems like this idea only needs to steal one) in the
> inode structure of the current filesystems?

There are in the ext2fs, which is what I am basing my code on.

> 4. How to have the kernel actually decompress? Exec an outside program?
> Build gunzip into the kernel?

Yup - internal kernel compression/decompression.

> This does seem a lot better than trying to compress the whole FS (not
> quite as efficient for space, but probably a hell of a lot easier to
> implement).

Indeed. It is also safer: you run far less risk of corrupting your
entire filesystem if only individual files are compressed. It is
also more flexible, and faster (the high-level filesystem data
remains uncompressed).

The other advantage of per-file compression is that the superuser will
be able to directly access the raw compressed form of the data. This
will allow quick backup/restore of compressed files, for example, and
will also allow suid-root programs to do their own compression on
files (to get better compression than the kernel's faster default
compression), or to check for and repair damaged compressed files.

> Anyway, good luck with this idea. It has been mentioned before, but this
> time there is some proposed way of handling some of the implementation
> details; it seems quite feasible. I don't know enough about filesystems
> to know how easy/hard this will be tho...

It is definitely possible. All of the necessary design details have
(I hope) been addressed, and I aim to get something working over the
Easter holidays.

Cheers,
 Stephen Tweedie.