From: hedrick@dartagnan.rutgers.edu (Charles Hedrick)
Subject: caching
Date: 9 Jul 1992 19:10:18 GMT

There's one point about file system caching that hasn't been made yet.
The original design of the write cache had two pieces: the cache
itself and fsck. According to the documentation (and our experience
with many Unix systems confirms this) fsck is designed to be able to
recover from all kinds of errors that can occur due to the system
going down before a block has been written. This is the reason why
Unix systems force fsck to be run at every startup, unless it can be
determined that the file system was cleanly dismounted.
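
One way a system could decide whether the file system was cleanly dismounted is a "clean" flag kept on disk. The sketch below is only a toy model of that idea: the `superblock` dictionary and the `boot` and `clean_shutdown` functions are hypothetical names, and this is not a description of how any particular Unix or Linux actually does it.

```python
# Toy model of a "clean" flag. All names here are hypothetical.

def boot(superblock):
    """Return True if fsck has to run before the file system is used."""
    needs_fsck = not superblock["clean"]
    # The flag stays False for as long as the file system is mounted.
    superblock["clean"] = False
    return needs_fsck

def clean_shutdown(superblock):
    """A clean dismount is the only thing that sets the flag again."""
    superblock["clean"] = True

sb = {"clean": True}
first = boot(sb)
clean_shutdown(sb)
after_clean = boot(sb)   # previous session ended with a clean dismount
# ... crash here: clean_shutdown() is never called ...
after_crash = boot(sb)   # flag is still False, so fsck is required

print(first, after_clean, after_crash)  # False False True
```

In this model a crash never has to be detected directly. The absence of a clean dismount is what triggers the check.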
By the way, it's worth noting that delayed writes don't necessarily
have the effect you might expect. You get problems with file system
consistency when updates happen in the wrong order. Having all your
updates delayed 30 sec is not a problem. What is a problem is making
some changes, waiting 30 sec, and then making other changes that are
necessary in order to return the file system to consistency. Delaying
will be safe if either of the following is true:
- changes are always done in a safe order
- all changes that must be done together are either all performed
  immediately or all delayed
If correct order is not preserved, there's actually some danger
whether writes are delayed or not. In either case, there's a window
of opportunity during the writing when a crash will cause trouble.
The window just comes at a different time if there's a delay.
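
The point about ordering can be illustrated with a toy model. Assume, purely for illustration, that creating a file takes two block writes: one that initializes an inode and one that adds a directory entry naming it. The disk layout, the inode number, and every function name below are hypothetical. The sketch tries a crash after each write and reports which crash points leave a directory entry pointing at an uninitialized inode.

```python
# Toy model: creating a file takes two block writes.
#   "inode"  : initialize inode 7 on disk
#   "dirent" : add a directory entry "newfile" -> inode 7
# The layout and all names are hypothetical.

def apply_writes(writes, crash_after):
    """Return the on-disk state if the system crashes after `crash_after` writes."""
    disk = {"inodes": set(), "dirents": {}}
    for op in writes[:crash_after]:
        if op == "inode":
            disk["inodes"].add(7)
        elif op == "dirent":
            disk["dirents"]["newfile"] = 7
    return disk

def is_consistent(disk):
    """Every directory entry must point at an initialized inode."""
    return all(ino in disk["inodes"] for ino in disk["dirents"].values())

def unsafe_crash_points(writes):
    """Crash points (number of completed writes) that leave the disk inconsistent."""
    return [k for k in range(len(writes) + 1)
            if not is_consistent(apply_writes(writes, k))]

safe_order = ["inode", "dirent"]
unsafe_order = ["dirent", "inode"]

print(unsafe_crash_points(safe_order))    # []
print(unsafe_crash_points(unsafe_order))  # [1]
```

Nothing in the model depends on when the writes start, only on their order. A delay moves the dangerous window later without creating or removing it. In the safe order, a crash after the first write leaves an inode that no directory entry refers to, which wastes space but does not violate the consistency check used here.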
So the actual danger depends upon the ordering of changes, whether all
changes are delayed the same amount, and whether fsck is properly
integrated into the scheme for maintaining file system integrity.
I don't know what the Linux strategy is in this area. I can say that
I had a number of hangs during X11 testing, some of which occurred
before the file system had been synced. While sometimes a file that I
had just changed was not updated on disk, I never had a file system
inconsistency. In fact, since some file system race conditions were
fixed back in the 0.12 days, I haven't ever seen a file system
inconsistency. In my view hangs and crashes are rare enough in normal
operation that the danger of losing the last 30 sec. of your work
isn't worth worrying about. I would like to see the following:
- some scheme to make sure that file system integrity is not lost.
  This could involve running fsck automatically, or making
  sure that updates are done in the right order.
- making sure that whatever shutdown methods you document as
  normal do a sync. Probably this means making c-a-d
  (Ctrl-Alt-Del) do a sync.
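
As a rough illustration of what "do a sync" means from a program's point of view, here is a minimal sketch using Python's standard `os.fsync` and `os.sync` calls. The `write_durably` and `sync_before_shutdown` helpers and the file name are made up for the example. This shows the user-level calls only, not how the kernel would hook a sync into c-a-d.

```python
import os
import tempfile

def write_durably(path, data):
    """Write `data` to `path` and ask the kernel to push it to disk before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's own buffer into the kernel's cache
        os.fsync(f.fileno())  # ask the kernel to write out this file's cached blocks

def sync_before_shutdown():
    """Hypothetical shutdown hook: request that all delayed writes be flushed."""
    if hasattr(os, "sync"):   # os.sync() is only available on Unix
        os.sync()

path = os.path.join(tempfile.mkdtemp(), "notes.txt")
write_durably(path, b"last 30 seconds of work\n")
sync_before_shutdown()
```

`os.fsync` covers a single file, while `os.sync` asks for everything in the write cache to be flushed, which is the behavior a documented shutdown method would want.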