From: Theodore Ts'o (tytso@ATHENA.MIT.EDU)
Date: 11/11/91


Subject: Re: rename, file system errors
Date: Mon, 11 Nov 1991 14:12:19 -0500
From: tytso@ATHENA.MIT.EDU (Theodore Ts'o)


   Date: Mon, 11 Nov 1991 15:53:55 +0200
   From: Linus Benedict Torvalds <torvalds@cc.helsinki.fi>

   gets symbolic links. I think the easiest way is to move downwards from
   the "to"-directory via ".." until one hits root (or a mount-point) or
   the "from"-directory. It shouldn't be hard, it's just SMOP.

Well, the problem is that another rename() could be happening while you are
traversing up the tree (towards the root) checking to see if you hit the
"from" directory. The simplest case where you will have a problem is if
rename(/a/b, /a/c/d) and rename(/a/c, /a/b/e) are running in parallel.
Another problem is that two rename()'s could try to move the same
directory to two different places. Now, locking takes care of this
quite nicely, but the interesting CS question is how little locking you
can do and still have things be "correct." Or, is there some way
that we can finesse the issue without using locks?
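
To make the first race concrete, here is a minimal sketch in Python rather
than kernel C. It is a toy model, not the Linux implementation: the Dir
class and the helpers is_inside(), check(), move() and reaches_root() are
hypothetical names invented for illustration. It walks ".." from the
destination toward the root looking for the source, and shows that if both
renames run their check before either one moves anything, both checks pass
and /a/b and /a/c end up as each other's parent, cut off from the root.
Doing each check-and-move as one atomic step (what a single big lock would
give you) makes the second rename fail instead.

```python
class Dir:
    """Toy directory: a name, a parent pointer (".."), and named children."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent if parent is not None else self  # root's ".." is itself
        self.children = {}
        if parent is not None:
            parent.children[name] = self

def is_inside(start, target):
    """Walk ".." from start toward the root; True if target is met on the way."""
    d = start
    while True:
        if d is target:
            return True
        if d.parent is d:          # reached the root without meeting target
            return False
        d = d.parent

def check(src, new_parent):
    """A directory may not be moved into itself or into one of its descendants."""
    return not is_inside(new_parent, src)

def move(src, new_parent, new_name):
    """Unlink src from its old parent and link it under new_parent."""
    del src.parent.children[src.name]
    src.name, src.parent = new_name, new_parent
    new_parent.children[new_name] = src

def reaches_root(d, root, limit=100):
    """Follow ".." at most `limit` times; False means d is stuck in a cycle."""
    for _ in range(limit):
        if d is root:
            return True
        d = d.parent
    return False

def build():
    """Create /a/b and /a/c."""
    root = Dir("/")
    a = Dir("a", root)
    return root, Dir("b", a), Dir("c", a)

def racy_interleaving():
    """Both renames check first, then both move: the unlocked worst case."""
    root, b, c = build()
    ok1 = check(b, c)              # rename(/a/b, /a/c/d)
    ok2 = check(c, b)              # rename(/a/c, /a/b/e)
    if ok1:
        move(b, c, "d")
    if ok2:
        move(c, b, "e")
    return ok1, ok2, reaches_root(b, root)

def serialized():
    """Each rename does its check and move as one step, as if under a lock."""
    root, b, c = build()
    results = []
    for src, dst, name in ((b, c, "d"), (c, b, "e")):
        ok = check(src, dst)
        if ok:
            move(src, dst, name)
        results.append(ok)
    return results, reaches_root(b, root)

if __name__ == "__main__":
    print(racy_interleaving())
    print(serialized())
```

The interleaving is simulated in a single thread so the outcome is
deterministic; a real kernel would hit the same state only when the two
calls happen to overlap in this way.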

   tytso@ATHENA.MIT.EDU (Theodore Ts'o):
> Speaking of hard disk corruptions, I've found an easy way to get Linux
> to corrupt the hard disk. Simply follow the following instructions:

   Happily, things aren't that bad (I think). I don't think it's a
   buffer-cache problem (which is hell to find - lots of race conditions
   etc. I know - I had those kinds too), but a problem with handling "out
   of disk space". That's still bad, but easier to spot. Could you
   (tytso) check if the partition filled up? It's probably a bug in one of
   the namei.c-routines which want a new block, and crap out if they cannot
   get it. I'll look.

No, my partition didn't fill. Originally I tried it with a partition
size of 65535, so I had lots of free space. I then tried it with "mkfs
/dev/hd3 32768"; the problem still happened. The second time, I noticed
the following things:

* /etc/fsck reported that the disk was completely happy before I ran
"compress -d < /tmp/gcc.tar.Z | tar xvf -"

* During the untarring process, there was at least one time when tar
printed something like this:

./stbout.h
tar: Cannot change owner of stdbout.h: ENOENT
tar: Cannot change modification times of stdbout.h: ENOENT
tar: (One more error message which I don't remember): ENOENT

This tends to indicate that the file just disappeared after tar closed
it, so it couldn't adjust the ownership and mod times. Perhaps a
directory update is getting lost?

* After tar finished, I sync'ed the disks and tried running /etc/fsck
again. This time, it reported 10 "Inode not in use, nlinks=0,
counted=1" errors for 10 consecutive inodes starting at #82, and 9
"Inode in use, nlinks=1, counted=0" errors for 9 consecutive inodes
starting at #143. This persisted after I logged out and rebooted the
system.

* Upon reboot the disk statistics which were printed were:

        12876/32768 free blocks
        10461/10930 free inodes
        1500 buffers = 1536000 bytes buffer space
        free mem: 14680064 bytes

* If you run an "ls -l" on the directory into which tar placed files
(where stbout.h disappeared), you will get a kernel panic: "free_inode:
bit already cleared".
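
For readers unfamiliar with the two fsck messages above, the following is a
rough sketch of the kind of cross-check that produces them. It is written in
Python and is an assumption about the general technique, not the actual
/etc/fsck source: check_inodes() and its data layout are hypothetical, and
the file names are made up. The idea is to count how many directory entries
point at each inode ("counted") and compare that with the link count stored
in the inode ("nlinks") and with whether the inode is marked in use.

```python
from collections import Counter

def check_inodes(inodes, directories):
    """Compare stored link counts against references found in directories.

    inodes:      {inode_number: (in_use, nlinks)}  (hypothetical layout)
    directories: list of {file_name: inode_number} mappings
    Returns a list of (inode_number, message) for every mismatch.
    """
    counted = Counter()
    for entries in directories:
        for ino in entries.values():
            counted[ino] += 1      # one more directory entry names this inode

    problems = []
    for ino in sorted(inodes):
        in_use, nlinks = inodes[ino]
        seen = counted[ino]        # 0 if no directory entry refers to it
        if nlinks != seen:
            state = "in use" if in_use else "not in use"
            problems.append(
                (ino, f"Inode {state}, nlinks={nlinks}, counted={seen}"))
    return problems

if __name__ == "__main__":
    # Illustrative data only: inode 82 is marked free yet a directory entry
    # still names it; inode 143 is allocated but nothing points at it.
    inodes = {1: (True, 1), 82: (False, 0), 143: (True, 1)}
    directories = [{"ok.h": 1, "dangling.h": 82}]
    for ino, msg in check_inodes(inodes, directories):
        print(ino, msg)
```

Read this way, the first message says a directory entry exists for an inode
that is marked free, and the second says an inode is allocated but no
directory entry refers to it, which is consistent with a directory update or
an inode-bitmap update going astray.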

I wonder if the problem is due to the fact that I was using the same
drive for both source and destination for the "compress -d | tar xvf -"
pipeline. I did notice that it was very slow, presumably because there
was a lot of buffer thrashing going on.

                                                        - Ted