From: Scott A. Taylor (scott@natinst.com)
Date: 08/14/92


From: scott@natinst.com (Scott A. Taylor)
Subject: Re: Buffer corruption problems.
Date: Fri, 14 Aug 1992 13:31:32 GMT

In article <CORYWEST.92Aug13180939@rio-grande.rice.edu> corywest@rice.edu (Cory West) writes:
>In article <BURLEY.92Aug13153840@geech.gnu.ai.mit.edu>
>burley@geech.gnu.ai.mit.edu (Craig Burley) writes:
>
>> ...I believe there is a bug in Linux that has the following behavior:
>>
>> - causes Linux to "misread" one 1024KB chunk of data from a disk-based file
>> so that what your app ends up with is some _other_ 1024KB chunk
>> (apparently from the same file)
>>
>> - occurs only during very heavy disk access, such as megabytes accessed
>> continually
>>
>> - is intermittent, but happens enough to reproduce fairly easily
>[Etc...]
>
> I have noticed some abnormalities, but I have been writing them
>off to a disk block in the process of going bad. Here's what I have
>noticed:
>
> - Under heavy and prolonged disk I/O (in this example, while compiling
>gdb from scratch) there seem to be problems with the buffer cache. After
>compiling for a while, gcc will choke with a TON of strange errors and die.
>However, if I just restart make, the compiler will continue successfully
>with the file that it had just died on, but it will die a little later
>down the line (after some more intensive I/O) under the same circumstances.
>After a couple of tries, I can usually get through the entire make.
>
> - I am running on MFM drives on a 486-33 with 4 Megs RAM (gcc 2.2.2d and
>gcc 2.2.2 and Linux 0.97 PL1), so while compiling large things, my disks never
>stop to breathe, especially if I am trying to do something else while the compile
>is running.
>
> - The errors always include the same file (which is why I thought
>perhaps that that particular file was living on a disk block that was going
>belly up. I plan to rename that file to .deadblock and put a new copy
>of the file in the directory to test this theory). I am also going to run some
>more large compiles to see if I can reproduce this error elsewhere in the
>system.
>
> I don't know what it is yet, and I'm not sure if it's anything, but
>I'll see what I can reproduce and hopefully we can determine if this is an
>OS bug or a hardware bug and whether or not it has anything to do at all with
>the above problem.
>
> Anyone else out there having problems?
>
>
>
> Cory West
> corywest@rice.edu

I am running 0.96c pl2 with the latest SCSI drivers from
woz.headrest.colorado.edu, and I have not had any problems like this, even
under heavy load (i.e. building a new kernel in one VC while compiling groff,
GNU file utils, text utils, etc. in another). I have 8 megs of RAM and an
UltraStor 14F w/ 213 MB Maxtor disk in a 386-25 with no cache. I do not have
swapping enabled. Maybe this problem is paging- or cache-related?
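
For anyone who would rather test for the "wrong chunk from the same file"
symptom directly than wait for gcc to choke, here is a rough sketch of one way
to do it. It stamps every block of a scratch file with its own index, reads the
file back, and reports any block that comes back carrying a different index.
The 1024-byte block size, the function names, and the default sizes are all
assumptions for illustration, not anything taken from the kernel. To stress the
disk rather than the buffer cache, the file would likely need to be larger than
RAM and the check run alongside other heavy I/O.

```python
import os
import struct
import tempfile

BLOCK_SIZE = 1024  # assumed block size; adjust to match the filesystem


def make_block(index, size=BLOCK_SIZE):
    """Return a block in which every 4-byte word holds the block's own index."""
    return struct.pack("<I", index) * (size // 4)


def write_pattern(path, n_blocks, size=BLOCK_SIZE):
    """Write n_blocks self-identifying blocks to path and force them to disk."""
    with open(path, "wb") as f:
        for i in range(n_blocks):
            f.write(make_block(i, size))
        f.flush()
        os.fsync(f.fileno())


def find_misplaced(path, n_blocks, size=BLOCK_SIZE):
    """Return (expected_index, found_index) for every block that reads back wrong."""
    bad = []
    with open(path, "rb") as f:
        for i in range(n_blocks):
            data = f.read(size)
            if data != make_block(i, size):
                # The first word says which block we were actually handed.
                found = struct.unpack("<I", data[:4])[0] if len(data) >= 4 else None
                bad.append((i, found))
    return bad


def run_check(n_blocks=2048, passes=3):
    """Write a scratch file once, re-read it several times, collect mismatches."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_pattern(path, n_blocks)
        results = []
        for _ in range(passes):
            results.extend(find_misplaced(path, n_blocks))
        return results
    finally:
        os.remove(path)


if __name__ == "__main__":
    for expected, found in run_check():
        print("block %d read back as block %r" % (expected, found))
```

An empty result only means no misread was caught on that run. Since the
reported problem is intermittent, a clean pass does not rule it out.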

I have used (and abused ;-) ) Linux pretty heavily in the past, and it has
been solid as a rock (Thanks Linus and everyone else involved!).

-- 
Scott Taylor            |
(512) 795-6837          | "Well, I wanted to work with gymnasts." -David Byrne
scott@natinst.com       |
** NI pays me to write their code, not their opinions, and that's what I do **