From: jhood@smoke.marlboro.vt.us (John Hood)
Subject: Nasty Bugs: Linux panics with 99p6/ext2fs/2 SCSI drives; serial problems
Date: 8 Apr 1993 22:30:40 GMT
Help save me from restoring my Xenix backups :)
I'm having nasty, horrible problems bringing up Linux on my system
here. Due to a web of interlocking bugs, I can't get any of my
important stuff running right. I'm fairly new to Linux, but I've been
beating on Unix systems for years.
First, the configuration:
Linux 99p6, from SLS downloaded March 24th. Right now I'm running a
99p6 kernel, although I have also seen the SCSI problems with a 99p7A
kernel. It's still a pretty standard 99p6 SLS system, though not for
long... I'm running ext2fs (or trying to) on two 300-ish meg disks. The
first has 10M for MSDOS, 16M of swap, and the rest as a Linux partition.
The second drive has a single Linux partition, with 150k inodes for
news. I'm trying to run mostly mail, a BBS, and a partial Usenet feed
on this machine.
A Tandy 4000 (first-generation 16MHz 386) with 4 MB of RAM and an
Adaptec 1542B SCSI controller, a Micropolis 1578 at ID 0, and a CDC
Wren IV (I think) OEMed to AT&T that probably has some minor OEM
firmware changes, at ID 1. Standard serial/parallel board with
16550A, AST 4-port board with 16550As on the first two ports (IRQ 3).
An Everex Excel 125 QIC-02 tape drive, with Everex 811B controller and
Wangtek 125M drive (IRQ 5). An old Microsoft bus mouse ("Logitech" to
Linux, actually) (IRQ 2).
[and before you start ragging on me about how bus-mastering
controllers often have problems with caches and old machines, note
that 1) the machine has no cache and 2) Tandy OEM'ed the original
1540 for use in this machine and 3) it's successfully run Xenix/386
for 1.5 years.]
I've installed Linux on a Gateway 4dx-66v, and I don't seem to be
having any problems over there.
Now, the bugs:
The most serious is that the kernel likes to panic a lot. It seems to
happen when there is heavy access to both drives; unbatching or
expiring news triggers it very nicely. :( My uptimes have been at most
a few hours, and half of that time has been spent running e2fsck to
repair the damage from the panics. Not Fun.
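For anyone who wants to try reproducing this without a news feed, here is a minimal sketch of a load generator, assuming a POSIX-ish system with fork() and wait(). It forks one sequential reader per device so both drives are busy at the same time, which is roughly the access pattern that seems to trigger the panic. The helper names read_to_eof() and read_in_parallel() are made up for illustration; they are not part of any existing tool. It only reads, so it may or may not be enough to provoke the panic (unbatching news also writes heavily).

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: read a file or block device sequentially until EOF,
   discarding the data.  Returns bytes read, or -1 if open()/read() failed. */
long read_to_eof(const char *path)
{
    char buf[8192];
    long total = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += (long)n;
    close(fd);
    return n < 0 ? -1 : total;
}

/* Hypothetical helper: fork one reader per path so every device is being
   read at the same time, then reap the children.  Returns the number of
   readers that could not be started or that exited with an error
   (0 means every reader reached EOF). */
int read_in_parallel(const char *const paths[], int count)
{
    int i, status;
    int started = 0, failures = 0;

    for (i = 0; i < count; i++) {
        pid_t pid = fork();
        if (pid < 0) {            /* could not start this reader */
            failures++;
            continue;
        }
        if (pid == 0)             /* child: read, then report via exit code */
            _exit(read_to_eof(paths[i]) < 0 ? 1 : 0);
        started++;
    }
    /* Assumes the caller has no other child processes, since wait()
       reaps whichever child finishes first. */
    for (i = 0; i < started; i++) {
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
    }
    return failures;
}
```

With paths = { "/dev/sda", "/dev/sdb" } and count = 2 this reads both whole drives at once, much like running two copies of the dd mentioned below in parallel; reading the raw devices needs root.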
The message I get is "Kernel panic: scsi_disk: request list
destroyed"; this is apparently from sd.c's single use of the
INIT_REQUEST macro. I haven't bothered to track it further, since
everybody uses the data structure checked in there.
When it panics, it does manage to flush the buffer cache, which seems
an odd thing to do if the disk I/O data structures are really trashed.
But I haven't seen any obvious filesystem trashing yet, other than the
usual cleanup of whatever was half done when the kernel panicked.
I've retreated to putting everything on the first drive. Things have
been OK for a whole eight hours now, but it's a little crowded :)
Other, more minor kernel problems:
Once in a while, I have gotten a "SCSI request aborted" message from
the kernel. Unfortunately I haven't written it down and it hasn't
gotten into my system log, so I can't be very precise about it.
Fairly often, the system thinks that the second drive has a bad
partition table and refuses to mount it. If I look at the drive with
fdisk or hexdump, it does appear to be trashed, but rebooting usually
clears the problem on the first try. This might be some kind of
hardware problem with the drive. Also, if I do a dd < /dev/sdb >
/dev/null, the drive light shows dim for a second and then comes on
bright for a full blast read. If I then do the dd again, the drive
light comes on brightly immediately.
I can't get an out-of-the-box configuration that works with my serial
ports. I have a COM1 and an AST 4-port on IRQ5. The default kernel
config has the AST 4-port on IRQ2. (It also seems to have the primary
and secondary AST 4-port cards backwards, but I can deal with that.)
If I have autoconfig turned off, I can't move the AST ports over to
IRQ5 later with setserial, because the bus mouse is on IRQ2, so
setserial isn't able to open the ports and apply its ioctl magic. At
least, I think that's what's going on. The error message and the code
relating to this are exceedingly obscure, BTW.
If I have autoconfig turned on, Linux doesn't see my COM1 serial port
at all, apparently because it doesn't prod it into generating an
interrupt (I haven't looked at the code really closely yet).
Non-kernel problems:
I can't get getty_ps 2.0.6b to work on cua*. It just goes right ahead
and opens the port without waiting. I really do want dial-in/dial-out
lines. Any clues? I have the modems configured for echo and result
codes, but it seems getty_ps shouldn't care about this.
Who-knows-where-it-is problem:
I would move to 99p7A or later, but I have seen my mail get stomped
because of locking problems. Vince Skahan mentions this problem in
his FAQ. Anybody else seen it? Anybody got a fix? :)
Thanks for any help!
--jh
-- John Hood Cthulhu-- just imagine it! jhood@smoke.marlboro.vt.us