From: Carsten Fischer (cally@cs.tu-berlin.de)
Date: 02/21/93


Subject: FS corruption: SCSI+WD8013+TCP/IP *Help*
Date: 21 Feb 1993 19:47:19 GMT


Problem: file system (!!!) corruption on SCSI disks when using a WD8013
          *with* TCP/IP enabled.

Configuration:

(1):

  486DX-33, 20 MB RAM (16 MB enabled)
  AHA-1542B, SCSI disks (240 + 520) MB, floppies (1.44 + 1.2) MB
  Serial card, 2 RS-232 ports
  ET4000-compatible VGA
  WD8013-compatible Ethernet card

  SLS-0.99p2 (initial release, no updates)
  Kernels 0.99p2, 0.99p4, 0.99p5

  
(2):

  386SX-16, 5 MB RAM
  AT-bus disks (125 + 43) MB, floppy 1.44 MB
  non-ETxxxx-compatible VGA
  WD8013-compatible Ethernet card (same as in (1))

  SLS-0.99p2 (same as (1))
  Kernel 0.99p2

The following occurs on (1) if:
  o The WD8013 is plugged in AND the kernel is compiled with TCP/IP
    enabled.
Nothing goes wrong if:
  o The WD8013 is /not/ plugged in (but TCP/IP is enabled), or
  o The WD8013 /is/ plugged in but the kernel is /not/ compiled with
    TCP/IP enabled.

Example:

< Booting > (WD plugged in, TCP/IP enabled)
< /etc/rc: mounting / >
< mounting /usr ro >
< mkfs -c /dev/sda5 >
     (no errors)
< mounting /tmp, sda5 >
< cp -rv /usr/bin /tmp (or similar ...) >
< sync >
< fsck -v /dev/sda5 >
==> /Lots/ of errors like:
     
     Block has been used before, now in file /tmp/xyz
     (...)
     Zone nnn: marked in use, no file uses it.
     (...)
     Zone mmm: in use, counted=n

After the 'cp', the filesystem on sda5 is nearly gone ...
It seems that even writing '/etc/mtab' is enough to damage the FS on '/'
(sometimes /lib/* vanishes ... :-( ).
Not a single error occurs while writing the data; nothing appears in
/var/log/{notice,kernel} or on the console.

BTW, a 'cd /usr/src; tar xvzf linux-0.99p4.tar.Z' results in the same ...
This is how I first discovered the problem: I was unable to recompile
the kernel because many files in src/linux/ were trashed.
Then, after unmounting /usr (extfs), 'mount -t ext' gave me a "Bad magic match".

Sometimes after 'damaging' my /tmp (and possibly /), when trying to log in
on a vc, I get something like "no free utmp entry".
One time, after 'fsck' didn't find any errors, I tried to do an
'rm -rf /tmp/bin' and got something like
  (...)
  trying to free block not in datazone
  free_block (nnnn:mmmmm): bit already cleared
  (...)
and then the vc hung, with the 'rm' still running (state "R").

It doesn't matter which type of FS I'm using; I reproduced this with the
standard Minix FS, extfs and ext2fs (0.99p5) under Linux 0.99p{2,4,5}.

And, of course, it does /not/ matter which partition I use ...
This has already broken sda{2,3,7}.

I edited net/tcp/Space.c to use IRQ 5 (networking OK!) and IRQ 7.
I even moved the base I/O address (to 0x220).
BTW, since I started using the AHA1542, nearly every time I boot I get a funny
orange background, which disappears after resetting several times ...
(When I used 0x220 as the WD's I/O base, I got a hot green ... :-) )
Now I'll move memstart/memend, although I don't see how this could affect
my problem.

The AHA1542 uses DC000 as its BIOS address; I already tried CC000, with no
change.

Needless to say, nothing has ever happened to the 2nd PC (AT-bus, TCP/IP enabled).
And this is my first problem with SCSI ...

I have already disabled all shadowing, switched to low speed, removed my serial
card and enabled the AHA's BIOS wait states (just to be sure :-) ...).

As a workaround I'm now using a kernel with TCP/IP disabled :-(

So, please, please, please ... *Help*

!!! *Any* help would be appreciated !!!

        Carsten

BTW: net/tcp/Space.c: The NET-FAQ says that one has to alter
      'memory end' (WD8013: d4000).
      But what about "recv. memory end"??

-- 
==============================================================================
*  Carsten Fischer   *   Zastrowstr. 7   *   W-1000 Berlin 42   *   Germany  *
*     e.mail :    cally@cs.tu-berlin.de                                      *
==============================================================================