Hard disk woes

Michael Abbott michael at araneidae.co.uk
Sun Sep 4 11:12:23 BST 2005


I'm having some very odd behaviour from one of my hard disks and I wonder 
what you make of it.

In brief, the hard disk in question works just fine much of the time, but 
when high volume data transfers are requested I get the following in 
/var/log/messages:

Sep  3 15:21:02 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
Sep  3 15:21:02 saturn /kernel: ata3: resetting devices .. done
Sep  3 15:21:12 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
Sep  3 15:21:12 saturn /kernel: ata3: resetting devices .. done
Sep  3 15:21:23 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
Sep  3 15:21:23 saturn /kernel: ata3: resetting devices .. done
Sep  3 15:21:33 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
Sep  3 15:21:33 saturn /kernel: ad6: trying fallback to PIO mode
Sep  3 15:21:33 saturn /kernel: ata3: resetting devices .. done
Sep  3 15:21:43 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
Sep  3 15:21:43 saturn /kernel: ata3: resetting devices .. ata3-slave: ATA identify retries exceeded
Sep  3 15:21:43 saturn /kernel: done

After this point the hard disk in question is frozen until I reboot, and 
any process that tries to touch it is similarly frozen (doesn't even 
respond to kill -9).  `shutdown -r` is enough to restore operation, and 
the rest of the system seemed happy enough.

Another interesting effect.  I placed a replacement hard disk on the same 
ATA bus (as a slave, device ad7) and tried copying files from ad6 to ad7. 
This time when ad6 froze and the kernel decided to give up on ata3 (and so 
decided to disable ad7 at the same time, naturally enough) the entire 
system froze!  No response from the console, stone cold dead, hard reset 
needed.


So a few questions arise from this.

1.  Why does FreeBSD handle this so ungracefully?  If restarting is 
sufficient to bring ata3 back then can't the ata driver do a proper 
restart?

2.  Goodness me, FreeBSD froze!  I know it's a hardware failure, but 
still: it's on an auxiliary ATA controller with no system files attached. 
Is this problem of general interest?  It's certainly a massive hint to me 
not to consider (parallel) ATA for RAID!

3.  Any thoughts on what is wrong with the hard disk in question?  I've 
changed ATA controllers, so it seems to be the disk, not the controller. 
The behaviour is very odd.  If I copy files off one at a time, e.g. using:
 	find . -type f -exec cp {} "$TARGET/"{} \; -exec echo -n '.' \;
the disk seems to hang in there, but if I just do
 	cp -R . "$TARGET"
then it freezes!  (This statement may not have been thoroughly tested: 
having to restart each time gets old quite quickly.)
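
For reference, the file-at-a-time approach can be sketched a little more 
carefully as below.  This is only a sketch, not exactly what was run: the 
function name copy_one_by_one is made up for illustration, and it assumes 
the destination is given as an absolute path.  Unlike the find one-liner 
above, it recreates the directory tree first, since cp will not create 
missing subdirectories under the target by itself.

```sh
#!/bin/sh
# Sketch: copy a directory tree one file per cp invocation, so the
# source disk only ever sees one small transfer at a time.
# copy_one_by_one is a hypothetical helper name.
# Usage: copy_one_by_one SRC_DIR ABSOLUTE_DST_DIR
copy_one_by_one() {
    src=$1
    dst=$2
    (
        cd "$src" || exit 1
        # Recreate the directory structure under the destination first.
        find . -type d -exec sh -c 'mkdir -p "$1/$2"' sh "$dst" {} \;
        # Copy each regular file separately, printing a dot per file.
        find . -type f -exec sh -c 'cp -p "$2" "$1/$2" && printf .' sh "$dst" {} \;
    )
    # Finish the row of progress dots with a newline.
    echo
}
```

It would be called as something like copy_one_by_one /mnt/ad6 "$TARGET", 
where /mnt/ad6 stands in for wherever the failing disk is mounted.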


Ok, now for the boring bits.

$ uname -a
FreeBSD saturn.araneidae.co.uk 4.11-RELEASE-p11 FreeBSD 4.11-RELEASE-p11 #6: Sat Aug 27 16:33:58 GMT 2005     root@saturn.araneidae.co.uk:/usr/obj/usr/src/sys/GENERIC  i386
$ dmesg | grep ata
atapci0: <HighPoint HPT370 ATA100 controller> port 0xa000-0xa0ff,0x9c00-0x9c03,0x9800-0x9807,0x9400-0x9403,0x9000-0x9007 irq 12 at device 11.0 on pci0
ata2: at 0x9000 on atapci0
ata3: at 0x9800 on atapci0
atapci1: <VIA 8233 ATA133 controller> port 0xa800-0xa80f at device 17.1 on pci0
ata0: at 0x1f0 irq 14 on atapci1
ata1: at 0x170 irq 15 on atapci1
atapci2: <HighPoint HPT372 ATA133 controller> port 0xc400-0xc4ff,0xc000-0xc003,0xbc00-0xbc07,0xb800-0xb803,0xb400-0xb407 irq 10 at device 19.0 on pci0
ata4: at 0xb400 on atapci2
ata5: at 0xbc00 on atapci2
ad0: 39083MB <Maxtor 4D040H2> [79408/16/63] at ata0-master UDMA100
ad1: 190782MB <SAMSUNG SP2014N> [387621/16/63] at ata0-slave UDMA133
ad4: 76319MB <ST380021A> [155061/16/63] at ata2-master UDMA100
ad6: 76319MB <ST380021A> [155061/16/63] at ata3-master UDMA100
acd0: DVD-ROM <CREATIVEDVD-ROM DVD2240E 12/24/97> at ata1-master PIO4
$

That's everything I can think of.




