more RAM + faster disk -> slower box?!
paul at akita.co.uk
Wed Nov 28 22:03:58 GMT 2001
On Thursday 29 November 2001 12:34 pm, Ian Pallfreeman wrote:
> RAID, which, frankly, sucks, and has been the cause of much sorrow. The
> replacement disk is a SCSI 160MB/s, replacing an 80 -- same size, same
> geometry. And I took the opportunity to increase the memory from 256MB to
> 768MB, in the hope that this might further compensate for the bandwidth
> problem in the IDE RAID.
But not the same speed. I see from your attached dmesg that you appear to
have one UDMA drive which you mount the root partition from, and two SCSI
disks at different speeds, with an Adaptec that will do 160MB/sec - I think
putting two disks of different speeds in one set is going to confuse the RAID
controller, and possibly BSD. More importantly, I think the faster drive is
going to physically thrash whilst waiting for the slower disk to catch up.
A good friend told me about one RAID array he saw with a load of 7200rpm
disks and one 5600 (or whatever), and the ensuing fun. You could *hear* the
disks clunking whilst waiting for the slow drive to catch up. It completely
confuses RAID controllers, and can cause serious hardware damage. I would
imagine that much insanity is going on inside this box with a faster drive
compared to the other. Think 'Mr. Toad syndrome' - going fast and shouting
'poop, poop!' whilst those around you prefer a slower pace of life can get
you into trouble. :-)
> popular groups into it OK. Now it ain't catching up at all.
I would normally start looking at the network, but if this is only the case
since the memory and disk got bumped up, that is unlikely to be the cause. To
me, this reeks of disk thrash. I have, though, been famously and
dramatically proven wrong in the past, so DYOR around this before taking my advice.
> Looking at ``systat -vm'' tells me none of the disks, even the poxy IDE
> RAID where the articles live, is terribly busy (whereas I'd be seeing
> 80-100% before the "upgrade"). The vinum-mirrored history/overview volume
> is practically idle. The load average has gone up from 1-2 to 4-5, and
> ``top'' shows me far more processes in RUN state, and for far longer,
> than I'd expect:
> 55248 news 64 0 4060K 3592K RUN 112:47 80.03% 80.03% fastrm
Yeah, this to me sounds more like disk thrash. Although [vm|sys]stat will
show a quiet disk, that's because they show the number of operations from an
OS point of view. They won't show that the machine is waiting on a disk or set
of disks, or that there is a speed mismatch in the box. The processes sitting
in RUN for ages are particularly interesting - they're waiting for something.
I'll give you three guesses what that might be... :-)
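One way to check that guess rather than take my word for it is to compare wall-clock time against CPU time for a batch of unlink(2) calls on the suspect filesystem. What follows is only a rough sketch, assuming Python is available on the box; `time_unlinks` and the scratch directory are made-up names for illustration and have nothing to do with fastrm or INN itself. If wall time comes out far larger than CPU time, the process is mostly waiting on something; if the two are close, it really is burning CPU.

```python
import os
import tempfile
import time


def time_unlinks(directory, count=200):
    """Create `count` empty files in `directory`, then time removing them.

    Returns (wall_seconds, cpu_seconds) for the unlink loop only.
    """
    paths = []
    for i in range(count):
        path = os.path.join(directory, "unlink-test-%d" % i)
        with open(path, "w"):
            pass  # an empty file is enough to exercise unlink(2)
        paths.append(path)

    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    for path in paths:
        os.unlink(path)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return wall_used, cpu_used


if __name__ == "__main__":
    # Hypothetical scratch location; point this at the filesystem under test.
    with tempfile.TemporaryDirectory() as scratch:
        wall, cpu = time_unlinks(scratch)
        print("wall %.4fs, cpu %.4fs" % (wall, cpu))
```

Run it once against the suspect volume and once against a disk you trust, and compare the two sets of numbers.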
> I'm not used to seeing a ``fastrm'' burning CPU, even on old 50MHz Suns,
> and it's been running for hours longer than normal. A quick ``truss''
> shows me the expected calls to unlink(2), and nothing else.
Now I'm getting worried. This is starting to sound like serious SCSI and disk
problems. One suggestion - switch it off now. Try and lay your hands on
another 160MB/sec disk, or another 80MB/sec disk, and try with both disks at
the same speed and see where you go. Alternatively, drop the hardware RAID and
see if you can get it to work with vinum. I still think you'll have problems
> Does anybody have any suggestions, please? Obviously I could remove some of
> the RAM and see what happens, but that won't help me understand...
Go for it, but if taking out memory fixes the problem, then there will be a
public ceremony for Manchester BSDers in the Lass O' Gowrie where I will eat
my BSD horns and tail, date and time to be announced. :-)
> da1 at ahc1 bus 0 target 0 lun 0
> da1: <IBM DDYS-T18350N S96H> Fixed Direct Access SCSI-3 device
> da1: 160.000MB/s transfers (80.000MHz, offset 63, 16bit), Tagged Queueing Enabled
> da1: 17501MB (35843670 512 byte sectors: 255H 63S/T 2231C)
> da2 at ahc1 bus 0 target 1 lun 0
> da2: <IBM DNES-318350W SAH0> Fixed Direct Access SCSI-3 device
> da2: 80.000MB/s transfers (40.000MHz, offset 30, 16bit), Tagged Queueing Enabled
> da2: 17501MB (35843670 512 byte sectors: 255H 63S/T 2231C)
> Mounting root from ufs:/dev/ad0s1a
If those two are in the same RAID set, I think I would have cause for concern.
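For anyone wanting to spot this sort of mismatch without reading dmesg by eye, here is a small sketch that pulls the transfer rate reported for each da device out of dmesg-style text and says whether they differ. It assumes the line format shown in the quoted output above; `transfer_rates` and `mismatched` are names invented for this example. Note that these figures are the bus rates reported by the driver, which says nothing directly about how fast each disk can sustain reads or writes.

```python
import re

# Matches lines such as "da1: 160.000MB/s transfers (80.000MHz, ...)".
RATE_LINE = re.compile(r"(da\d+): (\d+(?:\.\d+)?)MB/s transfers")


def transfer_rates(dmesg_text):
    """Return a dict mapping device name to reported rate in MB/s."""
    rates = {}
    for match in RATE_LINE.finditer(dmesg_text):
        rates[match.group(1)] = float(match.group(2))
    return rates


def mismatched(rates):
    """True if the devices report more than one distinct rate."""
    return len(set(rates.values())) > 1


# Sample input copied from the dmesg lines quoted in this message.
SAMPLE = """\
da1: 160.000MB/s transfers (80.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da2: 80.000MB/s transfers (40.000MHz, offset 30, 16bit), Tagged Queueing Enabled
"""

if __name__ == "__main__":
    rates = transfer_rates(SAMPLE)
    for disk, rate in sorted(rates.items()):
        print("%s reports %.0f MB/s" % (disk, rate))
    if mismatched(rates):
        print("disks report different bus speeds")
```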