Page 2 of 2
#177969 - 2007-07-16 09:21 AM Re: Translation, please [Re: Glenn Barnas]
Björn
Korg Regular
*****

Registered: 2005-12-07
Posts: 953
Loc: Stockholm, Sweden.
@Glenn, that was something neat, I must say - something I really hadn't thought about :)
_________________________
as long as it works - why fix it?
If it doesn't work - kix-it!

#177982 - 2007-07-16 04:56 PM Re: Translation, please [Re: Björn]
Richard H. Administrator
Administrator
*****

Registered: 2000-01-24
Posts: 4946
Loc: Leatherhead, Surrey, UK
These are SAS drives connected directly to a RAID controller on each blade, so a chassis backplane issue is unlikely.

There is a known issue (though not with this drive model) where high throughput can break a mirror, presumably due to timing issues. However, these drives fail when the blades are near idle.

I've just received a drive firmware update. It had to be transferred via a third-party file hosting site, as this major player could neither email the 9MB file nor host it on FTP!?

We'll see if it helps...

#178026 - 2007-07-17 11:51 PM Re: Translation, please [Re: Richard H.]
StarwarsKid
Seasoned Scripter
*****

Registered: 2005-06-15
Posts: 506
Loc: Oregon, USA
 Originally Posted By: Richard H.
It's 6 blades, each of which has two disks in a RAID 1 configuration.

On every blade, one of the disks is marked bad. If I reseat the disk it rebuilds and is good for a short while, and then one of the pair is marked bad again, though not necessarily the same one that was originally bad.

The servers and disks are very new, and all disks failed at around the same time.

So the tech support guy is sending me 12 new disks to replace *all* the disks in the blades. This means I will have to swap out the currently failed drive, wait for the RAID to repair then swap out the other drive.

Sounds like a pointless exercise to me, but then, having already been instructed to apply a firmware upgrade that isn't supported on these disks, and having watched the program completely fail to do any useful update, nothing surprises me.


I had a similar issue with a DL360 server. It turned out to be the SCSI backplane. There are only a few capacitors and an IC on that little backplane, but somehow it got messed up, and a replacement fixed the erroneous RAID disk failures.
_________________________
let the wise listen and add to their learning,
and let the discerning get guidance- Proverbs 1:5

#178721 - 2007-08-06 09:38 AM Re: Translation, please [Re: StarwarsKid]
Richard H. Administrator
Administrator
*****

Registered: 2000-01-24
Posts: 4946
Loc: Leatherhead, Surrey, UK
For those of you following this thread, we've got a solution.

It turns out that the problem is Oracle. The Oracle agent buggers up the mirror on the blades, and one of the disks gets marked bad. We've patched the agent, and the disks have been stable through the weekend.

I can only assume that the agent is doing some sort of low level SCSI ioctl or SMART query or something that interrupts the mirror long enough for the RAID card to see it as an error.

I take my hat off to the 3rd line support techs; it's not something that I had considered.

#178722 - 2007-08-06 10:33 AM Re: Translation, please [Re: Richard H.]
Arend_ Moderator
MM club member
*****

Registered: 2005-01-17
Posts: 1896
Loc: Hilversum, The Netherlands
Indeed an odd thing to happen. Good to know though; thanks for posting the solution.
#178733 - 2007-08-06 06:43 PM Re: Translation, please [Re: Richard H.]
NTDOC Administrator
Administrator
*****

Registered: 2000-07-28
Posts: 11629
Loc: CA
Are the drives concatenated as Dynamic disks on Server 2003? If so, I've seen this with MS SQL 2000 when it tries to re-index a large database.

I want to blame Windows, but I'm not sure whether it's Windows or the array controller card that raises the error because it can't keep up. It then takes the drives offline.

#178787 - 2007-08-07 10:41 AM Re: Translation, please [Re: NTDOC]
Richard H. Administrator
Administrator
*****

Registered: 2000-01-24
Posts: 4946
Loc: Leatherhead, Surrey, UK
 Originally Posted By: NTDOC
Are the drives concatenated as Dynamic disks on Server 2003


Nope. OS is RHEL 4, and the disks are RAIDed on a dedicated hardware card.

The Oracle agent which is causing the problem is not part of the core database - it is more of a monitoring / management service.

The Oracle patch for the agent is specifically targeted at this fault on these blades (Dell 1955), and appears to have resolved the issue.
