SATA PT2

Phil Thayer phil.thayer at vitalsite.com
Wed Jun 6 09:47:02 CDT 2007


This is an interesting idea.  You would still have the vulnerability of
a stripeset with "A" and "B", where a failure of a single disk would
cause the loss of data from two stripesets.  Not sure I would want
that.  If you're concerned about utilizing all the drive space on disk
drives of different sizes, you may be SOL unless you use a SAN disk
virtualization methodology.
 
As for figuring out the second parity calculation on RAID 6, what the
manufacturers are realizing is that they don't necessarily need a
different parity algorithm to calculate the second parity.  Simply
putting the same XOR parity data on two separate disks will provide the
same RAID 6 functionality as a second parity calculation, with lower
overhead on the controllers.  The old KISS methodology is coming back
into play.  I think you will see more and more of the manufacturers
going this route.
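
Here's the XOR half of it in a nutshell (a toy Python sketch,
illustrative only, not any controller's actual firmware): the parity
block is just the XOR of the data blocks, so any one missing block is
the XOR of whatever survives.

# Toy single-parity XOR, assuming equal-length byte blocks.
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)

data = [b"AAAA", b"BBBB", b"CCCC"]  # three data blocks in a stripe
parity = xor_blocks(data)           # the parity block, stored twice

# Lose any one data block; XOR of the survivors plus parity rebuilds it.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]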
 
Phil


________________________________

	Alternatively, make it two separate 4-drive RAID 5 arrays.
	
	To be brutally honest, I tried to read the technical description
of how the second parity is calculated, but I can't make sense of it. 
	
	In our 8-drive scenario, we have a stripe of 6 data blocks, a
parity block which is the XOR of the six data blocks, and some funkified
thing I can't wrap my brain around that somehow allows us to reconstruct
any two failed data blocks, so long as we have the other four plus the
two parity blocks.  And I just don't get how that can work.  For the
sake of argument, I'll assume it can somehow; it's got to be a hell of a
lot more complicated than XOR. 
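	
	Best I can gather from what I did manage to read, the second
parity is something like the sketch below: P is the plain XOR, and Q
weights each data block by a distinct power of 2 in the finite field
GF(2^8) (the same field the Linux md RAID-6 code uses).  Two parities
give two independent equations, so two lost data blocks can be solved
for.  The Python is purely illustrative; the names and the one-byte
"blocks" are my own toy, not anybody's firmware.
	
	# Toy GF(2^8) arithmetic (reducing polynomial 0x11d).
	def gf_mul(a, b):
	    p = 0
	    for _ in range(8):
	        if b & 1:
	            p ^= a
	        carry = a & 0x80
	        a = (a << 1) & 0xff
	        if carry:
	            a ^= 0x1d
	        b >>= 1
	    return p
	
	def gf_pow(a, n):
	    r = 1
	    for _ in range(n):
	        r = gf_mul(r, a)
	    return r
	
	def gf_inv(a):
	    return gf_pow(a, 254)  # a^254 == a^-1 in GF(2^8)
	
	# Six one-byte data "blocks"; a real stripe does this byte by byte.
	d = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66]
	P, Q = 0, 0
	for i, di in enumerate(d):
	    P ^= di                        # plain XOR parity
	    Q ^= gf_mul(gf_pow(2, i), di)  # block weighted by 2^i
	
	# Say disks 1 and 4 die.  XOR the survivors out of P and Q, leaving
	# two equations in the two unknowns d[1] and d[4].
	x, y = 1, 4
	A, B = P, Q
	for i, di in enumerate(d):
	    if i not in (x, y):
	        A ^= di
	        B ^= gf_mul(gf_pow(2, i), di)
	# Now A == d[x]^d[y] and B == 2^x*d[x] ^ 2^y*d[y]; solve for d[y].
	gx, gy = gf_pow(2, x), gf_pow(2, y)
	dy = gf_mul(B ^ gf_mul(gx, A), gf_inv(gx ^ gy))
	dx = A ^ dy
	assert (dx, dy) == (d[x], d[y])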
	
	I might even recommend the 8 drives be set up as a 7-drive RAID
5 with the 8th drive in the bullpen waiting to be thrown in as a hot
spare. When a drive fails, the relief drive would be brought up to speed
with the others automagically, and you'd get a text message on your cell
phone telling you that a drive had failed in the array. 
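	
	The "text message" part is easy enough to sketch.  Something
like the toy watcher below could poll /proc/mdstat and mail an SMS
gateway when an array goes degraded (mdadm's own monitor mode does this
for real; the addresses here are placeholders):
	
	import re, smtplib, time
	from email.message import EmailMessage
	
	def array_degraded():
	    """True if any md array reports a missing member ('_' in [UU_])."""
	    with open("/proc/mdstat") as f:
	        return re.search(r"\[U*_[U_]*\]", f.read()) is not None
	
	while True:
	    if array_degraded():
	        msg = EmailMessage()
	        msg["From"] = "raidbox@example.com"       # placeholder
	        msg["To"] = "5551234567@sms.example.com"  # carrier gateway
	        msg["Subject"] = "RAID array degraded"
	        msg.set_content("A drive failed; the hot spare should be "
	                        "rebuilding.")
	        smtplib.SMTP("localhost").send_message(msg)
	        break
	    time.sleep(60)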
	
	[Tangent alert]
	
	Looking at the Drobo (http://www.drobo.com/) has me thinking
about how I'd design a software RAID solution to automatically make the
best use of the space, regardless of how many and how large the drives
are.

	
	Let's say we have a cage that can handle up to 8 hot-pluggable
drives.  A single drive is trivial; it's just a disk.  With two or more
drives, the system would allocate all the space on the smaller drives to
mirror the same amount of space on the larger ones in a RAID-1
configuration: 
	
	D1 AAAAAAA
	D2 AAAAAAABB
	D3 AAAAAAABB
	...
	DN AAAAAAABBZZZZZ
	
	The areas marked A would be mirrored across all drives.  Reads
could be done in parallel, fetching different sectors from each drive to
get the benefit of striping.  The areas marked B would be mirrored
across the drives on which they exist, and Z would not be mirrored at
all, so it would not be available except to applications that have
specifically asked for the unmirrored filesystem (the box would export
several filesystems to the network, each with a different redundancy
priority; you could use non-redundant space for work files that could be
reconstructed if a drive failed). 
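	
	Sketched in Python, the slicing logic might look something like
this (a toy; the names are made up, and I'm not claiming it's how the
Drobo actually does it):
	
	# Hypothetical zone planner for a cage of mixed-size drives.
	def plan_zones(sizes_gb):
	    """Cut the drives at each distinct size; a slice shared by two
	    or more drives is mirrored across them (A/B), a slice present
	    on only one drive is the unmirrored Z area."""
	    zones, prev = [], 0
	    for cut in sorted(set(sizes_gb)):
	        members = [i for i, s in enumerate(sizes_gb) if s >= cut]
	        zones.append((prev, cut, members, len(members) >= 2))
	        prev = cut
	    return zones
	
	for start, end, members, safe in plan_zones([100, 120, 120, 200]):
	    kind = "mirrored" if safe else "unmirrored (Z)"
	    print("%d-%d GB on drives %s: %s" % (start, end, members, kind))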
	
	Now, once all of the A and B areas are nearly full, the system
starts converting them, a bit at a time, into RAID 5:
	
	D1 aAAAAAA
	D2 aAAAAAABB
	D3 aAAAAAABB
	...
	DN aAAAAAABBZZZZZ
	
	D1 aaAAAAA
	D2 aaAAAAABB
	D3 aaAAAAABB
	...
	DN aaAAAAABBZZZZZ
	. . .
	D1 aaaaaaa
	D2 aaaaaaabb
	D3 aaaaaaabb
	...
	DN aaaaaaabbZZZZZ
	
	At this point, we've got all the drives doing as much RAID-5 as
they can, then another chunk of RAID-5 that's missing the smallest
drive, etc., until we get to the largest drive that still has that
damned unmirrored section.  This is as big as it can get without losing
redundancy.  We might allow the unmirrored area to grow downward and
take up some of the space toward the right, since that's the area where
parity is the highest percentage of data. 
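	
	The same slice-by-slice view gives a quick capacity calculator
for the converted layout (again just a sketch, assuming RAID-5 across
any slice with three or more members and plain mirroring across pairs):
	
	def usable_capacity(sizes_gb):
	    """Tally redundant vs. unprotected space after conversion."""
	    safe, unsafe, prev = 0, 0, 0
	    for cut in sorted(set(sizes_gb)):
	        k = sum(1 for s in sizes_gb if s >= cut)
	        slice_gb = cut - prev
	        if k >= 3:
	            safe += slice_gb * (k - 1)  # RAID-5: one slice of parity
	        elif k == 2:
	            safe += slice_gb            # mirrored pair
	        else:
	            unsafe += slice_gb          # the leftover Z area
	        prev = cut
	    return safe, unsafe
	
	# Drives of 100, 120, 120 and 200 GB:
	#   0-100 GB across four drives as RAID-5 -> 300 GB usable
	#   100-120 GB across three as RAID-5     -> 40 GB usable
	#   120-200 GB on the big drive only      -> 80 GB unprotected
	print(usable_capacity([100, 120, 120, 200]))  # (340, 80)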
	
	But somewhere along the line, we throw another drive into the
cage; the system would automatically mirror the RAID-1 zone onto it, and
the RAID-5 would be reconfigured to use the new drive, expanding the
available space in the process.  
	
	To do this right would probably involve rewriting LVM code, but
it ought to be possible in userspace as a proof of concept.
	
