<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
<TITLE>Software-RAID HOWTO: High Availability RAID</TITLE>
<LINK HREF="Software-RAID-0.4x-HOWTO-10.html" REL=next>
<LINK HREF="Software-RAID-0.4x-HOWTO-8.html" REL=previous>
<LINK HREF="Software-RAID-0.4x-HOWTO.html#toc9" REL=contents>
</HEAD>
<BODY>
<A HREF="Software-RAID-0.4x-HOWTO-10.html">Next</A>
<A HREF="Software-RAID-0.4x-HOWTO-8.html">Previous</A>
<A HREF="Software-RAID-0.4x-HOWTO.html#toc9">Contents</A>
<HR>
<H2><A NAME="s9">9. High Availability RAID</A></H2>

<P>
<OL>
<LI><B>Q</B>:
RAID can help protect me against data loss. But how can I also
ensure that the system is up as long as possible, and not prone
to breakdown? Ideally, I want a system that is up 24 hours a
day, 7 days a week, 365 days a year.

<BLOCKQUOTE>
<B>A</B>:
High availability is difficult and expensive. The harder
you try to make a system fault tolerant, the harder
and more expensive it gets. The following hints, tips,
ideas and unsubstantiated rumors may help you with this
quest.
<UL>
<LI>IDE disks can fail in such a way that the failed disk
on an IDE ribbon also prevents the good disk on the
same ribbon from responding, making it look as
if two disks have failed. Since RAID does not
protect against two-disk failures, either
put only one disk on each IDE cable or, if there are two
disks, make sure they belong to different RAID sets.</LI>
<LI>SCSI disks can fail in such a way that the failed disk
on a SCSI chain can prevent any device on the chain
from being accessed. The failure mode involves a
short of the common (shared) device ready pin;
since this pin is shared, no arbitration can occur
until the short is removed. Thus, no two disks on the
same SCSI chain should belong to the same RAID array.</LI>
<LI>Similar remarks apply to the disk controllers.
Don't load up the channels on one controller; use
multiple controllers.</LI>
<LI>Don't use the same brand or model number for all of
the disks. It is not uncommon for severe electrical
storms to take out two or more disks. (Yes, we
all use surge suppressors, but these are not perfect
either.) Heat and poor ventilation of the disk
enclosure are other disk killers. Cheap disks
often run hot.
Using different brands of disk and controller
decreases the likelihood that whatever took out one disk
(heat, physical shock, vibration, electrical surge)
will also damage the others at the same time.</LI>
<LI>To guard against controller or CPU failure,
it should be possible to build a SCSI disk enclosure
that is "twin-tailed", i.e. connected to two
computers. One computer mounts the file systems
read-write, while the second computer mounts them
read-only and acts as a hot spare. When the hot spare
is able to determine that the master has failed (e.g.
through a watchdog), it cuts the power to the
master (to make sure that it's really off), and then
runs fsck and remounts read-write. If anyone gets
this working, let me know.</LI>
<LI>Always use a UPS, and perform clean shutdowns.
Although an unclean shutdown may not damage the disks,
running ckraid on even smallish arrays is painfully
slow. You want to avoid running ckraid as much as
possible. Or you can hack on the kernel and get the
hot-reconstruction code debugged ...</LI>
<LI>SCSI cables are well known to be very temperamental
creatures, prone to causing all sorts of problems.
Use the highest-quality cabling that you can find for
sale. Use e.g. bubble wrap to make sure that ribbon
cables do not get too close to one another and
cross-talk. Rigorously observe cable-length
restrictions.</LI>
<LI>Take a look at SSA (Serial Storage Architecture).
Although it is rather expensive, it is rumored
to be less prone to the failure modes that SCSI
exhibits.</LI>
<LI>Enjoy yourself, it's later than you think.</LI>
</UL>
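<P>The placement rules for IDE cables and SCSI chains above can be checked mechanically. The following is a minimal Python sketch, not part of the RAID tools: the <CODE>LAYOUT</CODE> table, its disk, bus, and array names, and the <CODE>shared_bus_conflicts</CODE> helper are all hypothetical, and the inventory has to be filled in by hand for a real machine.</P>

```python
from collections import defaultdict

# Hypothetical inventory: disk name -> (bus it is cabled to, RAID array it belongs to).
# The device, bus, and array names below are made up for illustration.
LAYOUT = {
    "sda": ("scsi0", "md0"),
    "sdb": ("scsi1", "md0"),
    "sdc": ("scsi0", "md1"),
    "sdd": ("scsi0", "md1"),  # shares a chain with sdc in the same array
}


def shared_bus_conflicts(layout):
    """Return (array, bus, disks) for every bus that carries two or more disks of one array."""
    groups = defaultdict(list)
    for disk, (bus, array) in layout.items():
        groups[(array, bus)].append(disk)
    return sorted(
        (array, bus, sorted(disks))
        for (array, bus), disks in groups.items()
        if len(disks) >= 2
    )


if __name__ == "__main__":
    for array, bus, disks in shared_bus_conflicts(LAYOUT):
        print(f"{array}: {', '.join(disks)} share {bus}")
```

<P>With the sample table, the script reports that sdc and sdd sit on the same chain within md1; moving one of them to another chain (or to another array) clears the report.</P>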
</BLOCKQUOTE>
</LI>
</OL>
<HR>
<A HREF="Software-RAID-0.4x-HOWTO-10.html">Next</A>
<A HREF="Software-RAID-0.4x-HOWTO-8.html">Previous</A>
<A HREF="Software-RAID-0.4x-HOWTO.html#toc9">Contents</A>
</BODY>
</HTML>