<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
<TITLE>Software-RAID HOWTO: Error Recovery</TITLE>
<LINK HREF="Software-RAID-0.4x-HOWTO-5.html" REL=next>
<LINK HREF="Software-RAID-0.4x-HOWTO-3.html" REL=previous>
<LINK HREF="Software-RAID-0.4x-HOWTO.html#toc4" REL=contents>
</HEAD>
<BODY>
<A HREF="Software-RAID-0.4x-HOWTO-5.html">Next</A>
<A HREF="Software-RAID-0.4x-HOWTO-3.html">Previous</A>
<A HREF="Software-RAID-0.4x-HOWTO.html#toc4">Contents</A>
<HR>
<H2><A NAME="s4">4. Error Recovery</A></H2>

<P>
<OL>
<LI><B>Q</B>:
I have a RAID-1 (mirroring) setup, and lost power
while there was disk activity.  Now what do I do?

<BLOCKQUOTE>
<B>A</B>:
The redundancy of RAID levels is designed to protect against a
<B>disk</B> failure, not against a <B>power</B> failure.

There are several ways to recover from this situation.

<UL>
<LI>Method (1): Use the raid tools.  These can be used to sync
the raid arrays.  They do not fix file-system damage; after
the raid arrays are sync'ed, the file-system still has
to be fixed with fsck.  Raid arrays can be checked with
<CODE>ckraid /etc/raid1.conf</CODE> (for RAID-1; use
<CODE>/etc/raid5.conf</CODE> for RAID-5, and so on).

Calling <CODE>ckraid /etc/raid1.conf --fix</CODE> will pick one of the
disks in the array (usually the first), use it as the
master copy, and copy its blocks to the others in the mirror.

To designate which of the disks should be used as the master,
you can use the <CODE>--force-source</CODE> flag: for example,
<CODE>ckraid /etc/raid1.conf --fix --force-source /dev/hdc3</CODE>

The ckraid command can be safely run without the <CODE>--fix</CODE>
option to verify the inactive RAID array without making any changes.
When you are comfortable with the proposed changes, supply
the <CODE>--fix</CODE> option.
</LI>
<LI>Method (2): Paranoid, time-consuming, and not much better than the
first way.  Let's assume a two-disk RAID-1 array, consisting of
partitions <CODE>/dev/hda3</CODE> and <CODE>/dev/hdc3</CODE>.  You can
try the following:
<OL>
<LI><CODE>fsck /dev/hda3</CODE></LI>
<LI><CODE>fsck /dev/hdc3</CODE></LI>
<LI>decide which of the two partitions had fewer errors,
was more easily recovered, or recovered the data
that you wanted.  Pick one, either one, to be your new
``master'' copy.  Say you picked <CODE>/dev/hdc3</CODE>.</LI>
<LI><CODE>dd if=/dev/hdc3 of=/dev/hda3</CODE></LI>
<LI><CODE>mkraid raid1.conf -f --only-superblock</CODE></LI>
</OL>

Instead of the last two steps, you can instead run
<CODE>ckraid /etc/raid1.conf --fix --force-source /dev/hdc3</CODE>,
which should be a bit faster.
</LI>
<LI>Method (3): Lazy man's version of the above.  If you don't want to
wait for long fsck runs to complete, it is perfectly fine to skip
the first three steps above and move directly to the last two.
Just be sure to run <CODE>fsck /dev/md0</CODE> after you are done.
Method (3) is actually just method (1) in disguise.</LI>
</UL>

In any case, the above steps will only sync up the raid arrays.
The file system probably needs fixing as well: for this,
fsck needs to be run on the active, unmounted md device.

<P>With a three-disk RAID-1 array, there are more possibilities,
such as using two disks to ``vote'' a majority answer.  Tools
to automate this do not currently (September 97) exist.
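
<P>Such a voting tool might work roughly as follows: read the same
block from all three mirrors and keep, for each byte, the value that
at least two copies agree on.  A minimal sketch (the function name
and interface are hypothetical, not part of any existing tool):

```python
def majority_block(copies):
    """Byte-wise majority vote across three mirror copies of one block."""
    assert len(copies) == 3 and len(set(map(len, copies))) == 1
    out = bytearray()
    for a, b, c in zip(*copies):
        if a == b or a == c:
            out.append(a)        # at least two copies agree with 'a'
        else:
            out.append(b)        # either b == c, or no majority (arbitrary pick)
    return bytes(out)
```

<P>Bytes where all three copies differ have no majority and would
have to be treated as suspect by a real tool.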
</BLOCKQUOTE>

</LI>
<LI><B>Q</B>:
I have a RAID-4 or a RAID-5 (parity) setup, and lost power while
there was disk activity.  Now what do I do?

<BLOCKQUOTE>
<B>A</B>:
The redundancy of RAID levels is designed to protect against a
<B>disk</B> failure, not against a <B>power</B> failure.

Since the disks in a RAID-4 or RAID-5 array do not contain a file
system that fsck can read, there are fewer repair options.  You
cannot use fsck to do preliminary checking and/or repair; you must
use ckraid first.

<P>The <CODE>ckraid</CODE> command can be safely run without the
<CODE>--fix</CODE> option
to verify the inactive RAID array without making any changes.
When you are comfortable with the proposed changes, supply
the <CODE>--fix</CODE> option.
<P>
<P>If you wish, you can try designating one of the disks as a ``failed
disk''.  Do this with the <CODE>--suggest-failed-disk-mask</CODE> flag.
<P>Only one bit should be set in the flag: RAID-5 cannot recover two
failed disks.
The mask is a binary bit mask; thus:
<PRE>
        0x1 == first disk
        0x2 == second disk
        0x4 == third disk
        0x8 == fourth disk, etc.
</PRE>
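
<P>Since exactly one bit must be set, a power-of-two check catches an
invalid mask before any repair is attempted.  A hypothetical helper
(not part of the raid tools) showing the arithmetic:

```python
def failed_disk_index(mask):
    """Map a --suggest-failed-disk-mask value to a 0-based disk index,
    rejecting masks with zero bits or more than one bit set."""
    if mask <= 0 or mask & (mask - 1):
        raise ValueError("exactly one bit must be set in the mask")
    return mask.bit_length() - 1

# failed_disk_index(0x4) -> 2 (the third disk)
```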

<P>Alternately, you can choose to modify the parity sectors, by using
the <CODE>--suggest-fix-parity</CODE> flag.  This will recompute the
parity from the other sectors.
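
<P>The parity being recomputed is just the byte-wise XOR of the data
sectors in each stripe.  A simplified model of that computation (the
real work happens inside <CODE>ckraid</CODE>):

```python
from functools import reduce

def recompute_parity(data_blocks):
    """Byte-wise XOR of the data blocks in one stripe (RAID-4/5 parity)."""
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*data_blocks))

# Two data blocks and their parity:
# recompute_parity([b"\x0f\x00", b"\xf0\xff"]) == b"\xff\xff"
```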
<P>
<P>The flags <CODE>--suggest-failed-disk-mask</CODE> and
<CODE>--suggest-fix-parity</CODE>
can be safely used for verification.  No changes are made if the
<CODE>--fix</CODE> flag is not specified.  Thus, you can experiment with
different possible repair schemes.
<P>
</BLOCKQUOTE>

</LI>
<LI><B>Q</B>:
My RAID-1 device, <CODE>/dev/md0</CODE>, consists of two hard drive
partitions: <CODE>/dev/hda3</CODE> and <CODE>/dev/hdc3</CODE>.
Recently, the disk with <CODE>/dev/hdc3</CODE> failed,
and was replaced with a new disk.  My best friend,
who doesn't understand RAID, said that the correct thing to do now
is to ``<CODE>dd if=/dev/hda3 of=/dev/hdc3</CODE>''.
I tried this, but things still don't work.

<BLOCKQUOTE>
<B>A</B>:
You should keep your best friend away from your computer.
Fortunately, no serious damage has been done.
You can recover from this by running:
<BLOCKQUOTE><CODE>
<PRE>
mkraid raid1.conf -f --only-superblock
</PRE>
</CODE></BLOCKQUOTE>

By using <CODE>dd</CODE>, two identical copies of the partition
were created.  This is almost correct, except that the RAID-1
kernel extension expects the RAID superblocks to be different.
Thus, when you try to reactivate RAID, the software will notice
the problem, and deactivate one of the two partitions.
By re-creating the superblock, you should have a fully usable
system.
</BLOCKQUOTE>

</LI>
<LI><B>Q</B>:
My version of <CODE>mkraid</CODE> doesn't have a
<CODE>--only-superblock</CODE> flag.  What do I do?
<BLOCKQUOTE>
<B>A</B>:
The newer tools drop support for this flag, replacing it with
the <CODE>--force-resync</CODE> flag.  It has been reported
that the following sequence appears to work with the latest tools
and software:
<BLOCKQUOTE><CODE>
<PRE>
umount /web            (or wherever /dev/md0 is mounted)
raidstop /dev/md0
mkraid /dev/md0 --force-resync --really-force
raidstart /dev/md0
</PRE>
</CODE></BLOCKQUOTE>

After doing this, a <CODE>cat /proc/mdstat</CODE> should report
<CODE>resync in progress</CODE>, and one should be able to
<CODE>mount /dev/md0</CODE> at this point.
</BLOCKQUOTE>

</LI>
<LI><B>Q</B>:
My RAID-1 device, <CODE>/dev/md0</CODE>, consists of two hard drive
partitions: <CODE>/dev/hda3</CODE> and <CODE>/dev/hdc3</CODE>.
My best (girl?)friend, who doesn't understand RAID,
ran <CODE>fsck</CODE> on <CODE>/dev/hda3</CODE> while I wasn't looking,
and now the RAID won't work.  What should I do?

<BLOCKQUOTE>
<B>A</B>:
You should re-examine your concept of ``best friend''.
In general, <CODE>fsck</CODE> should never be run on the individual
partitions that compose a RAID array.
Assuming that neither of the partitions was heavily damaged,
no data loss has occurred, and the RAID-1 device can be recovered
as follows:
<OL>
<LI>make a backup of the file system on <CODE>/dev/hda3</CODE></LI>
<LI><CODE>dd if=/dev/hda3 of=/dev/hdc3</CODE></LI>
<LI><CODE>mkraid raid1.conf -f --only-superblock</CODE></LI>
</OL>

This should leave you with a working disk mirror.
</BLOCKQUOTE>

</LI>
<LI><B>Q</B>:
Why does the above work as a recovery procedure?
<BLOCKQUOTE>
<B>A</B>:
Because each of the component partitions in a RAID-1 mirror
is a perfectly valid copy of the file system.  In a pinch,
mirroring can be disabled, and one of the partitions
can be mounted and safely run as an ordinary, non-RAID
file system.  When you are ready to restart using RAID-1,
then unmount the partition, and follow the above
instructions to restore the mirror.  Note that the above
works ONLY for RAID-1, and not for any of the other levels.

<P>It may make you feel more comfortable to reverse the direction
of the copy above: copy <B>from</B> the disk that was untouched
<B>to</B> the one that was touched.  Just be sure to fsck the final md.
</BLOCKQUOTE>

</LI>
<LI><B>Q</B>:
I am confused by the above questions, but am not yet bailing out.
Is it safe to run <CODE>fsck /dev/md0</CODE>?

<BLOCKQUOTE>
<B>A</B>:
Yes, it is safe to run <CODE>fsck</CODE> on the <CODE>md</CODE> devices.
In fact, this is the <B>only</B> safe place to run <CODE>fsck</CODE>.
</BLOCKQUOTE>

</LI>
<LI><B>Q</B>:
If a disk is slowly failing, will it be obvious which one it is?
I am concerned that it won't be, and this confusion could lead to
some dangerous decisions by a sysadmin.

<BLOCKQUOTE>
<B>A</B>:
Once a disk fails, an error code will be returned from
the low level driver to the RAID driver.
The RAID driver will mark it as ``bad'' in the RAID superblocks
of the ``good'' disks (so we will later know which mirrors are
good and which aren't), and continue RAID operation
on the remaining operational mirrors.

<P>This, of course, assumes that the disk and the low level driver
can detect a read/write error, and will not, for example, silently
corrupt data.  This is true of current drives
(error detection schemes are being used internally),
and is the basis of RAID operation.
</BLOCKQUOTE>

</LI>
<LI><B>Q</B>:
What about hot-repair?

<BLOCKQUOTE>
<B>A</B>:
Work is underway to complete ``hot reconstruction''.
With this feature, one can add several ``spare'' disks to
the RAID set (be it level 1 or 4/5), and once a disk fails,
it will be reconstructed on one of the spare disks at run time,
without ever needing to shut down the array.

<P>However, to use this feature, the spare disk must have
been declared at boot time, or it must be hot-added,
which requires the use of special cabinets and connectors
that allow a disk to be added while the electrical power is
on.
<P>
<P>As of October 97, there is a beta version of MD that
allows:
<UL>
<LI>RAID 1 and 5 reconstruction on spare drives</LI>
<LI>RAID-5 parity reconstruction after an unclean
shutdown</LI>
<LI>spare disks to be hot-added to an already running
RAID 1 or 4/5 array</LI>
</UL>

Automatic reconstruction is currently (Dec 97)
disabled by default, due to the preliminary nature of this
work.  It can be enabled by changing the value of
<CODE>SUPPORT_RECONSTRUCTION</CODE> in
<CODE>include/linux/md.h</CODE>.
<P>
<P>If spare drives were configured into the array when it
was created and kernel-based reconstruction is enabled,
the spare drive will already contain a RAID superblock
(written by <CODE>mkraid</CODE>), and the kernel will
reconstruct its contents automatically (without needing
the usual <CODE>mdstop</CODE>, replace drive, <CODE>ckraid</CODE>,
<CODE>mdrun</CODE> steps).
<P>
<P>If you are not running automatic reconstruction, and have
not configured a hot-spare disk, the procedure described by
Gadi Oxman
&lt;<A HREF="mailto:gadio@netvision.net.il">gadio@netvision.net.il</A>&gt;
is recommended:
<UL>
<LI>Currently, once the first disk is removed, the RAID set will be
running in degraded mode.  To restore full operation mode,
you need to:
<UL>
<LI>stop the array (<CODE>mdstop /dev/md0</CODE>)</LI>
<LI>replace the failed drive</LI>
<LI>run <CODE>ckraid raid.conf</CODE> to reconstruct its contents</LI>
<LI>run the array again (<CODE>mdadd</CODE>, <CODE>mdrun</CODE>).</LI>
</UL>

At this point, the array will be running with all the drives,
and again protects against a failure of a single drive.</LI>
</UL>
<P>Currently, it is not possible to assign a single hot-spare disk
to several arrays.  Each array requires its own hot-spare.
</BLOCKQUOTE>

</LI>
<LI><B>Q</B>:
I would like to have an audible alarm for
``you schmuck, one disk in the mirror is down'',
so that the novice sysadmin knows that there is a problem.

<BLOCKQUOTE>
<B>A</B>:
The kernel logs the event with a
``<CODE>KERN_ALERT</CODE>'' priority in syslog.
There are several software packages that will monitor the
syslog files, and beep the PC speaker, call a pager, send e-mail,
etc. automatically.
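
<P>A tiny sketch of what such a monitor does, scanning syslog lines
for MD alerts.  The message pattern here is an assumption; the exact
text varies with the kernel version, so adjust it to match what your
kernel actually logs:

```python
import re

# Assumed shape of the kernel's disk-failure messages; not the
# literal text of any particular kernel version.
MD_ALERT = re.compile(r"md\d*.*(disk failure|faulty)", re.IGNORECASE)

def md_alerts(syslog_lines):
    """Return the lines that look like MD disk-failure alerts."""
    return [line for line in syslog_lines if MD_ALERT.search(line)]
```

<P>A real monitor would tail the log file continuously and trigger
the beep, pager, or e-mail whenever this list is non-empty.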
</BLOCKQUOTE>

</LI>
<LI><B>Q</B>:
How do I run RAID-5 in degraded mode
(with one disk failed, and not yet replaced)?

<BLOCKQUOTE>
<B>A</B>:
Gadi Oxman
&lt;<A HREF="mailto:gadio@netvision.net.il">gadio@netvision.net.il</A>&gt;
writes:
Normally, to run a RAID-5 set of n drives you have to:
<BLOCKQUOTE><CODE>
<PRE>
mdadd /dev/md0 /dev/disk1 ... /dev/disk(n)
mdrun -p5 /dev/md0
</PRE>
</CODE></BLOCKQUOTE>

Even if one of the disks has failed,
you still have to <CODE>mdadd</CODE> it as you would in a normal setup.
(?? try using /dev/null in place of the failed disk ???
watch out)
Then, the array will be active in degraded mode with (n - 1) drives.
If ``<CODE>mdrun</CODE>'' fails, the kernel has noticed an error
(for example, several faulty drives, or an unclean shutdown).
Use ``<CODE>dmesg</CODE>'' to display the kernel error messages from
``<CODE>mdrun</CODE>''.
If the RAID-5 set is corrupted due to a power loss,
rather than a disk crash, one can try to recover by
creating a new RAID superblock:
<BLOCKQUOTE><CODE>
<PRE>
mkraid -f --only-superblock raid5.conf
</PRE>
</CODE></BLOCKQUOTE>

A RAID array doesn't provide protection against a power failure or
a kernel crash, and can't guarantee correct recovery.
Rebuilding the superblock will simply cause the system to ignore
the condition by marking all the drives as ``OK'',
as if nothing happened.
</BLOCKQUOTE>

</LI>
<LI><B>Q</B>:
How does RAID-5 work when a disk fails?

<BLOCKQUOTE>
<B>A</B>:
The typical operating scenario is as follows:
<UL>
<LI>A RAID-5 array is active.
</LI>
<LI>One drive fails while the array is active.
</LI>
<LI>The drive firmware and the low-level Linux disk/controller
drivers detect the failure and report an error code to the
MD driver.
</LI>
<LI>The MD driver continues to provide an error-free
<CODE>/dev/md0</CODE>
device to the higher levels of the kernel (with a performance
degradation) by using the remaining operational drives.
</LI>
<LI>The sysadmin can <CODE>umount /dev/md0</CODE> and
<CODE>mdstop /dev/md0</CODE> as usual.
</LI>
<LI>If the failed drive is not replaced, the sysadmin can still
start the array in degraded mode as usual, by running
<CODE>mdadd</CODE> and <CODE>mdrun</CODE>.</LI>
</UL>
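
<P>The degraded-mode operation above rests on the RAID-5 parity
equation: a missing block equals the XOR of the parity block and the
surviving data blocks.  A simplified model, with made-up block
contents for illustration:

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

# A stripe of three data blocks and their parity:
data = [b"\x01\x02", b"\x10\x20", b"\x0a\x0b"]
parity = xor_blocks(data)

# If the second drive fails, its block is recovered from the rest:
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == data[1]
```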
</BLOCKQUOTE>

</LI>
<LI><B>Q</B>:
<BLOCKQUOTE>
<B>A</B>:
</BLOCKQUOTE>

</LI>
<LI><B>Q</B>:
Why is there no question 13?

<BLOCKQUOTE>
<B>A</B>:
If you are concerned about RAID, High Availability, and UPS,
then it's probably a good idea to be superstitious as well.
It can't hurt, can it?
</BLOCKQUOTE>

</LI>
<LI><B>Q</B>:
I just replaced a failed disk in a RAID-5 array.  After
rebuilding the array, <CODE>fsck</CODE> is reporting many, many
errors.  Is this normal?

<BLOCKQUOTE>
<B>A</B>:
No.  And, unless you ran fsck in ``verify only; do not update''
mode, it's quite possible that you have corrupted your data.
Unfortunately, a not-uncommon scenario is that of
accidentally changing the disk order in a RAID-5 array
after replacing a hard drive.  Although the RAID superblock
stores the proper order, not all tools use this information.
In particular, the current version of <CODE>ckraid</CODE>
will use the information specified with the <CODE>-f</CODE>
flag (typically, the file <CODE>/etc/raid5.conf</CODE>)
instead of the data in the superblock.  If the specified
order is incorrect, then the replaced disk will be
reconstructed incorrectly.  The symptom of this
kind of mistake seems to be heavy &amp; numerous <CODE>fsck</CODE>
errors.

<P>And, in case you are wondering, <B>yes</B>, someone lost
<B>all</B> of their data by making this mistake.  Making
a tape backup of <B>all</B> data before reconfiguring a
RAID array is <B>strongly recommended</B>.
</BLOCKQUOTE>

</LI>
<LI><B>Q</B>:
The QuickStart says that <CODE>mdstop</CODE> is just to make sure that the
disks are sync'ed.  Is this REALLY necessary?  Isn't unmounting the
file systems enough?

<BLOCKQUOTE>
<B>A</B>:
The command <CODE>mdstop /dev/md0</CODE> will:
<UL>
<LI>mark it ``clean''.  This allows us to detect unclean shutdowns, for
example due to a power failure or a kernel crash.
</LI>
<LI>sync the array.  This is less important after unmounting a
filesystem, but is important if <CODE>/dev/md0</CODE> is
accessed directly rather than through a filesystem (for
example, by <CODE>e2fsck</CODE>).</LI>
</UL>
</BLOCKQUOTE>

</LI>
</OL>

<HR>
<A HREF="Software-RAID-0.4x-HOWTO-5.html">Next</A>
<A HREF="Software-RAID-0.4x-HOWTO-3.html">Previous</A>
<A HREF="Software-RAID-0.4x-HOWTO.html#toc4">Contents</A>
</BODY>
</HTML>