From 28f03af05a577efc78f85128140dbbe5441643e8 Mon Sep 17 00:00:00 2001 From: binh <> Date: Mon, 13 Dec 2004 10:00:42 +0000 Subject: [PATCH] Editing of new "Linux-Networking" guide. This copy is not to be distributed. Its just a draft to give people an idea as to the format of the new document and a backup just in case my laptop dies. Removed 8021X.xml, IRC.xml, PSTN.xml, Quota.xml, and RAID.xml Binh. --- LDP/guide/docbook/Linux-Networking/8021X.xml | 1019 -- .../Linux-Networking/About-the-Author.xml | 1 + .../docbook/Linux-Networking/Glossary.xml | 18 +- LDP/guide/docbook/Linux-Networking/IRC.xml | 579 -- LDP/guide/docbook/Linux-Networking/PSTN.xml | 26 - LDP/guide/docbook/Linux-Networking/Quota.xml | 451 - LDP/guide/docbook/Linux-Networking/RAID.xml | 8329 ----------------- LDP/guide/docbook/Linux-Networking/TCP-IP.xml | 12 +- 8 files changed, 26 insertions(+), 10409 deletions(-) delete mode 100644 LDP/guide/docbook/Linux-Networking/8021X.xml delete mode 100644 LDP/guide/docbook/Linux-Networking/IRC.xml delete mode 100644 LDP/guide/docbook/Linux-Networking/PSTN.xml delete mode 100644 LDP/guide/docbook/Linux-Networking/Quota.xml delete mode 100644 LDP/guide/docbook/Linux-Networking/RAID.xml diff --git a/LDP/guide/docbook/Linux-Networking/8021X.xml b/LDP/guide/docbook/Linux-Networking/8021X.xml deleted file mode 100644 index 6075fab5..00000000 --- a/LDP/guide/docbook/Linux-Networking/8021X.xml +++ /dev/null @@ -1,1019 +0,0 @@ - - -8021X - - -This section describes the software and procedures to set up and use -802.1X: Port-Based Network Access Control using Xsupplicant with PEAP (PEAP/ -MS-CHAPv2) as authentication method and FreeRADIUS as back-end authentication -server. - - - -If another authentication mechanism than PEAP is preferred, e.g., EAP-TLS -or EAP-TTLS, only a small number of configuration options needs to be -changed. PEAP/MS-CHAPv2 are also supported by Windows XP SP1/Windows 2000 -SP3. - - -1.1. What is 802.1X? - -The 802.1X-2001 standard states: - - -"Port-based network access control makes use of the physical access -characteristics of IEEE 802 LAN infrastructures in order to provide a means -of authenticating and authorizing devices attached to a LAN port that has -point-to-point connection characteristics, and of preventing access to that -port in cases which the authentication and authorization fails. A port in -this context is a single point of attachment to the LAN infrastructure." --- -802.1X-2001, page 1. - - -[8021X-Overview] - -Figure 802.1X: A wireless node must be authenticated before it can gain -access to other LAN resources. - - 1. When a new wireless node (WN) requests access to a LAN resource, the - access point (AP) asks for the WN's identity. No other traffic than EAP - is allowed before the WN is authenticated (the "port" is closed). - - The wireless node that requests authentication is often called - Supplicant, although it is more correct to say that the wireless node - contains a Supplicant. The Supplicant is responsible for responding to - Authenticator data that will establish its credentials. The same goes for - the access point; the Authenticator is not the access point. Rather, the - access point contains an Authenticator. The Authenticator does not even - need to be in the access point; it can be an external component. - - EAP, which is the protocol used for authentication, was originally used - for dial-up PPP. 
The identity was the username, and either PAP or CHAP
    authentication [[http://www.ietf.org/rfc/rfc1994.txt] RFC1994] was used
    to check the user's password. Since the identity is sent in the clear
    (not encrypted), a malicious sniffer may learn the user's identity.
    "Identity hiding" is therefore used; the real identity is not sent
    before the encrypted TLS tunnel is up.

 2. After the identity has been sent, the authentication process begins.
    The protocol used between the Supplicant and the Authenticator is EAP,
    or, more correctly, EAP encapsulation over LAN (EAPOL). The Authenticator
    re-encapsulates the EAP messages to RADIUS format, and passes them to the
    Authentication Server.

    During authentication, the Authenticator just relays packets between
    the Supplicant and the Authentication Server. When the authentication
    process finishes, the Authentication Server sends a success message (or
    a failure message, if the authentication failed). The Authenticator then
    opens the "port" for the Supplicant.

 3. After a successful authentication, the Supplicant is granted access to
    other LAN resources and the Internet.


 See figure 802.1X for explanation.

 Why is it called "port"-based authentication? The Authenticator deals with
controlled and uncontrolled ports. Both the controlled and the uncontrolled
port are logical entities (virtual ports), but they use the same physical
connection to the LAN (the same point of attachment).

[8021X-Ports]

Figure port: The authorization state of the controlled port.

 Before authentication, only the uncontrolled port is "open". The only
traffic allowed is EAPOL; see Authenticator System 1 on figure port. After
the Supplicant has been authenticated, the controlled port is opened, and
access to other LAN resources is granted; see Authenticator System 2 on
figure port.

 802.1X plays a major role in the new IEEE wireless standard 802.11i.
-----------------------------------------------------------------------------

1.2. What is 802.11i?

1.2.1. WEP


Wired Equivalent Privacy (WEP), which is part of the original 802.11
standard, is supposed to provide confidentiality. Unfortunately, WEP is
poorly designed and easily cracked. There is no authentication mechanism,
only a weak form of access control (you must have the shared key to
communicate). Read more
[http://www.isaac.cs.berkeley.edu/isaac/wep-faq.html] here.


In response to WEP's broken security, IEEE has come up with a new wireless
security standard named 802.11i. 802.1X plays a major role in this new
standard.

-----------------------------------------------------------------------------

1.2.2. 802.11i


The new security standard, 802.11i, which was ratified in June 2004, fixes
all WEP weaknesses. It is divided into three main categories:


 1. Temporal Key Integrity Protocol (TKIP) is a short-term solution that
    fixes all WEP weaknesses. TKIP can be used with old 802.11 equipment
    (after a driver/firmware upgrade) and provides integrity and
    confidentiality.

 2. Counter Mode with CBC-MAC Protocol (CCMP) [[http://www.ietf.org/rfc/
    rfc3610.txt] RFC3610] is a new protocol, designed from the ground up. It
    uses AES [FIPS 197] as its cryptographic algorithm, and, since this is
    more CPU intensive than RC4 (used in WEP and TKIP), new 802.11 hardware
    may be required. Some drivers can implement CCMP in software. CCMP
    provides integrity and confidentiality.

 3. 802.1X Port-Based Network Access Control: Whether TKIP or CCMP is
    used, 802.1X provides the authentication.


In addition, an optional encryption method called "Wireless Robust
Authentication Protocol" (WRAP) may be used instead of CCMP. WRAP was the
original AES-based proposal for 802.11i, but was replaced by CCMP since it
became plagued by intellectual-property encumbrances. Support for WRAP is
optional, but CCMP support is mandatory in 802.11i.


 802.11i also has an extended key derivation/management regime, described
next.
-----------------------------------------------------------------------------

1.2.3. Key Management

1.2.3.1. Dynamic key exchange and management

 To enforce a security policy using encryption and integrity algorithms,
keys must be obtained. Fortunately, 802.11i implements a key derivation/
management regime. See figure KM.

[8021X-KeyManagement]

Figure KM: Key management and distribution in 802.11i.

 1. When the Supplicant (WN) and Authentication Server (AS) authenticate,
    one of the last messages sent from the AS, given that authentication was
    successful, is a Master Key (MK). After it has been sent, the MK is known
    only to the WN and the AS. The MK is bound to this session between the WN
    and the AS.

 2. Both the WN and the AS derive a new key, called the Pairwise Master Key
    (PMK), from the Master Key.

 3. The PMK is then moved from the AS to the Authenticator (AP). Only the
    WN and the AS can derive the PMK; otherwise the AP could make
    access-control decisions instead of the AS. The PMK is a fresh symmetric
    key bound to this session between the WN and the AP.

 4. The PMK and a 4-way handshake are used between the WN and the AP to
    derive, bind, and verify a Pairwise Transient Key (PTK). The PTK is a
    collection of operational keys:

    +  Key Confirmation Key (KCK), as the name implies, is used to prove
       possession of the PMK and to bind the PMK to the AP.

    +  Key Encryption Key (KEK) is used to distribute the Group Transient
       Key (GTK), described below.

    +  Temporal Key 1 & 2 (TK1/TK2) are used for encryption. Usage of TK1
       and TK2 is ciphersuite-specific.

    See figure PKH for an overview of the Pairwise Key Hierarchy.

 5. The KEK and a 4-way group handshake are then used to send the Group
    Transient Key (GTK) from the AP to the WN. The GTK is a shared key among
    all Supplicants connected to the same Authenticator, and is used to
    secure multicast/broadcast traffic.


[8021X-KeyHierarchy]

Figure PKH: Pairwise Key Hierarchy
-----------------------------------------------------------------------------

1.2.3.2. Pre-shared Key

 For small office / home office (SOHO), ad-hoc networks, or home usage, a
pre-shared key (PSK) may be used. When a PSK is used, the whole 802.1X
authentication process is elided. This has also been called "WPA Personal"
(WPA-PSK), whereas WPA using EAP (and RADIUS) is "WPA Enterprise" or just
"WPA".

 The 256-bit PSK is generated from a given password using PBKDF2 from
[[http://www.ietf.org/rfc/rfc2898.txt] RFC2898], and is used as the Master
Key (MK) described in the key management regime above. It can be one single
PSK for the whole network (insecure), or one PSK per Supplicant (more
secure).
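 This derivation is easy to reproduce from a shell. The wpa_passphrase
utility that ships with wpa_supplicant (the alternative client mentioned in
section 7) performs exactly this PBKDF2 computation, using the ESSID as the
salt. A minimal sketch, assuming the "testnet" ESSID used elsewhere in this
document; the hex placeholder stands in for the real output:

   # Derive the 256-bit PSK from an ESSID and a passphrase.
   # The hex value printed as "psk" is the PBKDF2 output (the MK).
   $ wpa_passphrase testnet "a long secret passphrase"
   network={
           ssid="testnet"
           #psk="a long secret passphrase"
           psk=<64 hexadecimal digits: the derived 256-bit PSK>
   }
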
-----------------------------------------------------------------------------

1.2.4. TSN (WPA) / RSN (WPA2)

 The industry didn't have time to wait until the 802.11i standard was
completed; it wanted the WEP issues fixed immediately. The
[http://www.wi-fi.org/] Wi-Fi Alliance felt the pressure, took a "snapshot"
of the standard (based on draft 3), and called it Wi-Fi Protected Access
(WPA). One requirement was that existing 802.11 equipment could be used with
WPA, so WPA is basically TKIP + 802.1X.

 WPA is not the long-term solution. To get a Robust Secure Network (RSN),
the hardware must support and use CCMP. RSN is basically CCMP + 802.1X.

 An RSN that uses TKIP instead of CCMP is also called a Transition Security
Network (TSN). RSN may also be called WPA2, so that the market doesn't get
confused.

 Confused?

 Basically:

  * TSN = TKIP + 802.1X = WPA(1)

  * RSN = CCMP + 802.1X = WPA2


On top of this comes key management, as described in the previous section.
-----------------------------------------------------------------------------

1.3. What is EAP?

 The Extensible Authentication Protocol (EAP) [[http://www.ietf.org/rfc/
rfc3748.txt] RFC 3748] is just the transport protocol optimized for
authentication, not the authentication method itself:

 "[EAP is] an authentication framework which supports multiple
authentication methods. EAP typically runs directly over data link layers
such as Point-to-Point Protocol (PPP) or IEEE 802, without requiring IP. EAP
provides its own support for duplicate elimination and retransmission, but is
reliant on lower layer ordering guarantees. Fragmentation is not supported
within EAP itself; however, individual EAP methods may support this." --- RFC
3748, page 3
-----------------------------------------------------------------------------

1.4. EAP authentication methods

 Since 802.1X uses EAP, multiple different authentication schemes may be
added, including smart cards, Kerberos, public key, one-time passwords, and
others.

 Some of the most widely used EAP authentication mechanisms are listed
below. A full list of registered EAP authentication types is available at
IANA: [http://www.iana.org/assignments/eap-numbers] http://www.iana.org/
assignments/eap-numbers.

Warning Not all authentication mechanisms are considered secure!

  *  EAP-MD5: MD5-Challenge requires a username/password, and is equivalent
     to the PPP CHAP protocol [[http://www.ietf.org/rfc/rfc1994.txt]
     RFC1994]. This method does not provide dictionary attack resistance,
     mutual authentication, or key derivation, and therefore has little use
     in a wireless authentication environment.

  *  Lightweight EAP (LEAP): A username/password combination is sent to an
     Authentication Server (RADIUS) for authentication. LEAP is a
     proprietary protocol developed by Cisco, and is not considered secure.
     Cisco is phasing out LEAP in favor of PEAP. The closest thing to a
     published standard can be found
     [http://lists.cistron.nl/pipermail/cistron-radius/2001-September/
     002042.html] here.

  *  EAP-TLS: Creates a TLS session within EAP, between the Supplicant and
     the Authentication Server. Both the server and the client(s) need a
     valid (x509) certificate, and therefore a PKI. This method provides
     authentication both ways. EAP-TLS is described in
     [[http://www.ietf.org/rfc/rfc2716.txt] RFC2716].

  *  EAP-TTLS: Sets up an encrypted TLS tunnel for safe transport of
     authentication data. Within the TLS tunnel, (any) other authentication
     methods may be used. Developed by Funk Software and Meetinghouse, and
     currently an IETF draft.

  *  Protected EAP (PEAP): Like EAP-TTLS, uses an encrypted TLS tunnel.
- Supplicant certificates for both EAP-TTLS and EAP-PEAP are optional, but - server (AS) certificates are required. Developed by Microsoft, Cisco, and - RSA Security, and is currently an IETF draft. - -  *  EAP-MSCHAPv2: Requires username/password, and is basically an EAP - encapsulation of MS-CHAP-v2 [[http://www.ietf.org/rfc/rfc2759.txt] - RFC2759]. Usually used inside of a PEAP-encrypted tunnel. Developed by - Microsoft, and is currently an IETF draft. - - ------------------------------------------------------------------------------ -1.5. What is RADIUS? - - Remote Authentication Dial-In User Service (RADIUS) is defined in [[http:// -www.ietf.org/rfc/rfc2865.txt] RFC2865] (with friends), and was primarily used -by ISPs who authenticated username and password before the user got -authorized to use the ISP's network. - - 802.1X does not specify what kind of back-end authentication server must be -present, but RADIUS is the "de-facto" back-end authentication server used in -802.1X. - - There are not many AAA protocols available, but both RADIUS and DIAMETER -[[http://www.ietf.org/rfc/rfc3588.txt] RFC3588] (including their extensions) -conform to full AAA support. AAA stands for Authentication, Authorization, -and Accounting (IETF's AAA Working Group). ------------------------------------------------------------------------------ - -2. Obtaining Certificates - -Note OpenSSL must be installed to use either EAP-TLS, EAP-TTLS, or PEAP! - - When using EAP-TLS, both the Authentication Server and all the Supplicants -(clients) need certificates [[http://www.ietf.org/rfc/rfc2459.txt] RFC2459] . -Using EAP-TTLS or PEAP, only the Authentication Server requires certificates; -Supplicant certificates are optional. - - You get certificates from the local certificate authority (CA). If there is -no local CA available, OpenSSL may be used to generate self-signed -certificates. - - Included with the FreeRADIUS source are some helper scripts to generate -self-signed certificates. The scripts are located under the scripts/ folder -included with the FreeRADIUS source: - - CA.all is a shell script that generates certificates based on some -questions it ask. CA.certs generates certificates non-interactively based on -pre-defined information at the start of the script. - -Note The scripts uses a Perl script called CA.pl, included with OpenSSL. The - path to this Perl script in CA.all and CA.certs may need to be changed - to make it work. - -Tip More information on how to generate your own certificates can be found in - the SSL certificates HOWTO. ------------------------------------------------------------------------------ - -3. Authentication Server: Setting up FreeRADIUS - - FreeRADIUS is a fully GPLed RADIUS server implementation. It supports a -wide range of authentication mechanisms, but PEAP is used for the example in -this document. ------------------------------------------------------------------------------ - -3.1. Installing FreeRADIUS - -Installing FreeRADIUS - - 1. Head over to the FreeRADIUS site, [http://www.freeradius.org/] http:// - www.freeradius.org/, and download the latest release. - # cd /usr/local/src - # wget ftp://ftp.freeradius.org/pub/radius/freeradius-1.0.0.tar.gz - # tar zxfv freeradius-1.0.0.tar.gz - # cd freeradius-1.0.0 - - - 2. Configure, make and install: - # ./configure - # make - # make install - - - You can pass options to configure. Use ./configure --help or read the - README file, for more information. 
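 Once "make install" finishes, a quick sanity check (a sketch; paths assume
the default installation prefix) confirms that the binaries landed where
expected. The bundled radtest utility, shown here with the test user and
shared secret configured in the next section, can later exercise a full
authentication round trip:

   # Verify the server binary is installed and report its version
   $ /usr/local/sbin/radiusd -v

   # After section 3.2 is done and radiusd is running, radtest can test
   # authentication end-to-end (this assumes clients.conf also has an
   # entry covering 127.0.0.1 with this shared secret):
   # radtest <user> <password> <radius-server> <nas-port> <shared-secret>
   $ radtest testuser Secret149 localhost 0 SharedSecret99
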
The binaries are installed in /usr/local/bin and /usr/local/sbin. The
configuration files are found under /usr/local/etc/raddb.

 If something went wrong, check the INSTALL and README included with the
source. The [http://www.freeradius.org/faq/] RADIUS FAQ also contains
valuable information.
-----------------------------------------------------------------------------

3.2. Configuring FreeRADIUS

 FreeRADIUS has a big and mighty configuration. It is so big that it has
been split into several smaller files that are simply "included" into the
main radiusd.conf file.

 There are numerous ways of using and setting up FreeRADIUS to do what you
want: i.e., fetch user information from LDAP, SQL, PDC, Kerberos, etc. In
this document, user information from a plain text file, users, is used.

Tip The configuration files are thoroughly commented, and, if that is not
    enough, the doc/ folder that comes with the source contains additional
    information.

Configuring FreeRADIUS

 1. The configuration files can be found under /usr/local/etc/raddb/
    # cd /usr/local/etc/raddb/


 2. Open the main configuration file radiusd.conf, and read the comments!
    Inside the encrypted PEAP tunnel, an MS-CHAPv2 authentication mechanism
    is used.
    a. MPPE [[http://www.ietf.org/rfc/rfc3078.txt] RFC3078] is responsible
       for sending the PMK to the AP. Make sure the following settings are
       set:
       # under MODULES, make sure mschap is uncommented!
       mschap {
           # authtype value, if present, will be used
           # to overwrite (or add) Auth-Type during
           # authorization. Normally, should be MS-CHAP
           authtype = MS-CHAP

           # if use_mppe is not set to no, mschap will
           # add MS-CHAP-MPPE-Keys for MS-CHAPv1 and
           # MS-MPPE-Recv-Key/MS-MPPE-Send-Key for MS-CHAPv2
           #
           use_mppe = yes

           # if mppe is enabled, require_encryption makes
           # encryption moderate
           #
           require_encryption = yes

           # require_strong always requires 128 bit key
           # encryption
           #
           require_strong = yes

           # The module can perform authentication itself, OR
           # use a Windows Domain Controller. See the radiusd.conf file
           # for how to do this.
       }


    b. Also make sure the "authorize" and "authenticate" sections contain:
       authorize {
           preprocess
           mschap
           suffix
           eap
           files
       }

       authenticate {

           #
           # MSCHAP authentication.
           Auth-Type MS-CHAP {
               mschap
           }

           #
           # Allow EAP authentication.
           eap
       }



 3. Then, change the clients.conf file to specify which network it's
    serving:
    # Here, we specify which network we're serving
    client 192.168.0.0/16 {
        # This is the shared secret between the Authenticator (the
        # access point) and the Authentication Server (RADIUS).
        secret = SharedSecret99
        shortname = testnet
    }


 4. The eap.conf should also be pretty straightforward.
    a. Set "default_eap_type" to "peap":
       default_eap_type = peap


    b. Since PEAP uses TLS, the TLS section must contain:
       tls {
           # The private key password
           private_key_password = SecretKeyPass77
           # The private key
           private_key_file = ${raddbdir}/certs/cert-srv.pem
           # Trusted Root CA list
           CA_file = ${raddbdir}/certs/demoCA/cacert.pem
           dh_file = ${raddbdir}/certs/dh
           random_file = /dev/urandom
       }


    c. Find the "peap" section, and make sure it contains the following:
       peap {
           # The tunneled EAP session needs a default
           # EAP type, which is separate from the one for
           # the non-tunneled EAP module.
           # Inside of the PEAP tunnel, we recommend using MS-CHAPv2,
           # as that is the default type supported by
           # Windows clients.
           default_eap_type = mschapv2
       }



 5. The user information is stored in a plain text file, users. A more
    sophisticated way to store user information may be preferred (SQL,
    LDAP, PDC, etc.).

    Make sure the users file contains the following entry:
    "testuser" User-Password == "Secret149"



-----------------------------------------------------------------------------
4. Supplicant: Setting up Xsupplicant

 The Supplicant is usually a laptop or other (wireless) device that requires
authentication. Xsupplicant implements the "Supplicant" role defined by the
IEEE 802.1X-2001 standard.
-----------------------------------------------------------------------------

4.1. Installing Xsupplicant

Installing Xsupplicant

 1. Download the latest source from [http://www.open1x.org/] http://
    www.open1x.org/
    # cd /usr/local/src
    # wget http://belnet.dl.sourceforge.net/sourceforge/open1x/xsupplicant-1.0.tar.gz
    # tar zxfv xsupplicant-1.0.tar.gz
    # cd xsupplicant


 2. Configure, make, and install:
    # ./configure
    # make
    # make install


 3. If the configuration file wasn't installed (copied) into the "etc"
    folder, do it manually:
    # mkdir -p /usr/local/etc/1x
    # cp etc/tls-example.conf /usr/local/etc/1x



 If installation fails, check the README and INSTALL files included with the
source. You may also check out the official documentation.
-----------------------------------------------------------------------------

4.2. Configuring Xsupplicant

Configuring Xsupplicant

 1. The Supplicant must have access to the root certificate.

    If the Supplicant needs to authenticate against the Authentication
    Server (authentication both ways), the Supplicant must have certificates
    as well.

    Create a certificate folder, and move the certificates into it:
    # mkdir -p /usr/local/etc/1x/certs
    # cp root.pem /usr/local/etc/1x/certs/
    # (copy optional client certificate(s) into the same folder)


 2. Open and edit the configuration file:
    # startup_command: the command to run when Xsupplicant is first started.
    # This command can do things such as configure the card to associate with
    # the network properly.
    startup_command = /usr/local/etc/1x/startup.sh


    The startup.sh will be created shortly.

 3. When the client is authenticated, it will transmit a DHCP request or
    manually set an IP address. Here, the Supplicant sets its IP address
    manually in startup2.sh:
    # first_auth_command: the command to run when Xsupplicant authenticates to
    # a wireless network for the first time. This will usually be used to
    # start a DHCP client process.
    #first_auth_command = dhclient %i
    first_auth_command = /usr/local/etc/1x/startup2.sh


 4. Since "-i" is just for debugging purposes (and may go away, according to
    the developers), "allow_interfaces" must be set:
    allow_interfaces = eth0
    deny_interfaces = eth1


 5. Next, under the "NETWORK SECTION", we'll configure PEAP:
    # We'll be using PEAP
    allow_types = eap_peap

    # We don't want any eavesdropper to learn the username during the
    # first phase (which is unencrypted), so 'identity hiding' is
    # used (with a bogus username).
    identity = anonymous

    eap-peap {
        # As in tls, define either a root certificate or a directory
        # containing root certificates.
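        # root.pem is the CA certificate generated in section 2,
        # "Obtaining Certificates". Xsupplicant uses it to validate the
        # server certificate that FreeRADIUS presents when the TLS tunnel
        # is built; if this CA file does not match the server certificate,
        # authentication fails before the inner MS-CHAPv2 exchange even
        # starts.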
- root_cert = /usr/local/etc/1x/certs/root.pem - #root_dir = /path/to/root/certificate/dir - #crl_dir = /path/to/dir/with/crl - chunk_size = 1398 - random_file = /dev/urandom - #cncheck = myradius.radius.com # Verify that the server certificate - # has this value in its CN field. - #cnexact = yes # Should it be an exact match? - session_resume = yes - - # Currently 'all' is just mschapv2. - # If no allow_types is defined, all is assumed. - #allow_types = all # where all = MSCHAPv2, MD5, OTP, GTC, SIM - allow_types = eap_mschapv2 - - # Right now, you can do any of these methods in PEAP: - eap-mschapv2 { - username = testuser - password = Secret149 - } - } - - - 6. The Supplicant must first associate with the access point. The script - startup.sh does that job. It is also the first command Xsupplicant - executes. - - Note Notice the bogus key we give to iwconfig (enc 000000000)! This key - is used to tell the driver to run in encrypted mode. The key gets - replaced after successful authentication. This can be set to enc off - only if encryption is disabled in the AP (for testing purposes). - - Both startup.sh and startup2.sh must be saved under /usr/local/etc/1x/. - #!/bin/bash - echo "Starting startup.sh" - # Take down interface (if it's up) - /sbin/ifconfig eth0 down - # To make sure the routes are flushed - sleep 1 - # Configuring the interface with a bogus key - /sbin/iwconfig eth0 mode managed essid testnet enc 000000000 - # Bring the interface up and make sure it listens to multicast packets - /sbin/ifconfig eth0 allmulti up - echo "Finished startup.sh" - - - 7. This next file is used to set the IP address statically. This can be - omitted if a DHCP server is present (as it typically is, in many access - points). - #!/bin/bash - echo "Starting startup2.sh" - # Assigning an IP address - /sbin/ifconfig eth0 192.168.1.5 netmask 255.255.255.0 - echo "Finished startup2.sh" - - - ------------------------------------------------------------------------------ -5. Authenticator: Setting up the Authenticator (Access Point) - - During the authentication process, the Authenticator just relays all -messages between the Supplicant and the Authentication Server (RADIUS). EAPOL -is used between the Supplicant and the Authenticator; and, between the -Authenticator and the Authentication Server, UDP is used. ------------------------------------------------------------------------------ - -5.1. Access Point - - Many access point have support for 802.1X (and RADIUS) authentication. It -must first be configured to use 802.1X authentication. - -Note Configuring and setting up 802.1X on the AP may differ between vendors. - Listed below are the required settings to make a Cisco AP350 work. Other - settings to TIKP, CCMP etc. may also be configured. - - The AP must set the ESSID to "testnet" and must activate: - -[8021X-CiscoAP] - -Figure AP350: The RADIUS configuration screen for a Cisco AP-350 - -  *  802.1X-2001: Make sure the 802.1X Protocol version is set to - "802.1X-2001". Some older Access Points support only the draft version of - the 802.1X standard (and may therefore not work). - -  *  RADIUS Server: the name/IP address of the RADIUS server and the shared - secret between the RADIUS server and the Access Point (which in this - document is "SharedSecret99"). See figure AP350. - -  *  EAP Authentication: The RADIUS server should be used for EAP - authentication. 
- - -[8021X-CiscoAP2] - -Figure AP350-2: The Encryption configuration screen for a Cisco AP-350 - -  *  Full Encryption to allow only encrypted traffic. Note that 802.1X may - be used without using encryption, which is nice for test purposes. - -  *  Open Authentication to make the Supplicant associate with the Access - Point before encryption keys are available. Once the association is done, - the Supplicant may start EAP authentication. - -  *  Require EAP for the "Open Authentication". That will ensure that only - authenticated users are allowed into the network. - - ------------------------------------------------------------------------------ -5.2. Linux Authenticator - - An ordinary Linux node can be set up to function as a wireless Access Point -and Authenticator. How to set up and use Linux as an AP is beyond the scope -of this document. Simon Anderson's Linux Wireless Access Point HOWTO may be -of guidance. ------------------------------------------------------------------------------ - -6. Testbed - -6.1. Testcase - -[8021X-Testbed] - -figure testbed: A wireless node request authentication. - - Our testbed consists of two nodes and one Access Point (AP). One node -functions as the Supplicant (WN), the other as the back-end Authentication -Server running RADIUS (AS). The Access Point is the Authenticator. See figure -testbed for explanation. - -Important It is crucial that the Access Point be able to reach (ping) the - Authentication Server, and vice versa! ------------------------------------------------------------------------------ - -6.2. Running some tests - -Running some tests - - 1. The RADIUS server is started in debug mode. This produces a lot of - debug information. The important snippets are below: - # radiusd -X - Starting - reading configuration files ... - reread_config: reading radiusd.conf - Config: including file: /usr/local/etc/raddb/proxy.conf - Config: including file: /usr/local/etc/raddb/clients.conf - Config: including file: /usr/local/etc/raddb/snmp.conf - Config: including file: /usr/local/etc/raddb/eap.conf - Config: including file: /usr/local/etc/raddb/sql.conf - ...... - Module: Loaded MS-CHAP - mschap: use_mppe = yes - mschap: require_encryption = no - mschap: require_strong = no - mschap: with_ntdomain_hack = no - mschap: passwd = "(null)" - mschap: authtype = "MS-CHAP" - mschap: ntlm_auth = "(null)" - Module: Instantiated mschap (mschap) - ...... 
- Module: Loaded eap - eap: default_eap_type = "peap" (1) - eap: timer_expire = 60 - eap: ignore_unknown_eap_types = no - eap: cisco_accounting_username_bug = no - rlm_eap: Loaded and initialized type md5 - tls: rsa_key_exchange = no (2) - tls: dh_key_exchange = yes - tls: rsa_key_length = 512 - tls: dh_key_length = 512 - tls: verify_depth = 0 - tls: CA_path = "(null)" - tls: pem_file_type = yes - tls: private_key_file = "/usr/local/etc/raddb/certs/cert-srv.pem" - tls: certificate_file = "/usr/local/etc/raddb/certs/cert-srv.pem" - tls: CA_file = "/usr/local/etc/raddb/certs/demoCA/cacert.pem" - tls: private_key_password = "SecretKeyPass77" - tls: dh_file = "/usr/local/etc/raddb/certs/dh" - tls: random_file = "/usr/local/etc/raddb/certs/random" - tls: fragment_size = 1024 - tls: include_length = yes - tls: check_crl = no - tls: check_cert_cn = "(null)" - rlm_eap: Loaded and initialized type tls - peap: default_eap_type = "mschapv2" (3) - peap: copy_request_to_tunnel = no - peap: use_tunneled_reply = no - peap: proxy_tunneled_request_as_eap = yes - rlm_eap: Loaded and initialized type peap - mschapv2: with_ntdomain_hack = no - rlm_eap: Loaded and initialized type mschapv2 - Module: Instantiated eap (eap) - ...... - Module: Loaded files - files: usersfile = "/usr/local/etc/raddb/users" (4) - ...... - Module: Instantiated radutmp (radutmp) - Listening on authentication *:1812 - Listening on accounting *:1813 - Ready to process requests. (5) - - - (1) Default EAP type is set to PEAP. - (2) RADIUS's TLS settings are initiated here. The certificate type, - location, and password are listet here. - (3) Inside the PEAP tunnel, MS-CHAPv2 is used. - (4) The username/password information is found in the users file. - (5) RADIUS server started successfully. Waiting for incoming requests. - - The radius server is now ready to process requests! - - The most interesting output is included above. If you get any error - message instead of the last line, go over the configuration (above) - carefully. - - 2. Now the Supplicant is ready to get authenticated. Start Xsupplicant in - debug mode. Note that we'll see output produced by the two startup - scripts: startup.sh and startup2.sh. - # xsupplicant -c /usr/local/etc/1x/1x.conf -i eth0 -d 6 - Starting /etc/1x/startup.sh - Finished /etc/1x/startup.sh - Starting /etc/1x/startup2.sh - Finished /etc/1x/startup2.sh - - - 3. At the same time, the RADIUS server is producing a lot of output. Key - snippets are shown below: - ...... - rlm_eap: Request found, released from the list - rlm_eap: EAP/peap - rlm_eap: processing type peap - rlm_eap_peap: Authenticate - rlm_eap_tls: processing TLS (1) - eaptls_verify returned 7 - rlm_eap_tls: Done initial handshake - eaptls_process returned 7 - rlm_eap_peap: EAPTLS_OK (2) - rlm_eap_peap: Session established. Decoding tunneled attributes. - rlm_eap_peap: Received EAP-TLV response. - rlm_eap_peap: Tunneled data is valid. - rlm_eap_peap: Success - rlm_eap: Freeing handler - modcall[authenticate]: module "eap" returns ok for request 8 - modcall: group authenticate returns ok for request 8 - Login OK: [testuser/] (from client testnet port 37 cli 0002a56fa08a) - Sending Access-Accept of id 8 to 192.168.2.1:1032 (3) - MS-MPPE-Recv-Key = 0xf21757b96f52ddaefe084c343778d0082c2c8e12ce18ae10a79c550ae61a5206 (4) - MS-MPPE-Send-Key = 0x5e1321e06a45f7ac9f78fb9d398cab5556bff6c9d003cdf8161683bfb7e7af18 - EAP-Message = 0x030a0004 - Message-Authenticator = 0x00000000000000000000000000000000 - User-Name = "testuser" - - - (1) TLS session startup. 
Doing TLS-handshake.
    (2) The TLS session (PEAP-encrypted tunnel) is up.
    (3) The Supplicant has been authenticated successfully by the RADIUS
        server. An "Access-Accept" message is sent.
    (4) The MS-MPPE-Recv-Key [[http://www.ietf.org/rfc/rfc2548.txt] RFC2548
        section 2.4.3] contains the Pairwise Master Key (PMK) destined for
        the Authenticator (access point), encrypted with the MPPE Protocol
        [[http://www.ietf.org/rfc/rfc3078.txt] RFC3078], using the shared
        secret between the Authenticator and Authentication Server as key.
        The Supplicant derives the same PMK from the MK, as described in Key
        Management.


 4. The Authenticator (access point) may also show something like this in
    its log:
    00:02:16 (Info): Station 0002a56fa08a Associated
    00:02:17 (Info): Station=0002a56fa08a User="testuser" EAP-Authenticated



 That's it! The Supplicant is now authenticated to use the Access Point!
-----------------------------------------------------------------------------

7. Note about driver support and Xsupplicant

 As described in Key Management, one of the big advantages of using Dynamic
WEP/802.11i with 802.1X is the support for session keys. A new encryption key
is generated for each session.

 Xsupplicant only supports "Dynamic WEP" as of this writing. Support for WPA
and RSN/WPA2 (802.11i) is being worked on, and is estimated to arrive at the
end of the year/early next year (2004/2005), according to Chris Hessing (one
of Xsupplicant's developers).

 Not all wireless drivers support dynamic WEP or WPA. To use RSN (WPA2), new
hardware support may even be required. Many older drivers assume only one WEP
key will be used on the network at any time. The card is reset whenever the
key is changed to let the new key take effect. This triggers a new
authentication, and the result is a never-ending loop.

 At the time of writing, most of the wireless drivers in the base Linux
kernel require patching to make dynamic WEP/WPA work. They will, in time, be
upgraded to support these new features. Many drivers developed outside the
kernel, however, do support dynamic WEP: HostAP, madwifi, Orinoco, and atmel
should work without problems.

 Instead of using Xsupplicant, [http://hostap.epitest.fi/wpa_supplicant/]
wpa_supplicant may be used. It has support for both WPA and RSN (WPA2), and a
wide range of EAP authentication methods.
-----------------------------------------------------------------------------

8. FAQ

 Do not forget to check out the FAQ sections of both the [http://
www.freeradius.org/faq/] FreeRADIUS (highly recommended!) and [http://
sourceforge.net/docman/display_doc.php?docid=23371&group_id=60236#ch7]
Xsupplicant Web sites!

8.1. Is it possible to allow user-specific Xsupplicant configuration, to
     avoid having a global configuration file?
8.2. I don't want to use PEAP; can I use EAP-TTLS or EAP-TLS instead?
8.3. Can I use a Windows Supplicant (client) instead of GNU/Linux?
8.4. Can I use an Active Directory to authenticate users?
8.5. Are there any Windows Supplicant clients available?

8.1. Is it possible to allow user-specific Xsupplicant configuration, to
avoid having a global configuration file?

No, not at the moment.

8.2. I don't want to use PEAP; can I use EAP-TTLS or EAP-TLS instead?

Yes. To use EAP-TTLS, only small changes to the configuration used in this
document are required. To use EAP-TLS, client certificates must be used as
well.
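 As an illustration of how small the EAP-TTLS change is, here is a sketch of
the server-side edit (the ttls module and its option names are assumptions
to verify against the comments in your eap.conf before relying on them):

   # eap.conf: switch the outer method from PEAP to TTLS
   default_eap_type = ttls

   ttls {
       # The authentication method carried inside the TLS tunnel
       default_eap_type = md5
       copy_request_to_tunnel = no
       use_tunneled_reply = no
   }

 The Supplicant side needs a matching change (allow_types = eap_ttls in the
Xsupplicant configuration); the TLS section and the certificates from
section 3.2 are reused unchanged.
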
8.3. Can I use a Windows Supplicant (client) instead of GNU/Linux?

Yes. Windows XP SP1/Windows 2000 SP3 have support for PEAP MS-CHAPv2 (used in
this document). A Windows HOWTO can be found here: FreeRADIUS/WinXP
Authentication Setup

8.4. Can I use an Active Directory to authenticate users?

Yes. FreeRADIUS can authenticate users from AD by using "ntlm_auth".

8.5. Are there any Windows Supplicant clients available?

Yes. As of Windows XP SP1 and Windows 2000 SP3, PEAP/MS-CHAPv2 is supported
natively. Other clients include (not tested) [http://www.securew2.com]
Secure W2 (free for non-commercial use) and [http://wire.cs.nthu.edu.tw/
wire1x/] WIRE1X. [http://www.funk.com] Funk Software also has a commercial
client available.
-----------------------------------------------------------------------------

9. Useful Resources

 Only IEEE standards older than 12 months are available to the public in
general (through the "Get IEEE 802 Program"), so the new 802.11i and
802.1X-2004 standards documents are not yet available. You must be an IEEE
participant to get hold of any drafts/work-in-progress papers (which actually
isn't that hard - just join a mailing list and say you are interested).

 1. FreeRADIUS Server Project [http://www.freeradius.org/] http://
    www.freeradius.org/

 2. Open1x: Open Source implementation of IEEE 802.1X (Xsupplicant) [http://
    www.open1x.org/] http://www.open1x.org/

 3. The Open1x User's Guide http://sourceforge.net/docman/display_doc.php?
    docid=23371&group_id=60236

 4. Port-Based Network Access Control (802.1X-2001) [http://
    standards.ieee.org/getieee802/download/802.1X-2001.pdf] http://
    standards.ieee.org/getieee802/download/802.1X-2001.pdf

 5. RFC2246: The TLS Protocol Version 1.0 http://www.ietf.org/rfc/rfc2246.txt

 6. RFC2459: Internet X.509 Public Key Infrastructure - Certificate and CRL
    Profile http://www.ietf.org/rfc/rfc2459.txt

 7. RFC2548: Microsoft Vendor-specific RADIUS Attributes http://www.ietf.org/
    rfc/rfc2548.txt

 8. RFC2716: PPP EAP TLS Authentication Protocol http://www.ietf.org/rfc/
    rfc2716.txt

 9. RFC2865: Remote Authentication Dial-In User Service (RADIUS) [http://
    www.ietf.org/rfc/rfc2865.txt] http://www.ietf.org/rfc/rfc2865.txt

10. RFC3079: Deriving Keys for use with Microsoft Point-to-Point Encryption
    (MPPE) [http://www.ietf.org/rfc/rfc3079.txt] http://www.ietf.org/rfc/
    rfc3079.txt

11. RFC3579: RADIUS Support For EAP [http://www.ietf.org/rfc/rfc3579.txt]
    http://www.ietf.org/rfc/rfc3579.txt

12. RFC3580: IEEE 802.1X RADIUS Usage Guidelines [http://www.ietf.org/rfc/
    rfc3580.txt] http://www.ietf.org/rfc/rfc3580.txt

13. RFC3588: Diameter Base Protocol [http://www.ietf.org/rfc/rfc3588.txt]
    http://www.ietf.org/rfc/rfc3588.txt

14. RFC3610: Counter with CBC-MAC (CCM) [http://www.ietf.org/rfc/rfc3610.txt]
    http://www.ietf.org/rfc/rfc3610.txt

15. RFC3748: Extensible Authentication Protocol (EAP) [http://www.ietf.org/
    rfc/rfc3748.txt] http://www.ietf.org/rfc/rfc3748.txt

16. Linux Wireless Access Point HOWTO [http://oob.freeshell.org/nzwireless/
    LWAP-HOWTO.html] http://oob.freeshell.org/nzwireless/LWAP-HOWTO.html

17. SSL Certificates HOWTO http://www.tldp.org/HOWTO/SSL-Certificates-HOWTO/

18. 
OpenSSL: x509(1) http://www.openssl.org/docs/apps/x509.html


diff --git a/LDP/guide/docbook/Linux-Networking/About-the-Author.xml b/LDP/guide/docbook/Linux-Networking/About-the-Author.xml
index f5e7ff65..96f0d8dd 100644
--- a/LDP/guide/docbook/Linux-Networking/About-the-Author.xml
+++ b/LDP/guide/docbook/Linux-Networking/About-the-Author.xml
@@ -52,6 +52,7 @@
 University of Wales Swansea (United Kingdom),
 University of Ulster (Ireland),
 Universität Duisburg-Essen (Germany),
+ Universidad Rey Juan Carlos (Spain),
 and Universiti Sains Malaysia (Malaysia)).
diff --git a/LDP/guide/docbook/Linux-Networking/Glossary.xml b/LDP/guide/docbook/Linux-Networking/Glossary.xml
index 93fd126a..86cfd321 100644
--- a/LDP/guide/docbook/Linux-Networking/Glossary.xml
+++ b/LDP/guide/docbook/Linux-Networking/Glossary.xml
@@ -930,5 +930,21 @@ YP
 TCP-IP
 Transmission Control Protocol/Internet Protocol. It is the data
 communication protocol most often used on Unix machines.
-
+
+PSTN (Public Switched Telephone Network)
+ is the telephone system that is used throughout the U.S. and many other
+ countries. Although never intended for networking, telephone lines can
+ be used for communication between computers.
+
+ A modem (modulator/demodulator) is used to interface between a computer and
+ the telephone system. Modems convert data into audible tones and back.
+ The fastest two-way modems currently available support a speed of 33.6 Kbps
+ (kilobits per second).
+
+ Current modems advertise speeds up to 56 Kbps. These modems rely on digital
+ equipment being used in the phone company's central office and in the
+ facility (such as the Internet Service Provider) you are dialling into.
+ The 56 Kbps speed also works in only one direction; the other direction
+ supports 33.6 Kbps.
+
diff --git a/LDP/guide/docbook/Linux-Networking/IRC.xml b/LDP/guide/docbook/Linux-Networking/IRC.xml
deleted file mode 100644
index deb1cc09..00000000
--- a/LDP/guide/docbook/Linux-Networking/IRC.xml
+++ /dev/null
@@ -1,579 +0,0 @@


IRC


This document aims to describe the basics of IRC and the related
applications available for Linux.

1. Introduction

This document is still a work in progress, and should be treated as such.
I'll do my best to keep it updated and accurate.

The following bibles shouldn't be ignored:

  * RFC1459 by Jarkko Oikarinen and Darren Reed was the first RFC about the
    Internet Relay Chat Protocol. It can be found at [http://ftp.isi.edu/
    in-notes/rfc1459.txt] http://ftp.isi.edu/in-notes/rfc1459.txt.

  * RFC2810 by Christophe Kalt updates RFC1459 and describes the Architecture
    of the Internet Relay Chat. It can be found at [http://ftp.isi.edu/
    in-notes/rfc2810.txt] http://ftp.isi.edu/in-notes/rfc2810.txt.

  * RFC2811 by Christophe Kalt updates RFC1459 and describes the Channel
    Management of the Internet Relay Chat. It can be found at [http://
    ftp.isi.edu/in-notes/rfc2811.txt] http://ftp.isi.edu/in-notes/
    rfc2811.txt.

  * RFC2812 by Christophe Kalt updates RFC1459 and describes the Client
    Protocol of the Internet Relay Chat. It can be found at [http://
    ftp.isi.edu/in-notes/rfc2812.txt] http://ftp.isi.edu/in-notes/
    rfc2812.txt.

  * RFC2813 by Christophe Kalt updates RFC1459 and describes the Server
    Protocol of the Internet Relay Chat. It can be found at [http://
    ftp.isi.edu/in-notes/rfc2813.txt] http://ftp.isi.edu/in-notes/
    rfc2813.txt.


Also be sure to check the following links:

[http://www.irchelp.org/] http://www.irchelp.org/. 
------------------------------------------------------------------------------ - -1.1. Objectives - -Among others, the objectives of this mini-HOWTO are: - -  * Link important resources about IRC; - -  * Avoid common misuses of IRC by writing an IRC Etiquette; - -  * List popular clients, servers, bots, and bouncers, along with their - maintainers, #channel, small description, download location, home page, - and hints; - -  * List IRC tools available in the latest release of all major - distributions. - ------------------------------------------------------------------------------ - -2. About IRC - -Excerpt from RFC2810: - -The IRC (Internet Relay Chat) protocol is for use with text based -conferencing. It has been developed since 1989 when it was originally -implemented as a mean for users on a BBS to chat amongst themselves. - -First formally documented in May 1993 by RFC 1459 [IRC], the protocol has -kept evolving. - -The IRC Protocol is based on the client-server model, and is well suited to -running on many machines in a distributed fashion. A typical setup involves a -single process (the server) forming a central point for clients (or other -servers) to connect to, performing the required message delivery/multiplexing -and other functions. - -This distributed model, which requires each server to have a copy of the -global state information, is still the most flagrant problem of the protocol -as it is a serious handicap, which limits the maximum size a network can -reach. If the existing networks have been able keep growing at an incredible -pace, we must thank hardware manufacturers for giving us ever more powerful -systems. ------------------------------------------------------------------------------ - -3. Beginner's guide on using IRC - -The standard IRC client is the original ircII client. It's part of most Linux -distributions. ------------------------------------------------------------------------------ - -3.1. Running the ircII program - -It's easy to use ircII. Let's say you want to connect to irc.freenode.net as -mini-HOWTO. - -At the command line, type: - -$ irc mini-HOWTO irc.freenode.net - -You can also export variables, so you won't need to use them at the command -line: - -$ export IRCNICK=mini-HOWTO IRCSERVER=irc.freenode.net - -Add them to your shell profile (e.g. ~/.bash_profile or ~/.zprofile) when -you're done. - -Other common variables are IRCNAME and IRCUSER, to respectively set the -ircname part of a /whois and username as seen at the first line 'mini-HOWTO -is ~username@hostname (ircname)'. Keep in mind that IRCUSER won't work if you -run an ident daemon (default on most distributions). If you still need to -change your username (not recommended, and I hope you're not using IRC logged -as root !), install oidentd from [http://ojnk.sourceforge.net/] http:// -ojnk.sourceforge.net/. To configure, read the oidentd.conf man page. Finally -run '/usr/local/sbin/oidentd -g nobody -u nobody'. Add this to your startup -scripts (e.g. /etc/rc.d/rc.local) when you're done. - -If not set, IRCNICK, IRCUSER, and IRCNAME will be retrieved from /etc/passwd -. ------------------------------------------------------------------------------ - -3.2. Commands - -Use /help to get a list on all available commands (/help help is a good -start). Replace nick by any IRCNICK. 
- -  * First, /set NOVICE off - -  * /nick IRC-mini-HOWTO changes your IRCNICK to IRC-mini-HOWTO - -  * /set realname The Linux IRC mini-HOWTO changes your IRCNAME to The Linux - IRC mini-HOWTO (doesn't change on the same connection) - -  * /j #mini-HOWTO joins channel #mini-HOWTO - -  * /j #unmaintained-HOWTO joins channel #unmaintained-HOWTO - -  * /j #mini-HOWTO changes the active current channel to #mini-HOWTO - -  * /msg nick Hi. sends a private message to nick containing Hi. - -  * /notice nick (or #mini-HOWTO) Hi. sends a notice to nick (or #mini-HOWTO) - containing Hi. - -  * /query nick starts a private conversation with nick. /query ends the - private conversation - -  * /me loves Linux. sends an action to the current channel or query - containing IRC-mini-HOWTO loves Linux. - -  * /dcc chat nick starts a chat with nick. Use /msg =nick (notice the =) to - send messages over the chat - -  * /dcc send nick /etc/HOSTNAME sends the given file to nick - -  * /dcc get nick receives the file offered by nick - -  * /part leaves the active current channel - -  * /part #unmaintained-HOWTO leaves channel #unmaintained-HOWTO - -  * /discon disconnects from current IRCSERVER - -  * /server irc.us.freenet.net connects to IRCSERVER irc.us.freenet.net - -  * /quit Bye. quits your IRC session with a reason Bye. - - ------------------------------------------------------------------------------ -3.3. IRC Etiquette - -WARNING WARNING WARNING WARNING WARNING - -  * Never use IRC logged as root or any user with excessive privileges. Bad - things may happen sooner or later. You were warned. It's safe if you - create 2 users, one of them to only use IRC. - - -$ man adduser - -On Linux channels you shouldn't: - -  * Act as an idiot. If you want to be respected, then first respect each - other. - -  * Use colors (^C). Most Linux users don't tolerate such mIRC crazes, and - ircII doesn't really support them. The same should apply for ANSI. - -  * Use full CAPS, bold (^B), reverse (^V), underline (^_), blink (^F), and - bell (^G). The first 4 are here to emphasize words, not the whole text. - The last 2 are just very annoying. - -  * Ask if you can ask a question. Just ask, but first read all documentation - available on the subject. Start looking at [file:/usr/doc/] /usr/doc/ , - otherwise go to [http://www.tldp.org/] http://www.tldp.org/ or [http:// - www.ibiblio.org/pub/Linux/docs/] http://www.ibiblio.org/pub/Linux/docs/. - And don't repeat your question immediately. Wait at least 10 minutes. If - you don't get any answer it's because nobody knows or wants to help. - Respect their choice, they're not your personal assistant. Also never - send mass private messages. It's like SPAM. - - ------------------------------------------------------------------------------ -4. Console IRC Clients - -4.1. ircII - -Maintainer: ircII project () - -IRC Channel: #ircII (official channel ?) on [http://www.efnet.org/ -servers.html] EFNet - -Originally written by Michael Sandrof, ircII comes with most Linux -distributions. It uses termcap and shouldn't be a choice for most users, but -is a standard. Mathusalem and other gurus will use it. Less ventured will -regret to have it installed. - -You can get the latest version of ircII from [ftp://ircftp.au.eterna.com.au/ -pub/ircII/] ftp://ircftp.au.eterna.com.au/pub/ircII/. Homepage at [http:// -www.eterna.com.au/ircii/] http://www.eterna.com.au/ircii/. ------------------------------------------------------------------------------ - -4.2. 
EPIC

Maintainer: EPIC Software Labs ()

IRC Channel: #EPIC on EFNet

Based on ircII, EPIC (Enhanced Programmable ircII Client) is meant for
serious scripters and for users seeking freedom. When you start it for the
first time, you'll notice that you really should learn the basics of
scripting.

You can get the latest version of EPIC from [ftp://ftp.epicsol.org/pub/epic/]
ftp://ftp.epicsol.org/pub/epic/. Homepage at [http://www.epicsol.org/] http:/
/www.epicsol.org/.
-----------------------------------------------------------------------------

4.3. BitchX

Maintainer: Colten Edwards ()

IRC Channel: #BitchX on EFNet

Based on ircII and EPIC, BitchX could be compared to the Pine MUA: bloated
(which doesn't mean you shouldn't use it) but widely used. It is the choice
for users who want a client with built-in facilities. It can be compiled with
the GNOME libraries by using the configure option --with-gtk. Don't be
surprised if all you get is an XTerm-BitchX instead.

You can get the latest version of BitchX from [ftp://ftp.bitchx.org/pub/
BitchX/source/] ftp://ftp.bitchx.org/pub/BitchX/source/. Homepage at [http://
www.bitchx.com/] http://www.bitchx.com/. Homepage of gtkBitchX at [http://
www.bitchx.org/gtk/] http://www.bitchx.org/gtk/.
-----------------------------------------------------------------------------

4.4. irssi

Maintainer: Timo Sirainen ()

IRC Channel: #irssi on [http://freenode.net/irc_servers.shtml] freenode and
[http://www.ircnet.org/] IRCnet

Timo released yagIRC, a GUI client using the GTK+ toolkit, about three years
ago. He was called up for military service, and the new maintainers didn't
keep the project going; yagIRC died, and he started irssi as a replacement.
It initially used GTK+; GNOME and curses versions would appear later. As of
0.7.90 it is purely a modular text mode client. It supports Perl scripting.

You can get the latest version of irssi from [http://irssi.org/?page=
download] http://irssi.org/?page=download. Homepage at [http://irssi.org/]
http://irssi.org/.
-----------------------------------------------------------------------------

4.5. Other Console IRC Clients

There are a few other ircII-based clients.

Blackened [http://www.blackened.com/blackened/] http://www.blackened.com/
blackened/.

Ninja [http://ninja.qoop.org/] http://ninja.qoop.org/.

ScrollZ [http://www.scrollz.com/] http://www.scrollz.com/.
-----------------------------------------------------------------------------

5. X Window IRC Clients

5.1. Zircon

Maintainer: Lindsay F. Marshall ()

IRC Channel: None?

Written in Tcl/Tk, Zircon uses the native network communications of Tcl.

You can get the latest version of Zircon from [ftp://catless.ncl.ac.uk/pub/]
ftp://catless.ncl.ac.uk/pub/. Homepage at [http://catless.ncl.ac.uk/Programs/
Zircon/] http://catless.ncl.ac.uk/Programs/Zircon/.
-----------------------------------------------------------------------------

5.2. KVIrc

Maintainer: Szymon Stefanek ()

IRC Channel: #KVIrc on freenode

Written with the Qt toolkit, KVIrc is a beast. It supports DCC Voice, a
built-in scripting language, and plugins.

You can get the latest version of KVIrc from [http://www.kvirc.net/?id=
download] http://www.kvirc.net/?id=download. Homepage at [http://
www.kvirc.net/] http://www.kvirc.net/.
-----------------------------------------------------------------------------

5.3. 
X-Chat - -Maintainer: Peter Zelezny () - -IRC Channel: #Linux on [http://www.chatjunkies.org/servers.html] ChatJunkies - -Using GTK+ and optionally GNOME, supports Perl and Python scripting. - -You can get the latest version of X-Chat from [http://xchat.org/ -download.html] http://xchat.org/download.html. Homepage at [http://xchat.org -/] http://xchat.org/. ------------------------------------------------------------------------------ - -5.4. QuIRC - -Maintainer: Patrick Earl () - -IRC Channel: #QuIRC on [http://www.dal.net/servers/] DALnet - -Using Tk, supports Tcl for scripting. - -You can get the latest version of QuIRC from his Homepage at [http:// -quirc.org/] http://quirc.org/. ------------------------------------------------------------------------------ - -6. IRC Servers - -6.1. IRCD - -Maintainer: ircd developers() - -IRC Channel: #ircd on IRCnet - -The original IRC daemon, mainly used by IRCnet. - -You can get the latest version of IRCD from [ftp://ftp.irc.org/irc/server/] -ftp://ftp.irc.org/irc/server/. Homepage at [http://www.irc.org/] http:// -www.irc.org/. ------------------------------------------------------------------------------ - -6.2. IRCD-Hybrid - -Maintainer: () - -IRC Channel: None ? - -Mainly used by EFNet. - -You can get the latest version of IRCD-Hybrid from [ftp://ftp.blackened.com/ -pub/irc/hybrid/] ftp://ftp.blackened.com/pub/irc/hybrid/. Homepage at [http:/ -/www.ircd-hybrid.org/] http://www.ircd-hybrid.org/. ------------------------------------------------------------------------------ - -6.3. ircu - -Maintainer: Undernet Coder Committee () - -IRC Channel: #ircu on [http://www.undernet.org/servers.php] Undernet - -Mainly used by Undernet. - -You can get the latest version of ircu from [http://ftp1.sourceforge.net/ -undernet-ircu/] http://ftp1.sourceforge.net/undernet-ircu/. Homepage at -[http://coder-com.undernet.org/] http://coder-com.undernet.org/. ------------------------------------------------------------------------------ - -6.4. Bahamut - -Maintainer: DALnet Coding Team () - -IRC Channel: #Bahamut on DALnet - -Based on DreamForge and Hybrid, Bahamut is the DALnet server. - -You can get the latest version of Bahamut from [http://bahamut.dal.net/ -download/] http://bahamut.dal.net/download/. Homepage at [http:// -bahamut.dal.net/] http://bahamut.dal.net/. ------------------------------------------------------------------------------ - -7. IRC Bots - -7.1. eggdrop - -Maintainer: () - -IRC Channel: #eggdrop on Undernet - -eggdrop is the most known Tcl enabled application on the Net. It's a channel -robot for IRC that can be tailored to any situation. - -You can get the latest version of eggdrop from [ftp://ftp.eggheads.org/pub/ -eggdrop/source/] ftp://ftp.eggheads.org/pub/eggdrop/source/. Homepage at -[http://www.eggheads.org/] http://www.eggheads.org/. ------------------------------------------------------------------------------ - -8. IRC Bouncers (IRC Proxy) - -8.1. bnc - -Maintainer: None ? - -IRC Channel: None ? - -bnc is the original bouncer. - -You can get the latest version of bnc from [http://gotbnc.com/cgi-bin/ -download.cgi] http://gotbnc.com/cgi-bin/download.cgi. Homepage at [http:// -gotbnc.com/] http://gotbnc.com/. ------------------------------------------------------------------------------ - -8.2. muh - -Maintainer: Sebastian Kienzl () - -IRC Channel: None ? - -muh is a smart and versatile irc-bouncing tool that will also go on IRC as -soon as it's launched, guarding or attempting to get your nick. 
-
-You can get the latest version of muh from [http://ftp1.sourceforge.net/muh/]
-http://ftp1.sourceforge.net/muh/. Homepage at [http://mind.riot.org/muh/]
-http://mind.riot.org/muh/.
------------------------------------------------------------------------------
-
-8.3. ezbounce
-
-Maintainer: Murat Deligönül ()
-
-IRC Channel: None ?
-
-ezbounce's basic features include password protection, remote administration,
-logging and listening on multiple ports.
-
-You can get the latest version of ezbounce from its homepage at [http://
-druglord.freelsd.org/ezbounce/] http://druglord.freelsd.org/ezbounce/.
------------------------------------------------------------------------------
-
-9. Installation
-
-9.1. Clients
-
-All popular clients use GNU autoconf and GNU automake, and thus come with a
-configure script. Read the installation instructions after you unpack the
-sources, and be sure you have the required libraries before you compile.
-The usual procedure is:
-
-______________________________________________________________________
-cd sources; mkdir objdir; cd objdir
-../configure --help
-../configure your_options_here
-make
-make install > ~/sources_install.log      (or make install_strip)
-______________________________________________________________________
-
-Also note that for ircII, EPIC, and BitchX you should really edit
-include/config.h to suit your needs.
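-
-Once a client is installed, the first session looks much the same in all
-the ircII descendants (EPIC, BitchX, irssi; X-Chat accepts the same
-commands in its input line). A minimal session might look like this; the
-nick, server and channel names below are only examples:
-
-______________________________________________________________________
-$ epic mynick                     start the client with a nickname
-/server irc.freenode.net          connect to a server
-/join #linux                      join a channel
-hello, world                      plain text is sent to the channel
-/msg somenick hi there            send a private message
-/nick myothernick                 change your nickname
-/quit bye                         disconnect and exit
-______________________________________________________________________
-----------------------------------------------------------------------------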
-
-9.2. Servers
-
-Do you really need help to set up a server ?
-
-~$ vim ircd.conf
------------------------------------------------------------------------------
-
-10. But what's already included in my distribution ? (Linux on x86)
-
-10.1. Debian
-
-IRC Channel: #Debian on freenode (irc.debian.org -> irc.freenode.net)
-
-[http://www.debian.org/] Debian includes too many IRC tools to list. You can
-find them at the following places:
-
-  * Debian [http://ftp.debian.org/debian/dists/stable/main/binary-i386/]
-    stable.
-
-  * Debian [http://ftp.debian.org/debian/dists/unstable/main/binary-i386/]
-    unstable (hasn't received enough testing yet).
-
-  * Also be sure to check the [http://ftp.debian.org/debian/dists/
-    proposed-updates/] proposed updates. They may contain IRC clients as
-    well.
-
-
------------------------------------------------------------------------------
-10.2. Red Hat
-
-IRC Channel: #RedHat on freenode (irc.redhat.com -> irc.freenode.net)
-
-[http://www.redhat.com/] Red Hat 8.0 includes the following clients:
-
-  * [ftp://ftp.redhat.com/pub/redhat/linux/8.0/en/os/i386/RedHat/RPMS/
-    epic-1.0.1-8.i386.rpm] EPIC 1.0.1.
-
-  * [ftp://ftp.redhat.com/pub/redhat/linux/8.0/en/os/i386/RedHat/RPMS/
-    ksirc-3.0.3-3.i386.rpm] KSirc from KDE Network 3.0.3.
-
-  * [ftp://ftp.redhat.com/pub/redhat/linux/8.0/en/os/i386/RedHat/RPMS/
-    xchat-1.8.10-8.i386.rpm] X-Chat 1.8.10.
-
-
-  * Red Hat Raw Hide (current development)
-
-  * [ftp://rawhide.redhat.com/pub/redhat/linux/rawhide/] ftp://
-    rawhide.redhat.com/pub/redhat/linux/rawhide/. Use at your own risk.
-
-
------------------------------------------------------------------------------
-10.3. Slackware
-
-IRC Channel: #Slackware on freenode
-
-[http://www.slackware.com/] Slackware 8.1 includes the following clients:
-
-  * [ftp://ftp.slackware.com/pub/slackware/slackware-8.1/slackware/n/
-    bitchx-1.0c19-i386-1.tgz] BitchX 1.0c19.
-
-  * [ftp://ftp.slackware.com/pub/slackware/slackware-8.1/slackware/n/
-    epic4-1.0.1-i386-2.tgz] EPIC4 1.0.1.
-
-  * KSirc from [ftp://ftp.slackware.com/pub/slackware/slackware-8.1/slackware
-    /kde/kdenetwork-3.0.1-i386-2.tgz] KDE Network 3.0.1.
-
-  * [ftp://ftp.slackware.com/pub/slackware/slackware-8.1/slackware/gnome/
-    xchat-1.8.9-i386-1.tgz] X-Chat 1.8.9.
-
-
-  * Slackware -current (current development)
-
-  * [ftp://ftp.slackware.com/pub/slackware/slackware-current/] ftp://
-    ftp.slackware.com/pub/slackware/slackware-current/. Use at your own risk.
-
-
------------------------------------------------------------------------------
-11. Hell and Paradise
-
-11.1. Gods (developers)
-
-  * Thanks to all the authors. Without their hard volunteer work I'd never
-    have written this, and we'd never have gotten our hands on Linux or IRC.
-
-
------------------------------------------------------------------------------
-11.2. Saints (contributors)
-
-  * See [http://www.pervalidus.net/documentation/IRC-mini-HOWTO/] http://
-    www.pervalidus.net/documentation/IRC-mini-HOWTO/.
-
-
------------------------------------------------------------------------------
-11.3. Angels (feedback)
-
-  * See above.
-
-
------------------------------------------------------------------------------
-11.4. Devils
-
-  * Khaled Mardam-Bey must be stopped :-)
-
-  * 'If idiots could fly, IRC would be an airport'. I don't know who wrote
-    that, but it makes sense. For those of you using IRC to annoy people I
-    ordered a /kill.
-
-
------------------------------------------------------------------------------
-12. Revision History
-
-  * 20021121 - v0.3, fourth draft
-
-
diff --git a/LDP/guide/docbook/Linux-Networking/PSTN.xml b/LDP/guide/docbook/Linux-Networking/PSTN.xml
deleted file mode 100644
index ba743583..00000000
--- a/LDP/guide/docbook/Linux-Networking/PSTN.xml
+++ /dev/null
@@ -1,26 +0,0 @@
-
-
-PSTN
-
-
-PSTN (Public Switched Telephone Network) is the telephone system that is used
-throughout the U.S. and many other countries. Although never intended for
-data networking, telephone lines can be used for communications between
-computers.
-
-
-
-A modem (modulator/demodulator) is used to interface between a computer and
-the telephone system. Modems convert data into audible tones and back.
-The fastest two-way modems currently available support a speed of 33.6 Kbps
-(kilobits per second).
-
-
-
-Modems advertising speeds of up to 56 Kbps rely on digital equipment being
-used in the phone company's central office and in the facility you are
-dialling into (such as an Internet Service Provider). The 56 Kbps speed also
-works in only one direction, downstream; the upstream direction supports
-33.6 Kbps.
-
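-
-
-As a rough back-of-the-envelope figure (a sketch that ignores compression,
-protocol overhead and line quality, and takes 1 kilobit = 1000 bits):
-
-______________________________________________________________________
-56 Kbps  =  56000 bits/s  /  8  =  7000 bytes/s  ~  6.8 KB/s
-
-So a 1 MB download takes about 1048576 / 7000  ~  150 s  ~  2.5 minutes,
-and the 33.6 Kbps upstream manages only about 4.1 KB/s.
-______________________________________________________________________
-
-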
diff --git a/LDP/guide/docbook/Linux-Networking/Quota.xml b/LDP/guide/docbook/Linux-Networking/Quota.xml
deleted file mode 100644
index 70363141..00000000
--- a/LDP/guide/docbook/Linux-Networking/Quota.xml
+++ /dev/null
@@ -1,451 +0,0 @@
-
-
-Quota
-
-
-This section describes how to enable file system quotas on a Linux
-host, how to assign quotas for users and groups, and the usage of
-miscellaneous quota commands. It is intended for users running kernel
-2.x (most recently tested on kernel 2.4.21).
-
-
-1. What is quota?
-
-1.1. What is quota for?
-
-
-Quota allows you to specify limits on two aspects of disk storage: the
-number of inodes a user or a group of users may possess, and the
-number of disk blocks that may be allocated to a user or a group of
-users.
-
-
-
-The idea behind quota is that users are forced to stay under their
-disk consumption limit, taking away their ability to consume unlimited
-disk space on a system. Quota is handled on a per user, per file
-system basis. If there is more than one file system on which a user is
-expected to create files, then quota must be set up for each file
-system separately. Various tools are available for you to administer
-and automate quota policies on your system.
-
-
-1.2. Current Status of Quota on Linux
-
-
-There have recently been some major changes in the way quota works.
-There are two different setups; the tools work the same, but the files
-they use differ. This document describes the setup and operation of
-the _new_ quota setup. As the new quota code is not in the regular
-kernel source, this setup needs some patching. We will describe this
-patching and the installation of the linuxquota package. If you
-already have the quota software installed on your system, you may or
-may not have to install this patch and package. You can email me if
-you have any questions about this. I'll try to include an overview of
-Linux distributions and the implications for each in a later version
-of this document.
-
-
-2. Requirements for quota
-
-2.1. Kernel
-
-
-The 2.x kernel source is available from http://www.kernel.org
-. Please use an available mirror close to your
-location to save bandwidth. If you have a recent version of tar, you
-can download the .bz2 compressed file.
-
-
-
-Untar the kernel:
-
-
-
-
-______________________________________________________________________
-cd /usr/src
-tar jxvf /path/to/linux-2.4.21.tar.bz2    - for a bzip2 kernel
-
-tar zxvf /path/to/linux-2.4.21.tar.gz     - for a gzip kernel
-
-ln -s /usr/src/linux-2.4.21 /usr/src/linux
-______________________________________________________________________
-
-
-
-2.2. Quota software
-
-
-Depending on the Linux distribution you have, you may or may not have
-the quota software installed on your system. The most recent version
-of quota is available through SourceForge and is in active
-development. You can reach the homepage of quota development at
-http://www.sourceforge.net/projects/linuxquota
-.
-
-
-3. Quota setup: installation and configuration
-
-3.1. Patch the kernel
-
-
-Download the patch for your kernel at:
-
-
-
-ftp://atrey.karlin.mff.cuni.cz/pub/local/jack/quota/
-.
-
-
-
-Choose your kernel version and download the patch(es). Patch your
-kernel with the 'patch' command. If there is more than one patch for
-your kernel version, be sure to apply the patches in the correct
-order.
-
-
-
-You can use this script (it assumes the downloaded patches are in
-/tmp/quota/ and the kernel has been untarred to /usr/src/linux):
-
-
-
-
-______________________________________________________________________
-#!/bin/sh
-
-gunzip /tmp/quota/*.gz
-cd /usr/src/linux
-COUNT=`ls -1 /tmp/quota/*.diff | wc -l`
-for I in `seq 1 $COUNT`
-do
-    patch -p1 < /tmp/quota/quota-2.4.21-$I-*.diff
-done
-______________________________________________________________________
-
-
-
-3.2. Reconfigure your kernel
-
-
-Reconfigure your kernel and add quota support.
-
-
-
-Via `make menuconfig` or `make xconfig` you will find the option to
-enable quota support under the Filesystems menu. You can specify extra
-options if you need them, like 32-bit UID support.
-
-
-
-Save the configuration and compile the kernel. Make sure the new
-kernel will be used when rebooting the system.
-
-
-3.3. Compile and install the quota software
-
-
-To be able to use all the features of the new quota system, you'll
-probably need to download the new quota package. Download the new
-quota software via the URL provided above.
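-
-
-
-Before building the tools, it does no harm to verify that the kernel you
-just rebuilt in section 3.2 really has quota support compiled in. A quick
-sanity check (assuming your kernel source lives in /usr/src/linux):
-
-
-______________________________________________________________________
-grep CONFIG_QUOTA /usr/src/linux/.config   - should print CONFIG_QUOTA=y
-______________________________________________________________________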
-
-
-
-When downloaded do:
-
-
-
-
-______________________________________________________________________
-$ gzip -dc quota-*.tar.gz | tar xvf -
-$ cd quota-tools (or whatever directory the software was unpacked into)
-$ ./configure
-$ make
-$ su
-# make install
-______________________________________________________________________
-
-
-
-
-3.4. Modify your system init script to check quota and turn
-quota on at boot
-
-
-
-Here's an example:
-
-
-
-
-______________________________________________________________________
-# Check quota and then turn quota on.
-if [ -x /usr/sbin/quotacheck ]
-then
-    echo "Checking quotas. This may take some time."
-    /usr/sbin/quotacheck -avug
-    echo " Done."
-fi
-if [ -x /usr/sbin/quotaon ]
-then
-    echo "Turning on quota."
-    /usr/sbin/quotaon -avug
-fi
-______________________________________________________________________
-
-
-
-
-The golden rule: always turn quota on after the file systems in
-/etc/fstab have been mounted, otherwise quota will fail to work. I
-recommend turning quota on right after the part of your system init
-script where the file systems are mounted.
-
-
-3.5. Modify /etc/fstab
-
-
-Partitions on which you have not yet enabled quota normally look
-something like:
-
-
-
-
-
-______________________________________________________________________
-/dev/hda1 / ext2 defaults 1 1
-/dev/hda2 /usr ext2 defaults 1 1
-______________________________________________________________________
-
-
-
-
-To enable user quota support on a file system, add "usrquota" to the
-fourth field, which contains the word "defaults" (see man fstab for
-details).
-
-
-
-
-______________________________________________________________________
-/dev/hda1 / ext2 defaults 1 1
-/dev/hda2 /usr ext2 defaults,usrquota 1 1
-______________________________________________________________________
-
-
-
-
-Replace "usrquota" with "grpquota", should you need group quota
-support on a file system.
-
-
-
-
-______________________________________________________________________
-/dev/hda1 / ext2 defaults 1 1
-/dev/hda2 /usr ext2 defaults,grpquota 1 1
-______________________________________________________________________
-
-
-
-
-Need both user quota and group quota support on a file system?
-
-
-
-
-______________________________________________________________________
-/dev/hda1 / ext2 defaults 1 1
-/dev/hda2 /usr ext2 defaults,usrquota,grpquota 1 1
-______________________________________________________________________
-
-
-
-3.6. Activate the quota system
-
-
-To activate the quota software, reboot the system for the changes you
-have made to take effect. The new kernel with quota support will be
-loaded and the startup scripts you've just created will be executed.
-On the first run, quotacheck will generate the appropriate files to
-maintain the quota databases.
-
-
-3.7. Add quotacheck to crontab
-
-
-Even though quota is checked at boot, it helps to run quotacheck
-periodically, e.g. weekly. Add the following line to root's crontab:
-
-
-
-
-______________________________________________________________________
-0 3 * * 0 /sbin/quotacheck -avug
-______________________________________________________________________
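-
-
-
-If you would rather not reboot for section 3.6, you can usually bring
-quota up by hand once a quota-enabled kernel is already running. A
-sketch, assuming /usr is the file system you just added "usrquota" to
-in /etc/fstab:
-
-
-______________________________________________________________________
-# mount -o remount /usr    - re-read the new options from /etc/fstab
-# quotacheck -vug /usr     - build the quota files for this file system
-# quotaon -vug /usr        - turn quota on without a reboot
-______________________________________________________________________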
-
-
-
-4. Quota setup: tools
-
-
-Quota assignment is performed with the edquota command (`man edquota`
-for details).
-
-
-4.1. Assigning quota for a particular user
-
-
-Here's an example. I have a user with the login id bob on my system.
-The command "edquota -u bob" takes me into vi (or the editor specified
-in my $EDITOR environment variable) to edit quota for user bob on each
-partition that has quota enabled:
-
-
-
-
-______________________________________________________________________
-Quotas for user bob:
-/dev/hda3: blocks in use: 2594, limits (soft = 5000, hard = 6500)
-           inodes in use: 356, limits (soft = 1000, hard = 1500)
-______________________________________________________________________
-
-
-
-
-"blocks in use" is the total number of blocks (in kilobytes) a user
-has consumed on a partition.
-
-
-
-"inodes in use" is the total number of inodes a user has consumed on a
-partition.
-
-
-4.2. Assigning quota for a particular group
-
-
-Now I have a group games on my system. "edquota -g games" takes me
-into the vi editor again to edit quota for the group games:
-
-
-
-
-______________________________________________________________________
-Quotas for group games:
-/dev/hda4: blocks in use: 5799, limits (soft = 8000, hard = 10000)
-           inodes in use: 1454, limits (soft = 3000, hard = 4000)
-______________________________________________________________________
-
-
-
-4.3. Assigning quota for a bunch of users with the same value
-
-
-To rapidly set quotas for, say, 100 users on my system to the same
-value as my user bob, I would first edit bob's quota information by
-hand, then execute:
-
-
-
-
-______________________________________________________________________
-edquota -p bob `awk -F: '$3 > 499 {print $1}' /etc/passwd`
-______________________________________________________________________
-
-
-
-
-assuming that you assign your user UIDs starting at 500 (the backquote
-substitution works in both sh- and csh-style shells).
-
-
-
-In addition to edquota, there are three terms which you should
-familiarize yourself with: Soft Limit, Hard Limit, and Grace Period.
-
-
-4.4. Soft Limit
-
-
-The _soft limit_ indicates the maximum amount of disk usage a quota
-user may have on a partition. When combined with a grace period, it
-acts as the border line, and a quota user is issued warnings about an
-impending quota violation once he passes it.
-
-
-4.5. Hard Limit
-
-
-The hard limit works only when a grace period is set. It specifies the
-absolute limit on disk usage: a quota user cannot go beyond his hard
-limit.
-
-
-4.6. Grace Period
-
-
-Set with the command "edquota -t", the grace period is a time limit
-before the soft limit is enforced for a file system with quota
-enabled. Time units of sec(onds), min(utes), hour(s), day(s), week(s),
-and month(s) can be used. This is what you'll see with the command
-"edquota -t":
-
-
-
-
-______________________________________________________________________
-Time units may be: days, hours, minutes, or seconds
-Grace period before enforcing soft limits for users:
-/dev/hda2: block grace period: 0 days, file grace period: 0 days
-______________________________________________________________________
-
-
-
-
-Change the 0 days part to any length of time you feel reasonable. I
-personally would choose 7 days (or 1 week).
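-
-
-edquota is interactive; for scripted setups the quota package also
-ships a setquota command that takes the limits on the command line.
-A sketch equivalent to the bob example above (check setquota(8) for
-the exact syntax of your version):
-
-
-______________________________________________________________________
-# usage: setquota -u user block-soft block-hard inode-soft inode-hard fs
-setquota -u bob 5000 6500 1000 1500 /dev/hda3
-______________________________________________________________________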
-
-
-5. Miscellaneous Quota Commands
-
-5.1. Quotacheck
-
-
-Quotacheck is used to scan a file system for disk usage, and updates
-the quota record file "aquota.user" to the most recent state. I
-recommend running quotacheck at system bootup, and periodically via a
-cron job (say, every week).
-
-
-5.2. Repquota
-
-
-Repquota produces summarized quota information for a file system.
-Here is a sample of the output repquota gives:
-
-
-
-
-______________________________________________________________________
-# repquota -a
-                  Block limits               File limits
-User        used   soft   hard  grace    used  soft  hard  grace
-root  --  175419      0      0          14679     0     0
-bin   --   18000      0      0            735     0     0
-uucp  --     729      0      0             23     0     0
-man   --      57      0      0             10     0     0
-user1 --   13046  15360  19200            806  1500  2250
-user2 --    2838   5120   6400            377  1000  1500
-______________________________________________________________________
-
-
-
-5.3. Quotaon and Quotaoff
-
-
-Quotaon is used to turn quota accounting on; quotaoff turns it off.
-The two programs are very similar, and they are executed at system
-startup and shutdown respectively.
-
-
-
diff --git a/LDP/guide/docbook/Linux-Networking/RAID.xml b/LDP/guide/docbook/Linux-Networking/RAID.xml
deleted file mode 100644
index 930b1f99..00000000
--- a/LDP/guide/docbook/Linux-Networking/RAID.xml
+++ /dev/null
@@ -1,8329 +0,0 @@
-
-
-RAID
-
-
-RAID, short for Redundant Array of Inexpensive Disks, is a method
-whereby information is spread across several disks, using techniques
-such as disk striping (RAID level 0) and disk mirroring (RAID level 1)
-to achieve redundancy, lower latency and/or higher bandwidth for
-reading and/or writing, and recoverability from hard-disk crashes.
-Over six different types of RAID configurations have been defined.
-There are three types of RAID solution options available to Linux
-users: software RAID, outboard DASD boxes, and RAID disk controllers.
-
-· Software RAID: Pure software RAID implements the various RAID
-  levels in the kernel disk (block device) code.
-· Outboard DASD Solutions: DASDs (Direct Access Storage Devices) are
-  separate boxes that come with their own power supply, provide a
-  cabinet/chassis for holding the hard drives, and appear to Linux as
-  just another SCSI device. In many ways, these offer the most robust
-  RAID solution.
-· RAID Disk Controllers: Disk controllers are adapter cards that plug
-  into the ISA/EISA/PCI bus. Just like regular disk controller cards,
-  a cable attaches them to the disk drives. Unlike regular disk
-  controllers, the RAID controllers implement RAID on the card
-  itself, performing all necessary operations to provide various RAID
-  levels.
-
-Related HOWTOs:
-
-· http://metalab.unc.edu/mdw/HOWTO/mini/DPT-Hardware-RAID.html
-· http://metalab.unc.edu/mdw/HOWTO/Root-RAID-HOWTO.html
-· http://metalab.unc.edu/mdw/HOWTO/Software-RAID-HOWTO.html
-
-RAID at linas.org:
-
-· http://linas.org/linux/raid.html
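-
-The two basic techniques look like this (a simplified sketch; the
-numbers show where consecutive data blocks end up):
-
-     RAID 0 (striping)            RAID 1 (mirroring)
-     disk A   disk B              disk A   disk B
-       1        2                   1        1
-       3        4                   2        2
-       5        6                   3        3
-
-RAID 5 stripes like RAID 0 but also distributes parity blocks across
-the member disks, so any single drive can fail without data loss.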
-
-
-  Root RAID HOWTO cookbook
-  Michael A. Robinton, michael@bzs.org
-
-  v1.13, July 17, 2000
-
-  This document only applies to the OLD raidtools, versions 0.50 and
-  under. The workarounds and solutions addressed in this write-up have
-  largely been made obsolete by the vast improvement in the 0.90
-  raidtools and the accompanying kernel patch to the 2.0.37, 2.2.x and
-  2.3.x series kernels. You may still find the detailed descriptions
-  useful, particularly if you plan to run root raid or use initrd. Check
-  these links for a reference to the set up of ``Boot Root Raid using
-  conventional LILO'' and accompanying initrd working scripts. What
-  follows is the description of the now OBSOLETE Root RAID HOWTO. This
-  document was originally written to provide a cookbook for creating a
-  root mounted raid filesystem and a companion fallback rescue system
-  using linux initrd. There are complete step-by-step instructions for
-  both raid1 and raid5 md0 devices. Each step is accompanied by an
-  explanation of its purpose. Included with this revision is a generic
-  linuxrc initrd file which may be configured with a single three-line
-  ``/etc/raidboot.conf'' file for raid1 and raid5 configurations.
-  ______________________________________________________________________
-
-  Table of Contents
-
-
-
-  1. Introduction
-
-     1.1 Where to get Up-to-date copies of this document.
-     1.2 More up-to-date Boot Root Raid with LILO howto
-     1.3 Bugs
-     1.4 Acknowledgements
-     1.5 Copyright Notice
-
-  2. What you need BEFORE YOU START
-
-     2.1 Required Packages
-     2.2 Other similar implementations.
-     2.3 Documentation -- Recommended Reading
-     2.4 RAID resources
-
-  3. Quick Start for ROOT RAID
-
-  4. initrd Cookbook for root mounted RAID
-
-     4.1 Security Reminder
-     4.2 Build the Kernel and Raid Tools
-     4.3 Build the initrd Rescue and Boot filesystem
-     4.4 Start the STEP by STEP instructions
-     4.5 Install the distribution - Slackware Specific
-     4.6 Install linux pthreads
-     4.7 Install Raid Tools
-     4.8 Remove un-needed directories and files from new filesystem.
-     4.9 Create /dev/mdx
-     4.10 Create a bare filesystem suitable for initrd
-     4.10.1 Create the BOOT/RESCUE initrd filesystem
-     4.10.2 Corrections for the Rescue System
-     4.11 Making 'initrd' boot the RAID device - linuxrc
-     4.12 Modifying the rc-scripts for SHUTDOWN
-     4.13 Configuring RAIDBOOT - raidboot.conf
-     4.14 Kernel 'loadlin and lilo' variables for RESCUE and RAID
-
-  5. Configuring the Production RAID system.
-
-     5.1 System specs. Two systems with identical motherboards were configured.
-     5.2 Partitioning the hard drives.
-
-  6. Building the RAID file system.
-
-     6.1 /etc/raid5.conf
-     6.2 /etc/raid1.conf
-     6.3 Step by Step procedures for building production RAID file system.
-
-  7. One last thought.
-
-  8. Appendix A. - Bohumil Chalupa's md0 shutdown
-
-  9. Appendix B. - Sample SHUTDOWN scripts
-
-     9.1 Slackware - /etc/rc.d/rc.6
-     9.2 Debian bo - /etc/init.d/halt and /etc/init.d/reboot
-     9.2.1 /etc/init.d/halt
-     9.2.2 /etc/init.d/reboot
-
-  10.
Appendix C. - other setup files - - 10.1 linuxrc - 10.2 loadlin -- linux.bat file - boot.par - 10.3 linuxthreads Makefile.diff - 10.4 raid1.conf - 10.5 raid5.conf - 10.6 raidboot.conf - 10.7 rc.raidown - - 11. Appendix D. - obsolete linuxrc and shutdown scripts - - 11.1 Obsolete working - linuxrc - 11.2 Obsolete working - shutdown scripts - - 12. Appendix E. - Gadi's raid stop patch for the linux kernel - - 13. Appendix F. - rc.raidown - - 14. Appendix G. - linuxrc theory of operation - - 15. Appendix H. Setting up ROOT RAID on RedHat - - - - ______________________________________________________________________ - - 1. Introduction - - The reader is assumed to be familiar with the various types of raid - implementations, their advantages and drawbacks. This is not a - tutorial, just a set of instructions on how to implement root mounted - raid on a linux system. All of the information necessary to become - familiar with linux raid is listed here directly or by reference, - please read it before send e-mail questions. - - - 1.1. Where to get Up-to-date copies of this document. - - Click here to browse the author's latest version - of this - document. Corrections and suggestions welcome! - - Root-RAID-HOWTO -- OBSOLETE - - Available in LaTeX (for DVI and PostScript), plain text, and HTML. - - http://www.linuxdoc.org/HOWTO/Root-RAID-HOWTO.html - - - - Available in SGML and HTML. - - ftp.bizsystems.net/pub/raid/ - - - - 1.2. More up-to-date Boot Root Raid with LILO howto - - Available in LaTeX (for DVI and PostScript), plain text, and HTML. - - http://www.linuxdoc.org/HOWTO/Boot+Root+Raid+LILO.html - - - - Available in SGML and HTML. - - ftp.bizsystems.net/pub/raid/ - - - - 1.3. Bugs - - As of this writing, the problem of stopping a root mounted RAID device - has not yet been solved in a satisfactory way. A work-around proposed - by Ed Welbon and implemented by Bohumil Chalupa is incorporated into - this document which eliminates the need for a long ckraid at each boot - for raid1 and raid5 devices. Without the workaround, it is necessary - to ckraid the md device each time the system is re-booted. On a large - array this can cause a severe availability performance degradation. - On my 6 gig RAID1 device running on a Pentium 166 with 128 megs of - ram, it takes well over half an hour to ckraid :-( after each re-boot. - It takes over an hour on my 13 gig RAID5 array with a 20mb/sec scsi - adaptor. - - The workaround stores the status of the array at shutdown on the real - boot device and compares it to a reference status placed there when - the system is first built. If the status's match at reboot, the - superblock on the array is rebuilt on the next boot, otherwise the - operator is notified of the status error and the rescue system is left - running with all the raid tools available. - - Rebuilding the superblock causes the system to ignore that the array - was powered down without mdstop by marking all the drives as OK, as if - nothing happened. This only works if all the drives are OK at - shutdown. If the array was operating with a bad drive, the operator - must remove the bad drive prior to restarting the md device or the - data can be corrupted. - - None of this applies to raid0 which does not have to be mdstopped - before shutdown. - - Final proposed solutions to this problem include a finalrd similar to - initrd, and mdrootstop which writes the clean flags to the array - during shutdown when it is mounted read only. I am sure there are - others. 
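-
-  In script form the workaround boils down to something like the
-  following sketch (condensed from the linuxrc and shutdown scripts
-  shown later in this document; the /mnt_point/raidboot paths follow
-  the examples used there):
-
-       # at shutdown, after the array is remounted read-only:
-       cat /proc/mdstat | grep md0 > /mnt_point/raidboot/raidstat.ro
-
-       # at the next boot, inside 'linuxrc':
-       if cmp -s /mnt_point/raidboot/raidstat.ro \
-                 /mnt_point/raidboot/raidgood.ref
-       then
-           # status matches the reference: mark the drives clean
-           # and start the array
-           mkraid -f --only-superblock /etc/raid5.conf
-           mdadd -ar
-       else
-           echo "raid status mismatch - leaving the rescue system running"
-       fi
-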
- - In the mean time, the problem has been by-passed for now Please let me - know when this problem is solved more cleanly!!! - - - 1.4. Acknowledgements - - The writings and e-mail from the following individuals helped to make - this document possible. Many of the ideas were stolen from the - helpful work of others, I have just tried to put it all in COOKBOOK - form so that it is straightforward to use. My thanks to: - - · Linas Vepstas - for the RAID howto that explained most of this to me. - - · Gadi Oxman - for answering my dumb 'newbie' questions. - - · Ed Welbon - for the execellent initrd.md package that inspired me to write - this. - - · Bohumil Chalupa for - implementing the re-boot 'workaround' that allows root-mounted-raid - to work in a production environment. - - · Keith W. for his explaination of - setting up root raid with RedHat. - - - - · and many others who contributed to this work in one way or another. - - - 1.5. Copyright Notice - - This document is GNU copyleft by Michael Robinton michael@bzs.org - . - - Permission to use, copy, distribute this document for any purpose is - hereby granted, provided that the author's / editor's name and this - notice appear in all copies and/or supporting documents; and that an - unmodified version of this document is made freely available. This - document is distributed in the hope that it will be useful, but - WITHOUT ANY WARRANTY, either expressed or implied. While every effort - has been taken to ensure the accuracy of the information documented - herein, the author / editor / maintainer assumes NO RESPONSIBILITY for - any errors, or for any damages, direct or consequential, as a result - of the use of the information documented herein. - - - 2. What you need BEFORE YOU START - - The packages you need and the documentation that answers the most - common questions about setting up and running raid are listed below. - Please review them throughly. - - - 2.1. Required Packages - - You need to obtain the most recent versions of these packages. - - · a linux kernel that supports raid, initrd and /dev/loopx - - I used linux-2.0.33 from sunsite - - - · raid145-971022-2.0.31 - patch adds support - for raid1/4/5 - - · raidtools-pre3-0.42 - tools to create and maintain raid devices (documentation too). - - · ``Gadi's raid stop patch'' in Appendix E. - - · linuxthreads-0.71 - required - threads package. Use ftp, browser doesn't work - ftp.inria.fr/INRIA/Projects/cristal/Xavier.Leroy - - · A Linux distribution, ready to install. - - I used Slackware-3.4 - - - Helpful but not required - - · raidboot-0.01.tar.gz pre-built - raid rescue/boot system. - - The detailed instructions in this document are based on the above - packages. If the packages have been updated or you use a different - linux distribution, you may have to modify the procedures you find - here. - - The patches, tool assortment, etc... may vary with 2.1 kernels. - Please check the most recent documentation at: - - - ftp.kernel.org/pub/linux/daemons/raid/ - - - - 2.2. Other similar implementations. - - I chose to include in the kernel all of the pieces necessary to run - from boot without loading any modules. My kernel image is a little - over 300k compressed. - - Take a look at Ed Welbon's initrd.md.tar.gz - for another way to make a bootable raid device. He uses loadable - modules. A look at his concise scripts will show you how it is done - if you need a very small kernel with modules. - - - http://www.realtime.net/~welbon/initrd.md.tar.gz - - - - - 2.3. 
Documentation -- Recommended Reading - - Please read: - - /usr/src/linux/Documentation/initrd.txt - - - - as well as the documentation and man pages that accompany the - raidtools set. In particular, read man mdadd as well as the - QuickStart.RAID document included in the raidtools package. - - - You may also wish to review: - - · BootPrompt-HOWTO - - · man lilo - - · man lilo.conf - - - 2.4. RAID resources - - - · www.linas.org/linux/Software-RAID/Software-RAID.html - - - · www.ssc.com/lg/issue17/raid.html - - - · linas.org/linux/raid.html - - · ftp.kernel.org/pub/linux/daemons/raid/ - - - · www.realtime.net/~welbon/initrd.md.tar.gz - - · luthien.nuclecu.unam.mx/~miguel/raid/ - - - Mailing lists can be joined at: - - · majordomo@nuclecu.unam.mx send a - message to subscribe raiddev - - send mail to: raiddev@nuclecu.unam.mx - - - · majordomo@vger.rutgers.edu send - a message to subscribe linux-raid - - send mail to: linux-raid@vger.rutgers.edu (this seems to be the most active list) - - - 3. Quick Start for ROOT RAID - - If you use RedHat, see the ``Howto set up RedHat'' section in Appendix - H. I have not tried this. If you use it successfully, please let me - know so I can update this document. - - If you don't want to try and build and debug the rescue system, you - can get a generic one created from Slackware-3.4 from: - - ftp.bizsystems.com/pub/raid/raidboot-0.01.tar.gz - - - - Perform the following steps: - - · Compile the raid enabled kernel with built in support for your disk - subsystem - - · Test that the raid array will configure and mount correctly - - · Build your OS on the raid system - - · Correct the entries in fstab to show /dev/md0 as the root device. - Make sure that the partition(s) you use for booting are included in - fstab. - - · Modify your shutdown halt and reboot script(s) (mine is - /etc/rc.d/rc.6) as shown in ``Modifying the rc-scripts for - SHUTDOWN'' - - · Copy the following from you development filesystem to the rescue - system AND the new raid system - - - cd /root/raidboot - mkdir mnt - gzip -d rescue.clean - losetup /dev/loop0 rescue.clean - mount /dev/loop0 mnt - - copy these files - - cp -p /etc/* mnt/etc - cp -p /etc/rc.d/* mnt/etc/rc.d - {or as appropriate for your system} - cp -a /lib/modules/* mnt/lib/modules - - - Some Linux distributions include a test for the ro/rw status of the - root file system. The rc startup files need to have this test removed - for the initrd rescue system. See the instructions in the section on - ``Correctons for Rescue System''. - - Correct the entries in fstab to show /dev/md0 as the root device. Make - sure that the partition(s) you use for booting is included in fstab. - - - Create /etc/raidboot.conf which describes the raid boot configuration. - This file may NOT contain comments in the first three lines, after - that it doesn't matter. - - raidboot.conf - - /dev/sda1 /dev/sda2 - raidboot - raid5.conf - # comments may only be placed 'after' the three - # configuration lines. - # - # This is '/etc/raidboot.conf' - # - # line one, the partition(s) containing the 'initrd' raid-rescue system - # It is not necessary to boot from these partitions, however, - # since the rescue system will not fit on floppy, it is necessary - # to know which partitions are to be used to load the rescue system - # - # line two, the path to the raidboot config information - # Where the shutdown status, etc... 
is located at boot time - # It does NOT include the mount point information, only 'path' - # /mntpoint/'path' - # - # line -3-, name of the raid configuration file - # Current raid configuration file i.e. raid1.conf, raid5.conf - - - A few more things to do and the raid systems is ready to boot. - - Create ``rc.raidown'', as described in Appendix F, and copy it to - /etc/rc.d on the rescue, development, and raid system. Unmount the - rescue system and zip it. - - umount mnt - losetup -d /dev/loop0 - mv rescue.clean rescue - gzip rescue - - - Copy the rescue file to the raidboot partitions. - - cp rescue.gz /mnt_point(1)/raidboot - cp rescue.gz /mnt_point(2)/raidboot - - - Activate the raid array. - - mdadd -ar - - - Save the good reference status to the raidboot partition - - cat /proc/mdstat | grep md0 > /mnt_point(1)/raidboot/raidgood.ref - cat /proc/mdstat | grep md0 > /mnt_point(1)/raidboot/raidgood.ref - - - Lastly, configure the boot program as outlined in ``Boot Time Configu­ - ration Parameters'' and reboot your system onto the raid array. - - - - 4. initrd Cookbook for root mounted RAID - - This is the procedure to make an 'initrd' ramdisk with rescue tools - for raid. - - Specifically, this document referrs to RAID1 and RAID5 - implementations. - - 4.1. Security Reminder - - The rescue file system may be used stand alone. Should your raid array - fail to mount, you are left with the rescue system mounted and - running. TAKE THE APPROPRIATE SECURITY PRECAUTIONS!!! - - - 4.2. Build the Kernel and Raid Tools - - The first thing that must be done is to patch and build your kernel - and become familiar with the raid tools. Make sure and include - ``Gadi's raid stop patch'' in Appendix E. Configure, mount and test - your raid device(s). The details of how to do this are included in the - raidtools package and briefly reviewed later in this document. - - - 4.3. Build the initrd Rescue and Boot filesystem - - I used the Slackware-3.4 distribution to build both the Rescue/Boot - filesystem and the filesystem for the production machine. Any linux - distribution should work fine. If you use a different distribution, - review the Slackware specific portion of this procedure and modify it - to suit your needs. - - - I use loadlin to boot the kernel image and ramdisk from a dos - partition simply because there are oddball devices in my system that - have dos configuration software. Lilo will work just as well and a - small linux partition can be used instead containing only the - raid/boot files and the lilo record. - - For the raid boot/rescue system, I chose to create a minimum ramdisk - system using the Slackware 'setup' script followed by installing the - 'linuxthreads' package and 'raidtools' over the clean Slackware - installation on my ramdisk. I used the identical procedure to build - the production system. So the rescue and production systems are very - similar. - - This installation process gives me a 'bare' system (save a copy of the - file) to which I overlay - - - /lib/modules/2.x.x...... - /etc .... with a modified fstab, mdtab, raidX.conf, raidboot.conf - /etc/rc.d - /dev/md* - - - - from my current system to customize it for the particular kernel and - machine that it is/will-be running on. - - This makes the boot/rescue system the same system that is running on - the root mounted raid device, just skinnyed down a bit, while allowing - the library, etc... revisions to always be current. - - - 4.4. 
Start the STEP by STEP instructions - - From the root home directory (/root): - - - cd /root - mkdir raidboot - cd raidboot - - - - Create a mountpoints to work on - - - mkdir mnt - mkdir mnt2 - - - - Make a file large enough to do the file system install. This will be a - lot larger than the final rescue file system. I chose 24 megs since - 16 megs is not large enough - - dd if=/dev/zero of=build bs=1024k count=24 - - - associate the file with a loop device and generate an ext2 file system - on the file - - - losetup /dev/loop0 build - mke2fs -v -m0 -L initrd /dev/loop0 - mount /dev/loop0 mnt - - - - 4.5. Install the distribution - Slackware Specific - - ``...skip Slackware Specific stuff'' and go to next section. - - Now that an empty filesystem is created and mounted, run "setup". - - - Specify /root/raidboot/mnt - - - - as the 'target'. The source is whatever you normally install from. - Select the packages you wish to install and proceed but DO NOT - configure. - - Choose 'EXPERT' prompting mode. - - I chose 'A', 'AP, and 'N' installing only the minimum to run the - system plus an editor I am familiar with (vi, jed, joe) that is - reasonably compact. - - - - lqqqqqqqq SELECTING PACKAGES FROM SERIES A (BASE LINUX SYSTEM) qqqqqqqqk - x lqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqk x - x x [X] aaa_base Basic filesystem, shell, and utils - REQUIRED x x - x x [X] bash GNU bash-1.14.7 shell - REQUIRED x x - x x [X] devs Device files found in /dev - REQUIRED x x - x x [X] etc System config files & utilities - REQUIRED x x - x x [X] shadow Shadow password suite - REQUIRED x x - x x [ ] ide Linux 2.0.30 no SCSI (YOU NEED 1 KERNEL) x x - x x [ ] scsi Linux 2.0.30 with SCSI (YOU NEED 1 KERNEL) x x - x x [ ] modules Modular Linux device drivers x x - x x [ ] scsimods Loadable SCSI device drivers x x - x x [X] hdsetup Slackware setup scripts - REQUIRED x x - x x [ ] lilo Boots Linux (not UMSDOS), DOS, OS/2, etc. x x - x x [ ] bsdlpr BSD lpr - printer spooling system x x - x x [ ] loadlin Boots Linux (UMSDOS too!) from MS-DOS x x - x x [ ] pnp Plug'n'Play configuration tool x x - x x [ ] umsprogs Utilities needed to use the UMSDOS filesystem x x - x x [X] sysvinit System V-like INIT programs - REQUIRED x x - x x [X] bin GNU fileutils 3.12, elvis, etc. 
- REQUIRED x x - x x [X] ldso Dynamic linker/loader - REQUIRED x x - x x [ ] ibcs2 Runs SCO/SysVr4 binaries x x - x x [X] less A text pager utility - REQUIRED x x - x x [ ] pcmcia PCMCIA card services support x x - x x [ ] getty Getty_ps 2.0.7e - OPTIONAL x x - x x [X] gzip The GNU zip compression - REQUIRED x x - x x [X] ps Displays process info - REQUIRED x x - x x [X] aoutlibs a.out shared libs - RECOMMENDED x x - x x [X] elflibs The ELF shared C libraries - REQUIRED x x - x x [X] util Util-linux utilities - REQUIRED x x - x x [ ] minicom Serial transfer and modem comm package x x - x x [ ] cpio The GNU cpio backup/archiving utility x x - x x [X] e2fsbn Utilities for the ext2 file system x x - x x [X] find GNU findutils 4.1 x x - x x [X] grep GNU grep 2.0 x x - x x [ ] kbd Change keyboard mappings x x - x x [X] gpm Cut and paste text with your mouse x x - x x [X] sh_utils GNU sh-utils 1.16 - REQUIRED x x - x x [X] sysklogd Logs system and kernel messages x x - x x [X] tar GNU tar 1.12 - REQUIRED x x - x x [ ] tcsh Extended C shell version 6.07 x x - x x [X] txtutils GNU textutils-1.22 - REQUIRED x x - x x [ ] zoneinfo Configures your time zone x x - x mqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqj x - - - From the 'AP series, I use only 'JOE', and editor I like, and 'MC' a - small and useful file management tool. You choose the utilities you - will need on your system. - - - - lqqqqqqqqq SELECTING PACKAGES FROM SERIES AP (APPLICATIONS) qqqqqqqqqk - x x [ ] ispell The International version of ispell x x - x x [ ] jove Jonathan's Own Version of Emacs text editor x x - x x [ ] manpgs More man pages (online documentation) x x - x x [ ] diff GNU diffutils x x - x x [ ] sudo Allow special users limited root access x x - x x [ ] ghostscr GNU Ghostscript version 3.33 x x - x x [ ] gsfonts1 Ghostscript fonts (part one) x x - x x [ ] gsfonts2 Ghostscript fonts (part two) x x - x x [ ] gsfonts3 Ghostscript fonts (part three) x x - x x [ ] jed JED programmer's editor x x - x x [X] joe joe text editor, version 2.8 x x - x x [ ] jpeg JPEG image compression utilities x x - x x [ ] bc GNU bc - arbitrary precision math language x x - x x [ ] workbone a text-based audio CD player x x - x x [X] mc The Midnight Commander file manager x x - x x [ ] mt_st mt ported from BSD - controls tape drive x x - x x [ ] groff GNU troff document formatting system x x - x x [ ] quota User disk quota utilities x x - x x [ ] sc The 'sc' spreadsheet x x - x x [ ] texinfo GNU texinfo documentation system x x - x x [ ] vim Improved vi clone x x - x x [ ] ash A small /bin/sh type shell - 62K x x - x x [ ] zsh Zsh - a custom *nix shell x x - x mqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqj x - - - From the 'N' package I only loaded TCPIP. This isn't really neces­ - sary, but is very handy and allows access to the network while working - on a repair or update with the root raid array dismounted. TCPIP also - contains 'biff' which is used by some of the applications in 'A'. If - you don't install 'N' you might want to install the biff package any­ - way. 
- - lqqqq SELECTING PACKAGES FROM SERIES N (NETWORK/NEWS/MAIL/UUCP) qqqqqk - x lqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqk x - x x [ ] apache Apache WWW (HTTP) server x x - x x [ ] procmail Mail delivery/filtering utility x x - x x [ ] dip Handles SLIP/CSLIP connections x x - x x [ ] ppp Point-to-point protocol x x - x x [ ] mailx The mailx mailer x x - x x [X] tcpip TCP/IP networking programs x x - x x [ ] bind Berkeley Internet Name Domain server x x - x x [ ] rdist Remote file distribution utility x x - x x [ ] lynx Text-based World Wide Web browser x x - x x [ ] uucp Taylor UUCP 1.06.1 with HDB && Taylor configs x x - x x [ ] elm Menu-driven user mail program x x - x x [ ] pine Pine menu-driven mail program x x - x x [ ] sendmail The sendmail mail transport agent x x - x x [ ] metamail Metamail multimedia mail extensions x x - x x [ ] smailcfg Extra configuration files for sendmail x x - x x [ ] cnews Spools and transmits Usenet news x x - x x [ ] inn InterNetNews news transport system x x - x x [ ] tin The 'tin' news reader (local or NNTP) x x - x x [ ] trn 'trn' for /var/spool/news x x - x x [ ] trn-nntp 'trn' for NNTP (install 1 'trn' maximum) x x - x x [ ] nn-spool 'nn' for /var/spool/news x x - x x [ ] nn-nntp 'nn' for NNTP (install 1 'nn' maximum) x x - x x [ ] netpipes Network pipe utilities x x - x mqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqj x - - - With the installation complete, say no to everything else (no to all - configuration requests) and exit the script. - - - 4.6. Install linux pthreads - - Now you must install the 'linuxthreads-0.71' library. I have included - this diff for the linuxthreads Makefile rather than explain the - details of the installation by hand. Save the original Makefile, - apply the diff and then: - - - cd /usr/src/linuxthreads-0.71 - patch - make - make install - - - - -------------------diff Makefile.old Makefile.raid----------------- - 2a3,13 - > # If you are building "linuxthreads" for installation on a mount - > # point which is not the "root" partition, redefine 'BUILDIR' to - > # the mount point to use as the "root" directory - > # You may wish to do this if you are building an 'initial ram disk' - > # such as used with bootable root raid devices. - > # REQUIRES ldconfig version 1.9.5 or better - > # do ldconfig -v to check - > # - > BUILDIR=/root/raidboot/mnt - > #BUILDIR= - > - 81,82c92,93 - < install pthread.h $(INCLUDEDIR)/pthread.h - < install semaphore.h $(INCLUDEDIR)/semaphore.h - --- - > install pthread.h $(BUILDIR)$(INCLUDEDIR)/pthread.h - > install semaphore.h $(BUILDIR)$(INCLUDEDIR)/semaphore.h - 84c95 - < test -f /usr/include/sched.h || install sched.h $(INCLUDEDIR)/sched.h - --- - > test -f $(BUILDIR)/usr/include/sched.h || install sched.h $(BUILDIR)$(INCLUDEDIR)/sched.h - 86,89c97,103 - < install $(LIB) $(LIBDIR)/$(LIB) - < install $(SHLIB) $(SHAREDLIBDIR)/$(SHLIB) - < rm -f $(LIBDIR)/$(SHLIB0) - < ln -s $(SHAREDLIBDIR)/$(SHLIB) $(LIBDIR)/$(SHLIB0) - --- - > install $(LIB) $(BUILDIR)$(LIBDIR)/$(LIB) - > install $(SHLIB) $(BUILDIR)$(SHAREDLIBDIR)/$(SHLIB) - > rm -f $(BUILDIR)$(LIBDIR)/$(SHLIB0) - > ln -s $(SHAREDLIBDIR)/$(SHLIB) $(BUILDIR)$(LIBDIR)/$(SHLIB0) - > ifneq ($(BUILDIR),) - > ldconfig -r ${BUILDIR} -n $(SHAREDLIBDIR) - > else - 91c105,106 - < cd man; $(MAKE) MANDIR=$(MANDIR) install - --- - > endif - > cd man; $(MAKE) MANDIR=$(BUILDIR)$(MANDIR) install - - - - 4.7. Install Raid Tools - - The next step is the installation of the raid tools. 
raidtools-0.42 - - You must run the "configure" script to point the Makefile at the build - directory for the ramdisk files - cd /usr/src/raidtools-0.42 - configure --sbindir=/root/raidboot/mnt/sbin --prefix=/root/raidboot/mnt/usr - make - make install - - - Now!! the Makefile for install is not quite right so do the following - to clean up. This will be fixed in future releases so that the re- - linking will not be necessary. - - - Fix the make install error - - - The file links specified in the Makefile at 'LINKS' must be removed - and re-linked to operate properly. - - cd /root/raidboot/mnt/sbin - ln -fs mdadd mdrun - ln -fs mdadd mdstop - - - - 4.8. Remove un-needed directories and files from new filesystem. - - Delete the following directories from filesystem (CAUTION DON'T DELETE - FROM YOUR RUNNING SYSTEM) it's easy to do, guess how I found out!!! - - cd /root/raidboot/mnt - rm -r home/ftp/* - rm -r lost+found - rm -r usr/doc - rm -r usr/info - rm -r usr/local/man - rm -r usr/man - rm -r usr/openwin - rm -r usr/share/locale - rm -r usr/X* - rm -r var/man - rm -r var/log/packages - rm -r var/log/setup - rm -r var/log/disk_contents - - - - 4.9. Create /dev/md x - - The last step simply copies the /dev/md* devices from the current file - system onto the rescue file system. You could create these with - mknode. - - cp -a /dev/md* /root/raidboot/mnt/dev - - - - 4.10. Create a bare filesystem suitable for initrd - - Now you have a clean re-useable filesystem ready for customization. - Once customized, this file system can be used for rescue should the - raid device(s) become corrupted and the raid tools needed to fix them. - It will also be used to boot and root-mount the raid device by adding - the linuxrc file which will be discussed next. - - - Copy the file system to a smaller device for the initrd file, 16 megs - should be large enough. - - Create the smaller file system and mount it - - cd /root/raidboot - dd if=/dev/zero of=bare.fs bs=1024k count=16 - - - associate the file with a loop device and generate a ext2 file system - on the file - - losetup /dev/loop1 bare.fs - mke2fs -v -m0 -L initrd /dev/loop1 - mount /dev/loop1 mnt2 - - - Copy the 'build' file system to 'bare.fs' - - cp -a mnt/* mnt2 - - - Save the 'bare.fs' system before customization so later update is - easy. The 'build' file system is no longer needed and may be deleted. - - cd /root/raidboot - umount mnt - umount mnt2 - losetup -d /dev/loop0 - losetup -d /dev/loop1 - rm build - cp bare.fs rescue - gzip -9 bare.fs - - - - 4.10.1. Create the BOOT/RESCUE initrd filesystem - - Now copy the system dependent items that match the kernel from the - development platform, or you can manually modify the files in the - rescue file system to match your target system. - - losetup /dev/loop0 rescue - mount /dev/loop0 mnt - - - Make sure your etc directory is clean of *~, core and log files. The - next 2 commands creates some warning messages, ignore them. - - cp -dp /etc/* mnt/etc - cp -dp /etc/rc.d/* mnt/etc/rc.d - - mkdir mnt/lib/modules - cp -a /lib/modules/2.x.x mnt/lib/modules <--- your current 2.x.x - - - - 4.10.2. Corrections for the Rescue System - - Edit the following files to correct them for your rescue system. Some - file names listed below are Slackware specific but have equivalents in - other distributions. - - - - cd mnt - - Non-network - etc/fstab - etc/mdtab should work OK - Network - etc/hosts - etc/resolv.conf - etc/hosts.equiv and related files - etc/rc.d/rc.inet1 correct ip#, mask, gateway, etc... 
- etc/rc.d/rc.S remove entire section on file system status - from: - # Test to see if the root partition is read-only - to but not including: - # remove /etc/mtab* so that mount will ..... - This avoids the annoying warning that - the ramdisk is mounted rw. - etc/rc.d/rc.xxxxx others as required, see later on in this doc - root/.rhosts if present - home/xxxx/xxxx others as required - - WARNING: The above procedure moves your password and shadow - files onto the rescue disk!!!!! - - WARNING: You may not wish to do this for security reasons. - - - Create any directories for mounting /dev/disk... as may be required - that are unique to your system. These are the mountpoints for booting - the system (boot partition and backup boot partition). My system boot - from dos using loadlin, however linux partition(s) and lilo will work - fine. My system uses: - - cd /root/raidboot/mnt <--- initrd root - mkdir dosa dos partition mount point - mkdir dosb dos mirror mount point - - - The rescue file system is complete! - - You will note upon examination of the files in the rescue file system, - that there are still many files that could be deleted. I have not - done this since it would overly complicate this procedure and most - raid systems have adequate disk and memory. If you wish to skinny - down the file system, go to it! - - - 4.11. Making 'initrd' boot the RAID device - linuxrc - - To make the rescue disk boot the raid device, you need only copy the - executable script file: - - - linuxrc - - - to the root of the device. - - The theory of operation for this linuxrc file is discussed in - ``Appendix G, linuxrc theory of operation''. - - A very simple and much easier to understand (working) linuxrc is - included in ``Appendix D'', obsolete linuxrc and shutdown scripts. - Copy the following text to linuxrc and save in your development area. 
- - - -------------------- linuxrc ---------------------- - #!/bin/sh - # ver 1.13 3-6-98 - # - ################# BEGIN 'linuxrc' ################## - # DEFINE FUNCTIONS # - #################################################### - # Define 'Fault' function in the event something - # goes wrong during the execution of 'linuxrc' - # - FaultExit () { - # correct fstab to show '/dev/ram0' for rescue system - /bin/cat /etc/fstab | { - while read Line - do - if [ -z "$( echo ${Line} | /usr/bin/grep md0 )" ]; then - echo ${Line} - else - echo "/dev/ram0 / ext2 defaults 1 1" - fi - done - } > /etc/tmp.$$ - /bin/mv /etc/tmp.$$ /etc/fstab - # point root at /dev/ram0 (the rescue system) - echo 0x100>/proc/sys/kernel/real-root-dev - /bin/umount /proc - exit - } - - # Define 'Warning' procdure to print banner on boot terminal - # - Warning () { - echo '*********************************' - echo -e " $*" - echo '*********************************' - } - - # Define 'SplitKernelArg' to help extract 'Raid' related kernel arguments - SplitKernelArg () { eval $1='$( IFS=,; echo $2)' } - - #Define 'SplitConfArgs' to help extract system configuration arguments - SplitConfArgs () { - RaidBootType=$1 - RaidBootDevice=$2 - RaidConfigPath=$3 - } - ######################################################## - ################### MAIN linuxrc ####################### - ######################################################## - # mount the proc file system - /bin/mount /proc - - # Get the boot partition and configuration location from command line - CMDLINE=`/bin/cat /proc/cmdline` - for Parameter in $CMDLINE; do - Parameter=$( IFS='='; echo ${Parameter} ) - case $Parameter in - Raid*) SplitKernelArg $Parameter;; - esac - done - - # check for 'required raid boot' - if [ -z "${Raid_Conf}" ]; then - Warning Kernel command line \'Raid_Conf\' missing - FaultExit - fi - SplitConfArgs $Raid_Conf - - # tmp mount the boot partition - /bin/mount -t ${RaidBootType} ${RaidBootDevice} /mnt - - # get etc files from primary raid system - pushd /etc - - # this will un-tar into 'etc' (see rc.6) - if [ ! -f /mnt/${RaidConfigPath}/raidboot.etc ]; then - # bad news, this file should be here - Warning required file \'raidboot.etc\' \ - missing from ${RaidBootDevice}/${RaidConfigPath} \\n \ - \\tUsing rescue system defaults - else - /bin/tar -xf /mnt/${RaidConfigPath}/raidboot.etc - fi - # get 'real' raidboot device for this boot - # status path, and name of raidX.conf - if [ ! -f /mnt/${RaidConfigPath}/raidboot.cfg ]; then - # bad news, this file should be here - Warning required file 'raidboot.cfg' \ - missing from ${RaidBootDevice}/${RaidConfigPath}\\n \ - \\tUsing rescue system defaults - # Get the first raidX.conf file name in $RArg1 - RaidBootDevs=$RaidBootDevice - RaidStatusPath=$RaidConfigPath - for RaidConfigEtc in $( ls raid*.conf ) - do break; done - else - { - read RaidBootDevs - read RaidStatusPath - read RaidConfigEtc - } < /mnt/${RaidConfigPath}/raidboot.cfg - - fi - popd - /bin/umount /mnt - - # Set a flag in case the raid status file is not found - # - RAIDOWN="raidboot.ro not found" - RAIDREF="raidgood.ref not found" - echo "Reading md0 shutdown status." 
- - # search for raid shutdown status - for Device in ${RaidBootDevs} - do - # these filesystem types should be in 'fstab' since - # the partitions must be mounted for a clean raid shutdown - /bin/mount ${Device} /mnt - if [ -f /mnt/${RaidStatusPath}/raidboot.ro ]; then - RAIDOWN=`/bin/cat /mnt/${RaidStatusPath}/raidboot.ro` - RAIDREF=`/bin/cat /mnt/${RaidStatusPath}/raidgood.ref` - /bin/umount /mnt - break - fi - /bin/umount /mnt - done - # Test for a clean shutdown with array matching reference - if [ "${RAIDOWN}" != "${RAIDREF}" ]; then - Warning shutdown ERROR ${RAIDOWN} - FaultExit - fi - - # The raid array is clean, remove shutdown status files - for Device in ${RaidBootDevs} - do - /bin/mount ${Device} /mnt - /bin/rm -f /mnt/${RaidStatusPath}/raidboot.ro - /bin/umount /mnt - done - - # Write a clean superblock on all raid devices - - echo "write clean superblocks" - /sbin/mkraid -f --only-superblock /etc/${RaidConfigEtc} - - # Activate raid array(s) - if [ -z "$Raid_ALT" ]; then - /sbin/mdadd -ar - else - /sbin/mdadd $Raid_ALT - fi - - # If there are errors - BAIL OUT and leave rescue running - if [ $? -ne 0 ]; then - Warning some RAID device has errors - FaultExit - fi - - # Everything is fine, let the kernel mount /dev/md0 - # tell the kernel to switch to /dev/md0 as the /root device - # The 0x900 value is the device number calculated by: - # 256*major_device_number + minor_device number - echo "/dev/md0 mounted on root" - echo 0x900>/proc/sys/kernel/real-root-dev - # umount /proc to deallocate initrd device ram space - /bin/umount /proc - exit - #------------------ end linuxrc ---------------------- - - - Add 'linuxrc' to initrd boot device - - cd /root/raidboot - chmod 777 linuxrc - cp -p linuxrc mnt - - - - 4.12. Modifying the rc-scripts for SHUTDOWN - - To complete the installation, modify the rc scripts to save the md - status to the real root device when shutdown occurs. - - In slackware this is rc.0 -> rc.6 - In debian 'bo' this is in both 'halt' and 'reboot' - - If you implement this in another distribution, please e-mail - the instructions and sample files so they can be included here. - - - I have modified Bohumil Chalupa's raid stop work-around slightly. His - original solution is presented in ``Appendix A''. - - Since there are no linux partitions left on the production system - except md0, the boot partitions are used to store the raidOK readonly - status. I chose to write a file to each of the duplicate boot - partitions containing the status of the md array at shutdown and - signifying that the md device has been remounted RO. This allows the - system to be fail safe when any of the hard drives die. - - The shutdown script is modified to call ``rc.raidown'' which saves the - necessary information to successfully reboot and mount the raid - device. Examples of shutdown scripts for various linux distributions - are shown in ``Appendix B''. - - - To capture the raid array shutdown status insert a call to - ``rc.raidown'' after any case statements (if present) but before the - actual shutdown (kills, status saves, etc...) begins and before the - file systems are dismounted. 
-
- ############ Save raid boot and status info ##############
- #
- if [ -x /etc/rc.d/rc.raidown ]; then
-    /etc/rc.d/rc.raidown
- fi
- ################## end raid boot #########################
-
-
- After all the file systems are dismounted (the root file system
- remains mounted read-only), add:
-
- ################ for raid arrays #########################
- # Stop all known raid arrays (except root which won't stop)
- if [ -x /sbin/mdstop ]; then
-    echo "Stopping raid"
-    /sbin/mdstop -a
- fi
- ##########################################################
-
-
- This will cleanly stop all raid devices except root.  Root status is
- passed to the next boot in raidboot.ro.
-
-
- Copy the rc file to your new raid array, the rescue file system that
- is still mounted on /root/raidboot/mnt, and the development system if
- it is on the same machine.
-
- Modify the rescue etc/fstab as needed and make sure the rescue mdtab
- is correct.
-
- Now copy the rescue disk to your dos partition and everything should
- be ready to boot the raid device as root.
-
-      umount mnt
-      losetup -d /dev/loop0
-      gzip -9 rescue
-
-
- Copy rescue.gz to your boot partitions.
-
- All that remains is to create the configuration file raidboot.conf
- and test the new file system by rebooting.
-
-
- 4.13.  Configuring RAIDBOOT - raidboot.conf
-
- The comments following the example configuration file explain each of
- the three lines.  This example file is for a 4-drive raid5 SCSI array
- with duplicate boot partitions on drives sda1 and sdb1.  Put the
- parameters descriptive of your file systems here instead.
-
-
- /dev/sda1 /dev/sdb1
- linux
- raid5.conf
- # comments may only be placed 'after' the three
- # configuration lines.
- #
- # This is 'raidboot.conf'
- #
- # line one, the partition(s) containing the 'initrd' raid-rescue system
- # It is not necessary to boot from these partitions, however,
- # since the rescue system will not fit on floppy, it is necessary
- # to know which partitions are to be used to load the rescue system
- #
- # line two, the path to the raidboot config information
- # Where the shutdown status, etc... is located at boot time
- # It does NOT include the mount point information, only 'path'
- # /mntpoint/'path'
- #
- # line three, name of the raid configuration file
- # Current raid configuration file, i.e. raid1.conf, raid5.conf
-
-
-
- 4.14.  Kernel 'loadlin and lilo' variables for RESCUE and RAID
-
- There are two kernel variables for the RESCUE and RAID system; only
- the first must be specified.
-
- ·  Raid_Conf=msdos,/dev/sda1,raidboot
-
-    This variable points to the raid boot device and configuration
-    file.  For a floppy rescue boot, you may want to specify this
-    on the kernel command line or in the loadlin or lilo boot
-    file.
-
-    format: 'filesystem-type,device,path-to-config-from-mountpoint'
-
- ·  Raid_ALT=-r,-p5,/dev/md0,/dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
-
-    Alternate mdadd parameters, necessary when booting with a
-    non-redundant raid array.  These are the comma-separated command
-    line parameters for mdadd.  Unless they are needed to start a
-    failed/non-redundant array, COMMENT OUT OR SPECIFY WITH A
-    'NULL', i.e.
-
-         Raid_ALT=
-
- Either of these parameters may be specified in the lilo or loadlin
- boot parameter file or on the loadlin kernel command line.  If the
- command line is used, take care not to exceed its maximum length of
- 128 characters.
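-
- As a quick sanity check of the comma-separated format, the following
- minimal sketch splits an example Raid_Conf value the same way the
- SplitKernelArg/SplitConfArgs helpers in 'linuxrc' do (the value shown
- is illustrative, taken from the examples in this document):
-
-      #!/bin/sh
-      # sketch: split a Raid_Conf-style value into its three fields
-      Raid_Conf="msdos,/dev/sda1,raidboot"
-
-      # re-split on commas using the same IFS trick as 'linuxrc'
-      set -- $( IFS=,; echo ${Raid_Conf} )
-      RaidBootType=$1      # filesystem type of the boot partition
-      RaidBootDevice=$2    # device holding the rescue system
-      RaidConfigPath=$3    # path to raidboot files, from the mount point
-
-      echo "would run: mount -t ${RaidBootType} ${RaidBootDevice} /mnt"
-      echo "config expected in /mnt/${RaidConfigPath}"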
-
-
- When booting with lilo, the parameters are included in the lilo config
- file in the form:
-
-   append="Raid_Conf=msdos,/dev/sda1,raidboot"
-   append="Raid_ALT=-r,-p5,/dev/md0,/dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3"
-
-
- See man lilo.conf for more detailed information.
-
- Since I have some hardware that requires DOS configuration utilities,
- I have a small dos partition on the system.  Therefore, I used loadlin
- to boot the raid5 system from the dos partition, with a mirror (copy)
- on the companion disk.  An identical method is used for the raid1
- system.  The example below uses loadlin, but the procedure is very
- similar for lilo.
-
- My dos root system contains a small editor among the utilities so I
- can modify the boot parameters of loadlin if necessary, allowing me to
- reboot the linux system on my swap disk while testing.
-
- The dos system contains this tree for linux:
-
-      c:\raidboot.bat
-      c:\raidboot\loadlin.exe
-      c:\raidboot\zimage
-      c:\raidboot\rescue.gz
-      c:\raidboot\raidboot.cfg
-      c:\raidboot\raidboot.etc
-      c:\raidboot\raidgood.ref
-      c:\raidboot\raidstat.ro    (only at shutdown)
-
-
-  ---------------------- linux.bat ---------------------------
-  echo "Start the LOADLIN process:"
-  c:\raidboot\loadlin @c:\raidboot\boot.par
-  -------------------- end linux.bat -------------------------
-
-
- boot.par contains:
-
-
-
-  # loadlin boot parameter file
-  #
-  # version 1.02  3-6-98
-
-  # linux kernel image
-  c:\linux\zimage
-
-  # target root device
-  root=/dev/md0
-  #root=/dev/ram0
-  #root=/dev/sdc5
-
-  # mount root device as 'ro'
-  ro
-
-  # size of ram disk
-  ramdisk_size=16384
-
-  # initrd file name
-  initrd=c:\raidboot\rescue.gz
-  #noinitrd
-
-  # memory ends here
-  mem=131072k
-
-  # points to raid boot device, configuration file
-  # for floppy rescue boot, you may want to specify
-  # this on the command line instead of here
-  # format 'filesystem-type,device,path-to-config-from-mountpoint'
-  Raid_Conf=msdos,/dev/sda1,raidboot
-
-  # Alternate mdadd parameters,
-  # necessary when booting with a non-redundant raid;
-  # otherwise, COMMENT OUT OR SPECIFY 'NULL'
-  #Raid_ALT=-r,-p5,/dev/md0,/dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
-
-  # ethernet devices
-  ether=10,0x300,eth0
-
- ***** >> NOTE!!  The only difference between forcing the rescue system
- to run and mounting the raid device is the loadlin parameter:
-
-      root=/dev/ram0    for the rescue system
-      root=/dev/md0     for RAID
-
- With root=/dev/ram0 the RAID device will not mount
- and the rescue system will run unconditionally.
-
-
- If the RAID array fails, the rescue system is left mounted and
- running.
-
-
- 5.  Configuring the Production RAID system.
-
-
- 5.1.  System specs.
-
- Two systems with identical motherboards were configured:
-
-
-                     Raid-1                 Raid-5
-   Motherboard:      Iwill P55TU            Iwill P55TU
-   Disk interface:   dual ide               adaptec scsi
-   Processor:        Intel P200             Intel P200
-   Disks:            2 ea 7 gig Maxtors     4 ea Seagate 4.2 gig wide SCSI
-
-
- The disk drives are designated by linux as 'sda' through 'sdd' on the
- raid5 system and 'hda' and 'hdc' on the raid1 system.
-
-
- 5.2.  Partitioning the hard drives.
-
- Since testing a large root mountable RAID array is difficult because
- of the ckraid re-boot problem, I re-partitioned my swap space to
- include a smaller RAID partition for testing purposes,
- sda6,sdb6,sdc6,sdd6, and a small root and /usr/src partition pair for
- developing and testing the raid kernel and tools.  You may find this
- helpful.
-
- Once the test array checks out, save the good array reference status
- to the boot partition, for example
-
-      cat /proc/mdstat | grep md0 > /dosx/raidboot/raidgood.ref
-
- and issue
-
-      shutdown -r now
-
- to do a clean reboot, and the system is up again.
-
- 6.
Building the RAID file system. - - This description is for my RAID systems described in the system specs. - Your system may have a different RAID architecture, so modify as - appropriate. Please read the man pages and QuickStart.RAID that come - with the raidtools-0.42 - - 6.1. /etc/raid5.conf - - - # raid-5 configuration - raiddev /dev/md0 - raid-level 5 - nr-raid-disks 4 - chunk-size 32 - - # Parity placement algorithm - parity-algorithm left-symmetric - - # Spare disks for hot reconstruction - #nr-spare-disks 0 - - device /dev/sda3 - raid-disk 0 - - device /dev/sdb3 - raid-disk 1 - - device /dev/sdc3 - raid-disk 2 - - device /dev/sdd3 - raid-disk 3 - - - 6.2. /etc/raid1.conf - - - # raid-1 configuration - raiddev /dev/md0 - raid-level 1 - nr-raid-disks 2 - nr-spare-disks 0 - - device /dev/hda4 - raid-disk 0 - - device /dev/hdc4 - raid-disk 1 - - - - 6.3. Step by Step procedures for building production RAID file sys­ - tem. - - For my RAID5 system I did a complete install of: - - Slackware-3.4 any current distribution should work OK - linuxthreads-0.71 - raidtools-0.42 - linux-2.0.33 with raid145 patch and Gadi's patch - - - - Create and format the raid device. - - mkraid /etc/raid5.conf - mdcreate raid5 /dev/md0 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 - mdadd -ar - mke2fs /dev/md0 - mkdir /md - mount -t ext2 /dev/md0 /md - - - Create the reference files that reboot will use, this may be different - on your system. - - cat /proc/mdstat | grep md0 > /dosa/raidboot/raidgood.ref - cat /proc/mdstat | grep md0 > /dosb/raidboot/raidgood.ref - - - Use Slackware-3.4 or another distribution to build your OS - - setup - - - Specify '/md' as the target, and the source whatever your normally - use. Select and install the disksets of interest except for the ker­ - nel. Configure the system, but skip the section on lilo and kernel - booting. Exit setup. - - Install 'pthreads' - - cd /usr/src/linuxthreads-0.71 - - - edit the Makefile and specify - - - - BUILDIR=/md - - make - make install - - - Install 'raidtools' - - cd /usr/src/raidtools-0.42 - configure --sbindir=/md/sbin --prefix=/md/usr - - - fix the raidtools make install error - - cd /md/sbin - rm mdrun - rm mdstop - ln -s mdadd mdrun - ln -s mdadd mdstop - - - Create /dev/mdx - - cp -a /dev/md* /md/dev - - - Add the system configuration from the current system (ignore errors). - - cp -dp /etc/* mnt/etc - cp -dp /etc/rc.d/* mnt/etc/rc.d (include the new rc.6) - mkdir mnt/lib/modules - cp -a /lib/modules/2.x.x mnt/lib/modules <--- your current 2.x.x - - - Edit the following files to correct them for your file system - - cd /md - - Non-network - etc/fstab correct for real root and raid devices. - etc/mdtab should work OK - Network - etc/hosts - etc/resolv.conf - etc/hosts.equiv and related files - etc/rc.d/rc.inet1 correct ip#, mask, gateway, etc... - etc/rc.d/rc.S remove entire section on file system status - from: - # Test to see if the root partition isread-only - to but not including: - # remove /etc/mtab* so that mount will ..... - This avoids the annoying warning that - the ramdisk is mounted rw. - etc/rc.d/rc.xxxxx others as required - root/.rhosts if present - home/xxxx/xxxx others as required - - WARNING: The above procedure moves your password and shadow - files onto the new file system!!!!! - - WARNING: You may not wish to do this for security reasons. - - - Create any directories for mounting /dev/disk... as may be required - that are unique to your system. 
Mine need: - - cd /md <--- new file system root - mkdir dosa dos partition mount point - mkdir dosb dos mirror mount point - - - The new file system is complete. Make sure and save the md reference - status to the 'real' root device and you are ready to boot. - - mount the dos partitions on dosa and dosb - - cat /proc/mdstat | grep md0 > /dosa/raidboot/raidgood.ref - cat /proc/mdstat | grep md0 > /dosb/raidboot/raidgood.ref - - mdstop /dev/md0 - - - - 7. One last thought. - - Remember that an expert is someone who knows at least 1% more than you - do about a subject. Bear this in mind when you e-mail me for help. - I'll try, but I've only done this once for raid1 and once for raid5! - - Michael Robinton Michael@bzs.org - - - 8. Appendix A. - Bohumil Chalupa's md0 shutdown - - Bohumil Chalupa's post to the linux raid list on the work around for - the raid1 + 5 mdstop problem. His solution does not address the - possibility of the raid device being corrupt at shutdown. So I have - added a simple status comparison to a good reference status at boot. - This allows the operator to intervene if something is wrong with a - disk in the array. The description of this is in the main body of this - document. - - - - > From: Bohumil Chalupa - > - > I can now boot initrd and use linuxrc to start the RAID1 array, - > then successfully switch root to /dev/md0. - > - > I don't know, however, any way how to cleanly _stop_ the array. - - Well. I have to answer myself :-) - - > Date: Mon, 29 Dec 1997 02:21:38 -0600 (CST) - > From: Edward Welbon - > Subject: Re: dismounting root raid device - > - > For md devices other than raid0, there is probably state that needs to - > be saved that is only known once all writes have completed. Such state - > of course can't be saved to root once it is mounted readonly. In that - > case, you would have to be able to mount a writeable filesystem "X" - > on the readonly root and be able to write to "X" (I recall doing this - > during "rescue" operations, but not as an automated procedure). - > - > The filesystem "X" would presumably be a boot device from which the raid - > (during linuxrc exection via initrd) would pickup it's initial state from. - > Fortunately raid0 isn't required to write out any state (though it would - > be pleasant to be able to write the check sums to mdtab after an mdstop). - > Eventually, I will fiddle with this but it doesn't seem difficult though - > the "devil" is always in the "details". - - Yes, that's it. - I had this idea in mind for some time already, but had no time to try it. - Yesterday I did, and it works. - - With my RAID1 (mirror), I don't save any checksums or raid superblock data. - I only save an information on the "real" boot partition, that the root md - volume was remounted readonly during shutdown. Then, during boot, the - linuxrc script runs mkraid --only-superblock when it finds this - information; otherwise, it runs ckraid. - This means, that the raid superblock information is not updated during - shutdown; it's updated at the boot time. - It is not very clean, I'm afraid, :-( but it works. - - I'm using Slackware and initrd.md by Edward Welbon to boot the root raid - device. - As far as I remember now, the only modified files are - mkdisk and linuxrc, and /etc/rc.d/rc.6 shutdown script. - And lilo.conf, of course. - - I'm appending the important parts. - - Bohumil Chalupa - - --------------- my.linuxrc follows ----------------- - #!/bin/sh - # we need /proc - /bin/mount /proc - # start up the md0 device. 
let the /etc/rc.d scripts get the rest of them - # we should do as little as possible here - # ________________________________________ - # root raid1 shutdown test & recreation - # /start must be created on the rd image in my.mkdisk - echo "preparing md0: mounting /start" - /bin/mount /dev/sda2 /start -t ext2 - echo "reading saved md0 state from /start" - if [ -f /start/root.raid.ok ]; then - echo "raid ok, modyfying superblock" - rm /start/root.raid.ok - /sbin/mkraid /etc/raid1.conf -f --only-superblock - else - echo "raid not clean, runing ckraid --fix" - /sbin/ckraid --fix /etc/raid1.conf - fi - echo "unmounting /start" - /bin/umount /start - # _________________________________________ - # - echo "adding md0 for root file system" - /sbin/mdadd /dev/md0 /dev/sda1 /dev/sdb1 - echo "starting md0" - /sbin/mdrun -p1 /dev/md0 - # tell kernel we want to switch to /dev/md0 as root device, the 0x900 value - # is arrived at via 256*major_device_number + minor_device number. - echo "setting real-root-dev" - /bin/echo 0x900>/proc/sys/kernel/real-root-dev - # unmount /proc so that the ram disk can be deallocated. - echo "unmounting /proc" - /bin/umount /proc - /bin/echo "We are hopefully ready to mount /dev/md0 (major 9, minor 0) as - root" - exit - --------------- end of my.linuxrc ---------------------------------- - - - ----------- extract from /etc/rc.d/rc.6 follows ----------------- - # Turn off swap, then unmount local file systems. - echo "Turning off swap." - swapoff -a - echo "Unmounting local file systems." - umount -a -tnonfs - # Don't remount UMSDOS root volumes: - if [ ! "`mount | head -1 | cut -d ' ' -f 5`" = "umsdos" ]; then - mount -n -o remount,ro / - fi - - # Save raid state - echo "Saving RAID state" - /bin/mount -n /dev/sda2 /start -t ext2 - touch /start/root.raid.ok - /bin/umount -n /start - - -------------- end of excerpt from rc.6 ------------------------ - - - ------------------ part of my.mkdisk follows ---------------------- - # - # now we have the filesystem ready to be populated, we need to - # get a few important directories. I had endless trouble till - # I created a pristine mtab. In my case, it is convenient that - # /etc/mdtab is copied over, this way I can activate md with - # a simple "/sbin/mdadd -ar" in linuxrc. - # - cp -a $ROOT/etc $MOUNTPNT 2>cp.stderr 1>cp.stdout - rm -rf $MOUNTPNT/etc/mtab - rm -rf $MOUNTPNT/etc/ppp* - rm -rf $MOUNTPNT/etc/termcap - rm -rf $MOUNTPNT/etc/sendmail* - rm -rf $MOUNTPNT/etc/rc.d - rm -rf $MOUNTPNT/etc/dos* - cp -a $ROOT/sbin $ROOT/dev $ROOT/lib $ROOT/bin $MOUNTPNT 2>>cp.stderr - 1>>cp.stdout - # _____________________________________________________________________ - # RAID: will need mkraid and ckraid - cp -a $ROOT/usr/sbin/mkraid $ROOT/usr/sbin/ckraid $MOUNTPNT/sbin - 2>>cp.stderr 1>>cp.stdout - # --------------------------------------------------------------------- - # it seems that init wont come out to play unless it has utmp. this can - # probably be pruned back alot. no telling what the real bug was 8-). - # - mkdir $MOUNTPNT/var $MOUNTPNT/var/log $MOUNTPNT/var/run $MOUNTPNT/initrd - touch $MOUNTPNT/var/run/utmp $MOUNTPNT/etc/mtab - chmod a+r $MOUNTPNT/var/run/utmp $MOUNTPNT/etc/mtab - ln -s /var/run/utmp $MOUNTPNT/var/log/utmp - ln -s /var/log/utmp $MOUNTPNT/etc/utmp - ls -lstrd $MOUNTPNT/etc/utmp $MOUNTPNT/var/log/utmp $MOUNTPNT/var/run/utmp - # - # since I wanted to change the mount point, I needed this though - # I suppose that I could have done a "mkdir /proc" in linuxrc. 
- # - mkdir $MOUNTPNT/proc - chmod 555 $MOUNTPNT/proc - # - # ------------------------------------------------------ - # we'll mount the real boot device to /start temporarily - # to check the root raid state saved at shutdown time - # - mkdir $MOUNTPNT/start - # ------------------------------------------------------- - # - # need linuxrc (it is, after all, the point of this exercise). - # - if [ -x ./my.linuxrc ]; then - cp -a ./my.linuxrc $MOUNTPNT/linuxrc - chmod 777 $MOUNTPNT/linuxrc - else - ln -s /bin/sh $MOUNTPNT/linuxrc - fi - # - ----------------- part of my.mkdisk ends ----------------- - - - - 9. Appendix B. - Sample SHUTDOWN scripts - - - · ``Slackware'' - - · ``Debian'' - - - 9.1. Slackware - /etc/rc.d/rc.6 - - - - #! /bin/sh - # - # rc.6 This file is executed by init when it goes into runlevel - # 0 (halt) or runlevel 6 (reboot). It kills all processes, - # unmounts file systems and then either halts or reboots. - # - # Version: @(#)/etc/rc.d/rc.6 1.50 1994-01-15 - # - # Author: Miquel van Smoorenburg - # Modified by: Patrick J. Volkerding, - # - # Modified by: Michael A. Robinton < michael@bizsystems.com > - # to add call to rc.raidown - # Set the path. - PATH=/sbin:/etc:/bin:/usr/bin - - # Set linefeed mode to avoid staircase effect. - stty onlcr - - echo "Running shutdown script $0:" - - # Find out how we were called. - case "$0" in - *0) - message="The system is halted." - command="halt" - ;; - *6) - message="Rebooting." - command=reboot - ;; - *) - echo "$0: call me as \"rc.0\" or \"rc.6\" please!" - exit 1 - ;; - esac - - ############ Save raid boot and status info ############## - # - if [ -x /etc/rc.d/rc.raidown ]; then - /etc/rc.d/rc.raidown - fi - ################## end raid boot ######################### - - # Kill all processes. - # INIT is supposed to handle this entirely now, but this didn't always - # work correctly without this second pass at killing off the processes. - # Since INIT already notified the user that processes were being killed, - # we'll avoid echoing this info this time around. - if [ "$1" != "fast" ]; then # shutdown did not already kill all processes - killall5 -15 - killall5 -9 - fi - - # Try to turn off quota and accounting. - if [ -x /usr/sbin/quotaoff ] - then - echo "Turning off quota." - /usr/sbin/quotaoff -a - fi - if [ -x /sbin/accton ] - then - echo "Turning off accounting." - /sbin/accton - fi - - # Before unmounting file systems write a reboot or halt record to wtmp. - $command -w - - # Save localtime - [ -e /usr/lib/zoneinfo/localtime ] && cp /usr/lib/zoneinfo/localtime /etc - - # Asynchronously unmount any remote filesystems: - echo "Unmounting remote filesystems." - umount -a -tnfs & - - # Turn off swap, then unmount local file systems. - echo "Turning off swap." - swapoff -a - echo "Unmounting local file systems." - umount -a -tnonfs - # Don't remount UMSDOS root volumes: - if [ ! "`mount | head -1 | cut -d ' ' -f 5`" = "umsdos" ]; then - mount -n -o remount,ro / - fi - - ################ for raid arrays ######################### - # Stop all known raid arrays (except root which won't stop) - if [ -x /sbin/mdstop ]; then - echo "Stopping raid" - /sbin/mdstop -a - fi - ########################################################## - - # See if this is a powerfail situation. - if [ -f /etc/powerstatus ]; then - echo "Turning off UPS, bye." - /sbin/powerd -q - exit 1 - fi - - # Now halt or reboot. - echo "$message" - [ ! -f /etc/fastboot ] && echo "On the next boot fsck will be FORCED." 
- $command -f - ############### end rc.6 ################################# - - - - 9.2. Debian bo - /etc/init.d/halt and /etc/init.d/reboot - - The modifications shown here for Debian bo halt and reboot files are - NOT TESTED. When you test this, please e-mail me so I can remove this - comment. - - - 9.2.1. /etc/init.d/halt - - - - #! /bin/sh - # - # halt The commands in this script are executed as the last - # step in runlevel 0, ie halt. - # - # Version: @(#)halt 1.10 26-Apr-1997 miquels@cistron.nl - # - - PATH=/sbin:/bin:/usr/sbin:/usr/bin - - ############ Save raid boot and status info ############## - # - if [ -x /etc/rc.d/rc.raidown ]; then - /etc/rc.d/rc.raidown - fi - ################## end raid boot ######################### - - # Kill all processes. - echo -n "Sending all processes the TERM signal... " - killall5 -15 - echo "done." - sleep 5 - echo -n "Sending all processes the KILL signal... " - killall5 -9 - echo "done." - - # Write a reboot record to /var/log/wtmp. - halt -w - - # Save the random seed between reboots. - /etc/init.d/urandom stop - - echo -n "Deactivating swap... " - swapoff -a - echo "done." - - echo -n "Unmounting file systems... " - umount -a - echo "done." - - mount -n -o remount,ro / - - ################ for raid arrays ######################### - # Stop all known raid arrays (except root which won't stop) - if [ -x /sbin/mdstop ]; then - echo "Stopping raid" - /sbin/mdstop -a - fi - ########################################################## - - # See if we need to cut the power. - if [ -x /etc/init.d/ups-monitor ] - then - /etc/init.d/ups-monitor poweroff - fi - - halt -d -f - ############# end halt #################### - - - - 9.2.2. /etc/init.d/reboot - - - - #! /bin/sh - # - # reboot The commands in this script are executed as the last - # step in runlevel 6, ie reboot. - # - # Version: @(#)reboot 1.9 02-Feb-1997 miquels@cistron.nl - # - - PATH=/sbin:/bin:/usr/sbin:/usr/bin - - ############ Save raid boot and status info ############## - # - if [ -x /etc/rc.d/rc.raidown ]; then - /etc/rc.d/rc.raidown - fi - ################## end raid boot ######################### - - # Kill all processes. - echo -n "Sending all processes the TERM signal... " - killall5 -15 - echo "done." - sleep 5 - echo -n "Sending all processes the KILL signal... " - killall5 -9 - echo "done." - - # Write a reboot record to /var/log/wtmp. - halt -w - - # Save the random seed between reboots. - /etc/init.d/urandom stop - - echo -n "Deactivating swap... " - swapoff -a - echo "done." - - echo -n "Unmounting file systems... " - umount -a - echo "done." - - mount -n -o remount,ro / - - ################ for raid arrays ######################### - # Stop all known raid arrays (except root which won't stop) - if [ -x /sbin/mdstop ]; then - echo "Stopping raid" - /sbin/mdstop -a - fi - ########################################################## - - echo -n "Rebooting... " - reboot -d -f -i - - - - 10. Appendix C. - other setup files - - - 10.1. linuxrc``linuxrc file'' - - - 10.2. loadlin -- linux.bat file - boot.par``linux.bat file - - boot.par'' - - - 10.3. linuxthreads Makefile.diff``linuxthreads Makefile.diff'' - - - 10.4. raid1.conf``raid1.conf'' - - - 10.5. raid5.conf``raid5.conf'' - - - 10.6. raidboot.conf``raidboot.conf'' - - - 10.7. rc.raidown``rc.raidown'' - - 11. Appendix D. - obsolete linuxrc and shutdown scripts - - - 11.1. Obsolete working - linuxrc - - This linuxrc file works fine with the shutdown procedure in the next - subsection. 
-
-
-
- ---------------------- linuxrc --------------------
- #!/bin/sh
- # ver 1.07 2-12-98
- # linuxrc - for raid1 using small dos partition and loadlin
- #
-
- # mount the proc file system
- /bin/mount /proc
-
- # This may vary for your system.
- # Mount the dos partitions, try both
- # in case one disk is dead
- /bin/mount /dosa
- /bin/mount /dosc
-
- # Set a flag in case the raid status file is not found
- # then check both drives for the status file
- RAIDOWN="raidstat.ro not found"
- /bin/echo "Reading md0 shutdown status."
- if [ -f /dosa/raidboot/raidstat.ro ]; then
-    RAIDOWN=`/bin/cat /dosa/raidboot/raidstat.ro`
-    RAIDREF=`/bin/cat /dosc/raidboot/raidgood.ref`
- else
-    if [ -f /dosc/raidboot/raidstat.ro ]; then
-       RAIDOWN=`/bin/cat /dosc/raidboot/raidstat.ro`
-       RAIDREF=`/bin/cat /dosc/raidboot/raidgood.ref`
-    fi
- fi
-
- # Test for a clean shutdown with all disks operational
- if [ "${RAIDOWN}" != "${RAIDREF}" ]; then
-    echo "ERROR ${RAIDOWN}"
-    # Use the next 2 lines to BAIL OUT and leave rescue running
-    /bin/echo 0x100>/proc/sys/kernel/real-root-dev
-    exit    # leaving the error files in dosa/raidboot,etc...
- fi
-
- # The raid array is clean, proceed by removing
- # status file and writing a clean superblock
- /bin/rm /dosa/raidboot/raidstat.ro
- /bin/rm /dosc/raidboot/raidstat.ro
- /sbin/mkraid /etc/raid1.conf -f --only-superblock
-
- /bin/umount /dosa
- /bin/umount /dosc
-
- # Mount raid array
- echo "Mounting md0, root filesystem"
- /sbin/mdadd -ar
-
- # If there are errors - BAIL OUT and leave rescue running
- if [ $? -ne 0 ]; then
-    echo "RAID device has errors"
-    # Use the next 3 lines to BAIL OUT
-    /bin/rm /etc/mtab    # remove bad mtab
-    /bin/echo 0x100>/proc/sys/kernel/real-root-dev
-    exit
- fi
-
- # else tell the kernel to switch to /dev/md0 as the /root device
- # The 0x900 value is the device number calculated by:
- #    256*major_device_number + minor_device_number
- /bin/echo 0x900>/proc/sys/kernel/real-root-dev
-
- # umount /proc to deallocate initrd device ram space
- /bin/umount /proc
- /bin/echo "/dev/md0 mounted as root"
- exit
- #------------------ end linuxrc ----------------------
-
-
-
- 11.2.  Obsolete working - shutdown scripts
-
- This shutdown procedure works fine with the preceding linuxrc.
-
- To capture the raid array shutdown status, insert the following just
- before the file systems are dismounted:
-
-      RAIDSTATUS=`/bin/cat /proc/mdstat | /usr/bin/grep md0`
-
-
- After all the file systems are dismounted (the root file system
- remains mounted read-only), add:
-
-      # root device remains mounted RO
-      # mount dos file systems RW
-      mount -n -o remount,ro /
-      echo "Writing RAID read-only boot FLAG(s)."
-      mount -n /dosa
-      mount -n /dosc
-      # create raid mounted RO flag in duplicate
-      # containing the shutdown status of the raid array
-      echo ${RAIDSTATUS} > /dosa/raidboot/raidstat.ro
-      echo ${RAIDSTATUS} > /dosc/raidboot/raidstat.ro
-
-      umount -n /dosa
-      umount -n /dosc
-
-      # Stop all the raid arrays (except root)
-      echo "Stopping raid"
-      mdstop -a
-
-
- This will cleanly stop all raid devices except root.  Root status is
- passed to the next boot in raidstat.ro.
-
- The complete shutdown script from my old raid1 Slackware system
- follows.  I have since switched raid1 to the new procedure with the
- /etc/raidboot.conf file.
-
-
-
- #! /bin/sh
- #
- # rc.6          This file is executed by init when it goes into runlevel
- #               0 (halt) or runlevel 6 (reboot). It kills all processes,
- #               unmounts file systems and then either halts or reboots.
- # - # Version: @(#)/etc/rc.d/rc.6 1.50 1994-01-15 - # - # Author: Miquel van Smoorenburg - # Modified by: Patrick J. Volkerding, - # Modified by: Michael A. Robinton, for RAID shutdown - - # Set the path. - PATH=/sbin:/etc:/bin:/usr/bin - - # Set linefeed mode to avoid staircase effect. - stty onlcr - - echo "Running shutdown script $0:" - - # Find out how we were called. - case "$0" in - *0) - message="The system is halted." - command="halt" - ;; - *6) - message="Rebooting." - command=reboot - ;; - *) - echo "$0: call me as \"rc.0\" or \"rc.6\" please!" - exit 1 - ;; - esac - - # Kill all processes. - # INIT is supposed to handle this entirely now, but this didn't always - # work correctly without this second pass at killing off the processes. - # Since INIT already notified the user that processes were being killed, - # we'll avoid echoing this info this time around. - if [ "$1" != "fast" ]; then # shutdown did not already kill all processes - killall5 -15 - killall5 -9 - fi - - # Try to turn off quota and accounting. - if [ -x /usr/sbin/quotaoff ] - then - echo "Turning off quota." - /usr/sbin/quotaoff -a - fi - if [ -x /sbin/accton ] - then - echo "Turning off accounting." - /sbin/accton - fi - - # Before unmounting file systems write a reboot or halt record to wtmp. - $command -w - - # Save localtime - [ -e /usr/lib/zoneinfo/localtime ] && cp /usr/lib/zoneinfo/localtime /etc - - # Asynchronously unmount any remote filesystems: - echo "Unmounting remote filesystems." - umount -a -tnfs & - - # you must have issued - # 'cat /proc/mdstat | grep md0 > {your boot vol}/raidboot/raidgood.ref' - # before linuxrc will execute properly with this info - RAIDSTATUS=`/bin/cat /proc/mdstat | /usr/bin/grep md0 # capture raid status` - - # Turn off swap, then unmount local file systems. - # clearing mdtab as well - echo "Turning off swap." - swapoff -a - echo "Unmounting local file systems." - umount -a -tnonfs - - # Don't remount UMSDOS root volumes: - if [ ! "`mount | head -1 | cut -d ' ' -f 5`" = "umsdos" ]; then - mount -n -o remount,ro / - fi - - # root device remains mounted - # mount dos file systems RW - echo "Writing RAID read-only boot FLAG(s)." - mount -n /dosa - mount -n /dosc - # create raid mounted RO flag in duplicate - # containing the shutdown status of the raid array - echo ${RAIDSTATUS} > /dosa/raidboot/raidstat.ro - echo ${RAIDSTATUS} > /dosc/raidboot/raidstat.ro - - umount -n /dosa - umount -n /dosc - - # Stop all the raid arrays (except root) - echo "Stopping raid" - mdstop -a - - # See if this is a powerfail situation. - if [ -f /etc/power_is_failing ]; then - echo "Turning off UPS, bye." - /sbin/powerd -q - exit 1 - fi - - # Now halt or reboot. - echo "$message" - [ ! -f /etc/fastboot ] && echo "On the next boot fsck will be FORCED." - $command -f - - - - 12. Appendix E. 
- Gadi's raid stop patch for the linux kernel
-
-
-
- --- linux/drivers/block/md.c.old     Fri Nov 21 13:37:11 1997
- +++ linux/drivers/block/md.c Sat Dec  6 13:34:28 1997
- @@ -622,8 +622,13 @@
-                        return do_md_run (minor, (int) arg);
-
-                case STOP_MD:
- -                      return do_md_stop (minor, inode);
- +                      err = do_md_stop(minor, inode);
- +                      if (err) {
- +                              printk("md: enabling auto mdstop for %s\n",
- +                                      devname(inode->i_rdev));
- +                              md_dev[minor].auto_mdstop = 1;
- +                      }
- +                      return err;
- +
-                case BLKGETSIZE:   /* Return device size */
-                        if (!arg)  return -EINVAL;
-                        err=verify_area (VERIFY_WRITE, (long *) arg, sizeof(long));
- @@ -692,6 +697,10 @@
-
-        sync_dev (inode->i_rdev);
-        md_dev[minor].busy--;
- +      if (!md_dev[minor].busy && md_dev[minor].auto_mdstop) {
- +              do_md_stop(minor, inode);
- +              md_dev[minor].auto_mdstop = 0;
- +      }
-  }
-
-  static int md_read (struct inode *inode, struct file *file,
- --- linux/include/linux/md.h~        Fri Nov 21 13:29:14 1997
- +++ linux/include/linux/md.h Fri Nov 21 13:29:14 1997
- @@ -260,6 +260,7 @@
-    int repartition;
-    int busy;
-    int nb_dev;
- +  int auto_mdstop;
-    void *private;
-  };
-
-
-
- 13.  Appendix F. - rc.raidown
-
- Copy the following text into the script file rc.raidown and save it in
- /etc/rc.d.
-
-
-
- #! /bin/sh
- #
- # rc.raidown    This file is executed by init when it goes into runlevel
- #               0 (halt) or runlevel 6 (reboot). It saves the status of
- #               a root mounted raid array for subsequent re-boot
- #
- # Version:      1.08  3-25-98  Michael A. Robinton < michael@bizsystems.com >
- #
- ############ Save raid boot and status info ##############
- if [ -f /etc/raidboot.conf ]
- then
-   {
-   read RaidBootDevs
-   read RaidStatusPath
-   read RaidConfigEtc
-   } < /etc/raidboot.conf
-
-   # you must have issued
-   #   cat /proc/mdstat | grep md0 >
-   #         {your boot vol mnt(s)}/{RaidStatusPath}/raidgood.ref
-   # before linuxrc will execute properly with this info
-   #
-   # capture raid status
-   RAIDSTATUS=`/bin/cat /proc/mdstat | /usr/bin/grep md0`
-   mkdir /tmp/raid$$
-   echo "Writing RAID read-only boot FLAG(s)."
-   for Device in ${RaidBootDevs}
-   do
-     # get mount point for raid boot device or use tmp
-     RBmount=$( cat /proc/mounts | /usr/bin/grep ${Device} )
-     if [ -n "${RBmount}" ]; then
-       RBmount=$( echo ${RBmount} | cut -f 2 -d ' ' )
-     else
-       RBmount="/tmp/raid$$"
-       mount ${Device} ${RBmount}
-     fi
-     if [ -d ${RBmount}/${RaidStatusPath} ]; then
-       # Create raid mounted RO flag = shutdown status of raid array
-       echo ${RAIDSTATUS} > ${RBmount}/${RaidStatusPath}/raidboot.ro
-       # Don't propagate 'fstab' from ramdisk
-       if [ -f /linuxrc ]; then
-         FSTAB=
-       else
-         FSTAB=fstab
-       fi
-       pushd /etc
-       # Save etc files for rescue system
-       /bin/tar --ignore-failed-read \
-         -cf ${RBmount}/${RaidStatusPath}/raidboot.etc \
-         raid*.conf mdtab* ${FSTAB} lilo.conf
-       popd
-       # Create new raidboot.cfg
-       {
-       /bin/echo ${RaidBootDevs}
-       /bin/echo ${RaidStatusPath}
-       /bin/echo ${RaidConfigEtc}
-       } > ${RBmount}/${RaidStatusPath}/raidboot.cfg
-       /bin/umount ${RBmount}
-     fi
-   done
-   rmdir /tmp/raid$$
-   echo "Raid boot armed"
- fi
- ################## end raid boot #########################
-
-
- 14.  Appendix G. - linuxrc theory of operation
-
- This is the complex form of the linuxrc file for root mounted raid.
- It must be processed with 'bash' or another shell that recognizes
- shell functions.
-
- The advantage is that it is generic and is not dependent on startup
- files and parameters located in the initrd image.
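-
- Before any of this can work, the 'known good' reference file
- raidgood.ref must exist on each boot partition; rc.raidown (Appendix
- F) assumes it is already there.  A minimal sketch of creating it by
- hand while the array is healthy (the mount points and the 'raidboot'
- path are the example ones used in this document; substitute your own):
-
-      #!/bin/sh
-      # create the reference status that rc.raidown and linuxrc
-      # compare against at every shutdown/boot
-      for mp in /dosa /dosb
-      do
-          /bin/mkdir -p ${mp}/raidboot
-          /bin/cat /proc/mdstat | /usr/bin/grep md0 \
-              > ${mp}/raidboot/raidgood.ref
-      done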
- - A Raid_Conf parameter passed to linuxrc by the kernel at boot from - lilo or loadlin contains a pointer to the boot devices and location - the of initial 2 raidboot files needed by linuxrc (raidboot.etc and - raidboot.cfg placed by the shutdown script). - - raidboot.etc containing the 'tar'ed files: - - raid* - mdtab* - fstab - lilo.conf ( if applicable ) - - - from the primary system that are transferred to the initrd - /etcetc directory at startup. With care, this file may be - edited if necessary when your system 'really' crashes. - - raidboot.cfg contains the name of the boot partition in use - and applicable backup(s) as well as the path to the rest of - the raid start up file used by linuxrc. This file is - normally created by the shutdown file and may be created - manually if necessary. - - raidboot.cfg is of the form, 3 lines - no comments - - /dev/bootdev1 /dev/bootdev2 [/dev/bootdev3 ... and so on] - raid-status/path - name_of_raidX.conf_file - - - - the raid-status/path does not include the name of the mount­ - point - - the raidX.conf filename is that one found in /etc and - normally used for ckraid and mkraid. - - - - The following additional files reside on the permanent raid boot par­ - titions. This is usually the same as above, but in emergency situa­ - tions may be loaded from anywhere they are available, such as a floppy - boot disk. - - · raidgood.ref created by the command cat /proc/mdstat | grep md0 > - /{raid_status_path}/raidgood.ref - - - See the ``shutdown scripts'' for saving this file and the next - - - · raidstat.ro created at each shutdown by the shutdown rc file, - saving the exit status of the raid array. - - - - 15. Appendix H. Setting up ROOT RAID on RedHat - - From the linux-raid@vger.rutgers.edu mail list. - - - ! Has anyone figured out how to do root-mounted RAID (as per - ! the Root-RAID HOWTO) using RedHat? The problem is that there - ! is no equivalent of Slackware's setup to install the root - ! filesystem to the RAID device. All RedHat installs have to - ! run from the install floppy, which makes it almost - ! impossible to get at the md devices and utilities during the - ! install. - ! - ! I think it's much easier to go out of the distribution and do it by - ! hand!! - - Assuming you have enough RAM (or a spare hard disk), install a minimal - system onto what will be your swap space (or onto your spare hard disk) - and/or /boot. Now do your mkraid, your mke2fs, mdrun, and mount. Next, do: - - tar clf - / | tar xpfC - /mnt/raidwasmountedhere - - (you may want a "v" in the second tar's flags) - Once this is done, you can set up lilo (or whatever) so that the new - raid partition is root. Then go in with RPM and/or glint (I hate - glint's behavior in the face of failed dependencies, which was fixed - but they broke it again for RH5.0 plus you can go back and forth - forever between an old and a new version of a package without - realizing the other version is installed) and install what you - really wanted. - - All this assuming you couldn't sneak in at some point in the install - and do your mkraid then at the VC with the shell prompt... - - ! I'm building a server at the moment and I think it would be tidier - ! and less likely to cause problems in the future if I start with - ! glibc2, rather than move to it later. - ! - ! Me too. - ! - ! The reason I'd like to be able to use RedHat is that they - ! are the only major distribution that I know of with a - ! glibc2-based release. - ! - ! Debian works fine with me. 
There isn't a CD yet, but you can grab the - ! distribution by ftp. - - I avoided root-raid like the plague, largely because initrd is an - extra, very fragile step (having to rdev, and having lilo depend on - the bios' ID number to find the kernel's partition, are bad enough!). - However, Red Hat does have a nice mkinitrd script, needed since they - left all their SCSI drivers modular. Hack that to include your - raid utils, make sure your mdadd -ar is in the right spot in - /etc/rc.d/rc.sysinit (before any fscking) and make sure mdstop -a is - in /etc/rc.d/init.d/halt after the RO-remount of /, and go for it! - - - - Keith kwrohrer@enteract.com - - - Software-RAID HOWTO - Linas Vepstas, linas@linas.org - v0.54, 21 November 1998 - - RAID stands for ''Redundant Array of Inexpensive Disks'', and is meant - to be a way of creating a fast and reliable disk-drive subsystem out - of individual disks. RAID can guard against disk failure, and can - also improve performance over that of a single disk drive. This docu- - ment is a tutorial/HOWTO/FAQ for users of the Linux MD kernel exten- - sion, the associated tools, and their use. The MD extension imple- - ments RAID-0 (striping), RAID-1 (mirroring), RAID-4 and RAID-5 in - software. That is, with MD, no special hardware or disk controllers - are required to get many of the benefits of RAID. - ______________________________________________________________________ - - Table of Contents - - - 1. Introduction - - 2. Understanding RAID - - 3. Setup & Installation Considerations - - 4. Error Recovery - - 5. Troubleshooting Install Problems - - 6. Supported Hardware & Software - - 7. Modifying an Existing Installation - - 8. Performance, Tools & General Bone-headed Questions - - 9. High Availability RAID - - 10. Questions Waiting for Answers - - 11. Wish List of Enhancements to MD and Related Software - - - - ______________________________________________________________________ - - - Preamble - This document is copyrighted and GPL'ed by Linas Vepstas - (linas@linas.org). Permission to use, copy, distribute this - document for any purpose is hereby granted, provided that the - author's / editor's name and this notice appear in all copies - and/or supporting documents; and that an unmodified version of - this document is made freely available. This document is - distributed in the hope that it will be useful, but WITHOUT ANY - WARRANTY, either expressed or implied. While every effort has - been taken to ensure the accuracy of the information documented - herein, the author / editor / maintainer assumes NO - RESPONSIBILITY for any errors, or for any damages, direct or - consequential, as a result of the use of the information - documented herein. - - - RAID, although designed to improve system reliability by adding - redundancy, can also lead to a false sense of security and - confidence when used improperly. This false confidence can lead - to even greater disasters. In particular, note that RAID is - designed to protect against *disk* failures, and not against - *power* failures or *operator* mistakes. Power failures, buggy - development kernels, or operator/admin errors can lead to - damaged data that it is not recoverable! RAID is *not* a - substitute for proper backup of your system. Know what you are - doing, test, be knowledgeable and aware! - - 1. Introduction - - - 1. Q: What is RAID? 
- - A: RAID stands for "Redundant Array of Inexpensive Disks", - and is meant to be a way of creating a fast and reliable - disk-drive subsystem out of individual disks. In the PC - world, "I" has come to stand for "Independent", where mar- - keting forces continue to differentiate IDE and SCSI. In - it's original meaning, "I" meant "Inexpensive as compared to - refrigerator-sized mainframe 3380 DASD", monster drives - which made nice houses look cheap, and diamond rings look - like trinkets. - - - - 2. Q: What is this document? - - A: This document is a tutorial/HOWTO/FAQ for users of the - Linux MD kernel extension, the associated tools, and their - use. The MD extension implements RAID-0 (striping), RAID-1 - (mirroring), RAID-4 and RAID-5 in software. That is, with - MD, no special hardware or disk controllers are required to - get many of the benefits of RAID. - - - This document is NOT an introduction to RAID; you must find - this elsewhere. - - - - 3. Q: What levels of RAID does the Linux kernel implement? - - A: Striping (RAID-0) and linear concatenation are a part of - the stock 2.x series of kernels. This code is of production - quality; it is well understood and well maintained. It is - being used in some very large USENET news servers. - - - RAID-1, RAID-4 & RAID-5 are a part of the 2.1.63 and greater - kernels. For earlier 2.0.x and 2.1.x kernels, patches exist - that will provide this function. Don't feel obligated to - upgrade to 2.1.63; upgrading the kernel is hard; it is - *much* easier to patch an earlier kernel. Most of the RAID - user community is running 2.0.x kernels, and that's where - most of the historic RAID development has focused. The - current snapshots should be considered near-production - quality; that is, there are no known bugs but there are some - rough edges and untested system setups. There are a large - number of people using Software RAID in a production - environment. - - - RAID-1 hot reconstruction has been recently introduced - (August 1997) and should be considered alpha quality. - RAID-5 hot reconstruction will be alpha quality any day now. - - - A word of caution about the 2.1.x development kernels: these - are less than stable in a variety of ways. Some of the - newer disk controllers (e.g. the Promise Ultra's) are - supported only in the 2.1.x kernels. However, the 2.1.x - kernels have seen frequent changes in the block device - driver, in the DMA and interrupt code, in the PCI, IDE and - SCSI code, and in the disk controller drivers. The - combination of these factors, coupled to cheapo hard drives - and/or low-quality ribbon cables can lead to considerable - heartbreak. The ckraid tool, as well as fsck and mount put - considerable stress on the RAID subsystem. This can lead to - hard lockups during boot, where even the magic alt-SysReq - key sequence won't save the day. Use caution with the 2.1.x - kernels, and expect trouble. Or stick to the 2.0.34 kernel. - - - - 4. Q: I'm running an older kernel. Where do I get patches? - - A: Software RAID-0 and linear mode are a stock part of all - current Linux kernels. Patches for Software RAID-1,4,5 are - available from - . See also the - quasi-mirror for patches, tools and other goodies. - - - - 5. Q: Are there other Linux RAID references? - - A: - - o Generic RAID overview: - . - - o General Linux RAID options: - . - - o Latest version of this document: - . - - o Linux-RAID mailing list archive: - . - - o Linux Software RAID Home Page: - . - - o Linux Software RAID tools: - . 
- - o How to setting up linear/stripped Software RAID: - . - - o Bootable RAID mini-HOWTO: - . - - o Root RAID HOWTO: . - - o Linux RAID-Geschichten: - . - - - - 6. Q: Who do I blame for this document? - - A: Linas Vepstas slapped this thing together. However, most - of the information, and some of the words were supplied by - - o Bradley Ward Allen - - o Luca Berra - - o Brian Candler - - o Bohumil Chalupa - - o Rob Hagopian - - o Anton Hristozov - - o Miguel de Icaza - - o Marco Meloni - - o Ingo Molnar - - o Alvin Oga - - o Gadi Oxman - - o Vaughan Pratt - - o Steven A. Reisman - - o Michael Robinton - - o Martin Schulze - - o Geoff Thompson - - o Edward Welbon - - o Rod Wilkens - - o Johan Wiltink - - o Leonard N. Zubkoff - - o Marc ZYNGIER - - - Copyrights - - o Copyright (C) 1994-96 Marc ZYNGIER - - o Copyright (C) 1997 Gadi Oxman, Ingo Molnar, Miguel de - Icaza - - o Copyright (C) 1997, 1998 Linas Vepstas - - o By copyright law, additional copyrights are implicitly - held by the contributors listed above. - - Thanks all for being there! - - - - 2. Understanding RAID - - - 1. Q: What is RAID? Why would I ever use it? - - A: RAID is a way of combining multiple disk drives into a - single entity to improve performance and/or reliability. - There are a variety of different types and implementations - of RAID, each with its own advantages and disadvantages. - For example, by putting a copy of the same data on two disks - (called disk mirroring, or RAID level 1), read performance - can be improved by reading alternately from each disk in the - mirror. On average, each disk is less busy, as it is han- - dling only 1/2 the reads (for two disks), or 1/3 (for three - disks), etc. In addition, a mirror can improve reliability: - if one disk fails, the other disk(s) have a copy of the - data. Different ways of combining the disks into one, - referred to as RAID levels, can provide greater storage - efficiency than simple mirroring, or can alter latency - (access-time) performance, or throughput (transfer rate) - performance, for reading or writing, while still retaining - redundancy that is useful for guarding against failures. - - Although RAID can protect against disk failure, it does not - protect against operator and administrator (human) error, or - against loss due to programming bugs (possibly due to bugs - in the RAID software itself). The net abounds with tragic - tales of system administrators who have bungled a RAID - installation, and have lost all of their data. RAID is not - a substitute for frequent, regularly scheduled backup. - - RAID can be implemented in hardware, in the form of special - disk controllers, or in software, as a kernel module that is - layered in between the low-level disk driver, and the file - system which sits above it. RAID hardware is always a "disk - controller", that is, a device to which one can cable up the - disk drives. Usually it comes in the form of an adapter card - that will plug into a ISA/EISA/PCI/S-Bus/MicroChannel slot. - However, some RAID controllers are in the form of a box that - connects into the cable in between the usual system disk - controller, and the disk drives. Small ones may fit into a - drive bay; large ones may be built into a storage cabinet - with its own drive bays and power supply. The latest RAID - hardware used with the latest & fastest CPU will usually - provide the best overall performance, although at a - significant price. 
This is because most RAID controllers - come with on-board DSP's and memory cache that can off-load - a considerable amount of processing from the main CPU, as - well as allow high transfer rates into the large controller - cache. Old RAID hardware can act as a "de-accelerator" when - used with newer CPU's: yesterday's fancy DSP and cache can - act as a bottleneck, and it's performance is often beaten by - pure-software RAID and new but otherwise plain, run-of-the- - mill disk controllers. RAID hardware can offer an advantage - over pure-software RAID, if it can makes use of disk-spindle - synchronization and its knowledge of the disk-platter - position with regard to the disk head, and the desired disk- - block. However, most modern (low-cost) disk drives do not - offer this information and level of control anyway, and - thus, most RAID hardware does not take advantage of it. - RAID hardware is usually not compatible across different - brands, makes and models: if a RAID controller fails, it - must be replaced by another controller of the same type. As - of this writing (June 1998), a broad variety of hardware - controllers will operate under Linux; however, none of them - currently come with configuration and management utilities - that run under Linux. - - Software-RAID is a set of kernel modules, together with - management utilities that implement RAID purely in software, - and require no extraordinary hardware. The Linux RAID - subsystem is implemented as a layer in the kernel that sits - above the low-level disk drivers (for IDE, SCSI and Paraport - drives), and the block-device interface. The filesystem, be - it ext2fs, DOS-FAT, or other, sits above the block-device - interface. Software-RAID, by its very software nature, - tends to be more flexible than a hardware solution. The - downside is that it of course requires more CPU cycles and - power to run well than a comparable hardware system. Of - course, the cost can't be beat. Software RAID has one - further important distinguishing feature: it operates on a - partition-by-partition basis, where a number of individual - disk partitions are ganged together to create a RAID - partition. This is in contrast to most hardware RAID - solutions, which gang together entire disk drives into an - array. With hardware, the fact that there is a RAID array - is transparent to the operating system, which tends to - simplify management. With software, there are far more - configuration options and choices, tending to complicate - matters. - - As of this writing (June 1998), the administration of RAID - under Linux is far from trivial, and is best attempted by - experienced system administrators. The theory of operation - is complex. The system tools require modification to - startup scripts. And recovery from disk failure is non- - trivial, and prone to human error. RAID is not for the - novice, and any benefits it may bring to reliability and - performance can be easily outweighed by the extra - complexity. Indeed, modern disk drives are incredibly - reliable and modern CPU's and controllers are quite - powerful. You might more easily obtain the desired - reliability and performance levels by purchasing higher- - quality and/or faster hardware. - - - - 2. Q: What are RAID levels? Why so many? What distinguishes them? - - A: The different RAID levels have different performance, - redundancy, storage capacity, reliability and cost charac- - teristics. Most, but not all levels of RAID offer redun- - dancy against disk failure. 
Of those that offer redundancy,
- RAID-1 and RAID-5 are the most popular.  RAID-1 offers better
- performance, while RAID-5 provides for more efficient use of the
- available storage space.  However, tuning for performance is an
- entirely different matter, as performance depends strongly on a
- large variety of factors, from the type of application, to the
- sizes of stripes, blocks, and files.  The more difficult aspects
- of performance tuning are deferred to a later section of this
- HOWTO.
-
- The following describes the different RAID levels in the
- context of the Linux software RAID implementation.
-
-
- o  RAID-linear is a simple concatenation of partitions to
-    create a larger virtual partition.  It is handy if you
-    have a number of small drives, and wish to create a single,
-    large partition.  This concatenation offers no
-    redundancy, and in fact decreases the overall
-    reliability: if any one disk fails, the combined
-    partition will fail.
-
-
-
- o  RAID-1 is also referred to as "mirroring".  Two (or more)
-    partitions, all of the same size, each store an exact
-    copy of all data, disk-block by disk-block.  Mirroring
-    gives strong protection against disk failure: if one disk
-    fails, there is another with an exact copy of the same
-    data.  Mirroring can also help improve performance in
-    I/O-laden systems, as read requests can be divided up
-    between several disks.  Unfortunately, mirroring is also
-    the least efficient in terms of storage: two mirrored
-    partitions can store no more data than a single
-    partition.
-
-
-
- o  Striping is the underlying concept behind all of the
-    other RAID levels.  A stripe is a contiguous sequence of
-    disk blocks.  A stripe may be as short as a single disk
-    block, or may consist of thousands.  The RAID drivers
-    split up their component disk partitions into stripes;
-    the different RAID levels differ in how they organize the
-    stripes, and what data they put in them.  The interplay
-    between the size of the stripes, the typical size of
-    files in the file system, and their location on the disk
-    is what determines the overall performance of the RAID
-    subsystem.
-
-
-
- o  RAID-0 is much like RAID-linear, except that the
-    component partitions are divided into stripes and then
-    interleaved.  Like RAID-linear, the result is a single
-    larger virtual partition.  Also like RAID-linear, it
-    offers no redundancy, and therefore decreases overall
-    reliability: a single disk failure will knock out the
-    whole thing.  RAID-0 is often claimed to improve
-    performance over the simpler RAID-linear.  However, this
-    may or may not be true, depending on the characteristics
-    of the file system, the typical size of the file as
-    compared to the size of the stripe, and the type of
-    workload.  The ext2fs file system already scatters files
-    throughout a partition, in an effort to minimize
-    fragmentation.  Thus, at the simplest level, any given
-    access may go to one of several disks, and thus, the
-    interleaving of stripes across multiple disks offers no
-    apparent additional advantage.  However, there are
-    performance differences, and they are data, workload, and
-    stripe-size dependent.
-
-
-
- o  RAID-4 interleaves stripes like RAID-0, but it requires
-    an additional partition to store parity information.  The
-    parity is used to offer redundancy: if any one of the
-    disks fails, the data on the remaining disks can be used
-    to reconstruct the data that was on the failed disk.
- Given N data disks, and one parity disk, the parity - stripe is computed by taking one stripe from each of the - data disks, and XOR'ing them together. Thus, the storage - capacity of a an (N+1)-disk RAID-4 array is N, which is a - lot better than mirroring (N+1) drives, and is almost as - good as a RAID-0 setup for large N. Note that for N=1, - where there is one data drive, and one parity drive, - RAID-4 is a lot like mirroring, in that each of the two - disks is a copy of each other. However, RAID-4 does NOT - offer the read-performance of mirroring, and offers - considerably degraded write performance. In brief, this - is because updating the parity requires a read of the old - parity, before the new parity can be calculated and - written out. In an environment with lots of writes, the - parity disk can become a bottleneck, as each write must - access the parity disk. - - - - o RAID-5 avoids the write-bottleneck of RAID-4 by - alternately storing the parity stripe on each of the - drives. However, write performance is still not as good - as for mirroring, as the parity stripe must still be read - and XOR'ed before it is written. Read performance is - also not as good as it is for mirroring, as, after all, - there is only one copy of the data, not two or more. - RAID-5's principle advantage over mirroring is that it - offers redundancy and protection against single-drive - failure, while offering far more storage capacity when - used with three or more drives. - - - - o RAID-2 and RAID-3 are seldom used anymore, and to some - degree are have been made obsolete by modern disk - technology. RAID-2 is similar to RAID-4, but stores ECC - information instead of parity. Since all modern disk - drives incorporate ECC under the covers, this offers - little additional protection. RAID-2 can offer greater - data consistency if power is lost during a write; - however, battery backup and a clean shutdown can offer - the same benefits. RAID-3 is similar to RAID-4, except - that it uses the smallest possible stripe size. As a - result, any given read will involve all disks, making - overlapping I/O requests difficult/impossible. In order - to avoid delay due to rotational latency, RAID-3 requires - that all disk drive spindles be synchronized. Most modern - disk drives lack spindle-synchronization ability, or, if - capable of it, lack the needed connectors, cables, and - manufacturer documentation. Neither RAID-2 nor RAID-3 - are supported by the Linux Software-RAID drivers. - - - - o Other RAID levels have been defined by various - researchers and vendors. Many of these represent the - layering of one type of raid on top of another. Some - require special hardware, and others are protected by - patent. There is no commonly accepted naming scheme for - these other levels. Sometime the advantages of these - other systems are minor, or at least not apparent until - the system is highly stressed. Except for the layering - of RAID-1 over RAID-0/linear, Linux Software RAID does - not support any of the other variations. - - - - 3. Setup & Installation Considerations - - - 1. Q: What is the best way to configure Software RAID? - - - A: I keep rediscovering that file-system planning is one of - the more difficult Unix configuration tasks. To answer your - question, I can describe what we did. - - We planned the following setup: - - o two EIDE disks, 2.1.gig each. - - - disk partition mount pt. 
3. Setup & Installation Considerations

1. Q: What is the best way to configure Software RAID?

   A: I keep rediscovering that file-system planning is one of the
   more difficult Unix configuration tasks. To answer your question,
   I can describe what we did. We planned the following setup:

   o two EIDE disks, 2.1 GB each.

        disk  partition  mount pt.  size   device
         1        1          /      300M   /dev/hda1
         1        2        swap      64M   /dev/hda2
         1        3        /home    800M   /dev/hda3
         1        4        /var     900M   /dev/hda4

         2        1        /root    300M   /dev/hdc1
         2        2        swap      64M   /dev/hdc2
         2        3        /home    800M   /dev/hdc3
         2        4        /var     900M   /dev/hdc4

   o Each disk is on a separate controller (& ribbon cable). The
     theory is that a controller failure and/or ribbon failure won't
     disable both disks. Also, we might possibly get a performance
     boost from parallel operations over two controllers/cables.

   o Install the Linux kernel on the root (/) partition /dev/hda1.
     Mark this partition as bootable.

   o /dev/hdc1 will contain a ``cold'' copy of /dev/hda1. This is NOT
     a raid copy, just a plain old copy-copy. It's there just in case
     the first disk fails; we can use a rescue disk, mark /dev/hdc1
     as bootable, and use that to keep going without having to
     reinstall the system. You may even want to put /dev/hdc1's copy
     of the kernel into LILO to simplify booting in case of failure.

     The theory here is that in case of severe failure, I can still
     boot the system without worrying about raid superblock
     corruption or other raid failure modes & gotchas that I don't
     understand.

   o /dev/hda3 and /dev/hdc3 will be the mirror /dev/md0.

   o /dev/hda4 and /dev/hdc4 will be the mirror /dev/md1.

   o we picked /var and /home to be mirrored, and in separate
     partitions, using the following logic:

     o / (the root partition) will contain relatively static,
       non-changing data: for all practical purposes, it will be
       read-only without actually being marked & mounted read-only.

     o /home will contain ``slowly'' changing data.

     o /var will contain rapidly changing data, including mail
       spools, database contents and web server logs.

     The idea behind using multiple, distinct partitions is that if,
     for some bizarre reason, whether it is human error, power loss,
     or an operating system gone wild, corruption is limited to one
     partition. In one typical case, power is lost while the system
     is writing to disk. This will almost certainly lead to a
     corrupted filesystem, which will be repaired by fsck during the
     next boot. Although fsck does its best to make the repairs
     without creating additional damage during those repairs, it can
     be comforting to know that any such damage has been limited to
     one partition. In another typical case, the sysadmin makes a
     mistake during rescue operations, leading to erased or destroyed
     data. Partitions can help limit the repercussions of the
     operator's errors.

   o Other reasonable choices for partitions might be /usr or /opt.
     In fact, /opt and /home make great choices for RAID-5
     partitions, if we had more disks. A word of caution: DO NOT put
     /usr in a RAID-5 partition. If a serious fault occurs, you may
     find that you cannot mount /usr, and that you want some of the
     tools on it (e.g. the networking tools, or the compiler). With
     RAID-1, if a fault has occurred, and you can't get RAID to work,
     you can at least mount one of the two mirrors. You can't do this
     with any of the other RAID levels (RAID-5, striping, or linear
     append).

   So, to complete the answer to the question:

   o install the OS on disk 1, partition 1. Do NOT mount any of the
     other partitions.

   o install RAID per instructions.

   o configure md0 and md1 (a sketch follows this list).

   o convince yourself that you know what to do in case of a disk
     failure! Discover sysadmin mistakes now, and not during an
     actual crisis. Experiment! (we turned off power during disk
     activity -- this proved to be ugly but informative).

   o do some ugly mount/copy/unmount/rename/reboot scheme to move
     /var over to /dev/md1. Done carefully, this is not dangerous.

   o enjoy!
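As a concrete illustration of the ``configure md0 and md1'' step, a
minimal bring-up of the two mirrors described above might look like
the following. This is only a sketch using the old-style raidtools
commands shown throughout this HOWTO; the config file names are
hypothetical, so substitute whatever matches your own setup:

   mkraid /etc/raid.home.conf          # initialize superblocks (once
   mkraid /etc/raid.var.conf           # per array, destroys contents)

   mdadd /dev/md0 /dev/hda3 /dev/hdc3  # bind partitions to md devices
   mdadd /dev/md1 /dev/hda4 /dev/hdc4

   mdrun -p1 /dev/md0                  # -p1 selects the RAID-1
   mdrun -p1 /dev/md1                  # (mirroring) personality

Only after mdrun succeeds do the /dev/md? devices accept filesystems
and mounts.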
2. Q: What is the difference between the mdadd, mdrun, etc. commands,
   and the raidadd, raidrun commands?

   A: The names of the tools have changed as of the 0.5 release of
   the raidtools package. The md naming convention was used in the
   0.43 and older versions, while raid is used in 0.5 and newer
   versions.

3. Q: I want to run RAID-linear/RAID-0 in the stock 2.0.34 kernel. I
   don't want to apply the raid patches, since these are not needed
   for RAID-0/linear. Where can I get the raid-tools to manage this?

   A: This is a tough question, indeed, as the newest raid tools
   package needs to have the RAID-1,4,5 kernel patches installed in
   order to compile. I am not aware of any pre-compiled, binary
   version of the raid tools that is available at this time. However,
   experiments show that the raid-tools binaries, when compiled
   against kernel 2.1.100, seem to work just fine in creating a
   RAID-0/linear partition under 2.0.34. A brave soul has asked for
   these, and I've temporarily placed the binaries mdadd, mdcreate,
   etc. at http://linas.org/linux/Software-RAID/ You must get the man
   pages, etc. from the usual raid-tools package.

4. Q: Can I stripe/mirror the root partition (/)? Why can't I boot
   Linux directly from the md disks?

   A: Both LILO and Loadlin need a non-striped/non-mirrored partition
   to read the kernel image from. If you want to stripe/mirror the
   root partition (/), then you'll want to create an
   unstriped/unmirrored partition to hold the kernel(s). Typically,
   this partition is named /boot. Then you either use the initial
   ramdisk support (initrd), or patches from Harald Hoyer that allow
   a striped partition to be used as the root device. (These patches
   are now a standard part of recent 2.1.x kernels.)

   There are several approaches that can be used. One approach is
   documented in detail in the Bootable RAID mini-HOWTO.

   Alternately, use mkinitrd to build the ramdisk image; see below.

   Edward Welbon writes:

   o ... all that is needed is a script to manage the boot setup. To
     mount an md filesystem as root, the main thing is to build an
     initial file system image that has the needed modules and md
     tools to start md. I have a simple script that does this.

   o For boot media, I have a small, cheap SCSI disk (170MB; I got it
     used for $20). This disk runs on an AHA1452, but it could just
     as well be an inexpensive IDE disk on the native IDE. The disk
     need not be very fast since it is mainly for boot.

   o This disk has a small file system which contains the kernel and
     the file system image for initrd. The initial file system image
     has just enough stuff to allow me to load the raid SCSI device
     driver module and start the raid partition that will become
     root. I then do an

        echo 0x900 > /proc/sys/kernel/real-root-dev

     (0x900 is for /dev/md0) and exit linuxrc. The boot proceeds
     normally from there.

   o I have built most support as a module except for the AHA1452
     driver that brings in the initrd filesystem. So I have a fairly
     small kernel. The method is perfectly reliable; I have been
     doing this since before 2.1.26 and have never had a problem that
     I could not easily recover from.
     The file systems even survived several 2.1.4[45] hard crashes
     with no real problems.

   o At one time I had partitioned the raid disks so that the initial
     cylinders of the first raid disk held the kernel and the initial
     cylinders of the second raid disk held the initial file system
     image. Instead, I made the initial cylinders of the raid disks
     swap, since they are the fastest cylinders (why waste them on
     boot?).

   o The nice thing about having an inexpensive device dedicated to
     boot is that it is easy to boot from and can also serve as a
     rescue disk if necessary. If you are interested, you can take a
     look at the script that builds my initial ram disk image and
     then runs LILO.

   It is current enough to show the picture. It isn't especially
   pretty, and it could certainly build a much smaller filesystem
   image for the initial ram disk. It would be easy to make it more
   efficient. But it uses LILO as is. If you make any improvements,
   please forward a copy to me. 8-)

5. Q: I have heard that I can run mirroring over striping. Is this
   true? Can I run mirroring over the loopback device?

   A: Yes, but not the reverse. That is, you can put a stripe over
   several disks, and then build a mirror on top of this. However,
   striping cannot be put on top of mirroring. A sketch of such a
   layered setup follows.

   A brief technical explanation is that the linear and stripe
   personalities use the ll_rw_blk routine for access. The ll_rw_blk
   routine maps disk devices and sectors, not blocks. Block devices
   can be layered one on top of the other; but devices that do raw,
   low-level disk accesses, such as ll_rw_blk, cannot.

   Currently (November 1997) RAID cannot be run over the loopback
   devices, although this should be fixed shortly.
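The mirror-over-stripe layering might be sketched as follows. The
device names are made up for illustration, and whether a given
raidtools version accepts md devices as components of another array
may vary, so treat this strictly as a sketch of the idea:

   # two RAID-0 stripe sets, each across two different disks
   mdadd /dev/md0 /dev/sda1 /dev/sdb1
   mdrun -p0 /dev/md0
   mdadd /dev/md1 /dev/sdc1 /dev/sdd1
   mdrun -p0 /dev/md1

   # a RAID-1 mirror built on top of the two stripe sets
   mdadd /dev/md2 /dev/md0 /dev/md1
   mdrun -p1 /dev/md2

The reverse order (a stripe personality run over two RAID-1 devices)
is what the answer above says these drivers cannot do.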
6. Q: I have two small disks and three larger disks. Can I
   concatenate the two smaller disks with RAID-0, and then create a
   RAID-5 out of that and the larger disks?

   A: Currently (November 1997), for a RAID-5 array, no. Currently,
   one can do this only for a RAID-1 on top of the concatenated
   drives.

7. Q: What is the difference between RAID-1 and RAID-5 for a two-disk
   configuration (i.e. the difference between a RAID-1 array built
   out of two disks, and a RAID-5 array built out of two disks)?

   A: There is no difference in storage capacity. Nor can disks be
   added to either array to increase capacity (see the question below
   for details).

   RAID-1 offers a performance advantage for reads: the RAID-1 driver
   uses distributed-read technology to simultaneously read two
   sectors, one from each drive, thus doubling read performance.

   The RAID-5 driver, although it contains many optimizations, does
   not currently (September 1997) realize that the parity disk is
   actually a mirrored copy of the data disk. Thus, it serializes
   data reads.

8. Q: How can I guard against a two-disk failure?

   A: Some of the RAID algorithms do guard against multiple disk
   failures, but these are not currently implemented for Linux.
   However, the Linux Software RAID can guard against multiple disk
   failures by layering an array on top of an array. For example,
   nine disks can be used to create three raid-5 arrays. Then these
   three arrays can in turn be hooked together into a single RAID-5
   array on top. In fact, this kind of a configuration will guard
   against a three-disk failure. Note that a large amount of disk
   space is ``wasted'' on the redundancy information.

      For an NxN raid-5 array,
      N=3,  5 out of  9 disks are used for parity (=~56%)
      N=4,  7 out of 16 disks
      N=5,  9 out of 25 disks
      ...
      N=9, 17 out of 81 disks (=~20%)

   In general, an MxN array will use M+N-1 disks for parity. The
   least amount of space is "wasted" when M=N.

   Another alternative is to create a RAID-1 array with three disks.
   Note that since all three disks contain identical data, two-thirds
   of the space is ``wasted''.

9. Q: I'd like to understand how it'd be possible to have something
   like fsck: if the partition hasn't been cleanly unmounted, fsck
   runs and fixes the filesystem by itself more than 90% of the time.
   Since the machine is capable of fixing it by itself with ckraid
   --fix, why not make it automatic?

   A: This can be done by adding lines like the following to
   /etc/rc.d/rc.sysinit:

      mdadd /dev/md0 /dev/hda1 /dev/hdc1 || {
         ckraid --fix /etc/raid.usr.conf
         mdadd /dev/md0 /dev/hda1 /dev/hdc1
      }

   or

      mdrun -p1 /dev/md0
      if [ $? -gt 0 ] ; then
         ckraid --fix /etc/raid1.conf
         mdrun -p1 /dev/md0
      fi

   Before presenting a more complete and reliable script, let's
   review the theory of operation.

   Gadi Oxman writes: In an unclean shutdown, Linux might be in one
   of the following states:

   1. The in-memory disk cache was in sync with the RAID set when the
      unclean shutdown occurred; no data was lost.

   2. The in-memory disk cache was newer than the RAID set contents
      when the crash occurred; this results in a corrupted filesystem
      and potentially in data loss. This state can be further divided
      into the following two states:

      a. Linux was writing data when the unclean shutdown occurred.

      b. Linux was not writing data when the crash occurred.

   Suppose we were using a RAID-1 array. In (2a), it might happen
   that before the crash, a small number of data blocks were
   successfully written only to some of the mirrors, so that on the
   next reboot, the mirrors will no longer contain the same data.

   If we were to ignore the mirror differences, the raidtools-0.36.3
   read-balancing code might choose to read the above data blocks
   from any of the mirrors, which will result in inconsistent
   behavior (for example, the output of e2fsck -n /dev/md0 can differ
   from run to run).

   Since RAID doesn't protect against unclean shutdowns, usually
   there isn't any ``obviously correct'' way to fix the mirror
   differences and the filesystem corruption.

   For example, by default ckraid --fix will choose the first
   operational mirror and update the other mirrors with its contents.
   However, depending on the exact timing at the crash, the data on
   another mirror might be more recent, and we might want to use it
   as the source mirror instead, or perhaps use another method for
   recovery.

   The following script provides one of the more robust boot-up
   sequences. In particular, it guards against long, repeated
   ckraid's in the presence of uncooperative disks, controllers, or
   controller device drivers. Modify it to reflect your config, and
   copy it to rc.raid.init. Then invoke rc.raid.init after the root
   partition has been fsck'ed and mounted rw, but before the
   remaining partitions are fsck'ed. Make sure the current directory
   is in the search path.
      mdadd /dev/md0 /dev/hda1 /dev/hdc1 || {
         rm -f /fastboot             # force an fsck to occur
         ckraid --fix /etc/raid.usr.conf
         mdadd /dev/md0 /dev/hda1 /dev/hdc1
      }
      # if a crash occurs later in the boot process,
      # we at least want to leave this md in a clean state.
      /sbin/mdstop /dev/md0

      mdadd /dev/md1 /dev/hda2 /dev/hdc2 || {
         rm -f /fastboot             # force an fsck to occur
         ckraid --fix /etc/raid.home.conf
         mdadd /dev/md1 /dev/hda2 /dev/hdc2
      }
      # if a crash occurs later in the boot process,
      # we at least want to leave this md in a clean state.
      /sbin/mdstop /dev/md1

      mdadd /dev/md0 /dev/hda1 /dev/hdc1
      mdrun -p1 /dev/md0
      if [ $? -gt 0 ] ; then
         rm -f /fastboot             # force an fsck to occur
         ckraid --fix /etc/raid.usr.conf
         mdrun -p1 /dev/md0
      fi
      # if a crash occurs later in the boot process,
      # we at least want to leave this md in a clean state.
      /sbin/mdstop /dev/md0

      mdadd /dev/md1 /dev/hda2 /dev/hdc2
      mdrun -p1 /dev/md1
      if [ $? -gt 0 ] ; then
         rm -f /fastboot             # force an fsck to occur
         ckraid --fix /etc/raid.home.conf
         mdrun -p1 /dev/md1
      fi
      # if a crash occurs later in the boot process,
      # we at least want to leave this md in a clean state.
      /sbin/mdstop /dev/md1

      # OK, just blast through the md commands now. If there were
      # errors, the above checks should have fixed things up.
      /sbin/mdadd /dev/md0 /dev/hda1 /dev/hdc1
      /sbin/mdrun -p1 /dev/md0

      /sbin/mdadd /dev/md1 /dev/hda2 /dev/hdc2
      /sbin/mdrun -p1 /dev/md1

   In addition to the above, you'll want to create a rc.raid.halt
   which should look like the following:

      /sbin/mdstop /dev/md0
      /sbin/mdstop /dev/md1

   Be sure to modify both rc.sysinit and init.d/halt to include this
   everywhere that filesystems get unmounted before a halt/reboot.
   (Note that rc.sysinit unmounts and reboots if fsck returned with
   an error.)

10. Q: Can I set up one-half of a RAID-1 mirror with the one disk I
    have now, and then later get the other disk and just drop it in?

    A: With the current tools, no, not in any easy way. In
    particular, you cannot just copy the contents of one disk onto
    another, and then pair them up. This is because the RAID drivers
    use a glob of space at the end of the partition to store the
    superblock. This decreases the amount of space available to the
    file system slightly; if you just naively try to force a RAID-1
    arrangement onto a partition with an existing filesystem, the
    raid superblock will overwrite a portion of the file system and
    mangle data. Since the ext2fs filesystem scatters files randomly
    throughout the partition (in order to avoid fragmentation), there
    is a very good chance that some file will land at the very end of
    a partition long before the disk is full.

    If you are clever, I suppose you can calculate how much room the
    RAID superblock will need, and make your filesystem slightly
    smaller, leaving room for it when you add it later. But then, if
    you are this clever, you should also be able to modify the tools
    to do this automatically for you. (The tools are not terribly
    complex).

    Note: A careful reader has pointed out that the following trick
    may work; I have not tried or verified this: Do the mkraid with
    /dev/null as one of the devices. Then mdadd -r with only the
    single, true disk (do not mdadd /dev/null). The mkraid should
    have successfully built the raid array, while the mdadd step just
    forces the system to run in "degraded" mode, as if one of the
    disks had failed.
4. Error Recovery

1. Q: I have a RAID-1 (mirroring) setup, and lost power while there
   was disk activity. Now what do I do?

   A: The redundancy of RAID levels is designed to protect against a
   disk failure, not against a power failure. There are several ways
   to recover from this situation.

   o Method (1): Use the raid tools. These can be used to sync the
     raid arrays. They do not fix file-system damage; after the raid
     arrays are sync'ed, the file-system still has to be fixed with
     fsck. Raid arrays can be checked with ckraid /etc/raid1.conf
     (for RAID-1; else /etc/raid5.conf, etc.)

     Calling ckraid /etc/raid1.conf --fix will pick one of the disks
     in the array (usually the first), use that as the master copy,
     and copy its blocks to the others in the mirror.

     To designate which of the disks should be used as the master,
     you can use the --force-source flag: for example,
     ckraid /etc/raid1.conf --fix --force-source /dev/hdc3

     The ckraid command can be safely run without the --fix option to
     verify the inactive RAID array without making any changes. When
     you are comfortable with the proposed changes, supply the --fix
     option.

   o Method (2): Paranoid, time-consuming, not much better than the
     first way. Let's assume a two-disk RAID-1 array, consisting of
     partitions /dev/hda3 and /dev/hdc3. You can try the following:

     a. fsck /dev/hda3

     b. fsck /dev/hdc3

     c. decide which of the two partitions had fewer errors, or was
        more easily recovered, or recovered the data that you wanted.
        Pick one, either one, to be your new ``master'' copy. Say you
        picked /dev/hdc3.

     d. dd if=/dev/hdc3 of=/dev/hda3

     e. mkraid raid1.conf -f --only-superblock

     Instead of the last two steps, you can instead run ckraid
     /etc/raid1.conf --fix --force-source /dev/hdc3 which should be a
     bit faster.

   o Method (3): Lazy man's version of the above. If you don't want
     to wait for long fsck's to complete, it is perfectly fine to
     skip the first three steps above, and move directly to the last
     two steps. Just be sure to run fsck /dev/md0 after you are done.
     Method (3) is actually just method (1) in disguise.

   In any case, the above steps will only sync up the raid arrays.
   The file system probably needs fixing as well: for this, fsck
   needs to be run on the active, unmounted md device.

   With a three-disk RAID-1 array, there are more possibilities, such
   as using two disks to ``vote'' a majority answer. Tools to
   automate this do not currently (September 97) exist.
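Pulling Method (1) together, a typical session on the two-disk mirror
used in these examples might look like this (verify first, then fix,
then repair the filesystem on the assembled md device):

   ckraid /etc/raid1.conf                        # dry run: report only
   ckraid /etc/raid1.conf --fix --force-source /dev/hdc3
   mdadd /dev/md0 /dev/hda3 /dev/hdc3            # reassemble the array
   mdrun -p1 /dev/md0
   fsck /dev/md0                                 # now fix the filesystem

The order matters: ckraid works on the inactive array, while fsck must
run on the active (but unmounted) /dev/md0.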
2. Q: I have a RAID-4 or a RAID-5 (parity) setup, and lost power
   while there was disk activity. Now what do I do?

   A: The redundancy of RAID levels is designed to protect against a
   disk failure, not against a power failure.

   Since the disks in a RAID-4 or RAID-5 array do not contain a file
   system that fsck can read, there are fewer repair options. You
   cannot use fsck to do preliminary checking and/or repair; you must
   use ckraid first.

   The ckraid command can be safely run without the --fix option to
   verify the inactive RAID array without making any changes. When
   you are comfortable with the proposed changes, supply the --fix
   option.

   If you wish, you can try designating one of the disks as a
   ``failed disk''. Do this with the --suggest-failed-disk-mask flag.
   Only one bit should be set in the flag: RAID-5 cannot recover two
   failed disks. The mask is a binary bit mask; thus:

      0x1 == first disk
      0x2 == second disk
      0x4 == third disk
      0x8 == fourth disk, etc.

   Alternately, you can choose to modify the parity sectors, by using
   the --suggest-fix-parity flag. This will recompute the parity from
   the other sectors.

   The flags --suggest-failed-disk-mask and --suggest-fix-parity can
   be safely used for verification. No changes are made if the --fix
   flag is not specified. Thus, you can experiment with different
   possible repair schemes.

3. Q: My RAID-1 device, /dev/md0, consists of two hard drive
   partitions: /dev/hda3 and /dev/hdc3. Recently, the disk with
   /dev/hdc3 failed, and was replaced with a new disk. My best
   friend, who doesn't understand RAID, said that the correct thing
   to do now is to ``dd if=/dev/hda3 of=/dev/hdc3''. I tried this,
   but things still don't work.

   A: You should keep your best friend away from your computer.
   Fortunately, no serious damage has been done. You can recover from
   this by running:

      mkraid raid1.conf -f --only-superblock

   By using dd, two identical copies of the partition were created.
   This is almost correct, except that the RAID-1 kernel extension
   expects the RAID superblocks to be different. Thus, when you try
   to reactivate RAID, the software will notice the problem, and
   deactivate one of the two partitions. By re-creating the
   superblock, you should have a fully usable system.

4. Q: My version of mkraid doesn't have a --only-superblock flag.
   What do I do?

   A: The newer tools drop support for this flag, replacing it with
   the --force-resync flag. It has been reported that the following
   sequence appears to work with the latest tools and software:

      umount /web       # or wherever /dev/md0 is mounted
      raidstop /dev/md0
      mkraid /dev/md0 --force-resync --really-force
      raidstart /dev/md0

   After doing this, a cat /proc/mdstat should report resync in
   progress, and one should be able to mount /dev/md0 at this point.

5. Q: My RAID-1 device, /dev/md0, consists of two hard drive
   partitions: /dev/hda3 and /dev/hdc3. My best (girl?)friend, who
   doesn't understand RAID, ran fsck on /dev/hda3 while I wasn't
   looking, and now the RAID won't work. What should I do?

   A: You should re-examine your concept of ``best friend''. In
   general, fsck should never be run on the individual partitions
   that compose a RAID array. Assuming that neither of the partitions
   is/was heavily damaged, no data loss has occurred, and the RAID-1
   device can be recovered as follows:

   a. make a backup of the file system on /dev/hda3

   b. dd if=/dev/hda3 of=/dev/hdc3

   c. mkraid raid1.conf -f --only-superblock

   This should leave you with a working disk mirror.

6. Q: Why does the above work as a recovery procedure?

   A: Because each of the component partitions in a RAID-1 mirror is
   a perfectly valid copy of the file system. In a pinch, mirroring
   can be disabled, and one of the partitions can be mounted and
   safely run as an ordinary, non-RAID file system. When you are
   ready to restart using RAID-1, then unmount the partition, and
   follow the above instructions to restore the mirror. Note that the
   above works ONLY for RAID-1, and not for any of the other levels.

   It may make you feel more comfortable to reverse the direction of
   the copy above: copy from the disk that was untouched to the one
   that was. Just be sure to fsck the final md.
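The ``in a pinch'' escape hatch mentioned above can be spelled out in
a short sketch (device names as in the examples above; this works only
for RAID-1, where each half is a complete filesystem):

   mdstop /dev/md0          # make sure the array is not active
   fsck /dev/hda3           # check one half of the mirror
   mount /dev/hda3 /mnt     # and run directly off it, non-RAID

When the emergency is over, unmount it and rebuild the mirror with the
dd and mkraid --only-superblock procedure from question 5.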
7. Q: I am confused by the above questions, but am not yet bailing
   out. Is it safe to run fsck /dev/md0 ?

   A: Yes, it is safe to run fsck on the md devices. In fact, this is
   the only safe place to run fsck.

8. Q: If a disk is slowly failing, will it be obvious which one it
   is? I am concerned that it won't be, and this confusion could lead
   to some dangerous decisions by a sysadmin.

   A: Once a disk fails, an error code will be returned from the low
   level driver to the RAID driver. The RAID driver will mark it as
   ``bad'' in the RAID superblocks of the ``good'' disks (so we will
   later know which mirrors are good and which aren't), and continue
   RAID operation on the remaining operational mirrors.

   This, of course, assumes that the disk and the low level driver
   can detect a read/write error, and will not silently corrupt data,
   for example. This is true of current drives (error detection
   schemes are being used internally), and is the basis of RAID
   operation.

9. Q: What about hot-repair?

   A: Work is underway to complete ``hot reconstruction''. With this
   feature, one can add several ``spare'' disks to the RAID set (be
   it level 1 or 4/5), and once a disk fails, it will be
   reconstructed on one of the spare disks at run time, without ever
   needing to shut down the array.

   However, to use this feature, the spare disk must have been
   declared at boot time, or it must be hot-added, which requires the
   use of special cabinets and connectors that allow a disk to be
   added while the electrical power is on.

   As of October 97, there is a beta version of MD that allows:

   o RAID 1 and 5 reconstruction on spare drives

   o RAID-5 parity reconstruction after an unclean shutdown

   o spare disks to be hot-added to an already running RAID 1 or 4/5
     array

   Automatic reconstruction is currently (Dec 97) disabled by
   default, due to the preliminary nature of this work. It can be
   enabled by changing the value of SUPPORT_RECONSTRUCTION in
   include/linux/md.h. If spare drives were configured into the array
   when it was created and kernel-based reconstruction is enabled,
   the spare drive will already contain a RAID superblock (written by
   mkraid), and the kernel will reconstruct its contents
   automatically (without needing the usual mdstop, replace drive,
   ckraid, mdrun steps).

   If you are not running automatic reconstruction, and have not
   configured a hot-spare disk, the procedure described by Gadi Oxman
   is recommended (see the sketch below):

   o Currently, once the first disk is removed, the RAID set will be
     running in degraded mode. To restore full operation mode, you
     need to:

     o stop the array (mdstop /dev/md0)

     o replace the failed drive

     o run ckraid raid.conf to reconstruct its contents

     o run the array again (mdadd, mdrun).

   At this point, the array will be running with all the drives, and
   again protects against a failure of a single drive.

   Currently, it is not possible to assign a single hot-spare disk to
   several arrays. Each array requires its own hot-spare.
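As a sketch, the manual replacement procedure above might look like
this for the two-disk mirror used throughout this section (partition
names are illustrative; the replacement drive must first be
partitioned at least as large as the old one):

   mdstop /dev/md0                      # take the degraded array down
   # ...physically swap the dead drive; repartition it as /dev/hdc3...
   ckraid --fix /etc/raid1.conf         # rebuild the new disk's contents
   mdadd /dev/md0 /dev/hda3 /dev/hdc3   # reassemble
   mdrun -p1 /dev/md0                   # and restart the RAID-1 array

Only after mdrun succeeds is the array redundant again.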
10. Q: I would like to have an audible alarm for ``you schmuck, one
    disk in the mirror is down'', so that the novice sysadmin knows
    that there is a problem.

    A: The kernel is logging the event with a ``KERN_ALERT'' priority
    in syslog. There are several software packages that will monitor
    the syslog files, and beep the PC speaker, call a pager, send
    e-mail, etc. automatically.

11. Q: How do I run RAID-5 in degraded mode (with one disk failed,
    and not yet replaced)?

    A: Gadi Oxman writes: Normally, to run a RAID-5 set of n drives
    you have to:

       mdadd /dev/md0 /dev/disk1 ... /dev/disk(n)
       mdrun -p5 /dev/md0

    Even if one of the disks has failed, you still have to mdadd it
    as you would in a normal setup. (?? try using /dev/null in place
    of the failed disk ??? watch out) Then the array will be active
    in degraded mode with (n - 1) drives. If ``mdrun'' fails, the
    kernel has noticed an error (for example, several faulty drives,
    or an unclean shutdown). Use ``dmesg'' to display the kernel
    error messages from ``mdrun''. If the raid-5 set is corrupted due
    to a power loss, rather than a disk crash, one can try to recover
    by creating a new RAID superblock:

       mkraid -f --only-superblock raid5.conf

    A RAID array doesn't provide protection against a power failure
    or a kernel crash, and can't guarantee correct recovery.
    Rebuilding the superblock will simply cause the system to ignore
    the condition by marking all the drives as ``OK'', as if nothing
    happened.

12. Q: How does RAID-5 work when a disk fails?

    A: The typical operating scenario is as follows:

    o A RAID-5 array is active.

    o One drive fails while the array is active.

    o The drive firmware and the low-level Linux disk/controller
      drivers detect the failure and report an error code to the MD
      driver.

    o The MD driver continues to provide an error-free /dev/md0
      device to the higher levels of the kernel (with a performance
      degradation) by using the remaining operational drives.

    o The sysadmin can umount /dev/md0 and mdstop /dev/md0 as usual.

    o If the failed drive is not replaced, the sysadmin can still
      start the array in degraded mode as usual, by running mdadd and
      mdrun.

13. Q:

    A:

14. Q: Why is there no question 13?

    A: If you are concerned about RAID, High Availability, and UPS,
    then it's probably a good idea to be superstitious as well. It
    can't hurt, can it?

15. Q: I just replaced a failed disk in a RAID-5 array. After
    rebuilding the array, fsck is reporting many, many errors. Is
    this normal?

    A: No. And, unless you ran fsck in "verify only; do not update"
    mode, it's quite possible that you have corrupted your data.
    Unfortunately, a not-uncommon scenario is one of accidentally
    changing the disk order in a RAID-5 array after replacing a hard
    drive. Although the RAID superblock stores the proper order, not
    all tools use this information. In particular, the current
    version of ckraid will use the information specified with the -f
    flag (typically, the file /etc/raid5.conf) instead of the data in
    the superblock. If the specified order is incorrect, then the
    replaced disk will be reconstructed incorrectly. The symptom of
    this kind of mistake seems to be heavy & numerous fsck errors.

    And, in case you are wondering, yes, someone lost all of their
    data by making this mistake. Making a tape backup of all data
    before reconfiguring a RAID array is strongly recommended.
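The ``verify only'' mode referred to above is worth knowing as a
reflex: always do a read-only pass before letting fsck touch a freshly
rebuilt array. For an ext2 filesystem that is simply:

   fsck -n /dev/md0     # (e2fsck -n) report problems, change nothing

If the -n pass shows a flood of errors after a RAID-5 rebuild, stop
and re-check the disk order in your raid5.conf before going further.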
16. Q: The QuickStart says that mdstop is just to make sure that the
    disks are sync'ed. Is this REALLY necessary? Isn't unmounting the
    file systems enough?

    A: The command mdstop /dev/md0 will:

    o mark it ``clean''. This allows us to detect unclean shutdowns,
      for example due to a power failure or a kernel crash.

    o sync the array. This is less important after unmounting a
      filesystem, but is important if /dev/md0 is accessed directly
      rather than through a filesystem (for example, by e2fsck).

5. Troubleshooting Install Problems

1. Q: What is the current best known-stable patch for RAID in the
   2.0.x series kernels?

   A: As of 18 Sept 1997, it is "2.0.30 + pre-9 2.0.31 + Werner
   Fink's swapping patch + the alpha RAID patch". As of November
   1997, it is 2.0.31 + ... !?

2. Q: The RAID patches will not install cleanly for me. What's
   wrong?

   A: Make sure that /usr/include/linux is a symbolic link to
   /usr/src/linux/include/linux.

   Make sure that the new files raid5.c, etc. have been copied to
   their correct locations. Sometimes the patch command will not
   create new files. Try the -f flag on patch.

3. Q: While compiling raidtools 0.42, compilation stops while trying
   to include a header file that doesn't exist on my system. How do I
   fix this?

   A: raidtools-0.42 requires linuxthreads-0.6. Alternately, use
   glibc v2.0.

4. Q: I get the message: mdrun -a /dev/md0: Invalid argument

   A: Use mkraid to initialize the RAID set prior to the first use.
   mkraid ensures that the RAID array is initially in a consistent
   state by erasing the RAID partitions. In addition, mkraid will
   create the RAID superblocks.

5. Q: I get the message: mdrun -a /dev/md0: Invalid argument The
   setup was:

   o raid built as a kernel module

   o normal install procedure followed ... mdcreate, mdadd, etc.

   o cat /proc/mdstat shows

      Personalities :
      read_ahead not set
      md0 : inactive sda1 sdb1 6313482 blocks
      md1 : inactive
      md2 : inactive
      md3 : inactive

   o mdrun -a generates the error message /dev/md0: Invalid argument

   A: Try lsmod (or, alternately, cat /proc/modules) to see if the
   raid modules are loaded. If they are not, you can load them
   explicitly with the modprobe raid1 or modprobe raid5 command.
   Alternately, if you are using the autoloader, and expected kerneld
   to load them and it didn't, this is probably because your loader
   is missing the info to load the modules. Edit /etc/conf.modules
   and add the following lines:

      alias md-personality-3 raid1
      alias md-personality-4 raid5

6. Q: While doing mdadd -a I get the error: /dev/md0: No such file or
   directory. Indeed, there seems to be no /dev/md0 anywhere. Now
   what do I do?

   A: The raid-tools package will create these devices when you run
   make install as root. Alternately, you can do the following:

      cd /dev
      ./MAKEDEV md

7. Q: After creating a raid array on /dev/md0, I try to mount it and
   get the following error: mount: wrong fs type, bad option, bad
   superblock on /dev/md0, or too many mounted file systems. What's
   wrong?

   A: You need to create a file system on /dev/md0 before you can
   mount it. Use mke2fs.
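That is, once the array is assembled and running, the usual sequence
to get from a bare md device to a mounted filesystem is simply:

   mke2fs /dev/md0        # create an ext2 filesystem on the array
   mount /dev/md0 /mnt    # then mount it like any block device

(See the performance section below for why mke2fs -b 4096 may be a
better choice on RAID-4/5 arrays.)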
8. Q: Truxton Fulton wrote:

   On my Linux 2.0.30 system, while doing a mkraid for a RAID-1
   device, during the clearing of the two individual partitions, I
   got "Cannot allocate free page" errors appearing on the console,
   and "Unable to handle kernel paging request at virtual address
   ..." errors in the system log. At this time, the system became
   quite unusable, but it appears to recover after a while. The
   operation appears to have completed with no other errors, and I am
   successfully using my RAID-1 device. The errors are disconcerting
   though. Any ideas?

   A: This was a well-known bug in the 2.0.30 kernels. It is fixed in
   the 2.0.31 kernel; alternately, fall back to 2.0.29.

9. Q: I'm not able to mdrun a RAID-1, RAID-4 or RAID-5 device. If I
   try to mdrun a mdadd'ed device I get the message ``invalid raid
   superblock magic''.

   A: Make sure that you've run the mkraid part of the install
   procedure.

10. Q: When I access /dev/md0, the kernel spits out a lot of errors
    like md0: device not running, giving up ! and I/O error.... I've
    successfully added my devices to the virtual device.

    A: To be usable, the device must be running. Use mdrun -px
    /dev/md0 where x is l for linear, 0 for RAID-0 or 1 for RAID-1,
    etc.

11. Q: I've created a linear md-dev with 2 devices. cat /proc/mdstat
    shows the total size of the device, but df only shows the size of
    the first physical device.

    A: You must mkfs your new md-dev before using it the first time,
    so that the filesystem will cover the whole device.

12. Q: I've set up /etc/mdtab using mdcreate, I've mdadd'ed, mdrun
    and fsck'ed my two /dev/mdX partitions. Everything looks okay
    before a reboot. As soon as I reboot, I get an fsck error on both
    partitions: fsck.ext2: Attempt to read block from filesystem
    resulted in short read while trying to open /dev/md0. Why?! How
    do I fix it?!

    A: During the boot process, the RAID partitions must be started
    before they can be fsck'ed. This must be done in one of the boot
    scripts. For some distributions, fsck is called from
    /etc/rc.d/rc.S; for others, it is called from
    /etc/rc.d/rc.sysinit. Change this file to run mdadd -ar *before*
    fsck -A is executed. Better yet, it is suggested that ckraid be
    run if mdadd returns with an error. How to do this is discussed
    in greater detail in question 14 of the section ``Error
    Recovery''.

13. Q: I get the message invalid raid superblock magic while trying
    to run an array which consists of partitions which are bigger
    than 4GB.

    A: This bug is now fixed (September 97). Make sure you have the
    latest raid code.

14. Q: I get the message Warning: could not write 8 blocks in inode
    table starting at 2097175 while trying to run mke2fs on a
    partition which is larger than 2GB.

    A: This seems to be a problem with mke2fs (November 97). A
    temporary work-around is to get the mke2fs code, and add #undef
    HAVE_LLSEEK to e2fsprogs-1.10/lib/ext2fs/llseek.c just before the
    first #ifdef HAVE_LLSEEK, and recompile mke2fs.

15. Q: ckraid currently isn't able to read /etc/mdtab

    A: The RAID0/linear configuration file format used in /etc/mdtab
    is obsolete, although it will be supported for a while more. The
    current, up-to-date config files are currently named
    /etc/raid1.conf, etc.

16. Q: The personality modules (raid1.o) are not loaded
    automatically; they have to be manually modprobe'd before mdrun.
    How can this be fixed?

    A: To autoload the modules, we can add the following to
    /etc/conf.modules:

       alias md-personality-3 raid1
       alias md-personality-4 raid5

17. Q: I've mdadd'ed 13 devices, and now I'm trying to mdrun -p5
    /dev/md0 and get the message: /dev/md0: Invalid argument

    A: The default configuration for software RAID is 8 real devices.
    Edit linux/md.h, change the line defining MAX_REAL to a larger
    number, and rebuild the kernel.
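As a sketch, the edit for question 17 is a one-line change (the exact
spelling and comment of the macro in your kernel's include/linux/md.h
may differ; check the header first):

   --- include/linux/md.h
   -#define MAX_REAL  8
   +#define MAX_REAL 16

followed by the usual make config / make dep / make zImage kernel
rebuild and a reboot.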
18. Q: I can't make md work with partitions on our latest
    SPARCstation 5. I suspect that this has something to do with
    disk-labels.

    A: Sun disk-labels sit in the first 1K of a partition. For
    RAID-1, the Sun disk-label is not an issue since ext2fs will skip
    the label on every mirror. For other raid levels (0, linear and
    4/5), this appears to be a problem; it has not yet (Dec 97) been
    addressed.

6. Supported Hardware & Software

1. Q: I have SCSI adapter brand XYZ (with or without several
   channels), and disk brand(s) PQR and LMN; will these work with md
   to create a linear/striped/mirrored personality?

   A: Yes! Software RAID will work with any disk controller (IDE or
   SCSI) and any disks. The disks do not have to be identical, nor do
   the controllers. For example, a RAID mirror can be created with
   one half the mirror being a SCSI disk, and the other an IDE disk.
   The disks do not even have to be the same size. There are no
   restrictions on the mixing & matching of disks and controllers.

   This is because Software RAID works with disk partitions, not with
   the raw disks themselves. The only recommendation is that for RAID
   levels 1 and 5, the disk partitions that are used as part of the
   same set be the same size. If the partitions used to make up the
   RAID 1 or 5 array are not the same size, then the excess space in
   the larger partitions is wasted (not used).
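For instance, the mixed-bus mirror described above is assembled no
differently than any other (the partition names here are only an
example; the two halves should simply be the same size):

   mdadd /dev/md0 /dev/hda3 /dev/sda3   # one IDE half, one SCSI half
   mdrun -p1 /dev/md0                   # ordinary RAID-1 from here on

The md driver sees two block devices of equal size; it neither knows
nor cares that they sit on different buses.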
2. Q: I have a twin channel BT-952, and the box states that it
   supports hardware RAID 0, 1 and 0+1. I have made a RAID set with
   two drives, and the card apparently recognizes them when it's
   doing its BIOS startup routine. I've been reading in the driver
   source code, but found no reference to the hardware RAID support.
   Anybody out there working on that?

   A: The Mylex/BusLogic FlashPoint boards with RAIDPlus are actually
   software RAID, not hardware RAID at all. RAIDPlus is only
   supported on Windows 95 and Windows NT, not on Netware or any of
   the Unix platforms. Aside from booting and configuration, the RAID
   support is actually in the OS drivers.

   While in theory Linux support for RAIDPlus is possible, the
   implementation of RAID-0/1/4/5 in the Linux kernel is much more
   flexible and should have superior performance, so there's little
   reason to support RAIDPlus directly.

3. Q: I want to run RAID with an SMP box. Is RAID SMP-safe?

   A: "I think so" is the best answer available at the time I write
   this (April 98). A number of users report that they have been
   using RAID with SMP for nearly a year, without problems. However,
   as of April 98 (circa kernel 2.1.9x), the following problems have
   been noted on the mailing list:

   o Adaptec AIC7xxx SCSI drivers are not SMP safe. (General note:
     Adaptec adapters have a long & lengthy history of problems &
     flakiness in general. Although they seem to be the most easily
     available, widespread and cheapest SCSI adapters, they should be
     avoided. After factoring for time lost, frustration, and
     corrupted data, an Adaptec will prove to be the costliest
     mistake you'll ever make. That said, if you have SMP problems
     with 2.1.88, try the patch
     ftp://ftp.bero-online.ml.org/pub/linux/aic7xxx-5.0.7-linux21.tar.gz
     I am not sure if this patch has been pulled into later 2.1.x
     kernels. For further info, take a look at the mail archives for
     March 98 at
     http://www.linuxhq.com/lnxlists/linux-raid/lr_9803_01/ As usual,
     due to the rapidly changing nature of the latest experimental
     2.1.x kernels, the problems described in these mailing lists may
     or may not have been fixed by the time you read this. Caveat
     Emptor.)

   o IO-APIC with RAID-0 on SMP has been reported to crash in 2.1.90.

7. Modifying an Existing Installation

1. Q: Are linear MD's expandable? Can a new hard-drive/partition be
   added, and the size of the existing file system expanded?

   A: Miguel de Icaza writes:

   I changed the ext2fs code to be aware of multiple devices instead
   of the regular one-device-per-file-system assumption.

   So, when you want to extend a file system, you run a utility
   program that makes the appropriate changes on the new device (your
   extra partition) and then you just tell the system to extend the
   fs using the specified device.

   You can extend a file system with new devices at system operation
   time, no need to bring the system down (and whenever I get some
   extra time, you will be able to remove devices from the ext2
   volume set, again without even having to go to single-user mode or
   any hack like that).

   You can get the patch for the 2.1.x kernels from my web page.

2. Q: Can I add disks to a RAID-5 array?

   A: Currently (September 1997), no, not without erasing all data. A
   conversion utility to allow this does not yet exist. The problem
   is that the actual structure and layout of a RAID-5 array depends
   on the number of disks in the array.

   Of course, one can add drives by backing up the array to tape,
   deleting all data, creating a new array, and restoring from tape;
   a sketch of that cycle follows.
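The backup-and-recreate cycle might look like the following. The tape
device, config file and mount point are made up for illustration, and
the new raid5.conf must list the enlarged set of disks:

   tar cf /dev/st0 /home           # dump the filesystem to tape
   umount /dev/md0
   mdstop /dev/md0
   mkraid /etc/raid5.conf          # re-create with the extra disk
   mdadd -ar
   mdrun -p5 /dev/md0
   mke2fs -b 4096 /dev/md0         # new, larger filesystem
   mount /dev/md0 /home
   tar xf /dev/st0 -C /            # restore from tape

Verify the restore before trusting the new array with live data.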
3. Q: What would happen to my RAID1/RAID0 sets if I shift one of the
   drives from being /dev/hdb to /dev/hdc?

   Because of cabling/case size/stupidity issues, I had to make my
   RAID sets on the same IDE controller (/dev/hda and /dev/hdb). Now
   that I've fixed some stuff, I want to move /dev/hdb to /dev/hdc.
   What would happen if I just change the /etc/mdtab and
   /etc/raid1.conf files to reflect the new location?

   A: For RAID-0/linear, one must be careful to specify the drives in
   exactly the same order. Thus, in the above example, if the
   original config is

      mdadd /dev/md0 /dev/hda /dev/hdb

   then the new config *must* be

      mdadd /dev/md0 /dev/hda /dev/hdc

   For RAID-1/4/5, the drive's ``RAID number'' is stored in its RAID
   superblock, and therefore the order in which the disks are
   specified is not important.

   RAID-0/linear does not have a superblock due to its older design,
   and the desire to maintain backwards compatibility with this older
   design.

4. Q: Can I convert a two-disk RAID-1 mirror to a three-disk RAID-5
   array?

   A: Yes. Michael at BizSystems has come up with a clever, sneaky
   way of doing this. However, like virtually all manipulations of
   RAID arrays once they have data on them, it is dangerous and prone
   to human error. Make a backup before you start.

      I will make the following assumptions:
      ---------------------------------------------
      disks
      original:         hda - hdc
      raid1 partitions: hda3 - hdc3
      array name:       /dev/md0

      new:              hda - hdc - hdd
      raid5 partitions: hda3 - hdc3 - hdd3
      array name:       /dev/md1

      You must substitute the appropriate disk and partition numbers
      for your system configuration. This will hold true for all
      config file examples.
      ---------------------------------------------
      DO A BACKUP BEFORE YOU DO ANYTHING

      1) recompile the kernel to include both raid1 and raid5

      2) install the new kernel and verify that the raid
         personalities are present

      3) disable the redundant partition on the raid 1 array. If this
         is a root mounted partition (mine was) you must be more
         careful. Reboot the kernel without starting raid devices, or
         boot from a rescue system (the raid tools must be
         available). Start the non-redundant raid1:

            mdadd -r -p1 /dev/md0 /dev/hda3

      4) configure raid5, but with a 'funny' config file; note that
         there is no hda3 entry, and hdc3 is repeated. This is needed
         since the raid tools don't want you to do this.
         -------------------------------
         # raid-5 configuration
         raiddev                 /dev/md1
         raid-level              5
         nr-raid-disks           3
         chunk-size              32

         # Parity placement algorithm
         parity-algorithm        left-symmetric

         # Spare disks for hot reconstruction
         nr-spare-disks          0

         device                  /dev/hdc3
         raid-disk               0

         device                  /dev/hdc3
         raid-disk               1

         device                  /dev/hdd3
         raid-disk               2
         ---------------------------------------
            mkraid /etc/raid5.conf

      5) activate the raid5 array in non-redundant mode

            mdadd -r -p5 -c32k /dev/md1 /dev/hdc3 /dev/hdd3

      6) make a file system on the array

            mke2fs -b {blocksize} /dev/md1

         A blocksize of 4096 rather than the default 1024 is
         recommended by some; this improves the memory utilization of
         the kernel raid routines and matches the blocksize to the
         page size. I compromised and used 2048, since I have a
         relatively high number of small files on my system.

      7) mount the two raid devices somewhere

            mount -t ext2 /dev/md0 mnt0
            mount -t ext2 /dev/md1 mnt1

      8) move the data

            cp -a mnt0 mnt1

      9) verify that the data sets are identical

      10) stop both arrays

      11) correct the information in the raid5.conf file:
          change /dev/md1 to /dev/md0, and
          change the first disk to read /dev/hda3

      12) upgrade the new array to full redundant status
          (THIS DESTROYS REMAINING raid1 INFORMATION)

             ckraid --fix /etc/raid5.conf

8. Performance, Tools & General Bone-headed Questions

1. Q: I've created a RAID-0 device on /dev/sda2 and /dev/sda3. The
   device is a lot slower than a single partition. Isn't md a pile of
   junk?

   A: To have a RAID-0 device running at full speed, you must have
   partitions from different disks. Besides, putting both halves of
   an array on the same disk fails to give you any protection
   whatsoever against disk failure. A correct layout is sketched
   below.
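A minimal correct RAID-0 layout puts one component partition on each
physical disk (the second disk /dev/sdb here is hypothetical):

   mdadd /dev/md0 /dev/sda2 /dev/sdb2   # one partition per spindle
   mdrun -p0 /dev/md0                   # RAID-0 personality

Now reads and writes can actually proceed on two drives in parallel,
which is the entire point of striping.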
2. Q: What's the use of having RAID-linear when RAID-0 will do the
   same thing, but provide higher performance?

   A: It's not obvious that RAID-0 will always provide better
   performance; in fact, in some cases, it could make things worse.
   The ext2fs file system scatters files all over a partition, and it
   attempts to keep all of the blocks of a file contiguous, basically
   in an attempt to prevent fragmentation. Thus, ext2fs behaves "as
   if" there were a (variable-sized) stripe per file. If there are
   several disks concatenated into a single RAID-linear, this will
   result in files being statistically distributed over each of the
   disks. Thus, at least for ext2fs, RAID-linear will behave a lot
   like RAID-0 with large stripe sizes. Conversely, RAID-0 with small
   stripe sizes can cause excessive disk activity, leading to
   severely degraded performance if several large files are accessed
   simultaneously.

   In many cases, RAID-0 can be an obvious win. For example, imagine
   a large database file. Since ext2fs attempts to cluster together
   all of the blocks of a file, chances are good that it will end up
   on only one drive if RAID-linear is used, but will get chopped
   into lots of stripes if RAID-0 is used. Now imagine a number of
   (kernel) threads all trying to do random access to this database.
   Under RAID-linear, all accesses would go to one disk, which would
   not be as efficient as the parallel accesses that RAID-0 entails.

3. Q: How does RAID-0 handle a situation where the different stripe
   partitions are different sizes? Are the stripes uniformly
   distributed?

   A: To understand this, let's look at an example with three
   partitions; one that is 50MB, one 90MB and one 125MB.

   Let's call D0 the 50MB disk, D1 the 90MB disk and D2 the 125MB
   disk. When you start the device, the driver calculates 'strip
   zones'. In this case, it finds 3 zones, defined like this:

      Z0 : (D0/D1/D2)   3 x 50     = 150MB total in this zone
      Z1 : (D1/D2)      2 x 40     =  80MB total in this zone
      Z2 : (D2)         125-50-40  =  35MB total in this zone

   You can see that the total size of the zones is the size of the
   virtual device, but, depending on the zone, the striping is
   different. Z2 is rather inefficient, since there's only one disk.

   Since ext2fs and most other Unix file systems distribute files all
   over the disk, you have a 35/265 = 13% chance that a file will end
   up on Z2, and not get any of the benefits of striping.

   (DOS tries to fill a disk from beginning to end, and thus, the
   oldest files would end up on Z0. However, this strategy leads to
   severe filesystem fragmentation, which is why no one besides DOS
   does it this way.)
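The zone arithmetic above generalizes to any sizes: sort the
partitions by size, and each zone spans from one size boundary to the
next, striped over however many disks reach that far. A toy bash check
of the numbers in the example:

   # sizes in MB, sorted ascending: D0=50, D1=90, D2=125
   echo "Z0 = $(( 3 * 50 )) MB        (all three disks, 0..50MB)"
   echo "Z1 = $(( 2 * (90 - 50) )) MB (D1 and D2, 50MB..90MB)"
   echo "Z2 = $(( 125 - 90 )) MB      (D2 alone, 90MB..125MB)"
   echo "total = $(( 150 + 80 + 35 )) MB"

which prints 150, 80, 35 and 265, matching the table.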
4. Q: I have some Brand X hard disks and a Brand Y controller, and am
   considering using md. Does it significantly increase the
   throughput? Is the performance really noticeable?

   A: The answer depends on the configuration that you use.

   Linux MD RAID-0 and RAID-linear performance:
      If the system is heavily loaded with lots of I/O,
      statistically, some of it will go to one disk, and some to the
      others. Thus, performance will improve over a single large
      disk. The actual improvement depends a lot on the actual data,
      stripe sizes, and other factors. In a system with low I/O
      usage, the performance is equal to that of a single disk.

   Linux MD RAID-1 (mirroring) read performance:
      MD implements read balancing. That is, the RAID-1 code will
      alternate between each of the (two or more) disks in the
      mirror, making alternate reads to each. In a low-I/O situation,
      this won't change performance at all: you will have to wait for
      one disk to complete the read. But, with two disks in a
      high-I/O environment, this could as much as double the read
      performance, since reads can be issued to each of the disks in
      parallel. For N disks in the mirror, this could improve
      performance N-fold.

   Linux MD RAID-1 (mirroring) write performance:
      Must wait for the write to occur to all of the disks in the
      mirror. This is because a copy of the data must be written to
      each of the disks in the mirror. Thus, performance will be
      roughly equal to the write performance to a single disk.

   Linux MD RAID-4/5 read performance:
      Statistically, a given block can be on any one of a number of
      disk drives, and thus RAID-4/5 read performance is a lot like
      that for RAID-0. It will depend on the data, the stripe size,
      and the application. It will not be as good as the read
      performance of a mirrored array.

   Linux MD RAID-4/5 write performance:
      This will in general be considerably slower than that for a
      single disk. This is because the parity must be written out to
      one drive as well as the data to another. However, in order to
      compute the new parity, the old parity and the old data must be
      read first. The old data, new data and old parity must all be
      XOR'ed together to determine the new parity: this requires
      considerable CPU cycles in addition to the numerous disk
      accesses.

5. Q: What RAID configuration should I use for optimal performance?

   A: Is the goal to maximize throughput, or to minimize latency?
   There is no easy answer, as there are many factors that affect
   performance:

   o operating system - will one process/thread, or many, be
     performing disk access?

   o application - is it accessing data in a sequential fashion, or
     random access?

   o file system - clusters files or spreads them out (ext2fs
     clusters together the blocks of a file, and spreads out files)

   o disk driver - number of blocks to read ahead (this is a tunable
     parameter)

   o CEC hardware - one drive controller, or many?

   o hd controller - able to queue multiple requests or not? Does it
     provide a cache?

   o hard drive - buffer cache memory size -- is it big enough to
     handle the write sizes and rate you want?

   o physical platters - blocks per cylinder -- accessing blocks on
     different cylinders will lead to seeks.

6. Q: What is the optimal RAID-5 configuration for performance?

   A: Since RAID-5 experiences an I/O load that is equally
   distributed across several drives, the best performance will be
   obtained when the RAID set is balanced by using identical drives,
   identical controllers, and the same (low) number of drives on each
   controller.

   Note, however, that using identical components will raise the
   probability of multiple simultaneous failures, for example due to
   a sudden jolt or drop, overheating, or a power surge during an
   electrical storm. Mixing brands and models helps reduce this risk.
7. Q: What is the optimal block size for a RAID-4/5 array?

   A: When using the current (November 1997) RAID-4/5 implementation,
   it is strongly recommended that the file system be created with
   mke2fs -b 4096 instead of the default 1024 byte filesystem block
   size.

   This is because the current RAID-5 implementation allocates one 4K
   memory page per disk block; if a disk block were just 1K in size,
   then 75% of the memory which RAID-5 is allocating for pending I/O
   would not be used. If the disk block size matches the memory page
   size, then the driver can (potentially) use all of the page. Thus,
   for a filesystem with a 4096 block size as opposed to a 1024 byte
   block size, the RAID driver will potentially queue 4 times as much
   pending I/O to the low level drivers without allocating additional
   memory.

   Note: the above remarks do NOT apply to the Software
   RAID-0/1/linear drivers.

   Note: the statements about the 4K memory page size apply to the
   Intel x86 architecture. The page size on Alpha, Sparc, and other
   CPUs is different; I believe they're 8K on Alpha/Sparc (????).
   Adjust the above figures accordingly.

   Note: if your file system has a lot of small files (files less
   than 10KBytes in size), a considerable fraction of the disk space
   might be wasted. This is because the file system allocates disk
   space in multiples of the block size. Allocating large blocks for
   small files clearly results in a waste of disk space: thus, you
   may want to stick to small block sizes, get a larger effective
   storage capacity, and not worry about the "wasted" memory due to
   the block-size/page-size mismatch.

   Note: most ``typical'' systems do not have that many small files.
   That is, although there might be thousands of small files, this
   would lead to only some 10 to 100MB of wasted space, which is
   probably an acceptable tradeoff for performance on a
   multi-gigabyte disk.

   However, for news servers, there might be tens or hundreds of
   thousands of small files. In such cases, the smaller block size,
   and thus the improved storage capacity, may be more important than
   the more efficient I/O scheduling.

   Note: there exists an experimental file system for Linux which
   packs small files and file chunks onto a single block. It
   apparently has some very positive performance implications when
   the average file size is much smaller than the block size.

   Note: future versions may implement schemes that obsolete the
   above discussion. However, this is difficult to implement, since
   dynamic run-time allocation can lead to dead-locks; the current
   implementation performs a static pre-allocation.
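In command form, the recommendation above is simply:

   mke2fs -b 4096 /dev/md0   # 4K blocks match the x86 page size

weighed against the small-file space waste discussed in the notes.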
     Note, however, the trade-off: the bandwidth could improve almost
     N-fold for reading a single, large file, as N drives can be
     reading simultaneously (that is, if read-ahead is used so that
     all of the disks are kept active). But there is another,
     counteracting trade-off: if all of the drives are already busy
     reading one file, then attempting to read a second or third file
     at the same time will cause significant contention, ruining
     performance as the disk ladder algorithms lead to seeks all over
     the platter. Thus, large stripes will almost always lead to the
     best performance. The sole exception is the case where one is
     streaming a single, large file at a time, requires the highest
     possible bandwidth, and is also using a good read-ahead
     algorithm, in which case small stripes are desired.

     Note that this HOWTO previously recommended small stripe sizes
     for news spools or other systems with lots of small files. This
     was bad advice, and here's why: news spools contain not only many
     small files, but also large summary files, as well as large
     directories. If the summary file is larger than the stripe size,
     reading it will cause many disks to be accessed, slowing things
     down as each disk performs a seek. Similarly, the current ext2fs
     file system searches directories in a linear, sequential fashion.
     Thus, to find a given file or inode, on average half of the
     directory will be read. If this directory is spread across
     several stripes (several disks), the directory read (e.g. due to
     the ls command) could get very slow. Thanks to Steven A. Reisman
     for this correction. Steve also adds:

     I found that using a 256k stripe gives much better performance.
     I suspect that the optimum size would be the size of a disk
     cylinder (or maybe the size of the disk drive's sector cache).
     However, disks nowadays have recording zones with different
     sector counts (and sector caches vary among different disk
     models). There's no way to guarantee stripes won't cross a
     cylinder boundary.

     The tools accept the stripe size specified in kBytes. You'll want
     to specify a multiple of the page size for your CPU (4 kB on the
     x86).
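     To make this concrete: with mdadm, the chunk size is given with
     the --chunk option, in kilobytes. The following is a sketch only
     (device names, and the 256k figure from Steve's observation
     above, are illustrative), creating a four-drive RAID-0 with
     256 kB chunks:

          mdadm --create /dev/md0 --level=0 --chunk=256 --raid-devices=4 \
                /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1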
  9. Q: What is the correct stride factor to use when creating the
     ext2fs file system on the RAID partition? By stride, I mean the
     -R flag on the mke2fs command:

          mke2fs -b 4096 -R stride=nnn ...

     What should the value of nnn be?

     A: The -R stride flag is used to tell the file system about the
     size of the RAID stripes. Since only RAID-0, 4 and 5 use stripes,
     and RAID-1 (mirroring) and RAID-linear do not, this flag is
     applicable only to RAID-0, 4 and 5.

     Knowledge of the size of a stripe allows mke2fs to allocate the
     block and inode bitmaps so that they don't all end up on the same
     physical drive. An unknown contributor wrote:

     I noticed last spring that one drive in a pair always had a
     larger I/O count, and tracked it down to these meta-data blocks.
     Ted added the -R stride= option in response to my explanation and
     request for a workaround.

     For a 4KB block file system, with a stripe size of 256KB, one
     would use -R stride=64 (see the worked command after this
     answer).

     If you don't trust the -R flag, you can get a similar effect in a
     different way. Steven A. Reisman writes:

     Another consideration is the filesystem used on the RAID-0
     device. The ext2 filesystem allocates 8192 blocks per group. Each
     group has its own set of inodes. If there are 2, 4 or 8 drives,
     these inodes cluster on the first disk. I've distributed the
     inodes across all drives by telling mke2fs to allocate only 7932
     blocks per group.

     Some mke2fs man pages do not describe the [-g blocks-per-group]
     flag used in this operation.
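     Putting the numbers above together (256 kB stripes divided by
     4 kB blocks gives 64 blocks per stripe), the full command for
     that case would be:

          mke2fs -b 4096 -R stride=64 /dev/md0

     (/dev/md0 is, as before, just a placeholder for your RAID device;
     add -g 7932 if you also want Steve's block-group trick.)

 10. Q: Where can I put the md commands in the startup scripts, so
     that everything will start automatically at boot time?

     A: Rod Wilkens writes:

     What I did is put ``mdadd -ar'' in ``/etc/rc.d/rc.sysinit'' right
     after the kernel loads the modules, and before the ``fsck'' disk
     check. This way, you can put the ``/dev/md?'' devices in
     ``/etc/fstab''. Then I put ``mdstop -a'' right after the
     ``umount -a'' that unmounts the disks, in the
     ``/etc/rc.d/init.d/halt'' file.

     For RAID-5, you will want to look at the return code for mdadd,
     and if it failed, do a

          ckraid --fix /etc/raid5.conf

     to repair any damage.

 11. Q: I was wondering if it's possible to set up striping with more
     than 2 devices in md0? This is for a news server, and I have 9
     drives... Needless to say I need much more than two. Is this
     possible?

     A: Yes. The MD driver does not limit striping to two devices;
     simply list all nine drives in the configuration, just as you
     would with two.

 12. Q: When is Software RAID superior to Hardware RAID?

     A: Normally, Hardware RAID is considered superior to Software
     RAID, because hardware controllers often have a large cache, and
     can do a better job of scheduling operations in parallel.
     However, integrated Software RAID can (and does) gain certain
     advantages from being close to the operating system.

     For example, ... ummm. Opaque description of caching of
     reconstructed blocks in buffer cache elided ...

     On a dual PPro SMP system, it has been reported that
     Software-RAID performance exceeds that of a well-known
     hardware-RAID board vendor by a factor of 2 to 5.

     Software RAID is also a very interesting option for
     high-availability redundant server systems. In such a
     configuration, two CPUs are attached to one set of SCSI disks. If
     one server crashes or fails to respond, then the other server can
     mdadd, mdrun and mount the software RAID array, and take over
     operations. This sort of dual-ended operation is not always
     possible with many hardware RAID controllers, because of the
     state configuration that the hardware controllers maintain.

 13. Q: If I upgrade my version of raidtools, will it have trouble
     manipulating older raid arrays? In short, should I recreate my
     RAID arrays when upgrading the raid utilities?

     A: No, not unless the major version number changes. An MD version
     x.y.z consists of three sub-versions:

          x: Major version.
          y: Minor version.
          z: Patchlevel version.

     Version x1.y1.z1 of the RAID driver supports a RAID array with
     version x2.y2.z2 in case (x1 == x2) and (y1 >= y2).

     Different patchlevel (z) versions for the same (x.y) version are
     designed to be mostly compatible.

     The minor version number is increased whenever the RAID array
     layout is changed in a way which is incompatible with older
     versions of the driver. New versions of the driver will maintain
     compatibility with older RAID arrays.

     The major version number will be increased if it will no longer
     make sense to support old RAID arrays in the new kernel code.

     For RAID-1, it's not likely that either the disk layout or the
     superblock structure will change anytime soon.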
     Any optimization and new features (reconstruction, multithreaded
     tools, hot-plug, etc.) do not affect the physical layout.

 14. Q: The command mdstop /dev/md0 says that the device is busy.

     A: There's a process that has a file open on /dev/md0, or
     /dev/md0 is still mounted. Terminate the process or umount
     /dev/md0.

 15. Q: Are there performance tools?

     A: There is a new utility called iotrace in the linux/iotrace
     directory. It reads /proc/io-trace and analyses/plots its output.
     If you feel your system's block I/O performance is too low, just
     look at the iotrace output.

 16. Q: I was reading the RAID source, and saw the value SPEED_LIMIT
     defined as 1024K/sec. What does this mean? Does this limit
     performance?

     A: SPEED_LIMIT is used to limit RAID reconstruction speed during
     automatic reconstruction. Basically, automatic reconstruction
     allows you to e2fsck and mount immediately after an unclean
     shutdown, without first running ckraid. Automatic reconstruction
     is also used after a failed hard drive has been replaced.

     In order to avoid overwhelming the system while reconstruction is
     occurring, the reconstruction thread monitors the reconstruction
     speed and slows it down if it is too fast. The 1M/sec limit was
     arbitrarily chosen as a reasonable rate which allows the
     reconstruction to finish reasonably rapidly, while creating only
     a light load on the system so that other processes are not
     interfered with.
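     In later kernels (2.4 and newer) the reconstruction rate is no
     longer a compile-time constant, but can be tuned at run time
     through /proc. A sketch, assuming the md sysctl interface is
     present on your kernel:

          # guarantee at least 5000 kB/sec per device during reconstruction
          echo 5000 > /proc/sys/dev/raid/speed_limit_min
          # never use more than 50000 kB/sec
          echo 50000 > /proc/sys/dev/raid/speed_limit_max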
 17. Q: What about ''spindle synchronization'' or ''disk
     synchronization''?

     A: Spindle synchronization is used to keep multiple hard drives
     spinning at exactly the same speed, so that their disk platters
     are always perfectly aligned. This is used by some hardware
     controllers to better organize disk writes. However, for software
     RAID, this information is not used, and spindle synchronization
     might even hurt performance.

 18. Q: How can I set up swap spaces using raid 0? Wouldn't striped
     swap areas over 4+ drives be really fast?

     A: Leonard N. Zubkoff replies: It is really fast, but you don't
     need to use MD to get striped swap. The kernel automatically
     stripes across equal-priority swap spaces. For example, the
     following entries from /etc/fstab stripe swap space across five
     drives in three groups:

          /dev/sdg1       swap    swap    pri=3
          /dev/sdk1       swap    swap    pri=3
          /dev/sdd1       swap    swap    pri=3
          /dev/sdh1       swap    swap    pri=3
          /dev/sdl1       swap    swap    pri=3
          /dev/sdg2       swap    swap    pri=2
          /dev/sdk2       swap    swap    pri=2
          /dev/sdd2       swap    swap    pri=2
          /dev/sdh2       swap    swap    pri=2
          /dev/sdl2       swap    swap    pri=2
          /dev/sdg3       swap    swap    pri=1
          /dev/sdk3       swap    swap    pri=1
          /dev/sdd3       swap    swap    pri=1
          /dev/sdh3       swap    swap    pri=1
          /dev/sdl3       swap    swap    pri=1

 19. Q: I want to maximize performance. Should I use multiple
     controllers?

     A: In many cases, the answer is yes. Using several controllers to
     perform disk access in parallel will improve performance.
     However, the actual improvement depends on your actual
     configuration. For example, it has been reported (Vaughan Pratt,
     January 98) that a single 4.3GB Cheetah attached to an Adaptec
     2940UW can achieve a rate of 14MB/sec (without using RAID).
     Installing two disks on one controller, and using a RAID-0
     configuration, results in a measured performance of 27 MB/sec.

     Note that the 2940UW controller is an "Ultra-Wide" SCSI
     controller, capable of a theoretical burst rate of 40MB/sec, and
     so the above measurements are not surprising. However, a slower
     controller attached to two fast disks would be the bottleneck.
     Note also that most out-board SCSI enclosures (e.g. the kind with
     hot-pluggable trays) cannot be run at the 40MB/sec rate, due to
     cabling and electrical noise problems.

     If you are designing a multiple-controller system, remember that
     disks and controllers typically run at 70-85% of their rated
     maximum speeds.

     Note also that using one controller per disk can reduce the
     likelihood of system outage due to a controller or cable failure
     (in theory -- only if the device driver for the controller can
     gracefully handle a broken controller; not all SCSI device
     drivers seem to be able to handle such a situation without
     panicking or otherwise locking up).
  9. High Availability RAID

  1. Q: RAID can help protect me against data loss. But how can I
     also ensure that the system is up as long as possible, and not
     prone to breakdown? Ideally, I want a system that is up 24 hours
     a day, 7 days a week, 365 days a year.

     A: High-Availability is difficult and expensive. The harder you
     try to make a system fault tolerant, the harder and more
     expensive it gets. The following hints, tips, ideas and
     unsubstantiated rumors may help you with this quest.

     o  IDE disks can fail in such a way that the failed disk on an
        IDE ribbon can also prevent the good disk on the same ribbon
        from responding, thus making it look as if two disks have
        failed. Since RAID does not protect against two-disk failures,
        one should either put only one disk on an IDE cable, or, if
        there are two disks, they should belong to different RAID
        sets.

     o  SCSI disks can fail in such a way that the failed disk on a
        SCSI chain can prevent any device on the chain from being
        accessed. The failure mode involves a short of the common
        (shared) device-ready pin; since this pin is shared, no
        arbitration can occur until the short is removed. Thus, no two
        disks on the same SCSI chain should belong to the same RAID
        array.

     o  Similar remarks apply to the disk controllers. Don't load up
        the channels on one controller; use multiple controllers.

     o  Don't use the same brand or model number for all of the disks.
        It is not uncommon for severe electrical storms to take out
        two or more disks. (Yes, we all use surge suppressors, but
        these are not perfect either.) Heat and poor ventilation of
        the disk enclosure are other disk killers. Cheap disks often
        run hot. Using different brands of disk and controller
        decreases the likelihood that whatever took out one disk
        (heat, physical shock, vibration, electrical surge) will also
        damage the others at the same time.

     o  To guard against controller or CPU failure, it should be
        possible to build a SCSI disk enclosure that is
        "twin-tailed": i.e. connected to two computers. One computer
        will mount the file-systems read-write, while the second
        computer will mount them read-only, and act as a hot spare.
        When the hot spare is able to determine that the master has
        failed (e.g. through a watchdog), it will cut the power to the
        master (to make sure that it's really off), and then fsck and
        remount read-write. If anyone gets this working, let me know.

     o  Always use a UPS, and perform clean shutdowns. Although an
        unclean shutdown may not damage the disks, running ckraid on
        even smallish arrays is painfully slow. You want to avoid
        running ckraid as much as possible. Or you can hack on the
        kernel and get the hot-reconstruction code debugged ...

     o  SCSI cables are well known to be very temperamental creatures,
        prone to causing all sorts of problems. Use the highest-
        quality cabling that you can find for sale. Use e.g.
        bubble-wrap to make sure that ribbon cables do not get too
        close to one another and cross-talk. Rigorously observe
        cable-length restrictions.

     o  Take a look at SSA (Serial Storage Architecture). Although it
        is rather expensive, it is rumored to be less prone to the
        failure modes that SCSI exhibits.

     o  Enjoy yourself, it's later than you think.

  10. Questions Waiting for Answers

  1. Q: If, for cost reasons, I try to mirror a slow disk with a fast
     disk, is the software smart enough to balance the reads
     accordingly, or will it all slow down to the speed of the
     slowest?

  2. Q: For testing raw disk throughput, is there a character device
     for raw reads/writes, instead of /dev/sdaxx, that we can use to
     measure performance on the RAID drives? Is there a GUI-based tool
     for watching disk throughput?

  11. Wish List of Enhancements to MD and Related Software

  Bradley Ward Allen wrote:

  Ideas include:

     o  Boot-up parameters to tell the kernel which devices are to be
        MD devices (no more ``mdadd'')

     o  Making MD transparent to ``mount''/``umount'' such that there
        is no ``mdrun'' and ``mdstop''

     o  Integrating ``ckraid'' entirely into the kernel, and letting
        it run as needed

        (So far, all I've done is suggest getting rid of the tools and
        putting them into the kernel; that's how I feel about it, this
        is a filesystem, not a toy.)

     o  Dealing with arrays that can easily survive N disks going out
        simultaneously or at separate moments, where N is a whole
        number > 0 settable by the administrator

     o  Handling kernel freezes, power outages, and other abrupt
        shutdowns better

     o  Not disabling a whole disk if only parts of it have failed;
        e.g., if the sector errors are confined to less than 50% of
        accesses over the attempts of 20 dissimilar requests, then the
        array continues, simply ignoring those sectors of that
        particular disk.

     o  Bad sectors:

     o  A mechanism for saving which sectors are bad, someplace on the
        disk.

     o  If there is a generalized mechanism for marking degraded bad
        blocks that upper filesystem levels can recognize, use that.
        Program it if not.

     o  Perhaps alternatively a mechanism for telling the upper layer
        that the size of the disk got smaller, even arranging for the
        upper layer to move out stuff from the areas being eliminated.
        This would help with degraded blocks as well.

     o  Failing the above ideas, keeping a small (admin-settable)
        amount of space aside for bad blocks (distributed evenly
        across the disk?), and using them (nearby if possible) instead
        of the bad blocks when failures do happen. Of course, this is
        inefficient. Furthermore, the kernel ought to log every time
        the RAID array starts using a bad sector and what is being
        done about it, with a ``crit'' level warning, just to get the
        administrator to realize that his disk has a piece of dust
        burrowing into it (or a head with platter sickness).
     o  Software-switchable disks:

        ``disable this disk''
           would block until the kernel has completed making sure
           there is no data on the disk being shut down that is needed
           (e.g., to complete an XOR/ECC/other error correction), then
           release the disk from use (so it could be removed, etc.);

        ``enable this disk''
           would mkraid a new disk if appropriate and then start using
           it for ECC/whatever operations, enlarging the RAID-5 array
           as it goes;

        ``resize array''
           would re-specify the total number of disks and the number
           of redundant disks, and the result would often be to resize
           the array; where no data loss would result, doing this as
           needed would be nice, but I have a hard time figuring out
           how it would do that; in any case, a mode where it would
           block (for possibly hours (the kernel ought to log
           something every ten seconds if so)) would be necessary;

        ``enable this disk while saving data''
           which would save the data on a disk as-is and move it to
           the RAID-5 system as needed, so that a horrific save and
           restore would not have to happen every time someone brings
           up a RAID-5 system (instead, it may be simpler to only save
           one partition instead of two; it might even fit onto the
           first as a gzip'd file); finally,

        ``re-enable disk''
           would be an operator's hint to the OS to try out a
           previously failed disk (it would simply call disable then
           enable, I suppose).

  Other ideas off the net:

     o  a finalrd analog to initrd, to simplify root RAID.

     o  a read-only RAID mode, to simplify the above

     o  Mark the RAID set as clean whenever there are no "half writes"
        done -- that is, whenever there are no write transactions that
        were committed on one disk but still unfinished on another
        disk.

        Add a "write inactivity" timeout (to avoid frequent seeks to
        the RAID superblock when the RAID set is relatively busy).

  The Software-RAID HOWTO
  Jakob Østergaard jakob@unthought.net and Emilio Bueso
  bueso@vives.org
  v1.1, 2004-06-03

  This HOWTO describes how to use Software RAID under Linux. It
  addresses a specific version of the Software RAID layer, namely the
  0.90 RAID layer made by Ingo Molnar and others. This is the RAID
  layer that is the standard in Linux-2.4, and it is the version that
  is also used by the Linux-2.2 kernels shipped by some vendors. The
  0.90 RAID support is available as patches to Linux-2.0 and
  Linux-2.2, and is by many considered far more stable than the older
  RAID support already in those kernels.
  ______________________________________________________________________

  Table of Contents


  1. Introduction

     1.1 Disclaimer
     1.2 What is RAID?
     1.3 Terms
     1.4 The RAID levels
     1.5 Requirements

  2. Why RAID?

     2.1 Device and filesystem support
     2.2 Performance
     2.3 Swapping on RAID
     2.4 Why mdadm?

  3. Devices

     3.1 Spare disks
     3.2 Faulty disks

  4. Hardware issues

     4.1 IDE Configuration
     4.2 Hot Swap
        4.2.1 Hot-swapping IDE drives
        4.2.2 Hot-swapping SCSI drives
        4.2.3 Hot-swapping with SCA

  5. RAID setup

     5.1 General setup
     5.2 Downloading and installing the RAID tools
     5.3 Downloading and installing mdadm
     5.4 Linear mode
     5.5 RAID-0
     5.6 RAID-1
     5.7 RAID-4
     5.8 RAID-5
     5.9 The Persistent Superblock
     5.10 Chunk sizes
        5.10.1 RAID-0
        5.10.2 RAID-0 with ext2
        5.10.3 RAID-1
        5.10.4 RAID-4
        5.10.5 RAID-5
     5.11 Options for mke2fs
  6. Detecting, querying and testing

     6.1 Detecting a drive failure
     6.2 Querying the array's status
     6.3 Simulating a drive failure
        6.3.1 Force-fail by hardware
        6.3.2 Force-fail by software
     6.4 Simulating data corruption
     6.5 Monitoring RAID arrays

  7. Tweaking, tuning and troubleshooting

     7.1 raid-level and raidtab
     7.2 Autodetection
     7.3 Booting on RAID
     7.4 Root filesystem on RAID
        7.4.1 Method 1
        7.4.2 Method 2
     7.5 Making the system boot on RAID
        7.5.1 Booting with RAID as module
        7.5.2 Modular RAID on Debian GNU/Linux after move to RAID
     7.6 Converting a non-RAID RedHat System to run on Software RAID
        7.6.1 Introduction
        7.6.2 Scope
        7.6.3 Pre-conversion example system
        7.6.4 Step-1 - boot rescue cd/floppy
        7.6.5 Step-2 - create a /etc/raidtab file
        7.6.6 Step-3 - create the md devices
        7.6.7 Step-4 - unmount filesystems
        7.6.8 Step-5 - start raid devices
        7.6.9 Step-6 - remount filesystems
        7.6.10 Step-7 - change root
        7.6.11 Step-8 - edit config files
        7.6.12 Step-9 - run LILO
        7.6.13 Step-10 - change partition types
        7.6.14 Step-11 - resize filesystem
        7.6.15 Step-12 - checklist
        7.6.16 Step-13 - reboot
     7.7 Sharing spare disks between different arrays
     7.8 Pitfalls

  8. Reconstruction

     8.1 Recovery from a multiple disk failure

  9. Performance

     9.1 RAID-0
     9.2 RAID-0 with TCQ
     9.3 RAID-5
     9.4 RAID-10
     9.5 Fresh benchmarking tools

  10. Related tools

     10.1 RAID resizing and conversion
     10.2 Backup

  11. Partitioning RAID / LVM on RAID

     11.1 Partitioning RAID devices
     11.2 LVM on RAID

  12. Credits

  13. Changelog

     13.1 Version 1.1

  ______________________________________________________________________

  1. Introduction

  This HOWTO describes the "new-style" RAID present in the 2.4 and 2.6
  kernel series only. It does not describe the "old-style" RAID
  functionality present in 2.0 and 2.2 kernels.

  The home site for this HOWTO is
  http://unthought.net/Software-RAID.HOWTO/, where updated versions
  appear first. The HOWTO was originally written by Jakob Østergaard
  based on a large number of emails between the author and Ingo Molnar
  (mingo@chiara.csoma.elte.hu) -- one of the RAID developers -- the
  linux-raid mailing list (linux-raid@vger.kernel.org) and various
  other people. Emilio Bueso (bueso@vives.org) co-wrote the 1.0
  version.

  If you want to use the new-style RAID with 2.0 or 2.2 kernels, you
  should get a patch for your kernel from
  http://people.redhat.com/mingo/. The standard 2.2 kernels do not
  have direct support for the new-style RAID described in this HOWTO;
  therefore these patches are needed. The old-style RAID support in
  standard 2.0 and 2.2 kernels is buggy and lacks several important
  features present in the new-style RAID software.

  Some of the information in this HOWTO may seem trivial if you
  already know RAID. Just skip those parts.


  1.1. Disclaimer

  The mandatory disclaimer:

  All information herein is presented "as-is", with no warranties
  expressed or implied. If you lose all your data, your job, get hit
  by a truck, whatever, it's not my fault, nor the developers'. Be
  aware that you use the RAID software and this information at your
  own risk! There is no guarantee whatsoever that any of the software
  or this information is in any way correct, nor suited for any use
  whatsoever. Back up all your data before experimenting with this.
  Better safe than sorry.


  1.2. What is RAID?
  In 1987, the University of California at Berkeley published an
  article entitled "A Case for Redundant Arrays of Inexpensive Disks
  (RAID)". This article described various types of disk arrays,
  referred to by the acronym RAID. The basic idea of RAID was to
  combine multiple small, independent disk drives into an array of
  disk drives which yields performance exceeding that of a Single
  Large Expensive Drive (SLED). Additionally, this array of drives
  appears to the computer as a single logical storage unit or drive.

  The Mean Time Between Failures (MTBF) of the array will be equal to
  the MTBF of an individual drive, divided by the number of drives in
  the array. (For example, an array of ten drives, each with an MTBF
  of 100,000 hours, has an expected MTBF of only 10,000 hours.)
  Because of this, the MTBF of an array of drives would be too low for
  many application requirements. However, disk arrays can be made
  fault-tolerant by redundantly storing information in various ways.

  Five types of array architectures, RAID-1 through RAID-5, were
  defined by the Berkeley paper, each providing disk fault-tolerance
  and each offering different trade-offs in features and performance.
  In addition to these five redundant array architectures, it has
  become popular to refer to a non-redundant array of disk drives as a
  RAID-0 array.

  Today some of the original RAID levels (namely levels 2 and 3) are
  only used in very specialized systems (and in fact are not even
  supported by the Linux Software RAID drivers). Another level,
  "linear", has emerged, and especially RAID level 0 is often combined
  with RAID level 1.


  1.3. Terms

  In this HOWTO the word "RAID" means "Linux Software RAID". This
  HOWTO does not treat any aspects of Hardware RAID. Furthermore, it
  does not treat any aspects of Software RAID in other operating
  system kernels.

  When describing RAID setups, it is useful to refer to the number of
  disks and their sizes. At all times the letter N is used to denote
  the number of active disks in the array (not counting spare-disks).
  The letter S denotes the size of the smallest drive in the array,
  unless otherwise mentioned. The letter P denotes the performance of
  one disk in the array, in MB/s. When used, we assume that the disks
  are equally fast, which may not always be true in real-world
  scenarios.

  Note that the words "device" and "disk" are supposed to mean about
  the same thing. Usually the devices that are used to build a RAID
  device are partitions on disks, not necessarily entire disks. But
  combining several partitions on one disk usually does not make
  sense, so the words devices and disks just mean "partitions on
  different disks".


  1.4. The RAID levels

  Here's a short description of what is supported in the Linux RAID
  drivers. Some of this information is absolutely basic RAID info, but
  I've added a few notices about what's special in the Linux
  implementation of the levels. You can safely skip this section if
  you know RAID already.

  The current RAID drivers in Linux support the following levels:

  ·  Linear mode

  ·  Two or more disks are combined into one physical device. The
     disks are "appended" to each other, so writing linearly to the
     RAID device will fill up disk 0 first, then disk 1 and so on. The
     disks do not have to be of the same size. In fact, size doesn't
     matter at all here :)

  ·  There is no redundancy in this level. If one disk crashes you
     will most probably lose all your data. You can however be lucky
     and recover some data, since the filesystem will just be missing
     one large consecutive chunk of data.
  ·  The read and write performance will not increase for single
     reads/writes. But if several users use the device, you may be
     lucky that one user effectively is using the first disk, and the
     other user is accessing files which happen to reside on the
     second disk. If that happens, you will see a performance gain.

  ·  RAID-0

  ·  Also called "stripe" mode. The devices should (but need not) have
     the same size. Operations on the array will be split over the
     devices; for example, a large write could be split up as 4 kB to
     disk 0, 4 kB to disk 1, 4 kB to disk 2, then 4 kB to disk 0
     again, and so on. If one device is much larger than the other
     devices, that extra space is still utilized in the RAID device,
     but you will be accessing this larger disk alone during writes in
     the high end of your RAID device. This of course hurts
     performance.

  ·  Like linear, there is no redundancy in this level either. Unlike
     linear mode, you will not be able to rescue any data if a drive
     fails. If you remove a drive from a RAID-0 set, the RAID device
     will not just miss one consecutive block of data; it will be
     filled with small holes all over the device. e2fsck or other
     filesystem recovery tools will probably not be able to recover
     much from such a device.

  ·  The read and write performance will increase, because reads and
     writes are done in parallel on the devices. This is usually the
     main reason for running RAID-0. If the busses to the disks are
     fast enough, you can get very close to N*P MB/sec.

  ·  RAID-1

  ·  This is the first mode which actually has redundancy. RAID-1 can
     be used on two or more disks with zero or more spare-disks. This
     mode maintains an exact mirror of the information on one disk on
     the other disk(s). Of course, the disks must be of equal size; if
     one disk is larger than another, your RAID device will be the
     size of the smallest disk.

  ·  If up to N-1 disks are removed (or crash), all data are still
     intact. If there are spare disks available, and if the system
     (eg. SCSI drivers or IDE chipset etc.) survived the crash,
     reconstruction of the mirror will immediately begin on one of the
     spare disks, after detection of the drive fault.

  ·  Write performance is often worse than on a single device, because
     identical copies of the data written must be sent to every disk
     in the array. With large RAID-1 arrays this can be a real
     problem, as you may saturate the PCI bus with these extra copies.
     This is in fact one of the very few places where Hardware RAID
     solutions can have an edge over Software solutions -- if you use
     a hardware RAID card, the extra write copies of the data will not
     have to go over the PCI bus, since it is the RAID controller that
     will generate the extra copy. Read performance is good,
     especially if you have multiple readers or seek-intensive
     workloads. The RAID code employs a rather good read-balancing
     algorithm that will simply let the disk whose heads are closest
     to the wanted disk position perform the read operation. Since
     seek operations are relatively expensive on modern disks (a seek
     time of 6 ms equals a read of 123 kB at 20 MB/sec), picking the
     disk that will have the shortest seek time does actually give a
     noticeable performance improvement.

  ·  RAID-4

  ·  This RAID level is not used very often. It can be used on three
     or more disks.
     Instead of completely mirroring the information, it keeps parity
     information on one drive, and writes data to the other disks in a
     RAID-0-like way. Because one disk is reserved for parity
     information, the size of the array will be (N-1)*S, where S is
     the size of the smallest drive in the array. As in RAID-1, the
     disks should either be of equal size, or you will just have to
     accept that the S in the (N-1)*S formula above will be the size
     of the smallest drive in the array.

  ·  If one drive fails, the parity information can be used to
     reconstruct all data. If two drives fail, all data is lost.

  ·  The reason this level is not more frequently used is that the
     parity information is kept on one drive. This information must be
     updated every time one of the other disks is written to. Thus,
     the parity disk will become a bottleneck if it is not a lot
     faster than the other disks. However, if you just happen to have
     a lot of slow disks and a very fast one, this RAID level can be
     very useful.

  ·  RAID-5

  ·  This is perhaps the most useful RAID mode when one wishes to
     combine a larger number of physical disks and still maintain some
     redundancy. RAID-5 can be used on three or more disks, with zero
     or more spare-disks. The resulting RAID-5 device size will be
     (N-1)*S, just like RAID-4. The big difference between RAID-5 and
     -4 is that the parity information is distributed evenly among the
     participating drives, avoiding the bottleneck problem in RAID-4.

  ·  If one of the disks fails, all data are still intact, thanks to
     the parity information. If spare disks are available,
     reconstruction will begin immediately after the device failure.
     If two disks fail simultaneously, all data are lost. RAID-5 can
     survive one disk failure, but not two or more.

  ·  Both read and write performance usually increase, but it can be
     hard to predict how much. Reads are similar to RAID-0 reads;
     writes can be either rather expensive (requiring read-in prior to
     write, in order to be able to calculate the correct parity
     information), or similar to RAID-1 writes. The write efficiency
     depends heavily on the amount of memory in the machine, and the
     usage pattern of the array. Heavily scattered writes are bound to
     be more expensive.


  1.5. Requirements

  This HOWTO assumes you are using Linux 2.4 or later. However, it is
  possible to use Software RAID in late 2.2.x or 2.0.x Linux kernels
  with a matching RAID patch and the 0.90 version of the raidtools.
  Both the patches and the tools can be found at
  http://people.redhat.com/mingo/. The RAID patch, the raidtools
  package, and the kernel should all match as closely as possible. At
  times it can be necessary to use older kernels if RAID patches are
  not available for the latest kernel.

  If you use a recent GNU/Linux distribution based on the 2.4 kernel
  or later, your system most likely already has a matching version of
  the raidtools for your kernel.


  2. Why RAID?

  There can be many good reasons for using RAID. A few are: the
  ability to combine several physical disks into one larger "virtual"
  device, performance improvements, and redundancy.

  It is, however, very important to understand that RAID is not a
  substitute for good backups. Some RAID levels will make your systems
  immune to data loss from single-disk failures, but RAID will not
  allow you to recover from an accidental "rm -rf /".
  RAID will also not help you preserve your data if the server holding
  the RAID itself is lost in one way or another (theft, flooding,
  earthquake, Martian invasion, etc.).

  RAID will generally allow you to keep systems up and running in case
  of common hardware problems (single disk failure). It is not in
  itself a complete data safety solution. This is very important to
  realize.


  2.1. Device and filesystem support

  Linux RAID can work on most block devices. It doesn't matter whether
  you use IDE or SCSI devices, or a mixture. Some people have also
  used the Network Block Device (NBD) with more or less success.

  Since a Linux Software RAID device is itself a block device, the
  above implies that you can actually create a RAID of other RAID
  devices. This in turn makes it possible to support RAID-10 (RAID-0
  of multiple RAID-1 devices), simply by using the RAID-0 and RAID-1
  functionality together. Other more exotic configurations, such as
  RAID-5 over RAID-5 "matrix" configurations, are equally supported.

  The RAID layer has absolutely nothing to do with the filesystem
  layer. You can put any filesystem on a RAID device, just like on any
  other block device.


  2.2. Performance

  Often RAID is employed as a solution to performance problems. While
  RAID can indeed often be the solution you are looking for, it is not
  a silver bullet. There can be many reasons for performance problems,
  and RAID is only the solution to a few of them.

  See chapter one for a mention of the performance characteristics of
  each level.


  2.3. Swapping on RAID

  There's no reason to use RAID for swap performance reasons. The
  kernel itself can stripe swapping over several devices, if you just
  give them the same priority in the /etc/fstab file.

  A nice /etc/fstab looks like:

       /dev/sda2       swap           swap    defaults,pri=1   0 0
       /dev/sdb2       swap           swap    defaults,pri=1   0 0
       /dev/sdc2       swap           swap    defaults,pri=1   0 0
       /dev/sdd2       swap           swap    defaults,pri=1   0 0
       /dev/sde2       swap           swap    defaults,pri=1   0 0
       /dev/sdf2       swap           swap    defaults,pri=1   0 0
       /dev/sdg2       swap           swap    defaults,pri=1   0 0

  This setup lets the machine swap in parallel on seven SCSI devices.
  No need for RAID, since this has been a kernel feature for a long
  time.

  Another reason to use RAID for swap is high availability. If you set
  up a system to boot on eg. a RAID-1 device, the system should be
  able to survive a disk crash. But if the system has been swapping on
  the now faulty device, you will surely be going down. Swapping on a
  RAID-1 device would solve this problem.

  There has been a lot of discussion about whether swap is stable on
  RAID devices. This is a continuing debate, because it depends highly
  on other aspects of the kernel as well. As of this writing, it seems
  that swapping on RAID should be perfectly stable; you should,
  however, stress-test the system yourself until you are satisfied
  with the stability.

  You can set up swap in a file on a filesystem on your RAID device,
  or you can set up a RAID device as a swap partition, as you see fit.
  As usual, the RAID device is just a block device.
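  As a sketch (assuming /dev/md0 is an already-running RAID-1 device),
  using the array directly as high-availability swap is just:

       mkswap /dev/md0
       swapon -p 1 /dev/md0

  with a matching pri=1 entry in /etc/fstab if you want it enabled at
  boot.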
  2.4. Why mdadm?

  The classic raidtools are the standard software RAID management
  tools for Linux, so using mdadm is not a must.

  However, if you find raidtools cumbersome or limited, mdadm
  (multiple devices admin) is an extremely useful tool for running
  RAID systems. It can be used as a replacement for the raidtools, or
  as a supplement.

  The mdadm tool, written by Neil Brown, a software engineer at the
  University of New South Wales and a kernel developer, is now at
  version 1.4.0 and has proved to be quite stable. There is much
  positive response on the linux-raid mailing list, and mdadm is
  likely to become widespread in the future.

  The main differences between mdadm and raidtools are:

  ·  mdadm can diagnose, monitor and gather detailed information about
     your arrays

  ·  mdadm is a single centralized program, not a collection of
     disparate programs, so there's a common syntax for every RAID
     management command

  ·  mdadm can perform almost all of its functions without having a
     configuration file, and does not use one by default

  ·  also, if a configuration file is needed, mdadm will help with the
     management of its contents


  3. Devices

  Software RAID devices are so-called "block" devices, like ordinary
  disks or disk partitions. A RAID device is "built" from a number of
  other block devices -- for example, a RAID-1 could be built from two
  ordinary disks, or from two disk partitions (on separate disks --
  please see the description of RAID-1 for details on this).

  There are no other special requirements on the devices from which
  you build your RAID devices -- this gives you a lot of freedom in
  designing your RAID solution. For example, you can build a RAID from
  a mix of IDE and SCSI devices, and you can even build a RAID from
  other RAID devices (this is useful for RAID-0+1, where you simply
  construct two RAID-1 devices from ordinary disks, and finally
  construct a RAID-0 device from those two RAID-1 devices).

  Therefore, in the following text, we will use the word "device" as
  meaning "disk", "partition", or even "RAID device". A "device" in
  the following text simply refers to a "Linux block device". It could
  be anything from a SCSI disk to a network block device. We will
  commonly refer to these "devices" simply as "disks", because that is
  what they will be in the common case.

  However, there are several roles that devices can play in your
  arrays. A device could be a "spare disk", it could have failed and
  thus be a "faulty disk", or it could be a normally working and fully
  functional device actively used by the array.

  In the following we describe two special types of devices: the
  "spare disks" and the "faulty disks".


  3.1. Spare disks

  Spare disks are disks that do not take part in the RAID set until
  one of the active disks fails. When a device failure is detected,
  that device is marked as "bad" and reconstruction is immediately
  started on the first spare disk available.

  Thus, spare disks add a nice extra safety, especially to RAID-5
  systems that are perhaps hard to get to (physically). One can allow
  the system to run for some time with a faulty device, since all
  redundancy is preserved by means of the spare disk.

  You cannot be sure that your system will keep running after a disk
  crash though. The RAID layer should handle device failures just
  fine, but SCSI drivers could be broken on error handling, or the IDE
  chipset could lock up, or a lot of other things could happen.

  Also, once reconstruction to a hot-spare begins, the RAID layer will
  start reading from all the other disks to re-create the redundant
  information. If multiple disks have built up bad blocks over time,
  the reconstruction itself can actually trigger a failure on one of
  the "good" disks.
  This will lead to a complete RAID failure. If you do frequent
  backups of the entire filesystem on the RAID array, then it is
  highly unlikely that you would ever get in this situation -- this is
  another very good reason for taking frequent backups. Remember, RAID
  is not a substitute for backups.


  3.2. Faulty disks

  If the RAID layer handles a device failure cleanly, the crashed disk
  is marked as faulty, and reconstruction is immediately started on
  the first spare disk available.

  Faulty disks still appear and behave as members of the array. The
  RAID layer just treats crashed devices as inactive parts of the
  array.


  4. Hardware issues

  This section will mention some of the hardware concerns involved
  when running software RAID.

  If you are going after high performance, you should make sure that
  the bus(ses) to the drives are fast enough. You should not have 14
  UW-SCSI drives on one UW bus if each drive can give 20 MB/s and the
  bus can only sustain 160 MB/s. Also, you should only have one device
  per IDE bus. Running disks as master/slave is horrible for
  performance. IDE is really bad at accessing more than one drive per
  bus. Of course, all newer motherboards have two IDE busses, so you
  can set up two disks in RAID without buying more controllers. Extra
  IDE controllers are rather cheap these days, so setting up 6-8 disk
  systems with IDE is easy and affordable.


  4.1. IDE Configuration

  It is indeed possible to run RAID over IDE disks. And excellent
  performance can be achieved too. In fact, today's prices on IDE
  drives and controllers do make IDE something to be considered when
  setting up new RAID systems.

  ·  Physical stability: IDE drives have traditionally been of lower
     mechanical quality than SCSI drives. Even today, the warranty on
     IDE drives is typically one year, whereas it is often three to
     five years on SCSI drives. Although it is not fair to say that
     IDE drives are by definition poorly made, one should be aware
     that IDE drives of some brands may fail more often than similar
     SCSI drives. However, other brands use the exact same mechanical
     setup for both SCSI and IDE drives. It all boils down to: all
     disks fail, sooner or later, and one should be prepared for that.

  ·  Data integrity: Earlier, IDE had no way of assuring that the data
     sent onto the IDE bus would be the same as the data actually
     written to the disk. This was due to a total lack of parity,
     checksums, etc. With the Ultra-DMA standard, IDE drives now do a
     checksum on the data they receive, and thus it becomes highly
     unlikely that data gets corrupted. The PCI bus, however, does not
     have parity or checksum, and that bus is used for both IDE and
     SCSI systems.

  ·  Performance: I am not going to write thoroughly about IDE
     performance here. The really short story is:

  ·  IDE drives are fast, although they are not (as of this writing)
     found in 10,000 or 15,000 rpm versions like their SCSI
     counterparts

  ·  IDE has more CPU overhead than SCSI (but who cares?)

  ·  Only use one IDE drive per IDE bus; slave disks spoil performance

  ·  Fault survival: The IDE driver usually survives a failing IDE
     device. The RAID layer will mark the disk as failed, and if you
     are running RAID levels 1 or above, the machine should work just
     fine until you can take it down for maintenance.

  It is very important that you only use one IDE disk per IDE bus.
  Not only would two disks ruin the performance, but the failure of a
  disk often guarantees the failure of the bus, and therefore the
  failure of all disks on that bus. In a fault-tolerant RAID setup
  (RAID levels 1, 4, 5), the failure of one disk can be handled, but
  the failure of two disks (the two disks on the bus that fails due to
  the failure of the one disk) will render the array unusable. Also,
  when the master drive on a bus fails, the slave or the IDE
  controller may get awfully confused. One bus, one drive; that's the
  rule.

  There are cheap PCI IDE controllers out there. You often get two or
  four busses for around $80. Considering the much lower price of IDE
  disks versus SCSI disks, an IDE disk array can often be a really
  nice solution if one can live with the relatively low number (around
  8, probably) of disks one can attach to a typical system.

  IDE has major cabling problems when it comes to large arrays. Even
  if you had enough PCI slots, it's unlikely that you could fit much
  more than 8 disks in a system and still get it running without data
  corruption caused by too-long IDE cables.

  Furthermore, some of the newer IDE drives come with a restriction
  that they are only to be used a given number of hours per day. These
  drives are meant for desktop usage, and it can lead to severe
  problems if these are used in a 24/7 server RAID environment.


  4.2. Hot Swap

  Although hot swapping of drives is supported to some extent, it is
  still not something one can do easily.


  4.2.1. Hot-swapping IDE drives

  Don't! IDE doesn't handle hot swapping at all. Sure, it may work for
  you, if your IDE driver is compiled as a module (only possible in
  the 2.2 series of the kernel), and you re-load it after you've
  replaced the drive. But you may just as well end up with a fried IDE
  controller, and you'll be looking at a lot more down-time than just
  the time it would have taken to replace the drive on a downed
  system.

  The main problem, apart from the electrical issues that can destroy
  your hardware, is that the IDE bus must be re-scanned after disks
  are swapped. While newer Linux kernels do support re-scanning of an
  IDE bus (with the help of the hdparm utility), re-detecting
  partitions is still something that is lacking. If the new disk is
  100% identical to the old one (wrt. geometry etc.), it may work, but
  really, you are walking the bleeding edge here.


  4.2.2. Hot-swapping SCSI drives

  Normal SCSI hardware is not hot-swappable either. It may however
  work. If your SCSI driver supports re-scanning the bus, and removing
  and appending devices, you may be able to hot-swap devices. However,
  on a normal SCSI bus you probably shouldn't unplug devices while
  your system is still powered up. But then again, it may just work
  (and you may end up with fried hardware).

  The SCSI layer should survive if a disk dies, but not all SCSI
  drivers handle this yet. If your SCSI driver dies when a disk goes
  down, your system will go with it, and hot-plug isn't really
  interesting then.


  4.2.3. Hot-swapping with SCA

  With SCA, it is possible to hot-plug devices. Unfortunately, this is
  not as simple as it should be, but it is both possible and safe.
  Replace the RAID device, disk device, and host/channel/id/lun
  numbers with the appropriate values in the example below:

  ·  Dump the partition table from the drive, if it is still readable:

          sfdisk -d /dev/sdb > partitions.sdb

  ·  Remove the drive to be replaced from the array:

          raidhotremove /dev/md0 /dev/sdb1

  ·  Look up the Host, Channel, ID and Lun of the drive to be
     replaced, by looking in

          /proc/scsi/scsi

  ·  Remove the drive from the bus:

          echo "scsi remove-single-device 0 0 2 0" > /proc/scsi/scsi

  ·  Verify that the drive has been correctly removed, by looking in

          /proc/scsi/scsi

  ·  Unplug the drive from your SCA bay, and insert a new drive

  ·  Add the new drive to the bus:

          echo "scsi add-single-device 0 0 2 0" > /proc/scsi/scsi

     (this should spin up the drive as well)

  ·  Re-partition the drive using the previously dumped partition
     table:

          sfdisk /dev/sdb < partitions.sdb

  ·  Add the drive to your array:

          raidhotadd /dev/md0 /dev/sdb1

  The arguments to the "scsi remove-single-device" command are: Host,
  Channel, Id and Lun. These numbers are found in the
  "/proc/scsi/scsi" file.

  The above steps have been tried and tested on a system with IBM SCA
  disks and an Adaptec SCSI controller. If you encounter problems or
  find easier ways to do this, please discuss this on the linux-raid
  mailing list.


  5. RAID setup


  5.1. General setup

  This is what you need for any of the RAID levels:

  ·  A kernel. Preferably a kernel from the 2.4 series. Alternatively,
     a 2.0 or 2.2 kernel with the RAID patches applied.

  ·  The RAID tools.

  ·  Patience, pizza, and your favorite caffeinated beverage.

  All of this is included as standard in most GNU/Linux distributions
  today.

  If your system has RAID support, you should have a file called
  /proc/mdstat. Remember it, that file is your friend. If you do not
  have that file, maybe your kernel does not have RAID support. See
  what the file contains by doing a cat /proc/mdstat. It should tell
  you that you have the right RAID personality (eg. RAID mode)
  registered, and that no RAID devices are currently active.

  Create the partitions you want to include in your RAID set.


  5.2. Downloading and installing the RAID tools

  The RAID tools are included in almost every major Linux
  distribution.

  IMPORTANT: If using Debian Woody (3.0) or later, you can install the
  package by running

       apt-get install raidtools2

  raidtools2 is a modern version of the old raidtools package, which
  lacked support for the persistent-superblock and parity-algorithm
  settings.


  5.3. Downloading and installing mdadm

  You can download the most recent mdadm tarball at
  http://www.cse.unsw.edu.au/~neilb/source/mdadm/. Issue a nice make
  install to compile and then install mdadm and its documentation,
  manual pages and example files.

       tar xvf ./mdadm-1.4.0.tgz
       cd mdadm-1.4.0
       make install

  If using an RPM-based distribution, you can download and install the
  package file found at
  http://www.cse.unsw.edu.au/~neilb/source/mdadm/RPM.

       rpm -ihv mdadm-1.4.0-1.i386.rpm

  If using Debian Woody (3.0) or later, you can install the package by
  running

       apt-get install mdadm

  Gentoo has this package available in the portage tree. There you can
  run

       emerge mdadm

  Other distributions may also have this package available.
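  Whichever installation route you take, a quick sanity check (a
  suggestion, not part of any package's instructions) is to ask the
  tool for its version before going on:

       mdadm --version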
  Now, let's go mode-specific.


  5.4. Linear mode

  Ok, so you have two or more partitions which are not necessarily the
  same size (but of course can be), which you want to append to each
  other.

  Set up the /etc/raidtab file to describe your setup. I set up a
  raidtab for two disks in linear mode, and the file looked like this:

       raiddev /dev/md0
               raid-level      linear
               nr-raid-disks   2
               chunk-size      32
               persistent-superblock 1
               device          /dev/sdb6
               raid-disk       0
               device          /dev/sdc5
               raid-disk       1

  Spare disks are not supported here. If a disk dies, the array dies
  with it. There's no information to put on a spare disk.

  You're probably wondering why we specify a chunk-size here, when
  linear mode just appends the disks into one large array with no
  parallelism. Well, you're completely right, it's odd. Just put in
  some chunk size and don't worry about this any more.

  Ok, let's create the array. Run the command

       mkraid /dev/md0

  This will initialize your array, write the persistent superblocks,
  and start the array.

  If you are using mdadm, a single command like

       mdadm --create --verbose /dev/md0 --level=linear --raid-devices=2 /dev/sdb6 /dev/sdc5

  should create the array. The parameters speak for themselves. The
  output might look like this:

       mdadm: chunk size defaults to 64K
       mdadm: array /dev/md0 started.

  Have a look in /proc/mdstat. You should see that the array is
  running.

  Now, you can create a filesystem, just like you would on any other
  device, mount it, include it in your /etc/fstab, and so on.


  5.5. RAID-0

  You have two or more devices, of approximately the same size, and
  you want to combine their storage capacity and also combine their
  performance by accessing them in parallel.

  Set up the /etc/raidtab file to describe your configuration. An
  example raidtab looks like:

       raiddev /dev/md0
               raid-level      0
               nr-raid-disks   2
               persistent-superblock 1
               chunk-size      4
               device          /dev/sdb6
               raid-disk       0
               device          /dev/sdc5
               raid-disk       1

  As in linear mode, spare disks are not supported here either. RAID-0
  has no redundancy, so when a disk dies, the array goes with it.

  Again, you just run

       mkraid /dev/md0

  to initialize the array. This should initialize the superblocks and
  start the raid device. Have a look in /proc/mdstat to see what's
  going on. You should see that your device is now running.

  /dev/md0 is now ready to be formatted, mounted, used and abused.


  5.6. RAID-1

  You have two devices of approximately the same size, and you want
  the two to be mirrors of each other. Perhaps you also have more
  devices, which you want to keep as stand-by spare disks, that will
  automatically become part of the mirror if one of the active devices
  breaks.

  Set up the /etc/raidtab file like this:

       raiddev /dev/md0
               raid-level      1
               nr-raid-disks   2
               nr-spare-disks  0
               persistent-superblock 1
               device          /dev/sdb6
               raid-disk       0
               device          /dev/sdc5
               raid-disk       1

  If you have spare disks, you can add them to the end of the device
  specification like

               device          /dev/sdd5
               spare-disk      0

  Remember to set the nr-spare-disks entry correspondingly.

  Ok, now we're all set to start initializing the RAID. The mirror
  must be constructed, eg. the contents (however unimportant now,
  since the device is still not formatted) of the two devices must be
  synchronized.

  Issue the

       mkraid /dev/md0

  command to begin the mirror initialization.

  Check out the /proc/mdstat file. It should tell you that the
  /dev/md0 device has been started, that the mirror is being
  reconstructed, and give an ETA for the completion of the
  reconstruction.

  Reconstruction is done using idle I/O bandwidth. So your system
  should still be fairly responsive, although your disk LEDs should be
  glowing nicely.

  The reconstruction process is transparent, so you can actually use
  the device even though the mirror is currently under reconstruction.

  Try formatting the device while the reconstruction is running. It
  will work. You can also mount it and use it while reconstruction is
  running. Of course, if the wrong disk breaks while the
  reconstruction is running, you're out of luck.
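  If you prefer mdadm, a roughly equivalent mirror (a sketch, reusing
  the two partitions from the raidtab above) can be created in one
  command:

       mdadm --create --verbose /dev/md0 --level=1 --raid-devices=2 /dev/sdb6 /dev/sdc5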
  5.7. RAID-4

  Note! I haven't tested this setup myself. The setup below is my best
  guess, not something I have actually had up and running. If you use
  RAID-4, please write to the author and share your experiences.

  You have three or more devices of roughly the same size, one device
  is significantly faster than the other devices, and you want to
  combine them all into one larger device, still maintaining some
  redundancy information. Perhaps you also have a number of devices
  you wish to use as spare disks.

  Set up the /etc/raidtab file like this:

       raiddev /dev/md0
               raid-level      4
               nr-raid-disks   4
               nr-spare-disks  0
               persistent-superblock 1
               chunk-size      32
               device          /dev/sdb1
               raid-disk       0
               device          /dev/sdc1
               raid-disk       1
               device          /dev/sdd1
               raid-disk       2
               device          /dev/sde1
               raid-disk       3

  If we had any spare disks, they would be inserted in a similar way,
  following the raid-disk specifications:

               device          /dev/sdf1
               spare-disk      0

  as usual.

  Your array can be initialized with the

       mkraid /dev/md0

  command as usual.

  You should see the section on special options for mke2fs before
  formatting the device.


  5.8. RAID-5

  You have three or more devices of roughly the same size, you want to
  combine them into a larger device, but you still want to maintain a
  degree of redundancy for data safety. Perhaps you also have a number
  of devices to use as spare disks, which will not take part in the
  array before another device fails.

  If you use N devices where the smallest has size S, the size of the
  entire array will be (N-1)*S. This "missing" space is used for
  parity (redundancy) information. Thus, if any one disk fails, all
  data stay intact. But if two disks fail, all data are lost.

  Set up the /etc/raidtab file like this:

       raiddev /dev/md0
               raid-level      5
               nr-raid-disks   7
               nr-spare-disks  0
               persistent-superblock 1
               parity-algorithm left-symmetric
               chunk-size      32
               device          /dev/sda3
               raid-disk       0
               device          /dev/sdb1
               raid-disk       1
               device          /dev/sdc1
               raid-disk       2
               device          /dev/sdd1
               raid-disk       3
               device          /dev/sde1
               raid-disk       4
               device          /dev/sdf1
               raid-disk       5
               device          /dev/sdg1
               raid-disk       6

  If we had any spare disks, they would be inserted in a similar way,
  following the raid-disk specifications:

               device          /dev/sdh1
               spare-disk      0

  And so on.

  A chunk size of 32 kB is a good default for many general-purpose
  filesystems of this size. The array on which the above raidtab is
  used is built from seven 6 GB disks, giving a 36 GB device
  (remember: (N-1)*S = (7-1)*6 = 36). It holds an ext2 filesystem with
  a 4 kB block size. You could go higher with both array chunk-size
  and filesystem block-size if your filesystem is either much larger,
  or just holds very large files.
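  For mdadm users, a roughly equivalent array (a sketch, reusing the
  seven devices and the 32 kB chunk size from the raidtab above) might
  be created with:

       mdadm --create --verbose /dev/md0 --level=5 --chunk=32 --raid-devices=7 \
             /dev/sda3 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1

  (mdadm defaults to the left-symmetric parity layout used in the
  raidtab above.)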
You set up the /etc/raidtab, so let's see if it - works. Run the - - mkraid /dev/md0 - - - command, and see what happens. Hopefully your disks start working - like mad, as they begin the reconstruction of your array. Have a look - in /proc/mdstat to see what's going on. - - If the device was successfully created, the reconstruction process has - now begun. Your array is not consistent until this reconstruction - phase has completed. However, the array is fully functional (except - for the handling of device failures of course), and you can format it - and use it even while it is reconstructing. - - See the section on special options for mke2fs before formatting the - array. - - Ok, now when you have your RAID device running, you can always stop it - or re-start it using the - - raidstop /dev/md0 - - - or - - raidstart /dev/md0 - - - commands. - - With mdadm you can stop the device using - - mdadm -S /dev/md0 - - - and re-start it with - - mdadm -R /dev/md0 - - - Instead of putting these into init-files and rebooting a zillion times - to make that work, read on, and get autodetection running. - - - - 5.9. The Persistent Superblock - - Back in "The Good Old Days" (TM), the raidtools would read your - /etc/raidtab file, and then initialize the array. However, this would - require that the filesystem on which /etc/raidtab resided was mounted. - This is unfortunate if you want to boot on a RAID. - - Also, the old approach led to complications when mounting filesystems - on RAID devices. They could not be put in the /etc/fstab file as - usual, but would have to be mounted from the init-scripts. - - The persistent superblocks solve these problems. When an array is - initialized with the persistent-superblock option in the /etc/raidtab - file, a special superblock is written in the beginning of all disks - participating in the array. This allows the kernel to read the - configuration of RAID devices directly from the disks involved, - instead of reading from some configuration file that may not be - available at all times. - - You should however still maintain a consistent /etc/raidtab file, - since you may need this file for later reconstruction of the array. - - The persistent superblock is mandatory if you want auto-detection of - your RAID devices upon system boot. This is described in the - Autodetection section. - - - - 5.10. Chunk sizes - - The chunk-size deserves an explanation. You can never write - completely parallel to a set of disks. If you had two disks and wanted - to write a byte, you would have to write four bits on each disk, - actually, every second bit would go to disk 0 and the others to disk - 1. Hardware just doesn't support that. Instead, we choose some chunk- - size, which we define as the smallest "atomic" mass of data that can - be written to the devices. A write of 16 kB with a chunk size of 4 - kB, will cause the first and the third 4 kB chunks to be written to - the first disk, and the second and fourth chunks to be written to the - second disk, in the RAID-0 case with two disks. Thus, for large - writes, you may see lower overhead by having fairly large chunks, - whereas arrays that are primarily holding small files may benefit more - from a smaller chunk size. - - Chunk sizes must be specified for all RAID levels, including linear - mode. However, the chunk-size does not make any difference for linear - mode. - - For optimal performance, you should experiment with the value, as well - as with the block-size of the filesystem you put on the array. 
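-
-  As a sketch of such an experiment (a hypothetical two-disk RAID-0
-  created with mdadm; this destroys any data on the partitions):
-
-       mdadm --create /dev/md0 --level=0 --chunk=128 --raid-devices=2 \
-             /dev/sdb6 /dev/sdc5
-       mke2fs -b 4096 /dev/md0
-       # run your benchmark, then stop the array and repeat with
-       # another chunk size
-       mdadm -S /dev/md0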
- - The argument to the chunk-size option in /etc/raidtab specifies the - chunk-size in kilobytes. So "4" means "4 kB". - - - 5.10.1. RAID-0 - - Data is written "almost" in parallel to the disks in the array. - Actually, chunk-size bytes are written to each disk, serially. - - If you specify a 4 kB chunk size, and write 16 kB to an array of three - disks, the RAID system will write 4 kB to disks 0, 1 and 2, in - parallel, then the remaining 4 kB to disk 0. - - A 32 kB chunk-size is a reasonable starting point for most arrays. But - the optimal value depends very much on the number of drives involved, - the content of the file system you put on it, and many other factors. - Experiment with it, to get the best performance. - - - 5.10.2. RAID-0 with ext2 - - The following tip was contributed by michael@freenet-ag.de: - - There is more disk activity at the beginning of ext2fs block groups. - On a single disk, that does not matter, but it can hurt RAID0, if all - block groups happen to begin on the same disk. Example: - - With 4k stripe size and 4k block size, each block occupies one stripe. - With two disks, the stripe-#disk-product is 2*4k=8k. The default - block group size is 32768 blocks, so all block groups start on disk 0, - which can easily become a hot spot, thus reducing overall performance. - Unfortunately, the block group size can only be set in steps of 8 - blocks (32k when using 4k blocks), so you can not avoid the problem by - adjusting the block group size with the -g option of mkfs(8). - - If you add a disk, the stripe-#disk-product is 12, so the first block - group starts on disk 0, the second block group starts on disk 2 and - the third on disk 1. The load caused by disk activity at the block - group beginnings spreads over all disks. - - In case you can not add a disk, try a stripe size of 32k. The - stripe-#disk-product is 64k. Since you can change the block group - size in steps of 8 blocks (32k), using a block group size of 32760 - solves the problem. - - Additionally, the block group boundaries should fall on stripe - boundaries. That is no problem in the examples above, but it could - easily happen with larger stripe sizes. - - - 5.10.3. RAID-1 - - For writes, the chunk-size doesn't affect the array, since all data - must be written to all disks no matter what. For reads however, the - chunk-size specifies how much data to read serially from the - participating disks. Since all active disks in the array contain the - same information, the RAID layer has complete freedom in choosing from - which disk information is read - this is used by the RAID code to - improve average seek times by picking the disk best suited for any - given read operation. - - 5.10.4. RAID-4 - - When a write is done on a RAID-4 array, the parity information must be - updated on the parity disk as well. - - The chunk-size affects read performance in the same way as in RAID-0, - since reads from RAID-4 are done in the same way. - - - 5.10.5. RAID-5 - - On RAID-5, the chunk size has the same meaning for reads as for - RAID-0. Writing on RAID-5 is a little more complicated: When a chunk - is written on a RAID-5 array, the corresponding parity chunk must be - updated as well. Updating a parity chunk requires either - - · The original chunk, the new chunk, and the old parity block - - · Or, all chunks (except for the parity chunk) in the stripe - - The RAID code will pick the easiest way to update each parity chunk - as the write progresses. 
Naturally, if your server has lots of memory and/or if the writes are
-  nice and linear, updating the parity chunks will only impose the
-  overhead of one extra write going over the bus (just like RAID-1).
-  The parity calculation itself is extremely efficient - it does of
-  course load the main CPU of the system, but the impact is
-  negligible. If the writes are small and scattered all over the
-  array, the RAID layer will almost always need to read in all the
-  untouched chunks from each stripe that is written to, in order to
-  calculate the parity chunk. This will impose extra bus overhead and
-  latency due to the extra reads.
-
-  A reasonable chunk-size for RAID-5 is 128 kB but, as always, you may
-  want to experiment with this.
-
-  Also see the section on special options for mke2fs, as this affects
-  RAID-5 performance.
-
-
-  5.11. Options for mke2fs
-
-  There is a special option available when formatting RAID-4 or -5
-  devices with mke2fs. The -R stride=nn option will allow mke2fs to
-  place the different ext2-specific data structures intelligently on
-  the RAID device.
-
-  If the chunk-size is 32 kB, it means that 32 kB of consecutive data
-  will reside on one disk. If we want to build an ext2 filesystem with
-  a 4 kB block-size, there will be eight filesystem blocks in one
-  array chunk. We can pass this information on to the mke2fs utility
-  when creating the filesystem:
-
-       mke2fs -b 4096 -R stride=8 /dev/md0
-
-  RAID-{4,5} performance is severely influenced by this option. I am
-  unsure how the stride option will affect other RAID levels. If
-  anyone has information on this, please send it in my direction.
-
-  The ext2fs block-size severely influences the performance of the
-  filesystem. You should always use a 4 kB block size on any
-  filesystem larger than a few hundred megabytes, unless you store a
-  very large number of very small files on it.
-
-
-  6. Detecting, querying and testing
-
-  This section is about life with a software RAID system - that is,
-  communicating with the arrays and tinkering with them.
-
-  Note that when it comes to manipulating md devices, you should
-  always remember that you are working with entire filesystems. So,
-  although there could be some redundancy to keep your files alive,
-  you must proceed with caution.
-
-
-  6.1. Detecting a drive failure
-
-  No mystery here. A quick look at the standard log and stat files is
-  enough to notice a drive failure.
-
-  /var/log/messages always fills screens with tons of error messages,
-  no matter what happened. But when it is a disk crash, huge numbers
-  of kernel errors are reported. Some nasty examples, for the
-  masochists:
-
-     kernel: scsi0 channel 0 : resetting for second half of retries.
-     kernel: SCSI bus is being reset for host 0 channel 0.
-     kernel: scsi0: Sending Bus Device Reset CCB #2666 to Target 0
-     kernel: scsi0: Bus Device Reset CCB #2666 to Target 0 Completed
-     kernel: scsi : aborting command due to timeout : pid 2649, scsi0, channel 0, id 0, lun 0 Write (6) 18 33 11 24 00
-     kernel: scsi0: Aborting CCB #2669 to Target 0
-     kernel: SCSI host 0 channel 0 reset (pid 2644) timed out - trying harder
-     kernel: SCSI bus is being reset for host 0 channel 0.
-     kernel: scsi0: CCB #2669 to Target 0 Aborted
-     kernel: scsi0: Resetting BusLogic BT-958 due to Target 0
-     kernel: scsi0: *** BusLogic BT-958 Initialized Successfully ***
-
-  Most often, disk failures look like this:
-
-     kernel: scsidisk I/O error: dev 08:01, sector 1590410
-     kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 28000002
-
-  or this:
-
-     kernel: hde: read_intr: error=0x10 { SectorIdNotFound }, CHS=31563/14/35, sector=0
-     kernel: hde: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
-
-  And, as expected, a look at the classic /proc/mdstat will also
-  reveal problems:
-
-     Personalities : [linear] [raid0] [raid1] [translucent]
-     read_ahead not set
-     md7 : active raid1 sdc9[0] sdd5[8] 32000 blocks [2/1] [U_]
-
-  Later in this section we will learn how to monitor RAID with mdadm
-  so we can receive alert reports about disk failures. Now it's time
-  to learn more about interpreting /proc/mdstat.
-
-
-  6.2. Querying the arrays' status
-
-  You can always take a look at /proc/mdstat. It won't hurt. Let's
-  learn how to read the file. For example:
-
-     Personalities : [raid1]
-     read_ahead 1024 sectors
-     md5 : active raid1 sdb5[1] sda5[0]
-           4200896 blocks [2/2] [UU]
-
-     md6 : active raid1 sdb6[1] sda6[0]
-           2104384 blocks [2/2] [UU]
-
-     md7 : active raid1 sdb7[1] sda7[0]
-           2104384 blocks [2/2] [UU]
-
-     md2 : active raid1 sdc7[1] sdd8[2] sde5[0]
-           1052160 blocks [2/2] [UU]
-
-     unused devices: none
-
-  To identify the spare devices, first look for the [#/#] value on a
-  line. The first number is the number of devices in a complete raid
-  set, as defined. Let's say it is "n". The raid role numbers [#]
-  following each device indicate its role, or function, within the
-  raid set. Any device with role number "n" or higher is a spare disk;
-  0,1,...,n-1 are for the working array.
-
-  Also, if you have a failure, the failed device will be marked with
-  (F) after the [#]. The spare that replaces this device will be the
-  device with the lowest role number n or higher that is not marked
-  (F). Once the resync operation is complete, the devices' role
-  numbers are swapped.
-
-  The order in which the devices appear in the /proc/mdstat output
-  means nothing.
-
-  Finally, remember that you can always use raidtools or mdadm to
-  check the arrays out:
-
-     mdadm --detail /dev/mdx
-     lsraid -a /dev/mdx
-
-  These commands will show spare and failed disks loud and clear.
-
-
-  6.3. Simulating a drive failure
-
-  If you plan to use RAID to get fault-tolerance, you may also want to
-  test your setup, to see if it really works. Now, how does one
-  simulate a disk failure?
-
-  The short story is that you can't, except perhaps by putting a fire
-  axe through the drive you want to "simulate" the fault on. You can
-  never know what will happen if a drive dies. It may electrically
-  take the bus it is attached to down with it, rendering all drives on
-  that bus inaccessible. I have never heard of that happening, but it
-  is entirely possible. The drive may also just report a read/write
-  fault to the SCSI/IDE layer, which in turn makes the RAID layer
-  handle this situation gracefully. This is fortunately the way things
-  often go.
-
-  Remember that you must be running RAID-{1,4,5} for your array to be
-  able to survive a disk failure. Linear mode or RAID-0 will fail
-  completely when a device is missing.
-
-
-  6.3.1. Force-fail by hardware
-
-  If you want to simulate a drive failure, you can simply unplug the
-  drive. You should do this with the power off.
If you are interested - in testing whether your data can survive with a disk less than the - usual number, there is no point in being a hot-plug cowboy here. Take - the system down, unplug the disk, and boot it up again. - - Look in the syslog, and look at /proc/mdstat to see how the RAID is - doing. Did it work? - - Faulty disks should appear marked with an (F) if you look at - /proc/mdstat. Also, users of mdadm should see the device state as - faulty. - - When you've re-connected the disk again (with the power off, of - course, remember), you can add the "new" device to the RAID again, - with the raidhotadd command. - - - - 6.3.2. Force-fail by software - - Newer versions of raidtools come with a raidsetfaulty command. By - using raidsetfaulty you can just simulate a drive failure without - unplugging things off. - - Just running the command - - raidsetfaulty /dev/md1 /dev/sdc2 - - - should be enough to fail the disk /dev/sdc2 of the array /dev/md1. If - you are using mdadm, just type - - mdadm --manage --set-faulty /dev/md1 /dev/sdc2 - - - Now things move up and fun appears. First, you should see something - like the first line of this on your system's log. Something like the - second line will appear if you have spare disks configured. - - kernel: raid1: Disk failure on sdc2, disabling device. - kernel: md1: resyncing spare disk sdb7 to replace failed disk - - - Checking /proc/mdstat out will show the degraded array. If there was a - spare disk available, reconstruction should have started. - - Another fresh utility in newest raidtools is lsraid. Try with - - lsraid -a /dev/md1 - - - users of mdadm can run the command - - mdadm --detail /dev/md1 - - - and enjoy the view. - - Now you've seen how it goes when a device fails. Let's fix things up. - - First, we will remove the failed disk from the array. Run the command - - - raidhotremove /dev/md1 /dev/sdc2 - - - users of mdadm can run the command - - mdadm /dev/md1 -r /dev/sdc2 - - - Note that raidhotremove cannot pull a disk out of a running array. - For obvious reasons, only crashed disks are to be hotremoved from an - array (running raidstop and unmounting the device won't help). - - Now we have a /dev/md1 which has just lost a device. This could be a - degraded RAID or perhaps a system in the middle of a reconstruction - process. We wait until recovery ends before setting things back to - normal. - - So the trip ends when we send /dev/sdc2 back home. - - raidhotadd /dev/md1 /dev/sdc2 - - - As usual, you can use mdadm instead of raidtools. This should be the - command - - mdadm /dev/md1 -a /dev/sdc2 - - - As the prodigal son returns to the array, we'll see it becoming an - active member of /dev/md1 if necessary. If not, it will be marked as - an spare disk. That's management made easy. - - - - 6.4. Simulating data corruption - - RAID (be it hardware- or software-), assumes that if a write to a disk - doesn't return an error, then the write was successful. Therefore, if - your disk corrupts data without returning an error, your data will - become corrupted. This is of course very unlikely to happen, but it - is possible, and it would result in a corrupt filesystem. - - RAID cannot and is not supposed to guard against data corruption on - the media. Therefore, it doesn't make any sense either, to purposely - corrupt data (using dd for example) on a disk to see how the RAID - system will handle that. 
It is most likely (unless you corrupt the - RAID superblock) that the RAID layer will never find out about the - corruption, but your filesystem on the RAID device will be corrupted. - - This is the way things are supposed to work. RAID is not a guarantee - for data integrity, it just allows you to keep your data if a disk - dies (that is, with RAID levels above or equal one, of course). - - - - 6.5. Monitoring RAID arrays - - You can run mdadm as a daemon by using the follow-monitor mode. If - needed, that will make mdadm send email alerts to the system - administrator when arrays encounter errors or fail. Also, follow mode - can be used to trigger contingency commands if a disk fails, like - giving a second chance to a failed disk by removing and reinserting - it, so a non-fatal failure could be automatically solved. - - Let's see a basic example. Running - - mdadm --monitor --mail=root@localhost --delay=1800 /dev/md2 - - - should release a mdadm daemon to monitor /dev/md2. The delay parame­ - ter means that polling will be done in intervals of 1800 seconds. - Finally, critical events and fatal errors should be e-mailed to the - system manager. That's RAID monitoring made easy. - - Finally, the --program or --alert parameters specify the program to be - run whenever an event is detected. - - Note that the mdadm daemon will never exit once it decides that there - are arrays to monitor, so it should normally be run in the background. - Remember that your are running a daemon, not a shell command. - - Using mdadm to monitor a RAID array is simple and effective. However, - there are fundamental problems with that kind of monitoring - what - happens, for example, if the mdadm daemon stops? In order to overcome - this problem, one should look towards "real" monitoring solutions. - There is a number of free software, open source, and commercial - solutions available which can be used for Software RAID monitoring on - Linux. A search on FreshMeat should return a good number of matches. - - - - 7. Tweaking, tuning and troubleshooting - - - 7.1. raid-level and raidtab - - Some GNU/Linux distributions, like RedHat 8.0 and possibly others, - have a bug in their init-scripts, so that they will fail to start up - RAID arrays on boot, if your /etc/raidtab has spaces or tabs before - the raid-level keywords. - - The simple workaround for this problem is to make sure that the raid- - level keyword appears in the very beginning of the lines, without any - leading spaces of any kind. - - - 7.2. Autodetection - - Autodetection allows the RAID devices to be automatically recognized - by the kernel at boot-time, right after the ordinary partition - detection is done. - - This requires several things: - - 1. You need autodetection support in the kernel. Check this - - 2. You must have created the RAID devices using persistent-superblock - - 3. The partition-types of the devices used in the RAID must be set to - 0xFD (use fdisk and set the type to "fd") - - NOTE: Be sure that your RAID is NOT RUNNING before changing the - partition types. Use raidstop /dev/md0 to stop the device. - - If you set up 1, 2 and 3 from above, autodetection should be set up. - Try rebooting. When the system comes up, cat'ing /proc/mdstat should - tell you that your RAID is running. - - During boot, you could see messages similar to these: - - - - Oct 22 00:51:59 malthe kernel: SCSI device sdg: hdwr sector= 512 - bytes. 
Sectors= 12657717 [6180 MB] [6.2 GB]
-     Oct 22 00:51:59 malthe kernel: Partition check:
-     Oct 22 00:51:59 malthe kernel:  sda: sda1 sda2 sda3 sda4
-     Oct 22 00:51:59 malthe kernel:  sdb: sdb1 sdb2
-     Oct 22 00:51:59 malthe kernel:  sdc: sdc1 sdc2
-     Oct 22 00:51:59 malthe kernel:  sdd: sdd1 sdd2
-     Oct 22 00:51:59 malthe kernel:  sde: sde1 sde2
-     Oct 22 00:51:59 malthe kernel:  sdf: sdf1 sdf2
-     Oct 22 00:51:59 malthe kernel:  sdg: sdg1 sdg2
-     Oct 22 00:51:59 malthe kernel: autodetecting RAID arrays
-     Oct 22 00:51:59 malthe kernel: (read) sdb1's sb offset: 6199872
-     Oct 22 00:51:59 malthe kernel: bind<sdb1,1>
-     Oct 22 00:51:59 malthe kernel: (read) sdc1's sb offset: 6199872
-     Oct 22 00:51:59 malthe kernel: bind<sdc1,2>
-     Oct 22 00:51:59 malthe kernel: (read) sdd1's sb offset: 6199872
-     Oct 22 00:51:59 malthe kernel: bind<sdd1,3>
-     Oct 22 00:51:59 malthe kernel: (read) sde1's sb offset: 6199872
-     Oct 22 00:51:59 malthe kernel: bind<sde1,4>
-     Oct 22 00:51:59 malthe kernel: (read) sdf1's sb offset: 6205376
-     Oct 22 00:51:59 malthe kernel: bind<sdf1,5>
-     Oct 22 00:51:59 malthe kernel: (read) sdg1's sb offset: 6205376
-     Oct 22 00:51:59 malthe kernel: bind<sdg1,6>
-     Oct 22 00:51:59 malthe kernel: autorunning md0
-     Oct 22 00:51:59 malthe kernel: running: <sdg1><sdf1><sde1><sdd1><sdc1><sdb1>
-     Oct 22 00:51:59 malthe kernel: now!
-     Oct 22 00:51:59 malthe kernel: md: md0: raid array is not clean --
-     starting background reconstruction
-
-  This is output from the autodetection of a RAID-5 array that was not
-  cleanly shut down (e.g. because the machine crashed). Reconstruction
-  is automatically initiated. Mounting this device is perfectly safe,
-  since reconstruction is transparent and all data is consistent (it
-  is only the parity information that is inconsistent - and that isn't
-  needed until a device fails).
-
-  Autostarted devices are also automatically stopped at shutdown.
-  Don't worry about init scripts. Just use the /dev/md devices as any
-  other /dev/sd or /dev/hd devices.
-
-  Yes, it really is that easy.
-
-  You may want to look in your init-scripts for any raidstart/raidstop
-  commands. These are often found in the standard RedHat init scripts.
-  They are used for old-style RAID and have no use in new-style RAID
-  with autodetection. Just remove the lines, and everything will be
-  just fine.
-
-
-  7.3. Booting on RAID
-
-  There are several ways to set up a system that mounts its root
-  filesystem on a RAID device. Some distributions allow for RAID setup
-  in the installation process, and this is by far the easiest way to
-  get a nicely set up RAID system.
-
-  Newer LILO distributions can handle RAID-1 devices, and thus the
-  kernel can be loaded at boot-time from a RAID device. LILO will
-  correctly write boot-records on all disks in the array, to allow
-  booting even if the primary disk fails.
-
-  If you are using grub instead of LILO, just start grub, configure it
-  to use the second (or third, or fourth...) disk in the RAID-1 array
-  you want to boot off as its root device, and run setup. That's all.
-
-  For example, on an array consisting of /dev/hda1 and /dev/hdc1,
-  where both partitions should be bootable, you should just do this:
-
-     grub
-     grub> device (hd0) /dev/hdc
-     grub> root (hd0,0)
-     grub> setup (hd0)
-
-  Some users have experienced problems with this, reporting that
-  although booting with one drive connected worked, booting with both
-  drives connected failed.
Nevertheless, running the described procedure with both disks attached
-  fixed the problem, allowing the system to boot either from a single
-  drive or from the RAID-1 array.
-
-  Another way of ensuring that your system can always boot is to
-  create a boot floppy when all the setup is done. If the disk on
-  which the /boot filesystem resides dies, you can always boot from
-  the floppy. On RedHat and RedHat-derived systems, this can be
-  accomplished with the mkbootdisk command.
-
-
-  7.4. Root filesystem on RAID
-
-  In order to have a system booting on RAID, the root filesystem (/)
-  must be mounted on a RAID device. Two methods for achieving this are
-  supplied below. Both assume that you install on a normal partition,
-  and then - when the installation is complete - move the contents of
-  your non-RAID root filesystem onto a new RAID device. Please note
-  that this is generally no longer needed, as most newer GNU/Linux
-  distributions support installation on RAID devices (and creation of
-  the RAID devices during the installation process). However, you may
-  still want to use the methods below if you are migrating an existing
-  system to RAID.
-
-
-  7.4.1. Method 1
-
-  This method assumes you have a spare disk you can install the system
-  on, which is not part of the RAID you will be configuring.
-
-  · First, install a normal system on your extra disk.
-
-  · Get the kernel you plan on running, get the raid-patches and the
-    tools, and make your system boot with this new RAID-aware kernel.
-    Make sure that RAID-support is in the kernel, and is not loaded as
-    modules.
-
-  · Ok, now you should configure and create the RAID you plan to use
-    for the root filesystem. This is standard procedure, as described
-    elsewhere in this document.
-
-  · Just to make sure everything's fine, try rebooting the system to
-    see if the new RAID comes up on boot. It should.
-
-  · Put a filesystem on the new array (using mke2fs), and mount it
-    under /mnt/newroot.
-
-  · Now, copy the contents of your current root-filesystem (the spare
-    disk) to the new root-filesystem (the array). There are lots of
-    ways to do this; one of them is
-
-       cd /
-       find . -xdev | cpio -pm /mnt/newroot
-
-    another way to copy everything from / to /mnt/newroot could be
-
-       cp -ax / /mnt/newroot
-
-  · You should modify the /mnt/newroot/etc/fstab file to use the
-    correct device (the /dev/md? root device) for the root filesystem.
-
-  · Now, unmount the current /boot filesystem, and mount the boot
-    device on /mnt/newroot/boot instead. This is required for LILO to
-    run successfully in the next step.
-
-  · Update /mnt/newroot/etc/lilo.conf to point to the right devices.
-    The boot device must still be a regular disk (non-RAID device),
-    but the root device should point to your new RAID. When done, run
-
-       lilo -r /mnt/newroot
-
-    It should complete with no errors.
-
-  · Reboot the system, and watch everything come up as expected :)
-
-  If you're doing this with IDE disks, be sure to tell your BIOS that
-  all disks are "auto-detect" types, so that the BIOS will allow your
-  machine to boot even when a disk is missing.
-
-
-  7.4.2. Method 2
-
-  This method requires that your kernel and raidtools understand the
-  failed-disk directive in the /etc/raidtab file - if you are working
-  on a really old system this may not be the case, and you will need
-  to upgrade your tools and/or kernel first.
-
-  You can only use this method on RAID levels 1 and above, as the
-  method uses an array in "degraded mode", which in turn is only
-  possible if the RAID level has redundancy. The idea is to install a
-  system on a disk which is purposely marked as failed in the RAID,
-  then copy the system to the RAID, which will be running in degraded
-  mode, and finally make the RAID use the no-longer-needed
-  "install-disk", zapping the old installation but letting the RAID
-  run in non-degraded mode.
-
-  · First, install a normal system on one disk (that will later become
-    part of your RAID). It is important that this disk (or partition)
-    is not the smallest one. If it is, it will not be possible to add
-    it to the RAID later on!
-
-  · Then, get the kernel, the patches, the tools etc. etc. You know
-    the drill. Make your system boot with a new kernel that has the
-    RAID support you need, compiled into the kernel.
-
-  · Now, set up the RAID with your current root-device as the
-    failed-disk in the /etc/raidtab file. Don't put the failed-disk as
-    the first disk in the raidtab; that will give you problems with
-    starting the RAID. Create the RAID, and put a filesystem on it. If
-    using mdadm, you can create a degraded array just by running
-    something like
-
-       mdadm -C /dev/md0 --level raid1 --raid-disks 2 missing /dev/hdc1
-
-    note the missing parameter.
-
-  · Try rebooting and see if the RAID comes up as it should.
-
-  · Copy the system files, and reconfigure the system to use the RAID
-    as root-device, as described in the previous section.
-
-  · When your system successfully boots from the RAID, you can modify
-    the /etc/raidtab file to include the previously failed-disk as a
-    normal raid-disk. Now, raidhotadd the disk to your RAID.
-
-  · You should now have a system that can boot from a non-degraded
-    RAID.
-
-
-  7.5. Making the system boot on RAID
-
-  For the kernel to be able to mount the root filesystem, all support
-  for the device on which the root filesystem resides must be present
-  in the kernel. Therefore, in order to mount the root filesystem on a
-  RAID device, the kernel must have RAID support.
-
-  The normal way of ensuring that the kernel can see the RAID device
-  is to simply compile a kernel with all necessary RAID support
-  compiled in. Make sure that you compile the RAID support into the
-  kernel, and not as loadable modules. The kernel cannot load a module
-  (from the root filesystem) before the root filesystem is mounted.
-
-  However, since RedHat-6.0 ships with a kernel that has new-style
-  RAID support as modules, I here describe how one can use the
-  standard RedHat-6.0 kernel and still have the system boot on RAID.
-
-
-  7.5.1. Booting with RAID as module
-
-  You will have to instruct LILO to use a RAM-disk in order to achieve
-  this. Use the mkinitrd command to create a ramdisk containing all
-  kernel modules needed to mount the root partition. This can be done
-  as:
-
-       mkinitrd --with=<module> <ramdisk name> <kernel>
-
-  For example:
-
-       mkinitrd --preload raid5 --with=raid5 raid-ramdisk 2.2.5-22
-
-  This will ensure that the specified RAID module is present at
-  boot-time, for the kernel to use when mounting the root device.
-
-
-  7.5.2. Modular RAID on Debian GNU/Linux after move to RAID
-
-  Debian users may encounter problems using an initrd to mount their
-  root filesystem from RAID, if they have migrated a standard
-  non-RAID Debian install to root on RAID.
-
-  If your system fails to mount the root filesystem on boot (you will
-  see this in a "kernel panic" message), then the problem may be that
-  the initrd filesystem does not have the necessary support to mount
-  the root filesystem from RAID.
-
-  Debian seems to produce its initrd.img files on the assumption that
-  the root filesystem to be mounted is the current one. This will
-  usually result in a kernel panic if the root filesystem is moved to
-  the raid device and you attempt to boot from that device using the
-  same initrd image. The solution is to use the mkinitrd command,
-  specifying the proposed new root filesystem. For example, the
-  following commands should create and set up the new initrd on a
-  Debian system:
-
-     % mkinitrd -r /dev/md0 -o /boot/initrd.img-2.4.22raid
-     % mv /initrd.img /initrd.img-nonraid
-     % ln -s /boot/initrd.img-2.4.22raid /initrd.img
-
-
-  7.6. Converting a non-RAID RedHat System to run on Software RAID
-
-  This section was written and contributed by Mark Price, IBM. The
-  text has undergone minor changes since his original work.
-
-  Notice: the following information is provided "AS IS" with no
-  representation or warranty of any kind either express or implied.
-  You may use it freely at your own risk, and no one else will be
-  liable for any damages arising out of such usage.
-
-
-  7.6.1. Introduction
-
-  This technote details how to convert a Linux system with non-RAID
-  devices to run with a Software RAID configuration.
-
-
-  7.6.2. Scope
-
-  This scenario was tested with RedHat 7.1, but should be applicable
-  to any release which supports Software RAID (md) devices.
-
-
-  7.6.3. Pre-conversion example system
-
-  The test system contains two SCSI disks, sda and sdb, both of which
-  are the same physical size. As part of the test setup, I configured
-  both disks to have the same partition layout, using fdisk to ensure
-  the number of blocks for each partition was identical.
-
-     DEVICE      MOUNTPOINT   SIZE      DEVICE      MOUNTPOINT   SIZE
-     /dev/sda1   /            2048MB    /dev/sdb1                2048MB
-     /dev/sda2   /boot        80MB      /dev/sdb2                80MB
-     /dev/sda3   /var/        100MB     /dev/sdb3                100MB
-     /dev/sda4   SWAP         1024MB    /dev/sdb4   SWAP         1024MB
-
-  In our basic example, we are going to set up a simple RAID-1 mirror,
-  which requires only two physical disks.
-
-
-  7.6.4. Step-1 - boot rescue cd/floppy
-
-  The RedHat installation CD provides a rescue mode which boots into
-  Linux from the CD and mounts any filesystems it can find on your
-  disks.
-
-  At the lilo prompt type:
-
-     lilo: linux rescue
-
-  With the setup described above, the installer may ask you which disk
-  your root filesystem is on, either sda or sdb. Select sda.
-
-  The installer will mount your filesystems in the following way:
-
-     DEVICE      MOUNTPOINT   TEMPORARY MOUNT POINT
-     /dev/sda1   /            /mnt/sysimage
-     /dev/sda2   /boot        /mnt/sysimage/boot
-     /dev/sda3   /var         /mnt/sysimage/var
-     /dev/sda6   /home        /mnt/sysimage/home
-
-  Note: Please bear in mind other distributions may mount your
-  filesystems on different mount points, or may require you to mount
-  them by hand.
-
-
-  7.6.5. Step-2 - create a /etc/raidtab file
-
-  Create the file /mnt/sysimage/etc/raidtab (or wherever your real
-  /etc filesystem has been mounted).
-
-  For our test system, the raidtab file would look like this:
- - raiddev /dev/md0 - raid-level 1 - nr-raid-disks 2 - nr-spare-disks 0 - chunk-size 4 - persistent-superblock 1 - device /dev/sda1 - raid-disk 0 - device /dev/sdb1 - raid-disk 1 - - raiddev /dev/md1 - raid-level 1 - nr-raid-disks 2 - nr-spare-disks 0 - chunk-size 4 - persistent-superblock 1 - device /dev/sda2 - raid-disk 0 - device /dev/sdb2 - raid-disk 1 - - raiddev /dev/md2 - raid-level 1 - nr-raid-disks 2 - nr-spare-disks 0 - chunk-size 4 - persistent-superblock 1 - device /dev/sda3 - raid-disk 0 - device /dev/sdb3 - raid-disk 1 - - - - Note: - It is important that the devices are in the correct order. ie. - that /dev/sda1 is raid-disk 0 and not raid-disk 1. This instructs the - md driver to sync from /dev/sda1, if it were the other way around it - would sync from /dev/sdb1 which would destroy your filesystem. - - - Now copy the raidtab file from your real root filesystem to the - current root filesystem. - - (rescue)# cp /mnt/sysimage/etc/raidtab /etc/raidtab - - - - 7.6.6. Step-3 - create the md devices - - There are two ways to do this, copy the device files from - /mnt/sysimage/dev or use mknod to create them. The md device, is a - (b)lock device with major number 9. - - (rescue)# mknod /dev/md0 b 9 0 - (rescue)# mknod /dev/md1 b 9 1 - (rescue)# mknod /dev/md2 b 9 2 - - - - 7.6.7. Step-4 - unmount filesystems - - In order to start the raid devices, and sync the drives, it is - necessary to unmount all the temporary filesystems. - - (rescue)# umount /mnt/sysimage/var - (rescue)# umount /mnt/sysimage/boot - (rescue)# umount /mnt/sysimage/proc - (rescue)# umount /mnt/sysimage - - - Please note, you may not be able to umount /mnt/sysimage. This problem - can be caused by the rescue system - if you choose to manually mount - your filesystems instead of letting the rescue system do this automat­ - ically, this problem should go away. - - - 7.6.8. Step-5 - start raid devices - - Because there are filesystems on /dev/sda1, /dev/sda2 and /dev/sda3 it - is necessary to force the start of the raid device. - - (rescue)# mkraid --really-force /dev/md2 - - - - You can check the completion progress by cat'ing the /proc/mdstat - file. It shows you status of the raid device and percentage left to - sync. - - Continue with /boot and / - - (rescue)# mkraid --really-force /dev/md1 - (rescue)# mkraid --really-force /dev/md0 - - - - The md driver syncs one device at a time. - - - 7.6.9. Step-6 - remount filesystems - - Mount the newly synced filesystems back into the /mnt/sysimage mount - points. - - (rescue)# mount /dev/md0 /mnt/sysimage - (rescue)# mount /dev/md1 /mnt/sysimage/boot - (rescue)# mount /dev/md2 /mnt/sysimage/var - - - - 7.6.10. Step-7 - change root - - You now need to change your current root directory to your real root - file system. - - (rescue)# chroot /mnt/sysimage - - - - 7.6.11. Step-8 - edit config files - - You need to configure lilo and /etc/fstab appropriately to boot from - and mount the md devices. - - Note: - The boot device MUST be a non-raided device. The root device - is your new md0 device. eg. - - boot=/dev/sda - map=/boot/map - install=/boot/boot.b - prompt - timeout=50 - message=/boot/message - linear - default=linux - - image=/boot/vmlinuz - label=linux - read-only - root=/dev/md0 - - - - Alter /etc/fstab - - /dev/md0 / ext3 defaults 1 1 - /dev/md1 /boot ext3 defaults 1 2 - /dev/md2 /var ext3 defaults 1 2 - /dev/sda4 swap swap defaults 0 0 - - - - 7.6.12. 
Step-9 - run LILO
-
-  With /etc/lilo.conf edited to reflect the new root=/dev/md0, and
-  with /dev/md1 mounted as /boot, we can now run /sbin/lilo -v on the
-  chrooted filesystem.
-
-
-  7.6.13. Step-10 - change partition types
-
-  The partition types of all the partitions on ALL drives which are
-  used by the md driver must be changed to type 0xFD.
-
-  Use fdisk to change the partition type, using option 't'.
-
-     (rescue)# fdisk /dev/sda
-     (rescue)# fdisk /dev/sdb
-
-  Use the 'w' option after changing all the required partitions to
-  save the partition table to disk.
-
-
-  7.6.14. Step-11 - resize filesystem
-
-  When we created the raid device, the physical partition became
-  slightly smaller because a second superblock is stored at the end of
-  the partition. If you reboot the system now, the reboot will fail
-  with an error indicating that the superblock is corrupt.
-
-  Resize the filesystems prior to the reboot: ensure that all md-based
-  filesystems except root are unmounted, and remount root read-only.
-
-     (rescue)# mount / -o remount,ro
-
-  You will be required to fsck each of the md devices. This is the
-  reason for remounting root read-only. The -f flag is required to
-  force fsck to check a clean filesystem.
-
-     (rescue)# e2fsck -f /dev/md0
-
-  This will generate an error about inconsistent sizes and a possibly
-  corrupted superblock. Say N to 'Abort?'.
-
-     (rescue)# resize2fs /dev/md0
-
-  Repeat for all /dev/md devices.
-
-
-  7.6.15. Step-12 - checklist
-
-  The next step is to reboot the system. Prior to doing this, run
-  through the checklist below and ensure all tasks have been
-  completed:
-
-  · All devices have finished syncing. Check /proc/mdstat.
-
-  · /etc/fstab has been edited to reflect the changes to the device
-    names.
-
-  · /etc/lilo.conf has been edited to reflect the root device change.
-
-  · /sbin/lilo has been run to update the boot loader.
-
-  · The kernel has both the SCSI and RAID (md) drivers built in.
-
-  · The partition types of all partitions on disks that are part of an
-    md device have been changed to 0xfd.
-
-  · The filesystems have been fsck'd and resize2fs'd.
-
-
-  7.6.16. Step-13 - reboot
-
-  You can now safely reboot the system; when it comes up, it will
-  auto-discover the md devices (based on the partition types).
-
-  Your root filesystem will now be mirrored.
-
-
-  7.7. Sharing spare disks between different arrays
-
-  When running mdadm in follow/monitor mode, you can make different
-  arrays share spare disks, which saves storage space without losing
-  the comfort of fallback disks.
-
-  In the world of software RAID this is a fairly new feature: to keep
-  a whole group of arrays protected by spares, you only have to
-  provide one single idle disk for the whole bunch.
-
-  With mdadm running as a daemon, you have an agent polling the arrays
-  at regular intervals. When a disk fails on an array without a spare
-  disk, mdadm removes an available spare disk from another array and
-  inserts it into the array with the failed disk. The reconstruction
-  process then begins in the degraded array as usual.
-
-  To declare shared spare disks, just use the spare-group parameter
-  when invoking mdadm as a daemon.
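-
-  As a hedged sketch, the declaration could live in /etc/mdadm.conf
-  (hypothetical component devices; identifying the arrays by UUID
-  works too), with both arrays placed in the same spare-group:
-
-     ARRAY /dev/md0 devices=/dev/sda1,/dev/sdb1,/dev/sdc1 spare-group=pool
-     ARRAY /dev/md1 devices=/dev/sdd1,/dev/sde1 spare-group=pool
-
-  Running "mdadm --monitor --scan --mail=root --delay=1800" would then
-  let the daemon move the one spare to whichever array needs it.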
-
-
-  7.8. Pitfalls
-
-  Never NEVER never re-partition disks that are part of a running
-  RAID. If you must alter the partition table on a disk which is part
-  of a RAID, stop the array first, then repartition.
-
-  It is easy to put too many disks on a bus. A normal Fast-Wide SCSI
-  bus can sustain 10 MB/s, which is less than many disks can do alone
-  today. Putting six such disks on the bus will of course not give you
-  the expected performance boost. It is becoming equally easy to
-  saturate the PCI bus - remember, a normal 32-bit 33 MHz PCI bus has
-  a theoretical maximum bandwidth of around 133 MB/s; considering
-  command overhead etc., you will see a somewhat lower real-world
-  transfer rate. Some disks today have a throughput in excess of
-  30 MB/s, so just four of those disks will actually max out your PCI
-  bus! When designing high-performance RAID systems, be sure to take
-  the whole I/O path into consideration - there are boards with more
-  PCI busses, with 64-bit and 66 MHz busses, and with PCI-X.
-
-  More SCSI controllers will only give you extra performance if the
-  SCSI busses are nearly maxed out by the disks on them. You will not
-  see a performance improvement from using two 2940s with two old SCSI
-  disks, instead of just running the two disks on one controller.
-
-  If you forget the persistent-superblock option, your array may not
-  start up willingly after it has been stopped. Just re-create the
-  array with the option set correctly in the raidtab. Please note that
-  this will destroy the information on the array!
-
-  If a RAID-5 fails to reconstruct after a disk was removed and
-  re-inserted, this may be because of the ordering of the devices in
-  the raidtab. Try moving the first "device ..." and "raid-disk ..."
-  pair to the bottom of the array description in the raidtab file.
-
-
-  8. Reconstruction
-
-  If you have read the rest of this HOWTO, you should already have a
-  pretty good idea about what reconstruction of a degraded RAID
-  involves. Let us summarize:
-
-  · Power down the system.
-
-  · Replace the failed disk.
-
-  · Power up the system once again.
-
-  · Use raidhotadd /dev/mdX /dev/sdX to re-insert the disk in the
-    array.
-
-  · Have coffee while you watch the automatic reconstruction running.
-
-  And that's it.
-
-  Well, it usually is, unless you're unlucky and your RAID has been
-  rendered unusable because more disks failed than the array has
-  redundancy for. This can actually happen if a number of disks reside
-  on the same bus, and one disk takes the bus down with it as it
-  crashes. The other disks, however fine, will be unreachable to the
-  RAID layer, because the bus is down, and they will be marked as
-  faulty. On a RAID-5, which can spare only one disk, losing two or
-  more disks can be fatal.
-
-  The following section is the explanation that Martin Bene gave to
-  me, and describes a possible recovery from the scary scenario
-  outlined above. It involves using the failed-disk directive in your
-  /etc/raidtab (so for people running patched 2.2 kernels, this will
-  only work on kernels 2.2.10 and later).
-
-
-  8.1. Recovery from a multiple disk failure
-
-  The scenario is:
-
-  · A controller dies and takes two disks offline at the same time,
-
-  · All disks on one scsi bus can no longer be reached if a disk dies,
-
-  · A cable comes loose...
-
-  In short: quite often you get a temporary failure of several disks
-  at once; afterwards the RAID superblocks are out of sync and you can
-  no longer init your RAID array.
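-
-  Before rewriting anything, it may help to inspect the RAID
-  superblock (including its event count) on each component device - a
-  sketch, with a hypothetical member partition:
-
-     mdadm --examine /dev/sdc1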
- - If using mdadm, you could first try to run: - - mdadm --assemble --force - - - If not, there's one thing left: rewrite the RAID superblocks by mkraid - --force - - To get this to work, you'll need to have an up to date /etc/raidtab - - if it doesn't EXACTLY match devices and ordering of the original disks - this will not work as expected, but will most likely completely - obliterate whatever data you used to have on your disks. - - Look at the sylog produced by trying to start the array, you'll see - the event count for each superblock; usually it's best to leave out - the disk with the lowest event count, i.e the oldest one. - - If you mkraid without failed-disk, the recovery thread will kick in - immediately and start rebuilding the parity blocks - not necessarily - what you want at that moment. - - With failed-disk you can specify exactly which disks you want to be - active and perhaps try different combinations for best results. BTW, - only mount the filesystem read-only while trying this out... This has - been successfully used by at least two guys I've been in contact with. - - - - 9. Performance - - This section contains a number of benchmarks from a real-world system - using software RAID. There is some general information about - benchmarking software too. - - Benchmark samples were done with the bonnie program, and at all times - on files twice- or more the size of the physical RAM in the machine. - - The benchmarks here only measures input and output bandwidth on one - large single file. This is a nice thing to know, if it's maximum I/O - throughput for large reads/writes one is interested in. However, such - numbers tell us little about what the performance would be if the - array was used for a news spool, a web-server, etc. etc. Always keep - in mind, that benchmarks numbers are the result of running a - "synthetic" program. Few real-world programs do what bonnie does, and - although these I/O numbers are nice to look at, they are not ultimate - real-world-appliance performance indicators. Not even close. - - For now, I only have results from my own machine. The setup is: - - · Dual Pentium Pro 150 MHz - - · 256 MB RAM (60 MHz EDO) - - · Three IBM UltraStar 9ES 4.5 GB, SCSI U2W - - · Adaptec 2940U2W - - · One IBM UltraStar 9ES 4.5 GB, SCSI UW - - · Adaptec 2940 UW - - · Kernel 2.2.7 with RAID patches - - The three U2W disks hang off the U2W controller, and the UW disk off - the UW controller. - - It seems to be impossible to push much more than 30 MB/s thru the SCSI - busses on this system, using RAID or not. My guess is, that because - the system is fairly old, the memory bandwidth sucks, and thus limits - what can be sent thru the SCSI controllers. - - - 9.1. RAID-0 - - Read is Sequential block input, and Write is Sequential block output. - File size was 1GB in all tests. The tests where done in single-user - mode. The SCSI driver was configured not to use tagged command - queuing. - - - >From this it seems that the RAID chunk-size doesn't make that much of - a difference. However, the ext2fs block-size should be as large as - possible, which is 4kB (eg. the page size) on IA-32. - - | | | | | - |Chunk size | Block size | Read kB/s | Write kB/s | - | | | | | - |4k | 1k | 19712 | 18035 | - |4k | 4k | 34048 | 27061 | - |8k | 1k | 19301 | 18091 | - |8k | 4k | 33920 | 27118 | - |16k | 1k | 19330 | 18179 | - |16k | 2k | 28161 | 23682 | - |16k | 4k | 33990 | 27229 | - |32k | 1k | 19251 | 18194 | - |32k | 4k | 34071 | 26976 | - - 9.2. 
RAID-0 with TCQ - - This time, the SCSI driver was configured to use tagged command - queuing, with a queue depth of 8. Otherwise, everything's the same as - before. - - | | | | | - |Chunk size | Block size | Read kB/s | Write kB/s | - | | | | | - |32k | 4k | 33617 | 27215 | - - - No more tests where done. TCQ seemed to slightly increase write - performance, but there really wasn't much of a difference at all. - - - 9.3. RAID-5 - - The array was configured to run in RAID-5 mode, and similar tests - where done. - - | | | | | - |Chunk size | Block size | Read kB/s | Write kB/s | - | | | | | - |8k | 1k | 11090 | 6874 | - |8k | 4k | 13474 | 12229 | - |32k | 1k | 11442 | 8291 | - |32k | 2k | 16089 | 10926 | - |32k | 4k | 18724 | 12627 | - - - Now, both the chunk-size and the block-size seems to actually make a - difference. - - - 9.4. RAID-10 - - RAID-10 is "mirrored stripes", or, a RAID-1 array of two RAID-0 - arrays. The chunk-size is the chunk sizes of both the RAID-1 array and - the two RAID-0 arrays. I did not do test where those chunk-sizes - differ, although that should be a perfectly valid setup. - - | | | | | - |Chunk size | Block size | Read kB/s | Write kB/s | - | | | | | - |32k | 1k | 13753 | 11580 | - |32k | 4k | 23432 | 22249 | - - - No more tests where done. The file size was 900MB, because the four - partitions involved where 500 MB each, which doesn't give room for a - 1G file in this setup (RAID-1 on two 1000MB arrays). - - 9.5. Fresh benchmarking tools - - To check out speed and performance of your RAID systems, do NOT use - hdparm. It won't do real benchmarking of the arrays. - - Instead of hdparm, take a look at the tools described here: IOzone and - Bonnie++. - - IOzone is a small, versatile and modern tool to use. It benchmarks - file I/O performance for read, write, re-read, re-write, read - backwards, read strided, fread, fwrite, random read, pread, mmap, - aio_read and aio_write operations. Don't worry, it can run on any of - the ext2, ext3, reiserfs, JFS, or XFS filesystems in OSDL STP. - - You can also use IOzone to show throughput performance as a function - of number of processes and number of disks used in a filesystem, - something interesting when it's about RAID striping. - - Although documentation for IOzone is available in Acrobat/PDF, - PostScript, nroff, and MS Word formats, we are going to cover here a - nice example of IOzone in action: - - iozone -s 4096 - - - This would run a test using a 4096KB file size. - - And this is an example of the output quality IOzone gives - - File size set to 4096 KB - Output is in Kbytes/sec - Time Resolution = 0.000001 seconds. - Processor cache size set to 1024 Kbytes. - Processor cache line size set to 32 bytes. - File stride size set to 17 * record size. - random random bkwd record stride - KB reclen write rewrite read reread read write read rewrite read fwrite frewrite fread freread - 4096 4 99028 194722 285873 298063 265560 170737 398600 436346 380952 91651 127212 288309 292633 - - - Now you just need to know about the feature that makes IOzone useful - for RAID benchmarking: the file operations involving RAID are the read - strided. The example above shows a 380.952Kb/sec. for the read - strided, so you can go figure. - - - Bonnie++ seems to be more targeted at benchmarking single drives that - at RAID, but it can test more than 2Gb of storage on 32-bit machines, - and tests for file creat, stat, unlink operations. - - - - 10. 
Related tools - - While not described in this HOWTO, some useful tools for Software-RAID - systems have been developed. - - - 10.1. RAID resizing and conversion - - It is not easy to add another disk to an existing array. A tool to - allow for just this operation has been developed, and is available - from http://unthought.net/raidreconf. The tool will allow for - conversion between RAID levels, for example converting a two-disk - RAID-1 array into a four-disk RAID-5 array. It will also allow for - chunk-size conversion, and simple disk adding. - - Please note that this tool is not really "production ready". It seems - to have worked well so far, but it is a rather time-consuming process - that, if it fails, will absolutely guarantee that your data will be - irrecoverably scattered over your disks. You absolutely must keep good - backups prior to experimenting with this tool. - - - 10.2. Backup - - Remember, RAID is no substitute for good backups. No amount of - redundancy in your RAID configuration is going to let you recover week - or month old data, nor will a RAID survive fires, earthquakes, or - other disasters. - - It is imperative that you protect your data, not just with RAID, but - with regular good backups. One excellent system for such backups, is - the Amanda backup system. - - - - 11. Partitioning RAID / LVM on RAID - - RAID devices cannot be partitioned, like ordinary disks can. This can - be a real annoyance on systems where one wants to run, for example, - two disks in a RAID-1, but divide the system onto multiple different - filesystems. A horror example could look like: - - # df -h - Filesystem Size Used Avail Use% Mounted on - /dev/md2 3.8G 640M 3.0G 18% / - /dev/md1 97M 11M 81M 12% /boot - /dev/md5 3.8G 1.1G 2.5G 30% /usr - /dev/md6 9.6G 8.5G 722M 93% /var/www - /dev/md7 3.8G 951M 2.7G 26% /var/lib - /dev/md8 3.8G 38M 3.6G 1% /var/spool - /dev/md9 1.9G 231M 1.5G 13% /tmp - /dev/md10 8.7G 329M 7.9G 4% /var/www/html - - - - 11.1. Partitioning RAID devices - - If a RAID device could be partitioned, the administrator could simply - have created one single /dev/md0 device device, partitioned it as he - usually would, and put the filesystems there. Instead, with today's - Software RAID, he must create a RAID-1 device for every single - filesystem, even though there are only two disks in the system. - - There have been various patches to the kernel which would allow - partitioning of RAID devices, but none of them have (as of this - writing) made it into the kernel. In short; it is not currently - possible to partition a RAID device - but luckily there is another - solution to this problem. - - - 11.2. LVM on RAID - - The solution to the partitioning problem is LVM, Logical Volume - Management. LVM has been in the stable Linux kernel series for a long - time now - LVM2 in the 2.6 kernel series is a further improvement over - the older LVM support from the 2.4 kernel series. While LVM has - traditionally scared some people away because of its complexity, it - really is something that an administrator could and should consider if - he wishes to use more than a few filesystems on a server. - - We will not attempt to describe LVM setup in this HOWTO, as there - already is a fine HOWTO for exactly this purpose. A small example of a - RAID + LVM setup will be presented though. 
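-  As a rough sketch (hypothetical devices; your volume names, sizes
-  and filesystem choices will differ), building such a stack amounts
-  to:
-
-     mdadm --create /dev/md1 --level=1 --raid-devices=2 \
-           /dev/sdc1 /dev/sdd1
-     pvcreate /dev/md1
-     vgcreate vg0 /dev/md1
-     lvcreate -L 40G -n backup vg0
-     mke2fs /dev/vg0/backup
-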
Consider the df output - below, of such a system: - - # df -h - Filesystem Size Used Avail Use% Mounted on - /dev/md0 942M 419M 475M 47% / - /dev/vg0/backup 40G 1.3M 39G 1% /backup - /dev/vg0/amdata 496M 237M 233M 51% /var/lib/amanda - /dev/vg0/mirror 62G 56G 2.9G 96% /mnt/mirror - /dev/vg0/webroot 97M 6.5M 85M 8% /var/www - /dev/vg0/local 2.0G 458M 1.4G 24% /usr/local - /dev/vg0/netswap 3.0G 2.1G 1019M 67% /mnt/netswap - - - "What's the difference" you might ask... Well, this system has only - two RAID-1 devices - one for the root filesystem, and one that cannot - be seen on the df output - this is because /dev/md1 is used as a - "physical volume" for LVM. What this means is, that /dev/md1 acts as - "backing store" for all "volumes" in the "volume group" named vg0. - - All this "volume" terminology is explained in the LVM HOWTO - if you - do not completely understand the above, there is no need to worry - - the details are not particularly important right now (you will need to - read the LVM HOWTO anyway if you want to set up LVM). What matters is - the benefits that this setup has over the many-md-devices setup: - - · No need to reboot just to add a new filesystem (this would - otherwise be required, as the kernel cannot re-read the partition - table from the disk that holds the root filesystem, and re- - partitioning would be required in order to create the new RAID - device to hold the new filesystem) - - · Resizing of filesystems: LVM supports hot-resizing of volumes (with - RAID devices resizing is difficult and time consuming - but if you - run LVM on top of RAID, all you need in order to resize a - filesystem is to resize the volume, not the underlying RAID - device). With a filesystem such as XFS, you can even resize the - filesystem without un-mounting it first (!) Ext3 does not (as of - this writing) support hot-resizing, you can, however, resize the - filesystem without rebooting, you just need to un-mount it first. - - · Adding new disks: Need more storage? Easy! Simply insert two new - disks in your system, create a RAID-1 on top of them, make your new - /dev/md2 device a physical volume and add it to your volume group. - That's it! You now have more free space in your volume group for - either growing your existing logical volumes, or for adding new - ones. - - All in all - for servers with many filesystems, LVM (and LVM2) is - definitely a fairly simple solution which should be considered for use - on top of Software RAID. Read on in the LVM HOWTO if you want to learn - more about LVM. - - - - 12. Credits - - The following people contributed to the creation of this - documentation: - - - · Mark Price and IBM - - · Steve Boley of Dell - - · Damon Hoggett - - · Ingo Molnar - - · Jim Warren - - · Louis Mandelstam - - · Allan Noah - - · Yasunori Taniike - - · Martin Bene - - · Bennett Todd - - · Kevin Rolfes - - · Darryl Barlow - - · Brandon Knitter - - · Hans van Zijst - - · Matthew Mcglynn - - · Jimmy Hedman - - · Tony den Haan - - · The Linux-RAID mailing list people - - · The ones I forgot, sorry :) - - Please submit corrections, suggestions etc. to the author. It's the - only way this HOWTO can improve. - - 13. Changelog - - 13.1. Version 1.1 - - - · New sub-section: "Downloading and installing the RAID tools" - - · Grub support at section "Booting on RAID" - - · Mention LVM on top of RAID - - · Other minor corrections. 
- - - diff --git a/LDP/guide/docbook/Linux-Networking/TCP-IP.xml b/LDP/guide/docbook/Linux-Networking/TCP-IP.xml index 304fcaf5..af83585b 100644 --- a/LDP/guide/docbook/Linux-Networking/TCP-IP.xml +++ b/LDP/guide/docbook/Linux-Networking/TCP-IP.xml @@ -50,8 +50,9 @@ http://www.sangoma.com/fguide.htm www.citap.com/documents/tcp-ip/tcpip012.htm and of course in the TCP/IP Request For Comments (RFC) publication - Layer 1 and 2 - Network Access + + These two layers (Physical/Datalink) deal with pure hardware (ie. wires, satellite links, NICs, hubs.....) and is roughly synonymous with that of the Physical layer of the OSI Network Layer Model. The protocols in this @@ -70,8 +71,9 @@ delivering data between two devices on the same network. Node physical addresses are used to accomplish delivery on the local network. - Layer 3 - Internet + + The Internetwork Layer it is the heart of TCP/IP and the most important protocol. IP provides the basic packet delivery service on which TCP/IP networks are built. All protocols, in the layers above and below IP, use the Internet Protocol to @@ -91,8 +93,9 @@ internetwork routing. The Address Resolution Protocol (ARP) enable IP to identif physical address that matches a given IP address. - Layer 4 - Transport + + The transport layer is similar to the OSI transport model, but with elements of the OSI session layer functionality. This layer provides an application layer delivery service. The two protocols found at the transport layer are @@ -158,8 +161,9 @@ The delivering of information from an application on one computer to an application on another computer. - Layer 5 - Process/Application + + This layer is broadly equivalent to the application, presentation, session layers of the OSI model. It includes all processes that use the transport layer protocols to deliver data. There are many applications