gferg 2004-11-08 13:35:03 +00:00
parent 51ade963f9
commit 73eacb5375
4 changed files with 433 additions and 189 deletions

@ -805,7 +805,7 @@ partition images to and from a TFTP server. </Para>
Cluster-HOWTO</ULink>,
<CiteTitle>Linux Cluster HOWTO</CiteTitle>
</Para><Para>
<CiteTitle>Updated: Aug 2004</CiteTitle>.
<CiteTitle>Updated: Nov 2004</CiteTitle>.
How to set up high-performance Linux computing clusters. </Para>
</ListItem>
@ -2243,7 +2243,7 @@ for different (network) environments. </Para>
Large-Disk-HOWTO</ULink>,
<CiteTitle>Large Disk HOWTO</CiteTitle>
</Para><Para>
<CiteTitle>Updated: Aug 2001</CiteTitle>.
<CiteTitle>Updated: Nov 2004</CiteTitle>.
All about disk geometry and the 1024 cylinder limit for disks. </Para>
</ListItem>

@ -91,7 +91,7 @@ laptop. </Para>
Cluster-HOWTO</ULink>,
<CiteTitle>Linux Cluster HOWTO</CiteTitle>
</Para><Para>
<CiteTitle>Updated: Aug 2004</CiteTitle>.
<CiteTitle>Updated: Nov 2004</CiteTitle>.
How to set up high-performance Linux computing clusters. </Para>
</ListItem>
@ -835,7 +835,7 @@ How to copy a Linux system from one hard disk to another. </Para>
Large-Disk-HOWTO</ULink>,
<CiteTitle>Large Disk HOWTO</CiteTitle>
</Para><Para>
<CiteTitle>Updated: Aug 2001</CiteTitle>.
<CiteTitle>Updated: Nov 2004</CiteTitle>.
All about disk geometry and the 1024 cylinder limit for disks. </Para>
</ListItem>

@ -1,3 +1,4 @@
<!doctype Linuxdoc system>
<article>
@ -5,7 +6,7 @@
<title> Linux Cluster HOWTO </title>
<author>Ram Samudrala <tt>(me@ram.org)</tt> </author>
<date> v1.3, August 4, 2004 </date>
<date> v1.31, November 7, 2004 </date>
<abstract>
How to set up high-performance Linux computing clusters.
@ -523,10 +524,11 @@ name="http://www.shorewall.net">) for the firewall. </p>
<sect1> Parallel processing software
<p> We use our own software for parallelising applications but have
experimented with PVM and MPI. In my view, the overhead for these
pre-packaged programs is too high. I recommend writing
application-specific code for the tasks you perform (that's one
person's view). </p>
experimented with <url url="http://www.csm.ornl.gov/pvm/pvm_home.html"
name="PVM"> and <url url="http://www-unix.mcs.anl.gov/mpi/mpich/"
name="MPI">. In my view, the overhead for these pre-packaged programs
is too high. I recommend writing application-specific code for the
tasks you perform (that's one person's view). </p>
</sect1>

@ -2,11 +2,11 @@
<article>
<title>Large Disk HOWTO
<author>Andries Brouwer, <tt/aeb@cwi.nl/
<date>v2.2x, 30 Aug 2001
<date>v2.5, 2004-11-01
<abstract>
All about disk geometry and the 1024 cylinder limit for disks.
All about disk geometry and the 1024 cylinder and other limits for disks.
<nidx>HOWTOs!large disk</nidx>
<nidx>HOWTOs!disk, large</nidx>
</abstract>
@ -17,77 +17,53 @@ For the most recent version of this text, see
name="www.win.tue.nl">.
<sect>
The problem
Large disks
<p>
<nidx>disk drives!interaction with BIOS</nidx>
<nidx>BIOS!interaction with disk drives</nidx>
Suppose you have a disk with more than 1024 cylinders.
Suppose moreover that you have an operating system that uses the
old INT13 BIOS interface to disk I/O.
Then you have a problem, because this interface
uses a 10-bit field for the cylinder on which the I/O
is done, so that cylinders 1024 and past are inaccessible.
<p>
Fortunately, Linux does not use the BIOS, so there is no problem.
<p>
Well, except for two things:
<p>
(1) When you boot your system,
Linux isn't running yet and cannot save you from BIOS problems.
This has some consequences for LILO and similar boot loaders.
<p>
(2) It is necessary for all operating systems that use one disk
to agree on where the partitions are. In other words, if you use
both Linux and, say, DOS on one disk, then both must interpret the
partition table in the same way. This has some consequences for
the Linux kernel and for <tt/fdisk/.
<p>
Below is a rather detailed description of all relevant details.
Note that I used kernel version 2.0.8 source as a reference.
Other versions may differ a bit.
<sect>
Summary
<p>
You got a new large disk. What to do? Well, on the software side:
use <tt/fdisk/ (or, better, <tt/cfdisk/) to create partitions,
and then <tt/mke2fs/ to create a filesystem, and then <tt/mount/
to attach the new filesystem to the big file hierarchy.
You got a new disk. What to do? Well, on the software side:
use <tt/fdisk/ or <tt/cfdisk/ to create partitions,
and then <tt/mke2fs/ or <tt/mkreiserfs/ or so to create a filesystem,
and then <tt/mount/ to attach the new filesystem to the big file hierarchy.
Make sure you have relatively recent versions of these utilities -
often old versions have problems handling large disks.
<p>
You need not read this HOWTO since there are <em/no/ problems
with large hard disks these days. The great majority of
apparent problems are caused by people who think there might
be a problem and install a disk manager, or go into <tt/fdisk/
expert mode, or specify explicit disk geometries to LILO
or on the kernel command line.
with large hard disks these days.
<p>
However, typical problem areas are: (i) ancient hardware,
(ii) several operating systems on the same disk, and sometimes
(iii) booting.
Long ago, disks were large when they had a capacity larger than
528 MB, or than 8.4 GB, or than 33.8 GB. These days the interesting
limit is 137 GB. In all cases, sufficiently recent Linux kernels
handle the disk fine.
<p>
Advice:
Sometimes booting requires some care, since Linux cannot help you
when it isn't running yet. But again, with a sufficiently recent
BIOS and boot loader there are no problems.
Most of the text below will treat the cases of
(i) ancient hardware,
(ii) broken hardware or BIOS,
(iii) several operating systems on the same disk,
(iv) booting old systems.
<p>
<bf>Advice</bf>
For large SCSI disks: Linux has supported them from very early on.
No action required.
For large IDE disks (over 8.4 GB): get a recent stable kernel
(2.0.34 or later). Usually, all will be fine now,
especially if you were wise enough not to ask the BIOS
for disk translations like LBA and the like.
For large IDE disks (over 8.4 GB): make sure your kernel is 2.0.34 or later.
For very large IDE disks (over 33.8 GB): see
<ref id="verylarge" name="IDE problems with 34+ GB disks"> below.
For large IDE disks (over 33.8 GB): make sure your kernel is
2.0.39/2.2.14/2.3.21 or later.
If LILO hangs at boot time, also specify
<tt><ref id="linear" name="linear"></tt> in the
configuration file <tt>/etc/lilo.conf</tt>.
(And if you did have <tt>linear</tt>, try without it.)
If you have a recent LILO (version 21.4 or later),
the keyword <tt>lba32</tt> will usually allow booting from
anywhere on the disk, that is, the 1024 cylinder limit is gone.
(Of course, LILO is a bit fragile, and the use of a different
bootloader might be more convenient.)
For large IDE disks (over 137 GB): make sure your kernel is
2.4.19/2.5.3 or later.
If the kernel boots fine, and the boot messages indicate that it
recognizes the disk correctly, but there are problems with utilities,
upgrade the utilities.
If <ref id="LILO" name="LILO"> hangs at boot time, make sure you have
version 21.4 or later, and specify the keyword <tt>lba32</tt>
in the configuration file <tt>/etc/lilo.conf</tt>. With an older version
of LILO, try both with and without the <tt>linear</tt> keyword.
There may be geometry problems that can be solved by giving
an explicit geometry to kernel/LILO/fdisk.
@ -106,11 +82,19 @@ support UDMA66. In such a case every attempt to read will
fail, and reading the partition table is the first thing
the kernel does. Make sure no UDMA66 is used.
If the BIOS hangs at boot time because of a large disk, and
flashing a newer version is not an option, take the disk out
of the BIOS setup. If you have to boot from the disk, look
whether a capacity clipping jumper helps.
If you think something is wrong with the size of your disk,
make sure that you are not confusing binary and decimal <ref id="units">,
and realize that the free space that <tt/df/ reports on an empty disk
is a few percent smaller than the partition size, because there
is administrative overhead.
is administrative overhead. Software that does not understand
48-bit addressing will view a 137+ GB disk as having a capacity
of 137 GB. When a capacity clipping <ref id="jumpers" name="jumper">
is present, a larger disk may have been clipped to 33 GB or to 2 GB.
If for a removable drive the kernel reports two different sizes,
then one is found from the drive, and the other from the disk/floppy.
@ -118,9 +102,13 @@ This second value will be zero when the drive has no media.
<p>
Now, if you still think there are problems, or just are curious,
read on.
<p>
Below is a rather detailed description of all relevant details.
I used kernel version 2.0.8 source as a reference.
Other versions may differ a bit.
<p>
<sect>
Units and Sizes
Units
<label id="units">
<p>
<nidx>units!megabyte</nidx>
@ -145,7 +133,7 @@ and 1 TiB is 1099511627776 bytes (1.1 TB).
<p>
Quite correctly, the disk drive manufacturers follow the SI norm
and use the decimal units. However, Linux kernel boot messages
(for not-so-recent kernels) and some fdisk-type programs
(for not-so-recent kernels) and some old fdisk-type programs
use the symbols MB and GB for binary, or
mixed binary-decimal units. So, before you think your disk is
smaller than was promised when you bought it, compute first the
@ -154,7 +142,7 @@ actual size in decimal units (or just in bytes).
Concerning terminology and abbreviation for binary units,
<htmlurl name="Knuth" url="http://www-cs-staff.stanford.edu/~knuth/">
has an alternative <htmlurl name="proposal"
url="http://www-cs-staff.stanford.edu/~knuth/news.html">, namely
url="http://www-cs-staff.stanford.edu/~knuth/news99.html">, namely
to use KKB, MMB, GGB, TTB, PPB, EEB, ZZB, YYB and to call these
<it>large kilobyte</it>, <it>large megabyte</it>, ... <it>large yottabyte</it>.
He writes: `Notice that doubling the letter connotes both
@ -163,7 +151,56 @@ binary-ness and large-ness.' This is a good proposal -
however the only important thing is to stress that a megabyte
has precisely 1000000 bytes, and that some other term and abbreviation
is required if you mean something else.
<p>
<sect>
Disk Access
<p>
Disk access is done in units called <it>sectors</it>.
In order to read or write something from or to the disk, we have
to specify the position on the disk, for example by giving the
sector number.
If the disk is a SCSI disk, then this sector number goes directly
into the SCSI command and is understood by the disk.
If the disk is an IDE disk using LBA, then precisely the same holds.
But if the disk is old, RLL or MFM or IDE from before the LBA times,
then the disk hardware expects a triple (cylinder,head,sector) to
designate the desired spot on the disk.
<p>
<sect1>
Cylinders, heads and sectors
<p>
A disk has sectors numbered 0, 1, 2, ...
This is called <it>LBA addressing</it>.
<p>
In ancient times, before the advent of IDE disks,
disks had a <it>geometry</it> described by three constants
C, H, S: the number of cylinders, the number of heads,
the number of sectors per track.
The address of a sector was given by three numbers:
<it>c</it>, <it>h</it>, <it>s</it>: the cylinder number
(between 0 and C-1), the head number (between 0 and H-1),
and the sector number within the track (between 1 and S), where
for some mysterious reason <it>c</it> and <it>h</it> count from 0,
but <it>s</it> counts from 1. This is called <it>CHS addressing</it>.
<p>
No disk manufactured less than ten years ago has a geometry, but
this ancient 3D sector addressing is still used by the INT13
BIOS interface (with fantasy numbers C, H, S
unrelated to any physical reality).
<p>
The correspondence between the linear numbering and this 3D notation
is as follows: for a disk with C cylinders, H heads and S sectors/track
position (<it>c</it>,<it>h</it>,<it>s</it>) in 3D or CHS notation
is the same as position
<it>c</it><tt/*/H<tt/*/S + <it>h</it><tt/*/S + (<it>s</it>-1)
in linear or LBA notation.
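<p>
The formula above and its inverse can be written as a small Python sketch. The function names are made up for this illustration. H and S stand for whatever geometry, possibly a fictitious one, the disk or BIOS reports.

```python
def chs_to_lba(c, h, s, heads, sectors):
    """Return the linear (LBA) sector number for position (c, h, s).

    c and h count from 0 and s counts from 1. heads (H) and
    sectors (S) describe the geometry in use.
    """
    if not (0 <= h < heads and 1 <= s <= sectors):
        raise ValueError("head or sector out of range for this geometry")
    return c * heads * sectors + h * sectors + (s - 1)


def lba_to_chs(lba, heads, sectors):
    """Inverse mapping: LBA sector number -> (c, h, s)."""
    c, rest = divmod(lba, heads * sectors)
    h, s0 = divmod(rest, sectors)
    return c, h, s0 + 1  # sectors count from 1 again


# The last sector of a disk with geometry 70780/16/63:
print(lba_to_chs(70780 * 16 * 63 - 1, 16, 63))  # (70779, 15, 63)
```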
<p>
Consequently, in order to access a very old non-SCSI disk, we need to know
its <em/geometry/, that is, the values of C, H and S.
(And if you don't know, there is a lot of good information on
<htmlurl url="http://www.thetechpage.com/cgi-bin/default.cgi"
name="www.thetechpage.com">.)
<p>
<sect1>
Sectorsize
<p>
@ -188,73 +225,58 @@ There is an industry convention to give C/H/S=16383/16/63
for disks larger than 8.4 GB, and the disk size can no longer
be read off from the C/H/S values reported by the disk.
<sect>
Disk Access
<p>
In order to read or write something from or to the disk, we have
to specify a position on the disk, for example by giving a sector
or block number.
If the disk is a SCSI disk, then this sector number goes directly
into the SCSI command and is understood by the disk.
If the disk is an IDE disk using LBA, then precisely the same holds.
But if the disk is old, RLL or MFM or IDE from before the LBA times,
then the disk hardware expects a triple (cylinder,head,sector) to
designate the desired spot on the disk.
<p>
The correspondence between the linear numbering and this 3D notation
is as follows: for a disk with C cylinders, H heads and S sectors/track
position (c,h,s) in 3D or CHS notation is the same as position
c<tt/*/H<tt/*/S + h<tt/*/S + (s-1) in linear or LBA notation.
(The minus one is because traditionally sectors are counted from 1,
not 0, in this 3D notation.)
<p>
Consequently, in order to access a very old non-SCSI disk, we need to know
its <em/geometry/, that is, the values of C, H and S.
(And if you don't know, there is a lot of good information on
<htmlurl url="http://www.thetechpage.com/cgi-bin/default.cgi"
name="www.thetechpage.com">.)
<sect1>
BIOS Disk Access and the 1024 cylinder limit
The 1024 cylinder and 8.5 GB limits
<p>
Linux does not use the BIOS, but some other systems do.
The BIOS, which predates LBA times, offers with INT13
disk I/O routines that have (c,h,s) as input.
(More precisely: <tt/AH/ selects the function to perform,
<tt/CH/ is the low 8 bits of the cylinder number, <tt/CL/
The old INT13 BIOS interface to disk I/O uses 24 bits to address
a sector: 10 bits for the cylinder, 8 bits for the head, and 6 bits
for the sector number within the track (counting from 1).
This means that this interface cannot address more than
1024*256*63 sectors, which is 8.5 GB (with 512-byte sectors).
And if the (fantasy) geometry specified for the disk has fewer
than 1024 cylinders, or 256 heads, or 63 sectors per track,
then this limit will be less.
<p>
(More precisely: with INT 13, AH selects the function to perform,
CH is the low 8 bits of the cylinder number, CL
has in bits 7-6 the high two bits of the cylinder number
and in bits 5-0 the sector number, <tt/DH/ is the head number,
and <tt/DL/ is the drive number (80h or 81h).
and DL is the drive number (80h or 81h).
This explains part of the layout of the partition table.)
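<p>
The following Python sketch illustrates the register layout described above. It packs (c,h,s) into CH, CL and DH, and then unpacks the values again. The partition table stores its CHS fields as three bytes: the head, then the sector plus the high cylinder bits, then the low cylinder byte. The largest address in a */255/63 geometry, (1023,254,63), therefore appears there as the familiar FE FF FF. The function names are hypothetical.

```python
def int13_registers(c, h, s):
    """Pack (c, h, s) into the INT 13 registers CH, CL and DH.

    CH: low 8 bits of the cylinder number
    CL: bits 7-6 = high 2 bits of the cylinder, bits 5-0 = sector (1-63)
    DH: head number (0-255)
    """
    if not (0 <= c <= 1023 and 0 <= h <= 255 and 1 <= s <= 63):
        raise ValueError("(c, h, s) is not addressable via INT 13")
    ch = c & 0xFF
    cl = ((c >> 8) << 6) | s
    dh = h
    return ch, cl, dh


def from_int13_registers(ch, cl, dh):
    """Recover (c, h, s) from CH, CL and DH."""
    c = ((cl & 0xC0) << 2) | ch
    return c, dh, cl & 0x3F


# 10 + 8 + 6 = 24 bits: at most 1024 * 256 * 63 sectors of 512 bytes.
print(1024 * 256 * 63 * 512)  # 8455716864, i.e. 8.5 GB
ch, cl, dh = int13_registers(1023, 254, 63)
print(f"{dh:02X} {cl:02X} {ch:02X}")  # FE FF FF
```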
<p>
Thus, we have CHS encoded in three bytes,
with 10 bits for the cylinder number, 8 bits for the head number,
and 6 bits for the track sector number (numbered 1-63).
It follows that cylinder numbers can range from 0 to 1023
and that no more than 1024 cylinders are BIOS addressable.
This state of affairs was rectified when the so-called Extended INT13
functions were introduced. A modern BIOS has no problems accessing
large disks.
<p>
DOS and Windows software did not change when IDE disks
with LBA support were introduced, so DOS and Windows
continued needing a disk geometry, even when this was
no longer needed for the actual disk I/O, but only for talking
to the BIOS. This again means that Linux needs the geometry
in those places where communication with the BIOS or with
other operating systems is required, even on a modern disk.
(More precisely: DS:SI points at a 16-byte Disk Address Packet
that contains an 8-byte starting absolute block number.)
<p>
This state of affairs lasted for four years or so,
and then disks appeared on the market that could not be
addressed with the INT13 functions (because the 10+8+6=24
bits for (c,h,s) can address not more than 8.5 GB) and a new
BIOS interface was designed: the so-called Extended INT13
functions, where DS:SI points at a 16-byte Disk Address Packet
that contains an 8-byte starting absolute block number.
Linux does not use the BIOS, so does (and did) not have this problem.
<p>
However, this geometry stuff plays a role in the interpretation
of partition tables, so if Linux shares a disk with for example DOS,
then it needs to know what geometry DOS will think the disk has.
It also plays a role at boot time, where the BIOS has to load
a boot loader, and the boot loader has to load the operating system.
<p>
Very slowly the Microsoft world is moving towards using these
Extended INT13 functions. Probably a few years from now
no modern system on modern hardware will need the concept
of `disk geometry' anymore.
<sect1>
The 137 GB limit
<p>
The old ATA standard describes how to address a sector on an IDE disk
using 28 bits (8 bits for the sector, 4 for the head, 16 for the cylinder).
This means that an IDE disk can have at most 2^28 addressable sectors.
With 512-byte sectors this is 2^37 bytes, that is, 137.4 GB.
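<p>
In LBA mode these 28 bits are distributed over the ATA registers as follows:
<itemize>
<item>the sector number register holds bits 0-7;
<item>the two cylinder registers hold bits 8-23;
<item>the low nibble of the head register holds bits 24-27.
</itemize>
A Python sketch of that split (the function name is hypothetical):

```python
def ata28_fields(lba):
    """Split a 28-bit LBA over the ATA task file registers.

    Bits 0-7 go to the sector number register, bits 8-23 to the
    cylinder (low and high) registers, bits 24-27 to the head nibble.
    """
    if not 0 <= lba < 2**28:
        raise ValueError("sector number needs more than 28 bits")
    sector = lba & 0xFF
    cylinder = (lba >> 8) & 0xFFFF
    head = (lba >> 24) & 0x0F
    return sector, cylinder, head


print(2**28 * 512)  # 137438953472 bytes, i.e. 137.4 GB
```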
<p>
The ATA-6 standard includes a specification of how to address
past this 2^28 sector boundary. The new standard allows addressing
of 2^48 sectors. There is support in recent Linux kernels that
have incorporated Andre Hedrick's IDE patch, for example
2.4.18-pre7-ac3 and 2.5.3.
<p>
Maxtor has sold 160 GB IDE disks since Fall 2001.
An old kernel will treat such disks as 137.4 GB disks.
<p>
<sect>
History of BIOS and IDE limits
<p>
<descrip>
@ -262,7 +284,8 @@ History of BIOS and IDE limits
At most 65536 cylinders (numbered 0-65535), 16 heads (numbered 0-15),
255 sectors/track (numbered 1-255), for a maximum total capacity of
267386880 sectors (of 512 bytes each), that is, 136902082560 bytes (137 GB).
This is not yet a problem (in 1999), but will be a few years from now.
In Sept 2001, the first drives larger than this (160 GB Maxtor Diamondmax)
appeared.
<p>
<tag/BIOS Int 13 - the 8.5 GB limit/
At most 1024 cylinders (numbered 0-1023), 256 heads (numbered 0-255),
@ -338,26 +361,32 @@ cylinders C as total capacity divided by (H<tt/*/63<tt/*/512).)
The next hurdle comes with a size over 33.8 GB.
The problem is that with the default 16 heads and 63 sectors/track
this corresponds to a number of cylinders of more than 65535, which
does not fit into a short. Most BIOSes in existence today can't handle
such disks. (See, e.g., <htmlurl name="Asus upgrades"
does not fit into a short. Many BIOSes couldn't handle such disks.
(See, e.g., <htmlurl name="Asus upgrades"
url="http://www.asus.com/Products/Motherboard/bios_slot1.html">
for new flash images that work.)
Linux kernels older than 2.2.14 / 2.3.21 need a patch.
See <ref id="verylarge" name="IDE problems with 34+ GB disks"> below.
<p>
<tag/The 137 GB limit (Sept 2001)/
As mentioned above, the old ATA protocol uses 16+4+8 = 28 bits
to specify the sector number, and hence cannot address more than
2^28 sectors. ATA-6 describes an extension that allows the addressing
of 2^48 sectors, a million times as much.
There is support in very recent kernels.
<p>
<tag/The 2 TiB limit/
With 32-bit sector numbers, one can address 2 TiB.
A lot of software will have to be rewritten once disks get larger.
</descrip>
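<p>
Each of the limits above is the product of a maximum geometry, or a maximum sector number, and the 512-byte sector size. The Python sketch below recomputes these products. The 1024/255/63 entry is an assumption on our part. It is the usual translated */255/63 BIOS geometry mentioned elsewhere in this HOWTO, which gives the 8.4 GB figure.

```python
SECTOR_SIZE = 512  # bytes per sector

# name: maximum number of addressable sectors
LIMITS = {
    "BIOS + ATA, 1024/16/63": 1024 * 16 * 63,          # 528 MB
    "translated BIOS, 1024/255/63": 1024 * 255 * 63,   # 8.4 GB
    "BIOS INT 13, 1024/256/63": 1024 * 256 * 63,       # 8.5 GB
    "ATA, 65536/16/63": 65536 * 16 * 63,               # 33.8 GB
    "ATA CHS, 65536/16/255": 65536 * 16 * 255,         # 137 GB
    "ATA 28-bit LBA": 2**28,                           # 137.4 GB
    "32-bit sector numbers": 2**32,                    # 2 TiB
}


def capacity_bytes(sectors):
    return sectors * SECTOR_SIZE


for name, sectors in LIMITS.items():
    size = capacity_bytes(sectors)
    # decimal GB, as used by disk manufacturers, and binary GiB
    print(f"{name:30} {size:>16} bytes {size / 1e9:8.1f} GB {size / 2**30:8.1f} GiB")
```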
For another discussion of this topic, see
<htmlurl
url="http://www.maxtor.com/products/DiamondMax/techsupport/Q&amp;A/30004.html"
name="Breaking the Barriers">, and, with more details,
<htmlurl url="http://www.maxtor.com/technology/whitepapers/63001.html"
name="IDE Hard Drive Capacity Barriers">.
Hard drives over 8.4 GB are supposed to report their geometry as 16383/16/63.
This in effect means that the `geometry' is obsolete, and the total disk
size can no longer be computed from the geometry.
size can no longer be computed from the geometry, but is found in the
LBA capacity field returned by the <ref id="identify" name="IDENTIFY command">.
Hard drives over 137.4 GB are supposed to report an LBA capacity of
0xfffffff = 268435455 sectors (137438952960 bytes). Now the actual
disk size is found in the new 48-bit capacity field.
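<p>
A program that wants the real size of an IDE disk should therefore ignore the conventional geometry and use the capacity fields instead. The Python sketch below shows that decision for values already read from IDENTIFY. The function name and argument layout are made up for this example. The sketch assumes the caller passes 0 for a capacity field that the disk does not support, for example when word 49 bit 9 says LBA is unsupported.

```python
CONVENTIONAL_CHS = (16383, 16, 63)  # reported by all disks over 8.4 GB
LBA28_CLIPPED = 0x0FFFFFFF          # reported by all disks over 137.4 GB


def disk_sectors(default_chs, lba_capacity, lba48_capacity=0):
    """Return the total number of sectors of an IDE disk.

    default_chs: (C, H, S) from words 1, 3, 6
    lba_capacity: words 60-61 (assumed 0 if LBA is not supported)
    lba48_capacity: words 100-103 (assumed 0 without 48-bit support)
    """
    if lba_capacity == LBA28_CLIPPED and lba48_capacity:
        return lba48_capacity
    if lba_capacity:
        return lba_capacity
    c, h, s = default_chs  # a really old disk: the geometry is all we have
    return c * h * s


print(disk_sectors(CONVENTIONAL_CHS, 268435455, 320173056))  # 320173056
```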
<sect>
Booting
@ -390,6 +419,7 @@ Most systems from 1998 or later will have a modern BIOS.
<p>
<sect1>LILO and the `lba32' and `linear' options
<label id="LILO">
<label id="linear">
<p>
Executive summary: If you use LILO as boot loader, make sure you have
@ -451,6 +481,16 @@ floppy or in the MBR and boot from anywhere on any IDE drive
<htmlurl name="//metalab.unc.edu/pub/Linux/system/boot/loaders/"
url="//metalab.unc.edu/pub/Linux/system/boot/loaders/">.
<sect1>
Other boot loaders
<p>
LILO is a bit fragile: it requires the discipline of running
<tt>/sbin/lilo</tt> each time one installs a new kernel.
Some other boot loaders do not have this disadvantage.
Especially popular these days is <tt/grub/; a major
disadvantage is that it does not support the
<tt>lilo -R label</tt> function.
<sect>
Disk geometry, partitions and `overlap'
<label id="overlap">
@ -483,9 +523,9 @@ and by the 32-bit <tt/start/ and <tt/length/ fields.
Linux only uses the <tt/start/ and <tt/length/ fields, and can
therefore handle partitions of not more than 2^32 sectors,
that is, partitions of at most 2 TiB. That is thirty times
that is, partitions of at most 2 TiB. That is twelve times
larger than the disks available today, so maybe it will be
enough for the next six years or so.
enough for the next five years or so.
(So, partitions can be very large, but there is a serious
restriction in that a file in an ext2 filesystem on hardware
with 32-bit integers cannot be larger than 2 GiB.)
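<p>
A partition table entry is 16 bytes long. It consists of a boot flag, the 3-byte CHS start, the type byte and the 3-byte CHS end. These are followed by the 32-bit little-endian <tt/start/ and <tt/length/ fields that Linux uses. The Python sketch below decodes one entry and shows where the 2 TiB ceiling comes from. All names in it are hypothetical.

```python
import struct

ENTRY = struct.Struct("<B3sB3sII")  # 16 bytes, little-endian, no padding


def chs_from_bytes(b):
    """Decode a 3-byte CHS field: head, sector + high cylinder bits, low cylinder."""
    head, cl, ch = b[0], b[1], b[2]
    return ((cl & 0xC0) << 2) | ch, head, cl & 0x3F


def parse_entry(raw):
    """Decode one 16-byte partition table entry."""
    boot, chs_start, ptype, chs_end, start, length = ENTRY.unpack(raw)
    return {
        "bootable": boot == 0x80,
        "type": ptype,
        "chs_start": chs_from_bytes(chs_start),
        "chs_end": chs_from_bytes(chs_end),
        "start": start,    # first sector, 32 bits
        "length": length,  # number of sectors, 32 bits
    }


# start and length are 32-bit sector counts, hence the 2 TiB limit:
print(2**32 * 512)  # 2199023255552 bytes = 2 TiB
```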
@ -543,6 +583,7 @@ Since "disk geometry" is something without objective existence,
different operating systems will invent different geometries
for the same disk. One often sees a translated geometry like */255/63
used by one and an untranslated geometry like */16/63 used by another OS.
(People tell me Windows NT uses */64/32 while Windows 2K uses */255/63.)
Thus, it may be impossible to align partitions to cylinder boundaries
according to each of the various ideas about the size of a cylinder
that one's systems have. Also different Linux kernels may assign
@ -650,10 +691,14 @@ The effect is more or less the same as with a translating BIOS -
but especially when running several different operating systems
on the same disk, disk managers can cause a lot of trouble.
Linux does support OnTrack Disk Manager since version 1.3.14,
Linux did support OnTrack Disk Manager since version 1.3.14,
and EZ-Drive since version 1.3.29. Some more details are
given below.
given in the next section.
In 2.5.70 the automatic disk manager support was removed.
Instead, two boot options were added: "hda=remap" to do
the EZ-Drive remapping of sector 0 to sector 1, and
"hda=remap63" to do the OnTrack Disk Manager shift over 63 sectors.
<sect>
Kernel disk translation for IDE disks
@ -753,6 +798,16 @@ Recent kernels (since 2.3.21) recognize boot parameters like "hda=remap" and
the contents of the partition table. The "hdX=noremap" boot parameter also
avoids the OnTrack Disk Manager shift.
<sect1>Since 2.5.70: boot parameters<p>
In 2.5.70 the automatic disk manager support was removed.
Instead, two boot options were added: "hda=remap" to do
the EZ-Drive remapping of sector 0 to sector 1, and
"hda=remap63" to do the OnTrack Disk Manager shift over 63 sectors.
This also means that it is no longer a problem to get rid of
a disk manager.
<sect>
Consequences
<p>
@ -765,9 +820,11 @@ LILO as the geometry that will enable successful interaction
with the BIOS at boot time. (Usually these two coincide.)
How does <tt/fdisk/ know about the geometry?
It asks the kernel, using the <tt/HDIO_GETGEO/ ioctl.
But the user can override the geometry interactively
or on the command line.
There are three sources of information. First, if the user has specified
the geometry interactively or on the command line, then we take
the user input. Second, if it is possible to guess the geometry used
from the partition table, then we use that. Third, when nothing else
is available, <tt/fdisk/ asks the kernel, using the <tt/HDIO_GETGEO/ ioctl.
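<p>
The order of preference just described can be sketched in Python. The third source is the real <tt/HDIO_GETGEO/ ioctl. Its request number is 0x0301 in the kernel header <tt>linux/hdreg.h</tt>, and it fills in a <tt>struct hd_geometry</tt>. The partition-table guess below is an assumed simplification of what <tt/fdisk/ does: if partitions end on a cylinder boundary, the end head plus one gives H and the end sector gives S. All function names are hypothetical, and every candidate is a (heads, sectors) pair. The number of cylinders is then derived from the total size.

```python
import fcntl
import os
import struct

HDIO_GETGEO = 0x0301
# struct hd_geometry { unsigned char heads, sectors; unsigned short cylinders;
#                      unsigned long start; }, native sizes and alignment
HD_GEOMETRY = struct.Struct("@BBHL")


def kernel_heads_sectors(device):
    """Ask the kernel for its idea of H and S (needs read access)."""
    fd = os.open(device, os.O_RDONLY)
    try:
        buf = fcntl.ioctl(fd, HDIO_GETGEO, bytes(HD_GEOMETRY.size))
    finally:
        os.close(fd)
    heads, sectors, _cylinders, _start = HD_GEOMETRY.unpack(buf)
    return heads, sectors


def geometry_from_partitions(chs_ends):
    """Guess (H, S) from partition end addresses (c, h, s), if consistent."""
    guesses = {(h + 1, s) for _c, h, s in chs_ends}
    return guesses.pop() if len(guesses) == 1 else None


def choose_geometry(user=None, partition_guess=None, kernel=None):
    """First the user's input, then the partition table, then the kernel."""
    for candidate in (user, partition_guess, kernel):
        if candidate is not None:
            return candidate
    return None
```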
How does LILO know about the geometry?
It asks the kernel, using the <tt/HDIO_GETGEO/ ioctl.
@ -789,6 +846,10 @@ line in <tt>/etc/lilo.conf</tt> (see lilo.conf(5)).
And otherwise the kernel will guess, possibly using values
obtained from the BIOS or the hardware.
Note that values guessed by the kernel are very unreliable.
The kernel does not have a good way of finding out what values
the BIOS uses, or indeed whether the disk is known to the BIOS at all.
It is possible (since Linux 2.1.79) to change the kernel's ideas
about the geometry by using the <tt>/proc</tt> filesystem.
For example
@ -803,6 +864,30 @@ For example
</verb></tscreen>
This is especially useful if you need so many boot parameters
that you overflow LILO's (very limited) command line length.
(It also helps if you want to influence a utility that gets its
idea of the geometry from the kernel via the HDIO_GETGEO ioctl.)
Since Linux 2.6.5 the kernel will (when compiled with CONFIG_EDD)
ask the BIOS for legacy_cylinders, legacy_heads, legacy_sectors
using INT 13/AH=08h. The values obtained are made available in
<tt>/sys/firmware/edd/int13_dev{80,81,82,83}/legacy_*</tt>.
In 2.6.5 the files were <tt>legacy_{cylinders,heads,sectors}</tt>
(with contents in hex, e.g. 0xfe for 254), but those names are
confusing, and in 2.6.7 they were changed to <tt>legacy_max_cylinder</tt>,
<tt>legacy_max_head</tt>, <tt>legacy_sectors_per_track</tt>
(with contents in decimal).
A geometry like C/H/S=1000/255/63 is found here as 999, 254, 63.
<tscreen><verb>
# insmod edd.ko
# cd /sys/firmware/edd/int13_dev83
# cat legacy_max_head
254
# cat sectors
120064896
#
</verb></tscreen>
Thus, we see here a disk with 255 heads and 120064896 sectors in all.
Careful comparison shows that this is <tt>/dev/hdf</tt>.
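<p>
The <tt>legacy_max_*</tt> files hold maximum values rather than counts, so one must be added to the cylinder and head values. The file format also changed between 2.6.5 (hex) and 2.6.7 (decimal). The Python sketch below copes with both formats and assumes the 2.6.7 file names. The default directory and the function names are illustrative only.

```python
import os


def parse_number(text):
    """Parse a sysfs value written either in hex ("0xfe") or decimal ("254")."""
    return int(text.strip(), 0)


def legacy_geometry(max_cylinder, max_head, sectors_per_track):
    """Turn the BIOS maximum values into a C/H/S geometry."""
    return max_cylinder + 1, max_head + 1, sectors_per_track


def edd_geometry(dev_dir="/sys/firmware/edd/int13_dev80"):
    """Read the BIOS legacy geometry for one disk (needs the edd module)."""
    values = []
    for name in ("legacy_max_cylinder", "legacy_max_head",
                 "legacy_sectors_per_track"):
        with open(os.path.join(dev_dir, name)) as f:
            values.append(parse_number(f.read()))
    return legacy_geometry(*values)


# The example above: 255 heads and 120064896 sectors in all
print(120064896 // (255 * 63))  # 7473 full cylinders of 255*63 sectors
```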
How does the BIOS know about the geometry?
The user may have specified it in the CMOS setup.
@ -859,24 +944,25 @@ How does one know the right total capacity? For example,
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=71346240
</verb></tscreen>
gives two ways of finding the total number of sectors 71346240.
The kernel output
Recent kernels also give the precise size in the boot messages:
<tscreen><verb>
# dmesg | grep hdc
...
hdc: Maxtor 93652U8, 34837MB w/2048kB Cache, CHS=70780/16/63
hdc: [PTBL] [4441/255/63] hdc1 hdc2 hdc3! hdc4 < hdc5 > ...
# dmesg | grep hde
hde: Maxtor 93652U8, ATA DISK drive
hde: 71346240 sectors (36529 MB) w/2048KiB Cache, CHS=70780/16/63
hde: hde1 hde2 hde3 < hde5 > hde4
hde2: <bsd: hde6 hde7 hde8 hde9 >
</verb></tscreen>
tells us about (at least) 34837*2048=71346176 and about (at least)
70780*16*63=71346240 sectors. In this case the second value happens
to be precisely correct, but in general both may be rounded down.
This is a good way to approximate the disk size when <tt/hdparm/
is unavailable. Never give too large a value for <it/cyls/!
In the case of SCSI disks the precise number of sectors is given
Older kernels only give MB and CHS. In general the CHS value is
rounded down, so that the above output tells us that there are
at least 70780*16*63=71346240 sectors. In this example that happens
to be the precise value. The MB value may be rounded instead of
truncated, and in old kernels may be `binary' (MiB) instead of decimal.
Note the agreement between the kernel size in MB and the Maxtor model number.
Also in the case of SCSI disks the precise number of sectors is given
in the kernel boot messages:
<tscreen><verb>
SCSI device sda: hdwr sector= 512 bytes. Sectors= 17755792 [8669 MB] [8.7 GB]
SCSI device sda: 17755792 512-byte hdwr sectors (9091 MB)
</verb></tscreen>
(and MB, GB are rounded, not rounded down, and `binary').
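<p>
Extracting these numbers from the boot messages is easily automated. The Python sketch below uses regular expressions modelled on the sample lines above. Other kernel versions may word their messages differently, so the patterns are assumptions. An explicit sector count is reported as exact. A C/H/S product is only a lower bound, because C is rounded down.

```python
import re

SECTORS_RE = re.compile(r"(\d+) sectors \(\d+ MB\)")   # recent IDE kernels
SCSI_RE = re.compile(r"(\d+) 512-byte hdwr sectors")   # SCSI
CHS_RE = re.compile(r"CHS=(\d+)/(\d+)/(\d+)")          # older IDE kernels


def sectors_from_dmesg(line):
    """Return (sectors, exact) for one boot message line, or None."""
    for pattern in (SECTORS_RE, SCSI_RE):
        m = pattern.search(line)
        if m:
            return int(m.group(1)), True
    m = CHS_RE.search(line)
    if m:
        c, h, s = map(int, m.groups())
        return c * h * s, False  # lower bound only
    return None
```

In recent kernels the MB figure is decimal: 71346240 * 512 bytes is 36529.3 MB.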
<sect>Details<p>
<sect1>IDE details - the seven geometries<p>
@ -913,6 +999,9 @@ seen by the BIOS. This means that, e.g., in an IDE-only system where
hdb is not given in the Setup, the geometries reported by the BIOS
for the first and second disk will apply to hda and hdc.
In order to avoid such confusion, since Linux 2.5.51 G_bios is
not used anymore.
<sect2>The IDENTIFY DRIVE command
<label id="identify">
<p>
@ -922,9 +1011,9 @@ This contains lots of technical stuff.
Let us only describe here what plays a role in geometry matters.
The words are numbered 0-255.
<p>
We find three pieces of information here: DefaultCHS (words 1,3,6),
CurrentCHS (words 54-58) and LBAcapacity (words 60-61).
We find four pieces of information here: DefaultCHS (words 1,3,6),
CurrentCHS (words 54-58), LBAcapacity (words 60-61) and
48-bit capacity (words 100-103).
<p><table><tabular ca="c|l">
| Description | Example @@
@ -932,17 +1021,20 @@ CurrentCHS (words 54-58) and LBAcapacity (words 60-61).
1 | Default number of cylinders | 16383 @
3 | Default number of heads | 16 @
6 | Default number of sectors per track | 63 @@
10-19 | Serial number (in ASCII) | K8033FEC @
23-26 | Firmware revision (in ASCII) | DA620CQ0 @
27-46 | Model name (in ASCII) | Maxtor 54098U8 @@
10-19 | Serial number (in ASCII) | G8067TME @
23-26 | Firmware revision (in ASCII) | GAK&1B0 @
27-46 | Model name (in ASCII) | Maxtor 4G160J8 @@
49 | bit field: bit 9: LBA supported | 0x2f00 @@
53 | bit field: bit 0: words 54-58 are valid | 0x0007 @
54 | Current number of cylinders | 16383 @
55 | Current number of heads | 16 @
56 | Current number of sectors per track | 63 @
57-58 | Current total number of sectors | 16514064 @@
60-61 | Default total number of sectors | 80041248 @@
255 | Checksum and signature (0xa5) | 0xf9a5 @@
57-58 | Current LBA capacity | 16514064 @@
60-61 | Default LBA capacity | 268435455 @@
82-83 | Command sets supported | 7c69 4f09 @@
85-86 | Command sets enabled | 7c68 0e01 @@
100-103 | Maximum user LBA for 48-bit addressing | 320173056 @@
255 | Checksum and signature (0xa5) | 0x44a5 @@
</tabular></table>
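<p>
The multi-word fields in this table are stored least significant word first. For example, the LBA capacity is word 60 + 65536 * word 61. The Python sketch below pulls the fields from the table out of a list of 256 words. It assumes the ATA convention for the ASCII strings, in which the first character of each pair is in the high byte. The names are hypothetical, and reading the words from an actual disk is not shown. The example words are taken from the table.

```python
def words_to_int(words):
    """Combine 16-bit words, least significant word first."""
    value = 0
    for i, w in enumerate(words):
        value |= (w & 0xFFFF) << (16 * i)
    return value


def int_to_words(value, count):
    """Inverse of words_to_int, used to build the example below."""
    return [(value >> (16 * i)) & 0xFFFF for i in range(count)]


def words_to_ascii(words):
    """Decode an IDENTIFY string: two characters per word, high byte first."""
    raw = bytearray()
    for w in words:
        raw += bytes([(w >> 8) & 0xFF, w & 0xFF])
    return raw.decode("ascii", "replace").strip()


def parse_identify(words):
    """Extract the geometry-related fields from 256 IDENTIFY words."""
    return {
        "default_chs": (words[1], words[3], words[6]),
        "model": words_to_ascii(words[27:47]),
        "lba_supported": bool(words[49] & (1 << 9)),
        "current_capacity": words_to_int(words[57:59]),
        "lba_capacity": words_to_int(words[60:62]),
        "lba48_capacity": words_to_int(words[100:104]),
    }


# Example built from the values in the table (Maxtor 4G160J8).
example = [0] * 256
example[1], example[3], example[6] = 16383, 16, 63
model = "Maxtor 4G160J8".ljust(40)
example[27:47] = [(ord(a) << 8) | ord(b) for a, b in zip(model[0::2], model[1::2])]
example[49] = 0x2F00
example[57:59] = int_to_words(16514064, 2)
example[60:62] = int_to_words(268435455, 2)
example[100:104] = int_to_words(320173056, 4)
print(parse_identify(example))
```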
In the ASCII strings each word contains two characters,
@ -1113,12 +1205,15 @@ This has the effect of producing a (C,H,S) with C at most 1024
and S at most 62.
<sect>
Clipped disks
<p>
<sect1>
The Linux IDE 8 GiB limit
<p>
The Linux IDE driver gets the geometry and capacity of a disk
(and lots of other stuff) by using an
<ref id="identify" name="ATA IDENTIFY"> request.
Until recently the driver would not believe the returned value
Linux kernels older than 2.0.34/2.1.90 would not believe the returned value
of lba_capacity if it was more than 10% larger than the capacity
computed by C<tt/*/H<tt/*/S. However, by industry agreement
large IDE disks (with more than 16514064 sectors)
@ -1126,7 +1221,7 @@ return C=16383, H=16, S=63, for a total of 16514064 sectors (7.8 GB)
independent of their actual size, but give their actual size in
lba_capacity.
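<p>
The two policies can be contrasted in a Python sketch. This is a simplified model of the check described here, not the kernel's C code. Allowing 15 heads as well as 16 is an assumption, based on the 16383/15/63 IBM example later in this document.

```python
def old_lba_capacity_is_ok(cyls, heads, sectors, lba_capacity):
    """Pre-2.0.34/2.1.90 policy (simplified): trust lba_capacity only
    if it exceeds C*H*S by less than 10%."""
    chs_sectors = cyls * heads * sectors
    return 0 <= lba_capacity - chs_sectors < chs_sectors // 10


def new_lba_capacity_is_ok(cyls, heads, sectors, lba_capacity):
    """Later policy (simplified): the conventional 16383/16/63 says
    nothing about the size, so any larger lba_capacity is believed."""
    if cyls == 16383 and heads in (15, 16) and sectors == 63:
        return lba_capacity >= cyls * heads * sectors
    return old_lba_capacity_is_ok(cyls, heads, sectors, lba_capacity)
```

For a disk reporting 16383/16/63 with lba_capacity 80041248, the old policy rejects the LBA value, so only the conventional 16514064 sectors are used. The new policy accepts it.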
<p>
Recent Linux kernels (2.0.34, 2.1.90) know about this
Since versions 2.0.34/2.1.90, Linux kernels know about this
and do the right thing. If you have an older Linux kernel and do
not want to upgrade, and this kernel only sees 8 GiB of a much larger disk,
then try changing the routine <tt/lba_capacity_is_ok/ in
@ -1184,8 +1279,8 @@ RawCHS=16383/15/63 and LBAsects=19807200. I use 20960/15/63 to
get the full capacity.'
For the jumper settings, see
<htmlurl
name="http://www.storage.ibm.com/techsup/hddtech/hddtech.htm"
url="http://www.storage.ibm.com/techsup/hddtech/hddtech.htm">.
name="http://www.hitachigst.com/hdd/support/jumpers.htm"
url="http://www.hitachigst.com/hdd/support/jumpers.htm">.
<sect1>
Jumpers that clip total capacity
@ -1223,10 +1318,12 @@ IDE disks larger than this.
With an old BIOS and a disk larger than 33.8 GB, the BIOS may hang,
and in such cases booting may be impossible, even when the disk
is removed from the CMOS settings.
<!-- doesn't exist anymore
See also <htmlurl name="the BIOS 33.8 GB limit"
url="http://www.storage.ibm.com/techsup/hddtech/bios338gb.htm">.
-->
<p>
Therefore, large IBM, Maxtor and Seagate disks come with a jumper
that makes the disk appear as a 33.8 GB disk.
For example, the IBM Deskstar 37.5 GB (DPTA-353750) with 73261440 sectors
(corresponding to 72680/16/63, or 4560/255/63) can be jumpered to appear
@ -1234,19 +1331,34 @@ as a 33.8 GB disk, and then reports geometry 16383/16/63 like any big disk,
but LBAcapacity 66055248 (corresponding to 65531/16/63, or 4111/255/63).
Similar things hold for recent large Maxtor disks.
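<p>
The numbers in the Deskstar example can be checked with a few lines of arithmetic. This is a sketch. <tt>jumpered_lba_capacity</tt> is a hypothetical, simplified model that caps the reported capacity at 66055248 sectors. It is not a description of the drive firmware.

```python
# Arithmetic for the jumpered-disk example above (illustration only).
SECTOR_BYTES = 512
CLIPPED_LBA_CAPACITY = 66055248  # what a jumpered large disk reports (33.8 GB)


def cylinders(total_sectors, heads, sectors_per_track):
    """Whole cylinders for a given translation; any remainder is dropped."""
    return total_sectors // (heads * sectors_per_track)


def decimal_gb(total_sectors):
    """Size in decimal gigabytes (10**9 bytes), as disk vendors count."""
    return total_sectors * SECTOR_BYTES / 10**9


def jumpered_lba_capacity(native_sectors):
    """Simplified model: with the jumper set, reported capacity is capped."""
    return min(native_sectors, CLIPPED_LBA_CAPACITY)


deskstar = 73261440  # IBM Deskstar 37.5 GB (DPTA-353750)
print(cylinders(deskstar, 16, 63), cylinders(deskstar, 255, 63))      # 72680 4560
clipped = jumpered_lba_capacity(deskstar)
print(cylinders(clipped, 16, 63), cylinders(clipped, 255, 63))        # 65531 4111
print(round(decimal_gb(deskstar), 1), round(decimal_gb(clipped), 1))  # 37.5 33.8
```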
<p>
Below are some more details that used
to be relevant but can probably be ignored now.
<p>
<sect3>Maxtor<p>
With the jumper present, both the geometry (16383/16/63) and the size
(66055248) are conventional and give no information about the actual size.
Moreover, attempts to access sector 66055248 and above yield I/O errors.
However, on Maxtor drives the actual size can be found and made accessible
using the READ NATIVE MAX ADDRESS and SET MAX ADDRESS commands.
Presumably this is what MaxBlast/EZ-Drive does.
There is a small Linux utility
<htmlurl url="http://www.win.tue.nl/~aeb/linux/setmax.c" name="setmax.c">
that does the same. Very few disks need it; almost always
CONFIG_IDEDISK_STROKE does the trick.
<p>
For drives larger than 137 GB, READ NATIVE MAX ADDRESS also returns
a conventional value, namely 0xfffffff, corresponding to 137 GB.
Here READ NATIVE MAX ADDRESS EXT and SET MAX ADDRESS EXT (using
48-bit addressing) are required. The <tt>setmax</tt> utility does not yet
know about this. A very small patch makes kernel 2.5.3 handle this situation.
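<p>
The 137 GB figure follows from the 28-bit value 0xfffffff. The sketch below assumes the 28-bit reply simply saturates at that value. The function names and the 160 GB example size (320173056 sectors) are hypothetical illustrations. Strictly, the ATA command returns a maximum address rather than a sector count, but either way the result is about 137 GB.

```python
# Why 137 GB: the 28-bit READ NATIVE MAX ADDRESS cannot report more than 0xfffffff.
SECTOR_BYTES = 512
LBA28_LIMIT = 0xFFFFFFF  # 2**28 - 1, the conventional reply on >137 GB drives


def decimal_gb(sectors):
    """Size in decimal gigabytes (10**9 bytes)."""
    return sectors * SECTOR_BYTES / 10**9


def reported_native_max(native_sectors):
    """Simplified model of the 28-bit command: the reply saturates."""
    return min(native_sectors, LBA28_LIMIT)


def needs_ext_commands(native_sectors):
    """True if the 28-bit reply is saturated, so READ NATIVE MAX ADDRESS EXT
    and SET MAX ADDRESS EXT would be needed to learn the real size."""
    return reported_native_max(native_sectors) == LBA28_LIMIT


print(round(decimal_gb(LBA28_LIMIT), 1))  # 137.4
print(needs_ext_commands(320173056))      # True for a hypothetical 160 GB disk
```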
<p>
Early large Maxtor disks
<!-- (early releases of the 36GB drive in the DM36 family) -->
<!-- (older models of the DiamondMax 36 and 40) -->
<!-- 36GB confirmed -->
have an additional detail: the J46 jumper for these 34-40 GB disks
changes the geometry from 16383/16/63 to 4092/16/63 and does not
change the reported LBAcapacity.
This means that even with the jumper present, the BIOS (old Award 4.5*)
will hang at boot time. For this case Maxtor provides a utility
<htmlurl url="http://www.maxtor.com/technology/technotes/20012.html"
@ -1263,7 +1375,7 @@ For Maxtor D540X-4K, see below.
For IBM, things are worse: the jumper really clips capacity
and there is no software way to get it back. The solution is
not to use the jumper but use <tt>setmax -m 66055248 /dev/hdX</tt>
to software-clip the disk. "How?", you say, "I cannot boot!"
IBM gives the tip: <it>If a system with Award BIOS hangs during drive
detection: Reboot the system and hold the F4 key to bypass autodetection
of the drive(s).</it> If this doesn't help, find a different computer,
@ -1273,6 +1385,27 @@ with jumpered Maxtor disks: booting works, and after getting past
the BIOS either a patched kernel or a <tt>setmax -d 0</tt>
gets you full capacity.
<p>
Thomas Charbonnel reports on a different approach:
"I had a 80 GB IBM IC35L080AVVA07-0 drive and installed IBM's
Disk Manager. Installed my boot loader on the drive's MBR.
Everything worked fine. Note that the IDE drive must become
the boot drive so that one can install only one 34+ GB drive
using this approach."
<p>
<sect3>Seagate<p>
Seagate disks have a jumper that will clip the reported number
of cylinders to 4092 on drives smaller than 33.8 GB, while it
will limit the reported LBA capacity (Identify words 60/61) to
33.8 GB on larger disks.
<p>
For models ST-340810A, ST-360020A, ST-380020A:
The ATA Read Native Max and Set Max commands may be used to reset
the true full capacity.
<p>
For models ST-340016A, ST-340823A, ST-340824A, ST-360021A, ST-380021A:
The ATA Set Features F1 sub-command will cause Identify Data words
60-61 to report the true full capacity.
<p>
<sect3>Maxtor D540X-4K<p>
The Maxtor Diamond Max drives 4K080H4, 4K060H3, 4K040H2 (aka D540X-4K)
are identical to the drives 4D080H4, 4D060H3, 4D040H2 (aka D540X-4D),
@ -1291,6 +1424,74 @@ first put the disk in a machine with non-broken BIOS, soft-clip it
with <tt>setmax -m 66055248 /dev/hdX</tt>, then put it back in the
first machine, and after booting run <tt>setmax -d 0 /dev/hdX</tt>
to get full capacity again.
<p>
In the meantime, some docs and pictures have appeared on the Maxtor site,
confirming part of the above. Compare
<figure><eps file="absent">
<img src="http://service.maxtor.com/rightnow/images/ministyleA.gif">
</figure>
<figure><eps file="absent">
<img src="http://service.maxtor.com/rightnow/images/ministyleB.gif">
</figure>
<figure><eps file="absent">
<img src="http://service.maxtor.com/rightnow/images/ministylec.gif">
</figure>
<p>
<sect3>Western Digital<p>
Some info, including the settings for capacity-clipping jumpers, is given on
<htmlurl url="http://support.wdc.com/techinfo/general/jumpers.asp"
name="the Western Digital site">. I do not know what precisely
these jumpers do.
<p>
<sect1>READ NATIVE MAX ADDRESS / SET MAX ADDRESS<p>
If an IDE/ATA disk has support for the Host Protected Area (HPA) feature set,
then it is possible to set the LBA capacity to any value below
the actual capacity. Access past the assigned point usually leads
to I/O errors. Since classical software finds out about the disk size
by looking at the LBA capacity field of the Identify information,
such software will not suspect that the disk actually is larger.
<p>
The actual total size of the disk is read using the
READ NATIVE MAX ADDRESS command.
The "soft disk size" (the LBA capacity reported in the Identify
information) is set using the SET MAX ADDRESS command.
It comes in two flavours: if the "volatile" bit is set, the
command will have effect until the next reboot or hardware reset;
otherwise the effect is permanent.
It is possible to protect settings with a password.
(For details, see the ATA standard.)
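<p>
The behaviour described above can be pictured with a toy model. <tt>HypotheticalHpaDisk</tt> is invented for illustration and does not talk to real hardware. It counts sizes in sectors, as <tt>setmax -m 66055248</tt> does in the text, whereas the real ATA commands exchange a maximum address. It also leaves out password protection and other restrictions in the standard.

```python
class HypotheticalHpaDisk:
    """Toy model of the HPA feature set, for illustration only."""

    SECTOR_BYTES = 512

    def __init__(self, native_sectors):
        self.native = native_sectors
        self.power_on_max = native_sectors  # value restored after a reset
        self.current_max = native_sectors   # value reported by IDENTIFY now

    def read_native_max(self):
        """READ NATIVE MAX ADDRESS: the real size, whatever the clip."""
        return self.native

    def set_max(self, sectors, volatile):
        """SET MAX ADDRESS: a volatile setting lasts until the next reset."""
        if sectors > self.native:
            raise ValueError("cannot exceed the native capacity")
        self.current_max = sectors
        if not volatile:
            self.power_on_max = sectors

    def hardware_reset(self):
        self.current_max = self.power_on_max

    def identify_lba_capacity(self):
        """What classical software sees as the disk size."""
        return self.current_max

    def read(self, lba):
        """Reads past the current max fail, as with a real clipped disk."""
        if lba >= self.current_max:
            raise IOError("I/O error: sector %d beyond the max address" % lba)
        return bytes(self.SECTOR_BYTES)


disk = HypotheticalHpaDisk(73261440)
disk.set_max(66055248, volatile=False)  # permanent clip, like the jumper
print(disk.identify_lba_capacity(), disk.read_native_max())  # 66055248 73261440
disk.set_max(disk.read_native_max(), volatile=True)  # restore until next reset
print(disk.identify_lba_capacity())  # 73261440
disk.hardware_reset()
print(disk.identify_lba_capacity())  # 66055248
```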
<p>
This clipped size has (at least) two applications.
On the one hand, one can fake a smaller disk so that
the BIOS has no problems, and then have Linux (or, for
DOS/Windows, a disk manager) restore the full size.
On the other hand, one can keep a vendor area at the end
of the disk, inaccessible to the ordinary user.
<p>
For many of the disks discussed above, setting a jumper has
precisely this effect: LBA capacity is diminished while
the native max capacity remains the same, and a SET MAX ADDRESS
command will restore full capacity.
<sect1>CONFIG_IDEDISK_STROKE<p>
The CONFIG_IDEDISK_STROKE option of Linux 2.4.19/2.5.3 and later
tells Linux to read the native max capacity and do a
SET MAX ADDRESS to get access to full capacity.
This configuration option lives under the heading
"Auto-Geometry Resizing support" in the
"IDE, ATA and ATAPI block devices" kernel configuration section.
<p>
The configuration option went away in 2.6.7
and was replaced by a (per-disk) boot parameter,
so that one can say "hda=stroke".
<p>
With this "stroke" option, jumpered disks will in many cases
be handled correctly, i.e., be seen with full capacity
(in spite of the jumper). The same holds when the disk
acquired a Host Protected Area in some other (non-jumper) way.
<p>
This is the preferred way to handle disks that need a jumper
because of a broken BIOS.
<sect>
The Linux 65535 cylinder limit
@ -1300,6 +1501,11 @@ This means that if you have more than 65535 cylinders, the number is
truncated, and (for a typical SCSI setup with 1 MiB cylinders)
an 80 GiB disk may appear as a 16 GiB one.
Once one recognizes what the problem is, it is easily avoided.
Use fdisk 2.10i or newer.
<p>
(The programming convention is to use the <tt/BLKGETSIZE/ ioctl
to get total size, and <tt/HDIO_GETGEO/ to get number of heads and
sectors/track, and, if needed, get C by C = size/(H*S).)
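<p>
The convention above can be sketched in Python with <tt>fcntl.ioctl</tt>. The ioctl numbers are the values from the Linux headers. The struct layout assumes the usual <tt>struct hd_geometry</tt> (two bytes, an unsigned short, an unsigned long) with native alignment. <tt>query_disk</tt> needs a real block device and usually root, so the worked numbers below only use the pure helpers. The "1 MiB cylinder" is taken to mean 64 heads and 32 sectors/track, which is an assumption.

```python
import fcntl
import struct

# ioctl numbers from the Linux headers (linux/fs.h and linux/hdreg.h).
BLKGETSIZE = 0x1260   # _IO(0x12, 96): size in 512-byte sectors (unsigned long)
HDIO_GETGEO = 0x0301  # fills struct hd_geometry
HD_GEOMETRY = "BBHL"  # heads (u8), sectors (u8), cylinders (u16), start (ulong)


def cylinders_from(size_sectors, heads, sectors):
    """C = size / (H*S), computed from the full size, not the u16 field."""
    return size_sectors // (heads * sectors)


def as_u16(value):
    """What survives when a cylinder count is stored in an unsigned short."""
    return value % 65536


def query_disk(path):
    """Return (size, heads, sectors, cylinder_field, cylinders_computed)."""
    with open(path, "rb") as dev:
        raw = fcntl.ioctl(dev.fileno(), BLKGETSIZE, struct.pack("L", 0))
        (size,) = struct.unpack("L", raw)
        raw = fcntl.ioctl(dev.fileno(), HDIO_GETGEO,
                          struct.pack(HD_GEOMETRY, 0, 0, 0, 0))
        heads, sectors, cyl_field, _start = struct.unpack(HD_GEOMETRY, raw)
    return size, heads, sectors, cyl_field, cylinders_from(size, heads, sectors)


# The 80 GiB example with 1 MiB cylinders (64 heads, 32 sectors/track):
size = 80 * 2**30 // 512
c = cylinders_from(size, 64, 32)
print(c, as_u16(c), as_u16(c) * 64 * 32 * 512 // 2**30)  # 81920 16384 16
```

The 16-bit cylinder field loses the high part of 81920 and keeps 16384. That is why such a disk appears as 16 GiB, while the size from <tt/BLKGETSIZE/ remains correct.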
<sect1>
IDE problems with 34+ GB disks
<label id="verylarge">
@ -1309,7 +1515,7 @@ and jumpers that clip capacity were discussed
<ref id="jumperbig" name="above">.)
<p>
Drives larger than 33.8 GB will not work with kernels older than
2.0.39 / 2.2.14 / 2.3.21.
The details are as follows.
Suppose you bought a new IBM-DPTA-373420 disk with a capacity
of 66835440 sectors (34.2 GB). Pre-2.3.21 kernels will tell you
@ -1453,6 +1659,8 @@ Such things are easily solved by giving boot parameters
`hda=C,H,S' for the appropriate numbers C, H and S, either at boot time
or in /etc/lilo.conf.
<p>
Since Linux 2.5.51 this BIOS information is not used anymore,
and the same problem occurs for all disks. See below.
<sect1>
Nonproblem: Identical disks have different geometry?
@ -1486,6 +1694,39 @@ The rounding down here costs almost 8 MB.
<p>
If you would like to remap hdd in the same way, give the kernel
boot parameters `hdd=1232,255,63'.
<p>
On the other hand, if the disk is not shared with DOS or so,
it may be better to set hdb to Normal in the BIOS setup,
instead of asking for some translation like LBA.
<p>
Since Linux 2.5.51, the IDE driver no longer uses BIOS info on the first
two disks, and the different treatment of the first two disks has disappeared.
<sect1>
Problem: 2.4 and 2.6 report different geometries?
2.6 reports the wrong geometry? 2.6 reports no geometry at all?
<p>
Since geometry does not exist, it is not surprising that each of
2.0/2.2/2.4/2.6 reports a somewhat different disk geometry.
<p>
Some people will maintain that geometry *does* exist; by that they do
not mean a property of the disk, but the values
reported by the BIOS. That is what several other operating systems
will use. Since Linux 2.5.51, the kernel no longer uses the values
reported by the BIOS: it is difficult to match BIOS device numbers
with Linux disk names, data may be available for only two disks,
some disks may not be present in the BIOS setup, etc.
However, if one needs these values, since Linux 2.6.5 one can set
CONFIG_EDD and mount sysfs, and then find the BIOS data for the
various disks under <tt>/sys/firmware/edd/int13_dev*</tt>.
Now the matching of BIOS numbers, represented in directory names
like <tt>int13_dev82</tt>, with Linux names like <tt>sda</tt> can
be done by user space software, possibly with help from the user.
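<p>
A minimal sketch of the user-space side is shown below. It reads the per-disk directories under <tt>/sys/firmware/edd</tt> and decodes the BIOS drive number from names like <tt>int13_dev82</tt>. The attribute file names in <tt>GEOMETRY_FILES</tt> are assumptions about what the EDD driver exports, and missing files are simply skipped. Matching a BIOS number to a Linux name such as <tt>sda</tt> is deliberately left to the user.

```python
import os

EDD_ROOT = "/sys/firmware/edd"  # from the text: needs CONFIG_EDD and sysfs
PREFIX = "int13_dev"

# Assumed attribute names; files that do not exist are skipped.
GEOMETRY_FILES = ("legacy_max_cylinder", "legacy_max_head",
                  "legacy_sectors_per_track")


def bios_number(dirname):
    """'int13_dev82' -> 0x82: BIOS drive numbers are hexadecimal."""
    if not dirname.startswith(PREFIX):
        raise ValueError("not an EDD device directory: %r" % dirname)
    return int(dirname[len(PREFIX):], 16)


def read_edd(root=EDD_ROOT, names=GEOMETRY_FILES):
    """Map BIOS drive number -> {attribute: text} for files that exist."""
    result = {}
    for entry in sorted(os.listdir(root)):
        if not entry.startswith(PREFIX):
            continue
        info = {}
        for name in names:
            path = os.path.join(root, entry, name)
            if os.path.isfile(path):
                with open(path) as f:
                    info[name] = f.read().strip()
        result[bios_number(entry)] = info
    return result


if __name__ == "__main__" and os.path.isdir(EDD_ROOT):
    for number, info in read_edd().items():
        print(hex(number), info)
```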
<p>
This 2.5.51 change caused problems for many people who use both Linux
and Windows on the same disk, upgraded from 2.4 to 2.6, and partitioned
with a version of <tt>parted</tt> that had not yet been updated.
I have not checked whether current parted is OK.
<sect1>
Nonproblem: fdisk sees much more room than df?
@ -1537,3 +1778,4 @@ On the other hand, this filesystem can have at most 1024000 files
(more than enough), against 4096000 (too much) earlier.
</article>