<!doctype linuxdoc system>
|
|
<article>
|
|
<title>Large Disk HOWTO
|
|
<author>Andries Brouwer, <tt/aeb@cwi.nl/
|
|
<date>v2.5, 2004-11-01
|
|
|
|
|
|
<abstract>
|
|
All about disk geometry and the 1024 cylinder and other limits for disks.
|
|
<nidx>HOWTOs!large disk</nidx>
|
|
<nidx>HOWTOs!disk, large</nidx>
|
|
</abstract>
|
|
|
|
<p>
|
|
For the most recent version of this text, see
|
|
<htmlurl url="http://www.win.tue.nl/~aeb/linux/Large-Disk.html"
|
|
name="www.win.tue.nl">.
|
|
|
|
<sect>
|
|
Large disks
|
|
<p>
|
|
You got a new disk. What to do? Well, on the software side:
|
|
use <tt/fdisk/ or <tt/cfdisk/ to create partitions,
|
|
and then <tt/mke2fs/ or <tt/mkreiserfs/ or so to create a filesystem,
|
|
and then <tt/mount/ to attach the new filesystem to the big file hierarchy.
|
|
Make sure you have relatively recent versions of these utilities -
|
|
often old versions have problems handling large disks.
|
|
<p>
|
|
You need not read this HOWTO since there are <em/no/ problems
|
|
with large hard disks these days.
|
|
<p>
|
|
Long ago, disks were large when they had a capacity larger than
|
|
528 MB, or than 8.4 GB, or than 33.8 GB. These days the interesting
|
|
limit is 137 GB. In all cases, sufficiently recent Linux kernels
|
|
handle the disk fine.
|
|
<p>
|
|
Sometimes booting requires some care, since Linux cannot help you
|
|
when it isn't running yet. But again, with a sufficiently recent
|
|
BIOS and boot loader there are no problems.
|
|
Most of the text below will treat the cases of
|
|
(i) ancient hardware,
|
|
(ii) broken hardware or BIOS,
|
|
(iii) several operating systems on the same disk,
|
|
(iv) booting old systems.
|
|
<p>
|
|
<bf>Advice</bf>
|
|
|
|
For large SCSI disks: Linux has supported them from very early on.
|
|
No action required.
|
|
|
|
For large IDE disks (over 8.4 GB): make sure your kernel is 2.0.34 or later.
|
|
|
|
For large IDE disks (over 33.8 GB): make sure your kernel is
|
|
2.0.39/2.2.14/2.3.21 or later.
|
|
|
|
For large IDE disks (over 137 GB): make sure your kernel is
|
|
2.4.19/2.5.3 or later.
|
|
|
|
If the kernel boots fine, and the boot messages indicate that it
|
|
recognizes the disk correctly, but there are problems with utilities,
|
|
upgrade the utilities.
|
|
|
|
If <ref id="LILO" name="LILO"> hangs at boot time, make sure you have
|
|
version 21.4 or later, and specify the keyword <tt>lba32</tt>
|
|
in the configuration file <tt>/etc/lilo.conf</tt>. With an older version
|
|
of LILO, try both with and without the <tt>linear</tt> keyword.
|
|
|
|
There may be geometry problems that can be solved by giving
|
|
an explicit geometry to kernel/LILO/fdisk.
|
|
|
|
If you have an old <tt/fdisk/ and it warns about
|
|
<ref id="overlap" name="overlapping"> partitions:
|
|
ignore the warnings, or check using <tt/cfdisk/ that really all is well.
|
|
|
|
For HPT366, see the <htmlurl name="Linux HPT366 HOWTO"
|
|
url="http://www.csie.ntu.edu.tw/~b6506063/hpt366/">.
|
|
|
|
If at boot time the kernel cannot read the partition table,
|
|
consider the possibility that UDMA66 was selected while
|
|
the controller or the cable or the disk drive did not
|
|
support UDMA66. In such a case every attempt to read will
|
|
fail, and reading the partition table is the first thing
|
|
the kernel does. Make sure no UDMA66 is used.
|
|
|
|
If the BIOS hangs at boot time because of a large disk, and
|
|
flashing a newer version is not an option, take the disk out
|
|
of the BIOS setup. If you have to boot from the disk, look
|
|
whether a capacity clipping jumper helps.
|
|
|
|
If you think something is wrong with the size of your disk,
|
|
make sure that you are not confusing binary and decimal <ref id="units">,
|
|
and realize that the free space that <tt/df/ reports on an empty disk
|
|
is a few percent smaller than the partition size, because there
|
|
is administrative overhead. Software that does not understand
|
|
48-bit addressing will view a 137+ GB disk as having a capacity
|
|
of 137 GB. When a capacity clipping <ref id="jumpers" name="jumper">
|
|
is present, a larger disk may have been clipped to 33 GB or to 2 GB.
|
|
|
|
If for a removable drive the kernel reports two different sizes,
|
|
then one is found from the drive, and the other from the disk/floppy.
|
|
This second value will be zero when the drive has no media.
|
|
<p>
|
|
Now, if you still think there are problems, or just are curious,
|
|
read on.
|
|
<p>
|
|
What follows is a rather detailed description of all relevant details.
|
|
I used kernel version 2.0.8 source as a reference.
|
|
Other versions may differ a bit.
|
|
<p>
|
|
<sect>
|
|
Units
|
|
<label id="units">
|
|
<p>
|
|
<nidx>units!megabyte</nidx>
|
|
<nidx>units!gigabyte</nidx>
|
|
A kilobyte (kB) is 1000 bytes.
|
|
A megabyte (MB) is 1000 kB.
|
|
A gigabyte (GB) is 1000 MB.
|
|
A terabyte (TB) is 1000 GB.
|
|
This is the
|
|
<htmlurl url="http://physics.nist.gov/cuu/Units/prefixes.html"
|
|
name="SI norm">.
|
|
However, there are people who use 1 MB=1024000 bytes and talk
|
|
about 1.44 MB floppies, and people who think that 1 MB=1048576 bytes.
|
|
Here I follow the
|
|
<htmlurl url="http://physics.nist.gov/cuu/Units/binary.html"
|
|
name="recent standard">
|
|
and write Ki, Mi, Gi, Ti for the binary units, so that
|
|
these floppies are 1440 KiB (1.47 MB, 1.41 MiB),
|
|
1 MiB is 1048576 bytes (1.05 MB),
|
|
1 GiB is 1073741824 bytes (1.07 GB)
|
|
and 1 TiB is 1099511627776 bytes (1.1 TB).
|
|
<p>
|
|
Quite correctly, the disk drive manufacturers follow the SI norm
|
|
and use the decimal units. However, Linux kernel boot messages
|
|
(for not-so-recent kernels) and some old fdisk-type programs
|
|
use the symbols MB and GB for binary, or
|
|
mixed binary-decimal units. So, before you think your disk is
|
|
smaller than was promised when you bought it, compute first the
|
|
actual size in decimal units (or just in bytes).
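<p>
To make the difference between the decimal and the binary units concrete,
here is a minimal sketch in C. The macro names and the function
<tt/whole_units/ are invented for this example; only the unit sizes
themselves come from the definitions above.
<tscreen><verb>
```c
#include <assert.h>
#include <stdint.h>

/* Unit sizes in bytes, following the definitions in the text. */
#define KB  UINT64_C(1000)
#define MB  UINT64_C(1000000)
#define GB  UINT64_C(1000000000)
#define KIB UINT64_C(1024)
#define MIB UINT64_C(1048576)
#define GIB UINT64_C(1073741824)

/* Number of whole units contained in a byte count (rounds down). */
uint64_t whole_units(uint64_t bytes, uint64_t unit)
{
    assert(unit != 0);
    return bytes / unit;
}
```
</verb></tscreen>
For a drive sold as 160 GB, <tt/whole_units(160 * GB, GIB)/ yields 149,
which is the number a program that counts in binary units would show for
the very same disk.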
|
|
<p>
|
|
Concerning terminology and abbreviation for binary units,
|
|
<htmlurl name="Knuth" url="http://www-cs-staff.stanford.edu/~knuth/">
|
|
has an alternative <htmlurl name="proposal"
|
|
url="http://www-cs-staff.stanford.edu/~knuth/news99.html">, namely
|
|
to use KKB, MMB, GGB, TTB, PPB, EEB, ZZB, YYB and to call these
|
|
<it>large kilobyte</it>, <it>large megabyte</it>, ... <it>large yottabyte</it>.
|
|
He writes: `Notice that doubling the letter connotes both
|
|
binary-ness and large-ness.' This is a good proposal -
|
|
`large gigabyte' sounds better than `gibibyte'. For our purposes
|
|
however the only important thing is to stress that a megabyte
|
|
has precisely 1000000 bytes, and that some other term and abbreviation
|
|
is required if you mean something else.
|
|
<p>
|
|
<sect>
|
|
Disk Access
|
|
<p>
|
|
Disk access is done in units called <it>sectors</it>.
|
|
In order to read or write something from or to the disk, we have
|
|
to specify the position on the disk, for example by giving the
|
|
sector number.
|
|
If the disk is a SCSI disk, then this sector number goes directly
|
|
into the SCSI command and is understood by the disk.
|
|
If the disk is an IDE disk using LBA, then precisely the same holds.
|
|
But if the disk is old, RLL or MFM or IDE from before the LBA times,
|
|
then the disk hardware expects a triple (cylinder,head,sector) to
|
|
designate the desired spot on the disk.
|
|
<p>
|
|
<sect1>
|
|
Cylinders, heads and sectors
|
|
<p>
|
|
A disk has sectors numbered 0, 1, 2, ...
|
|
This is called <it>LBA addressing</it>.
|
|
<p>
|
|
In ancient times, before the advent of IDE disks,
|
|
disks had a <it>geometry</it> described by three constants
|
|
C, H, S: the number of cylinders, the number of heads,
|
|
the number of sectors per track.
|
|
The address of a sector was given by three numbers:
|
|
<it>c</it>, <it>h</it>, <it>s</it>: the cylinder number
|
|
(between 0 and C-1), the head number (between 0 and H-1),
|
|
and the sector number within the track (between 1 and S), where
|
|
for some mysterious reason <it>c</it> and <it>h</it> count from 0,
|
|
but <it>s</it> counts from 1. This is called <it>CHS addressing</it>.
|
|
<p>
|
|
No disk manufactured less than ten years ago has a geometry, but
|
|
this ancient 3D sector addressing is still used by the INT13
|
|
BIOS interface (with fantasy numbers C, H, S
|
|
unrelated to any physical reality).
|
|
<p>
|
|
The correspondence between the linear numbering and this 3D notation
|
|
is as follows: for a disk with C cylinders, H heads and S sectors/track
|
|
position (<it>c</it>,<it>h</it>,<it>s</it>) in 3D or CHS notation
|
|
is the same as position
|
|
<it>c</it><tt/*/H<tt/*/S + <it>h</it><tt/*/S + (<it>s</it>-1)
|
|
in linear or LBA notation.
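<p>
The correspondence above, and its inverse, can be sketched in C as follows.
The type and function names are made up for this illustration, and the
geometry is assumed to be known.
<tscreen><verb>
```c
#include <assert.h>
#include <stdint.h>

struct geometry { uint32_t C, H, S; };  /* cylinders, heads, sectors/track */
struct chs      { uint32_t c, h, s; };  /* c and h count from 0, s from 1 */

/* (c,h,s) to linear sector number: c*H*S + h*S + (s-1). */
uint64_t chs_to_lba(struct geometry g, struct chs a)
{
    assert(a.h < g.H);
    assert(a.s >= 1);
    assert(a.s <= g.S);
    return ((uint64_t)a.c * g.H + a.h) * g.S + (a.s - 1);
}

/* Inverse mapping: linear sector number to (c,h,s). */
struct chs lba_to_chs(struct geometry g, uint64_t lba)
{
    struct chs a;
    a.s = (uint32_t)(lba % g.S) + 1;
    a.h = (uint32_t)((lba / g.S) % g.H);
    a.c = (uint32_t)(lba / ((uint64_t)g.S * g.H));
    return a;
}
```
</verb></tscreen>
With the geometry */255/63, position (1,0,1) is linear sector 16065,
the first sector of the second cylinder.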
|
|
<p>
|
|
Consequently, in order to access a very old non-SCSI disk, we need to know
|
|
its <em/geometry/, that is, the values of C, H and S.
|
|
(And if you don't know, there is a lot of good information on
|
|
<htmlurl url="http://www.thetechpage.com/cgi-bin/default.cgi"
|
|
name="www.thetechpage.com">.)
|
|
<p>
|
|
<sect1>
|
|
Sectorsize
|
|
<p>
|
|
<nidx>disk!sectorsize</nidx>
|
|
In the present text a sector has 512 bytes. This is almost always
|
|
true, but for example certain MO disks use a sectorsize of 2048 bytes,
and for such disks all capacities given below must be multiplied by four.
|
|
(When using <tt/fdisk/ on such disks, make sure you have version
|
|
2.9i or later, and give the `-b 2048' option.)
|
|
|
|
<sect1>
|
|
Disksize
|
|
<p>
|
|
<nidx>disk!disksize</nidx>
|
|
A disk with C cylinders, H heads and S sectors per track
|
|
has C<tt/*/H<tt/*/S sectors in all, and can store
|
|
C<tt/*/H<tt/*/S<tt/*/512 bytes.
|
|
For example, if the disk label says C/H/S=4092/16/63
|
|
then the disk has 4092<tt/*/16<tt/*/63=4124736 sectors, and can hold
|
|
4124736<tt/*/512=2111864832 bytes (2.11 GB).
|
|
There is an industry convention to give C/H/S=16383/16/63
|
|
for disks larger than 8.4 GB, and the disk size can no longer
|
|
be read off from the C/H/S values reported by the disk.
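<p>
The arithmetic of this section in a few lines of C. This is only a sketch:
the function names are invented here, and the second function merely
recognizes the conventional 16383/16/63 value, which says nothing about
the true size.
<tscreen><verb>
```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512

/* Capacity in bytes of a disk with geometry C/H/S. */
uint64_t chs_capacity_bytes(uint32_t C, uint32_t H, uint32_t S)
{
    assert(H >= 1);
    assert(S >= 1);
    return (uint64_t)C * H * S * SECTOR_SIZE;
}

/* True for the conventional geometry reported by disks over 8.4 GB. */
int is_placeholder_geometry(uint32_t C, uint32_t H, uint32_t S)
{
    return C == 16383 && H == 16 && S == 63;
}
```
</verb></tscreen>
When <tt/is_placeholder_geometry/ returns true, the result of
<tt/chs_capacity_bytes/ should not be trusted as the size of the disk.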
|
|
|
|
<sect1>
|
|
The 1024 cylinder and 8.5 GB limits
|
|
<p>
|
|
The old INT13 BIOS interface to disk I/O uses 24 bits to address
|
|
a sector: 10 bits for the cylinder, 8 bits for the head, and 6 bits
|
|
for the sector number within the track (counting from 1).
|
|
This means that this interface cannot address more than
|
|
1024*256*63 sectors, which is 8.5 GB (with 512-byte sectors).
|
|
And if the (fantasy) geometry specified for the disk has fewer
|
|
than 1024 cylinders, or 256 heads, or 63 sectors per track,
|
|
then this limit will be less.
|
|
<p>
|
|
(More precisely: with INT 13, AH selects the function to perform,
|
|
CH is the low 8 bits of the cylinder number, CL
|
|
has in bits 7-6 the high two bits of the cylinder number
|
|
and in bits 5-0 the sector number, DH is the head number,
|
|
and DL is the drive number (80h or 81h).
|
|
This explains part of the layout of the partition table.)
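<p>
The register layout just described can be sketched in C like this.
The struct and the function names are hypothetical; the code only packs
and unpacks the values and does not perform any BIOS call.
<tscreen><verb>
```c
#include <assert.h>
#include <stdint.h>

struct int13_regs { uint8_t ch, cl, dh; };

/* Pack a (c,h,s) address into the CH, CL and DH register values. */
struct int13_regs int13_pack(unsigned c, unsigned h, unsigned s)
{
    struct int13_regs r;
    assert(c < 1024);                       /* 10 bits */
    assert(h < 256);                        /* 8 bits */
    assert(s >= 1);                         /* 6 bits, counting from 1 */
    assert(s <= 63);
    r.ch = (uint8_t)(c & 0xff);             /* low 8 bits of the cylinder */
    r.cl = (uint8_t)(((c >> 8) << 6) | s);  /* bits 7-6: high cylinder bits */
    r.dh = (uint8_t)h;
    return r;
}

/* Recover the 10-bit cylinder number from CH and CL. */
unsigned int13_cylinder(struct int13_regs r)
{
    return ((unsigned)(r.cl >> 6) << 8) | r.ch;
}

/* Recover the sector number from bits 5-0 of CL. */
unsigned int13_sector(struct int13_regs r)
{
    return r.cl & 0x3f;
}
```
</verb></tscreen>
The largest address, (1023,255,63), packs into CH, CL and DH all equal to ffh.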
|
|
<p>
|
|
This state of affairs was rectified when the so-called Extended INT13
|
|
functions were introduced. A modern BIOS has no problems accessing
|
|
large disks.
|
|
<p>
|
|
(More precisely: DS:SI points at a 16-byte Disk Address Packet
|
|
that contains an 8-byte starting absolute block number.)
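<p>
A sketch of how such a packet might be filled in. The text above only
states the packet length and the 8-byte block number; the other fields
(packet size byte, block count, buffer offset and segment) follow the layout
as I know it from the EDD specification and should be checked against it.
The function name <tt/dap_build/ is invented for this example.
<tscreen><verb>
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fill a 16-byte Disk Address Packet; all fields are little-endian. */
void dap_build(uint8_t dap[16], uint16_t count,
               uint16_t buf_off, uint16_t buf_seg, uint64_t lba)
{
    int i;

    assert(dap != NULL);
    memset(dap, 0, 16);
    dap[0] = 16;                        /* size of the packet; byte 1 stays 0 */
    dap[2] = (uint8_t)(count & 0xff);   /* number of blocks to transfer */
    dap[3] = (uint8_t)(count >> 8);
    dap[4] = (uint8_t)(buf_off & 0xff); /* transfer buffer, offset part */
    dap[5] = (uint8_t)(buf_off >> 8);
    dap[6] = (uint8_t)(buf_seg & 0xff); /* transfer buffer, segment part */
    dap[7] = (uint8_t)(buf_seg >> 8);
    for (i = 0; i < 8; i++)             /* 8-byte starting block number */
        dap[8 + i] = (uint8_t)(lba >> (8 * i));
}
```
</verb></tscreen>
Since the block number has 64 bits, no cylinder, head or sector value
is involved at all.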
|
|
<p>
|
|
Linux does not use the BIOS, so does (and did) not have this problem.
|
|
<p>
|
|
However, this geometry stuff plays a role in the interpretation
|
|
of partition tables, so if Linux shares a disk with for example DOS,
|
|
then it needs to know what geometry DOS will think the disk has.
|
|
It also plays a role at boot time, where the BIOS has to load
|
|
a boot loader, and the boot loader has to load the operating system.
|
|
<p>
|
|
<sect1>
|
|
The 137 GB limit
|
|
<p>
|
|
The old ATA standard describes how to address a sector on an IDE disk
|
|
using 28 bits (8 bits for the sector, 4 for the head, 16 for the cylinder).
|
|
This means that an IDE disk can have at most 2^28 addressable sectors.
With 512-byte sectors this is 2^37 bytes, that is, 137.4 GB.
|
|
<p>
|
|
The ATA-6 standard includes a specification how to address
|
|
past this 2^28 sector boundary. The new standard allows addressing
|
|
of 2^48 sectors. There is support in recent Linux kernels that
|
|
have incorporated Andre Hedrick's IDE patch, for example
|
|
2.4.18-pre7-ac3 and 2.5.3.
|
|
<p>
|
|
Maxtor has been selling 160 GB IDE disks since Fall 2001.
|
|
An old kernel will treat such disks as 137.4 GB disks.
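<p>
The limits that follow from the width of the sector address are easily
computed. A minimal sketch, with an invented function name and assuming
512-byte sectors:
<tscreen><verb>
```c
#include <assert.h>
#include <stdint.h>

/* Largest capacity in bytes addressable with the given number of
   sector address bits, for 512-byte sectors. */
uint64_t max_bytes(unsigned address_bits)
{
    assert(address_bits <= 54);     /* keep the result below 2^64 */
    return (UINT64_C(1) << address_bits) * 512;
}
```
</verb></tscreen>
Here <tt/max_bytes(28)/ is 137438953472, the 137.4 GB mentioned above,
and <tt/max_bytes(48)/ is 2^57 bytes.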
|
|
<p>
|
|
<sect>
|
|
History of BIOS and IDE limits
|
|
<p>
|
|
<descrip>
|
|
<tag/ATA Specification (for IDE disks) - the 137 GB limit/
|
|
At most 65536 cylinders (numbered 0-65535), 16 heads (numbered 0-15),
|
|
255 sectors/track (numbered 1-255), for a maximum total capacity of
|
|
267386880 sectors (of 512 bytes each), that is, 136902082560 bytes (137 GB).
|
|
In Sept 2001, the first drives larger than this (160 GB Maxtor Diamondmax)
|
|
appeared.
|
|
<p>
|
|
<tag/BIOS Int 13 - the 8.5 GB limit/
|
|
At most 1024 cylinders (numbered 0-1023), 256 heads (numbered 0-255),
|
|
63 sectors/track (numbered 1-63) for a maximum total capacity of
|
|
8455716864 bytes (8.5 GB). This is a serious limitation today.
|
|
It means that DOS cannot use present day large disks.
|
|
<p>
|
|
<tag/The 528 MB limit/
|
|
If the same values for c,h,s are used for the BIOS Int 13 call and
|
|
for the IDE disk I/O, then both limitations combine, and one can
|
|
use at most 1024 cylinders, 16 heads, 63 sectors/track, for a
|
|
maximum total capacity of 528482304 bytes (528 MB), the infamous
|
|
504 MiB limit for DOS with an old BIOS.
|
|
This started being a problem around 1993, and people resorted to all kinds
|
|
of trickery, in hardware (LBA), in firmware (translating BIOS),
|
|
and in software (disk managers).
|
|
The concept of `translation' was invented (1994): a BIOS could use
|
|
one geometry while talking to the drive, and another, fake, geometry
|
|
while talking to DOS, and translate between the two.
|
|
<p>
|
|
<tag/The 2.1 GB limit (April 1996)/
|
|
Some older BIOSes only allocate 12 bits for the field in CMOS RAM that
|
|
gives the number of cylinders. Consequently, this number can be at most
|
|
4095, and only 4095<tt/*/16<tt/*/63<tt/*/512=2113413120 bytes are accessible.
|
|
The effect of having a larger disk would be a hang at boot time.
|
|
This made disks with geometry 4092/16/63 rather popular. And still today
|
|
many large disk drives come with a jumper to make them appear 4092/16/63.
|
|
See also <htmlurl url="http://www.firmware.com/support/bios/over2gb.htm"
|
|
name="over2gb.htm">. <htmlurl name="Other BIOSes"
|
|
url="http://www.asus.com/Products/Techref/Ide/Intel/intel-ide-001.html">
|
|
would not hang but just detect a much smaller disk, like 429 MB instead of 2.5 GB.
|
|
<p>
|
|
<tag/The 3.2 GB limit/
|
|
There was a bug in the Phoenix 4.03 and 4.04 BIOS firmware that would
|
|
cause the system to lock up in the CMOS setup for drives with a capacity
|
|
over 3277 MB. See <htmlurl url="http://www.firmware.com/support/bios/over3gb.htm"
|
|
name="over3gb.htm">.
|
|
<p>
|
|
<tag/The 4.2 GB limit (Feb 1997)/
|
|
Simple BIOS translation (ECHS=Extended CHS, sometimes called `Large
|
|
disk support' or just `Large')
|
|
works by repeatedly doubling the number of heads and halving the number
|
|
of cylinders shown to DOS, until the number of cylinders is at most 1024.
|
|
Now DOS and Windows 95 cannot handle 256 heads,
|
|
and in the common case that the disk reports 16 heads, this means that
|
|
this simple mechanism only works up to 8192<tt/*/16<tt/*/63<tt/*/512=4227858432
|
|
bytes (with a fake geometry with 1024 cylinders, 128 heads, 63 sectors/track).
|
|
Note that ECHS does not change the number of sectors per track, so if
|
|
that is not 63, the limit will be lower.
|
|
See <htmlurl url="http://www.firmware.com/support/bios/over4gb.htm"
|
|
name="over4gb.htm">.
|
|
<p>
|
|
<tag/The 7.9 GB limit/
|
|
Slightly smarter BIOSes avoid the previous problem by first adjusting the
|
|
number of heads to 15 (`revised ECHS'), so that a fake geometry with
|
|
240 heads can be obtained, good for
|
|
1024<tt/*/240<tt/*/63<tt/*/512=7927234560 bytes.
|
|
<p>
|
|
<tag/The 8.4 GB limit/
|
|
<label id="The 8.4 GB limit">
|
|
Finally, if the BIOS does all it can to make this translation a success,
|
|
and uses 255 heads and 63 sectors/track (`assisted LBA' or just `LBA')
|
|
it may reach 1024<tt/*/255<tt/*/63<tt/*/512=8422686720 bytes, slightly less
|
|
than the earlier 8.5 GB limit because the geometries with 256 heads must be
|
|
avoided.
|
|
(This translation will use for the number of heads the first value H
|
|
in the sequence 16, 32, 64, 128, 255 for which the total disk capacity
|
|
fits in 1024<tt/*/H<tt/*/63<tt/*/512, and then computes the number of
|
|
cylinders C as total capacity divided by (H<tt/*/63<tt/*/512).)
|
|
<p>
|
|
<tag/The 33.8 GB limit (August 1999)/
|
|
<label id="biosupgrades">
|
|
The next hurdle comes with a size over 33.8 GB.
|
|
The problem is that with the default 16 heads and 63 sectors/track
|
|
this corresponds to a number of cylinders of more than 65535, which
does not fit into a 16-bit unsigned short. Many BIOSes couldn't handle such disks.
|
|
(See, e.g., <htmlurl name="Asus upgrades"
|
|
url="http://www.asus.com/Products/Motherboard/bios_slot1.html">
|
|
for new flash images that work.)
|
|
Linux kernels older than 2.2.14 / 2.3.21 need a patch.
|
|
See <ref id="verylarge" name="IDE problems with 34+ GB disks"> below.
|
|
<p>
|
|
<tag/The 137 GB limit (Sept 2001)/
|
|
As mentioned above, the old ATA protocol uses 16+4+8 = 28 bits
|
|
to specify the sector number, and hence cannot address more than
|
|
2^28 sectors. ATA-6 describes an extension that allows the addressing
|
|
of 2^48 sectors, a million times as much.
|
|
There is support in recent kernels (2.4.19/2.5.3 or later).
|
|
<p>
|
|
<tag/The 2 TiB limit/
|
|
With 32-bit sector numbers, one can address 2 TiB.
|
|
A lot of software will have to be rewritten once disks get larger.
|
|
</descrip>
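<p>
The `assisted LBA' translation described under the 8.4 GB limit can be
sketched as follows. The names are invented here, and what a BIOS does with
a disk that does not fit even with 255 heads is not specified above; this
sketch simply assumes that it clips to 1024/255/63.
<tscreen><verb>
```c
#include <assert.h>
#include <stdint.h>

struct geometry { uint32_t C, H, S; };

/* Fake geometry for a disk with the given total number of sectors. */
struct geometry assisted_lba_geometry(uint64_t total_sectors)
{
    static const uint32_t heads[5] = {16, 32, 64, 128, 255};
    struct geometry g = {1024, 255, 63};    /* assumed fallback: clip */
    unsigned i;

    assert(total_sectors > 0);
    for (i = 0; i < 5; i++) {
        /* first H for which the disk fits in 1024*H*63 sectors */
        if (total_sectors <= (uint64_t)1024 * heads[i] * 63) {
            g.H = heads[i];
            g.C = (uint32_t)(total_sectors / ((uint64_t)heads[i] * 63));
            break;
        }
    }
    return g;
}
```
</verb></tscreen>
For a disk of 4124736 sectors (the 4092/16/63 disk used as an example earlier)
this gives 1023/64/63.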
|
|
|
|
Hard drives over 8.4 GB are supposed to report their geometry as 16383/16/63.
|
|
This in effect means that the `geometry' is obsolete, and the total disk
|
|
size can no longer be computed from the geometry, but is found in the
|
|
LBA capacity field returned by the <ref id="identify" name="IDENTIFY command">.
|
|
Hard drives over 137.4 GB are supposed to report an LBA capacity of
|
|
0xfffffff = 268435455 sectors (137438952960 bytes). Now the actual
|
|
disk size is found in the new 48-bit LBA capacity field.
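<p>
In code, the order in which these fields are consulted might look as follows.
The struct is a hypothetical, already decoded summary of the IDENTIFY data,
not the raw layout returned by the drive, and all names are invented for
this sketch.
<tscreen><verb>
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded summary of a drive's IDENTIFY data. */
struct identify_info {
    uint32_t C, H, S;           /* reported geometry */
    uint32_t lba28_sectors;     /* LBA capacity field, 0 without LBA */
    int      lba48_supported;   /* nonzero if 48-bit addressing exists */
    uint64_t lba48_sectors;     /* 48-bit capacity field */
};

#define LBA28_MAX UINT32_C(0x0fffffff)

/* Total number of sectors, using the most capable field available. */
uint64_t disk_sectors(struct identify_info id)
{
    assert(id.lba28_sectors <= LBA28_MAX);
    if (id.lba48_supported) {
        if (id.lba48_sectors > id.lba28_sectors)
            return id.lba48_sectors;
    }
    if (id.lba28_sectors != 0)
        return id.lba28_sectors;
    return (uint64_t)id.C * id.H * id.S;    /* pre-LBA disk */
}
```
</verb></tscreen>
Software that ignores the 48-bit field takes the second branch and sees
at most 268435455 sectors, however large the disk really is.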
|
|
|
|
<sect>
|
|
Booting
|
|
<p>
|
|
<nidx>booting!BIOS usage during</nidx>
|
|
<nidx>disk!BIOS access during booting</nidx>
|
|
When the system is booted, the BIOS reads sector 0 (known as
|
|
the MBR - the Master Boot Record) from the first disk
|
|
(or from floppy or CDROM), and jumps to the code found there - usually
|
|
some bootstrap loader. Such small bootstrap programs
typically have no disk drivers of their own and use
BIOS services. This means that a Linux kernel can only be
|
|
booted when it is entirely located within the first 1024
|
|
cylinders, unless you have both a modern BIOS (a BIOS that supports
the Extended INT13 functions) and a modern bootloader
(a bootloader that uses these functions when available).
|
|
<p>
|
|
This problem (if it is a problem) is very easily solved:
|
|
make sure that the kernel (and perhaps other files used during bootup,
|
|
such as LILO map files) are located on a partition that is entirely
|
|
contained in the first 1024 cylinders of a disk that the BIOS can access -
|
|
probably this means the first or second disk.
|
|
<p>
|
|
Thus: create a small partition, say 10 MB large, so that there
|
|
is room for a handful of kernels, making sure that it is entirely
|
|
contained within the first 1024 cylinders of the first or second
|
|
disk. Mount it on <tt>/boot</tt> so that LILO will put its stuff there.
|
|
<p>
|
|
Most systems from 1998 or later will have a modern BIOS.
|
|
<p>
|
|
|
|
<sect1>LILO and the `lba32' and `linear' options
|
|
<label id="LILO">
|
|
<label id="linear">
|
|
<p>
|
|
Executive summary: If you use LILO as boot loader, make sure you have
|
|
LILO version 21.4 or later. (It can be found at
|
|
<htmlurl name="ftp://metalab.unc.edu/pub/Linux/system/boot/lilo/"
|
|
url="ftp://metalab.unc.edu/pub/Linux/system/boot/lilo/">.)
|
|
Always use the <tt>lba32</tt> option.
|
|
<p>
|
|
An invocation of <tt>/sbin/lilo</tt> (the boot map installer) stores a list
|
|
of addresses in the boot map, so that LILO (the boot loader) knows from
|
|
where to read the kernel image. By default these addresses are
|
|
stored in (c,h,s) form, and ordinary INT13 calls are used at boot time.
|
|
<p>
|
|
When the configuration file specifies <tt>lba32</tt> or <tt>linear</tt>,
|
|
linear addresses are stored. With <tt>lba32</tt>, linear addresses
are also used at boot time, provided the BIOS supports extended INT13.
|
|
With <tt>linear</tt>, or with an old BIOS, these linear addresses are
|
|
converted back to (c,h,s) form, and ordinary INT13 calls are used.
|
|
<p>
|
|
Thus, with <tt>lba32</tt> there are no geometry problems and there is
|
|
no 1024 cylinder limit. Without it there is a 1024 cylinder limit.
|
|
What about the geometry?
|
|
<p>
|
|
The boot loader and the BIOS must agree as to the disk geometry.
|
|
<tt>/sbin/lilo</tt> asks the kernel for the geometry,
|
|
but there is no guarantee that the Linux kernel geometry coincides
|
|
with what the BIOS will use. Thus, often the geometry
|
|
supplied by the kernel is worthless. In such cases it helps
|
|
to give LILO the `<tt/linear/' option. The advantage is that
|
|
the Linux kernel idea of the geometry no longer plays a role.
|
|
The disadvantage is that <tt>lilo</tt> cannot warn you when
|
|
part of the kernel was stored above the 1024 cylinder limit,
|
|
and you may end up with a system that does not boot.
|
|
|
|
<sect1>A LILO bug<p>
|
|
With LILO versions below v21 there is another disadvantage:
|
|
the address conversion done at boot time has a bug: when c*H is 65536
|
|
or more, overflow occurs in the computation.
|
|
For H larger than 64 this causes a stricter limit on c than the
|
|
well-known c < 1024; for example, with H=255 and an old LILO
|
|
one must have c < 258. (c=cylinder where kernel image lives,
|
|
H=number of heads of disk)
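<p>
The effect of this bug can be reproduced with a little arithmetic.
The sketch below assumes that the overflow is an ordinary 16-bit wraparound;
it does not reproduce the actual LILO code, and the function names are
invented.
<tscreen><verb>
```c
#include <assert.h>
#include <stdint.h>

/* The product c*H as a 16-bit computation sees it (modulo 65536). */
uint16_t product_16bit(uint32_t c, uint32_t H)
{
    return (uint16_t)(c * H);
}

/* Smallest cylinder number c for which c*H no longer fits in 16 bits. */
uint32_t first_overflowing_cylinder(uint32_t H)
{
    assert(H >= 1);
    assert(H <= 255);
    return (65536 + H - 1) / H;     /* 65536/H, rounded up */
}
```
</verb></tscreen>
For H=255 the first bad cylinder is 258, in agreement with the bound given
above, while for H=64 it is 1024, so that nothing is lost compared to the
usual limit.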
|
|
|
|
<sect1>1024 cylinders is not 1024 cylinders<p>
|
|
Tim Williams writes: `I had my Linux partition within the first 1024
|
|
cylinders and still it wouldnt boot. First when I moved it below 1 GB
|
|
did things work.' How can that be? Well, this was a SCSI disk with
|
|
AHA2940UW controller which uses either H=64, S=32 (that is, cylinders
|
|
of 1 MiB = 1.05 MB), or H=255, S=63 (that is, cylinders of 8.2 MB),
|
|
depending on setup options in firmware and BIOS. No doubt the BIOS
|
|
assumed the former, so that the 1024 cylinder limit was found at 1 GiB,
|
|
while Linux used the latter and LILO thought that this limit was at 8.4 GB.
|
|
|
|
<sect1>No 1024 cylinder limit on old machines with IDE<p>
|
|
The <tt>nuni</tt> boot loader does not use BIOS services
|
|
but accesses IDE drives directly. So, one can put it on a
|
|
floppy or in the MBR and boot from anywhere on any IDE drive
|
|
(not only from the first two). Find it at
|
|
<htmlurl name="//metalab.unc.edu/pub/Linux/system/boot/loaders/"
|
|
url="//metalab.unc.edu/pub/Linux/system/boot/loaders/">.
|
|
|
|
<sect1>
|
|
Other boot loaders
|
|
<p>
|
|
LILO is a bit fragile: it requires the discipline of running
|
|
<tt>/sbin/lilo</tt> each time one installs a new kernel.
|
|
Some other boot loaders do not have this disadvantage.
|
|
Especially <tt/grub/ is popular these days; a major
|
|
disadvantage is that it does not support the
|
|
<tt>lilo -R label</tt> function.
|
|
|
|
<sect>
|
|
Disk geometry, partitions and `overlap'
|
|
<label id="overlap">
|
|
<p>
|
|
<nidx>disk!geometry</nidx>
|
|
<nidx>disk!partitions</nidx>
|
|
If you have several operating systems on your disks, then each
|
|
uses one or more disk partitions. A disagreement on where these
|
|
partitions are may have catastrophic consequences.
|
|
|
|
<label id="partitiontable">
|
|
The MBR contains a <it>partition table</it> describing where the
|
|
(primary) partitions are. There are 4 table entries, for 4
|
|
primary partitions, and each looks like
|
|
<tscreen><verb>
|
|
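#include <stdint.h>		/* uint8_t, uint32_t */

/* One 16-byte table entry; multi-byte fields are little-endian on disk. */
struct partition {
	uint8_t  active;	/* 0x80: bootable, 0: not bootable */
	uint8_t  begin[3];	/* CHS for first sector */
	uint8_t  type;		/* partition type, e.g. 0x83 for Linux */
	uint8_t  end[3];	/* CHS for last sector */
	uint32_t start;		/* 32 bit sector number (counting from 0) */
	uint32_t length;	/* 32 bit number of sectors */
};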
|
|
</verb></tscreen>
|
|
(where CHS stands for Cylinder/Head/Sector).
|
|
|
|
This information is redundant: the location of a partition
|
|
is given both by the 24-bit <tt/begin/ and <tt/end/ fields,
|
|
and by the 32-bit <tt/start/ and <tt/length/ fields.
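<p>
Both descriptions can be extracted from the 16 raw bytes of a table entry
as sketched below. The three CHS bytes are assumed to be stored in the order
head, sector plus high cylinder bits, low cylinder byte, mirroring the
DH, CL, CH registers of the INT13 call. All names are invented for this
example.
<tscreen><verb>
```c
#include <assert.h>
#include <stdint.h>

struct chs { unsigned c, h, s; };

struct part_info {
    uint8_t    active, type;
    struct chs begin, end;      /* the 24-bit description */
    uint32_t   start, length;   /* the 32-bit description */
};

/* Decode three CHS bytes: head, sector + high cylinder bits, low cylinder. */
static struct chs decode_chs(const uint8_t b[3])
{
    struct chs a;
    a.h = b[0];
    a.s = b[1] & 0x3f;
    a.c = ((unsigned)(b[1] >> 6) << 8) | b[2];
    return a;
}

/* Read a little-endian 32-bit value. */
static uint32_t le32(const uint8_t b[4])
{
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Decode one 16-byte partition table entry. */
struct part_info parse_partition(const uint8_t e[16])
{
    struct part_info p;
    assert(e != 0);
    p.active = e[0];
    p.begin  = decode_chs(e + 1);
    p.type   = e[4];
    p.end    = decode_chs(e + 5);
    p.start  = le32(e + 8);
    p.length = le32(e + 12);
    return p;
}
```
</verb></tscreen>
Decoding byte by byte avoids any dependence on the byte order or the
struct padding of the machine that runs the code.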
|
|
|
|
Linux only uses the <tt/start/ and <tt/length/ fields, and can
|
|
therefore handle partitions of not more than 2^32 sectors,
|
|
that is, partitions of at most 2 TiB. That is twelve times
|
|
larger than the disks available today, so maybe it will be
|
|
enough for the next five years or so.
|
|
(So, partitions can be very large, but there is a serious
|
|
restriction in that a file in an ext2 filesystem on hardware
|
|
with 32-bit integers cannot be larger than 2 GiB.)
|
|
|
|
DOS uses the <tt/begin/ and <tt/end/ fields, and uses the
|
|
BIOS INT13 call to access the disk, and can therefore only
|
|
handle disks of not more than 8.4 GB, even with a translating
|
|
BIOS. (Partitions cannot be larger than 2.1 GB because of
|
|
restrictions of the FAT16 file system.) The same holds for
|
|
Windows 3.11 and WfWG and Windows NT 3.*.
|
|
|
|
Windows 95 has support for the Extended INT13 interface, and
|
|
uses special partition types (c, e, f instead of b, 6, 5)
|
|
to indicate that a partition should be accessed in this way.
|
|
When these partition types are used, the <tt/begin/ and <tt/end/ fields
|
|
contain dummy information (1023/255/63).
|
|
Windows 95 OSR2 introduces the FAT32 file system (partition type
|
|
b or c), that allows partitions of size at most 2 TiB.
|
|
|
|
What is this nonsense you get from <tt/fdisk/ about `overlapping'
|
|
partitions, when in fact nothing is wrong?
|
|
Well - there is something `wrong': if you look at the <tt/begin/
|
|
and <tt/end/ fields of such partitions, as DOS does, they overlap.
|
|
(And that cannot be corrected, because these fields cannot store
|
|
cylinder numbers above 1023 - there will always be `overlap'
|
|
as soon as you have more than 1024 cylinders.)
|
|
However, if you look at the <tt/start/ and <tt/length/ fields,
|
|
as Linux does, and as Windows 95 does in the case of partitions
|
|
with partition type c, e or f, then all is well.
|
|
So, ignore these warnings when <tt/cfdisk/ is satisfied and you
|
|
have a Linux-only disk. Be careful when the disk is shared with DOS.
|
|
Use the commands <tt>cfdisk -Ps /dev/hdx</tt> and <tt>cfdisk -Pt /dev/hdx</tt>
|
|
to look at the partition table of <tt>/dev/hdx</tt>.
|
|
|
|
<sect1>The last cylinder<p>
|
|
Many old IBM PS/2 systems used disks with a defect map written
|
|
to the end of the disk. (Bit 0x20 in the control word of the
|
|
<htmlurl name="disk parameter table"
|
|
url="http://www.win.tue.nl/~aeb/linux/hdtypes/hdtypes-2.html"> is set.)
|
|
Therefore, FDISK would not use the last cylinder. Just to be sure, the BIOS
|
|
often already reports the size of the disk as one cylinder smaller than
|
|
reality, and that may mean that two cylinders are lost.
|
|
Newer BIOSes have several disk size reporting functions, where internally
|
|
one calls the other. When both subtract 1 for this reserved cylinder and
|
|
also FDISK does so, then one may lose three cylinders.
|
|
These days all of this is irrelevant, but this may provide an explanation
|
|
if one observes that different utilities have slightly different opinions
|
|
about the disk size.
|
|
|
|
<sect1>Cylinder boundaries<p>
|
|
A well-known claim says that partitions should start and end
|
|
at cylinder boundaries.
|
|
<p>
|
|
Since `disk geometry' is something without objective existence,
|
|
different operating systems will invent different geometries
|
|
for the same disk. One often sees a translated geometry like */255/63
|
|
used by one and an untranslated geometry like */16/63 used by another OS.
|
|
(People tell me Windows NT uses */64/32 while Windows 2K uses */255/63.)
|
|
Thus, it may be impossible to align partitions to cylinder boundaries
|
|
according to each of the various ideas about the size of a cylinder
|
|
that one's systems have. Also different Linux kernels may assign
|
|
different geometries to the same disk.
|
|
Also, enabling or disabling the BIOS of a SCSI card may change the
|
|
fake geometry of the connected SCSI disks.
|
|
<p>
|
|
Fortunately, for Linux there is no alignment requirement at all.
|
|
(Except that some semi-broken installation software likes to be very sure
|
|
that all is OK; thus, it may be impossible to install RedHat 7.1
|
|
on a disk with unaligned partitions because DiskDruid is unhappy.)
|
|
<p>
|
|
People report that it is easy to create nonaligned partitions
|
|
in Windows NT, without any noticeable bad effects.
|
|
<p>
|
|
But MSDOS 6.22 has an alignment requirement. Extended partition sectors
|
|
that are not on a cylinder boundary are ignored by its FDISK.
|
|
The system itself is happy with any alignment, but interprets
|
|
relative starting addresses as if relative to an aligned address:
|
|
The starting address of a logical partition is given relative not
|
|
to the address of the extended partition sector that describes it,
|
|
but relative to the start of the cylinder that contains that sector.
|
|
(So, it is not surprising that also PartitionMagic requires alignment.)
|
|
<p>
|
|
What is the definition of alignment?
|
|
MSDOS 6.22 FDISK will do the following:
|
|
1. If the first sector of the cylinder is a partition
|
|
table sector, then the rest of the track is unused,
|
|
and the partition starts with the next track.
|
|
This applies to sector 0 (the MBR) and the partition table sectors
|
|
preceding logical partitions.
|
|
2. Otherwise, the partition starts at the first sector of the
|
|
cylinder. Also the extended partition starts at a cylinder boundary.
|
|
The <tt>cfdisk</tt> man page says that old versions of DOS did not
|
|
align partitions.
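<p>
A sketch of a test for alignment in this sense, given the geometry that DOS
is assumed to use. The functions cannot know whether the first sector of a
cylinder really holds a partition table, so both permitted starting positions
are accepted. The names are invented for this example.
<tscreen><verb>
```c
#include <assert.h>
#include <stdint.h>

/* Partition starts at the first sector of a cylinder. */
int starts_on_cylinder(uint64_t start, uint32_t H, uint32_t S)
{
    assert(H >= 1);
    assert(S >= 1);
    return start % ((uint64_t)H * S) == 0;
}

/* Partition starts on the second track of a cylinder, as it does when
   the first sector of that cylinder is a partition table sector. */
int starts_on_second_track(uint64_t start, uint32_t H, uint32_t S)
{
    assert(H >= 2);
    assert(S >= 1);
    return start % ((uint64_t)H * S) == S;
}

/* Either of the two starting positions that FDISK would produce. */
int dos_aligned_start(uint64_t start, uint32_t H, uint32_t S)
{
    return starts_on_cylinder(start, H, S) ||
           starts_on_second_track(start, H, S);
}
```
</verb></tscreen>
With geometry */255/63, a partition starting at sector 63 or at sector 16065
passes this test, but one starting at sector 2048 does not.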
|
|
<p>
|
|
Use of partition type 85 for the extended partition makes it invisible
|
|
to DOS, making sure that only Linux will look inside.
|
|
<p>
|
|
As an aside: on a Sparc, the boot partition must start on a cylinder
|
|
boundary (but there is no requirement on the end).
|
|
|
|
<sect>
|
|
Translation and Disk Managers
|
|
<p>
|
|
<nidx>disk!geometry translation</nidx>
|
|
<nidx>BIOS!translating</nidx>
|
|
<nidx>BIOS!LBA support</nidx>
|
|
Disk geometry (with heads, cylinders and tracks) is something
|
|
from the age of MFM and RLL. In those days it corresponded to
|
|
a physical reality. Nowadays, with IDE or SCSI, nobody is
|
|
interested in what the `real' geometry of a disk is.
|
|
Indeed, the number of sectors per track is variable - there are
|
|
more sectors per track close to the outer rim of the disk - so there
|
|
is no `real' number of sectors per track.
|
|
Quite the contrary: the IDE command INITIALIZE DRIVE PARAMETERS (91h)
|
|
serves to tell the disk how many heads and sectors per track
|
|
it is supposed to have today.
|
|
It is quite normal to see a large modern disk that has 2 heads
|
|
report 15 or 16 heads to the BIOS, while the BIOS may again report
|
|
255 heads to user software.
|
|
|
|
For the user it is best to regard a disk as just a linear array
|
|
of sectors numbered 0, 1, ..., and leave it to the firmware
|
|
to find out where a given sector lives on the disk. This linear
|
|
numbering is called LBA.
|
|
|
|
So now the conceptual picture is the following.
|
|
DOS, or some boot loader, talks to the BIOS, using (c,h,s) notation.
|
|
The BIOS converts (c,h,s) to LBA notation using the fake geometry
|
|
that the user is using. If the disk accepts LBA then this value
|
|
is used for disk I/O. Otherwise, it is converted back to (c',h',s')
|
|
using the geometry that the disk uses today, and that is used for
|
|
disk I/O.
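<p>
The two conversions in this picture can be written down in a few lines.
What follows is only a sketch of the usual arithmetic, not code taken from
any BIOS; the function names <tt>chs_to_lba</tt> and <tt>lba_to_chs</tt>
are made up for this illustration.

```python
def chs_to_lba(c, h, s, heads, sectors):
    """Convert (c,h,s) to a linear sector number for a given geometry.
    Cylinders and heads are counted from 0, sectors from 1."""
    return (c * heads + h) * sectors + (s - 1)


def lba_to_chs(lba, heads, sectors):
    """Convert a linear sector number back to (c,h,s) for a given geometry."""
    c, rest = divmod(lba, heads * sectors)
    h, s0 = divmod(rest, sectors)
    return c, h, s0 + 1


# The caller uses the fake geometry */255/63, the disk uses */16/63 today.
lba = chs_to_lba(1, 0, 1, heads=255, sectors=63)
print(lba, lba_to_chs(lba, heads=16, sectors=63))
```

Note that only the numbers of heads and sectors per track enter the
conversion; the number of cylinders does not.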
|
|
|
|
Note that there is a bit of confusion in the use of the expression `LBA':
|
|
As a term describing disk capabilities it means `Linear Block Addressing'
|
|
(as opposed to CHS Addressing). As a term in the BIOS Setup, it describes
|
|
a translation scheme sometimes called `assisted LBA' - see above
|
|
under `<ref id="The 8.4 GB limit">'.
|
|
|
|
Something similar works when the firmware doesn't speak LBA
|
|
but the BIOS knows about translation. (In the setup this is
|
|
often indicated as `Large'.) Now the BIOS will present
|
|
a geometry (C,H,S) to the operating system, and use
|
|
(C',H',S') while talking to the disk controller. Usually S = S',
|
|
C = C'/N and H = H'<tt/*/N, where N is the smallest power of
|
|
two that will ensure C <= 1024 (so that the least capacity
is wasted by the rounding down in C = C'/N).
|
|
Again, this allows access of up to 8.4 GB (7.8 GiB).
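<p>
As an illustration, the `Large' translation could be sketched as follows.
This is an assumption about the typical scheme, not the code of any
particular BIOS, and the name <tt>large_translation</tt> is invented here.
A real BIOS has further restrictions (for example on the resulting number
of heads) that are ignored in this sketch.

```python
def large_translation(c_disk, h_disk, s_disk, max_cyls=1024):
    """Return the (C,H,S) presented to the operating system for a disk
    geometry (C',H',S'), with N the smallest power of two giving C <= 1024."""
    n = 1
    while c_disk // n > max_cyls:   # C = C'/N, rounded down
        n *= 2
    return c_disk // n, h_disk * n, s_disk


# A disk reporting 6704/15/63 needs N = 8.
print(large_translation(6704, 15, 63))
```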
|
|
|
|
(The third setup option usually is `Normal', where no translation
|
|
is involved.)
|
|
|
|
If a BIOS does not know about `Large' or `LBA', then there are
|
|
software solutions around. Disk Managers like OnTrack or EZ-Drive
|
|
replace the BIOS disk handling routines by their own.
|
|
Often this is accomplished by having the disk manager code live
|
|
in the MBR and subsequent sectors (OnTrack calls this code DDO:
|
|
Dynamic Drive Overlay), so that it is booted before any other
|
|
operating system. That is why one may have problems
|
|
when booting from a floppy when a Disk Manager has been installed.
|
|
|
|
The effect is more or less the same as with a translating BIOS -
|
|
but especially when running several different operating systems
|
|
on the same disk, disk managers can cause a lot of trouble.
|
|
|
|
Linux has supported OnTrack Disk Manager since version 1.3.14,
and EZ-Drive since version 1.3.29. Some more details are
|
|
given in the next section.
|
|
|
|
In 2.5.70 the automatic disk manager support was removed.
|
|
Instead, two boot options were added: "hda=remap" to do
|
|
the EZ-Drive remapping of sector 0 to sector 1, and
|
|
"hda=remap63" to do the OnTrack Disk Manager shift over 63 sectors.
|
|
|
|
<sect>
|
|
Kernel disk translation for IDE disks
|
|
<p>
|
|
<nidx>disk!translation done by kernel</nidx>
|
|
If the Linux kernel detects the presence of some disk manager
|
|
on an IDE disk, it will try to remap the disk in the same way
|
|
this disk manager would have done, so that Linux sees the same
|
|
disk partitioning as for example DOS with OnTrack or EZ-Drive.
|
|
However, NO remapping is done when a geometry was specified
|
|
on the command line - so a
|
|
`<tt/hd=/<it/cyls/<tt/,/<it/heads/<tt/,/<it/secs/' command line option
|
|
might well kill compatibility with a disk manager.
|
|
|
|
If you are hit by this, and know someone who can compile a new
|
|
kernel for you, find the file <tt>linux/drivers/block/ide.c</tt>
|
|
and remove in the routine <tt>ide_xlate_1024()</tt> the test
|
|
<tt>if (drive->forced_geom) { ...; return 0; }</tt>.
|
|
|
|
The remapping is done by trying 4, 8, 16, 32, 64, 128, 255 heads
|
|
(keeping H<tt/*/C constant) until either C <= 1024 or H = 255.
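<p>
A sketch of this loop, following the description above rather than the
actual kernel source (the name <tt>remap_heads</tt> is made up, and the
product H*C is kept constant only up to the rounding of the integer
division):

```python
HEAD_CANDIDATES = (4, 8, 16, 32, 64, 128, 255)


def remap_heads(cyls, heads):
    """Try successive head counts until C <= 1024 or H = 255."""
    tracks = cyls * heads           # H*C before remapping
    for h in HEAD_CANDIDATES:
        if cyls <= 1024:
            break
        heads = h
        cyls = tracks // h          # rounded down
    return cyls, heads


print(remap_heads(16383, 16))
```

For the geometry 16383/16 this yields 1027 cylinders with 255 heads.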
|
|
|
|
The details are as follows - subsection headers are the strings
|
|
appearing in the corresponding boot messages. Here and everywhere
|
|
else in this text partition types are given in hexadecimal.
|
|
|
|
<sect1>EZD<p>
|
|
<nidx>disk!EZ-Drive translation</nidx>
|
|
<nidx>disk!EZD translation</nidx>
|
|
EZ-Drive is detected by the fact that the first primary partition
|
|
has type 55. The geometry is remapped as described above,
|
|
and the partition table from sector 0 is discarded - instead
|
|
the partition table is read from sector 1. Disk block numbers
|
|
are not changed, but writes to sector 0 are redirected to sector 1.
|
|
This behaviour can be changed by recompiling the kernel with
|
|
<tt/ #define FAKE_FDISK_FOR_EZDRIVE 0 /
|
|
in <tt/ide.c/.
|
|
|
|
<sect1>DM6:DDO<p>
|
|
<nidx>disk!OnTrack DiskManager translation</nidx>
|
|
<nidx>disk!DM6:DDO translation</nidx>
|
|
OnTrack DiskManager (on the first disk) is detected by the fact
|
|
that the first primary partition has type 54. The geometry is
|
|
remapped as described above and the entire disk is shifted by
|
|
63 sectors (so that the old sector 63 becomes sector 0).
|
|
Afterwards a new MBR (with partition table) is read from
|
|
the new sector 0. Of course this shift is to make room for
|
|
the DDO - that is why there is no shift on other disks.
|
|
|
|
<sect1>DM6:AUX<p>
|
|
<nidx>disk!OnTrack DiskManager translation</nidx>
|
|
<nidx>disk!DM6:AUX</nidx>
|
|
OnTrack DiskManager (on other disks) is detected by the fact
|
|
that the first primary partition has type 51 or 53.
|
|
The geometry is remapped as described above.
|
|
|
|
<sect1>DM6:MBR<p>
|
|
<nidx>disk!OnTrack DiskManager translation</nidx>
|
|
<nidx>disk!DM6:MBR</nidx>
|
|
An older version of OnTrack DiskManager is detected not by
|
|
partition type, but by signature. (Test whether the offset
|
|
found in bytes 2 and 3 of the MBR is not more than 430, and
|
|
the short found at this offset equals 0x55AA, and is followed
|
|
by an odd byte.) Again the geometry is remapped as above.
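<p>
The signature test in parentheses might look as follows. This is a sketch
only: it assumes that the values are stored little-endian and that
`followed by' means the byte immediately after the short; neither detail
is stated above. The function name is invented.

```python
import struct


def looks_like_old_ontrack(mbr):
    """mbr: the 512 bytes of sector 0."""
    (offset,) = struct.unpack_from("<H", mbr, 2)   # bytes 2 and 3
    if offset > 430:
        return False
    (sig,) = struct.unpack_from("<H", mbr, offset)
    next_byte = mbr[offset + 2]                    # assumed position
    return sig == 0x55AA and next_byte % 2 == 1
```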
|
|
|
|
<sect1>PTBL<p>
|
|
<nidx>disk!PTBL translation</nidx>
|
|
Finally, there is a test that tries to deduce a translation
|
|
from the <tt/start/ and <tt/end/ values of the primary partitions:
|
|
If some partition has start and end sector number 1 and 63, respectively,
|
|
and end heads 31, 63, 127 or 254, then, since it is customary
|
|
to end partitions on a cylinder boundary, and since moreover
|
|
the IDE interface uses at most 16 heads, it is conjectured
|
|
that a BIOS translation is active, and the geometry is
|
|
remapped to use 32, 64, 128 or 255 heads, respectively.
|
|
However, no remapping is done when the current idea of the
|
|
geometry already has 63 sectors per track and at least as
|
|
many heads (since this probably means that a remapping was
|
|
done already).
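<p>
In outline, and only as a sketch of the rule just described (the function
name and the way the partition entries are passed in are assumptions made
for this illustration):

```python
def guess_heads_from_ptbl(partitions, cur_heads, cur_sectors):
    """partitions: (start_sector, end_head, end_sector) CHS fields of the
    primary partitions. Returns the conjectured head count, or None."""
    for start_s, end_h, end_s in partitions:
        if start_s == 1 and end_s == 63 and end_h in (31, 63, 127, 254):
            heads = end_h + 1
            if cur_sectors == 63 and cur_heads >= heads:
                return None         # probably remapped already
            return heads
    return None
```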
|
|
|
|
<sect1>Getting rid of a disk manager<p>
|
|
When Linux detects OnTrack Disk Manager, it will shift all disk
|
|
accesses by 63 sectors. Similarly, when Linux detects EZ-Drive,
|
|
it shifts all accesses of sector 0 to sector 1.
|
|
This means that it may be difficult to get rid of these disk managers.
|
|
Most disk managers have an uninstall option, but if you need to remove
a disk manager by hand, an approach that often works is to give an explicit
disk geometry on the command line. Linux then skips the <tt>ide_xlate_1024()</tt>
routine, and one can wipe out the partition table together with the disk manager
(and probably lose access to all disk data) with the command
|
|
<tscreen><verb>
        dd if=/dev/zero of=/dev/hdx bs=512 count=1
</verb></tscreen>
|
|
The details depend a little on kernel version.
|
|
Recent kernels (since 2.3.21) recognize boot parameters like "hda=remap" and
|
|
"hdb=noremap", so that it is possible to get or avoid the EZD shift regardless of
|
|
the contents of the partition table. The "hdX=noremap" boot parameter also
|
|
avoids the OnTrack Disk Manager shift.
|
|
|
|
<sect1>Since 2.5.70: boot parameters<p>
|
|
|
|
In 2.5.70 the automatic disk manager support was removed.
|
|
Instead, two boot options were added: "hda=remap" to do
|
|
the EZ-Drive remapping of sector 0 to sector 1, and
|
|
"hda=remap63" to do the OnTrack Disk Manager shift over 63 sectors.
|
|
|
|
This also means that it no longer is a problem to get rid of
|
|
a disk manager.
|
|
|
|
<sect>
|
|
Consequences
|
|
<p>
|
|
<nidx>disk!consequences of translation</nidx>
|
|
What does all of this mean? For Linux users only one thing:
|
|
that they must make sure that LILO and <tt/fdisk/ use the right
|
|
geometry where `right' is defined for <tt/fdisk/ as the geometry
|
|
used by the other operating systems on the same disk, and for
|
|
LILO as the geometry that will enable successful interaction
|
|
with the BIOS at boot time. (Usually these two coincide.)
|
|
|
|
How does <tt/fdisk/ know about the geometry?
|
|
There are three sources of information. First, if the user has specified
|
|
the geometry interactively or on the command line, then we take
|
|
the user input. Second, if it is possible to guess the geometry used
|
|
from the partition table, then we use that. Third, when nothing else
|
|
is available, <tt/fdisk/ asks the kernel, using the <tt/HDIO_GETGEO/ ioctl.
|
|
|
|
How does LILO know about the geometry?
|
|
It asks the kernel, using the <tt/HDIO_GETGEO/ ioctl.
|
|
But the user can override the geometry using the `<tt/disk=/' option
|
|
in <tt>/etc/lilo.conf</tt> (see lilo.conf(5)).
|
|
One may also give the <tt/linear/ option to LILO, and it will store
|
|
LBA addresses instead of CHS addresses in its map file,
|
|
and find out the geometry to use at boot time (by using
|
|
INT 13 Function 8 to ask for the drive geometry).
|
|
|
|
How does the kernel know what to answer?
|
|
Well, first of all, the user may have specified an explicit geometry
|
|
with a `<tt/hda=/<it/cyls/<tt/,/<it/heads/<tt/,/<it/secs/'
|
|
kernel command line option (see bootparam(7)), perhaps by hand, or by
|
|
asking the boot loader to supply such an option to the kernel.
|
|
For example, one can tell LILO to supply such an option by adding
|
|
an `<tt/append = "hda=/<it/cyls/<tt/,/<it/heads/<tt/,/<it/secs/<tt/"/'
|
|
line in <tt>/etc/lilo.conf</tt> (see lilo.conf(5)).
|
|
And otherwise the kernel will guess, possibly using values
|
|
obtained from the BIOS or the hardware.
|
|
|
|
Note that values guessed by the kernel are very unreliable.
|
|
The kernel does not have a good way of finding out what values
|
|
the BIOS uses, or indeed whether the disk is known to the BIOS at all.
|
|
|
|
It is possible (since Linux 2.1.79) to change the kernel's ideas
|
|
about the geometry by using the <tt>/proc</tt> filesystem.
|
|
For example
|
|
<tscreen><verb>
# sfdisk -g /dev/hdc
/dev/hdc: 4441 cylinders, 255 heads, 63 sectors/track
# cd /proc/ide/ide1/hdc
# echo bios_cyl:17418 bios_head:128 bios_sect:32 > settings
# sfdisk -g /dev/hdc
/dev/hdc: 17418 cylinders, 128 heads, 32 sectors/track
#
</verb></tscreen>
|
|
This is especially useful if you need so many boot parameters
|
|
that you overflow LILO's (very limited) command line length.
|
|
(It also helps if you want to influence a utility that gets its
|
|
idea of the geometry from the kernel via the HDIO_GETGEO ioctl.)
|
|
|
|
Since Linux 2.6.5 the kernel will (when compiled with CONFIG_EDD)
|
|
ask the BIOS for legacy_cylinders, legacy_heads, legacy_sectors
|
|
using INT 13/AH=08h. The values obtained are made available in
|
|
<tt>/sys/firmware/edd/int13_dev{80,81,82,83}/legacy_*</tt>.
|
|
In 2.6.5 the files were <tt>legacy_{cylinders,heads,sectors}</tt>
|
|
(with contents in hex, e.g. 0xfe for 254), but those names are
|
|
confusing, and in 2.6.7 they were changed to <tt>legacy_max_cylinder</tt>,
|
|
<tt>legacy_max_head</tt>, <tt>legacy_sectors_per_track</tt>
|
|
(with contents in decimal).
|
|
A geometry like C/H/S=1000/255/63 is found here as 999, 254, 63.
|
|
<tscreen><verb>
# insmod edd.ko
# cd /sys/firmware/edd/int13_dev83
# cat legacy_max_head
254
# cat sectors
120064896
#
</verb></tscreen>
|
|
Thus, we see here a disk with 255 heads and 120064896 sectors in all.
|
|
Careful comparison shows that this is <tt>/dev/hdf</tt>.
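<p>
Since these files hold maximum values rather than counts, a small helper
may be convenient. The sketch below assumes the 2.6.7 file names with
decimal contents; the function names are invented, and the directory has
to be given by the caller.

```python
from pathlib import Path


def edd_legacy_to_geometry(max_cylinder, max_head, sectors_per_track):
    """Turn the legacy_max_* values into a C/H/S geometry (counts)."""
    return max_cylinder + 1, max_head + 1, sectors_per_track


def read_edd_geometry(dev_dir):
    """Read the three legacy files from e.g. /sys/firmware/edd/int13_dev80."""
    names = ("legacy_max_cylinder", "legacy_max_head",
             "legacy_sectors_per_track")
    values = [int((Path(dev_dir) / name).read_text()) for name in names]
    return edd_legacy_to_geometry(*values)
```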
|
|
|
|
How does the BIOS know about the geometry?
|
|
The user may have specified it in the CMOS setup.
|
|
Or the geometry is read from the disk, and possibly translated
|
|
as specified in the setup. In the case of SCSI disks, where no
|
|
geometry exists, the geometry that the BIOS has to invent can
|
|
often be specified by jumpers or setup options. (For example,
|
|
Adaptec controllers have the possibility to choose between
|
|
the usual H=64, S=32 and the `extended translation' H=255, S=63.)
|
|
Sometimes the BIOS reads the partition table to see with what
|
|
geometry the disk was last partitioned - it will assume that
|
|
a valid partition table is present when the 55aa signature
|
|
is present. This is good, in that it allows moving disks to
|
|
a different machine. But having the BIOS behaviour depend on
|
|
the disk contents also causes strange problems.
|
|
(For example, it has been <htmlurl name="reported"
|
|
url="http://www.heise.de/ct/faq/hotline/98/07/hotline9807_11.shtml">
|
|
that a 2.5 GB disk was seen as having 528 MB because the BIOS read
|
|
the partition table and concluded that it should use untranslated
|
|
CHS. Another effect is found in the <htmlurl name="report"
|
|
url="http://www.heise.de/ct/faq/hotline/98/19/hotline9819_11.shtml">
|
|
that unpartitioned disks were slower than partitioned ones,
|
|
because the BIOS tested 32-bit mode by reading the MBR and
|
|
seeing whether it correctly got the 55aa signature.)
|
|
|
|
How does the disk know about the geometry?
|
|
Well, the manufacturer invents a geometry that multiplies out
|
|
to approximately the right capacity. Many disks have jumpers
|
|
that change the reported geometry, in order to avoid BIOS bugs.
|
|
For example, all IBM disks allow the user to choose between
|
|
15 and 16 heads, and many manufacturers add jumpers to make
|
|
the disk seem smaller than 2.1 GB or 33.8 GB. See also
|
|
<ref id="jumpers" name="below">.
|
|
Sometimes there are utilities that change the disk firmware.
|
|
|
|
<sect1>
|
|
Computing LILO parameters
|
|
<p>
|
|
Sometimes it is useful to force a certain geometry
|
|
by adding `<tt/hda=/<it/cyls/<tt/,/<it/heads/<tt/,/<it/secs/'
|
|
on the kernel command line. Almost always one wants <it/secs/=63,
|
|
and the purpose of adding this is to specify <it/heads/.
|
|
(Reasonable values today are <it/heads/=16 and <it/heads/=255.)
|
|
What should one specify for <it/cyls/? Precisely that number
|
|
that will give the right total capacity of C*H*S sectors.
|
|
For example, for a drive with 71346240 sectors (36529274880 bytes)
|
|
one would compute C as 71346240/(255*63)=4441 (for example using
|
|
the program <tt/bc/), and give boot parameter <tt/hdc=4441,255,63/.
|
|
How does one know the right total capacity? For example,
|
|
<tscreen><verb>
# hdparm -g /dev/hdc | grep sectors
 geometry     = 4441/255/63, sectors = 71346240, start = 0
# hdparm -i /dev/hdc | grep LBAsects
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=71346240
</verb></tscreen>
|
|
gives two ways of finding the total number of sectors 71346240.
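<p>
The computation of <it/cyls/ is a single integer division. A small sketch
(the helper name <tt>boot_param</tt> is made up for this example):

```python
def boot_param(drive, total_sectors, heads=255, secs=63):
    """Build a `hdX=cyls,heads,secs' option from the total sector count."""
    cyls = total_sectors // (heads * secs)   # rounded down
    return f"{drive}={cyls},{heads},{secs}"


print(boot_param("hdc", 71346240))             # 255 heads
print(boot_param("hdc", 71346240, heads=16))   # 16 heads
```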
|
|
Recent kernels also give the precise size in the boot messages:
|
|
<tscreen><verb>
# dmesg | grep hde
hde: Maxtor 93652U8, ATA DISK drive
hde: 71346240 sectors (36529 MB) w/2048KiB Cache, CHS=70780/16/63
 hde: hde1 hde2 hde3 < hde5 > hde4
 hde2: <bsd: hde6 hde7 hde8 hde9 >
</verb></tscreen>
|
|
Older kernels only give MB and CHS. In general the CHS value is
|
|
rounded down, so that the above output tells us that there are
|
|
at least 70780*16*63=71346240 sectors. In this example that happens
|
|
to be the precise value. The MB value may be rounded instead of
|
|
truncated, and in old kernels may be `binary' (MiB) instead of decimal.
|
|
Note the agreement between the kernel size in MB and the Maxtor model number.
|
|
Also in the case of SCSI disks the precise number of sectors is given
|
|
in the kernel boot messages:
|
|
<tscreen><verb>
SCSI device sda: 17755792 512-byte hdwr sectors (9091 MB)
</verb></tscreen>
|
|
|
|
<sect>Details<p>
|
|
<sect1>IDE details - the seven geometries<p>
|
|
<nidx>disk!IDE geometry setting</nidx>
|
|
The IDE driver has five sources of information about the geometry.
|
|
The first (G_user) is the one specified by the user on the command line.
|
|
The second (G_bios) is the BIOS Fixed Disk Parameter Table
|
|
(for first and second disk only) that is read on system startup,
|
|
before the switch to 32-bit mode.
|
|
The third (G_phys) and fourth (G_log) are returned by the IDE controller
|
|
as a response to the <ref id="identify" name="IDENTIFY command"> - they
|
|
are the `physical' and `current logical' geometries.
|
|
|
|
The driver, in turn, needs two values for the geometry:
G_fdisk, returned by a <tt/HDIO_GETGEO/ ioctl, and
G_used, which is actually used for doing I/O.
|
|
Both G_fdisk and G_used are initialized to G_user if given, to
|
|
G_bios when this information is present according to CMOS, and
|
|
to G_phys otherwise. If G_log looks reasonable then G_used is set
|
|
to that. Otherwise, if G_used is unreasonable and G_phys looks
|
|
reasonable then G_used is set to G_phys. Here `reasonable' means
|
|
that the number of heads is in the range 1-16.
|
|
|
|
To say this in other words: the command line overrides the BIOS,
|
|
and will determine what <tt/fdisk/ sees, but if it specifies a
|
|
translated geometry (with more than 16 heads), then for kernel I/O
|
|
it will be overridden by output of the IDENTIFY command.
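<p>
The selection rules can be summarized in a few lines. This is a sketch of
the description above, not the driver source; geometries are written as
(C,H,S) tuples, <tt>None</tt> stands for `not available', and the function
names are invented.

```python
def reasonable(geom):
    """A geometry is `reasonable' when it has 1-16 heads."""
    return geom is not None and 1 <= geom[1] <= 16


def choose_geometries(g_user, g_bios, g_phys, g_log):
    """Return (G_fdisk, G_used) following the rules described above."""
    g_fdisk = g_used = g_user or g_bios or g_phys
    if reasonable(g_log):
        g_used = g_log
    elif not reasonable(g_used) and reasonable(g_phys):
        g_used = g_phys
    return g_fdisk, g_used
```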
|
|
|
|
Note that G_bios is rather unreliable: for systems booting from SCSI
|
|
the first and second disk may well be SCSI disks, and the geometry
|
|
that the BIOS reported for sda is used by the kernel for hda.
|
|
Moreover, disks that are not mentioned in the BIOS Setup are not
|
|
seen by the BIOS. This means that, e.g., in an IDE-only system where
|
|
hdb is not given in the Setup, the geometries reported by the BIOS
|
|
for the first and second disk will apply to hda and hdc.
|
|
|
|
In order to avoid such confusion, since Linux 2.5.51 G_bios is
|
|
not used anymore.
|
|
|
|
<sect2>The IDENTIFY DRIVE command
|
|
<label id="identify">
|
|
<p>
|
|
When an IDE drive is sent the IDENTIFY DRIVE (0xec) command,
|
|
it will return 256 words (512 bytes) of information.
|
|
This contains lots of technical stuff.
|
|
Let us only describe here what plays a role in geometry matters.
|
|
The words are numbered 0-255.
|
|
<p>
|
|
We find four pieces of information here: DefaultCHS (words 1,3,6),
CurrentCHS (words 54-58), LBAcapacity (words 60-61) and
48-bit capacity (words 100-103).
|
|
|
|
<p><table><tabular ca="c|l">
Word | Description | Example @@
0 | bit field: bit 6: fixed disk, bit 7: removable medium | 0x0040 @@
1 | Default number of cylinders | 16383 @
3 | Default number of heads | 16 @
6 | Default number of sectors per track | 63 @@
10-19 | Serial number (in ASCII) | G8067TME @
23-26 | Firmware revision (in ASCII) | GAK&1B0 @
27-46 | Model name (in ASCII) | Maxtor 4G160J8 @@
49 | bit field: bit 9: LBA supported | 0x2f00 @@
53 | bit field: bit 0: words 54-58 are valid | 0x0007 @
54 | Current number of cylinders | 16383 @
55 | Current number of heads | 16 @
56 | Current number of sectors per track | 63 @
57-58 | Current LBA capacity | 16514064 @@
60-61 | Default LBA capacity | 268435455 @@
82-83 | Command sets supported | 7c69 4f09 @@
85-86 | Command sets enabled | 7c68 0e01 @@
100-103 | Maximum user LBA for 48-bit addressing | 320173056 @@
255 | Checksum and signature (0xa5) | 0x44a5 @@
</tabular></table>
|
|
|
|
In the ASCII strings each word contains two characters,
|
|
the high order byte the first, the low order byte the second.
|
|
The 32-bit values are given with low order word first.
|
|
Words 54-58 are set by the command INITIALIZE DRIVE PARAMETERS (0x91).
|
|
They are significant only when CHS addressing is used, but may
|
|
help to find the actual disk size in case the disk sets
|
|
DefaultCHS to 4092/16/63 in order to avoid BIOS problems.
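<p>
To make the word layout concrete, here is a sketch that picks the
geometry-related fields out of the 256 words, using the byte and word
order just described. The function names and the dictionary keys are
chosen for this illustration only; <tt>words</tt> is assumed to be a list
of 256 integers obtained in some other way.

```python
def ata_string(words, first, last):
    """Decode an ASCII field: high order byte first within each word."""
    raw = bytearray()
    for w in words[first:last + 1]:
        raw += bytes((w >> 8, w & 0xFF))
    return raw.decode("ascii", "replace").strip()


def u32(words, i):
    """32-bit value stored with the low order word first."""
    return words[i] | (words[i + 1] << 16)


def geometry_fields(words):
    info = {
        "default_chs": (words[1], words[3], words[6]),
        "model": ata_string(words, 27, 46),
        "lba_supported": bool(words[49] & (1 << 9)),
        "lba_capacity": u32(words, 60),
    }
    if words[53] & 1:               # words 54-58 are valid
        info["current_chs"] = (words[54], words[55], words[56])
        info["current_capacity"] = u32(words, 57)
    return info
```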
|
|
|
|
Sometimes, when a jumper causes a big drive to misreport LBAcapacity
|
|
(often to 66055248 sectors, in order to stay below the 33.8 GB limit),
|
|
one needs a further piece of information to find the actual disk size,
|
|
namely the result of the READ NATIVE MAX ADDRESS (0xf8) command.
|
|
|
|
<sect1>SCSI details<p>
|
|
<nidx>disk!SCSI geometry setting</nidx>
|
|
The situation for SCSI is slightly different, as the SCSI commands
|
|
already use logical block numbers, so a `geometry' is entirely
|
|
irrelevant for actual I/O.
|
|
However, the format of the partition table is still the same,
|
|
so <tt/fdisk/ has to invent some geometry, and also uses <tt/HDIO_GETGEO/ here -
|
|
indeed, <tt/fdisk/ does not distinguish between IDE and SCSI disks.
|
|
As one can see from the detailed description below, the various
|
|
drivers each invent a somewhat different geometry. Indeed, one big mess.
|
|
<p>
|
|
If you are not using DOS or so, then avoid all extended translation
|
|
settings, and just use 64 heads, 32 sectors per track (for a nice,
|
|
convenient 1 MiB per cylinder), if possible, so that no problems
|
|
arise when you move the disk from one controller to another.
|
|
Some SCSI disk drivers (aha152x, pas16, ppa, qlogicfas, qlogicisp)
|
|
are so nervous about DOS compatibility that they will not allow
|
|
a Linux-only system to use more than about 8 GiB. This is a bug.
|
|
<p>
|
|
What is the real geometry?
|
|
The easiest answer is that there is no such thing.
|
|
And if there were, you wouldn't want to know, and certainly
|
|
NEVER, EVER tell <tt/fdisk/ or LILO or the kernel about it.
|
|
It is strictly a business between the SCSI controller and the disk.
|
|
Let me repeat that: only silly people tell <tt/fdisk//LILO/kernel about
|
|
the true SCSI disk geometry.
|
|
<p>
|
|
But if you are curious and insist, you might ask the disk itself.
|
|
There is the important command READ CAPACITY that will give the total
|
|
size of the disk, and there is the MODE SENSE command, that in the
|
|
Rigid Disk Drive Geometry Page (page 04) gives the number of cylinders
|
|
and heads (this is information that cannot be changed), and in the
|
|
Format Page (page 03) gives the number of bytes per sector,
|
|
and sectors per track. This latter number is typically dependent upon
|
|
the notch, and the number of sectors per track varies - the outer
|
|
tracks have more sectors than the inner tracks.
|
|
The Linux program <tt/scsiinfo/ will give this information.
|
|
There are many details and complications, and it is clear that nobody
|
|
(probably not even the operating system) wants to use this information.
|
|
Moreover, as long as we are only concerned about <tt/fdisk/ and LILO,
|
|
one typically gets answers like C/H/S=4476/27/171 - values that
|
|
cannot be used by <tt/fdisk/ because the partition table reserves only
|
|
10 resp. 8 resp. 6 bits for C/H/S.
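<p>
The 10/8/6 bit split is easy to see in the usual three-byte encoding of a
partition table address: one byte for the head, six bits for the sector,
and the two high cylinder bits stored above the sector. A sketch (the
function names are invented for this example):

```python
def pack_chs(c, h, s):
    """Encode (c,h,s) in the three bytes of a partition table entry."""
    if not (0 <= c < 1024 and 0 <= h < 256 and 1 <= s < 64):
        raise ValueError("does not fit in 10/8/6 bits")
    return bytes((h, s | ((c >> 8) << 6), c & 0xFF))


def unpack_chs(b):
    """Decode the three bytes back into (c,h,s)."""
    h, sc, cl = b
    return ((sc >> 6) << 8) | cl, h, sc & 0x3F
```

A sector number like 171 does not fit in six bits, so
<tt>pack_chs(4476, 27, 171)</tt> is rejected.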
|
|
<p>
|
|
Then where does the kernel <tt/HDIO_GETGEO/ get its information from?
|
|
Well, either from the SCSI controller, or by making an educated guess.
|
|
Some drivers seem to think that we want to know `reality', but
|
|
of course we only want to know what the DOS or OS/2 FDISK
|
|
(or Adaptec AFDISK, etc) will use.
|
|
<p>
|
|
Note that Linux <tt/fdisk/ needs the numbers H and S of heads and sectors
|
|
per track to convert LBA sector numbers into c/h/s addresses, but the
|
|
number C of cylinders does not play a role in this conversion.
|
|
Some drivers use (C,H,S) = (1023,255,63) to signal that the drive
|
|
capacity is at least 1023<tt/*/255<tt/*/63 sectors. This is unfortunate,
|
|
since it does not reveal the actual size, and will limit the
|
|
users of most <tt/fdisk/ versions to about 8 GiB of their disks -
|
|
a real limitation these days.
|
|
<p>
|
|
In the description below, M denotes the total disk capacity,
|
|
and C, H, S the number of cylinders, heads and sectors per track.
|
|
It suffices to give H, S if we regard C as defined by M / (H<tt/*/S).
|
|
<p>
|
|
By default, H=64, S=32.
|
|
<p>
|
|
<descrip>
|
|
<tag/aha1740, dtc, g_NCR5380, t128, wd7000:/ <p>
|
|
H=64, S=32.
|
|
<p>
|
|
<tag/aha152x, pas16, ppa, qlogicfas, qlogicisp:/ <p>
|
|
H=64, S=32 unless C > 1024, in which case
|
|
H=255, S=63, C = min(1023, M/(H<tt/*/S)).
|
|
(Thus C is truncated, and H<tt/*/S<tt/*/C is not an approximation to
|
|
the disk capacity M. This will confuse most versions of <tt/fdisk/.)
|
|
The <tt/ppa.c/ code uses M+1 instead of M and says that due to a
|
|
bug in <tt/sd.c/ M is off by 1.
|
|
<p>
|
|
<tag/advansys:/ <p>
|
|
H=64, S=32 unless C > 1024 and moreover the `> 1 GB' option
|
|
in the BIOS is enabled, in which case H=255, S=63.
|
|
<p>
|
|
<tag/aha1542:/ <p>
|
|
Ask the controller which of two possible translation schemes
|
|
is in use, and use either H=255, S=63 or H=64, S=32. In the former
|
|
case there is a boot message "aha1542.c: Using extended bios translation".
|
|
<p>
|
|
<tag/aic7xxx:/ <p>
|
|
H=64, S=32 unless C > 1024, and moreover
|
|
either the "extended" boot parameter was given,
|
|
or the `extended' bit was set in the SEEPROM or BIOS,
|
|
in which case H=255, S=63.
|
|
In Linux 2.0.36 this extended translation would always be set
|
|
in case no SEEPROM was found, but in Linux 2.2.6 if no SEEPROM
|
|
is found extended translation is set only when the user asked
|
|
for it using this boot parameter (while when a SEEPROM is found,
|
|
the boot parameter is ignored).
|
|
This means that a setup that works under 2.0.36 may fail to boot
|
|
with 2.2.6 (and require the <tt>linear</tt> keyword for LILO, or
|
|
the <tt>aic7xxx=extended</tt> kernel boot parameter).
|
|
<p>
|
|
<tag/buslogic:/ <p>
|
|
H=64, S=32 unless C >= 1024, and moreover extended translation
|
|
was enabled on the controller, in which case if M < 2^22 then
|
|
H=128, S=32; otherwise H=255, S=63. However, after making this choice
|
|
for (C,H,S), the partition table is read, and if for one of the
|
|
three possibilities (H,S) = (64,32), (128,32), (255,63) the value
|
|
endH=H-1 is seen somewhere then that pair (H,S) is used, and a boot message
|
|
is printed "Adopting Geometry from Partition Table".
|
|
<p>
|
|
<tag/fdomain:/ <p>
|
|
Find the geometry information in the BIOS Drive Parameter Table,
|
|
or read the partition table and use H=endH+1, S=endS for the first
|
|
partition, provided it is nonempty, or use H=64, S=32 for M < 2^21 (1 GiB),
|
|
H=128, S=63 for M < 63<tt/*/2^17 (3.9 GiB) and H=255, S=63 otherwise.
|
|
<p>
|
|
<tag/in2000:/ <p>
|
|
Use the first of (H,S) = (64,32), (64,63), (128,63), (255,63)
|
|
that will make C <= 1024. In the last case, truncate C at 1023.
|
|
<p>
|
|
<tag/seagate:/ <p>
|
|
Read C,H,S from the disk. (Horrors!) If C or S is too large, then
|
|
put S=17, H=2 and double H until C <= 1024. This means that H will
|
|
be set to 0 if M > 128<tt/*/1024<tt/*/17 (1.1 GiB). This is a bug.
|
|
<p>
|
|
<tag/ultrastor and u14_34f:/ <p>
|
|
One of three mappings
|
|
((H,S) = (16,63), (64,32), (64,63))
|
|
is used depending on the controller mapping mode.
|
|
<p>
|
|
</descrip>
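<p>
As an illustration of the two simplest schemes in the list above (plain
H=64, S=32, and the variant that switches to H=255, S=63 and truncates C),
here is a sketch with M given in sectors. It paraphrases the descriptions
above and is not taken from any driver; the function names are invented.

```python
def geometry_64_32(m):
    """The default: H=64, S=32, C = M/(H*S)."""
    return m // (64 * 32), 64, 32


def geometry_with_fallback(m):
    """H=64, S=32 unless C > 1024; then H=255, S=63 and C is truncated."""
    c, h, s = geometry_64_32(m)
    if c > 1024:
        h, s = 255, 63
        c = min(1023, m // (h * s))
    return c, h, s


print(geometry_with_fallback(17755792))
```

For the 9091 MB disk shown earlier the second scheme gives 1023/255/63,
which describes only part of the disk.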
|
|
If the driver does not specify the geometry, we fall back
|
|
on an educated guess using the partition table, or using the
|
|
total disk capacity.
|
|
<p>
|
|
Look at the partition table. Since by convention partitions end
|
|
on a cylinder boundary, we can, given <tt/end = (endC,endH,endS)/
|
|
for any partition, just put H = <tt/endH+1/ and S = <tt/endS/. (Recall
|
|
that sectors are counted from 1.)
|
|
More precisely, the following is done.
|
|
If there is a nonempty partition, pick the partition with the largest <tt/beginC/.
|
|
For that partition, look at <tt/end+1/, computed
|
|
both by adding <tt/start/ and <tt/length/ and by assuming that this
|
|
partition ends on a cylinder boundary. If both values agree, or
|
|
if <tt/endC/ = 1023 and <tt/start+length/ is an integral multiple of
|
|
<tt/(endH+1)<tt/*/endS/,
|
|
then assume that this partition really was aligned on a cylinder
|
|
boundary, and put H = <tt/endH+1/ and S = <tt/endS/.
|
|
If this fails, either because there are no partitions, or because
|
|
they have strange sizes, then look only at the disk capacity M.
|
|
Algorithm: put H = M/(62<tt/*/1024) (rounded up), S = M/(1024<tt/*/H)
|
|
(rounded up), C = M/(H<tt/*/S) (rounded down).
|
|
This has the effect of producing a (C,H,S) with C at most 1024
|
|
and S at most 62.
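<p>
The capacity-only algorithm, written out as a sketch with M in sectors
(the function names are made up). Note that nothing in it keeps H below
256, so for large M the result does not fit a partition table.

```python
def ceil_div(a, b):
    """Integer division, rounded up."""
    return -(-a // b)


def guess_from_capacity(m):
    """Guess (C,H,S) from the total number of sectors M alone."""
    h = ceil_div(m, 62 * 1024)
    s = ceil_div(m, 1024 * h)
    c = m // (h * s)                # rounded down
    return c, h, s


print(guess_from_capacity(4124736))
```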
|
|
|
|
<sect>
|
|
Clipped disks
|
|
<p>
|
|
<sect1>
|
|
The Linux IDE 8 GiB limit
|
|
<p>
|
|
The Linux IDE driver gets the geometry and capacity of a disk
|
|
(and lots of other stuff) by using an
|
|
<ref id="identify" name="ATA IDENTIFY"> request.
|
|
Linux kernels older than 2.0.34/2.1.90 would not believe the returned value
|
|
of lba_capacity if it was more than 10% larger than the capacity
|
|
computed by C<tt/*/H<tt/*/S. However, by industry agreement
|
|
large IDE disks (with more than 16514064 sectors)
|
|
return C=16383, H=16, S=63, for a total of 16514064 sectors (7.8 GiB)
|
|
independent of their actual size, but give their actual size in
|
|
lba_capacity.
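<p>
The effect of the old check on such a disk can be shown with a little
arithmetic. The sketch below only models the 10% rule as described here;
the real kernel routine is more involved, and the function names are
invented.

```python
def old_kernel_trusts_lba(cyls, heads, sectors, lba_capacity):
    """Model of the old rule: reject lba_capacity when it is more than
    10% larger than C*H*S."""
    chs_sectors = cyls * heads * sectors
    return lba_capacity * 10 <= chs_sectors * 11


def cylinders_from_lba(heads, sectors, lba_capacity):
    """What a newer kernel computes instead: C = capacity/(H*S)."""
    return lba_capacity // (heads * sectors)


# A 36.5 GB disk reporting the agreed geometry 16383/16/63.
print(old_kernel_trusts_lba(16383, 16, 63, 71346240))
print(cylinders_from_lba(16, 63, 71346240))
```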
|
|
<p>
|
|
Since versions 2.0.34/2.1.90, Linux kernels know about this
|
|
and do the right thing. If you have an older Linux kernel and do
|
|
not want to upgrade, and this kernel only sees 8 GiB of a much larger disk,
|
|
then try changing the routine <tt/lba_capacity_is_ok/ in
|
|
<tt>/usr/src/linux/drivers/block/ide.c</tt> into something like
|
|
<tscreen><verb>
static int lba_capacity_is_ok (struct hd_driveid *id) {
        id->cyls = id->lba_capacity / (id->heads * id->sectors);
        return 1;
}
</verb></tscreen>
|
|
For a more cautious patch, see 2.1.90.
|
|
|
|
<sect1>
|
|
BIOS complications
|
|
<p>
|
|
As just mentioned, large disks return the geometry
|
|
C=16383, H=16, S=63 independent of the actual size,
|
|
while the actual size is returned in the value of LBAcapacity.
|
|
Some BIOSes do not recognize this, and translate this
|
|
16383/16/63 into something with fewer cylinders and more heads,
|
|
for example 1024/255/63 or 1027/255/63. So, the kernel must not
|
|
only recognize the single geometry 16383/16/63, but also all
|
|
BIOS-mangled versions of it.
|
|
Since 2.2.2 this is done correctly (by taking the BIOS idea
|
|
of H and S, and computing C = capacity/(H*S)).
|
|
Usually this problem is solved by setting the disk to Normal
|
|
in the BIOS setup (or, even better, to None, not mentioning
|
|
it at all to the BIOS). If that is impossible because you have
|
|
to boot from it or use it also with DOS/Windows, and upgrading
|
|
to 2.2.2 or later is not an option, use kernel boot parameters.
|
|
<p>
|
|
If a BIOS reports 16320/16/63, then this is usually done
|
|
in order to get 1024/255/63 after translation.
|
|
<p>
|
|
There is an additional problem here. If the disk was partitioned
|
|
using a geometry translation, then the kernel may at boot time
|
|
see this geometry used in the partition table, and report
|
|
<tt>hda: [PTBL] [1027/255/63]</tt>. This is bad, because now the
|
|
disk is only 8.4 GB. This was fixed in 2.3.21. Again, kernel
|
|
boot parameters will help.
|
|
|
|
<sect1>
|
|
Jumpers that select the number of heads
|
|
<label id="jumpers">
|
|
<p>
|
|
Many disks have jumpers that allow you to choose between
|
|
a 15-head and a 16-head geometry. The default settings will give
|
|
you a 16-head disk. Sometimes both geometries address the same
|
|
number of sectors, sometimes the 15-head version is smaller.
|
|
There may be a good reason for this setup: Petri Kaukasoina
|
|
writes: `A 10.1 Gig IBM Deskstar 16 GP (model IBM-DTTA-351010) was
|
|
jumpered for 16 heads as default but this old PC (with AMI BIOS)
|
|
didn't boot and I had to jumper it for 15 heads. hdparm -i tells
|
|
RawCHS=16383/15/63 and LBAsects=19807200. I use 20960/15/63 to
|
|
get the full capacity.'
|
|
For the jumper settings, see
|
|
<htmlurl
|
|
name="http://www.hitachigst.com/hdd/support/jumpers.htm"
|
|
url="http://www.hitachigst.com/hdd/support/jumpers.htm">.
|
|
|
|
<sect1>
|
|
Jumpers that clip total capacity
|
|
<p>
|
|
Many disks have jumpers that allow you to make the disk
|
|
appear smaller than it is. A silly thing to do, and probably
|
|
no Linux user ever wants to use this, but some BIOSes crash
|
|
on big disks. The usual solution is to keep the disk entirely
|
|
out of the BIOS setup. But this may be feasible only if the
|
|
disk is not your boot disk.
|
|
<p>
|
|
<sect2>Clip to 2.1 GB<p>
|
|
The first serious limit was the 4096 cylinder limit (that is,
|
|
with 16 heads and 63 sectors/track, 2.11 GB).
|
|
For example, a Fujitsu MPB3032ATU 3.24 GB disk has default geometry
|
|
6704/15/63, but can be jumpered to appear as 4092/16/63,
|
|
and then reports LBAcapacity 4124736 sectors, so that the operating
|
|
system cannot guess that it is larger in reality.
|
|
In such a case (with a BIOS that crashes if it hears how big the disk is
|
|
in reality, so that the jumper is required) one needs boot parameters
|
|
to tell Linux about the size of the disk.
|
|
<p>
|
|
That is unfortunate. Most disks can be jumpered so as to appear as a 2 GB disk
|
|
and then report a clipped geometry like 4092/16/63 or 4096/16/63, but still
|
|
report full LBAcapacity. Such disks will work well, and use full capacity
|
|
under Linux, regardless of jumper settings.
|
|
<p>
|
|
<sect2>Clip to 33 GB
|
|
<label id="jumperbig">
|
|
<p>
|
|
A more recent limit is <ref id="verylarge" name="the 33.8 GB limit">.
|
|
Linux kernels older than 2.2.14 / 2.3.21 need a patch to be able to cope with
|
|
IDE disks larger than this.
|
|
<p>
|
|
With an old BIOS and a disk larger than 33.8 GB, the BIOS may hang,
|
|
and in such cases booting may be impossible, even when the disk
|
|
is removed from the CMOS settings.
|
|
<!-- doesnt exist anymore
|
|
See also <htmlurl name="the BIOS 33.8 GB limit"
|
|
url="http://www.storage.ibm.com/techsup/hddtech/bios338gb.htm">.
|
|
-->
|
|
<p>
|
|
Therefore, large IBM and Maxtor and Seagate disks come with a jumper
|
|
that makes the disk appear as a 33.8 GB disk.
|
|
For example, the IBM Deskstar 37.5 GB (DPTA-353750) with 73261440 sectors
|
|
(corresponding to 72680/16/63, or 4560/255/63) can be jumpered to appear
|
|
as a 33.8 GB disk, and then reports geometry 16383/16/63 like any big disk,
|
|
but LBAcapacity 66055248 (corresponding to 65531/16/63, or 4111/255/63).
|
|
Similar things hold for recent large Maxtor disks.
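<p>
The sector counts and geometries quoted for the DPTA-353750 can be checked
with a little arithmetic. What follows is only a sketch in Python; the helper
name <tt>cylinders</tt> is made up for this illustration, and it simply
divides the sector count by H*S, rounding down.
<tscreen><verb>
```python
SECTOR_BYTES = 512

def cylinders(total_sectors, heads, sectors_per_track):
    """Whole cylinders that fit in total_sectors for a given H and S."""
    return total_sectors // (heads * sectors_per_track)

full = 73261440      # DPTA-353750 without the jumper
clipped = 66055248   # the same drive with the clip jumper set

for name, n in (("full", full), ("clipped", clipped)):
    print(name, n, "sectors,",
          "%.1f GB," % (n * SECTOR_BYTES / 1e9),
          "%d/16/63 or" % cylinders(n, 16, 63),
          "%d/255/63" % cylinders(n, 255, 63))
```
</verb></tscreen>
With 16 heads both sector counts divide evenly; with 255 heads a partial
cylinder is left over, so the 255-head geometries describe slightly less
than the LBA capacity.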
|
|
<p>
|
|
Below are some more details that used
to be relevant but probably can be ignored now.
|
|
<p>
|
|
<sect3>Maxtor<p>
|
|
With the jumper present, both the geometry (16383/16/63) and the size
|
|
(66055248) are conventional and give no information about the actual size.
|
|
Moreover, attempts to access sector 66055248 and above yield I/O errors.
|
|
However, on Maxtor drives the actual size can be found and made accessible
|
|
using the READ NATIVE MAX ADDRESS and SET MAX ADDRESS commands.
|
|
Presumably this is what MaxBlast/EZ-Drive does.
|
|
There is a small Linux utility
|
|
<htmlurl url="http://www.win.tue.nl/~aeb/linux/setmax.c" name="setmax.c">
|
|
that does the same. Only very few disks need it - almost always
|
|
CONFIG_IDEDISK_STROKE does the trick.
|
|
<p>
|
|
For drives larger than 137 GB, READ NATIVE MAX ADDRESS also returns
|
|
a conventional value, namely 0xfffffff, corresponding to 137 GB.
|
|
Here READ NATIVE MAX ADDRESS EXT and SET MAX ADDRESS EXT (using
|
|
48-bit addressing) are required. The <tt>setmax</tt> utility does not yet
|
|
know about this. A very small patch makes 2.5.3 handle this situation.
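<p>
To see where the 137 GB figure comes from, one can convert the conventional
value 0xfffffff to bytes. The sketch below is in Python; the function
<tt>needs_ext_commands</tt> is a hypothetical helper invented here (it is not
part of <tt>setmax</tt> or the kernel), and treating a saturated 28-bit value
as "ask again with the EXT command" is an assumption based on the behaviour
described above.
<tscreen><verb>
```python
SECTOR_BYTES = 512
LBA28_MAX = 0xfffffff      # conventional value returned on drives over 137 GB
LBA48_MAX = 2**48 - 1      # largest address expressible with 48-bit addressing

def gigabytes(sectors):
    """Size in GB (10^9 bytes) of a given number of 512-byte sectors."""
    return sectors * SECTOR_BYTES / 1e9

def needs_ext_commands(native_max):
    """Hypothetical check: a saturated 28-bit answer says nothing about the
    real size, so the 48-bit (EXT) command would have to be used instead."""
    return native_max >= LBA28_MAX

print("28-bit limit: %.1f GB" % gigabytes(LBA28_MAX))
print("48-bit limit: %.0f million GB" % (gigabytes(LBA48_MAX) / 1e6))
print(needs_ext_commands(66055248), needs_ext_commands(0xfffffff))
```
</verb></tscreen>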
|
|
<p>
|
|
Early large Maxtor disks
|
|
<!-- (early releases of the 36GB drive in the DM36 family) -->
|
|
<!-- (older models of the DiamondMax 36 and 40) -->
|
|
<!-- 36GB confirmed -->
|
|
have an additional detail: the J46 jumper for these 34-40 GB disks
|
|
changes the geometry from 16383/16/63 to 4092/16/63 and does not
|
|
change the reported LBAcapacity.
|
|
This means that even with the jumper present the BIOS (old Award 4.5*)
|
|
will hang at boot time. For this case Maxtor provides a utility
|
|
<htmlurl url="http://www.maxtor.com/technology/technotes/20012.html"
|
|
name="JUMPON.EXE"> that upgrades the firmware to make J46 behave as
|
|
described above.
|
|
<p>
|
|
On recent Maxtor drives the call <tt>setmax -d 0 /dev/hdX</tt> will
|
|
give you max capacity again. However, on slightly older drives a
|
|
firmware bug does not allow you to use <tt>-d 0</tt>, and
|
|
<tt>setmax -d 255 /dev/hdX</tt> returns you to almost full capacity.
|
|
For Maxtor D540X-4K, see below.
|
|
<p>
|
|
<sect3>IBM<p>
|
|
For IBM things are worse: the jumper really clips capacity
|
|
and there is no software way to get it back. The solution is
|
|
not to use the jumper but use <tt>setmax -m 66055248 /dev/hdX</tt>
|
|
to software-clip the disk. "How?" you say - "I cannot boot!".
|
|
IBM gives the tip: <it>If a system with Award BIOS hangs during drive
|
|
detection: Reboot the system and hold the F4 key to bypass autodetection
|
|
of the drive(s).</it> If this doesn't help, find a different computer,
|
|
connect the drive to it, and run <tt>setmax</tt> there. After doing this
|
|
you go back to the first machine and are in the same situation as
|
|
with jumpered Maxtor disks: booting works, and after getting past
|
|
the BIOS either a patched kernel or a <tt>setmax -d 0</tt>
|
|
gets you full capacity.
|
|
<p>
|
|
Thomas Charbonnel reports on a different approach:
|
|
"I had a 80 GB IBM IC35L080AVVA07-0 drive and installed IBM's
|
|
Disk Manager. Installed my boot loader on the drive's MBR.
|
|
Everything worked fine. Note that the IDE drive must become
|
|
the boot drive so that one can install only one 34+ GB drive
|
|
using this approach."
|
|
<p>
|
|
<sect3>Seagate<p>
|
|
Seagate disks have a jumper that will clip the reported number
|
|
of cylinders to 4092 on drives smaller than 33.8 GB, while it
|
|
will limit the reported LBA capacity (Identify words 60/61) to
|
|
33.8 GB on larger disks.
|
|
<p>
|
|
For models ST-340810A, ST-360020A, ST-380020A:
|
|
The ATA Read Native Max and Set Max commands may be used to reset
|
|
the true full capacity.
|
|
<p>
|
|
For models ST-340016A, ST-340823A, ST-340824A, ST-360021A, ST-380021A:
|
|
The ATA Set Features F1 sub-command will cause Identify Data words
|
|
60-61 to report the true full capacity.
|
|
<p>
|
|
<sect3>Maxtor D540X-4K<p>
|
|
The Maxtor Diamond Max drives 4K080H4, 4K060H3, 4K040H2 (aka D540X-4K)
|
|
are identical to the drives 4D080H4, 4D060H3, 4D040H2 (aka D540X-4D),
|
|
except that the jumper settings differ. A Maxtor FAQ specifies the
|
|
Master/Slave/CableSelect settings for them, but the capacity clip jumper
|
|
for the "4K" drives seems to be undocumented. Nils Ohlmeier reports that
|
|
he experimentally finds that it is the J42 jumper ("reserved for
|
|
factory use") closest to the power connector.
|
|
(The "4D" drives use the J46 jumper, like all other Maxtor drives.)
|
|
<p>
|
|
However, it may be that this undocumented jumper acts like the IBM jumper:
|
|
the machine boots correctly, but the disk has been clipped to 33 GB
|
|
and <tt>setmax -d 0</tt> does not help to get full capacity back.
|
|
And the IBM solution works: do not use any disk-clipping jumpers, but
|
|
first put the disk in a machine with non-broken BIOS, soft-clip it
|
|
with <tt>setmax -m 66055248 /dev/hdX</tt>, then put it back in the
|
|
first machine, and after booting run <tt>setmax -d 0 /dev/hdX</tt>
|
|
to get full capacity again.
|
|
<p>
|
|
In the meantime, some docs and pictures have appeared on the Maxtor site,
|
|
confirming part of the above. Compare
|
|
|
|
<figure><eps file="absent">
|
|
<img src="images/MaxtorStyle.gif">
|
|
</figure>
|
|
<figure><eps file="absent">
|
|
<img src="images/MaxtorStyleB.gif">
|
|
</figure>
|
|
<figure><eps file="absent">
|
|
<img src="images/MaxtorStyleC.gif">
|
|
</figure>
|
|
<p>
|
|
<sect3>Western Digital<p>
|
|
Some info, including the settings for capacity-clipping jumpers, is given on
|
|
<htmlurl url="http://support.wdc.com/techinfo/general/jumpers.asp"
|
|
name="the Western Digital site">. I do not know what precisely
|
|
these jumpers do.
|
|
<p>
|
|
<sect1>READ NATIVE MAX ADDRESS / SET MAX ADDRESS<p>
|
|
If an IDE/ATA disk has support for the Host Protected Area (HPA) feature set,
|
|
then it is possible to set the LBA capacity to any value below
|
|
the actual capacity. Access past the assigned point usually leads
|
|
to I/O errors. Since classical software finds out about the disk size
|
|
by looking at the LBA capacity field of the Identify information,
|
|
such software will not suspect that the disk actually is larger.
|
|
<p>
|
|
The actual total size of the disk is read using the
|
|
READ NATIVE MAX ADDRESS command.
|
|
The clipped "soft disk size" is set using the SET MAX ADDRESS command.
|
|
It comes in two flavours: if the "volatile" bit is set, the
|
|
command will have effect until the next reboot or hardware reset;
|
|
otherwise the effect is permanent.
|
|
It is possible to protect settings with a password.
|
|
(For details, see the ATA standard.)
|
|
<p>
|
|
This clipped size has (at least) two applications:
|
|
on the one hand it is possible to fake a smaller disk,
|
|
so that the BIOS will not have problems, and have Linux,
|
|
or (for DOS/Windows) a disk manager restore total size;
|
|
on the other hand one can have a vendor area at the end,
|
|
inaccessible to the ordinary user.
|
|
<p>
|
|
For many of the disks discussed above, setting a jumper has
|
|
precisely this effect: LBA capacity is diminished while
|
|
the native max capacity remains the same, and the SET MAX ADDRESS
command will restore full capacity.
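<p>
The interplay of native size, clipped size and the volatile bit is perhaps
easiest to see in a toy model. The Python class below is purely illustrative
and issues no ATA commands; the class and method names are invented here,
it counts sectors rather than addresses for simplicity, and passwords are
left out.
<tscreen><verb>
```python
class ToyDisk:
    """Toy model of a disk with a Host Protected Area; not real ATA code."""

    def __init__(self, native_sectors):
        self.native = native_sectors      # what READ NATIVE MAX ADDRESS reveals
        self.permanent = native_sectors   # clip that survives a reset
        self.current = native_sectors     # clip in effect now (Identify LBA capacity)

    def read_native_max(self):
        return self.native

    def set_max(self, sectors, volatile):
        if sectors > self.native:
            raise ValueError("cannot exceed the native size")
        self.current = sectors
        if not volatile:
            self.permanent = sectors      # non-volatile: remembered across resets

    def reset(self):
        self.current = self.permanent     # a volatile setting is forgotten here

    def read_sector(self, lba):
        if lba >= self.current:
            raise IOError("I/O error: sector %d is past the clipped size" % lba)
        return bytes(512)


disk = ToyDisk(73261440)
disk.set_max(66055248, volatile=False)                # permanent soft clip
disk.reset()
clipped_after_reset = disk.current

disk.set_max(disk.read_native_max(), volatile=True)   # full size, until reset
full_now = disk.current
disk.reset()
clipped_again = disk.current
print(clipped_after_reset, full_now, clipped_again)
```
</verb></tscreen>
In this model the BIOS would see the clipped size after every reset, while
the operating system can raise the limit again once it is running.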
|
|
|
|
<sect1>CONFIG_IDEDISK_STROKE<p>
|
|
The CONFIG_IDEDISK_STROKE option of Linux 2.4.19/2.5.3 and later
|
|
will tell Linux to read the native max capacity and do a
|
|
SET MAX ADDRESS to get access to full capacity.
|
|
This configuration option lives under the heading
|
|
"Auto-Geometry Resizing support" in the
|
|
"IDE, ATA and ATAPI block devices" kernel configuration section.
|
|
<p>
|
|
The configuration option went away in 2.6.7
|
|
and was replaced by a (per-disk) boot parameter,
|
|
so that one can say "hda=stroke".
|
|
<p>
|
|
With this "stroke" option jumpered disks will in many cases
|
|
be handled correctly, i.e., be seen with full capacity
|
|
(in spite of the jumper). And the same holds when the disk
|
|
got a Host Protected Area in some other (non-jumper) way.
|
|
<p>
|
|
This is the preferred way to handle disks that need a jumper
|
|
because of a broken BIOS.
|
|
|
|
<sect>
|
|
The Linux 65535 cylinder limit
|
|
<p>
|
|
The <tt/HDIO_GETGEO/ ioctl returns the number of cylinders in a short.
|
|
This means that if you have more than 65535 cylinders, the number is
|
|
truncated, and (for a typical SCSI setup with 1 MiB cylinders)
|
|
an 80 GiB disk may appear as a 16 GiB one.
|
|
Once one recognizes what the problem is, it is easily avoided.
|
|
Use fdisk 2.10i or newer.
|
|
<p>
|
|
(The programming convention is to use the <tt/BLKGETSIZE/ ioctl
|
|
to get total size, and <tt/HDIO_GETGEO/ to get number of heads and
|
|
sectors/track, and, if needed, get C by C = size/(H*S).)
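<p>
The truncation and the recommended convention can be replayed in a few lines
of Python. This is a sketch only: the short is modelled as a value taken
modulo 65536, the geometry 64 heads and 32 sectors/track is an assumed
example of a setup with 1 MiB cylinders, and the function names are made up
for the illustration.
<tscreen><verb>
```python
GIB = 2**30
MIB = 2**20
SECTOR_BYTES = 512

def truncated_cylinders(cyl):
    """What is left of a cylinder count stored in an unsigned 16-bit short."""
    return cyl % 65536

def cylinders_from_size(total_sectors, heads, sectors):
    """The convention: take H and S from the geometry, compute C from the size."""
    return total_sectors // (heads * sectors)

disk_bytes = 80 * GIB
true_cyl = disk_bytes // MIB               # 1 MiB per cylinder
seen = truncated_cylinders(true_cyl)       # what an old fdisk would work with
print(true_cyl, "cylinders in reality,", seen, "after truncation,",
      seen * MIB // GIB, "GiB apparent size")

# Assumed example geometry: 64 heads * 32 sectors * 512 bytes = 1 MiB
print(cylinders_from_size(disk_bytes // SECTOR_BYTES, 64, 32))
```
</verb></tscreen>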
|
|
<sect1>
|
|
IDE problems with 34+ GB disks
|
|
<label id="verylarge">
|
|
<p>
|
|
(Below is a discussion of Linux kernel problems. BIOS problems
|
|
and jumpers that clip capacity were discussed
|
|
<ref id="jumperbig" name="above">.)
|
|
<p>
|
|
Drives larger than 33.8 GB will not work with kernels older than
|
|
2.0.39 / 2.2.14 / 2.3.21.
|
|
The details are as follows.
|
|
Suppose you bought a new IBM-DPTA-373420 disk with a capacity
|
|
of 66835440 sectors (34.2 GB). Pre-2.3.21 kernels will tell you
|
|
that the size is 769*16*63 = 775152 sectors (0.4 GB), which
|
|
is a bit disappointing. And giving command line parameters
|
|
hdc=4160,255,63 doesn't help at all - these are just ignored.
|
|
What happens? The routine idedisk_setup()
|
|
retrieves the geometry reported by the disk (which is
|
|
16383/16/63) and overwrites what the user specified on
|
|
the command line, so that the user data is used only
|
|
for the BIOS geometry. The routine current_capacity()
|
|
or idedisk_capacity() recomputes the cylinder number as
|
|
66835440/(16*63)=66305, but since this is stored in a short,
|
|
it becomes 769. Since lba_capacity_is_ok() destroyed id->cyls,
|
|
every following call to it will return false, so that the
|
|
disk capacity becomes 769*16*63.
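<p>
The numbers in this failure are easy to reproduce. A small Python sketch,
assuming the cylinder field behaves as an unsigned 16-bit quantity (which is
what the figures above suggest):
<tscreen><verb>
```python
total_sectors = 66835440          # IBM-DPTA-373420
heads, sectors = 16, 63

cyl = total_sectors // (heads * sectors)     # the recomputed cylinder number
stored = cyl % 65536                         # what fits in the short
wrong_capacity = stored * heads * sectors    # size the old kernel ends up with

print(cyl, stored, wrong_capacity,
      "%.1f GB" % (wrong_capacity * 512 / 1e9))
```
</verb></tscreen>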
|
|
For several kernels a patch is available.
|
|
A patch for 2.0.38 can be found at
|
|
<htmlurl url="ftp://ftp.us.kernel.org/pub/linux/kernel/people/aeb/"
|
|
name="ftp.kernel.org">.
|
|
A patch for 2.2.12 can be found at
|
|
<htmlurl name="www.uwsg.indiana.edu"
|
|
url="http://www.uwsg.indiana.edu/hypermail/linux/kernel/9910.2/0636.html">
|
|
(some editing may be required to get rid of the html markup).
|
|
The 2.2.14 kernels do support these disks.
|
|
In the 2.3.* kernel series, there is support for these disks
|
|
since 2.3.21.
|
|
One can also `solve' the problem in hardware by
|
|
<ref id="jumperbig" name="using a jumper"> to clip the size to 33.8 GB.
|
|
In many cases a <ref id="biosupgrades" name="BIOS upgrade"> will be
|
|
required if one wants to boot from the disk.
|
|
|
|
<sect>
|
|
Extended and logical partitions
|
|
<p>
|
|
<ref id="partitiontable" name="Above,"> we saw the structure of
|
|
the MBR (sector 0): boot loader code followed by 4 partition
|
|
table entries of 16 bytes each, followed by an AA55 signature.
|
|
Partition table entries of type 5 or F or 85 (hex) have a special
|
|
significance: they describe <it>extended</it> partitions: blobs of
|
|
space that are further partitioned into <it>logical</it> partitions.
|
|
(So, an extended partition is only a box, it cannot be used itself,
|
|
one uses the logical partitions inside.)
|
|
Only the location of the first sector of an extended partition is
|
|
important. This first sector contains a partition table with four
|
|
entries: one a logical partition, one an extended partition, and
|
|
two unused. In this way one gets a chain of partition table sectors,
|
|
scattered over the disk, where the first one describes three primary
|
|
partitions and the extended partition, and each following partition
|
|
table sector describes one logical partition and the location of
|
|
the next partition table sector.
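<p>
The chain can be followed mechanically. The Python sketch below builds a
small in-memory "disk" and walks it; all function names and the example
layout are invented for the illustration. It relies on the usual on-disk
conventions: entries start at byte 446, the type is byte 4 of an entry,
start and size are little-endian 32-bit values at bytes 8 and 12, a logical
partition's start is relative to its own table sector, and the link to the
next table is relative to the start of the outermost extended partition.
Only the first extended entry in each table is followed.
<tscreen><verb>
```python
SECTOR = 512
EXTENDED_TYPES = {0x05, 0x0f, 0x85}

def make_table_sector(entries):
    """Build a table sector from up to four (type, start, size) entries."""
    sec = bytearray(SECTOR)
    for i, (ptype, start, size) in enumerate(entries):
        off = 446 + 16 * i
        sec[off + 4] = ptype
        sec[off + 8:off + 12] = start.to_bytes(4, "little")
        sec[off + 12:off + 16] = size.to_bytes(4, "little")
    sec[510:512] = b"\x55\xaa"             # the AA55 signature
    return bytes(sec)

def parse_table(sec):
    """Return the four (type, start, size) entries of a table sector."""
    if sec[510:512] != b"\x55\xaa":
        raise ValueError("missing AA55 signature")
    entries = []
    for i in range(4):
        off = 446 + 16 * i
        start = int.from_bytes(sec[off + 8:off + 12], "little")
        size = int.from_bytes(sec[off + 12:off + 16], "little")
        entries.append((sec[off + 4], start, size))
    return entries

def logical_partitions(read_sector):
    """Walk the chain; return (absolute start, size) of each logical partition."""
    found = []
    for ptype, start, size in parse_table(read_sector(0)):
        if ptype in EXTENDED_TYPES:
            base = start                   # start of the outermost extended partition
            break
    else:
        return found                       # no extended partition at all
    table_at = base
    visited = set()
    while table_at not in visited:         # give up instead of looping forever
        visited.add(table_at)
        next_table = None
        for ptype, start, size in parse_table(read_sector(table_at)):
            if ptype == 0:
                continue                   # unused entry
            if ptype in EXTENDED_TYPES:
                if next_table is None:
                    next_table = base + start
            else:
                found.append((table_at + start, size))
        if next_table is None:
            break
        table_at = next_table
    return found

# Example layout (made up): one primary, one extended holding three logicals.
disk = {
    0:    make_table_sector([(0x83, 63, 1000), (0x05, 2000, 6000)]),
    2000: make_table_sector([(0x83, 63, 1937), (0x05, 2000, 2000)]),
    4000: make_table_sector([(0x83, 63, 1937), (0x05, 4000, 2000)]),
    6000: make_table_sector([(0x83, 63, 1937)]),
}
print(logical_partitions(disk.__getitem__))
```
</verb></tscreen>
The <tt>visited</tt> set is there because a damaged chain may point back to
an earlier table sector; without such a guard the walk would never end.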
|
|
<p>
|
|
It is important to understand this: When people do something stupid
|
|
while partitioning a disk, they want to know: Is my data still there?
|
|
And the answer is usually: Yes. But if logical partitions were created
|
|
then the partition table sectors describing them are written at the
|
|
beginning of these logical partitions, and data that was there before is lost.
|
|
<p>
|
|
The program sfdisk will show the full chain. E.g.,
|
|
<tscreen><verb>
# sfdisk -l -x /dev/hda

Disk /dev/hda: 16 heads, 63 sectors, 33483 cylinders
Units = cylinders of 516096 bytes, blocks of 1024 bytes, counting from 0

Device Boot  Start      End   #cyls    #blocks   Id  System
/dev/hda1        0+     101     102-     51376+  83  Linux
/dev/hda2      102     2133    2032    1024128   83  Linux
/dev/hda3     2134    33482   31349   15799896    5  Extended
/dev/hda4        0        -       0          0    0  Empty

/dev/hda5     2134+    6197    4064-   2048224+  83  Linux
    -         6198    10261    4064    2048256    5  Extended
    -         2134     2133       0          0    0  Empty
    -         2134     2133       0          0    0  Empty

/dev/hda6     6198+   10261    4064-   2048224+  83  Linux
    -        10262    16357    6096    3072384    5  Extended
    -         6198     6197       0          0    0  Empty
    -         6198     6197       0          0    0  Empty
...
/dev/hda10   30581+   33482    2902-   1462576+  83  Linux
    -        30581    30580       0          0    0  Empty
    -        30581    30580       0          0    0  Empty
    -        30581    30580       0          0    0  Empty

#
</verb></tscreen>
|
|
<p>
|
|
It is possible to construct bad partition tables.
|
|
Many kernels get into a loop if some extended partition points back
|
|
to itself or to an earlier partition in the chain.
|
|
It is possible to have two extended partitions in one of these
|
|
partition table sectors so that the partition table chain forks.
|
|
(This can happen for example with an fdisk that does not recognize
|
|
each of 5, F, 85 as an extended partition, and creates a 5 next to an F.)
|
|
No standard fdisk type program can handle such situations, and some
|
|
handwork is required to repair them.
|
|
The Linux kernel will accept a fork at the outermost level.
|
|
That is, you can have two chains of logical partitions.
|
|
Sometimes this is useful - for example, one can use type 5 and be
|
|
seen by DOS, and the other type 85, invisible for DOS, so that
|
|
DOS FDISK will not crash because of logical partitions past cylinder 1024.
|
|
Usually one needs <tt>sfdisk</tt> to create such a setup.
|
|
<p>
|
|
|
|
<sect>
|
|
Problem solving
|
|
<p>
|
|
Many people think they have problems, while in fact nothing is wrong.
|
|
Or, they think that the problems they have are due to disk geometry,
|
|
while in fact disk geometry has nothing to do with the matter.
|
|
All of the above may have sounded complicated, but disk geometry
|
|
handling is extremely easy: do nothing at all, and all is fine;
|
|
or perhaps give LILO the keyword <tt>lba32</tt> if it doesn't get past
|
|
`LI' when booting. Watch the kernel boot messages, and
|
|
remember: the more you fiddle with geometries (specifying heads
|
|
and cylinders to LILO and fdisk and on the kernel command line)
|
|
the less likely it is that things will work.
|
|
Roughly speaking, all is fine by default.
|
|
<p>
|
|
And remember: nowhere in Linux is disk geometry used, so no problem
|
|
you have while running Linux can be caused by disk geometry.
|
|
Indeed, disk geometry is used only by LILO and by fdisk.
|
|
So, if LILO fails to boot the kernel, that may be a geometry problem.
|
|
If different operating systems do not understand the partition table,
|
|
that may be a geometry problem. Nothing else. In particular, if
|
|
mount doesn't seem to work, never worry about disk geometry -
|
|
the problem is elsewhere.
|
|
<p>
|
|
<sect1>
|
|
Problem: My IDE disk gets a bad geometry when I boot from SCSI.
|
|
<p>
|
|
It is quite possible that a disk gets the wrong geometry.
|
|
The Linux kernel asks the BIOS about hd0 and hd1 (the BIOS drives
|
|
numbered 80H and 81H) and assumes that this data is for hda and hdb.
|
|
But on a system that boots from SCSI, the first two disks may well
|
|
be SCSI disks, and thus it may happen that the fifth disk, which is
|
|
the first IDE disk hda, gets assigned a geometry belonging to sda.
|
|
Such things are easily solved by giving boot parameters
|
|
`hda=C,H,S' for the appropriate numbers C, H and S, either at boot time
|
|
or in /etc/lilo.conf.
|
|
<p>
|
|
Since Linux 2.5.51 this BIOS information is not used anymore,
|
|
and the same problem occurs for all disks. See below.
|
|
|
|
<sect1>
|
|
Nonproblem: Identical disks have different geometry?
|
|
<p>
|
|
`I have two identical 10 GB IBM disks. However, fdisk
|
|
gives different sizes for them. Look:
|
|
<tscreen><verb>
# fdisk -l /dev/hdb
Disk /dev/hdb: 255 heads, 63 sectors, 1232 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot   Start       End    Blocks   Id  System
/dev/hdb1            1      1232   9896008+  83  Linux native
# fdisk -l /dev/hdd
Disk /dev/hdd: 16 heads, 63 sectors, 19650 cylinders
Units = cylinders of 1008 * 512 bytes

   Device Boot   Start       End    Blocks   Id  System
/dev/hdd1            1     19650   9903568+  83  Linux native
</verb></tscreen>
|
|
How come?'
|
|
|
|
What is happening here? Well, first of all these drives
|
|
really are 10gig: hdb has size 255<tt/*/63<tt/*/1232<tt/*/512 = 10133544960,
|
|
and hdd has size 16<tt/*/63<tt/*/19650<tt/*/512 = 10141286400, so, nothing
|
|
is wrong and the kernel sees both as 10.1 GB.
|
|
Why the difference in size? That is because the kernel gets
|
|
data for the first two IDE disks from the BIOS, and the BIOS
|
|
has remapped hdb to have 255 heads (and 16<tt/*/19650/255=1232 cylinders).
|
|
The rounding down here costs almost 8 MB.
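<p>
The sizes and the rounding loss can be verified as follows. This is a Python
sketch; <tt>translate</tt> is a made-up helper that mimics the remapping
described above by keeping the sectors/track, changing the head count and
rounding the cylinder count down.
<tscreen><verb>
```python
SECTOR_BYTES = 512

def size_bytes(c, h, s):
    """Capacity in bytes described by a C/H/S geometry."""
    return c * h * s * SECTOR_BYTES

def translate(c, h, s, new_heads):
    """Re-express a geometry with a different head count, rounding C down."""
    return (c * h) // new_heads, new_heads, s

hdd = (19650, 16, 63)                    # untranslated geometry
hdb = translate(*hdd, new_heads=255)     # what the BIOS made of the other disk
lost = size_bytes(*hdd) - size_bytes(*hdb)

print(hdb, size_bytes(*hdb), size_bytes(*hdd))
print("lost to rounding: %d bytes (%.1f MB)" % (lost, lost / 1e6))
```
</verb></tscreen>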
|
|
<p>
|
|
If you would like to remap hdd in the same way, give the kernel
|
|
boot parameters `hdd=1232,255,63'.
|
|
<p>
|
|
On the other hand, if the disk is not shared with DOS or so,
|
|
it may be better to set hdb to Normal in the BIOS setup,
|
|
instead of asking for some translation like LBA.
|
|
<p>
|
|
Since Linux 2.5.51, the IDE driver no longer uses BIOS info on the first
|
|
two disks, and the different treatment of the first two disks has disappeared.
|
|
|
|
<sect1>
|
|
Problem: 2.4 and 2.6 report different geometries?
|
|
2.6 reports the wrong geometry? 2.6 reports no geometry at all?
|
|
<p>
|
|
Since geometry does not exist, it is not surprising that each of
|
|
2.0/2.2/2.4/2.6 reports a somewhat different disk geometry.
|
|
<p>
|
|
Some people will maintain that geometry <em/does/ exist, and in that
|
|
case do not mean a property of the disk, but mean the values
|
|
reported by the BIOS. That is what several other operating systems
|
|
will use. Since Linux 2.5.51, the kernel no longer uses the values
|
|
reported by the BIOS - it is difficult to match BIOS device numbers
|
|
with Linux disk names, maybe data is only available for two disks,
|
|
maybe some disks are not present in the BIOS setup, etc.
|
|
However, if one needs these values, since Linux 2.6.5 one can set
|
|
CONFIG_EDD and mount sysfs, and then find the BIOS data for the
|
|
various disks under <tt>/sys/firmware/edd/int13_dev*</tt>.
|
|
Now the matching of BIOS numbers, represented in directory names
|
|
like <tt>int13_dev82</tt>, with Linux names like <tt>sda</tt> can
|
|
be done by user space software, possibly with help from the user.
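<p>
A user space program can collect this BIOS data simply by reading the files
in those directories. The Python sketch below makes no assumption about
which attribute files exist: it reads whatever regular files it finds. The
function names are invented here, and the <tt>root</tt> parameter exists
only so that the code can be tried on a copy of the directory tree.
<tscreen><verb>
```python
import glob
import os

def read_edd(root="/sys/firmware/edd"):
    """Return {directory name: {file name: contents}} for each int13_dev*."""
    result = {}
    for devdir in sorted(glob.glob(os.path.join(root, "int13_dev*"))):
        attrs = {}
        for name in sorted(os.listdir(devdir)):
            path = os.path.join(devdir, name)
            if not os.path.isfile(path):
                continue                   # skip subdirectories and links to them
            try:
                with open(path, "rb") as f:
                    data = f.read(4096)
            except OSError:
                continue                   # some files may not be readable
            attrs[name] = data.decode("ascii", "replace").strip()
        result[os.path.basename(devdir)] = attrs
    return result

def bios_number(dirname):
    """Turn a name like int13_dev82 into the BIOS drive number 0x82."""
    return int(dirname[len("int13_dev"):], 16)

if __name__ == "__main__":
    for dev, attrs in read_edd().items():
        print("BIOS drive %#x" % bios_number(dev))
        for key, value in attrs.items():
            print("   ", key, "=", value)
```
</verb></tscreen>
Matching these BIOS drive numbers with names like <tt>sda</tt> is not
attempted here; as said, that may need help from the user.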
|
|
<p>
|
|
This 2.5.51 change caused problems when many people using both Linux
|
|
and Windows on the same disk upgraded from 2.4 to 2.6 and used as
|
|
partitioning tool the program <tt>parted</tt> that had not yet
|
|
been updated. I have not checked whether current parted is OK.
|
|
|
|
<sect1>
|
|
Nonproblem: fdisk sees much more room than df?
|
|
<p>
|
|
fdisk will tell you how many blocks there are on the disk.
|
|
If you make a filesystem on the disk, say with mke2fs, then
|
|
this filesystem needs some space for bookkeeping - typically
|
|
something like 4% of the filesystem size, more if you ask for
|
|
a lot of inodes during mke2fs. For example:
|
|
<tscreen><verb>
# sfdisk -s /dev/hda9
4095976
# mke2fs -i 1024 /dev/hda9
mke2fs 1.12, 9-Jul-98 for EXT2 FS 0.5b, 95/08/09
...
204798 blocks (5.00%) reserved for the super user
...
# mount /dev/hda9 /somewhere
# df /somewhere
Filesystem         1024-blocks  Used Available Capacity Mounted on
/dev/hda9              3574475    13   3369664       0% /mnt
# df -i /somewhere
Filesystem              Inodes   IUsed     IFree  %IUsed Mounted on
/dev/hda9              4096000      11   4095989      0% /mnt
#
</verb></tscreen>
|
|
We have a partition with 4095976 blocks, make an ext2 filesystem
on it, mount it somewhere and find that it only has 3574475 blocks -
521501 blocks (12%) were lost to inodes and other bookkeeping.
Note that the difference between the total 3574475 and the 3369664
available to the user is the 13 blocks in use plus the 204798
blocks reserved for root. This latter number can be changed by tune2fs.
This `-i 1024' is only reasonable for news spools and the like,
with lots and lots of small files. The default would be:
|
|
<tscreen><verb>
# mke2fs /dev/hda9
# mount /dev/hda9 /somewhere
# df /somewhere
Filesystem         1024-blocks  Used Available Capacity Mounted on
/dev/hda9              3958475    13   3753664       0% /mnt
# df -i /somewhere
Filesystem              Inodes   IUsed     IFree  %IUsed Mounted on
/dev/hda9              1024000      11   1023989      0% /mnt
#
</verb></tscreen>
|
|
Now only 137501 blocks (3.3%) are used for inodes, so that we have
|
|
384 MB more than before. (Apparently, each inode takes 128 bytes.)
|
|
On the other hand, this filesystem can have at most 1024000 files
|
|
(more than enough), against 4096000 (too much) earlier.
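<p>
The bookkeeping figures above follow directly from the numbers in the two
transcripts. A Python sketch of the arithmetic, with the values copied from
the <tt>sfdisk</tt> and <tt>df</tt> output (the variable names are of course
arbitrary):
<tscreen><verb>
```python
partition = 4095976        # 1 KiB blocks, from sfdisk -s
runs = {
    "-i 1024": {"total": 3574475, "inodes": 4096000},
    "default": {"total": 3958475, "inodes": 1024000},
}

overhead = {}
for name, r in runs.items():
    overhead[name] = partition - r["total"]
    print("%-8s %7d blocks of bookkeeping (%.1f%%)"
          % (name, overhead[name], 100.0 * overhead[name] / partition))

# Fewer inodes means more data blocks; the ratio gives the inode size.
extra_blocks = runs["default"]["total"] - runs["-i 1024"]["total"]
fewer_inodes = runs["-i 1024"]["inodes"] - runs["default"]["inodes"]
bytes_per_inode = extra_blocks * 1024 // fewer_inodes
print(extra_blocks, "blocks gained,", bytes_per_inode, "bytes per inode")
```
</verb></tscreen>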
|
|
|
|
</article>
|