<!doctype linuxdoc system>
<article>
<title>Large Disk HOWTO
<author>Andries Brouwer, <tt/aeb@cwi.nl/
<date>v2.5, 2004-11-01
<abstract>
All about disk geometry and the 1024 cylinder and other limits for disks.
<nidx>HOWTOs!large disk</nidx>
<nidx>HOWTOs!disk, large</nidx>
</abstract>
<p>
For the most recent version of this text, see
<htmlurl url="http://www.win.tue.nl/~aeb/linux/Large-Disk.html"
name="www.win.tue.nl">.
<sect>
Large disks
<p>
You got a new disk. What to do? Well, on the software side:
use <tt/fdisk/ or <tt/cfdisk/ to create partitions,
and then <tt/mke2fs/ or <tt/mkreiserfs/ or so to create a filesystem,
and then <tt/mount/ to attach the new filesystem to the big file hierarchy.
Make sure you have relatively recent versions of these utilities -
often old versions have problems handling large disks.
<p>
You need not read this HOWTO since there are <em/no/ problems
with large hard disks these days.
<p>
Long ago, disks were large when they had a capacity larger than
528 MB, or than 8.4 GB, or than 33.8 GB. These days the interesting
limit is 137 GB. In all cases, sufficiently recent Linux kernels
handle the disk fine.
<p>
Sometimes booting requires some care, since Linux cannot help you
when it isn't running yet. But again, with a sufficiently recent
BIOS and boot loader there are no problems.
Most of the text below will treat the cases of
(i) ancient hardware,
(ii) broken hardware or BIOS,
(iii) several operating systems on the same disk,
(iv) booting old systems.
<p>
<bf>Advice</bf>
<p>
For large SCSI disks: Linux has supported them from very early on.
No action required.
<p>
For large IDE disks (over 8.4 GB): make sure your kernel is 2.0.34 or later.
<p>
For large IDE disks (over 33.8 GB): make sure your kernel is
2.0.39/2.2.14/2.3.21 or later.
<p>
For large IDE disks (over 137 GB): make sure your kernel is
2.4.19/2.5.3 or later.
<p>
If the kernel boots fine, and the boot messages indicate that it
recognizes the disk correctly, but there are problems with utilities,
upgrade the utilities.
<p>
If <ref id="LILO" name="LILO"> hangs at boot time, make sure you have
version 21.4 or later, and specify the keyword <tt>lba32</tt>
in the configuration file <tt>/etc/lilo.conf</tt>. With an older version
of LILO, try both with and without the <tt>linear</tt> keyword.
<p>
There may be geometry problems that can be solved by giving
an explicit geometry to kernel/LILO/fdisk.
<p>
If you have an old <tt/fdisk/ and it warns about
<ref id="overlap" name="overlapping"> partitions:
ignore the warnings, or check using <tt/cfdisk/ that really all is well.
<p>
For HPT366, see the <htmlurl name="Linux HPT366 HOWTO"
url="http://www.csie.ntu.edu.tw/~b6506063/hpt366/">.
<p>
If at boot time the kernel cannot read the partition table,
consider the possibility that UDMA66 was selected while
the controller or the cable or the disk drive did not
support UDMA66. In such a case every attempt to read will
fail, and reading the partition table is the first thing
the kernel does. Make sure no UDMA66 is used.
<p>
If the BIOS hangs at boot time because of a large disk, and
flashing a newer version is not an option, take the disk out
of the BIOS setup. If you have to boot from the disk, look
whether a capacity clipping jumper helps.
<p>
If you think something is wrong with the size of your disk,
make sure that you are not confusing binary and decimal <ref id="units">,
and realize that the free space that <tt/df/ reports on an empty disk
is a few percent smaller than the partition size, because there
is administrative overhead. Software that does not understand
48-bit addressing will view a 137+ GB disk as having a capacity
of 137 GB. When a capacity clipping <ref id="jumpers" name="jumper">
is present, a larger disk may have been clipped to 33 GB or to 2 GB.
<p>
If for a removable drive the kernel reports two different sizes,
then one is found from the drive, and the other from the disk/floppy.
This second value will be zero when the drive has no media.
<p>
Now, if you still think there are problems, or just are curious,
read on.
<p>
What follows is a rather detailed description of everything relevant.
I used kernel version 2.0.8 source as a reference.
Other versions may differ a bit.
<p>
<sect>
Units
<label id="units">
<p>
<nidx>units!megabyte</nidx>
<nidx>units!gigabyte</nidx>
A kilobyte (kB) is 1000 bytes.
A megabyte (MB) is 1000 kB.
A gigabyte (GB) is 1000 MB.
A terabyte (TB) is 1000 GB.
This is the
<htmlurl url="http://physics.nist.gov/cuu/Units/prefixes.html"
name="SI norm">.
However, there are people that use 1 MB=1024000 bytes and talk
about 1.44 MB floppies, and people who think that 1 MB=1048576 bytes.
Here I follow the
<htmlurl url="http://physics.nist.gov/cuu/Units/binary.html"
name="recent standard">
and write Ki, Mi, Gi, Ti for the binary units, so that
these floppies are 1440 KiB (1.47 MB, 1.41 MiB),
1 MiB is 1048576 bytes (1.05 MB),
1 GiB is 1073741824 bytes (1.07 GB)
and 1 TiB is 1099511627776 bytes (1.1 TB).
<p>
Quite correctly, the disk drive manufacturers follow the SI norm
and use the decimal units. However, Linux kernel boot messages
(for not-so-recent kernels) and some old fdisk-type programs
use the symbols MB and GB for binary, or
mixed binary-decimal units. So, before you think your disk is
smaller than was promised when you bought it, compute first the
actual size in decimal units (or just in bytes).
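<p>
To make that computation concrete, here is a minimal sketch that prints
a byte count in both decimal and binary units. The helper names
(<tt>scale</tt>, <tt>si</tt>, <tt>iec</tt>) are made up for this
illustration and do not come from any existing utility.
<tscreen><verb>
```python
# Sketch: express a byte count in decimal (SI) and binary (IEC) units.
# The helper names are illustrative only.

SI_UNITS = ["B", "kB", "MB", "GB", "TB"]
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB"]

def scale(nbytes, base, units):
    """Divide by base until the value drops below it or the units run out."""
    value = float(nbytes)
    index = 0
    while value >= base and index + 1 < len(units):
        value /= base
        index += 1
    return "%.2f %s" % (value, units[index])

def si(nbytes):
    """Decimal units: 1 MB is 1000000 bytes."""
    return scale(nbytes, 1000, SI_UNITS)

def iec(nbytes):
    """Binary units: 1 MiB is 1048576 bytes."""
    return scale(nbytes, 1024, IEC_UNITS)

floppy = 1440 * 1024          # the so-called 1.44 MB floppy holds 1440 KiB
print(floppy, si(floppy), iec(floppy))
```
</verb></tscreen>
Feeding the same byte count to both functions shows how far apart the
two conventions are for a given disk.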
<p>
Concerning terminology and abbreviation for binary units,
<htmlurl name="Knuth" url="http://www-cs-staff.stanford.edu/~knuth/">
has an alternative <htmlurl name="proposal"
url="http://www-cs-staff.stanford.edu/~knuth/news99.html">, namely
to use KKB, MMB, GGB, TTB, PPB, EEB, ZZB, YYB and to call these
<it>large kilobyte</it>, <it>large megabyte</it>, ... <it>large yottabyte</it>.
He writes: `Notice that doubling the letter connotes both
binary-ness and large-ness.' This is a good proposal -
`large gigabyte' sounds better than `gibibyte'. For our purposes
however the only important thing is to stress that a megabyte
has precisely 1000000 bytes, and that some other term and abbreviation
is required if you mean something else.
<p>
<sect>
Disk Access
<p>
Disk access is done in units called <it>sectors</it>.
In order to read or write something from or to the disk, we have
to specify the position on the disk, for example by giving the
sector number.
If the disk is a SCSI disk, then this sector number goes directly
into the SCSI command and is understood by the disk.
If the disk is an IDE disk using LBA, then precisely the same holds.
But if the disk is old, RLL or MFM or IDE from before the LBA times,
then the disk hardware expects a triple (cylinder,head,sector) to
designate the desired spot on the disk.
<p>
<sect1>
Cylinders, heads and sectors
<p>
A disk has sectors numbered 0, 1, 2, ...
This is called <it>LBA addressing</it>.
<p>
In ancient times, before the advent of IDE disks,
disks had a <it>geometry</it> described by three constants
C, H, S: the number of cylinders, the number of heads,
the number of sectors per track.
The address of a sector was given by three numbers:
<it>c</it>, <it>h</it>, <it>s</it>: the cylinder number
(between 0 and C-1), the head number (between 0 and H-1),
and the sector number within the track (between 1 and S), where
for some mysterious reason <it>c</it> and <it>h</it> count from 0,
but <it>s</it> counts from 1. This is called <it>CHS addressing</it>.
<p>
No disk manufactured less than ten years ago has a geometry, but
this ancient 3D sector addressing is still used by the INT13
BIOS interface (with fantasy numbers C, H, S
unrelated to any physical reality).
<p>
The correspondence between the linear numbering and this 3D notation
is as follows: for a disk with C cylinders, H heads and S sectors/track
position (<it>c</it>,<it>h</it>,<it>s</it>) in 3D or CHS notation
is the same as position
<it>c</it><tt/*/H<tt/*/S + <it>h</it><tt/*/S + (<it>s</it>-1)
in linear or LBA notation.
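<p>
The correspondence can be written out as a pair of functions. This is
only a sketch of the formula above; the function names and the range
check are additions for the sake of illustration.
<tscreen><verb>
```python
# Sketch of the CHS / LBA correspondence for a disk with
# `heads` heads and `sectors` sectors per track.

def chs_to_lba(c, h, s, heads, sectors):
    """c*H*S + h*S + (s-1), with h counted from 0 and s from 1."""
    if not (0 <= h < heads and 1 <= s <= sectors):
        raise ValueError("head or sector outside the given geometry")
    return (c * heads + h) * sectors + (s - 1)

def lba_to_chs(lba, heads, sectors):
    """Inverse of chs_to_lba for the same geometry."""
    c, rest = divmod(lba, heads * sectors)
    h, s0 = divmod(rest, sectors)
    return c, h, s0 + 1

print(chs_to_lba(1, 0, 1, 16, 63))
print(lba_to_chs(1008, 16, 63))
```
</verb></tscreen>
Note that the result depends on H and S but not on C, which is why two
parties that disagree about the number of heads will disagree about
where a given (c,h,s) lives.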
<p>
Consequently, in order to access a very old non-SCSI disk, we need to know
its <em/geometry/, that is, the values of C, H and S.
(And if you don't know, there is a lot of good information on
<htmlurl url="http://www.thetechpage.com/cgi-bin/default.cgi"
name="www.thetechpage.com">.)
<p>
<sect1>
Sectorsize
<p>
<nidx>disk!sectorsize</nidx>
In the present text a sector has 512 bytes. This is almost always
true, but for example certain MO disks use a sectorsize of 2048 bytes,
and all capacities given below must be multiplied by four.
(When using <tt/fdisk/ on such disks, make sure you have version
2.9i or later, and give the `-b 2048' option.)
<sect1>
Disksize
<p>
<nidx>disk!disksize</nidx>
A disk with C cylinders, H heads and S sectors per track
has C<tt/*/H<tt/*/S sectors in all, and can store
C<tt/*/H<tt/*/S<tt/*/512 bytes.
For example, if the disk label says C/H/S=4092/16/63
then the disk has 4092<tt/*/16<tt/*/63=4124736 sectors, and can hold
4124736<tt/*/512=2111864832 bytes (2.11 GB).
There is an industry convention to give C/H/S=16383/16/63
for disks larger than 8.4 GB, and the disk size can no longer
be read off from the C/H/S values reported by the disk.
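<p>
A small sketch of this arithmetic follows; <tt>chs_capacity</tt> is a
name invented for the example. It also shows what the conventional
16383/16/63 would amount to if it were taken literally.
<tscreen><verb>
```python
# Sketch: capacity implied by a C/H/S triple, assuming 512-byte sectors.

SECTOR_SIZE = 512

def chs_capacity(cylinders, heads, sectors, sector_size=SECTOR_SIZE):
    """Return (total sectors, total bytes) for the given geometry."""
    total_sectors = cylinders * heads * sectors
    return total_sectors, total_sectors * sector_size

print(chs_capacity(4092, 16, 63))     # the disk label example above
print(chs_capacity(16383, 16, 63))    # the conventional values for big disks
```
</verb></tscreen>
Every disk that reports the conventional values yields the same number
here, whatever its real size, so the true size has to come from
elsewhere.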
<sect1>
The 1024 cylinder and 8.5 GB limits
<p>
The old INT13 BIOS interface to disk I/O uses 24 bits to address
a sector: 10 bits for the cylinder, 8 bits for the head, and 6 bits
for the sector number within the track (counting from 1).
This means that this interface cannot address more than
1024*256*63 sectors, which is 8.5 GB (with 512-byte sectors).
And if the (fantasy) geometry specified for the disk has fewer
than 1024 cylinders, or 256 heads, or 63 sectors per track,
then this limit will be less.
<p>
(More precisely: with INT 13, AH selects the function to perform,
CH is the low 8 bits of the cylinder number, CL
has in bits 7-6 the high two bits of the cylinder number
and in bits 5-0 the sector number, <tt/DH/ is the head number,
and DL is the drive number (80h or 81h).
This explains part of the layout of the partition table.)
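<p>
As an illustration of this register layout, the following sketch packs
a (c,h,s) triple into the values that would go into CH, CL and DH, and
unpacks them again. It models the bit layout only and does not call any
BIOS; the function names are made up.
<tscreen><verb>
```python
# Sketch of the INT13 register layout: 10 bits cylinder, 8 bits head,
# 6 bits sector. Division and remainder are used instead of bit masks.

def pack_int13_chs(c, h, s):
    """Return (CH, CL, DH) for cylinder c, head h, sector s."""
    if not (0 <= c < 1024 and 0 <= h < 256 and 1 <= s < 64):
        raise ValueError("does not fit in 10+8+6 bits")
    ch = c % 256                 # low 8 bits of the cylinder number
    cl = (c // 256) * 64 + s     # bits 7-6: high cylinder bits, bits 5-0: sector
    dh = h                       # head number
    return ch, cl, dh

def unpack_int13_chs(ch, cl, dh):
    """Recover (c, h, s) from the three register values."""
    c = (cl // 64) * 256 + ch
    s = cl % 64
    return c, dh, s

print(pack_int13_chs(1023, 255, 63))
```
</verb></tscreen>
The largest triple that can be packed is 1023/255/63, which is where the
limit of 1024*256*63 sectors comes from.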
<p>
This state of affairs was rectified when the so-called Extended INT13
functions were introduced. A modern BIOS has no problems accessing
large disks.
<p>
(More precisely: DS:SI points at a 16-byte Disk Address Packet
that contains an 8-byte starting absolute block number.)
<p>
Linux does not use the BIOS, so does (and did) not have this problem.
<p>
However, this geometry stuff plays a role in the interpretation
of partition tables, so if Linux shares a disk with for example DOS,
then it needs to know what geometry DOS will think the disk has.
It also plays a role at boot time, where the BIOS has to load
a boot loader, and the boot loader has to load the operating system.
<p>
<sect1>
The 137 GB limit
<p>
The old ATA standard describes how to address a sector on an IDE disk
using 28 bits (8 bits for the sector, 4 for the head, 16 for the cylinder).
This means that an IDE disk can have at most 2^28 addressable sectors.
With 512-byte sectors this is 2^37 bytes, that is, 137.4 GB.
<p>
The ATA-6 standard includes a specification how to address
past this 2^28 sector boundary. The new standard allows addressing
of 2^48 sectors. There is support in recent Linux kernels that
have incorporated Andre Hedrick's IDE patch, for example
2.4.18-pre7-ac3 and 2.5.3.
<p>
Maxtor has been selling 160 GB IDE disks since fall 2001.
An old kernel will treat such disks as 137.4 GB disks.
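<p>
The two address widths can be compared with a few lines of arithmetic.
This is a sketch assuming 512-byte sectors; <tt>max_bytes</tt> is a
name chosen for the example.
<tscreen><verb>
```python
# Sketch: capacity addressable with a given number of address bits.

SECTOR_SIZE = 512

def max_bytes(address_bits, sector_size=SECTOR_SIZE):
    """Bytes reachable when sector numbers have address_bits bits."""
    return (2 ** address_bits) * sector_size

for bits in (28, 48):
    print("%d bits: %d bytes" % (bits, max_bytes(bits)))
```
</verb></tscreen>
The ratio between the two results is 2^20, roughly a million.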
<p>
<sect>
History of BIOS and IDE limits
<p>
<descrip>
<tag/ATA Specification (for IDE disks) - the 137 GB limit/
At most 65536 cylinders (numbered 0-65535), 16 heads (numbered 0-15),
255 sectors/track (numbered 1-255), for a maximum total capacity of
267386880 sectors (of 512 bytes each), that is, 136902082560 bytes (137 GB).
In Sept 2001, the first drives larger than this (160 GB Maxtor Diamondmax)
appeared.
<p>
<tag/BIOS Int 13 - the 8.5 GB limit/
At most 1024 cylinders (numbered 0-1023), 256 heads (numbered 0-255),
63 sectors/track (numbered 1-63) for a maximum total capacity of
8455716864 bytes (8.5 GB). This is a serious limitation today.
It means that DOS cannot use present day large disks.
<p>
<tag/The 528 MB limit/
If the same values for c,h,s are used for the BIOS Int 13 call and
for the IDE disk I/O, then both limitations combine, and one can
use at most 1024 cylinders, 16 heads, 63 sectors/track, for a
maximum total capacity of 528482304 bytes (528MB), the infamous
504 MiB limit for DOS with an old BIOS.
This started being a problem around 1993, and people resorted to all kinds
of trickery, in hardware (LBA), in firmware (translating BIOS),
and in software (disk managers).
The concept of `translation' was invented (1994): a BIOS could use
one geometry while talking to the drive, and another, fake, geometry
while talking to DOS, and translate between the two.
<p>
<tag/The 2.1 GB limit (April 1996)/
Some older BIOSes only allocate 12 bits for the field in CMOS RAM that
gives the number of cylinders. Consequently, this number can be at most
4095, and only 4095<tt/*/16<tt/*/63<tt/*/512=2113413120 bytes are accessible.
The effect of having a larger disk would be a hang at boot time.
This made disks with geometry 4092/16/63 rather popular. And still today
many large disk drives come with a jumper to make them appear 4092/16/63.
See also <htmlurl url="http://www.firmware.com/support/bios/over2gb.htm"
name="over2gb.htm">. <htmlurl name="Other BIOSes"
url="http://www.asus.com/Products/Techref/Ide/Intel/intel-ide-001.html">
would not hang but just detect a much smaller disk, like 429 MB instead of 2.5 GB.
<p>
<tag/The 3.2 GB limit/
There was a bug in the Phoenix 4.03 and 4.04 BIOS firmware that would
cause the system to lock up in the CMOS setup for drives with a capacity
over 3277 MB. See <htmlurl url="http://www.firmware.com/support/bios/over3gb.htm"
name="over3gb.htm">.
<p>
<tag/The 4.2 GB limit (Feb 1997)/
Simple BIOS translation (ECHS=Extended CHS, sometimes called `Large
disk support' or just `Large')
works by repeatedly doubling the number of heads and halving the number
of cylinders shown to DOS, until the number of cylinders is at most 1024.
Now DOS and Windows 95 cannot handle 256 heads,
and in the common case that the disk reports 16 heads, this means that
this simple mechanism only works up to 8192<tt/*/16<tt/*/63<tt/*/512=4227858432
bytes (with a fake geometry with 1024 cylinders, 128 heads, 63 sectors/track).
Note that ECHS does not change the number of sectors per track, so if
that is not 63, the limit will be lower.
See <htmlurl url="http://www.firmware.com/support/bios/over4gb.htm"
name="over4gb.htm">.
<p>
<tag/The 7.9 GB limit/
Slightly smarter BIOSes avoid the previous problem by first adjusting the
number of heads to 15 (`revised ECHS'), so that a fake geometry with
240 heads can be obtained, good for
1024<tt/*/240<tt/*/63<tt/*/512=7927234560 bytes.
<p>
<tag/The 8.4 GB limit/
<label id="The 8.4 GB limit">
Finally, if the BIOS does all it can to make this translation a success,
and uses 255 heads and 63 sectors/track (`assisted LBA' or just `LBA')
it may reach 1024<tt/*/255<tt/*/63<tt/*/512=8422686720 bytes, slightly less
than the earlier 8.5 GB limit because the geometries with 256 heads must be
avoided.
(This translation will use for the number of heads the first value H
in the sequence 16, 32, 64, 128, 255 for which the total disk capacity
fits in 1024<tt/*/H<tt/*/63<tt/*/512, and then computes the number of
cylinders C as total capacity divided by (H<tt/*/63<tt/*/512).)
<p>
<tag/The 33.8 GB limit (August 1999)/
<label id="biosupgrades">
The next hurdle comes with a size over 33.8 GB.
The problem is that with the default 16 heads and 63 sectors/track
this corresponds to a number of cylinders of more than 65535, which
does not fit into a short. Many BIOSes couldn't handle such disks.
(See, e.g., <htmlurl name="Asus upgrades"
url="http://www.asus.com/Products/Motherboard/bios_slot1.html">
for new flash images that work.)
Linux kernels older than 2.2.14 / 2.3.21 need a patch.
See <ref id="verylarge" name="IDE problems with 34+ GB disks"> below.
<p>
<tag/The 137 GB limit (Sept 2001)/
As mentioned above, the old ATA protocol uses 16+4+8 = 28 bits
to specify the sector number, and hence cannot address more than
2^28 sectors. ATA-6 describes an extension that allows the addressing
of 2^48 sectors, a million times as much.
Kernels 2.4.19 / 2.5.3 and later support this.
<p>
<tag/The 2 TiB limit/
With 32-bit sector numbers, one can address 2 TiB.
A lot of software will have to be rewritten once disks get larger.
</descrip>
Hard drives over 8.4 GB are supposed to report their geometry as 16383/16/63.
This in effect means that the `geometry' is obsolete, and the total disk
size can no longer be computed from the geometry, but is found in the
LBA capacity field returned by the <ref id="identify" name="IDENTIFY command">.
Hard drives over 137.4 GB are supposed to report an LBA capacity of
0xfffffff = 268435455 sectors (137438952960 bytes). Now the actual
disk size is found in the new 48-bit capacity field.
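<p>
The CHS-based limits in the list above can be recomputed from their
geometries, and the head selection described under the 8.4 GB limit can
be sketched in a few lines. All names here are invented for the example,
and capping the cylinder count at 1024 for disks that do not fit is an
assumption of this sketch, not something the list spells out.
<tscreen><verb>
```python
# Sketch: recompute the CHS-based limits and the assisted LBA geometry.

SECTOR_SIZE = 512

# (label, cylinders, heads, sectors per track), taken from the list above
CHS_LIMITS = [
    ("528 MB (INT13 and IDE combined)", 1024, 16, 63),
    ("2.1 GB (12-bit cylinder field)", 4095, 16, 63),
    ("4.2 GB (ECHS, at most 128 heads)", 1024, 128, 63),
    ("7.9 GB (revised ECHS)", 1024, 240, 63),
    ("8.4 GB (assisted LBA)", 1024, 255, 63),
    ("8.5 GB (INT13)", 1024, 256, 63),
    ("33.8 GB (16-bit cylinder count)", 65535, 16, 63),
    ("137 GB (ATA CHS)", 65536, 16, 255),
]

def limit_bytes(cylinders, heads, sectors):
    return cylinders * heads * sectors * SECTOR_SIZE

def assisted_lba_geometry(total_sectors):
    """First H in 16, 32, 64, 128, 255 for which the disk fits in 1024*H*63."""
    for heads in (16, 32, 64, 128, 255):
        if total_sectors <= 1024 * heads * 63:
            break
    # Assumption of this sketch: larger disks are clipped to 1024 cylinders.
    cylinders = min(total_sectors // (heads * 63), 1024)
    return cylinders, heads, 63

for label, c, h, s in CHS_LIMITS:
    print("%-36s %15d bytes" % (label, limit_bytes(c, h, s)))
print(assisted_lba_geometry(4124736))
```
</verb></tscreen>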
<sect>
Booting
<p>
<nidx>booting!BIOS usage during</nidx>
<nidx>disk!BIOS access during booting</nidx>
When the system is booted, the BIOS reads sector 0 (known as
the MBR - the Master Boot Record) from the first disk
(or from floppy or CDROM), and jumps to the code found there - usually
some bootstrap loader. These small bootstrap programs
typically have no disk drivers of their own and use
BIOS services. This means that a Linux kernel can only be
booted when it is entirely located within the first 1024
cylinders, unless you both have a modern BIOS (a BIOS that supports
the Extended INT13 functions), and a modern bootloader
(a bootloader that uses these functions when available).
<p>
This problem (if it is a problem) is very easily solved:
make sure that the kernel (and perhaps other files used during bootup,
such as LILO map files) are located on a partition that is entirely
contained in the first 1024 cylinders of a disk that the BIOS can access -
probably this means the first or second disk.
<p>
Thus: create a small partition, say 10 MB large, so that there
is room for a handful of kernels, making sure that it is entirely
contained within the first 1024 cylinders of the first or second
disk. Mount it on <tt>/boot</tt> so that LILO will put its stuff there.
<p>
Most systems from 1998 or later will have a modern BIOS.
<p>
<sect1>LILO and the `lba32' and `linear' options
<label id="LILO">
<label id="linear">
<p>
Executive summary: If you use LILO as boot loader, make sure you have
LILO version 21.4 or later. (It can be found at
<htmlurl name="ftp://metalab.unc.edu/pub/Linux/system/boot/lilo/"
url="ftp://metalab.unc.edu/pub/Linux/system/boot/lilo/">.)
Always use the <tt>lba32</tt> option.
<p>
An invocation of <tt>/sbin/lilo</tt> (the boot map installer) stores a list
of addresses in the boot map, so that LILO (the boot loader) knows from
where to read the kernel image. By default these addresses are
stored in (c,h,s) form, and ordinary INT13 calls are used at boot time.
<p>
When the configuration file specifies <tt>lba32</tt> or <tt>linear</tt>,
linear addresses are stored. With <tt>lba32</tt> also linear addresses
are used at boot time, when the BIOS supports extended INT13.
With <tt>linear</tt>, or with an old BIOS, these linear addresses are
converted back to (c,h,s) form, and ordinary INT13 calls are used.
<p>
Thus, with <tt>lba32</tt> there are no geometry problems and there is
no 1024 cylinder limit. Without it there is a 1024 cylinder limit.
What about the geometry?
<p>
The boot loader and the BIOS must agree as to the disk geometry.
<tt>/sbin/lilo</tt> asks the kernel for the geometry,
but there is no guarantee that the Linux kernel geometry coincides
with what the BIOS will use. Thus, often the geometry
supplied by the kernel is worthless. In such cases it helps
to give LILO the `<tt/linear/' option. The advantage is that
the Linux kernel idea of the geometry no longer plays a role.
The disadvantage is that <tt>lilo</tt> cannot warn you when
part of the kernel was stored above the 1024 cylinder limit,
and you may end up with a system that does not boot.
<sect1>A LILO bug<p>
With LILO versions below v21 there is another disadvantage:
the address conversion done at boot time has a bug: when c*H is 65536
or more, overflow occurs in the computation.
For H larger than 64 this causes a stricter limit on c than the
well-known c &lt; 1024; for example, with H=255 and an old LILO
one must have c &lt; 258. (c=cylinder where kernel image lives,
H=number of heads of disk)
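<p>
The effect of such an overflow can be imitated as follows. This is a
sketch of 16-bit truncation of c*H only; the actual code in old LILO
versions is not reproduced here, and both function names are invented.
<tscreen><verb>
```python
# Sketch: what happens when c*H is kept in a 16-bit quantity.

def truncated_product(c, heads):
    """c*H reduced modulo 2^16, imitating the overflow."""
    return (c * heads) % 65536

def highest_safe_cylinder(heads):
    """Largest c with c*H below 65536, never exceeding the usual 1023."""
    return min(65535 // heads, 1023)

for heads in (16, 64, 128, 255):
    print(heads, highest_safe_cylinder(heads))
```
</verb></tscreen>
With 255 heads the highest safe cylinder is 257, in agreement with the
bound c &lt; 258 given above.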
<sect1>1024 cylinders is not 1024 cylinders<p>
Tim Williams writes: `I had my Linux partition within the first 1024
cylinders and still it wouldn't boot. First when I moved it below 1 GB
did things work.' How can that be? Well, this was a SCSI disk with
AHA2940UW controller which uses either H=64, S=32 (that is, cylinders
of 1 MiB = 1.05 MB), or H=255, S=63 (that is, cylinders of 8.2 MB),
depending on setup options in firmware and BIOS. No doubt the BIOS
assumed the former, so that the 1024 cylinder limit was found at 1 GiB,
while Linux used the latter and LILO thought that this limit was at 8.4 GB.
<sect1>No 1024 cylinder limit on old machines with IDE<p>
The <tt>nuni</tt> boot loader does not use BIOS services
but accesses IDE drives directly. So, one can put it on a
floppy or in the MBR and boot from anywhere on any IDE drive
(not only from the first two). Find it at
<htmlurl name="//metalab.unc.edu/pub/Linux/system/boot/loaders/"
url="//metalab.unc.edu/pub/Linux/system/boot/loaders/">.
<sect1>
Other boot loaders
<p>
LILO is a bit fragile: it requires the discipline of running
<tt>/sbin/lilo</tt> each time one installs a new kernel.
Some other boot loaders do not have this disadvantage.
Especially <tt/grub/ is popular these days; a major
disadvantage is that it does not support the
<tt>lilo -R label</tt> function.
<sect>
Disk geometry, partitions and `overlap'
<label id="overlap">
<p>
<nidx>disk!geometry</nidx>
<nidx>disk!partitions</nidx>
If you have several operating systems on your disks, then each
uses one or more disk partitions. A disagreement on where these
partitions are may have catastrophic consequences.
<label id="partitiontable">
The MBR contains a <it>partition table</it> describing where the
(primary) partitions are. There are 4 table entries, for 4
primary partitions, and each looks like
<tscreen><verb>
struct partition {
char active; /* 0x80: bootable, 0: not bootable */
char begin[3]; /* CHS for first sector */
char type;
char end[3]; /* CHS for last sector */
int start; /* 32 bit sector number (counting from 0) */
int length; /* 32 bit number of sectors */
};
</verb></tscreen>
(where CHS stands for Cylinder/Head/Sector).
This information is redundant: the location of a partition
is given both by the 24-bit <tt/begin/ and <tt/end/ fields,
and by the 32-bit <tt/start/ and <tt/length/ fields.
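<p>
To see both descriptions side by side, the sketch below decodes one
16-byte table entry. It assumes the usual on-disk layout: the multi-byte
numbers are little-endian, and each 3-byte CHS field holds the head,
then the sector together with the two high cylinder bits, then the low
cylinder byte. The sample entry is made up for the example.
<tscreen><verb>
```python
# Sketch: decode one primary partition table entry (16 bytes).

def decode_chs(field):
    """3 bytes: head, sector plus high cylinder bits, low cylinder byte."""
    head, sec_cyl, cyl_low = field[0], field[1], field[2]
    sector = sec_cyl % 64
    cylinder = (sec_cyl // 64) * 256 + cyl_low
    return cylinder, head, sector

def decode_entry(entry):
    """Return the fields of struct partition as a dictionary."""
    if len(entry) != 16:
        raise ValueError("a partition table entry has 16 bytes")
    return {
        "active": entry[0] == 0x80,
        "begin": decode_chs(entry[1:4]),
        "type": entry[4],
        "end": decode_chs(entry[5:8]),
        "start": int.from_bytes(entry[8:12], "little"),
        "length": int.from_bytes(entry[12:16], "little"),
    }

# Hypothetical entry: type 83, cylinders 0-99 of a disk with 16 heads
# and 63 sectors per track, starting after the first track.
sample = bytes([0x80, 0x01, 0x01, 0x00, 0x83, 0x0F, 0x3F, 0x63,
                0x3F, 0x00, 0x00, 0x00, 0x81, 0x89, 0x01, 0x00])
print(decode_entry(sample))
```
</verb></tscreen>
For this sample the two descriptions agree: start + length - 1 is the
linear number of the sector named by the <tt/end/ field, provided one
assumes 16 heads and 63 sectors per track.
<p>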
Linux only uses the <tt/start/ and <tt/length/ fields, and can
therefore handle partitions of not more than 2^32 sectors,
that is, partitions of at most 2 TiB. That is twelve times
larger than the disks available today, so maybe it will be
enough for the next five years or so.
(So, partitions can be very large, but there is a serious
restriction in that a file in an ext2 filesystem on hardware
with 32-bit integers cannot be larger than 2 GiB.)
DOS uses the <tt/begin/ and <tt/end/ fields, and uses the
BIOS INT13 call to access the disk, and can therefore only
handle disks of not more than 8.4 GB, even with a translating
BIOS. (Partitions cannot be larger than 2.1 GB because of
restrictions of the FAT16 file system.) The same holds for
Windows 3.11 and WfWG and Windows NT 3.*.
Windows 95 has support for the Extended INT13 interface, and
uses special partition types (c, e, f instead of b, 6, 5)
to indicate that a partition should be accessed in this way.
When these partition types are used, the <tt/begin/ and <tt/end/ fields
contain dummy information (1023/255/63).
Windows 95 OSR2 introduces the FAT32 file system (partition type
b or c), that allows partitions of size at most 2 TiB.
What is this nonsense you get from <tt/fdisk/ about `overlapping'
partitions, when in fact nothing is wrong?
Well - there is something `wrong': if you look at the <tt/begin/
and <tt/end/ fields of such partitions, as DOS does, they overlap.
(And that cannot be corrected, because these fields cannot store
cylinder numbers above 1023 - there will always be `overlap'
as soon as you have more than 1024 cylinders.)
However, if you look at the <tt/start/ and <tt/length/ fields,
as Linux does, and as Windows 95 does in the case of partitions
with partition type c, e or f, then all is well.
So, ignore these warnings when <tt/cfdisk/ is satisfied and you
have a Linux-only disk. Be careful when the disk is shared with DOS.
Use the commands <tt>cfdisk -Ps /dev/hdx</tt> and <tt>cfdisk -Pt /dev/hdx</tt>
to look at the partition table of <tt>/dev/hdx</tt>.
<sect1>The last cylinder<p>
Many old IBM PS/2 systems used disks with a defect map written
to the end of the disk. (Bit 0x20 in the control word of the
<htmlurl name="disk parameter table"
url="http://www.win.tue.nl/~aeb/linux/hdtypes/hdtypes-2.html"> is set.)
Therefore, FDISK would not use the last cylinder. Just to be sure, the BIOS
often already reports the size of the disk as one cylinder smaller than
reality, and that may mean that two cylinders are lost.
Newer BIOSes have several disk size reporting functions, where internally
one calls the other. When both subtract 1 for this reserved cylinder and
also FDISK does so, then one may lose three cylinders.
These days all of this is irrelevant, but this may provide an explanation
if one observes that different utilities have slightly different opinions
about the disk size.
<sect1>Cylinder boundaries<p>
A well-known claim says that partitions should start and end
at cylinder boundaries.
<p>
Since "disk geometry" is something without objective existence,
different operating systems will invent different geometries
for the same disk. One often sees a translated geometry like */255/63
used by one and an untranslated geometry like */16/63 used by another OS.
(People tell me Windows NT uses */64/32 while Windows 2K uses */255/63.)
Thus, it may be impossible to align partitions to cylinder boundaries
according to each of the various ideas about the size of a cylinder
that one's systems have. Also different Linux kernels may assign
different geometries to the same disk.
Also, enabling or disabling the BIOS of a SCSI card may change the
fake geometry of the connected SCSI disks.
<p>
Fortunately, for Linux there is no alignment requirement at all.
(Except that some semi-broken installation software likes to be very sure
that all is OK; thus, it may be impossible to install RedHat 7.1
on a disk with unaligned partitions because DiskDruid is unhappy.)
<p>
People report that it is easy to create nonaligned partitions
in Windows NT, without any noticeable bad effects.
<p>
But MSDOS 6.22 has an alignment requirement. Extended partition sectors
that are not on a cylinder boundary are ignored by its FDISK.
The system itself is happy with any alignment, but interprets
relative starting addresses as if relative to an aligned address:
The starting address of a logical partition is given relative not
to the address of the extended partition sector that describes it,
but relative to the start of the cylinder that contains that sector.
(So, it is not surprising that also PartitionMagic requires alignment.)
<p>
What is the definition of alignment?
MSDOS 6.22 FDISK will do the following:
1. If the first sector of the cylinder is a partition
table sector, then the rest of the track is unused,
and the partition starts with the next track.
This applies to sector 0 (the MBR) and the partition table sectors
preceding logical partitions.
2. Otherwise, the partition starts at the first sector of the
cylinder. Also the extended partition starts at a cylinder boundary.
The <tt>cfdisk</tt> man page says that old versions of DOS did not
align partitions.
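<p>
The two rules can be turned into a small check. This is a sketch of the
description above, not of FDISK itself; the function names are invented,
and <tt>is_aligned</tt> is deliberately lenient in that it accepts
either rule without looking at what the first sector actually contains.
<tscreen><verb>
```python
# Sketch: where a partition would start under the two rules above,
# for a geometry with `heads` heads and `sectors` sectors per track.

def expected_start(cylinder, heads, sectors, table_in_first_sector):
    """Linear sector number at which the partition would start."""
    cylinder_start = cylinder * heads * sectors
    if table_in_first_sector:
        return cylinder_start + sectors   # rule 1: skip the first track
    return cylinder_start                 # rule 2: start of the cylinder

def is_aligned(start, heads, sectors):
    """True if start is at a cylinder boundary or one track past it."""
    offset = start % (heads * sectors)
    return offset == 0 or offset == sectors

print(expected_start(0, 255, 63, True))
print(is_aligned(2048, 255, 63))
```
</verb></tscreen>
Running the check with two different geometries for the same start
sector shows how easily two systems can disagree about alignment.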
<p>
Use of partition type 85 for the extended partition makes it invisible
to DOS, making sure that only Linux will look inside.
<p>
As an aside: on a Sparc, the boot partition must start on a cylinder
boundary (but there is no requirement on the end).
<sect>
Translation and Disk Managers
<p>
<nidx>disk!geometry translation</nidx>
<nidx>BIOS!translating</nidx>
<nidx>BIOS!LBA support</nidx>
Disk geometry (with heads, cylinders and sectors per track) is something
from the age of MFM and RLL. In those days it corresponded to
a physical reality. Nowadays, with IDE or SCSI, nobody is
interested in what the `real' geometry of a disk is.
Indeed, the number of sectors per track is variable - there are
more sectors per track close to the outer rim of the disk - so there
is no `real' number of sectors per track.
Quite the contrary: the IDE command INITIALIZE DRIVE PARAMETERS (91h)
serves to tell the disk how many heads and sectors per track
it is supposed to have today.
It is quite normal to see a large modern disk that has 2 heads
report 15 or 16 heads to the BIOS, while the BIOS may again report
255 heads to user software.
For the user it is best to regard a disk as just a linear array
of sectors numbered 0, 1, ..., and leave it to the firmware
to find out where a given sector lives on the disk. This linear
numbering is called LBA.
So now the conceptual picture is the following.
DOS, or some boot loader, talks to the BIOS, using (c,h,s) notation.
The BIOS converts (c,h,s) to LBA notation using the fake geometry
that the user is using. If the disk accepts LBA then this value
is used for disk I/O. Otherwise, it is converted back to (c',h',s')
using the geometry that the disk uses today, and that is used for
disk I/O.
Note that there is a bit of confusion in the use of the expression `LBA':
As a term describing disk capabilities it means `Logical Block Addressing'
(as opposed to CHS Addressing). As a term in the BIOS Setup, it describes
a translation scheme sometimes called `assisted LBA' - see above
under `<ref id="The 8.4 GB limit">'.
Something similar works when the firmware doesn't speak LBA
but the BIOS knows about translation. (In the setup this is
often indicated as `Large'.) Now the BIOS will present
a geometry (C,H,S) to the operating system, and use
(C',H',S') while talking to the disk controller. Usually S = S',
C = C'/N and H = H'<tt/*/N, where N is the smallest power of
two that will ensure C &lt;= 1024 (so that the least capacity
is wasted by the rounding down in C = C'/N).
Again, this allows access of up to 8.4 GB (7.8 GiB).
(The third setup option usually is `Normal', where no translation
is involved.)
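<p>
The `Large' translation can be sketched as follows, taking N to be the
smallest power of two for which the cylinder count shown to the
operating system is at most 1024. The function name is invented, and
real BIOSes may differ in details such as an upper bound on the number
of heads.
<tscreen><verb>
```python
# Sketch of `Large' translation: halve the cylinders and double the
# heads until at most 1024 cylinders remain.

def large_translation(drive_cyls, drive_heads, drive_sectors):
    """Return (C, H, S, N) for the drive geometry (C', H', S')."""
    n = 1
    while drive_cyls // n > 1024:
        n *= 2
    return drive_cyls // n, drive_heads * n, drive_sectors, n

print(large_translation(4092, 16, 63))
print(large_translation(8192, 16, 63))
```
</verb></tscreen>
The integer division is where capacity can be lost: any cylinders left
over after dividing by N are not visible through the translated
geometry.
<p>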
If a BIOS does not know about `Large' or `LBA', then there are
software solutions around. Disk Managers like OnTrack or EZ-Drive
replace the BIOS disk handling routines by their own.
Often this is accomplished by having the disk manager code live
in the MBR and subsequent sectors (OnTrack calls this code DDO:
Dynamic Drive Overlay), so that it is booted before any other
operating system. That is why one may have problems
when booting from a floppy when a Disk Manager has been installed.
The effect is more or less the same as with a translating BIOS -
but especially when running several different operating systems
on the same disk, disk managers can cause a lot of trouble.
Linux supported OnTrack Disk Manager from version 1.3.14 on,
and EZ-Drive from version 1.3.29 on. Some more details are
given in the next section.
In 2.5.70 the automatic disk manager support was removed.
Instead, two boot options were added: "hda=remap" to do
the EZ-Drive remapping of sector 0 to sector 1, and
"hda=remap63" to do the OnTrack Disk Manager shift over 63 sectors.
<sect>
Kernel disk translation for IDE disks
<p>
<nidx>disk!translation done by kernel</nidx>
If the Linux kernel detects the presence of some disk manager
on an IDE disk, it will try to remap the disk in the same way
this disk manager would have done, so that Linux sees the same
disk partitioning as for example DOS with OnTrack or EZ-Drive.
However, NO remapping is done when a geometry was specified
on the command line - so a
`<tt/hd=/<it/cyls/<tt/,/<it/heads/<tt/,/<it/secs/' command line option
might well kill compatibility with a disk manager.
If you are hit by this, and know someone who can compile a new
kernel for you, find the file <tt>linux/drivers/block/ide.c</tt>
and remove in the routine <tt>ide_xlate_1024()</tt> the test
<tt>if (drive->forced_geom) { ...; return 0; }</tt>.
The remapping is done by trying 4, 8, 16, 32, 64, 128, 255 heads
(keeping H<tt/*/C constant) until either C &lt;= 1024 or H = 255.
The details are as follows - subsection headers are the strings
appearing in the corresponding boot messages. Here and everywhere
else in this text partition types are given in hexadecimal.
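<p>
The search over the number of heads can be sketched as follows.
This is a reading of the description above, not a copy of the kernel
routine; the name <tt/remap_1024/ is made up, and the sketch assumes
it is only called for a disk that really needs translation.
<tscreen><verb>
def remap_1024(c, h, s):
    """Try 4, 8, ..., 128, 255 heads, keeping H*C constant (rounded down)."""
    total = c * h
    for heads in (4, 8, 16, 32, 64, 128, 255):
        cyls = total // heads
        if cyls <= 1024 or heads == 255:
            return cyls, heads, s

print(remap_1024(16383, 16, 63))   # (1027, 255, 63)
print(remap_1024(4092, 16, 63))    # (1023, 64, 63)
</verb></tscreen>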
<sect1>EZD<p>
<nidx>disk!EZ-Drive translation</nidx>
<nidx>disk!EZD translation</nidx>
EZ-Drive is detected by the fact that the first primary partition
has type 55. The geometry is remapped as described above,
and the partition table from sector 0 is discarded - instead
the partition table is read from sector 1. Disk block numbers
are not changed, but writes to sector 0 are redirected to sector 1.
This behaviour can be changed by recompiling the kernel with
<tt/ &num;define FAKE_FDISK_FOR_EZDRIVE 0 /
in <tt/ide.c/.
<sect1>DM6:DDO<p>
<nidx>disk!OnTrack DiskManager translation</nidx>
<nidx>disk!DM6:DDO translation</nidx>
OnTrack DiskManager (on the first disk) is detected by the fact
that the first primary partition has type 54. The geometry is
remapped as described above and the entire disk is shifted by
63 sectors (so that the old sector 63 becomes sector 0).
Afterwards a new MBR (with partition table) is read from
the new sector 0. Of course this shift is to make room for
the DDO - that is why there is no shift on other disks.
<sect1>DM6:AUX<p>
<nidx>disk!OnTrack DiskManager translation</nidx>
<nidx>disk!DM6:AUX</nidx>
OnTrack DiskManager (on other disks) is detected by the fact
that the first primary partition has type 51 or 53.
The geometry is remapped as described above.
<sect1>DM6:MBR<p>
<nidx>disk!OnTrack DiskManager translation</nidx>
<nidx>disk!DM6:MBR</nidx>
An older version of OnTrack DiskManager is detected not by
partition type, but by signature. (Test whether the offset
found in bytes 2 and 3 of the MBR is not more than 430, and
the short found at this offset equals 0x55AA, and is followed
by an odd byte.) Again the geometry is remapped as above.
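<p>
A minimal sketch of this signature test, in Python. Two points are
assumptions and not stated above: that the offset in bytes 2 and 3 is
stored little-endian, and that `0x55AA' means a byte 0x55 followed by
a byte 0xAA. The function name is invented.
<tscreen><verb>
def looks_like_old_disk_manager(mbr):
    """mbr: the 512 bytes of sector 0, as a bytes object."""
    offset = mbr[2] + 256 * mbr[3]        # assumed little-endian
    if offset > 430:
        return False
    # Assumed byte order: 0x55 first, then 0xAA, then an odd byte.
    return (mbr[offset] == 0x55 and mbr[offset + 1] == 0xAA
            and mbr[offset + 2] % 2 == 1)
</verb></tscreen>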
<sect1>PTBL<p>
<nidx>disk!PTBL translation</nidx>
Finally, there is a test that tries to deduce a translation
from the <tt/start/ and <tt/end/ values of the primary partitions:
If some partition has start and end sector number 1 and 63, respectively,
and end heads 31, 63, 127 or 254, then, since it is customary
to end partitions on a cylinder boundary, and since moreover
the IDE interface uses at most 16 heads, it is conjectured
that a BIOS translation is active, and the geometry is
remapped to use 32, 64, 128 or 255 heads, respectively.
However, no remapping is done when the current idea of the
geometry already has 63 sectors per track and at least as
many heads (since this probably means that a remapping was
done already).
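<p>
The PTBL conjecture can be written down as a small Python function.
It follows the wording above and is not taken from the kernel source;
the function name and the tuple layout of the partition entries are
invented for the example.
<tscreen><verb>
def guess_heads(partitions, cur_heads, cur_sectors):
    """partitions: list of (start_sector, end_head, end_sector) CHS fields.

    Returns the conjectured number of heads, or None for `no remapping'.
    """
    for start_sector, end_head, end_sector in partitions:
        if (start_sector == 1 and end_sector == 63
                and end_head in (31, 63, 127, 254)):
            heads = end_head + 1
            # A geometry that already looks translated is left alone.
            if cur_sectors == 63 and cur_heads >= heads:
                return None
            return heads
    return None

print(guess_heads([(1, 254, 63)], 16, 63))   # 255
</verb></tscreen>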
<sect1>Getting rid of a disk manager<p>
When Linux detects OnTrack Disk Manager, it will shift all disk
accesses by 63 sectors. Similarly, when Linux detects EZ-Drive,
it shifts all accesses of sector 0 to sector 1.
This means that it may be difficult to get rid of these disk managers.
Most disk managers have an uninstall option, but if you need to remove
some disk manager an approach that often works is to give an explicit
disk geometry on the command line. Now Linux skips the <tt>ide_xlate_1024()</tt>
routine, and one can wipe out the partition table with disk manager
(and probably lose access to all disk data) with the command
<tscreen><verb>
dd if=/dev/zero of=/dev/hdx bs=512 count=1
</verb></tscreen>
The details depend a little on kernel version.
Recent kernels (since 2.3.21) recognize boot parameters like "hda=remap" and
"hdb=noremap", so that it is possible to get or avoid the EZD shift regardless of
the contents of the partition table. The "hdX=noremap" boot parameter also
avoids the OnTrack Disk Manager shift.
<sect1>Since 2.5.70: boot parameters<p>
In 2.5.70 the automatic disk manager support was removed.
Instead, two boot options were added: "hda=remap" to do
the EZ-Drive remapping of sector 0 to sector 1, and
"hda=remap63" to do the OnTrack Disk Manager shift over 63 sectors.
This also means that it no longer is a problem to get rid of
a disk manager.
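<p>
What the two options do to sector numbers can be illustrated with a
few lines of Python. The function is a made-up model of the mapping
described above, not kernel code.
<tscreen><verb>
def physical_sector(sector, mode):
    """Map a sector number as seen by Linux to the sector on the disk.

    mode is "remap" (EZ-Drive), "remap63" (OnTrack) or None.
    """
    if mode == "remap":
        return 1 if sector == 0 else sector
    if mode == "remap63":
        return sector + 63
    return sector

print(physical_sector(0, "remap"))     # 1
print(physical_sector(0, "remap63"))   # 63
</verb></tscreen>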
<sect>
Consequences
<p>
<nidx>disk!consequences of translation</nidx>
What does all of this mean? For Linux users only one thing:
that they must make sure that LILO and <tt/fdisk/ use the right
geometry, where `right' is defined for <tt/fdisk/ as the geometry
used by the other operating systems on the same disk, and for
LILO as the geometry that will enable successful interaction
with the BIOS at boot time. (Usually these two coincide.)
How does <tt/fdisk/ know about the geometry?
There are three sources of information. First, if the user has specified
the geometry interactively or on the command line, then we take
the user input. Second, if it is possible to guess the geometry used
from the partition table, then we use that. Third, when nothing else
is available, <tt/fdisk/ asks the kernel, using the <tt/HDIO_GETGEO/ ioctl.
How does LILO know about the geometry?
It asks the kernel, using the <tt/HDIO_GETGEO/ ioctl.
But the user can override the geometry using the `<tt/disk=/' option
in <tt>/etc/lilo.conf</tt> (see lilo.conf(5)).
One may also give the <tt/linear/ option to LILO, and it will store
LBA addresses instead of CHS addresses in its map file,
and find out the geometry to use at boot time (by using
INT 13 Function 8 to ask for the drive geometry).
How does the kernel know what to answer?
Well, first of all, the user may have specified an explicit geometry
with a `<tt/hda=/<it/cyls/<tt/,/<it/heads/<tt/,/<it/secs/'
kernel command line option (see bootparam(7)), perhaps by hand, or by
asking the boot loader to supply such an option to the kernel.
For example, one can tell LILO to supply such an option by adding
an `<tt/append = "hda=/<it/cyls/<tt/,/<it/heads/<tt/,/<it/secs/<tt/"/'
line in <tt>/etc/lilo.conf</tt> (see lilo.conf(5)).
And otherwise the kernel will guess, possibly using values
obtained from the BIOS or the hardware.
Note that values guessed by the kernel are very unreliable.
The kernel does not have a good way of finding out what values
the BIOS uses, or indeed whether the disk is known to the BIOS at all.
It is possible (since Linux 2.1.79) to change the kernel's ideas
about the geometry by using the <tt>/proc</tt> filesystem.
For example
<tscreen><verb>
# sfdisk -g /dev/hdc
/dev/hdc: 4441 cylinders, 255 heads, 63 sectors/track
# cd /proc/ide/ide1/hdc
# echo bios_cyl:17418 bios_head:128 bios_sect:32 > settings
# sfdisk -g /dev/hdc
/dev/hdc: 17418 cylinders, 128 heads, 32 sectors/track
#
</verb></tscreen>
This is especially useful if you need so many boot parameters
that you overflow LILO's (very limited) command line length.
(It also helps if you want to influence a utility that gets its
idea of the geometry from the kernel via the HDIO_GETGEO ioctl.)
Since Linux 2.6.5 the kernel will (when compiled with CONFIG_EDD)
ask the BIOS for legacy_cylinders, legacy_heads, legacy_sectors
using INT 13/AH=08h. The values obtained are made available in
<tt>/sys/firmware/edd/int13_dev{80,81,82,83}/legacy_*</tt>.
In 2.6.5 the files were <tt>legacy_{cylinders,heads,sectors}</tt>
(with contents in hex, e.g. 0xfe for 254), but those names are
confusing, and in 2.6.7 they were changed to <tt>legacy_max_cylinder</tt>,
<tt>legacy_max_head</tt>, <tt>legacy_sectors_per_track</tt>
(with contents in decimal).
A geometry like C/H/S=1000/255/63 is found here as 999, 254, 63.
<tscreen><verb>
# insmod edd.ko
# cd /sys/firmware/edd/int13_dev83
# cat legacy_max_head
254
# cat sectors
120064896
#
</verb></tscreen>
Thus, we see here a disk with 255 heads and 120064896 sectors in all.
Careful comparison shows that this is <tt>/dev/hdf</tt>.
How does the BIOS know about the geometry?
The user may have specified it in the CMOS setup.
Or the geometry is read from the disk, and possibly translated
as specified in the setup. In the case of SCSI disks, where no
geometry exists, the geometry that the BIOS has to invent can
often be specified by jumpers or setup options. (For example,
Adaptec controllers have the possibility to choose between
the usual H=64, S=32 and the `extended translation' H=255, S=63.)
Sometimes the BIOS reads the partition table to see with what
geometry the disk was last partitioned - it will assume that
a valid partition table is present when the 55aa signature
is present. This is good, in that it allows moving disks to
a different machine. But having the BIOS behaviour depend on
the disk contents also causes strange problems.
(For example, it has been <htmlurl name="reported"
url="http://www.heise.de/ct/faq/hotline/98/07/hotline9807_11.shtml">
that a 2.5 GB disk was seen as having 528 MB because the BIOS read
the partition table and concluded that it should use untranslated
CHS. Another effect is found in the <htmlurl name="report"
url="http://www.heise.de/ct/faq/hotline/98/19/hotline9819_11.shtml">
that unpartitioned disks were slower than partitioned ones,
because the BIOS tested 32-bit mode by reading the MBR and
seeing whether it correctly got the 55aa signature.)
How does the disk know about the geometry?
Well, the manufacturer invents a geometry that multiplies out
to approximately the right capacity. Many disks have jumpers
that change the reported geometry, in order to avoid BIOS bugs.
For example, all IBM disks allow the user to choose between
15 and 16 heads, and many manufacturers add jumpers to make
the disk seem smaller than 2.1 GB or 33.8 GB. See also
<ref id="jumpers" name="below">.
Sometimes there are utilities that change the disk firmware.
<sect1>
Computing LILO parameters
<p>
Sometimes it is useful to force a certain geometry
by adding `<tt/hda=/<it/cyls/<tt/,/<it/heads/<tt/,/<it/secs/'
on the kernel command line. Almost always one wants <it/secs/=63,
and the purpose of adding this is to specify <it/heads/.
(Reasonable values today are <it/heads/=16 and <it/heads/=255.)
What should one specify for <it/cyls/? Precisely that number
that will give the right total capacity of C*H*S sectors.
For example, for a drive with 71346240 sectors (36529274880 bytes)
one would compute C as 71346240/(255*63)=4441 (for example using
the program <tt/bc/), and give boot parameter <tt/hdc=4441,255,63/.
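<p>
For those who prefer a script to <tt/bc/, the same computation in
Python might look like this. The function name is invented, and the
defaults of 255 heads and 63 sectors per track are just the common
choice mentioned above.
<tscreen><verb>
def geometry_param(drive, total_sectors, heads=255, secs=63):
    cyls = total_sectors // (heads * secs)    # round down
    return "%s=%d,%d,%d" % (drive, cyls, heads, secs)

print(geometry_param("hdc", 71346240))        # hdc=4441,255,63
</verb></tscreen>
<p>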
How does one know the right total capacity? For example,
<tscreen><verb>
# hdparm -g /dev/hdc | grep sectors
geometry = 4441/255/63, sectors = 71346240, start = 0
# hdparm -i /dev/hdc | grep LBAsects
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=71346240
</verb></tscreen>
gives two ways of finding the total number of sectors 71346240.
Recent kernels also give the precise size in the boot messages:
<tscreen><verb>
# dmesg | grep hde
hde: Maxtor 93652U8, ATA DISK drive
hde: 71346240 sectors (36529 MB) w/2048KiB Cache, CHS=70780/16/63
hde: hde1 hde2 hde3 < hde5 > hde4
hde2: <bsd: hde6 hde7 hde8 hde9 >
</verb></tscreen>
Older kernels only give MB and CHS. In general the CHS value is
rounded down, so that the above output tells us that there are
at least 70780*16*63=71346240 sectors. In this example that happens
to be the precise value. The MB value may be rounded instead of
truncated, and in old kernels may be `binary' (MiB) instead of decimal.
Note the agreement between the kernel size in MB and the Maxtor model number.
Also in the case of SCSI disks the precise number of sectors is given
in the kernel boot messages:
<tscreen><verb>
SCSI device sda: 17755792 512-byte hdwr sectors (9091 MB)
</verb></tscreen>
<sect>Details<p>
<sect1>IDE details - the seven geometries<p>
<nidx>disk!IDE geometry setting</nidx>
The IDE driver has five sources of information about the geometry.
The first (G_user) is the one specified by the user on the command line.
The second (G_bios) is the BIOS Fixed Disk Parameter Table
(for first and second disk only) that is read on system startup,
before the switch to 32-bit mode.
The third (G_phys) and fourth (G_log) are returned by the IDE controller
as a response to the <ref id="identify" name="IDENTIFY command"> - they
are the `physical' and `current logical' geometries.
In turn, the driver needs two values for the geometry:
G_fdisk, returned by a <tt/HDIO_GETGEO/ ioctl, and
G_used, which is actually used for doing I/O.
Both G_fdisk and G_used are initialized to G_user if given, to
G_bios when this information is present according to CMOS, and
to G_phys otherwise. If G_log looks reasonable then G_used is set
to that. Otherwise, if G_used is unreasonable and G_phys looks
reasonable then G_used is set to G_phys. Here `reasonable' means
that the number of heads is in the range 1-16.
To say this in other words: the command line overrides the BIOS,
and will determine what <tt/fdisk/ sees, but if it specifies a
translated geometry (with more than 16 heads), then for kernel I/O
it will be overridden by output of the IDENTIFY command.
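<p>
The selection rules just described can be summarized in a short
Python sketch. It models the prose only and is not the driver code;
geometries are (C,H,S) tuples, with None meaning `not available', and
the function names are invented.
<tscreen><verb>
def reasonable(geom):
    """A geometry is `reasonable' if it has between 1 and 16 heads."""
    return geom is not None and 1 <= geom[1] <= 16

def choose_geometries(g_user, g_bios, g_phys, g_log):
    g_fdisk = g_used = g_user or g_bios or g_phys
    if reasonable(g_log):
        g_used = g_log
    elif not reasonable(g_used) and reasonable(g_phys):
        g_used = g_phys
    return g_fdisk, g_used

# A translated geometry on the command line is what fdisk sees,
# but I/O falls back on the geometry from IDENTIFY.
print(choose_geometries((4441, 255, 63), None, (16383, 16, 63), None))
# ((4441, 255, 63), (16383, 16, 63))
</verb></tscreen>
<p>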
Note that G_bios is rather unreliable: for systems booting from SCSI
the first and second disk may well be SCSI disks, and the geometry
that the BIOS reported for sda is used by the kernel for hda.
Moreover, disks that are not mentioned in the BIOS Setup are not
seen by the BIOS. This means that, e.g., in an IDE-only system where
hdb is not given in the Setup, the geometries reported by the BIOS
for the first and second disk will apply to hda and hdc.
In order to avoid such confusion, since Linux 2.5.51 G_bios is
not used anymore.
<sect2>The IDENTIFY DRIVE command
<label id="identify">
<p>
When an IDE drive is sent the IDENTIFY DRIVE (0xec) command,
it will return 256 words (512 bytes) of information.
This contains lots of technical stuff.
Let us only describe here what plays a role in geometry matters.
The words are numbered 0-255.
<p>
We find four pieces of information here: DefaultCHS (words 1,3,6),
CurrentCHS (words 54-58), LBAcapacity (words 60-61) and
48-bit capacity (words 100-103).
<p><table><tabular ca="c|l">
| Description | Example @@
0 | bit field: bit 6: fixed disk, bit 7: removable medium | 0x0040 @@
1 | Default number of cylinders | 16383 @
3 | Default number of heads | 16 @
6 | Default number of sectors per track | 63 @@
10-19 | Serial number (in ASCII) | G8067TME @
23-26 | Firmware revision (in ASCII) | GAK&1B0 @
27-46 | Model name (in ASCII) | Maxtor 4G160J8 @@
49 | bit field: bit 9: LBA supported | 0x2f00 @@
53 | bit field: bit 0: words 54-58 are valid | 0x0007 @
54 | Current number of cylinders | 16383 @
55 | Current number of heads | 16 @
56 | Current number of sectors per track | 63 @
57-58 | Current LBA capacity | 16514064 @@
60-61 | Default LBA capacity | 268435455 @@
82-83 | Command sets supported | 7c69 4f09 @@
85-86 | Command sets enabled | 7c68 0e01 @@
100-103 | Maximum user LBA for 48-bit addressing | 320173056 @@
255 | Checksum and signature (0xa5) | 0x44a5 @@
</tabular></table>
In the ASCII strings each word contains two characters,
the high order byte the first, the low order byte the second.
The 32-bit values are given with low order word first.
Words 54-58 are set by the command INITIALIZE DRIVE PARAMETERS (0x91).
They are significant only when CHS addressing is used, but may
help to find the actual disk size in case the disk sets
DefaultCHS to 4092/16/63 in order to avoid BIOS problems.
Sometimes, when a jumper causes a big drive to misreport LBAcapacity
(often as 66055248 sectors, in order to stay below the 33.8 GB limit),
one needs a further piece of information to find the actual disk size,
namely the result of the READ NATIVE MAX ADDRESS (0xf8) command.
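<p>
The word and byte order conventions mentioned above are easy to get
wrong, so here is a small Python sketch of how the fields could be
decoded. It assumes the 256 words are already available as a list of
integers; the helper names are invented, and obtaining the words from
the drive is not shown.
<tscreen><verb>
def u32(words, first):
    """32-bit value stored in two 16-bit words, low order word first."""
    return words[first] + 65536 * words[first + 1]

def ascii_field(words, first, last):
    """ASCII string: the high order byte of each word comes first."""
    chars = []
    for w in words[first:last + 1]:
        chars.append(chr(w // 256))
        chars.append(chr(w % 256))
    return "".join(chars).strip()

words = [0] * 256
words[60], words[61] = 0xFFFF, 0x0FFF
print(u32(words, 60))    # 268435455
</verb></tscreen>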
<sect1>SCSI details<p>
<nidx>disk!SCSI geometry setting</nidx>
The situation for SCSI is slightly different, as the SCSI commands
already use logical block numbers, so a `geometry' is entirely
irrelevant for actual I/O.
However, the format of the partition table is still the same,
so <tt/fdisk/ has to invent some geometry, and also uses <tt/HDIO_GETGEO/ here -
indeed, <tt/fdisk/ does not distinguish between IDE and SCSI disks.
As one can see from the detailed description below, the various
drivers each invent a somewhat different geometry. Indeed, one big mess.
<p>
If you are not using DOS or so, then avoid all extended translation
settings, and just use 64 heads, 32 sectors per track (for a nice,
convenient 1 MiB per cylinder), if possible, so that no problems
arise when you move the disk from one controller to another.
Some SCSI disk drivers (aha152x, pas16, ppa, qlogicfas, qlogicisp)
are so nervous about DOS compatibility that they will not allow
a Linux-only system to use more than about 8 GiB. This is a bug.
<p>
What is the real geometry?
The easiest answer is that there is no such thing.
And if there were, you wouldn't want to know, and certainly
NEVER, EVER tell <tt/fdisk/ or LILO or the kernel about it.
It is strictly a business between the SCSI controller and the disk.
Let me repeat that: only silly people tell <tt/fdisk//LILO/kernel about
the true SCSI disk geometry.
<p>
But if you are curious and insist, you might ask the disk itself.
There is the important command READ CAPACITY that will give the total
size of the disk, and there is the MODE SENSE command, that in the
Rigid Disk Drive Geometry Page (page 04) gives the number of cylinders
and heads (this is information that cannot be changed), and in the
Format Page (page 03) gives the number of bytes per sector,
and sectors per track. This latter number is typically dependent upon
the notch, and the number of sectors per track varies - the outer
tracks have more sectors than the inner tracks.
The Linux program <tt/scsiinfo/ will give this information.
There are many details and complications, and it is clear that nobody
(probably not even the operating system) wants to use this information.
Moreover, as long as we are only concerned about <tt/fdisk/ and LILO,
one typically gets answers like C/H/S=4476/27/171 - values that
cannot be used by <tt/fdisk/ because the partition table reserves only
10 resp. 8 resp. 6 bits for C/H/S.
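<p>
As a quick check of these field widths, a Python sketch (the function
name is invented):
<tscreen><verb>
def fits_partition_table(c, h, s):
    """c, h, s are counts. The table stores cylinder and head numbers
    counted from 0 in 10 and 8 bits, and sector numbers counted from 1
    in 6 bits."""
    return c <= 1024 and h <= 256 and s <= 63

print(fits_partition_table(4476, 27, 171))   # False
</verb></tscreen>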
<p>
Then where does the kernel <tt/HDIO_GETGEO/ get its information from?
Well, either from the SCSI controller, or by making an educated guess.
Some drivers seem to think that we want to know `reality', but
of course we only want to know what the DOS or OS/2 FDISK
(or Adaptec AFDISK, etc) will use.
<p>
Note that Linux <tt/fdisk/ needs the numbers H and S of heads and sectors
per track to convert LBA sector numbers into c/h/s addresses, but the
number C of cylinders does not play a role in this conversion.
Some drivers use (C,H,S) = (1023,255,63) to signal that the drive
capacity is at least 1023<tt/*/255<tt/*/63 sectors. This is unfortunate,
since it does not reveal the actual size, and will limit the
users of most <tt/fdisk/ versions to about 8 GiB of their disks -
a real limitation these days.
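<p>
That C plays no role in the conversion is easy to see when the
conversion is written out. The following Python sketch uses the usual
formula; it is an illustration, not the code of <tt/fdisk/.
<tscreen><verb>
def lba_to_chs(lba, heads, sectors):
    c = lba // (heads * sectors)
    h = (lba // sectors) % heads
    s = lba % sectors + 1          # sectors are counted from 1
    return c, h, s

print(lba_to_chs(63, 255, 63))      # (0, 1, 1)
print(lba_to_chs(16065, 255, 63))   # (1, 0, 1)
</verb></tscreen>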
<p>
In the description below, M denotes the total disk capacity,
and C, H, S the number of cylinders, heads and sectors per track.
It suffices to give H, S if we regard C as defined by M / (H<tt/*/S).
<p>
By default, H=64, S=32.
<p>
<descrip>
<tag/aha1740, dtc, g_NCR5380, t128, wd7000:/ <p>
H=64, S=32.
<p>
<tag/aha152x, pas16, ppa, qlogicfas, qlogicisp:/ <p>
H=64, S=32 unless C &gt; 1024, in which case
H=255, S=63, C = min(1023, M/(H<tt/*/S)).
(Thus C is truncated, and H<tt/*/S<tt/*/C is not an approximation to
the disk capacity M. This will confuse most versions of <tt/fdisk/.)
The <tt/ppa.c/ code uses M+1 instead of M and says that due to a
bug in <tt/sd.c/ M is off by 1.
<p>
<tag/advansys:/ <p>
H=64, S=32 unless C &gt; 1024 and moreover the `&gt; 1 GB' option
in the BIOS is enabled, in which case H=255, S=63.
<p>
<tag/aha1542:/ <p>
Ask the controller which of two possible translation schemes
is in use, and use either H=255, S=63 or H=64, S=32. In the former
case there is a boot message "aha1542.c: Using extended bios translation".
<p>
<tag/aic7xxx:/ <p>
H=64, S=32 unless C &gt; 1024, and moreover
either the "extended" boot parameter was given,
or the `extended' bit was set in the SEEPROM or BIOS,
in which case H=255, S=63.
In Linux 2.0.36 this extended translation would always be set
in case no SEEPROM was found, but in Linux 2.2.6 if no SEEPROM
is found extended translation is set only when the user asked
for it using this boot parameter (while when a SEEPROM is found,
the boot parameter is ignored).
This means that a setup that works under 2.0.36 may fail to boot
with 2.2.6 (and require the <tt>linear</tt> keyword for LILO, or
the <tt>aic7xxx=extended</tt> kernel boot parameter).
<p>
<tag/buslogic:/ <p>
H=64, S=32 unless C &gt;= 1024, and moreover extended translation
was enabled on the controller, in which case if M &lt; 2^22 then
H=128, S=32; otherwise H=255, S=63. However, after making this choice
for (C,H,S), the partition table is read, and if for one of the
three possibilities (H,S) = (64,32), (128,32), (255,63) the value
endH=H-1 is seen somewhere then that pair (H,S) is used, and a boot message
is printed "Adopting Geometry from Partition Table".
<p>
<tag/fdomain:/ <p>
Find the geometry information in the BIOS Drive Parameter Table,
or read the partition table and use H=endH+1, S=endS for the first
partition, provided it is nonempty, or use H=64, S=32 for M &lt; 2^21 (1 GiB),
H=128, S=63 for M &lt; 63<tt/*/2^17 (3.9 GiB) and H=255, S=63 otherwise.
<p>
<tag/in2000:/ <p>
Use the first of (H,S) = (64,32), (64,63), (128,63), (255,63)
that will make C &lt;= 1024. In the last case, truncate C at 1023.
<p>
<tag/seagate:/ <p>
Read C,H,S from the disk. (Horrors!) If C or S is too large, then
put S=17, H=2 and double H until C &lt;= 1024. This means that H will
be set to 0 if M &gt; 128<tt/*/1024<tt/*/17 (1.1 GiB). This is a bug.
<p>
<tag/ultrastor and u14_34f:/ <p>
One of three mappings
((H,S) = (16,63), (64,32), (64,63))
is used depending on the controller mapping mode.
<p>
</descrip>
If the driver does not specify the geometry, we fall back
on an educated guess using the partition table, or using the
total disk capacity.
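<p>
As an illustration of the driver rules listed above, here is the rule
given for aha152x and friends as a Python sketch. It is written from
the description in this text, not from the driver sources, and the
function name is invented. M is the capacity in sectors.
<tscreen><verb>
def aha152x_style_geometry(m):
    h, s = 64, 32
    c = m // (h * s)
    if c > 1024:
        h, s = 255, 63
        c = min(1023, m // (h * s))
    return c, h, s

# The 9 GB disk from the boot message shown earlier: C is truncated,
# so C*H*S no longer approximates the capacity.
print(aha152x_style_geometry(17755792))   # (1023, 255, 63)
</verb></tscreen>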
<p>
Look at the partition table. Since by convention partitions end
on a cylinder boundary, we can, given <tt/end = (endC,endH,endS)/
for any partition, just put H = <tt/endH+1/ and S = <tt/endS/. (Recall
that sectors are counted from 1.)
More precisely, the following is done.
If there is a nonempty partition, pick the partition with the largest <tt/beginC/.
For that partition, look at <tt/end+1/, computed
both by adding <tt/start/ and <tt/length/ and by assuming that this
partition ends on a cylinder boundary. If both values agree, or
if <tt/endC/ = 1023 and <tt/start+length/ is an integral multiple of
<tt/(endH+1)<tt/*/endS/,
then assume that this partition really was aligned on a cylinder
boundary, and put H = <tt/endH+1/ and S = <tt/endS/.
If this fails, either because there are no partitions, or because
they have strange sizes, then look only at the disk capacity M.
Algorithm: put H = M/(62<tt/*/1024) (rounded up), S = M/(1024<tt/*/H)
(rounded up), C = M/(H<tt/*/S) (rounded down).
This has the effect of producing a (C,H,S) with C at most 1024
and S at most 62.
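<p>
The capacity-only algorithm is short enough to write out in Python.
This is a transcription of the three formulas above (the function name
is invented), for a capacity M greater than zero.
<tscreen><verb>
def guess_from_capacity(m):
    h = -(-m // (62 * 1024))     # M/(62*1024), rounded up
    s = -(-m // (1024 * h))      # M/(1024*H), rounded up
    c = m // (h * s)             # M/(H*S), rounded down
    return c, h, s

print(guess_from_capacity(2000000))   # (1008, 32, 62)
</verb></tscreen>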
<sect>
Clipped disks
<p>
<sect1>
The Linux IDE 8 GiB limit
<p>
The Linux IDE driver gets the geometry and capacity of a disk
(and lots of other stuff) by using an
<ref id="identify" name="ATA IDENTIFY"> request.
Linux kernels older than 2.0.34/2.1.90 would not believe the returned value
of lba_capacity if it was more than 10% larger than the capacity
computed by C<tt/*/H<tt/*/S. However, by industry agreement
large IDE disks (with more than 16514064 sectors)
return C=16383, H=16, S=63, for a total of 16514064 sectors (8.4 GB, or 7.8 GiB)
independent of their actual size, but give their actual size in
lba_capacity.
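<p>
The effect of the old check on a large disk can be seen in a small
Python model. It follows the sentence above (`more than 10% larger');
the exact comparison in the old kernel source may differ, and the
function name is invented.
<tscreen><verb>
def old_lba_capacity_is_ok(lba_capacity, c, h, s):
    """Distrust lba_capacity when it exceeds C*H*S by more than 10%."""
    chs = c * h * s
    return lba_capacity * 10 <= chs * 11

print(old_lba_capacity_is_ok(16514064, 16383, 16, 63))   # True
print(old_lba_capacity_is_ok(71346240, 16383, 16, 63))   # False
</verb></tscreen>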
<p>
Since versions 2.0.34/2.1.90, Linux kernels know about this
and do the right thing. If you have an older Linux kernel and do
not want to upgrade, and this kernel only sees 8 GiB of a much larger disk,
then try changing the routine <tt/lba_capacity_is_ok/ in
<tt>/usr/src/linux/drivers/block/ide.c</tt> into something like
<tscreen><verb>
static int lba_capacity_is_ok (struct hd_driveid *id) {
id->cyls = id->lba_capacity / (id->heads * id->sectors);
return 1;
}
</verb></tscreen>
For a more cautious patch, see 2.1.90.
<sect1>
BIOS complications
<p>
As just mentioned, large disks return the geometry
C=16383, H=16, S=63 independent of the actual size,
while the actual size is returned in the value of LBAcapacity.
Some BIOSes do not recognize this, and translate this
16383/16/63 into something with fewer cylinders and more heads,
for example 1024/255/63 or 1027/255/63. So, the kernel must not
only recognize the single geometry 16383/16/63, but also all
BIOS-mangled versions of it.
Since 2.2.2 this is done correctly (by taking the BIOS idea
of H and S, and computing C = capacity/(H*S)).
Usually this problem is solved by setting the disk to Normal
in the BIOS setup (or, even better, to None, not mentioning
it at all to the BIOS). If that is impossible because you have
to boot from it or use it also with DOS/Windows, and upgrading
to 2.2.2 or later is not an option, use kernel boot parameters.
<p>
If a BIOS reports 16320/16/63, then this is usually done
in order to get 1024/255/63 after translation.
<p>
There is an additional problem here. If the disk was partitioned
using a geometry translation, then the kernel may at boot time
see this geometry used in the partition table, and report
<tt>hda: [PTBL] [1027/255/63]</tt>. This is bad, because now the
disk is only 8.4 GB. This was fixed in 2.3.21. Again, kernel
boot parameters will help.
<sect1>
Jumpers that select the number of heads
<label id="jumpers">
<p>
Many disks have jumpers that allow you to choose between
a 15-head and a 16-head geometry. The default settings will give
you a 16-head disk. Sometimes both geometries address the same
number of sectors, sometimes the 15-head version is smaller.
There may be a good reason for this setup: Petri Kaukasoina
writes: `A 10.1 Gig IBM Deskstar 16 GP (model IBM-DTTA-351010) was
jumpered for 16 heads as default but this old PC (with AMI BIOS)
didn't boot and I had to jumper it for 15 heads. hdparm -i tells
RawCHS=16383/15/63 and LBAsects=19807200. I use 20960/15/63 to
get the full capacity.'
For the jumper settings, see
<htmlurl
name="http://www.hitachigst.com/hdd/support/jumpers.htm"
url="http://www.hitachigst.com/hdd/support/jumpers.htm">.
<sect1>
Jumpers that clip total capacity
<p>
Many disks have jumpers that allow you to make the disk
appear smaller than it is. A silly thing to do, and probably
no Linux user ever wants to use this, but some BIOSes crash
on big disks. The usual solution is to keep the disk entirely
out of the BIOS setup. But this may be feasible only if the
disk is not your boot disk.
<p>
<sect2>Clip to 2.1 GB<p>
The first serious limit was the 4096 cylinder limit (that is,
with 16 heads and 63 sectors/track, 2.11 GB).
For example, a Fujitsu MPB3032ATU 3.24 GB disk has default geometry
6704/15/63, but can be jumpered to appear as 4092/16/63,
and then reports LBAcapacity 4124736 sectors, so that the operating
system cannot guess that it is larger in reality.
In such a case (with a BIOS that crashes if it hears how big the disk is
in reality, so that the jumper is required) one needs boot parameters
to tell Linux about the size of the disk.
<p>
That is unfortunate. Most disks that can be jumpered to appear as a 2 GB disk
then report a clipped geometry like 4092/16/63 or 4096/16/63, but still
report full LBAcapacity. Such disks will work well, and use full capacity
under Linux, regardless of jumper settings.
<p>
<sect2>Clip to 33 GB
<label id="jumperbig">
<p>
A more recent limit is <ref id="verylarge" name="the 33.8 GB limit">.
Linux kernels older than 2.2.14 / 2.3.21 need a patch to be able to cope with
IDE disks larger than this.
<p>
With an old BIOS and a disk larger than 33.8 GB, the BIOS may hang,
and in such cases booting may be impossible, even when the disk
is removed from the CMOS settings.
<!-- doesnt exist anymore
See also <htmlurl name="the BIOS 33.8 GB limit"
url="http://www.storage.ibm.com/techsup/hddtech/bios338gb.htm">.
-->
<p>
Therefore, large IBM and Maxtor and Seagate disks come with a jumper
that makes the disk appear as a 33.8 GB disk.
For example, the IBM Deskstar 37.5 GB (DPTA-353750) with 73261440 sectors
(corresponding to 72680/16/63, or 4560/255/63) can be jumpered to appear
as a 33.8 GB disk, and then reports geometry 16383/16/63 like any big disk,
but LBAcapacity 66055248 (corresponding to 65531/16/63, or 4111/255/63).
Similar things hold for recent large Maxtor disks.
<p>
Below are some more details that used
to be relevant but can probably be ignored now.
<p>
<sect3>Maxtor<p>
With the jumper present, both the geometry (16383/16/63) and the size
(66055248) are conventional and give no information about the actual size.
Moreover, attempts to access sector 66055248 and above yield I/O errors.
However, on Maxtor drives the actual size can be found and made accessible
using the READ NATIVE MAX ADDRESS and SET MAX ADDRESS commands.
Presumably this is what MaxBlast/EZ-Drive does.
There is a small Linux utility
<htmlurl url="http://www.win.tue.nl/~aeb/linux/setmax.c" name="setmax.c">
that does the same. Only very few disks need it - almost always
CONFIG_IDEDISK_STROKE does the trick.
<p>
For drives larger than 137 GB also READ NATIVE MAX ADDRESS returns
a conventional value, namely 0xfffffff, corresponding to 137 GB.
Here READ NATIVE MAX ADDRESS EXT and SET MAX ADDRESS EXT (using
48-bit addressing) are required. The <tt>setmax</tt> utility does not yet
know about this. A very small patch makes 2.5.3 handle this situation.
<p>
Early large Maxtor disks
<!-- (early releases of the 36GB drive in the DM36 family) -->
<!-- (older models of the DiamondMax 36 and 40) -->
<!-- 36GB confirmed -->
have an additional detail: the J46 jumper for these 34-40 GB disks
changes the geometry from 16383/16/63 to 4092/16/63 and does not
change the reported LBAcapacity.
This means that also with jumper present the BIOS (old Award 4.5*)
will hang at boot time. For this case Maxtor provides a utility
<htmlurl url="http://www.maxtor.com/technology/technotes/20012.html"
name="JUMPON.EXE"> that upgrades the firmware to make J46 behave as
described above.
<p>
On recent Maxtor drives the call <tt>setmax -d 0 /dev/hdX</tt> will
give you max capacity again. However, on slightly older drives a
firmware bug does not allow you to use <tt>-d 0</tt>, and
<tt>setmax -d 255 /dev/hdX</tt> returns you to almost full capacity.
For Maxtor D540X-4K, see below.
<p>
<sect3>IBM<p>
For IBM things are worse: the jumper really clips capacity
and there is no software way to get it back. The solution is
not to use the jumper but use <tt>setmax -m 66055248 /dev/hdX</tt>
to software-clip the disk. "How?" you say - "I cannot boot!".
IBM gives the tip: <it>If a system with Award BIOS hangs during drive
detection: Reboot the system and hold the F4 key to bypass autodetection
of the drive(s).</it> If this doesn't help, find a different computer,
connect the drive to it, and run <tt>setmax</tt> there. After doing this
you go back to the first machine and are in the same situation as
with jumpered Maxtor disks: booting works, and after getting past
the BIOS either a patched kernel or a <tt>setmax -d 0</tt>
gets you full capacity.
<p>
Thomas Charbonnel reports on a different approach:
"I had a 80 GB IBM IC35L080AVVA07-0 drive and installed IBM's
Disk Manager. Installed my boot loader on the drive's MBR.
Everything worked fine. Note that the IDE drive must become
the boot drive so that one can install only one 34+ GB drive
using this approach."
<p>
<sect3>Seagate<p>
Seagate disks have a jumper that will clip the reported number
of cylinders to 4092 on drives smaller than 33.8 GB, while it
will limit the reported LBA capacity (Identify words 60/61) to
33.8 GB on larger disks.
<p>
For models ST-340810A, ST-360020A, ST-380020A:
The ATA Read Native Max and Set Max commands may be used to reset
the true full capacity.
<p>
For models ST-340016A, ST-340823A, ST-340824A, ST-360021A, ST-380021A:
The ATA Set Features F1 sub-command will cause Identify Data words
60-61 to report the true full capacity.
<p>
<sect3>Maxtor D540X-4K<p>
The Maxtor DiamondMax drives 4K080H4, 4K060H3, 4K040H2 (aka D540X-4K)
are identical to the drives 4D080H4, 4D060H3, 4D040H2 (aka D540X-4D),
except that the jumper settings differ. A Maxtor FAQ specifies the
Master/Slave/CableSelect settings for them, but the capacity clip jumper
for the "4K" drives seems to be undocumented. Nils Ohlmeier reports that
he experimentally finds that it is the J42 jumper ("reserved for
factory use") closest to the power connector.
(The "4D" drives use the J46 jumper, like all other Maxtor drives.)
<p>
However, it may be that this undocumented jumper acts like the IBM jumper:
the machine boots correctly, but the disk has been clipped to 33 GB
and <tt>setmax -d 0</tt> does not help to get full capacity back.
And the IBM solution works: do not use any disk-clipping jumpers, but
first put the disk in a machine with non-broken BIOS, soft-clip it
with <tt>setmax -m 66055248 /dev/hdX</tt>, then put it back in the
first machine, and after booting run <tt>setmax -d 0 /dev/hdX</tt>
to get full capacity again.
<p>
In the meantime, some docs and pictures have appeared on the Maxtor site,
confirming part of the above. Compare
<figure><eps file="absent">
<img src="images/MaxtorStyle.gif">
</figure>
<figure><eps file="absent">
<img src="images/MaxtorStyleB.gif">
</figure>
<figure><eps file="absent">
<img src="images/MaxtorStyleC.gif">
</figure>
<p>
<sect3>Western Digital<p>
Some info, including the settings for capacity-clipping jumpers, is given on
<htmlurl url="http://support.wdc.com/techinfo/general/jumpers.asp"
name="the Western Digital site">. I do not know what precisely
these jumpers do.
<p>
<sect1>READ NATIVE MAX ADDRESS / SET MAX ADDRESS<p>
If an IDE/ATA disk has support for the Host Protected Area (HPA) feature set,
then it is possible to set the LBA capacity to any value below
the actual capacity. Access past the assigned point usually leads
to I/O errors. Since classical software finds out about the disk size
by looking at the LBA capacity field of the Identify information,
such software will not suspect that the disk actually is larger.
<p>
The actual total size of the disk is read using the
READ NATIVE MAX ADDRESS command.
The clipped ("soft") disk size is set using the SET MAX ADDRESS command.
It comes in two flavours: if the "volatile" bit is set, the
command will have effect until the next reboot or hardware reset;
otherwise the effect is permanent.
It is possible to protect settings with a password.
(For details, see the ATA standard.)
<p>
This clipped size has (at least) two applications:
on the one hand it is possible to fake a smaller disk,
so that the BIOS will not have problems, and have Linux,
or (for DOS/Windows) a disk manager restore total size;
on the other hand one can have a vendor area at the end,
inaccessible to the ordinary user.
<p>
For many of the disks discussed above, setting a jumper has
precisely this effect: LBA capacity is diminished while
the native max capacity remains the same, and a SET MAX ADDRESS
command will restore full capacity.
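<p>
The behaviour described in this section can be pictured with a toy model.
This is not driver code and talks to no hardware: the class
<tt>ToyDisk</tt>, its method names and the 80 GB sector count are all
invented for illustration, and the model simply follows the description
of the two flavours of SET MAX ADDRESS given above.
<tscreen><verb>
```python
class ToyDisk:
    """Toy model of a disk with a Host Protected Area (sizes in sectors)."""

    def __init__(self, native_sectors):
        self.native = native_sectors      # what READ NATIVE MAX ADDRESS reveals
        self.permanent = native_sectors   # setting that survives a reset
        self.capacity = native_sectors    # what Identify reports right now

    def read_native_max(self):
        return self.native

    def set_max(self, sectors, volatile=False):
        if sectors > self.native:
            raise ValueError("cannot set more than the native capacity")
        self.capacity = sectors
        if not volatile:
            self.permanent = sectors      # permanent flavour

    def reset(self):
        # A volatile setting is forgotten at reboot or hardware reset.
        self.capacity = self.permanent

disk = ToyDisk(native_sectors=156301488)  # hypothetical 80 GB disk
disk.set_max(66055248)                    # permanent clip, for the BIOS
disk.reset()
print(disk.capacity)                      # 66055248
disk.set_max(disk.read_native_max(), volatile=True)
print(disk.capacity)                      # 156301488
disk.reset()
print(disk.capacity)                      # 66055248
```
</verb></tscreen>
The last three steps are the routine for a broken BIOS: the disk looks
small at every boot, and the full size is restored afterwards.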
<sect1>CONFIG_IDEDISK_STROKE<p>
The CONFIG_IDEDISK_STROKE option of Linux 2.4.19/2.5.3 and later
tells Linux to read the native max capacity and do a
SET MAX ADDRESS to get access to full capacity.
This configuration option lives under the heading
"Auto-Geometry Resizing support" in the
"IDE, ATA and ATAPI block devices" kernel configuration section.
<p>
The configuration option went away in 2.6.7
and was replaced by a (per-disk) boot parameter,
so that one can say "hda=stroke".
<p>
With this "stroke" option jumpered disks will in many cases
be handled correctly, i.e., be seen with full capacity
(in spite of the jumper). And the same holds when the disk
got a Host Protected Area in some other (non-jumper) way.
<p>
This is the preferred way to handle disks that need a jumper
because of a broken BIOS.
<sect>
The Linux 65535 cylinder limit
<p>
The <tt/HDIO_GETGEO/ ioctl returns the number of cylinders in a short.
This means that if you have more than 65535 cylinders, the number is
truncated, and (for a typical SCSI setup with 1 MiB cylinders)
an 80 GiB disk may appear as a 16 GiB one.
Once one recognizes what the problem is, it is easily avoided.
Use fdisk 2.10i or newer.
<p>
(The programming convention is to use the <tt/BLKGETSIZE/ ioctl
to get total size, and <tt/HDIO_GETGEO/ to get number of heads and
sectors/track, and, if needed, get C by C = size/(H*S).)
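<p>
A minimal sketch of this convention, in Python rather than C.
The ioctl numbers and the layout of <tt>struct hd_geometry</tt> are taken
from the Linux headers as I know them (check linux/hdreg.h and linux/fs.h
on your system); the function names are made up, and
<tt>query_disk</tt> needs Linux and read permission on the device.
<tscreen><verb>
```python
import struct

HDIO_GETGEO = 0x0301   # linux/hdreg.h
BLKGETSIZE = 0x1260    # linux/fs.h; result is in 512-byte sectors
HD_GEOMETRY = "BBHL"   # heads, sectors, cylinders (a short!), start

def parse_hd_geometry(buf):
    """Unpack struct hd_geometry; the cylinder field may be truncated."""
    heads, sectors, cylinders, _start = struct.unpack(HD_GEOMETRY, buf)
    return heads, sectors, cylinders

def cylinders_from_size(total_sectors, heads, sectors):
    """C = size/(H*S), computed without going through a short."""
    return total_sectors // (heads * sectors)

def query_disk(path):
    """Return (heads, sectors/track, cylinders) for a block device."""
    import fcntl  # Linux only, hence imported here
    with open(path, "rb") as dev:
        geo = fcntl.ioctl(dev, HDIO_GETGEO, bytes(struct.calcsize(HD_GEOMETRY)))
        size = fcntl.ioctl(dev, BLKGETSIZE, bytes(struct.calcsize("L")))
    heads, sectors, _short_cyls = parse_hd_geometry(geo)
    (total_sectors,) = struct.unpack("L", size)
    return heads, sectors, cylinders_from_size(total_sectors, heads, sectors)
```
</verb></tscreen>
For example, <tt>query_disk("/dev/hda")</tt> would be called as root;
the cylinder field returned by the geometry ioctl is deliberately ignored.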
<sect1>
IDE problems with 34+ GB disks
<label id="verylarge">
<p>
(Below is a discussion of Linux kernel problems. BIOS problems
and jumpers that clip capacity were discussed
<ref id="jumperbig" name="above">.)
<p>
Drives larger than 33.8 GB will not work with kernels older than
2.0.39 / 2.2.14 / 2.3.21.
The details are as follows.
Suppose you bought a new IBM-DPTA-373420 disk with a capacity
of 66835440 sectors (34.2 GB). Pre-2.3.21 kernels will tell you
that the size is 769*16*63 = 775152 sectors (0.4 GB), which
is a bit disappointing. And giving command line parameters
hdc=4160,255,63 doesn't help at all - these are just ignored.
What happens? The routine idedisk_setup()
retrieves the geometry reported by the disk (which is
16383/16/63) and overwrites what the user specified on
the command line, so that the user data is used only
for the BIOS geometry. The routine current_capacity()
or idedisk_capacity() recomputes the cylinder number as
66835440/(16*63)=66305, but since this is stored in a short,
it becomes 769. Since lba_capacity_is_ok() destroyed id->cyls,
every following call to it will return false, so that the
disk capacity becomes 769*16*63.
<p>
For several kernels a patch is available.
A patch for 2.0.38 can be found at
<htmlurl url="ftp://ftp.us.kernel.org/pub/linux/kernel/people/aeb/"
name="ftp.kernel.org">.
A patch for 2.2.12 can be found at
<htmlurl name="www.uwsg.indiana.edu"
url="http://www.uwsg.indiana.edu/hypermail/linux/kernel/9910.2/0636.html">
(some editing may be required to get rid of the html markup).
The 2.2.14 kernels do support these disks.
In the 2.3.* kernel series, there is support for these disks
since 2.3.21.
One can also `solve' the problem in hardware by
<ref id="jumperbig" name="using a jumper"> to clip the size to 33.8 GB.
In many cases a <ref id="biosupgrades" name="BIOS upgrade"> will be
required if one wants to boot from the disk.
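<p>
The arithmetic of this section is easily replayed. The sketch below only
imitates what storing a cylinder count in an unsigned short does; it is
not the kernel code, and the function names are invented. The 64 heads
and 32 sectors/track in the second example are just one way to get
1 MiB cylinders.
<tscreen><verb>
```python
def as_unsigned_short(value):
    """Keep only the low 16 bits, as storing into an unsigned short does."""
    return value % 65536

def truncated_capacity(total_sectors, heads, sectors):
    """Capacity in sectors after the cylinder count went through a short."""
    cylinders = total_sectors // (heads * sectors)
    return as_unsigned_short(cylinders) * heads * sectors

# The IBM-DPTA-373420 discussed above: 66835440 sectors, geometry */16/63.
print(66835440 // (16 * 63))                   # 66305
print(as_unsigned_short(66305))                # 769
print(truncated_capacity(66835440, 16, 63))    # 775152

# An 80 GiB disk with 1 MiB cylinders (2048 sectors per cylinder).
sectors_80gib = 80 * 2**30 // 512
print(truncated_capacity(sectors_80gib, 64, 32) * 512 // 2**30)  # 16
```
</verb></tscreen>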
<sect>
Extended and logical partitions
<p>
<ref id="partitiontable" name="Above,"> we saw the structure of
the MBR (sector 0): boot loader code followed by 4 partition
table entries of 16 bytes each, followed by an AA55 signature.
Partition table entries of type 5 or F or 85 (hex) have a special
significance: they describe <it>extended</it> partitions: blobs of
space that are further partitioned into <it>logical</it> partitions.
(So, an extended partition is only a box, it cannot be used itself,
one uses the logical partitions inside.)
Only the location of the first sector of an extended partition is
important. This first sector contains a partition table with four
entries: one a logical partition, one an extended partition, and
two unused. In this way one gets a chain of partition table sectors,
scattered over the disk, where the first one describes three primary
partitions and the extended partition, and each following partition
table sector describes one logical partition and the location of
the next partition table sector.
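<p>
As an illustration of this layout, here is a small sketch that decodes
one partition table sector. It reads only the type byte and the LBA
start and size fields of each 16-byte entry and ignores the CHS fields;
the function name and the dictionary keys are my own choice.
<tscreen><verb>
```python
EXTENDED_TYPES = {0x05, 0x0F, 0x85}
TABLE_OFFSET = 446   # 512 - 2 signature bytes - 4 entries of 16 bytes
ENTRY_SIZE = 16

def parse_partition_table(sector):
    """Return the four entries of a partition table sector as dicts."""
    # The AA55 signature is stored little-endian: bytes 55 AA on disk.
    if len(sector) != 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("no AA55 signature: not a partition table sector")
    entries = []
    for i in range(4):
        begin = TABLE_OFFSET + ENTRY_SIZE * i
        raw = sector[begin:begin + ENTRY_SIZE]
        ptype = raw[4]                                # partition type byte
        start = int.from_bytes(raw[8:12], "little")   # first sector (LBA)
        count = int.from_bytes(raw[12:16], "little")  # number of sectors
        entries.append({"type": ptype, "start": start, "sectors": count,
                        "extended": ptype in EXTENDED_TYPES})
    return entries
```
</verb></tscreen>
Feeding it the first 512 bytes of a disk gives the primary partitions;
feeding it the first sector of an extended partition gives one logical
partition and the link to the next table sector.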
<p>
It is important to understand this: When people do something stupid
while partitioning a disk, they want to know: Is my data still there?
And the answer is usually: Yes. But if logical partitions were created
then the partition table sectors describing them are written at the
beginning of these logical partitions, and data that was there before is lost.
<p>
The program sfdisk will show the full chain. E.g.,
<tscreen><verb>
# sfdisk -l -x /dev/hda
Disk /dev/hda: 16 heads, 63 sectors, 33483 cylinders
Units = cylinders of 516096 bytes, blocks of 1024 bytes, counting from 0
Device     Boot  Start     End   #cyls    #blocks   Id  System
/dev/hda1           0+    101     102-     51376+  83  Linux
/dev/hda2         102    2133    2032    1024128   83  Linux
/dev/hda3        2134   33482   31349   15799896    5  Extended
/dev/hda4           0       -       0          0    0  Empty
/dev/hda5        2134+   6197    4064-   2048224+  83  Linux
    -            6198   10261    4064    2048256    5  Extended
    -            2134    2133       0          0    0  Empty
    -            2134    2133       0          0    0  Empty
/dev/hda6        6198+  10261    4064-   2048224+  83  Linux
    -           10262   16357    6096    3072384    5  Extended
    -            6198    6197       0          0    0  Empty
    -            6198    6197       0          0    0  Empty
...
/dev/hda10      30581+  33482    2902-   1462576+  83  Linux
    -           30581   30580       0          0    0  Empty
    -           30581   30580       0          0    0  Empty
    -           30581   30580       0          0    0  Empty
#
</verb></tscreen>
<p>
It is possible to construct bad partition tables.
Many kernels get into a loop if some extended partition points back
to itself or to an earlier partition in the chain.
It is possible to have two extended partitions in one of these
partition table sectors so that the partition table chain forks.
(This can happen for example with an fdisk that does not recognize
each of 5, F, 85 as an extended partition, and creates a 5 next to an F.)
No standard fdisk type program can handle such situations, and some
handwork is required to repair them.
The Linux kernel will accept a fork at the outermost level.
That is, you can have two chains of logical partitions.
Sometimes this is useful - for example, one can use type 5 and be
seen by DOS, and the other type 85, invisible for DOS, so that
DOS FDISK will not crash because of logical partitions past cylinder 1024.
Usually one needs <tt>sfdisk</tt> to create such a setup.
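<p>
To see how a program can protect itself against such bad tables, here is
a sketch of a chain walker that refuses loops and forks inside an
extended partition. It assumes the usual convention that the start of a
logical partition is given relative to its own table sector, while the
link to the next table sector is relative to the start of the outermost
extended partition. The callback <tt>read_table</tt> is hypothetical: it
must return the four (type, start, size) entries of the table sector
at a given sector number.
<tscreen><verb>
```python
EXTENDED_TYPES = {0x05, 0x0F, 0x85}

def walk_logical_chain(read_table, extended_start):
    """Yield (absolute start, sectors) for each logical partition."""
    seen = set()
    table_lba = extended_start
    while True:
        if table_lba in seen:
            raise ValueError(f"chain loops back to sector {table_lba}")
        seen.add(table_lba)
        links = []
        for ptype, start, sectors in read_table(table_lba):
            if ptype == 0:
                continue                      # unused entry
            if ptype in EXTENDED_TYPES:
                links.append(start)           # link to the next table sector
            else:
                yield table_lba + start, sectors
        if len(links) > 1:
            raise ValueError(f"chain forks at sector {table_lba}")
        if not links:
            return                            # end of the chain
        table_lba = extended_start + links[0]
```
</verb></tscreen>
A fork at the outermost level (two extended partitions in the MBR) is
outside the scope of this walker: one would simply call it once for each
of the two chains.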
<p>
<sect>
Problem solving
<p>
Many people think they have problems, while in fact nothing is wrong.
Or, they think that the problems they have are due to disk geometry,
while in fact disk geometry has nothing to do with the matter.
All of the above may have sounded complicated, but disk geometry
handling is extremely easy: do nothing at all, and all is fine;
or perhaps give LILO the keyword <tt>lba32</tt> if it doesn't get past
`LI' when booting. Watch the kernel boot messages, and
remember: the more you fiddle with geometries (specifying heads
and cylinders to LILO and fdisk and on the kernel command line)
the less likely it is that things will work.
Roughly speaking, all is fine by default.
<p>
And remember: nowhere in Linux is disk geometry used, so no problem
you have while running Linux can be caused by disk geometry.
Indeed, disk geometry is used only by LILO and by fdisk.
So, if LILO fails to boot the kernel, that may be a geometry problem.
If different operating systems do not understand the partition table,
that may be a geometry problem. Nothing else. In particular, if
mount doesn't seem to work, never worry about disk geometry -
the problem is elsewhere.
<p>
<sect1>
Problem: My IDE disk gets a bad geometry when I boot from SCSI.
<p>
It is quite possible that a disk gets the wrong geometry.
The Linux kernel asks the BIOS about hd0 and hd1 (the BIOS drives
numbered 80H and 81H) and assumes that this data is for hda and hdb.
But on a system that boots from SCSI, the first two disks may well
be SCSI disks, and thus it may happen that the fifth disk, which is
the first IDE disk hda, gets assigned a geometry belonging to sda.
Such things are easily solved by giving boot parameters
`hda=C,H,S' for the appropriate numbers C, H and S, either at boot time
or in /etc/lilo.conf.
<p>
Since Linux 2.5.51 this BIOS information is not used anymore,
and the same problem occurs for all disks. See below.
<sect1>
Nonproblem: Identical disks have different geometry?
<p>
`I have two identical 10 GB IBM disks. However, fdisk
gives different sizes for them. Look:
<tscreen><verb>
# fdisk -l /dev/hdb
Disk /dev/hdb: 255 heads, 63 sectors, 1232 cylinders
Units = cylinders of 16065 * 512 bytes
Device    Boot  Start    End   Blocks  Id  System
/dev/hdb1           1   1232  9896008+ 83  Linux native
# fdisk -l /dev/hdd
Disk /dev/hdd: 16 heads, 63 sectors, 19650 cylinders
Units = cylinders of 1008 * 512 bytes
Device    Boot  Start    End   Blocks  Id  System
/dev/hdd1           1  19650  9903568+ 83  Linux native
</verb></tscreen>
How come?'
<p>
What is happening here? Well, first of all these drives
really are 10 GB: hdb has size 255<tt/*/63<tt/*/1232<tt/*/512 = 10133544960,
and hdd has size 16<tt/*/63<tt/*/19650<tt/*/512 = 10141286400, so, nothing
is wrong and the kernel sees both as 10.1 GB.
Why the difference in size? That is because the kernel gets
data for the first two IDE disks from the BIOS, and the BIOS
has remapped hdb to have 255 heads (and 16<tt/*/19650/255=1232 cylinders).
The rounding down here costs almost 8 MB.
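<p>
The numbers in this example can be verified with a few lines of Python.
The helper names are invented, and 512-byte sectors are assumed as in
the fdisk output above.
<tscreen><verb>
```python
SECTOR_BYTES = 512

def disk_bytes(cylinders, heads, sectors):
    """Size in bytes of a disk with the given C/H/S geometry."""
    return cylinders * heads * sectors * SECTOR_BYTES

def remap_cylinders(cylinders, heads, new_heads):
    """Cylinder count after translation to new_heads, rounded down."""
    return cylinders * heads // new_heads

hdb = disk_bytes(1232, 255, 63)
hdd = disk_bytes(19650, 16, 63)
print(hdb)                               # 10133544960
print(hdd)                               # 10141286400
print(remap_cylinders(19650, 16, 255))   # 1232
print(hdd - hdb)                         # 7741440 bytes, almost 8 MB
```
</verb></tscreen>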
<p>
If you would like to remap hdd in the same way, give the kernel
boot parameters `hdd=1232,255,63'.
<p>
On the other hand, if the disk is not shared with DOS or so,
it may be better to set hdb to Normal in the BIOS setup,
instead of asking for some translation like LBA.
<p>
Since Linux 2.5.51, the IDE driver no longer uses BIOS info on the first
two disks, and the different treatment of the first two disks has disappeared.
<sect1>
Problem: 2.4 and 2.6 report different geometries?
2.6 reports the wrong geometry? 2.6 reports no geometry at all?
<p>
Since geometry does not exist, it is not surprising that each of
2.0/2.2/2.4/2.6 reports a somewhat different disk geometry.
<p>
Some people will maintain that geometry <em/does/ exist; by this they
mean not a property of the disk but the values
reported by the BIOS. That is what several other operating systems
will use. Since Linux 2.5.51, the kernel no longer uses the values
reported by the BIOS - it is difficult to match BIOS device numbers
with Linux disk names, maybe data is only available for two disks,
maybe some disks are not present in the BIOS setup, etc.
However, if one needs these values, since Linux 2.6.5 one can set
CONFIG_EDD and mount sysfs, and then find the BIOS data for the
various disks under <tt>/sys/firmware/edd/int13_dev*</tt>.
Now the matching of BIOS numbers, represented in directory names
like <tt>int13_dev82</tt>, with Linux names like <tt>sda</tt> can
be done by user space software, possibly with help from the user.
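<p>
A user space program might start by collecting this data. The sketch
below makes no assumption about which attribute files the kernel
provides under each <tt>int13_dev*</tt> directory: it just reads
every readable text file it finds there. The function name
<tt>read_edd</tt> is made up, and on a system without CONFIG_EDD it
returns an empty result.
<tscreen><verb>
```python
from pathlib import Path

def read_edd(root="/sys/firmware/edd"):
    """Map each int13_devXX directory name to {file name: contents}."""
    result = {}
    for dev_dir in sorted(Path(root).glob("int13_dev*")):
        attrs = {}
        for entry in sorted(dev_dir.iterdir()):
            if not entry.is_file():
                continue                  # skip links to other directories
            try:
                attrs[entry.name] = entry.read_text().strip()
            except (OSError, UnicodeDecodeError):
                pass                      # unreadable or binary attribute
        result[dev_dir.name] = attrs
    return result

for name, attrs in read_edd().items():
    print(name, attrs)
```
</verb></tscreen>
Matching a name like <tt>int13_dev82</tt> with a Linux disk is then
up to the caller.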
<p>
This 2.5.51 change caused problems when many people using both Linux
and Windows on the same disk upgraded from 2.4 to 2.6 and partitioned
with a version of <tt>parted</tt> that had not yet
been updated. I have not checked whether current parted is OK.
<sect1>
Nonproblem: fdisk sees much more room than df?
<p>
fdisk will tell you how many blocks there are on the disk.
If you make a filesystem on the disk, say with mke2fs, then
this filesystem needs some space for bookkeeping - typically
something like 4% of the filesystem size, more if you ask for
a lot of inodes during mke2fs. For example:
<tscreen><verb>
# sfdisk -s /dev/hda9
4095976
# mke2fs -i 1024 /dev/hda9
mke2fs 1.12, 9-Jul-98 for EXT2 FS 0.5b, 95/08/09
...
204798 blocks (5.00%) reserved for the super user
...
# mount /dev/hda9 /somewhere
# df /somewhere
Filesystem  1024-blocks  Used  Available  Capacity  Mounted on
/dev/hda9       3574475    13    3369664        0%  /mnt
# df -i /somewhere
Filesystem   Inodes  IUsed    IFree  %IUsed  Mounted on
/dev/hda9   4096000     11  4095989      0%  /mnt
#
</verb></tscreen>
We have a partition with 4095976 blocks, make an ext2 filesystem
on it, mount it somewhere and find that it only has 3574475 blocks -
521501 blocks (12%) were lost to inodes and other bookkeeping.
Note that the difference between the total 3574475 and the 3369664
available to the user is the 13 blocks in use plus the 204798
blocks reserved for root. This latter number can be changed by tune2fs.
This `-i 1024' is only reasonable for news spools and the like,
with lots and lots of small files. The default would be:
<tscreen><verb>
# mke2fs /dev/hda9
# mount /dev/hda9 /somewhere
# df /somewhere
Filesystem  1024-blocks  Used  Available  Capacity  Mounted on
/dev/hda9       3958475    13    3753664        0%  /mnt
# df -i /somewhere
Filesystem   Inodes  IUsed    IFree  %IUsed  Mounted on
/dev/hda9   1024000     11  1023989      0%  /mnt
#
</verb></tscreen>
Now only 137501 blocks (3.3%) are used for inodes, so that we have
384 MB more than before. (Apparently, each inode takes 128 bytes.)
On the other hand, this filesystem can have at most 1024000 files
(more than enough), against 4096000 (too many) earlier.
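<p>
The guess of 128 bytes per inode can be checked against the figures
above. The sketch assumes 1024-byte blocks, as in the df output, and
uses an invented helper name.
<tscreen><verb>
```python
INODE_BYTES = 128    # the guess made in the text
BLOCK_BYTES = 1024   # df above counts 1024-byte blocks

def inode_table_blocks(inodes):
    """Blocks taken by the inode tables for a given number of inodes."""
    return inodes * INODE_BYTES // BLOCK_BYTES

many = inode_table_blocks(4096000)
few = inode_table_blocks(1024000)
print(many, few)                      # 512000 128000
print(many - few)                     # 384000 blocks, i.e. 384 MB
print(521501 - many, 137501 - few)    # 9501 9501: other bookkeeping
```
</verb></tscreen>
The difference of 384000 blocks is exactly the difference between the
two totals reported by df, and the remaining overhead is the same in
both cases, so the guess fits.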
</article>