<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
<TITLE>HOWTO: Multi Disk System Tuning: File Systems</TITLE>
<LINK HREF="Multi-Disk-HOWTO-6.html" REL=next>
<LINK HREF="Multi-Disk-HOWTO-4.html" REL=previous>
<LINK HREF="Multi-Disk-HOWTO.html#toc5" REL=contents>
</HEAD>
<BODY>
<A HREF="Multi-Disk-HOWTO-6.html">Next</A>
<A HREF="Multi-Disk-HOWTO-4.html">Previous</A>
<A HREF="Multi-Disk-HOWTO.html#toc5">Contents</A>
<HR>
<H2><A NAME="s5">5. File Systems</A></H2>
<P>
<!--
disk!file systems
-->
Over time the requirements placed on file systems have increased, and the
demand for large structures, large files, long file names and more
has prompted ever more advanced file systems, the part of the operating
system that accesses and organises the data on mass storage.
Today there is a large number of file systems to choose from and this
section describes them in turn.
<P>The emphasis is on Linux but with more input I will be happy to add
information for a wider audience.
<P>
<P>
<H2><A NAME="ss5.1">5.1 General Purpose File Systems</A>
</H2>
<P>Most operating systems have a general purpose file system for
everyday use with most kinds of files, reflecting the features available
in the OS such as permission flags, protection and recovery.
<P>
<H3><CODE>minix</CODE></H3>
<P>
<!--
disk!file system!minix
-->
This was the original fs for Linux, back in the days when Linux was hosted
on minix machines. It is simple but limited in features and hardly ever
used these days other than on some rescue disks, where its compactness
is an advantage.
<P>
<H3><CODE>xiafs</CODE> and <CODE>extfs</CODE></H3>
<P>
<!--
disk!file system!xiafs
-->
<!--
disk!file system!extfs
-->
These are also old, have fallen into disuse and are no longer recommended.
<P>
<H3><CODE>ext2fs</CODE></H3>
<P>
<!--
disk!file system!ext2fs
-->
This is the established standard for general purpose use in the Linux world.
It is fast, efficient and mature. It is under continuous development, and
features such as ACLs and transparent compression are on the horizon.
<P>For more information check the
<A HREF="http://web.mit.edu/tytso/www/linux/ext2.html">ext2fs</A>
home page.
<P>
<P>
<H3><CODE>ext3fs</CODE></H3>
<P>
<!--
disk!file system!ext3fs
-->
This is the name of the upcoming successor to <CODE>ext2fs</CODE>, due to enter
the stable kernel in the near future. Many features are added to
<CODE>ext2fs</CODE>, but to avoid confusion after such a radical
upgrade the name will be changed too. You may have heard of it already;
the source code is now in beta release.
<P>Patches are available at
<A HREF="ftp://ftp.linux.org.uk/pub/linux/sct/fs/jfs">Linux.org</A>.
<P>
<P>
<P>
<H3><CODE>ufs</CODE></H3>
<P>
<!--
disk!file system!ufs
-->
This is the fs used by BSD and variants thereof. It is mature but also
developed for older types of disk drives where geometries were known. The
fs uses a number of tricks to optimise performance but as disk geometries
are translated in a number of ways the net effect is no longer so optimal.
<P>
<P>
<H3><CODE>efs</CODE></H3>
<P>
<!--
disk!file system!efs
-->
The Extent File System (efs) is Silicon Graphics' early file system,
widely used on IRIX before version 6.0, after which xfs took over.
While migration to xfs is encouraged, efs is still supported
and much used on CDs.
<P>There is a Linux driver in early beta stage, available at the
<A HREF="http://aeschi.ch.eu.org/efs/">Linux extent file system</A>
home page.
<P>
<P>
<H3><CODE>XFS</CODE></H3>
<P>
<!--
disk!file system!XFS
-->
<A HREF="http://www.sgi.com/">Silicon Graphics Inc (sgi)</A>
has started porting its mainframe grade file system to Linux.
The source is not yet available as they are busy removing
legal encumbrances, but once that is done they will provide the
source code under the GPL.
<P>More information is already available on the
<A HREF="http://oss.sgi.com/projects/xfs/">XFS project page</A>
at SGI.
<P>
<P>
<P>
<H3><CODE>reiserfs</CODE></H3>
<P>
<!--
disk!file system!reiserfs
-->
<!--
disk!file system!tree based
-->
As of July 23rd, 1997
Hans Reiser <CODE>reiser (at) RICOCHET.NET</CODE>
has put up the source to his tree based
<A HREF="http://www.namesys.com">reiserfs</A>
on the web. His filesystem has some very interesting features,
is much faster than <CODE>ext2fs</CODE> and is in use by a number of people.
Hopefully it will be ready for kernel 2.4.0, which might be ready at
the end of the year.
<P>
<P>
<P>
<H3><CODE>enh-fs</CODE></H3>
<P>
<!--
disk!file system!enhanced fs
-->
The Enhanced File System project is now dead.
<P>
<P>
<H3><CODE>Tux2 fs</CODE></H3>
<P>
<!--
disk!file system!Tux2 fs
-->
This is a variation on the <CODE>ext2fs</CODE> that adds robustness
in case of unexpected interruptions such as power failure.
After such an event <CODE>Tux2 fs</CODE> will restart with the file system
in a consistent, recently recorded state without fsck or
other recovery operations. To achieve this <CODE>Tux2 fs</CODE> uses
a newly designed algorithm called Phase Tree.
<P>More information can be found at the
<A HREF="http://tux2.sourceforge.net">project home page</A>.
<P>
<P>
<H2><A NAME="ss5.2">5.2 Microsoft File Systems</A>
</H2>
<P>
<!--
disk!file system!Microsoft
-->
<!--
disk!file system!confusion
-->
This company is responsible for a lot, including a number of filesystems
that have at the very least caused confusion.
<P>
<P>
<H3><CODE>fat</CODE></H3>
<P>
<!--
disk!file system!fat
-->
Actually there are 2 <CODE>fat</CODE>s out there, <CODE>fat12</CODE> and <CODE>fat16</CODE>
depending on the partition size used but fortunately the difference
is so minor that the whole issue is transparent.
<P>On the plus side these are fast and simple, and most OSes understand
them and can both read and write them. And that is about it.
<P>The minus side is limited safety, severely limited permission flags
and atrocious scalability. For instance with <CODE>fat</CODE> you cannot
have partitions larger than 2 GB.
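<P>The 2 GB figure can be reproduced with a little arithmetic. The sketch
below is in Python and assumes two commonly cited <CODE>fat16</CODE> parameters
that are not stated in this HOWTO: 16-bit cluster numbers and a largest
cluster size of 32 KB. The few reserved cluster values are ignored, so the
result is an approximate upper bound rather than an exact figure.
<PRE>
```python
# Rough upper bound on the size of a fat16 partition.
# Both constants are assumptions for illustration, not taken from the HOWTO.
CLUSTER_NUMBER_BITS = 16          # width of a cluster number in the table
MAX_CLUSTER_BYTES = 32 * 1024     # largest cluster size assumed here


def max_partition_bytes(cluster_bits, cluster_bytes):
    """Number of addressable clusters multiplied by the cluster size."""
    return (2 ** cluster_bits) * cluster_bytes


limit = max_partition_bytes(CLUSTER_NUMBER_BITS, MAX_CLUSTER_BYTES)
print(limit // 2 ** 30, "GB")
```
</PRE>
<P>Under these assumptions the only way to raise the limit is to use larger
clusters, which wastes more space on every small file.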
<P>
<P>
<H3><CODE>fat32</CODE></H3>
<P>
<!--
disk!file system!fat32
-->
After about 10 years Microsoft realised <CODE>fat</CODE> was about, well, 10 years
behind the times and created this fs which scales reasonably well.
<P>Permission flags are still limited.
NT 4.0 cannot read this file system but Linux can.
<P>
<P>
<H3><CODE>vfat</CODE></H3>
<P>
<!--
disk!file system!vfat
-->
At the same time as Microsoft launched <CODE>fat32</CODE> they also added
support for long file names, known as <CODE>vfat</CODE>.
<P>Linux reads <CODE>vfat</CODE> and <CODE>fat32</CODE> partitions by mounting with
type <CODE>vfat</CODE>.
<P>
<P>
<H3><CODE>ntfs</CODE></H3>
<P>
<!--
disk!file system!ntfs
-->
This is the native fs of Win-NT but as complete information is not available
there is only limited support for it in other OSes.
<P>
<P>
<H2><A NAME="ss5.3">5.3 Logging and Journaling File Systems</A>
</H2>
<P>
<!--
disk!file system!logging file systems
-->
<!--
disk!file system!journaling file systems
-->
These take a radically different approach to file updates:
modifications to files are first recorded in a log, and the
log is checkpointed at some later time.
<P>Reading is roughly as fast as traditional file systems that
always update the files directly.
Writing is much faster as updates are simply appended to a log.
All this is transparent to the user. It is in reliability and
particularly in checking file system integrity that these
file systems really shine.
Since the data before last checkpointing is known to be good
only the log has to be checked, and this is much faster than
for traditional file systems.
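<P>The idea can be illustrated with a toy model in Python. This is only a
sketch: the class and method names are invented for illustration and do not
correspond to any real file system, and real implementations work on disk
blocks rather than dictionaries.
<PRE>
```python
class ToyLoggingStore:
    """Toy model: writes go to a log, checkpoints fold the log into the store."""

    def __init__(self):
        self.checkpointed = {}   # state as of the last checkpoint, known good
        self.log = []            # updates appended since the last checkpoint

    def write(self, name, data):
        # A write only appends to the log; nothing is updated in place.
        self.log.append((name, data))

    def read(self, name):
        # The newest log entry wins; otherwise use the checkpointed state.
        for entry_name, data in reversed(self.log):
            if entry_name == name:
                return data
        return self.checkpointed.get(name)

    def checkpoint(self):
        # Apply the logged updates in order, then start a fresh log.
        for name, data in self.log:
            self.checkpointed[name] = data
        self.log.clear()

    def entries_to_check_after_crash(self):
        # Only the log written since the last checkpoint needs checking.
        return len(self.log)


store = ToyLoggingStore()
store.write("a.txt", "v1")
store.checkpoint()
store.write("a.txt", "v2")
store.write("b.txt", "v1")
print(store.read("a.txt"), store.entries_to_check_after_crash())
```
</PRE>
<P>However large the checkpointed state grows, the amount of work after a
crash in this model depends only on the length of the log.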
<P>Note that while
<EM>logging</EM> filesystems keep track of changes made to both data and inodes,
<EM>journaling</EM> filesystems keep track only of inode changes.
<P>Linux has quite a choice of such file systems but none is
yet of production quality. Some are also on hold.
<P>
<UL>
<LI>Adam Richter from Yggdrasil posted some time ago that they have been
working on a compressed log file based system but that this project is
currently on hold. Nevertheless a non-working version is available on
their FTP server. Check out
<A HREF="ftp://ftp.yggdrasil.com/private/adam">the Yggdrasil ftp server</A>
where special patched versions of the kernel can be found.
</LI>
<LI>Another project is the
<A HREF="http://outflux.net/projects/lfs/">Linux log-structured Filesystem Project</A>
which sadly also is on hold. Nevertheless this page contains
much information on the topic.
</LI>
<LI>Then there is the
<A HREF="http://www.complang.tuwien.ac.at/czezatke/lfs.html">LinLogFS -- A Log-Structured Filesystem For Linux</A>
(formerly known as dtfs)
which seems to be going strong. It is still in alpha but sufficiently
complete to run programs off this file system.
</LI>
<LI>Finally there is the
<A HREF="http://developer.axis.com/software/jffs/">Journaling Flash File System</A>
designed by Axis for their embedded diskless systems such as
their Linux based web camera.
</LI>
</UL>
<P>Note that <CODE>ext3fs</CODE>, <CODE>XFS</CODE> and <CODE>reiserfs</CODE> also have
features for logging or journaling.
<P>
<H2><A NAME="ss5.4">5.4 Read-only File Systems</A>
</H2>
<P>
<!--
disk!file system!read-only file systems
-->
Read-only media have not escaped the ever increasing complexity
seen in more general file systems, so again there is a large selection
to choose from, with corresponding opportunities for exciting mistakes.
<P>Note that <CODE>ext2fs</CODE> works quite well on a CD-ROM
and seems to save space while offering the normal file system features
such as long file names and permissions that can be retained when
copying files across to read-write media. Also having
<A HREF="file:///dev/">/dev</A>
on a CD-ROM is possible.
<P>
<!--
disk!file system!CD-ROM
-->
<!--
disk!file system!DVD
-->
<!--
disk!file system!loopback
-->
Most of these are used with CD-ROM media, but the new DVD media can
also be used, and you can even mount an image file on a hard disk through
the loopback device to verify the image before burning a ROM.
<P>
<!--
disk!file system!rom file systems
-->
<!--
disk!file system!romfs
-->
There is a read-only <CODE>romfs</CODE> for Linux but as that is not disk
related nothing more will be said about it here.
<P>
<H3><CODE>High Sierra</CODE></H3>
<P>
<!--
disk!file system!High Sierra
-->
This was one of the earliest standards for CD-ROM formats,
supposedly named after the hotel where the final agreement took place.
<P><CODE>High Sierra</CODE> was so limited in features that new extensions simply
had to appear and while there has been no end to new formats the original
<CODE>High Sierra</CODE> remains the common precursor and is therefore still
widely supported.
<P>
<P>
<H3><CODE>iso9660</CODE></H3>
<P>
<!--
disk!file system!iso9660
-->
The International Organization for Standardization (ISO) made its own
extensions and formalised the standard into what we know as the
<CODE>iso9660</CODE> standard.
<P>The Linux iso9660 file system supports both High Sierra and the
<CODE>Rock Ridge</CODE> extensions.
<P>
<P>
<H3><CODE>Rock Ridge</CODE></H3>
<P>
<!--
disk!file system!Rock Ridge
-->
Not everyone accepts limits like short filenames and lack of permissions
so very soon the <CODE>Rock Ridge</CODE> extensions appeared to rectify these
shortcomings.
<P>
<P>
<H3><CODE>Joliet</CODE></H3>
<P>
<!--
disk!file system!Joliet
-->
Microsoft, not to be outdone in the standards extension game, decided
it should extend CD-ROM formats with some internationalisation features
and called it <CODE>Joliet</CODE>.
<P>Linux supports this standard in kernels 2.0.34 or newer.
You need to enable NLS in order to use it.
<P>
<P>
<H3>Trivia</H3>
<P>
<!--
disk!file system!Trivia
-->
Joliet is a city outside Chicago, best known for being the site of
the prison where Jake was locked up in the movie "Blues Brothers."
Rock Ridge (the UNIX extensions to ISO 9660) is named
after the (fictional) town in the movie "Blazing Saddles."
<P>
<P>
<H3><CODE>UDF</CODE></H3>
<P>
<!--
disk!file system!UDF
-->
With the arrival of DVD with up to about 17 GB of storage capacity
the world seemingly needed another format, this time ambitiously named
Universal Disk Format (UDF).
This is intended to replace <CODE>iso9660</CODE> and will be required for DVD.
<P>Currently this is not in the standard Linux kernel but a project
is underway to make a
<A HREF="http://trylinux.com/projects/udf/index.html">UDF driver</A>
for Linux. Patches and documentation are available.
<P>More information is also available at the
<A HREF="http://atv.ne.mediaone.net/linux-dvd/">Linux and DVDs</A>
page.
<P>
<P>
<P>
<H2><A NAME="ss5.5">5.5 Networking File Systems</A>
</H2>
<P>
<!--
disk!file system!networking file systems
-->
A large number of networking technologies are available that
let you distribute disks throughout a local or even a global network.
This is somewhat peripheral to the topic of this HOWTO but as it can
be used with local disks I will cover it briefly. It would be best
if someone (else) took this into a separate HOWTO...
<P>
<H3><CODE>NFS</CODE></H3>
<P>
<!--
disk!file system!NFS
-->
This is one of the earliest systems that allows mounting a file space
on one machine onto another. There are a number of problems with <CODE>NFS</CODE>
ranging from performance to security but it has nevertheless become
established.
<P>
<H3><CODE>AFS</CODE></H3>
<P>
<!--
disk!file system!AFS
-->
This is a system that allows efficient sharing of files
across large networks. Starting out as an academic project
it is now sold by
<A HREF="http://www.transarc.com">Transarc</A>
whose home page gives you more details.
<P>Derek Atkins, of MIT, ported AFS to Linux and has also set up the
Linux AFS mailing list
(<A HREF="mailto:linux-afs@mit.edu">linux-afs@mit.edu</A>),
which is open to the public.
Requests to join the list should go to
<A HREF="mailto:linux-afs-request@mit.edu">linux-afs-request@mit.edu</A>
and finally bug reports should be directed to
<A HREF="mailto:linux-afs-bugs@mit.edu">linux-afs-bugs@mit.edu</A>.
<P>Important: as AFS uses encryption it is
restricted software and cannot easily be exported from the US.
<P>IBM, which owns Transarc, has announced the availability of the latest
version of the client as well as the server for Linux.
<P>Arla is a free AFS implementation; check the
<A HREF="http://www.stacken.kth.se/projekt/arla/">Arla homepage</A>
for more information as well as documentation.
<P>
<P>
<H3>Coda</H3>
<P>
<!--
disk!file system!Coda
-->
A networking filesystem similar to <CODE>AFS</CODE> is underway and is called
<A HREF="http://coda.cs.cmu.edu/">Coda</A>.
This is designed to be more robust and fault tolerant than <CODE>AFS</CODE>,
and supports mobile, disconnected operations.
Currently it does not scale very well and does not really have
proper administrative tools, as <CODE>AFS</CODE> does and <CODE>Arla</CODE> is
beginning to.
<P>
<P>
<H3><CODE>nbd</CODE></H3>
<P>
<!--
disk!file system!nbd
-->
<!--
disk!device!network block device
-->
The
<A HREF="http://atrey.karlin.mff.cuni.cz/~pavel/">Network Block Device</A>
(<CODE>nbd</CODE>) is available in Linux kernel 2.2
and later and offers reportedly excellent performance. The interesting
thing here is that it can be combined with RAID (see later).
<P>
<P>
<H3><CODE>enbd</CODE></H3>
<P>
<!--
disk!file system!enbd
-->
<!--
disk!device!enhanced network block device
-->
The
<A HREF="http://www.it.uc3m.es/~ptb/nbd">Enhanced Network Block Device</A>
(<CODE>enbd</CODE>) is a project to enhance the <CODE>nbd</CODE> with
features such as block journaled multi channel communications,
internal failover and automatic balancing between channels
and more.
<P>The intended use is for RAID over the net.
<P>
<H3>GFS</H3>
<P>
<!--
disk!file system!GFS
-->
<!--
disk!device!Global File System
-->
The
<A HREF="http://gfs.lcse.umn.edu/">Global File System</A>
is a new file system designed for storage across a wide area network.
It is currently in the early stages and more information will come
later.
<P>
<P>
<P>
<H2><A NAME="ss5.6">5.6 Special File Systems</A>
</H2>
<P>In addition to the general file systems there is also a number of
more specific ones, usually providing higher performance or other
features at the cost of a tradeoff in other respects.
<P>
<P>
<H3><A NAME="tmpfs"></A> <CODE>tmpfs</CODE> and <CODE>swapfs</CODE> </H3>
<P>
<!--
disk!file system!tmpfs
-->
<!--
disk!file system!swapfs
-->
For short term fast file storage SunOS offers <CODE>tmpfs</CODE> which is
about the same as the <CODE>swapfs</CODE> on NeXT.
This overcomes the inherent slowness in <CODE>ufs</CODE> by caching file data
and keeping control information in memory. This means that data on such
a file system will be lost when rebooting and is therefore mainly
suitable for the <CODE>/tmp</CODE> area but not <CODE>/var/tmp</CODE>, which is where
temporary data that must survive a reboot is placed.
<P>SunOS offers very limited tuning for <CODE>tmpfs</CODE> and the number of
files is even limited by total physical memory of the machine.
<P>
<P>Linux has featured <CODE>tmpfs</CODE> since kernel version 2.4; it is
enabled by turning on virtual memory file system support (formerly shm fs).
Under certain circumstances <CODE>tmpfs</CODE> can lock up the system in
early kernel versions, so make sure you use version 2.4.6 or later.
<P>
<P>
<H3><CODE>userfs</CODE></H3>
<P>
<!--
disk!file system!userfs
-->
<!--
disk!file system!arcfs
-->
<!--
disk!file system!docfs
-->
The user file system (<CODE>userfs</CODE>) allows a number of extensions to
traditional file system use, such as an
FTP based file system, compression (<CODE>arcfs</CODE>), fast prototyping
and many other features. The <CODE>docfs</CODE> is based on this filesystem.
Check the
<A HREF="http://www.goop.org/~jeremy/userfs/">userfs homepage</A>
for more information.
<P>
<P>
<H3><CODE>devfs</CODE></H3>
<P>
<!--
disk!file system!devfs
-->
When disks are added, removed or just fail it is likely that
the device names of the remaining disks will change.
For instance if <CODE>sdb</CODE> fails then the old <CODE>sdc</CODE> becomes <CODE>sdb</CODE>,
the old <CODE>sdd</CODE> becomes <CODE>sdc</CODE> and so on.
Note that in this case <CODE>hda</CODE>, <CODE>hdb</CODE> etc will remain unchanged.
Likewise if a new drive is added the reverse may happen.
<P>There is no guarantee that SCSI ID 0 becomes <CODE>sda</CODE> and that adding
disks in increasing ID order will just add a new device name without
renaming previous entries, as some SCSI drivers assign from ID 0 and up
while others reverse the scanning order.
Likewise adding a SCSI host adapter can also cause renaming.
<P>Generally device names are assigned in the order they are found.
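<P>The renaming effect is easy to reproduce with a small simulation. The
Python sketch below is not how the kernel assigns names internally; it only
models the rule that names are handed out in detection order. The function
name and the drive labels are hypothetical.
<PRE>
```python
import string


def assign_scsi_names(disks_in_scan_order):
    """Give each detected disk the next free name: sda, sdb, sdc, ..."""
    letters = string.ascii_lowercase
    return {disk: "sd" + letters[i]
            for i, disk in enumerate(disks_in_scan_order)}


# Hypothetical labels for four physical drives, listed in scanning order.
before = assign_scsi_names(["drive-1", "drive-2", "drive-3", "drive-4"])

# The second drive fails and is no longer detected at the next boot.
after = assign_scsi_names(["drive-1", "drive-3", "drive-4"])

# Report every drive whose device name changed between the two boots.
for disk, old_name in before.items():
    new_name = after.get(disk, "missing")
    if new_name != old_name:
        print(disk, old_name, "becomes", new_name)
```
</PRE>
<P>Every drive scanned after the failed one moves up by one letter, so any
configuration that refers to the old names now points at the wrong disk.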
<P>The source of the problem lies in the limited number of bits available
for major and minor numbering in the device files used to describe the
device itself. You can see these in the
<A HREF="file:///dev/">/dev</A>
directory; information
on the numbering and allocation can be found in <CODE>man MAKEDEV</CODE>.
Currently there are 2 solutions to this problem in various stages of
development:
<DL>
<DT><B>scsidev</B><DD><P>works by creating a database of drives and where they
belong, check <EM> man scsifs</EM> and the
<A HREF="http://www.garloff.de/kurt/linux/scsidev/">scsidev home page</A>
for more information.
<DT><B>devfs</B><DD><P>is a more long term project aimed at getting around the
whole business of device numbering by making the
<A HREF="file:///dev/">/dev</A>
directory a kernel file system in the same way as
<A HREF="file:///proc/">/proc</A>
is.
More information will appear as it becomes available.
</DL>
<P>
<P>
<H3><CODE>smugfs</CODE></H3>
<P>
<!--
disk!file system!smugfs
-->
<!--
disk!file system!huge files
-->
For a number of reasons it is currently difficult to have files
bigger than 2 GB. One file system that tries to overcome this
limit is <CODE>smugfs</CODE> which is very fast but also simple. For instance
there are no directories and the block allocation is simple.
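<P>One commonly cited reason for the 2 GB figure, offered here as an
assumption since this HOWTO does not spell out the reasons, is that file
offsets are held in signed 32-bit integers. The Python sketch below shows
where such an offset runs out; the helper name is hypothetical.
<PRE>
```python
import struct

# Assumption for illustration: file offsets are signed 32-bit integers.
OFFSET_BITS = 32
largest_offset = 2 ** (OFFSET_BITS - 1) - 1


def fits_in_signed_32(value):
    """Return True if value can be stored as a signed 32-bit integer."""
    try:
        struct.pack("=i", value)   # "=i" is a 4-byte signed integer
        return True
    except struct.error:
        return False


print(largest_offset, fits_in_signed_32(largest_offset))
print(largest_offset + 1, fits_in_signed_32(largest_offset + 1))
```
</PRE>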
<P>It is available as
<A HREF="ftp://atrey.karlin.mff.cuni.cz/pub/local/mj/linux/">compressed tarred source code</A>
and while it worked with kernel version 2.1.85 it is quite possible some
work is required to make it fit into newer kernels. Also the low version
number (0.0) suggests extra care is required.
<P>
<P>
<H2><A NAME="ss5.7">5.7 File System Recommendations</A>
</H2>
<P>There is a jungle of choices but generally it is recommended to
use the general file system that comes with your distribution.
If you use <CODE>ufs</CODE> and have some kind of <CODE>tmpfs</CODE> available
you should first start off with the general file system to get
an idea of the space requirements and if necessary buy more
RAM to support the size of <CODE>tmpfs</CODE> you need. Otherwise you
will end up with mysterious crashes and lost time.
<P>If you use dual boot and need to transfer data between the two
OSes one of the simplest ways is to use an appropriately sized
partition formatted with <CODE>fat</CODE> as most systems can reliably
read and write this.
Remember the limit of 2 GB for <CODE>fat</CODE> partitions.
<P>For more information on file system interconnectivity you can
check out the
<A HREF="http://students.ceid.upatras.gr/~gef/fs/oldindex.html">file system</A>
page
which has been superseded by
<A HREF="http://www.penguin.cz/~mhi/fs/">file system</A>
and the article
<A HREF="http://linuxtoday.com/stories/5556.html">Kragen's Amazing List of Filesystems</A>.
<P>
<P>That guide is being superseded by a HOWTO which is underway and
a link will be added when it is ready.
<P>To avoid total havoc with device renaming if a drive fails
check out the scanning order of your system and try to keep
your root system on <CODE>hda</CODE> or <CODE>sda</CODE> and removable media
such as ZIP drives at the end of the scanning order.
<P>
<P>
<P>
<P>
<HR>
<A HREF="Multi-Disk-HOWTO-6.html">Next</A>
<A HREF="Multi-Disk-HOWTO-4.html">Previous</A>
<A HREF="Multi-Disk-HOWTO.html#toc5">Contents</A>
</BODY>
</HTML>