old-www/LDP/LG/issue32/ayers1.html

<!--startcut ==========================================================-->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<title>An Introduction to Patch LG #32</title>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#A000A0"
ALINK="#FF0000">
<!--endcut ============================================================-->

<H4>
"Linux Gazette...<I>making Linux just a little more fun!</I>"
</H4>

<P> <HR> <P>
<!--===================================================================-->

<center>
<H1><font color="maroon">Patch for Beginners</font></H1>
<h4>By <a href="mailto: layers@marktwain.net">Larry Ayers</a></h4>
</center>
<P> <HR> <P>

<center><h3>Introduction</h3></center>

<p>The aim of this article is to introduce new Linux users to an invaluable
resource, Larry Wall's <i>patch</i> program.  <i>Patch</i> is an interface to
the GNU <i>diff</i> utility, which is used to find differences between files;
<i>diff</i> has a multitude of options, but it's most often used to generate a file
which lists lines which have been changed, showing both the original and
changed lines and ignoring lines which have remained the same.  <i>Patch</i>
is typically used to update a directory of source code files to a newer
version, obviating the need to download an entire new source archive.
Downloading a patch in effect is just downloading the lines which have been
changed.

<p><i>Patch</i> originated in the nascent, bandwidth-constrained internet
environment of a decade ago, but like many Unix tools of that era it is still
much-used today.  In the February issue of the programmer's magazine
<b>Dr. Dobb's Journal</b> Larry Wall had some interesting comments on the
early days of <i>patch</i>:

<pre><blockquote>

DDJ: By the way, what came first, patch or diff?

LW: diff, by a long ways.  patch is one of those things
that, in retrospect, I was totally amazed that nobody had
thought of it sooner, because I think that diff predated
patch by at least ten years or so.

I think I know why, though.  And it's one of these little
psychological things.  When they made diff, they added an
option called -e, I think it was, and that would spit out an
ed script, so people said to themselves, "Well, if I wanted
to automate the applying of a diff, I would use that." So it
never actually occurred to someone that you could write a
computer program to take the other forms of output and apply
them.  Either that, or it did not occur to them that there
was some benefit to using the context diff form, because you
could apply it to something that had been changed and still
easily get it to do the right thing.

It's one of those things that's obvious in retrospect.  But
to be perfectly honest, it wasn't really a brilliant flash
of inspiration so much as self defense.  I put out the first
version of rn, and then I started putting out patches for
it, and it was a total mess.  You could not get people to
apply patches because they had to apply them by hand.  So,
they would skip the ones that they didn't think they needed,
and they'd apply the new ones over that, and they'd get
totally messed up.  I wrote patch so that they wouldn't have
this excuse that it was too hard.

I don't know whether it's still the case, but for many
years, I told people that I thought patch had changed the
culture of computing more than either rn or Perl had.  Now
that the Internet is getting a lot faster than it used to
be, and it's getting much easier to distribute whole
distributions, patches tend to be sent around only among
developers.  I haven't sent out a patch kit for Perl in
years.  I think patch has became less important for the
whole thing, but still continues to be a way for developers
to interchange ideas.  But for a while in there, patch
really did make a big difference to how software was
developed.  </blockquote></pre>

<p>Larry Wall's assessment of the diminishing importance of <i>patch</i> to
the computing community as a whole is probably accurate, but in the free
software world it's still an essential tool.  The ubiquity of <i>patch</i>
makes it possible for new users and non-programmers to easily participate in
alpha- and beta-testing of software, thus benefiting the entire community.

<p>It occurred to me to write this article after noticing a thread which
periodically resurfaces in the linux-kernel mailing list.  About every three
months someone will post a plea for a split Linux kernel source distribution,
so that someone just interested in, say, the i386 code and the IDE disk driver
wouldn't have to download the Alpha, Sparc, etc. files and the many SCSI
drivers for each new kernel release.  A series of patient (and some
not-so-patient) replies will follow, most urging the original poster to use
patches to upgrade the kernel source.  Linus Torvalds will then once again
state that he has no interest in undertaking the laborious task of splitting
the kernel source into chunks, but that if anyone else wants to, they should
feel free to do so as an independent project.  So far no-one has volunteered.
I can't blame the kernel-hackers for not wanting to further complicate their
lives; I imagine it would be much more interesting and challenging to work
directly with the kernel than to overhaul the entire kernel distribution
scheme! Downloading an eleven megabyte kernel source archive <i>is</i>
time-consuming (and, for those folks paying by the minute for net access,
expensive as well) but the kernel patches can be as small as a few dozen
kilobytes, and are hardly ever larger than one megabyte.  The 2.1.119
development kernel source on my hard disk has been incrementally patched up
from version 2.1.99, and I doubt if I'd follow the development as closely if I
had to download each release in its entirety.

<center><h3>Using Patch</h3></center>

<p><i>Patch</i> comes with a good manual-page which lists its numerous
options, but 99% of the time just two of them will suffice:

<ul>
  <li><kbd>patch -p1 &lt; [patchfile]</kbd>
  <li><kbd>patch -R &lt; [patchfile]</kbd> &nbsp;(used to undo a patch)
</ul>

<p>The <i>-p1</i> option strips the left-most directory level from the
filenames in the patch-file, as the top-level directory is likely to vary on
different machines.  To use this option, place your patch within the directory
being patched, and then run <i>patch -p1 &lt; [patchfile]</i> from within that
directory.  A short excerpt from a Linux kernel patch will illustrate
this:<br>

<pre><kbd>
diff -u --recursive --new-file v2.1.118/linux/mm/swapfile.c linux/mm/swapfile.c
--- v2.1.118/linux/mm/swapfile.c	Wed Aug 26 11:37:45 1998
+++ linux/mm/swapfile.c	Wed Aug 26 16:01:57 1998
@@ -489,7 +489,7 @@
 	int swap_header_version;
 	int lock_map_size = PAGE_SIZE;
 	int nr_good_pages = 0;
-	char tmp_lock_map = 0;
+	unsigned long tmp_lock_map = 0;
</kbd></pre>

<p>Applying the patch from which this segment was copied with the <i>-p1</i>
switch effectively truncates the path which <i>patch</i> will seek;
<i>patch</i> will look for a subdirectory of the current directory named
<kbd>/mm</kbd>, and should then find the <i>swapfile.c</i> file there, waiting
to be patched.  In this excerpt, the line preceded by a dash will be
replaced with the line preceded by a plus sign.  A typical patch will contain
updates for many files, each section consisting of the output of <i>diff
-u</i> run on two versions of a file.

<p><i>Patch</i> displays its output to the screen as it works, but this output
usually scrolls by too quickly to read.  The original, pre-patch files are
renamed <i>*.orig</i>, while the new patched files will bear the original
filenames.

<center><h3>Patching Problems</h3></center>

<p>One possible source of problems using patch is differences between various
versions, all of which are available on the net.  Larry Wall hasn't done much
to improve patch in recent years, possibly because his last release of the
utility works well in the majority of situations.  FSF programmers
from the GNU project have been releasing new versions of patch for the past
several years.  Their first revisions of patch had a few problems, but I've been
using version 2.5 (which is the version distributed with Debian 2.0) lately
with no problems.  Version 2.1 has worked well for me in the past.  The source
for the current GNU version of <i>patch</i> is available from the
<a href="ftp://ftp.gnu.org/pub/gnu">GNU</a> FTP site, though most people will
just use the version supplied with their distribution of Linux.

<p>Let's say you have patched a directory of source files, and the patch
didn't apply cleanly .  This happens occasionally, and when it does
<i>patch</i> will show an error message indicating which file confused it,
along with the line numbers.  Sometimes the error will be obvious, such as an
omitted semicolon, and can be fixed without too much trouble.  Another
possibility is to delete from the patch the section which is causing trouble,
but this may or may not work, depending on the file involved.

<p>Another common error scenario: suppose you have un-tarred a kernel source
archive, and while exploring the various subdirectories under
<kbd>/linux/arch/</kbd> you notice the various machine architecture
subdirectories, such as <kbd>alpha</kbd>, <kbd>sparc</kbd>, etc.  If you, like
most Linux users, are running a machine with an Intel processor (or one of the
Intel clones), you might decide to delete these directories, which are not
needed for compiling your particular kernel and which occupy needed
disk space.  Some time later a new kernel patch is released and while
attempting to apply it <i>patch</i> stalls when it is unable to find the Alpha
or PPC files it would like to patch.  Luckily <i>patch</i> allows user
intervention at this point, asking the question "Skip this patch?"  Tell it
"y", and <i>patch</i> will proceed along its merry way.  You will probably
have to answer the question numerous times, which is a good argument for
allowing the un-needed directories to remain on your disk.


<center><h3>Kernel-Patching Tips</h3></center>

<p>Many Linux users use <i>patch</i> mainly for patching the kernel source, so
a few tips are in order.  Probably the easiest method is to use the
shell-script <i>patch-kernel</i>, which can be found in the
<kbd>/scripts</kbd> subdirectory of the kernel source-tree.   This handy and
well-written script was written by Nick Holloway in 1995; a couple of years
later Adam Sulmicki added support for several compression algorithms,
including *.bz, *.bz2, compress, gzip, and plain-text (i.e., a patch which has
already been uncompressed).  The script assumes that your kernel source is in
<kbd>/usr/src/linux,</kbd>, with your new patch located in the current
directory.  Both of these defaults can be overridden by command-line switches
in this format: <kbd>patch-kernel [ sourcedir [ patchdir ] ]</kbd>.
<i>Patch-kernel</i> will abort if any part of the patch fails, but if the
patch applies cleanly it will invoke <i>find</i>, which will delete all of the
<kbd>*.orig</kbd> files which <i>patch</i> leaves behind.

<p>If you prefer to see the output of commands, or perhaps you would rather
keep the <kbd>*.orig</kbd> files until you are certain the patched source
compiles, running <i>patch</i> directly (with the patch located in the kernel
source top-level directory, as outlined above) has been very reliable in my
experience.  In order to avoid uncompressing the patch before
applying it a simple pipe will do the trick:

<p><kbd>gzip -cd patchXX.gz | patch -p1</kbd><br>

<p>or:

<p><kbd>bzip2 -dc patchXX.bz2 | patch -p1</kbd>

<p>After the patch has been applied the <i>find</i> utility can be used to
check for rejected files:

<p><kbd>find . -name \*.rej</kbd>

<p>At first the syntax of this command is confusing.  The period indicates
that <i>find</i> should look in the current directory and recursively in all
subdirectories beneath it.  Remember the period should have a space both
before and after it.  The backslash before the wildcard "*" "escapes" the
asterisk in order to avoid confusing the shell, for which an asterisk has
another meaning.  If <i>find</i> locates any <kbd>*.rej</kbd> files it will
print the filenames on the screen.  If <i>find</i> exits without any visible
output it's nearly certain the patch applied correctly.

<p>Another job for <i>find</i> is to remove the <kbd>*.orig</kbd> files:

<p><kbd>find . -name \*.orig -print0 | xargs -0r rm -f</kbd>

<p>This command is sufficiently cumbersome to type that it would be a good
candidate for a new shell alias.  A line in your ~/.bashrc file such as:

<p><kbd>alias findorig 'find . -name \*.orig -print0 | xargs -0r rm -f'</kbd>

<p>will allow just typing <kbd>findorig</kbd> to invoke the above command.
The single quotes in the alias definition are necessary if an aliased command
contains spaces.  In order to use a new alias without logging out and then
back in again, just type <kbd>source ~/.bashrc</kbd> at the prompt.

<center><h3>Incidental Comments and Conclusion</h3></center>


<p>While putting this article together I upgraded the version of <i>patch</i>
on my machine from version 2.1 to version 2.5.  Both of these versions come
from the current FSF/GNU maintainers.  Immediately I noticed that the default
output of version 2.5 has been changed, with less information appearing on the
screen.  Gone is Larry Wall's "...hmm" which used to appear while <i>patch</i>
was attempting to determine the proper lines to patch.  The output of version
2.5 is simply a list of messages such as "patching file [filename]", rather
than the more copious information shown by earlier versions.  Admittedly, the
information scrolled by too quickly to read, but the output could be
redirected to a file for later perusal.  This change doesn't affect the
functionality of the program, but does lessen the human element.  It seems to
me that touches such as the old "...hmm" messages, as well as comments in
source code, are valuable in that they remind the user that a program is the
result of work performed by a living, breathing human being, rather than a
sterile collection of bits.  The old behavior can be restored by appending the
switch <kbd>--verbose</kbd> to the <i>patch</i> command-line, but I'm sure
that many users either won't be aware of the option or won't bother to type it
in.  Another difference between 2.1 and 2.5 is that the <i>*.orig</i> back-up
files aren't created unless <i>patch</i> is given the <kbd>-b</kbd> option.

<p><i>Patch</i> is not strictly necessary for an end-user who isn't interested
in trying out and providing bug-reports for "bleeding-edge" software and
kernels, but often the most interesting developments in the Linux world belong
in this category.  It isn't difficult to get the hang of using <i>patch</i>, and the
effort will be amply repaid.

<p>
<hr>
<!-- hhmts start -->
Last modified: Mon 31 Aug 1998
<!-- hhmts end -->

<!--===================================================================-->
<P> <hr> <P>
<center><H5>Copyright &copy; 1998, Larry Ayers <BR>
Published in Issue 32 of <i>Linux Gazette</i>, September 1998</H5></center>

<!--===================================================================-->
<P> <hr> <P>
<A HREF="./index.html"><IMG ALIGN=BOTTOM SRC="../gx/indexnew.gif"
ALT="[ TABLE OF CONTENTS ]"></A>
<A HREF="../index.html"><IMG ALIGN=BOTTOM SRC="../gx/homenew.gif"
ALT="[ FRONT PAGE ]"></A>
<A HREF="./bandel.html"><IMG SRC="../gx/back2.gif"
ALT=" Back "></A>
<A HREF="./ayers2.html"><IMG SRC="../gx/fwd.gif" ALT=" Next "></A>
<P> <hr> <P>
<!--startcut ==========================================================-->
</BODY>
</HTML>
<!--endcut ============================================================-->