LDP/LDP/guide/docbook/Bash-Beginners-Guide/chap5.xml

318 lines
17 KiB
XML

<chapter id="chap_05">
<title>The GNU sed stream editor</title>
<abstract>
<para>At the end of this chapter you will know about the following topics:</para>
<para><itemizedlist>
<listitem><para>What is <command>sed</command>?</para></listitem>
<listitem><para>Interactive use of <command>sed</command></para></listitem>
<listitem><para>Regular expressions and stream editing</para></listitem>
<listitem><para>Using <command>sed</command> commands in scripts</para></listitem>
</itemizedlist></para>
<para>
<note><title>This is an introduction</title>
<para>These explanations are far from complete and certainly not meant to be used as the definite user manual for <command>sed</command>. This chapter is only included in order to show some more interesting topics in the next chapters, and because every power user should have a basic knowledge of things that can be done with this editor.</para>
<para>For detailed information, refer to the <command>sed</command> info and man pages.</para>
</note>
</para>
</abstract>
<sect1 id="sect_05_01"><title>Introduction</title>
<sect2 id="sect_05_01_01"><title>What is sed?</title>
<para>A <application>Stream EDitor</application> is used to perform basic transformations on text read from a file or a pipe. The result is sent to standard output. The syntax for the <command>sed</command> command has no output file specification, but results can be saved to a file using output redirection. The editor does not modify the original input.</para>
<para>What distinguishes <command>sed</command> from other editors, such as <command>vi</command> and <command>ed</command>, is its ability to filter text that it gets from a pipeline feed. You do not need to interact with the editor while it is running; that is why <command>sed</command> is sometimes called a <emphasis>batch editor</emphasis>. This feature allows use of editing commands in scripts, greatly easing repetitive editing tasks. When facing replacement of text in a large number of files, <command>sed</command> is a great help.</para>
</sect2>
<sect2 id="sect_05_01_02"><title>sed commands</title>
<para>The <command>sed</command> program can perform text pattern substitutions and deletions using regular expressions, like the ones used with the <command>grep</command> command; see <xref linkend="sect_04_02" />.</para>
<para>The editing commands are similar to the ones used in the <command>vi</command> editor:</para>
<table id="tab_05_01" frame="all"><title>Sed editing commands</title>
<tgroup cols="2" align="left" colsep="1" rowsep="1">
<thead>
<row><entry>Command</entry><entry>Result</entry></row>
</thead>
<tbody>
<row><entry>a\</entry><entry>Append text below current line.</entry></row>
<row><entry>c\</entry><entry>Change text in the current line with new text.</entry></row>
<row><entry>d</entry><entry>Delete text.</entry></row>
<row><entry>i\</entry><entry>Insert text above current line.</entry></row>
<row><entry>p</entry><entry>Print text.</entry></row>
<row><entry>r</entry><entry>Read a file.</entry></row>
<row><entry>s</entry><entry>Search and replace text.</entry></row>
<row><entry>w</entry><entry>Write to a file.</entry></row>
</tbody>
</tgroup>
</table>
<para>Apart from editing commands, you can give options to <command>sed</command>. An overview is in the table below:</para>
<table id="tab_05_02" frame="all"><title>Sed options</title>
<tgroup cols="2" align="left" colsep="1" rowsep="1">
<thead>
<row><entry>Option</entry><entry>Effect</entry></row>
</thead>
<tbody>
<row><entry><option>-e SCRIPT</option></entry><entry>Add the commands in SCRIPT to the set of commands to be run while processing the input.</entry></row>
<row><entry><option>-f</option></entry><entry>Add the commands contained in the file SCRIPT-FILE to the set of commands to be run while processing the input.</entry></row>
<row><entry><option>-n</option></entry><entry>Silent mode.</entry></row>
<row><entry><option>-V</option></entry><entry>Print version information and exit.</entry></row>
</tbody>
</tgroup>
</table>
<para>The <command>sed</command> info pages contain more information; we only list the most frequently used commands and options here.</para>
</sect2>
</sect1>
<sect1 id="sect_05_02"><title>Interactive editing</title>
<sect2 id="sect_05_02_01"><title>Printing lines containing a pattern</title>
<para>This is something you can do with <command>grep</command>, of course, but you can't do a <quote>find and replace</quote> using that command. This is just to get you started.</para>
<para>This is our example text file:</para>
<screen>
<prompt>sandy ~&gt;</prompt> <command>cat <option>-n</option> <filename>example</filename></command>
1 This is the first line of an example text.
2 It is a text with erors.
3 Lots of erors.
4 So much erors, all these erors are making me sick.
5 This is a line not containing any errors.
6 This is the last line.
<prompt>sandy ~&gt;</prompt>
</screen>
<para>We want <command>sed</command> to find all the lines containing our search pattern, in this case <quote>erors</quote>. We use the <command>p</command> to obtain the result:</para>
<screen>
<prompt>sandy ~&gt;</prompt> <command>sed <parameter>'/erors/p'</parameter> <filename>example</filename></command>
This is the first line of an example text.
It is a text with erors.
It is a text with erors.
Lots of erors.
Lots of erors.
So much erors, all these erors are making me sick.
So much erors, all these erors are making me sick.
This is a line not containing any errors.
This is the last line.
<prompt>sandy ~&gt;</prompt>
</screen>
<para>As you notice, <command>sed</command> prints the entire file, but the lines containing the search string are printed twice. This is not what we want. In order to only print those lines matching our pattern, use the <option>-n</option> option:</para>
<screen>
<prompt>sandy ~&gt;</prompt> <command>sed <option>-n</option> <parameter>'/erors/p'</parameter> <filename>example</filename></command>
It is a text with erors.
Lots of erors.
So much erors, all these erors are making me sick.
<prompt>sandy ~&gt;</prompt>
</screen>
</sect2>
<sect2 id="sect_05_02_02"><title>Deleting lines of input containing a pattern</title>
<para>We use the same example text file. Now we only want to see the lines <emphasis>not</emphasis> containing the search string:</para>
<screen>
<prompt>sandy ~&gt;</prompt> <command>sed <parameter>'/erors/d'</parameter> <filename>example</filename></command>
This is the first line of an example text.
This is a line not containing any errors.
This is the last line.
<prompt>sandy ~&gt;</prompt>
</screen>
<para>The <command>d</command> command results in excluding lines from being displayed.</para>
<para>Matching lines starting with a given pattern and ending in a second pattern are showed like this:</para>
<screen>
<prompt>sandy ~&gt;</prompt> <command>sed <option>-n</option> <parameter>'/^This.*errors.$/p'</parameter> <filename>example</filename></command>
This is a line not containing any errors.
<prompt>sandy ~&gt;</prompt>
</screen>
</sect2>
<sect2 id="sect_05_02_03"><title>Ranges of lines</title>
<para>This time we want to take out the lines containing the errors. In the example these are lines 2 to 4. Specify this range to address, together with the <command>d</command> command:</para>
<screen>
<prompt>sandy ~&gt;</prompt> <command>sed <parameter>'2,4d'</parameter> <filename>example</filename></command>
This is the first line of an example text.
This is a line not containing any errors.
This is the last line.
<prompt>sandy ~&gt;</prompt>
</screen>
<para>To print the file starting from a certain line until the end of the file, use a command similar to this:</para>
<screen>
<prompt>sandy ~&gt;</prompt> <command>sed <parameter>'3,$d'</parameter> <filename>example</filename></command>
This is the first line of an example text.
It is a text with erors.
<prompt>sandy ~&gt;</prompt>
</screen>
<para>This only prints the first two lines of the example file.</para>
<para>The following command prints the first line containing the pattern <quote>a text</quote>, up to and including the next line containing the pattern <quote>a line</quote>:</para>
<screen>
<prompt>sandy ~&gt;</prompt> <command>sed <option>-n</option> <parameter>'/a text/,/This/p'</parameter> <filename>example</filename></command>
It is a text with erors.
Lots of erors.
So much erors, all these erors are making me sick.
This is a line not containing any errors.
<prompt>sandy ~&gt;</prompt>
</screen>
</sect2>
<sect2 id="sect_05_02_04"><title>Find and replace with sed</title>
<para>In the example file, we will now search and replace the errors instead of only (de)selecting the lines containing the search string.</para>
<screen>
<prompt>sandy ~&gt;</prompt> <command>sed <parameter>'s/erors/errors/'</parameter> <filename>example</filename></command>
This is the first line of an example text.
It is a text with errors.
Lots of errors.
So much errors, all these erors are making me sick.
This is a line not containing any errors.
This is the last line.
<prompt>sandy ~&gt;</prompt>
</screen>
<para>As you can see, this is not exactly the desired effect: in line 4, only the first occurrence of the search string has been replaced, and there is still an 'eror' left. Use the <command>g</command> command to indicate to <command>sed</command> that it should examine the entire line instead of stopping at the first occurrence of your string:</para>
<screen>
<prompt>sandy ~&gt;</prompt> <command>sed <parameter>'s/erors/errors/g'</parameter> <filename>example</filename></command>
This is the first line of an example text.
It is a text with errors.
Lots of errors.
So much errors, all these errors are making me sick.
This is a line not containing any errors.
This is the last line.
<prompt>sandy ~&gt;</prompt>
</screen>
<para>To insert a string at the beginning of each line of a file, for instance for quoting:</para>
<screen>
<prompt>sandy ~&gt;</prompt> <command>sed <parameter>'s/^/&gt; /'</parameter> <filename>example</filename></command>
&gt; This is the first line of an example text.
&gt; It is a text with erors.
&gt; Lots of erors.
&gt; So much erors, all these erors are making me sick.
&gt; This is a line not containing any errors.
&gt; This is the last line.
<prompt>sandy ~&gt;</prompt>
</screen>
<para>Insert some string at the end of each line:</para>
<screen>
<prompt>sandy ~&gt;</prompt> <command>sed <parameter>'s/$/EOL/'</parameter> <filename>example</filename></command>
This is the first line of an example text.EOL
It is a text with erors.EOL
Lots of erors.EOL
So much erors, all these erors are making me sick.EOL
This is a line not containing any errors.EOL
This is the last line.EOL
<prompt>sandy ~&gt;</prompt>
</screen>
<para>Multiple find and replace commands are separated with individual <option>-e</option> options:</para>
<screen>
<prompt>sandy ~&gt;</prompt> <command>sed <option>-e</option> <parameter>'s/erors/errors/g'</parameter> <option>-e</option> <parameter>'s/last/final/g'</parameter> <filename>example</filename></command>
This is the first line of an example text.
It is a text with errors.
Lots of errors.
So much errors, all these errors are making me sick.
This is a line not containing any errors.
This is the final line.
<prompt>sandy ~&gt;</prompt>
</screen>
<para>Keep in mind that by default <command>sed</command> prints its results to the standard output, most likely your terminal window. If you want to save the output to a file, redirect it:</para>
<cmdsynopsis><command>sed <option>option</option> <function>'some/expression'</function> <filename>file_to_process</filename> &gt; <filename>sed_output_in_a_file</filename></command></cmdsynopsis>
<tip><title>More examples</title>
<para>Plenty of <command>sed</command> examples can be found in the startup scripts for your machine, which are usually in <filename>/etc/init.d</filename> or <filename>/etc/rc.d/init.d</filename>. Change into the directory containing the initscripts on your system and issue the following command:</para>
<cmdsynopsis><command>grep <parameter>sed</parameter> <filename>*</filename></command></cmdsynopsis>
</tip>
</sect2>
</sect1>
<sect1 id="sect_05_03"><title>Non-interactive editing</title>
<sect2 id="sect_05_03_01"><title>Reading sed commands from a file</title>
<para>Multiple <command>sed</command> commands can be put in a file and executed using the <option>-f</option> option. When creating such a file, make sure that:</para>
<itemizedlist>
<listitem><para>No trailing white spaces exist at the end of lines.</para></listitem>
<listitem><para>No quotes are used.</para></listitem>
<listitem><para>When entering text to add or replace, all except the last line end in a backslash.</para></listitem>
</itemizedlist>
</sect2>
<sect2 id="sect_05_03_02"><title>Writing output files</title>
<para>Writing output is done using the output redirection operator <command>&gt;</command>. This is an example script used to create very simple HTML files from plain text files.</para>
<screen>
<prompt>sandy ~&gt;</prompt> <command>cat <filename>script.sed</filename></command>
1i\
&lt;html&gt;\
&lt;head&gt;&lt;title>sed generated html&lt;/title&gt;&lt;/head&gt;\
&lt;body bgcolor="#ffffff"&gt;\
&lt;pre&gt;
$a\
&lt;/pre&gt;\
&lt;/body&gt;\
&lt;/html&gt;
<prompt>sandy ~&gt;</prompt> <command>cat <filename>txt2html.sh</filename></command>
#!/bin/bash
# This is a simple script that you can use for converting text into HTML.
# First we take out all newline characters, so that the appending only happens
# once, then we replace the newlines.
echo "converting $1..."
SCRIPT="/home/sandy/scripts/script.sed"
NAME="$1"
TEMPFILE="/var/tmp/sed.$PID.tmp"
sed "s/\n/^M/" $1 | sed -f $SCRIPT | sed "s/^M/\n/" &gt; $TEMPFILE
mv $TEMPFILE $NAME
echo "done."
<prompt>sandy ~&gt;</prompt>
</screen>
<para><varname>$1</varname> holds the first argument to a given command, in this case the name of the file to convert:</para>
<screen>
<prompt>sandy ~&gt;</prompt> <command>cat <filename>test</filename></command>
line1
line2
line3
</screen>
<para>More on positional parameters in <xref linkend="chap_07" />.</para>
<screen>
<prompt>sandy ~&gt;</prompt> <command>txt2html.sh <filename>test</filename></command>
converting test...
done.
<prompt>sandy ~&gt;</prompt> <command>cat <filename>test</filename></command>
&lt;html&gt;
&lt;head&gt;&lt;title&gt;sed generated html&lt;/title&gt;&lt;/head&gt;
&lt;body bgcolor="#ffffff"&gt;
&lt;pre&gt;
line1
line2
line3
&lt;/pre&gt;
&lt;/body&gt;
&lt;/html&gt;
<prompt>sandy ~&gt;</prompt>
</screen>
<para>This is not really how it is done; this example just demonstrates <command>sed</command> capabilities. See <xref linkend="sect_06_03" /> for a more decent solution to this problem, using <command>awk</command> <emphasis>BEGIN</emphasis> and <emphasis>END</emphasis> constructs.</para>
<note><title>Easy sed</title>
<para>Advanced editors, supporting syntax highlighting, can recognize <command>sed</command> syntax. This can be a great help if you tend to forget backslashes and such.</para></note>
</sect2>
</sect1>
<sect1 id="sect_05_04"><title>Summary</title>
<para>The <command>sed</command> stream editor is a powerful command line tool, which can handle streams of data: it can take input lines from a pipe. This makes it fit for non-interactive use. The <command>sed</command> editor uses <command>vi</command>-like commands and accepts regular expressions.</para>
<para>The <command>sed</command> tool can read commands from the command line or from a script. It is often used to perform find-and-replace actions on lines containing a pattern.</para>
</sect1>
<sect1 id="sect_05_05"><title>Exercises</title>
<para>These exercises are meant to further demonstrate what <command>sed</command> can do.</para>
<orderedlist>
<listitem><para>Print a list of files in your <filename>scripts</filename> directory, ending in <quote>.sh</quote>. Mind that you might have to unalias <command>ls</command>. Put the result in a temporary file.</para></listitem>
<listitem><para>Make a list of files in <filename>/usr/bin</filename> that have the letter <quote>a</quote> as the second character. Put the result in a temporary file.</para></listitem>
<listitem><para>Delete the first 3 lines of each temporary file.</para></listitem>
<listitem><para>Print to standard output only the lines containing the pattern <quote>an</quote>.</para></listitem>
<listitem><para>Create a file holding <command>sed</command> commands to perform the previous two tasks. Add an extra command to this file that adds a string like <quote>*** This might have something to do with man and man pages ***</quote> in the line preceding every occurence of the string <quote>man</quote>. Check the results.</para></listitem>
<listitem><para>A long listing of the root directory, <filename>/</filename>, is used for input. Create a file holding <command>sed</command> commands that check for symbolic links and plain files. If a file is a symbolic link, precede it with a line like <quote>--This is a symlink--</quote>. If the file is a plain file, add a string on the same line, adding a comment like <quote>&lt;--- this is a plain file</quote>.</para></listitem>
<listitem><para>Create a script that shows lines containing trailing white spaces from a file. This script should use a <command>sed</command> script and show sensible information to the user.</para></listitem>
</orderedlist>
</sect1>
</chapter>