<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML
><HEAD
><TITLE
>Text manipulation tools</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.7"><LINK
REL="HOME"
TITLE="GNU/Linux Command-Line Tools Summary"
HREF="index.html"><LINK
REL="UP"
TITLE="Text Related Tools"
HREF="text-related-tools.html"><LINK
REL="PREVIOUS"
TITLE="Text Information Tools"
HREF="text-information-tools.html"><LINK
REL="NEXT"
TITLE="Text Conversion/Filter Tools"
HREF="text-filter-tools.html"></HEAD
><BODY
CLASS="SECT1"
BGCOLOR="#FFFFFF"
TEXT="#000000"
LINK="#0000FF"
VLINK="#840084"
ALINK="#0000FF"
><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="3"
ALIGN="center"
>GNU/Linux Command-Line Tools Summary</TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="bottom"
><A
HREF="text-information-tools.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="80%"
ALIGN="center"
VALIGN="bottom"
>Chapter 11. Text Related Tools</TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="bottom"
><A
HREF="text-filter-tools.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="SECT1"
><H1
CLASS="SECT1"
><A
NAME="TEXT-MANIPULATION-TOOLS"
></A
>11.4. Text manipulation tools</H1
><DIV
CLASS="TIP"
><P
></P
><TABLE
CLASS="TIP"
WIDTH="100%"
BORDER="0"
><TR
><TD
WIDTH="25"
ALIGN="CENTER"
VALIGN="TOP"
><IMG
SRC="../images/tip.gif"
HSPACE="5"
ALT="Tip"></TD
><TH
ALIGN="LEFT"
VALIGN="CENTER"
><B
>Also see</B
></TH
></TR
><TR
><TD
>&nbsp;</TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
><P
>Also see <EM
>tac</EM
> and <EM
>cat</EM
> in <A
HREF="text-viewing-tools.html"
>Section 11.2</A
>, as they can also perform text manipulation.</P
></TD
></TR
></TABLE
></DIV
><P
></P
><DIV
CLASS="VARIABLELIST"
><DL
><DT
>sort</DT
><DD
><P
>Sorts lines of text. With no options the sort is alphabetical. It can be run on text files to sort them (given several files, it concatenates them and sorts the combined result), or used with a pipe '|' to sort the output of a command.</P
><P
>Use<EM
> sort -r</EM
> to reverse the sort output, and use the<EM
> -g </EM
> option to sort numerically (comparing whole numbers; the default compares text character by character, so 4000 would come before 50).</P
><P
>Examples:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>cat shoppinglist.txt | sort</PRE
></FONT
></TD
></TR
></TABLE
><P
>The above command would run <EM
>cat</EM
> on the shopping list then sort the results and display them in alphabetical order.</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>sort -r shoppinglist.txt</PRE
></FONT
></TD
></TR
></TABLE
><P
>The above command would run <EM
>sort</EM
> on the file and display its lines in reverse alphabetical order.</P
><P
>Advanced sort commands: </P
><P
><EM
>sort</EM
> is a powerful utility; here are some of its harder to learn (and less often used) options. Use the <EM
>-t</EM
> option to set the separator, then use the <EM
>-k</EM
> option to choose which column to sort by, where column 1 is the text <EM
>before</EM
> the first separator. Write a key as <EM
>-k 4,4</EM
> to sort on column 4 alone; a bare <EM
>-k 4</EM
> means from column 4 to the end of the line. Add <EM
>g</EM
> to a key to compare that column numerically, since by default columns are compared as text, character by character. Here is a complex example:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>sort -t : -k 4,4g -k 1,1 /etc/passwd | more </PRE
></FONT
></TD
></TR
></TABLE
><P
>This will sort the &#8220;/etc/passwd&#8221; file, using the colon ':' as the separator. It will sort by the 4th column (the GID field) and then use the first column (the name) to resolve any ties. The <EM
>g</EM
> on the first key makes it compare whole numbers; otherwise 4000 would come before 50, because the text comparison looks at one character at a time.</P
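><P
>As a rough sketch of the difference between text and numeric ordering, the following uses a few made-up name:quantity lines (not taken from any real file) and sorts them on the second column both ways:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>
```shell
# Sample data in the form name:quantity (made up for this sketch)
data='pears:50
apples:4000
plums:7'

# Text comparison on column 2: the order is 4000, 50, 7
by_text=$(printf '%s\n' "$data" | sort -t : -k 2,2)

# Adding g to the key compares whole numbers: the order is 7, 50, 4000
by_number=$(printf '%s\n' "$data" | sort -t : -k 2,2g)

printf '%s\n\n%s\n' "$by_text" "$by_number"
```
</PRE
></FONT
></TD
></TR
></TABLE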
><P
></P
></DD
><DT
>join</DT
><DD
><P
>Joins lines from two files when they share a common field (by default the first field on each line). Both files should already be sorted on that field. Lines that have no match in the other file are not printed.</P
><P
>Command syntax:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>join file1 file2</PRE
></FONT
></TD
></TR
></TABLE
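><P
>A small sketch of how this behaves, assuming two made-up files (ids.txt and shells.txt are hypothetical names) that are both sorted on their first field:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>
```shell
# Work in a scratch directory so nothing real is overwritten
dir=$(mktemp -d)

# Two hypothetical files, both already sorted on the first field
printf '%s\n' 'alice 1001' 'bob 1002' 'carol 1003' > "$dir/ids.txt"
printf '%s\n' 'alice bash' 'carol zsh' > "$dir/shells.txt"

# bob has no line in shells.txt, so join leaves him out
joined=$(join "$dir/ids.txt" "$dir/shells.txt")
printf '%s\n' "$joined"

rm -r "$dir"
```
</PRE
></FONT
></TD
></TR
></TABLE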
></DD
><DT
>cut</DT
><DD
><P
>Prints selected parts of each line (of a text file) or, in other words, removes the rest of the line. You can select parts by field, using tabs, commas or any other single character as the delimiter, or by character position.</P
><P
>Options for <EM
>cut:</EM
></P
><P
></P
><UL
><LI
><P
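><EM
>-d </EM
>--- allows you to specify the delimiter that separates fields (the default is a tab); it is used together with the -f option described next. For example ':' is the delimiter in /etc/passwd:</P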
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>cut -d ':' -f 1 /etc/passwd</PRE
></FONT
></TD
></TR
></TABLE
></LI
><LI
><P
><EM
>-f </EM
>--- this option selects columns (fields), separated according to the delimiter. For example if a file (here called results.txt, a made-up name) had lines like &#8220;result,somethingelse,somethingelse&#8221; and you only wanted result you would use:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>cut -d ',' -f 1 results.txt </PRE
></FONT
></TD
></TR
></TABLE
><P
>This would print only &#8220;result&#8221; from each line. In the same way, using ':' as the delimiter on /etc/passwd would get you only the usernames.</P
></LI
><LI
><P
>&#8220;,&#8221; (commas) --- used to separate the numbers in a list, allowing you to select several columns at once. For example:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>cut -d ':' -f 1,7 /etc/passwd</PRE
></FONT
></TD
></TR
></TABLE
><P
>This would show only the username and the shell that each person is set up for in /etc/passwd.</P
></LI
><LI
><P
>&#8220;-&#8221; (hyphen) --- used to give a range from x to y, for example 1-4 (fields or characters 1 to 4 of each line, not lines of the file).</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>cut -c 1-50 file1.txt</PRE
></FONT
></TD
></TR
></TABLE
><P
>This would cut (display) characters (columns) 1 to 50 of each line (anything else on that line is ignored).</P
></LI
><LI
><P
>-x --- where x is a number, to cut from position 1 to &#8220;x&#8221; of each line.</P
></LI
><LI
><P
>x- --- where x is a number, to cut from position &#8220;x&#8221; to the end of each line.</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>cut -c -5,8,20- file2.txt</PRE
></FONT
></TD
></TR
></TABLE
><P
>This would display (&#8220;cut&#8221;) characters (columns) 1 to 5, 8 and from 20 to the end.</P
></LI
></UL
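><P
>To see the list notations side by side, here is a small sketch that feeds cut a single made-up line in the style of /etc/passwd, plus the alphabet so the character positions are easy to count:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>
```shell
# One made-up line in the style of /etc/passwd
line='alice:x:1001:1001:Alice Example:/home/alice:/bin/bash'

# Fields 1 and 7 (user name and shell), using ':' as the delimiter
name_shell=$(printf '%s\n' "$line" | cut -d ':' -f 1,7)

# Characters 1 to 5, then character 8, then 20 to the end of the line
alphabet='abcdefghijklmnopqrstuvwxyz'
pieces=$(printf '%s\n' "$alphabet" | cut -c -5,8,20-)

printf '%s\n%s\n' "$name_shell" "$pieces"
```
</PRE
></FONT
></TD
></TR
></TABLE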
></DD
><DT
>ispell/aspell</DT
><DD
><P
>Spell check a file interactively; the program prompts you to replace each misspelled word or continue. <EM
>aspell</EM
> is said to be better at suggesting replacement words, but it's probably best to find out for yourself.</P
><P
><EM
>aspell</EM
> example:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>aspell -c FILE.txt</PRE
></FONT
></TD
></TR
></TABLE
><P
>This will run <EM
>aspell</EM
> on a file called &#8220;FILE.txt&#8221;. <EM
>aspell</EM
> will run interactively and prompt for user input.</P
><P
><EM
>ispell</EM
> example:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>ispell FILE.txt</PRE
></FONT
></TD
></TR
></TABLE
><P
>This will run <EM
>ispell</EM
> on a file called &#8220;FILE.txt&#8221;. <EM
>ispell</EM
> will run interactively and prompt for user input.</P
></DD
><DT
>chcase</DT
><DD
><P
>Is used to change the uppercase letters in a file name to lowercase (or vice versa).</P
><P
>You could also use <EM
>tr</EM
> to change case, here applied to the contents of a file:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>cat fileName.txt | tr '[A-Z]' '[a-z]'  &#62; newFileName.txt</PRE
></FONT
></TD
></TR
></TABLE
><P
>The above would convert uppercase to lowercase using the file &#8220;fileName.txt&#8221; as input and outputting the results to &#8220;newFileName.txt&#8221;.</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>cat fileName.txt | tr '[a-z]' '[A-Z]' &#62; newFileName.txt</PRE
></FONT
></TD
></TR
></TABLE
><P
>The above would convert lowercase to uppercase using the file &#8220;fileName.txt&#8221; as input and outputting the results to &#8220;newFileName.txt&#8221;.</P
><P
><EM
>chcase</EM
> (a perl script) can be found at the <A
HREF="http://www.blemished.net/chcase.html"
TARGET="_top"
>chcase homepage.</A
></P
><P
></P
></DD
><DT
>fmt</DT
><DD
><P
>(format) A simple text formatter. Use<EM
> fmt </EM
>with the <EM
>-u</EM
> option to output text with "uniform spacing", where the space between words is reduced to one space character and the space between sentences is reduced to two space characters. </P
><P
>Example:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>fmt -u myessay.txt</PRE
></FONT
></TD
></TR
></TABLE
><P
>This will make sure the space between sentences is two spaces and the space between words is one space.</P
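><P
>A tiny sketch of the effect on word spacing, assuming GNU fmt and a made-up line of text with uneven gaps (no sentence breaks, to keep it simple):</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>
```shell
# Uneven gaps between words in a made-up line
messy='one   two    three'

# -u reduces each gap between words to a single space
tidy=$(printf '%s\n' "$messy" | fmt -u)

printf '%s\n' "$tidy"
```
</PRE
></FONT
></TD
></TR
></TABLE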
></DD
><DT
>paste</DT
><DD
><P
>Puts lines from two or more files together. Either the lines of each file are placed side by side (normally separated by a tab, but you can choose any other symbol(s) you like with the -d option), or all the lines of one file are placed side by side on a single line, followed by the lines of the next file.</P
><P
>To obtain lines side by side, with each line from the first file on the left, then a tab, then the matching line from the second file, you would type:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>paste file1.txt file2.txt</PRE
></FONT
></TD
></TR
></TABLE
><P
>To have the lines displayed in serial (first line from the first file, [Tab], second line from the first file, and so on to the end of the first file, then the second file treated the same way on a new line), type:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>paste --serial file1.txt file2.txt</PRE
></FONT
></TD
></TR
></TABLE
><DIV
CLASS="TIP"
><P
></P
><TABLE
CLASS="TIP"
WIDTH="90%"
BORDER="0"
><TR
><TD
WIDTH="25"
ALIGN="CENTER"
VALIGN="TOP"
><IMG
SRC="../images/tip.gif"
HSPACE="5"
ALT="Tip"></TD
><TH
ALIGN="LEFT"
VALIGN="CENTER"
><B
>This command is very simple to understand if you make yourself an example</B
></TH
></TR
><TR
><TD
>&nbsp;</TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
><P
>It's much easier if you create an example for yourself with just a couple of lines. I used "first line first file" and "first line second file" et cetera for a quick example.</P
></TD
></TR
></TABLE
></DIV
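><P
>One way such a quick example might look, as a sketch that builds the two small files in a scratch directory (the file names are only for illustration):</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>
```shell
# Build two small example files in a scratch directory
dir=$(mktemp -d)
printf '%s\n' 'first line first file' 'second line first file' > "$dir/file1.txt"
printf '%s\n' 'first line second file' 'second line second file' > "$dir/file2.txt"

# Side by side: line N of file1, a tab, then line N of file2
side_by_side=$(paste "$dir/file1.txt" "$dir/file2.txt")

# Serial: each file becomes one output line, its lines joined by tabs
serial=$(paste --serial "$dir/file1.txt" "$dir/file2.txt")

printf '%s\n\n%s\n' "$side_by_side" "$serial"
rm -r "$dir"
```
</PRE
></FONT
></TD
></TR
></TABLE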
></DD
><DT
>expand</DT
><DD
><P
>Will convert tabs to spaces and output the result. Use the option<EM
> -t num</EM
> to specify the size of a &#8220;tabstop&#8221;, the number of character columns between each tab position (the default is 8).</P
><P
>Command syntax:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>expand file_name.txt</PRE
></FONT
></TD
></TR
></TABLE
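><P
>A short sketch of the tabstop size in action, using a made-up line that starts with a single tab:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>
```shell
# A line that starts with one tab character
tabbed=$(printf '\tindented')

# By default a tab advances to the next multiple of 8 columns
default_width=$(printf '%s\n' "$tabbed" | expand)

# -t 4 sets tab stops every 4 columns instead
narrow=$(printf '%s\n' "$tabbed" | expand -t 4)

# Brackets make the leading spaces visible
printf '[%s]\n[%s]\n' "$default_width" "$narrow"
```
</PRE
></FONT
></TD
></TR
></TABLE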
></DD
></DL
></DIV
><P
></P
><P
></P
><DIV
CLASS="VARIABLELIST"
><DL
><DT
>unexpand</DT
><DD
><P
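>Will convert spaces to tabs and output the result. By default only spaces at the start of a line are converted; use the -a option to convert runs of spaces anywhere in the line.</P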
><P
>Command syntax:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>unexpand file_name.txt</PRE
></FONT
></TD
></TR
></TABLE
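><P
>As a minimal sketch, assuming the default tab stops of 8 columns, eight leading spaces on a made-up line collapse into a single tab:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>
```shell
# A line indented with eight spaces
spaced='        indented'

# The eight leading spaces reach the first tab stop, so they become one tab
tabbed=$(printf '%s\n' "$spaced" | unexpand)

# od -c shows the tab as \t so the change is visible
printf '%s\n' "$tabbed" | od -c
```
</PRE
></FONT
></TD
></TR
></TABLE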
></DD
><DT
>uniq</DT
><DD
><P
>Eliminates duplicate lines from a file, which can greatly simplify the display. Only duplicates that are next to each other are detected, so the input is usually run through sort first.</P
><P
><EM
>uniq</EM
> options: </P
><P
></P
><UL
><LI
><P
><EM
>-c </EM
>--- count the number of occurrences of each line</P
></LI
><LI
><P
><EM
>-u </EM
>--- list only unique entries </P
></LI
><LI
><P
><EM
>-d </EM
>--- list only duplicate entries</P
></LI
></UL
><P
>For example:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>uniq -cd phone_list.txt</PRE
></FONT
></TD
></TR
></TABLE
><P
>This would display any duplicate entries only and a count of the number of times that entry has appeared.</P
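><P
>Since uniq only compares neighbouring lines, it is normally paired with sort. A sketch with a made-up phone list whose repeats are scattered:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>
```shell
# A made-up list with repeats that are not next to each other
list='bob 555-0101
alice 555-0100
bob 555-0101
carol 555-0102
alice 555-0100
bob 555-0101'

# Without sorting, no two neighbouring lines match, so nothing is reported
missed=$(printf '%s\n' "$list" | uniq -d)

# After sorting, the repeated entries are found
repeated=$(printf '%s\n' "$list" | sort | uniq -d)

# Entries that occur exactly once
once=$(printf '%s\n' "$list" | sort | uniq -u)

printf 'repeated:\n%s\nonce:\n%s\n' "$repeated" "$once"
```
</PRE
></FONT
></TD
></TR
></TABLE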
></DD
><DT
>tr</DT
><DD
><P
>(translate). A filter that replaces or deletes characters in its input, or "squeezes" runs of a repeated character (such as whitespace) down to one.</P
><P
>Example:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>cat some_file | tr '3' '5' &#62; new_file</PRE
></FONT
></TD
></TR
></TABLE
><P
>This will run the <EM
>cat</EM
> program on some_file and send its output to the <EM
>tr</EM
> command. <EM
>tr</EM
> will replace every instance of the character 3 with 5, like a search and replace on single characters. You can also do other things such as:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>cat some_file | tr '[A-Z]' '[a-z]' &#62; new_file</PRE
></FONT
></TD
></TR
></TABLE
><P
>This will run <EM
>cat</EM
> on some_file and convert any capital letters to lowercase letters (you could use this to change the case of file names too...).</P
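><P
>A brief sketch of replacing, squeezing and deleting characters, run on a made-up string so no files are needed:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>
```shell
text='Hello    World 333'

# Replace every 3 with 5, one character at a time
swapped=$(printf '%s\n' "$text" | tr '3' '5')

# -s squeezes each run of spaces down to a single space
squeezed=$(printf '%s\n' "$text" | tr -s ' ')

# -d deletes characters; here all digits
no_digits=$(printf '%s\n' "$text" | tr -d '0-9')

printf '[%s]\n[%s]\n[%s]\n' "$swapped" "$squeezed" "$no_digits"
```
</PRE
></FONT
></TD
></TR
></TABLE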
><DIV
CLASS="TIP"
><P
></P
><TABLE
CLASS="TIP"
WIDTH="90%"
BORDER="0"
><TR
><TD
WIDTH="25"
ALIGN="CENTER"
VALIGN="TOP"
><IMG
SRC="../images/tip.gif"
HSPACE="5"
ALT="Tip"></TD
><TH
ALIGN="LEFT"
VALIGN="CENTER"
><B
>Alternatives</B
></TH
></TR
><TR
><TD
>&nbsp;</TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
><P
>You can also do a search and replace with a one-line <SPAN
CLASS="APPLICATION"
>Perl</SPAN
> command; read about it at the end of this section.</P
></TD
></TR
></TABLE
></DIV
></DD
><DT
>nl</DT
><DD
><P
>The number lines tool. Its default action is to write its input (either the files named as arguments, or the standard input) to the standard output with line numbers added.</P
><P
>By default line numbers are added to every non-blank line and the text is indented; use the <EM
>-b a</EM
> option to number blank lines as well.</P
><P
>This command can take some more advanced numbering options; simply read the info page on it.</P
><P
>These advanced options mainly relate to customisation of the numbering, including different forms of separation for sections/pages/footers etc.</P
><P
>Also try <EM
> cat -n</EM
> (number all lines) or<EM
> cat -b</EM
> (number all non-blank lines). For more info on <EM
>cat</EM
> check under this section: <A
HREF="text-viewing-tools.html"
>Section 11.2</A
> </P
><P
>There are two ways you can use <EM
>nl</EM
>:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>nl some_text_file.txt</PRE
></FONT
></TD
></TR
></TABLE
><P
>The above command would add numbers to each non-blank line of some_text_file.txt. You could also use <EM
>nl</EM
> to number the output of something, as shown in the example below:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>grep some_string some_file | nl</PRE
></FONT
></TD
></TR
></TABLE
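><P
>To see how blank lines are treated, here is a small sketch on three made-up lines, the middle one empty:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>
```shell
text='first

third'

# Default: only non-blank lines get a number, so "third" is line 2
default_numbers=$(printf '%s\n' "$text" | nl)

# -b a numbers every line, blank ones included, so "third" is line 3
all_numbers=$(printf '%s\n' "$text" | nl -b a)

printf '%s\n\n%s\n' "$default_numbers" "$all_numbers"
```
</PRE
></FONT
></TD
></TR
></TABLE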
></DD
><DT
>Perl search and replace text</DT
><DD
><P
>One way to search and replace text in a file is to use the following one-line Perl command<A
NAME="AEN7578"
HREF="#FTN.AEN7578"
><SPAN
CLASS="footnote"
>[1]</SPAN
></A
>:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>$ perl -pi -e "s/oldstring/newstring/g;" filespec [RET]</PRE
></FONT
></TD
></TR
></TABLE
><P
>In this example, &#8220;oldstring&#8221; is the string to search for, &#8220;newstring&#8221; is the string to replace it with, and &#8220;filespec&#8221; is the name of the file or files to work on. You can use this for more than one file.</P
><P
>Example: To replace the string &#8220;helpless&#8221; with the string &#8220;helpful&#8221; in all files in the current directory, type:</P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
>$ perl -pi -e "s/helpless/helpful/g;" * [RET]</PRE
></FONT
></TD
></TR
></TABLE
><P
>Also try using <EM
>tr</EM
> when you only need to replace single characters (see further above in this section).</P
></DD
></DL
></DIV
><DIV
CLASS="TIP"
><P
></P
><TABLE
CLASS="TIP"
WIDTH="100%"
BORDER="0"
><TR
><TD
WIDTH="25"
ALIGN="CENTER"
VALIGN="TOP"
><IMG
SRC="../images/tip.gif"
HSPACE="5"
ALT="Tip"></TD
><TH
ALIGN="LEFT"
VALIGN="CENTER"
><B
>If these tools are too primitive</B
></TH
></TR
><TR
><TD
>&nbsp;</TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
><P
>If these text tools are too simple for your purposes then you are probably looking at doing some programming or scripting.</P
><P
>If you would like more information on bash scripting then please see the <A
HREF="http://www.tldp.org/LDP/abs/html/"
TARGET="_top"
>advanced bash scripting guide</A
>, authored by Mendel Cooper.</P
><P
>sed and awk are traditional <SPAN
CLASS="PRODUCTNAME"
>UNIX</SPAN
> system tools for working with text; this guide does not provide an explanation of them. sed works on a line-by-line basis performing substitution, and awk can perform a similar task or assist by working on a file and printing out certain information (it's a programming language).</P
><P
>You will normally find them installed on your GNU/Linux system and will find many tutorials all over the internet. Feel free to look them up if you ever have to perform many similar operations on a text file.</P
></TD
></TR
></TABLE
></DIV
></DIV
><H3
CLASS="FOOTNOTES"
>Notes</H3
><TABLE
BORDER="0"
CLASS="FOOTNOTES"
WIDTH="100%"
><TR
><TD
ALIGN="LEFT"
VALIGN="TOP"
WIDTH="5%"
><A
NAME="FTN.AEN7578"
HREF="text-manipulation-tools.html#AEN7578"
><SPAN
CLASS="footnote"
>[1]</SPAN
></A
></TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
WIDTH="95%"
><P
>This information has been taken from the <SPAN
CLASS="PRODUCTNAME"
>Linux</SPAN
> Cookbook (without editing). See [3] in the <A
HREF="references.html"
><I
>Bibliography</I
></A
> for further information.</P
></TD
></TR
></TABLE
></BODY
></HTML
>