<HTML
><HEAD
><TITLE
>Text Conversion/Filter Tools</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.7"><LINK
REL="HOME"
TITLE="GNU/Linux Command-Line Tools Summary"
HREF="book1.htm"><LINK
REL="UP"
TITLE="Text Related Tools"
HREF="c6435.htm"><LINK
REL="PREVIOUS"
TITLE="Text manipulation tools"
HREF="x6993.htm"><LINK
REL="NEXT"
TITLE="Finding Text Within Files"
HREF="x7969.htm"></HEAD
><BODY
CLASS="SECT1"
BGCOLOR="#FFFFFF"
TEXT="#000000"
LINK="#0000FF"
VLINK="#840084"
ALINK="#0000FF"
><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="3"
ALIGN="center"
>GNU/Linux Command-Line Tools Summary</TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="bottom"
><A
HREF="x6993.htm"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="80%"
ALIGN="center"
VALIGN="bottom"
>Chapter 11. Text Related Tools</TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="bottom"
><A
HREF="x7969.htm"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="SECT1"
><H1
CLASS="SECT1"
><A
NAME="TEXT-FILTER-TOOLS"
></A
>Text Conversion/Filter Tools</H1
><P
></P
><DIV
CLASS="VARIABLELIST"
><DL
><DT
>Filters&nbsp;(UNIX&nbsp;system/DOS&nbsp;formats)<A
NAME="TEXT-FILTERS-UNIX-DOS"
></A
></DT
><DD
><P
>The following <A
NAME="AEN7627"
></A
>filters <A
NAME="AEN7629"
></A
>allow you to change <A
NAME="AEN7631"
></A
>text from DOS-style <A
NAME="AEN7633"
></A
>to <SPAN
CLASS="PRODUCTNAME"
>UNIX</SPAN
> <A
NAME="AEN7636"
></A
>system style <A
NAME="AEN7638"
></A
>and vice-versa,<A
NAME="AEN7640"
></A
> or convert a file to other formats.<A
NAME="AEN7642"
></A
> Many modern text editors can also do this for you.</P
><P
></P
><DIV
CLASS="VARIABLELIST"
><DL
><DT
>Why&nbsp;use&nbsp;filters?</DT
><DD
><P
>Because <SPAN
CLASS="PRODUCTNAME"
>UNIX</SPAN
> <A
NAME="AEN7650"
></A
>systems and Microsoft <A
NAME="AEN7652"
></A
>use two different standards<A
NAME="AEN7654"
></A
> to represent <A
NAME="AEN7656"
></A
>the end-of-line <A
NAME="AEN7658"
></A
>in an <SPAN
CLASS="ACRONYM"
>ASCII</SPAN
> <A
NAME="AEN7661"
></A
>text file. </P
><P
>This can sometimes cause <A
NAME="AEN7664"
></A
>problems <A
NAME="AEN7666"
></A
>in editors <A
NAME="AEN7668"
></A
>or viewers <A
NAME="AEN7670"
></A
>which aren't familiar<A
NAME="AEN7672"
></A
> with the other operating <A
NAME="AEN7674"
></A
>system's end-of-line style. The following <A
NAME="AEN7676"
></A
>tools allow you to get around this difference.<A
NAME="AEN7678"
></A
></P
></DD
><DT
>What's&nbsp;the&nbsp;difference?</DT
><DD
><P
>The difference <A
NAME="AEN7684"
></A
>is very simple: in a <SPAN
CLASS="PRODUCTNAME"
>Windows</SPAN
> <A
NAME="AEN7687"
></A
>text file, a newline <A
NAME="AEN7689"
></A
>is signalled <A
NAME="AEN7691"
></A
>by a carriage <A
NAME="AEN7693"
></A
>return <A
NAME="AEN7695"
></A
>followed by a newline, '\r\n' in <SPAN
CLASS="ACRONYM"
>ASCII</SPAN
>.<A
NAME="AEN7698"
></A
></P
><P
>On a <SPAN
CLASS="PRODUCTNAME"
>UNIX</SPAN
> system a newline <A
NAME="AEN7702"
></A
>is simply a newline, '\n' in <SPAN
CLASS="ACRONYM"
>ASCII</SPAN
>.<A
NAME="AEN7705"
></A
></P
></DD
></DL
></DIV
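><P
>As a quick check of the difference described above, the sketch below prints the same two lines in each style and pipes them through <SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>od -c</I
></SPAN
>, which shows every byte. The sample text is made up for illustration; the '\r' that appears before each '\n' in the first dump is the extra carriage return.</P
><PRE
CLASS="SCREEN"
># Windows-style: each line ends in carriage return + newline
printf 'one\r\ntwo\r\n' | od -c

# UNIX-style: each line ends in a newline only
printf 'one\ntwo\n' | od -c</PRE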
></DD
><DT
>dos2unix</DT
><DD
><P
><A
NAME="AEN7711"
></A
>This converts <A
NAME="AEN7713"
></A
>Microsoft-style end-of-line <A
NAME="AEN7715"
></A
>characters to <SPAN
CLASS="PRODUCTNAME"
>UNIX</SPAN
> system style end-of-line characters. </P
><P
>Simply type:</P
><PRE
CLASS="SCREEN"
>dos2unix file.txt</PRE
></DD
><DT
>fromdos</DT
><DD
><P
><A
NAME="AEN7724"
></A
>This does the same as<SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
> dos2unix <A
NAME="AEN7727"
></A
></I
></SPAN
>(above). </P
><P
>Simply type:</P
><PRE
CLASS="SCREEN"
>fromdos file.txt</PRE
><P
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>fromdos</I
></SPAN
> can be obtained from <A
HREF="http://www.thefreecountry.com/tofrodos/"
TARGET="_top"
>the from/to dos website.</A
><A
NAME="AEN7734"
></A
></P
></DD
><DT
>unix2dos</DT
><DD
><P
><A
NAME="AEN7740"
></A
>This converts <SPAN
CLASS="PRODUCTNAME"
>UNIX</SPAN
> system style end-of-line characters to Microsoft-style end-of-line <A
NAME="AEN7743"
></A
>characters. </P
><P
>Simply type:</P
><PRE
CLASS="SCREEN"
>unix2dos file.txt</PRE
></DD
><DT
>todos</DT
><DD
><P
><A
NAME="AEN7751"
></A
>This does the same as <SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>unix2dos</I
></SPAN
> (above). </P
><P
>Simply type:</P
><PRE
CLASS="SCREEN"
>todos file.txt</PRE
><P
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>todos</I
></SPAN
> can be obtained from <A
HREF="http://www.thefreecountry.com/tofrodos/"
TARGET="_top"
>the from/to dos website.</A
><A
NAME="AEN7759"
></A
></P
></DD
><DT
>antiword</DT
><DD
><P
><A
NAME="AEN7765"
></A
>This filter <A
NAME="AEN7767"
></A
>converts <A
NAME="AEN7769"
></A
>Microsoft Word documents into plain <A
NAME="AEN7771"
></A
>ASCII text <A
NAME="AEN7773"
></A
>documents.<A
NAME="AEN7775"
></A
> </P
><P
>Simply type:</P
><PRE
CLASS="SCREEN"
>antiword file.doc</PRE
><P
>You can get <SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>antiword</I
></SPAN
> from <A
HREF="http://www.winfield.demon.nl/"
TARGET="_top"
>the antiword homepage.</A
><A
NAME="AEN7782"
></A
></P
></DD
><DT
>recode</DT
><DD
><P
><A
NAME="AEN7788"
></A
>Converts text files between various formats <A
NAME="AEN7790"
></A
>including HTML <A
NAME="AEN7792"
></A
>and dozens of different <A
NAME="AEN7794"
></A
>forms of text encodings.<A
NAME="AEN7796"
></A
> </P
><P
>Use<SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
> recode -l<A
NAME="AEN7800"
></A
></I
></SPAN
> for a full <A
NAME="AEN7802"
></A
>listing.<A
NAME="AEN7804"
></A
> It can also be used to convert text to and from <SPAN
CLASS="PRODUCTNAME"
>Windows</SPAN
> <A
NAME="AEN7807"
></A
>and <SPAN
CLASS="PRODUCTNAME"
>UNIX</SPAN
> <A
NAME="AEN7810"
></A
>system formats <A
NAME="AEN7812"
></A
>(so you don't see stray symbols such as ^M at the end of each line). </P
><DIV
CLASS="CAUTION"
><P
></P
><TABLE
CLASS="CAUTION"
BORDER="1"
WIDTH="90%"
><TR
><TD
ALIGN="CENTER"
><B
>Warning</B
></TD
></TR
><TR
><TD
ALIGN="LEFT"
><P
>
By default, recode overwrites the input file. To use recode as a filter only, without overwriting the file, redirect the input with '&#60;'.
</P
></TD
></TR
></TABLE
></DIV
><P
></P
><DIV
CLASS="VARIABLELIST"
><DL
><DT
>Examples:</DT
><DD
><P
>&nbsp;</P
></DD
></DL
></DIV
><P
>UNIX system text to <SPAN
CLASS="PRODUCTNAME"
>Windows</SPAN
> text:</P
><PRE
CLASS="SCREEN"
>recode ..pc file_name</PRE
><P
>Windows text to <SPAN
CLASS="PRODUCTNAME"
>UNIX</SPAN
> system text:</P
><PRE
CLASS="SCREEN"
>recode pc.. file_name</PRE
><P
>UNIX system text to <SPAN
CLASS="PRODUCTNAME"
>Windows</SPAN
> text without overwriting <A
NAME="AEN7830"
></A
>the original <A
NAME="AEN7832"
></A
>file (and creating a new <A
NAME="AEN7834"
></A
>output file):</P
><PRE
CLASS="SCREEN"
>recode ..pc &#60; file_name &#62; recoded_file</PRE
></DD
><DT
>tr</DT
><DD
><P
><A
NAME="AEN7841"
></A
>(Windows to <SPAN
CLASS="PRODUCTNAME"
>UNIX</SPAN
> <A
NAME="AEN7844"
></A
>system style conversion <A
NAME="AEN7846"
></A
>only). While <SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>tr</I
></SPAN
> is not specifically <A
NAME="AEN7849"
></A
>designed to convert files from Windows format <A
NAME="AEN7851"
></A
>to UNIX<A
NAME="AEN7853"
></A
> system format, it can do so <A
NAME="AEN7855"
></A
>by running:</P
><PRE
CLASS="SCREEN"
>tr -d '\r' &#60; inputFile.txt &#62; outputFile.txt</PRE
><P
>The -d <A
NAME="AEN7859"
></A
>switch tells tr to delete <A
NAME="AEN7861"
></A
>every occurrence <A
NAME="AEN7863"
></A
>of the listed characters. Since the only character listed <A
NAME="AEN7865"
></A
>is '\r'<A
NAME="AEN7867"
></A
>, the carriage return, <A
NAME="AEN7869"
></A
>tr removes <A
NAME="AEN7871"
></A
>every one it finds,<A
NAME="AEN7873"
></A
> leaving a UNIX<A
NAME="AEN7875"
></A
> system text file. You can read more about <SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>tr</I
></SPAN
> in <A
HREF="x6993.htm"
>the Section called <I
>Text manipulation tools</I
></A
>.</P
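><P
>Because the -d switch can only remove characters, tr cannot do the reverse conversion. One possible way to put the carriage returns back, sketched here with <SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>awk</I
></SPAN
> as a stand-in and not as a description of how the tools above work internally, is to print each line followed by '\r\n'. The sample text is made up for illustration, and the second command feeds the result back through tr to check the round trip.</P
><PRE
CLASS="SCREEN"
># UNIX to Windows: print every line with '\r\n' appended
printf 'one\ntwo\n' | awk '{ printf "%s\r\n", $0 }' | od -c

# Round trip: tr removes the carriage returns again
printf 'one\ntwo\n' | awk '{ printf "%s\r\n", $0 }' | tr -d '\r' | od -c</PRE
><P
>To convert a real file, give awk the file name as its last argument and redirect the output to a new file.</P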
></DD
></DL
></DIV
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="TEXT-CONVERSION-TOOLS"
></A
>Conversion tools</H2
><P
></P
><DIV
CLASS="VARIABLELIST"
><DL
><DT
>enscript</DT
><DD
><P
><A
NAME="AEN7886"
></A
>Converts text files to PostScript,<A
NAME="AEN7888"
></A
> RTF,<A
NAME="AEN7890"
></A
> or HTML <A
NAME="AEN7892"
></A
>(use <SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>ghostview</I
></SPAN
> to view <A
NAME="AEN7895"
></A
>the PostScript <A
NAME="AEN7897"
></A
>file). <SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>enscript</I
></SPAN
> has a large number of options <A
NAME="AEN7900"
></A
>which can be used to customise <A
NAME="AEN7902"
></A
>the output.</P
><P
>Examples:<A
NAME="AEN7905"
HREF="#FTN.AEN7905"
><SPAN
CLASS="footnote"
>[1]</SPAN
></A
></P
><PRE
CLASS="SCREEN"
>enscript --language=html input_file.txt -o output_file.html</PRE
><P
>This takes a text file and writes it out as an HTML <A
NAME="AEN7910"
></A
>file.</P
><PRE
CLASS="SCREEN"
>enscript --help-highlight</PRE
><P
>Display help on using the highlight <A
NAME="AEN7914"
></A
>feature (list all different types <A
NAME="AEN7916"
></A
>of highlighting <A
NAME="AEN7918"
></A
>available)</P
><P
>Highlight <A
NAME="AEN7922"
></A
>(pretty print), example:</P
><PRE
CLASS="SCREEN"
>enscript -E --color --language=html --toc --output=foo.html *.h *.c </PRE
><P
>Add <A
NAME="AEN7926"
></A
>all the files with a .h <A
NAME="AEN7928"
></A
>and a .c (C source <A
NAME="AEN7930"
></A
>and header <A
NAME="AEN7932"
></A
>files) into a file called foo.html, use colour <A
NAME="AEN7934"
></A
>and add <A
NAME="AEN7936"
></A
>a table <A
NAME="AEN7938"
></A
>of contents<A
NAME="AEN7940"
></A
></P
><P
>For further options, refer to the well-written manual page <A
NAME="AEN7943"
></A
>of enscript.<A
NAME="AEN7945"
></A
></P
></DD
><DT
>figlet</DT
><DD
><P
><A
NAME="AEN7951"
></A
>Used to create <SPAN
CLASS="ACRONYM"
>ASCII</SPAN
> <A
NAME="AEN7954"
></A
>&ldquo;art&rdquo;. Figlet can create several <A
NAME="AEN7956"
></A
>different forms (fonts) of <SPAN
CLASS="ACRONYM"
>ASCII</SPAN
> <A
NAME="AEN7959"
></A
>art;<A
NAME="AEN7961"
></A
> it's one of the more <A
NAME="AEN7963"
></A
>unusual <A
NAME="AEN7965"
></A
>programs <A
NAME="AEN7967"
></A
>around.</P
></DD
></DL
></DIV
></DIV
></DIV
><H3
CLASS="FOOTNOTES"
>Notes</H3
><TABLE
BORDER="0"
CLASS="FOOTNOTES"
WIDTH="100%"
><TR
><TD
ALIGN="LEFT"
VALIGN="TOP"
WIDTH="5%"
><A
NAME="FTN.AEN7905"
HREF="x7619.htm#AEN7905"
><SPAN
CLASS="footnote"
>[1]</SPAN
></A
></TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
WIDTH="95%"
><P
>These examples are based on information from the enscript manual page, see [12] in the <A
HREF="b12722.htm"
><I
>Bibliography</I
></A
> for further information. </P
></TD
></TR
></TABLE
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="x6993.htm"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="book1.htm"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="x7969.htm"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>Text manipulation tools</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="c6435.htm"
ACCESSKEY="U"
>Up</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>Finding Text Within Files</TD
></TR
></TABLE
></DIV
></BODY
></HTML
>