<HTML
><HEAD
><TITLE
>Hardware</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.76b+
"><LINK
REL="HOME"
TITLE="Speech Recognition HOWTO"
HREF="index.html"><LINK
REL="PREVIOUS"
TITLE="Introduction"
HREF="introduction.html"><LINK
REL="NEXT"
TITLE="Speech Recognition Software"
HREF="software.html"></HEAD
><BODY
CLASS="SECT1"
BGCOLOR="#FFFFFF"
TEXT="#000000"
LINK="#0000FF"
VLINK="#840084"
ALINK="#0000FF"
><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="3"
ALIGN="center"
>Speech Recognition HOWTO</TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="bottom"
><A
HREF="introduction.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="80%"
ALIGN="center"
VALIGN="bottom"
></TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="bottom"
><A
HREF="software.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="SECT1"
><H1
CLASS="SECT1"
><A
NAME="HARDWARE">4. Hardware</A
></H1
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="SOUNDCARDS">4.1. Sound Cards</A
></H2
><P
>
Because speech requires relatively little bandwidth, just about any
medium- to high-quality 16-bit sound card will get the job done. You must
have sound support enabled in your kernel and the correct drivers
installed. For more information on sound cards, see "The Linux
Sound HOWTO", available at http://www.LinuxDoc.org/. How much sound card
quality affects recognition accuracy and noise is often the subject of
heated discussion.</P
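><P
>As a rough illustration of the low-bandwidth claim, the sketch below
compares the raw data rate of a speech-style recording with that of CD
audio. The 16 kHz mono format is an assumption chosen for the example
(a rate commonly used for speech), not a figure this HOWTO specifies,
and the function name is hypothetical.</P
><PRE
CLASS="PROGRAMLISTING"
>
```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Return the raw (uncompressed) PCM data rate in bytes per second."""
    return sample_rate_hz * bits_per_sample * channels // 8


# Assumed example formats; check what your ASR package actually expects.
speech_rate = pcm_bytes_per_second(16000, 16, 1)  # 16 kHz, 16-bit, mono
cd_rate = pcm_bytes_per_second(44100, 16, 2)      # CD audio, 16-bit, stereo

print("speech:  ", speech_rate, "bytes/s")
print("CD audio:", cd_rate, "bytes/s")
print("ratio:    %.1fx" % (cd_rate / speech_rate))
```
</PRE
><P
>Under these assumptions the speech stream needs only a fraction of the
throughput that music playback does, which is why the converter quality
matters more than the card's raw capacity.</P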
><P
>Sound cards with the 'cleanest' A/D (analog-to-digital) conversion
are recommended, but the clarity of the digital sample usually depends
more on microphone quality, and more still on environmental noise.
Electrical "noise" from monitors, PCI slots, hard drives, etc. is
usually nothing compared to audible noise from computer fans, squeaking
chairs, or heavy breathing.</P
><P
>Some ASR software packages may require a specific sound card. It's
usually a good idea to stay away from packages with specific hardware
requirements, because they limit your future options and decisions.
You'll have to weigh the benefits and costs if you are considering
packages that require specific hardware to function properly.</P
></DIV
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="MICROPHONES">4.2. Microphones</A
></H2
><P
>A quality microphone is key when using ASR. In most cases, a
desktop microphone just won't do the job: it tends to pick up more
ambient noise, which gives ASR programs a hard time.</P
><P
>Handheld microphones are also not the best choice, as they can be
cumbersome to pick up all the time. While they do limit the amount
of ambient noise, they are most useful in applications that require
changing speakers often, or when you don't speak to the recognizer
frequently (when wearing a headset isn't an option).</P
><P
>
The best choice, and by far the most common, is the headset style.
It minimizes ambient noise while keeping the microphone at the tip
of your tongue all the time. Headsets are available without earphones
and with earphones (mono or stereo). I recommend the stereo headphones,
but it's just a matter of personal taste.</P
><P
>You can get excellent quality microphone headsets for between $25
and $100. A good place to start looking is http://www.headphones.com or
http://www.speechcontrol.com.</P
><P
>
A quick note about levels: don't forget to turn up your microphone
volume. This can be done with a program such as XMixer or OSS Mixer,
but take care to avoid feedback noise. If the ASR software
includes auto-adjustment programs, use them instead, as they are
optimized for their particular recognition system.</P
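><P
>If your software has no auto-adjustment tool, one way to sanity-check
the microphone level is to look at the peak and average (RMS) level of a
short recording. The sketch below is only an illustration: it assumes
you already have the recording as a list of signed 16-bit sample values,
and the function names are hypothetical. It uses a synthetic tone in
place of real microphone input.</P
><PRE
CLASS="PROGRAMLISTING"
>
```python
import math

FULL_SCALE = 32768.0  # magnitude of the most negative signed 16-bit sample


def to_dbfs(amplitude):
    """Convert a linear amplitude to decibels relative to full scale."""
    if amplitude == 0:
        return float("-inf")
    return 20.0 * math.log10(amplitude / FULL_SCALE)


def level_report(samples):
    """Return (peak_dbfs, rms_dbfs, clipped) for signed 16-bit samples."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    clipped = peak >= 32767  # a sample pinned at the rail suggests clipping
    return to_dbfs(peak), to_dbfs(rms), clipped


# Synthetic stand-in for a recording: one second of a 440 Hz tone
# at half of full scale, sampled at an assumed 16 kHz.
tone = [int(16384 * math.sin(2 * math.pi * 440 * n / 16000))
        for n in range(16000)]

peak_db, rms_db, clipped = level_report(tone)
print("peak %.1f dBFS, rms %.1f dBFS, clipped=%s" % (peak_db, rms_db, clipped))
```
</PRE
><P
>A peak that sits at full scale means the volume is too high and the
signal is probably clipping. A peak far below full scale while you speak
normally suggests the microphone volume should be turned up.</P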
></DIV
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="COMPUTERS">4.3. Computers/Processors</A
></H2
><P
>ASR applications can be heavily dependent on processing speed. This
is because a large amount of digital filtering and signal processing
can take place in ASR.</P
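><P
>To give a feel for what "digital filtering" means here, the sketch
below applies a pre-emphasis filter, a simple first-order filter often
used at the front of speech processing chains to boost high frequencies.
It is an illustrative example and not taken from any particular ASR
package; the function name and the coefficient of 0.97 are assumptions
chosen for the example.</P
><PRE
CLASS="PROGRAMLISTING"
>
```python
def pre_emphasis(samples, coeff=0.97):
    """First-order high-pass FIR filter: y[n] = x[n] - coeff * x[n-1]."""
    out = []
    prev = 0.0  # treat the sample before the first one as silence
    for x in samples:
        out.append(x - coeff * prev)
        prev = x
    return out


# A constant (DC) input is almost entirely removed after the first sample,
# while a rapidly alternating input is passed through and amplified.
print(pre_emphasis([100, 100, 100, 100]))
print(pre_emphasis([100, -100, 100, -100]))
```
</PRE
><P
>This filter costs one multiplication and one subtraction per sample. A
recognizer typically runs many heavier stages on every sample, which
is where the demand for processing speed comes from.</P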
><P
>As with just about any CPU-intensive software, the faster the better.
Also, the more memory the better. It's possible to do some SR with a
100MHz processor and 16MB of RAM, but for fast processing (large
dictionaries, complex recognition schemes, or high sample rates), you
should shoot for a minimum of 400MHz and 128MB of RAM. Because of the
processing required, most software packages list their minimum requirements.</P
><P
>As far as I know, no one has yet used a cluster (Beowulf or otherwise)
to perform massive recognition efforts. If you know of any such project
underway or in development, please send me a note! <A
HREF="mailto:scook@gear21.com"
TARGET="_top"
>scook@gear21.com</A
></P
></DIV
></DIV
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="introduction.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="software.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>Introduction</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
>&nbsp;</TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>Speech Recognition Software</TD
></TR
></TABLE
></DIV
></BODY
></HTML
>