<HTML
><HEAD
><TITLE
>Hardware</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.76b+
"><LINK
REL="HOME"
TITLE="Speech Recognition HOWTO"
HREF="index.html"><LINK
REL="PREVIOUS"
TITLE="Introduction"
HREF="introduction.html"><LINK
REL="NEXT"
TITLE="Speech Recognition Software"
HREF="software.html"></HEAD
><BODY
CLASS="SECT1"
BGCOLOR="#FFFFFF"
TEXT="#000000"
LINK="#0000FF"
VLINK="#840084"
ALINK="#0000FF"
><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="3"
ALIGN="center"
>Speech Recognition HOWTO</TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="bottom"
><A
HREF="introduction.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="80%"
ALIGN="center"
VALIGN="bottom"
></TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="bottom"
><A
HREF="software.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="SECT1"
><H1
CLASS="SECT1"
><A
NAME="HARDWARE"
></A
>4. Hardware</H1
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="SOUNDCARDS"
></A
>4.1. Sound Cards</H2
><P
>Because speech requires relatively little bandwidth, just about any
medium- to high-quality 16-bit sound card will get the job done. You must
have sound support enabled in your kernel, and the correct drivers for your
card must be installed. For more information on sound cards, see "The Linux
Sound HOWTO", available at http://www.LinuxDoc.org/. Sound card quality,
and its effect on recognition accuracy and noise, is a frequent source of
heated discussion.</P
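><P
>To put "relatively little bandwidth" into numbers, the short Python sketch below computes the raw data rate of a PCM stream and the theoretical dynamic range of an ideal N-bit converter, using the standard rule of thumb SNR ≈ 6.02·N + 1.76 dB for a full-scale sine wave. The 16 kHz mono capture rate is an assumption (a common choice for ASR, not something this HOWTO specifies), and the function names are hypothetical.</P
><PRE
CLASS="PROGRAMLISTING"
>
```python
def pcm_data_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Raw PCM data rate in bytes per second."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


def nyquist_bandwidth_hz(sample_rate_hz):
    """Highest frequency that can be represented at this sample rate."""
    return sample_rate_hz / 2


def ideal_quantization_snr_db(bits_per_sample):
    """Theoretical SNR of an ideal N-bit converter for a full-scale sine wave."""
    return 6.02 * bits_per_sample + 1.76


if __name__ == "__main__":
    rate = 16000  # assumed capture rate (hypothetical; the HOWTO names none)
    bits = 16
    print("bandwidth: up to", nyquist_bandwidth_hz(rate), "Hz")
    print("data rate:", pcm_data_rate(rate, bits), "bytes/s")
    print("ideal dynamic range: %.1f dB" % ideal_quantization_snr_db(bits))
```
</PRE
><P
>For 16-bit, 16 kHz mono audio this works out to up to 8 kHz of audio bandwidth, 32,000 bytes per second, and about 98 dB of theoretical dynamic range. Real converters fall short of the ideal figure, but the data rate itself is modest.</P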
><P
>Sound cards with the 'cleanest' A/D (analog-to-digital) conversion
are recommended, but the clarity of the digital sample usually depends
more on the quality of the microphone, and even more on environmental
noise. Electrical "noise" from monitors, PCI slots, hard drives, and the
like is usually negligible compared with audible noise from computer
fans, squeaking chairs, or heavy breathing.</P
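><P
>A rough way to see which noise sources dominate in your setup is to record a few seconds of speech and a few seconds of the room with nobody talking, then compare their RMS levels. The Python sketch below assumes both clips are already available as lists of integer samples (capturing them is left to your sound tools). The function names and toy values are hypothetical, and because the speech clip also contains the background noise, the result is only an estimate.</P
><PRE
CLASS="PROGRAMLISTING"
>
```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of integer samples."""
    if not samples:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(speech, background):
    """Estimate signal-to-noise ratio in dB from a speech clip and a noise-only clip."""
    noise = rms(background)
    if noise == 0:
        return math.inf  # digitally silent background: no measurable noise
    return 20 * math.log10(rms(speech) / noise)


if __name__ == "__main__":
    # Toy data standing in for real recordings (hypothetical values).
    speech = [3000, -2800, 3100, -2900] * 100
    fans_and_chairs = [30, -25, 28, -31] * 100
    print("estimated SNR: %.1f dB" % snr_db(speech, fans_and_chairs))
```
</PRE
><P
>Repeating the noise-only measurement with the fans off, the chair still, or the monitor moved away gives a quick, if crude, ranking of the noise sources listed above.</P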
><P
>Some ASR software packages may require a specific sound card. It is
usually a good idea to avoid packages with specific hardware requirements,
because such requirements limit your future options. If you are
considering a package that requires specific hardware to function
properly, weigh the benefits against the costs.</P
></DIV
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="MICROPHONES"
></A
>4.2. Microphones</H2
><P
>A quality microphone is key when using ASR. In most cases, a
desktop microphone just won't do the job: desktop microphones tend to
pick up more ambient noise, which gives ASR programs a hard time.</P
><P
>Hand-held microphones are also not the best choice, because picking one
up every time you want to speak quickly becomes cumbersome. While they do
limit the amount of ambient noise, they are most useful in applications
where the speaker changes often, or where the recognizer is used only
occasionally (when wearing a headset isn't an option).</P
><P
>The best choice, and by far the most common, is the headset style.
It minimizes ambient noise while keeping the microphone close to your
mouth at all times. Headsets are available with or without earphones
(mono or stereo). I recommend stereo headphones, but that is just a
matter of personal taste.</P
><P
>You can get excellent-quality microphone headsets for between $25 and
$100. Good places to start looking are http://www.headphones.com and
http://www.speechcontrol.com.</P
><P
>A quick note about levels: don't forget to turn up your microphone
volume. This can be done with a mixer program such as XMixer or OSS Mixer;
take care to avoid feedback noise. If your ASR software includes its own
automatic level-adjustment program, use it instead, since it is tuned
for that particular recognition system.</P
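><P
>After adjusting the mixer, it can help to check a short test recording for levels that are too low or clipped. The sketch below reads a 16-bit PCM WAV file with Python's standard <TT>wave</TT> module and returns the peak level in dBFS (decibels relative to full scale) together with the fraction of samples sitting at the limits of the 16-bit range. It assumes 16-bit samples and treats all channels together; the function name and the example file name are hypothetical, and it is no substitute for a package's own level-adjustment tool.</P
><PRE
CLASS="PROGRAMLISTING"
>
```python
import array
import math
import sys
import wave

FULL_SCALE = 32768  # magnitude of the most negative 16-bit sample


def level_report(path):
    """Return (peak_dbfs, clipped_fraction) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit samples")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if len(samples) == 0:
        raise ValueError("the recording contains no samples")
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return -math.inf, 0.0  # digital silence: check that the mic is unmuted
    # Samples pinned at either end of the 16-bit range are counted as clipped.
    clipped = sum(1 for s in samples if s in (32767, -32768))
    return 20 * math.log10(peak / FULL_SCALE), clipped / len(samples)


# Example (hypothetical file name):
#   peak_db, clipped = level_report("mic-test.wav")
#   print("peak %.1f dBFS, %.2f%% clipped" % (peak_db, 100 * clipped))
```
</PRE
><P
>A peak far below 0 dBFS suggests the microphone volume is too low; a clipped fraction above zero suggests it is set too high or the microphone is too close.</P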
></DIV
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="COMPUTERS"
></A
>4.3. Computers/Processors</H2
><P
>ASR applications can depend heavily on processing speed, because
a large amount of digital filtering and signal processing can take
place during recognition.</P
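><P
>To give a feel for the kind of per-sample work involved, the sketch below implements two steps that many ASR front ends perform before recognition proper: a first-order pre-emphasis filter, y[n] = x[n] - α·x[n-1], and slicing the signal into overlapping frames. The values α = 0.97, a 25 ms window and a 10 ms hop are typical textbook choices, not the settings of any particular package, and the function names are hypothetical. Much of the later processing (spectral analysis, feature extraction, matching) typically runs once per frame, so the total cost grows with both the sample rate and the frame rate.</P
><PRE
CLASS="PROGRAMLISTING"
>
```python
def pre_emphasize(samples, alpha=0.97):
    """First-order pre-emphasis filter: y[n] = x[n] - alpha * x[n-1]."""
    if not samples:
        return []
    out = [float(samples[0])]  # first sample has no predecessor; pass it through
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - alpha * prev)
    return out


def frame_signal(samples, sample_rate_hz, frame_ms=25, hop_ms=10):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    frame_len = sample_rate_hz * frame_ms // 1000
    hop = sample_rate_hz * hop_ms // 1000
    n_frames = max(0, (len(samples) - frame_len) // hop + 1)
    return [samples[i * hop : i * hop + frame_len] for i in range(n_frames)]


if __name__ == "__main__":
    rate = 16000  # assumed sample rate
    one_second = [0] * rate  # placeholder audio: one second of silence
    frames = frame_signal(pre_emphasize(one_second), rate)
    print(len(frames), "frames of", len(frames[0]), "samples per second of audio")
```
</PRE
><P
>At 16 kHz this yields 98 frames of 400 samples for each second of audio, and in a typical recognizer each of those frames then passes through the rest of the processing pipeline.</P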
><P
>As with just about any CPU-intensive software, the faster the processor
the better, and the more memory the better. It's possible to do some
speech recognition with a 100 MHz processor and 16 MB of RAM, but for fast
processing (large dictionaries, complex recognition schemes, or high
sample rates), you should aim for a minimum of a 400 MHz processor and
128 MB of RAM. Because of the processing required, most software packages
list their minimum requirements.</P
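><P
>On Linux, the two figures most requirement lists ask about can be read from /proc/cpuinfo and /proc/meminfo. The Python sketch below parses that text and compares it with the 400 MHz / 128 MB figures from the paragraph above, used here as default thresholds. The field names are Linux-specific, some architectures do not report <TT>cpu MHz</TT> at all, and on CPUs with frequency scaling that field shows the current clock rather than the maximum, so treat the result as a rough check; the function names are hypothetical.</P
><PRE
CLASS="PROGRAMLISTING"
>
```python
import re


def cpu_mhz(cpuinfo_text):
    """Highest 'cpu MHz' value in /proc/cpuinfo text, or None if absent."""
    values = [float(v) for v in
              re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, re.MULTILINE)]
    return max(values) if values else None


def mem_total_mb(meminfo_text):
    """MemTotal from /proc/meminfo text, converted from kB to MB, or None."""
    m = re.search(r"^MemTotal:\s*(\d+)\s*kB", meminfo_text, re.MULTILINE)
    return int(m.group(1)) // 1024 if m else None


def meets_minimum(cpuinfo_text, meminfo_text, min_mhz=400, min_mb=128):
    """True/False if both figures were found, None if either is missing."""
    mhz = cpu_mhz(cpuinfo_text)
    mb = mem_total_mb(meminfo_text)
    if mhz is None or mb is None:
        return None
    return mhz >= min_mhz and mb >= min_mb


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as cpu, open("/proc/meminfo") as mem:
            print("meets 400 MHz / 128 MB:", meets_minimum(cpu.read(), mem.read()))
    except OSError:
        print("no /proc filesystem found (this check is Linux-specific)")
```
</PRE
><P
>A package's own published requirements should take precedence over these default thresholds; pass its figures as <TT>min_mhz</TT> and <TT>min_mb</TT>.</P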
><P
>To my knowledge, no one has yet used a cluster (Beowulf or otherwise)
for large-scale speech recognition. If you know of any such project,
underway or in development, please send me a note! <A
HREF="mailto:scook@gear21.com"
TARGET="_top"
>scook@gear21.com</A
></P
></DIV
></DIV
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="introduction.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="software.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>Introduction</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
>&nbsp;</TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>Speech Recognition Software</TD
></TR
></TABLE
></DIV
></BODY
></HTML
>