<HTML
><HEAD
><TITLE
>Speech Recognition Software</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.76b+
"><LINK
REL="HOME"
TITLE="Speech Recognition HOWTO"
HREF="index.html"><LINK
REL="PREVIOUS"
TITLE="Hardware"
HREF="hardware.html"><LINK
REL="NEXT"
TITLE="Inside Speech Recognition"
HREF="inside.html"></HEAD
><BODY
CLASS="SECT1"
BGCOLOR="#FFFFFF"
TEXT="#000000"
LINK="#0000FF"
VLINK="#840084"
ALINK="#0000FF"
><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="3"
ALIGN="center"
>Speech Recognition HOWTO</TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="bottom"
><A
HREF="hardware.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="80%"
ALIGN="center"
VALIGN="bottom"
></TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="bottom"
><A
HREF="inside.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="SECT1"
><H1
CLASS="SECT1"
><A
NAME="SOFTWARE">5. Speech Recognition Software</A></H1
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="FREESOFTWARE">5.1. Free Software</A></H2
><P
>Much of the free software listed here is available for download at:
http://sunsite.uio.no/pub/Linux/sound/apps/speech/
</P
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="XVOICE">5.1.1. XVoice</A></H3
><P
>XVoice is a dictation/continuous speech recognizer that can be used
with a variety of X Window applications. It allows user-defined macros.
This is a fine program with a definite future. Once set up, it
performs with adequate accuracy.</P
><P
>XVoice requires that you download and install IBM's (free) ViaVoice
for Linux (see the Commercial Software section), and ViaVoice must be
configured correctly for XVoice to work. LessTif/Motif (libXm) is
also required. Because this program interacts with the X Window
System, you must leave X resources open on your machine, so use
caution if you run it on a networked or multi-user machine.</P
><P
>This software is primarily for users. An RPM is available.</P
><P
>Homepage: http://www.compapp.dcu.ie/~tdoris/Xvoice/
http://www.zachary.com/creemer/xvoice.html</P
><P
>Project: http://xvoice.sourceforge.net</P
><P
>Community: http://www.onelist.com/community/xvoice</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="CVOICECONTROL">5.1.2. CVoiceControl/KVoiceControl</A></H3
><P
>CVoiceControl (which stands for Console Voice Control) started its
life as KVoiceControl (KDE Voice Control). It is a basic speech
recognition system that lets a user execute Linux commands by
speaking them. CVoiceControl replaces KVoiceControl.</P
><P
>The software includes a microphone level configuration utility,
a vocabulary "model editor" for adding new commands and utterances,
and the speech recognition system.</P
><P
>CVoiceControl is an excellent starting point for experienced users
looking to get started in ASR. It is not the most user-friendly
program, but once it has been trained correctly, it can be very
helpful. Be sure to read the documentation while setting it up.</P
><P
>This software is primarily for users.</P
><P
>Homepage: http://www.kiecza.de/daniel/linux/index.html</P
><P
>Documents: http://www.kiecza.de/daniel/linux/cvoicecontrol/index.html</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="OPENMIND">5.1.3. Open Mind Speech</A></H3
><P
>Started in late 1999, Open Mind Speech has changed names several times
(it was VoiceControl, then SpeechInput, and then FreeSpeech), and is now
part of the "Open Mind Initiative". This is an open source project.
It is not yet completely operational.</P
><P
>This software is primarily for developers.</P
><P
>Homepage: http://freespeech.sourceforge.net</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="GVOICE">5.1.4. GVoice</A></H3
><P
>GVoice is an ASR library that uses IBM's ViaVoice (free) SDK
to control Gtk/GNOME applications. It includes libraries for
initialization, the recognition engine, vocabulary manipulation, and
panel control. Development on this has been idle for over a year.</P
><P
>This software is primarily for developers.</P
><P
>Homepage: http://www.cse.ogi.edu/~omega/gnome/gvoice/</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="ISIP">5.1.5. ISIP</A></H3
><P
>The Institute for Signal and Information Processing at Mississippi
State University has made its speech recognition engine available. The
toolkit includes a front end, a decoder, and a training module. It's a
functional toolkit.</P
><P
>This software is primarily for developers.</P
><P
>The toolkit (and more information about ISIP) is available at:
http://www.isip.msstate.edu/project/speech/</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="SPHINX">5.1.6. CMU Sphinx</A></H3
><P
>Sphinx started at CMU and has recently been released as
open source. This is a fairly large program that includes a lot of
tools and information. It is still "in development", but includes
trainers, recognizers, acoustic models, language models, and some
limited documentation.</P
><P
>This software is primarily for developers.</P
><P
>Homepage: http://www.speech.cs.cmu.edu/sphinx/Sphinx.html</P
><P
>Source: http://download.sourceforge.net/cmusphinx/sphinx2-0.1a.tar.gz</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="EARS">5.1.7. Ears</A></H3
><P
>Although Ears isn't fully developed, it is a good starting
point for programmers wishing to start in ASR.</P
><P
>This software is primarily for developers.</P
><P
>FTP site: ftp://svr-ftp.eng.cam.ac.uk/comp.speech/recognition/</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="NICO">5.1.8. NICO ANN Toolkit</A></H3
><P
>The NICO Artificial Neural Network toolkit is a flexible
back-propagation neural network toolkit optimized for speech
recognition applications.</P
><P
>This software is primarily for developers.</P
><P
>Homepage: http://www.speech.kth.se/NICO/index.html</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="MYERS">5.1.9. Myers' Hidden Markov Model Software</A></H3
><P
>This software by Richard Myers is a set of HMM algorithms written in
C++. It serves as an example and learning tool for the HMMs described
in L. Rabiner's book "Fundamentals of Speech Recognition".</P
><P
>This software is primarily for developers.</P
><P
>Information is available at:
http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/myers.hmm.html</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="JIALONG">5.1.10. Jialong He's Speech Recognition Research Tool</A></H3
><P
>Although not originally written for Linux, this research tool can be
compiled on Linux. It contains three different types of recognizers:
Dynamic Time Warping (DTW), Dynamic Hidden Markov Model, and Continuous
Density Hidden Markov Model. It is intended for research and
development, as it is not a fully functional ASR system. The toolkit
contains some very useful tools.</P
><P
>This software is primarily for developers.</P
><P
>More information is available at:
http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/jialong.html</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="MOREFREE">5.1.11. More Free Software?</A></H3
><P
>If you know of free software that isn't included in the above list,
please send me a note at: <A
HREF="mailto:scook@gear21.com"
TARGET="_top"
>scook@gear21.com</A
>. If you're in the mood,
you can also tell me where to get a copy of the software and any
impressions you may have about it. Thanks!</P
></DIV
></DIV
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="COMSOFTWARE">5.2. Commercial Software</A></H2
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="VIAVOICE">5.2.1. IBM ViaVoice</A></H3
><P
>IBM has made good on its promise to support Linux with its series
of ViaVoice products for Linux, though the future of its SDKs isn't
set in stone (the licensing agreement for developers isn't officially
released as of this date; more to come).</P
><P
>Their commercial (non-free) product, IBM ViaVoice Dictation for Linux
(available at http://www-4.ibm.com/software/speech/linux/dictation.html),
performs very well but has sizeable system requirements compared
to the more basic ASR systems (64 MB RAM and a 233 MHz Pentium). The
US$59.95 price also includes an Andrea NC-8 microphone. The product
supports multiple users (I haven't tried this myself, so
if anyone has any experience with it, please give me a shout). The
package includes documentation (PDF), a trainer, the dictation system,
and installation scripts. The latest release also adds support for
additional Linux distributions based on 2.2 kernels.</P
><P
>The ASR SDK is available for free, and includes IBM's SMAPI, grammar
API, documentation, and a variety of sample programs. The ViaVoice
Run Time Kit provides an ASR engine and data files for dictation
functions, and user utilities. The ViaVoice Command &amp; Control Run Time
Kit includes the ASR engine and data files for command and control
functions, and user utilities. The SDK and Kits require 128 MB RAM and
a Linux 2.2 or later kernel.</P
><P
>The SDKs and Kits are available for free at:
http://www-4.ibm.com/software/speech/dev/sdk_linux.html</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="VOCALIS">5.2.2. Vocalis Speechware</A></H3
><P
>More information on Vocalis and Vocalis Speechware is available at:
<A
HREF="http://www.vocalisspeechware.com"
TARGET="_top"
>http://www.vocalisspeechware.com</A
> and
<A
HREF="http://www.vocalis.com"
TARGET="_top"
>http://www.vocalis.com</A
>. </P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="BABELTECH">5.2.3. Babel Technologies</A></H3
><P
>Babel Technologies has a Linux SDK available called Babear. It is a speaker-independent
system based on Hybrid Markov Model and Artificial Neural Network technology. They also
have a variety of products for text-to-speech, speaker verification, and phoneme analysis.
More information is available at: http://www.babeltech.com.</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="SPEECHWORKS">5.2.4. SpeechWorks</A></H3
><P
>I didn't see anything on their website that specifically mentioned Linux, but their
"OpenSpeech Recognizer" uses VoiceXML, which is an open standard.
More information is available at: http://www.speechworks.com.</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="NUANCE">5.2.5. Nuance</A></H3
><P
>Nuance offers a speech recognition/natural language product (currently Nuance 8.0) for
a variety of *nix platforms. It can handle very large vocabularies and uses a unique
distributed architecture for scalability and fault tolerance.
More information is available at: http://www.nuance.com.</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="ABBOT">5.2.6. Abbot/AbbotDemo</A></H3
><P
>Abbot is a very large vocabulary, speaker-independent ASR system.
It was originally developed by the Connectionist Speech Group at
Cambridge University and was later transferred (commercialized) to
SoftSound. More information is available at:
http://www.softsound.com.</P
><P
>AbbotDemo is a demonstration package of Abbot. This demo system
has a vocabulary of about 5000 words and uses the connectionist/HMM
continuous speech algorithm. It is distributed without source code.</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="ENTROPIC">5.2.7. Entropic</A></H3
><P
>The fine people over at Entropic have been bought out by Micro$oft...
Their products and support services have all but disappeared. Their
support for HTK and ESPS/waves+ is gone, and their future is in the
hands of M$. Their old website at http://www.entropic.com has more
information.</P
><P
>K.K. Chin advised me that the original developers of the HTK (the
Speech, Vision and Robotics Group at Cambridge) are still
providing support for it. There is also a "free" version
available at: <A
HREF="http://htk.eng.cam.ac.uk"
TARGET="_top"
>http://htk.eng.cam.ac.uk</A
>.
Also note that Microsoft still owns the copyright to the current
HTK code...
</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="MORECOM">5.2.8. More Commercial Products</A></H3
><P
>There are rumors of more commercial ASR products becoming available
in the near future (including L&amp;H). I talked with a couple of
L&amp;H representatives at Comdex 2000 (Vegas) and none of them could give
me any information on a Linux release, or even say whether they planned
to release any products for Linux. If you have any further information,
please send the details to me at <A
HREF="mailto:scook@gear21.com"
TARGET="_top"
>scook@gear21.com</A
>.</P
></DIV
></DIV
></DIV
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="hardware.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="inside.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>Hardware</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
> </TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>Inside Speech Recognition</TD
></TR
></TABLE
></DIV
></BODY
></HTML
>