<HTML
><HEAD
><TITLE
>Speech Recognition Software</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.76b+
"><LINK
REL="HOME"
TITLE="Speech Recognition HOWTO"
HREF="index.html"><LINK
REL="PREVIOUS"
TITLE="Hardware"
HREF="hardware.html"><LINK
REL="NEXT"
TITLE="Inside Speech Recognition"
HREF="inside.html"></HEAD
><BODY
CLASS="SECT1"
BGCOLOR="#FFFFFF"
TEXT="#000000"
LINK="#0000FF"
VLINK="#840084"
ALINK="#0000FF"
><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="3"
ALIGN="center"
>Speech Recognition HOWTO</TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="bottom"
><A
HREF="hardware.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="80%"
ALIGN="center"
VALIGN="bottom"
></TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="bottom"
><A
HREF="inside.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="SECT1"
><H1
CLASS="SECT1"
><A
NAME="SOFTWARE">5. Speech Recognition Software</A></H1
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="FREESOFTWARE">5.1. Free Software</A></H2
><P
>Much of the free software listed here is available for download at:
http://sunsite.uio.no/pub/Linux/sound/apps/speech/
</P
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="XVOICE">5.1.1. XVoice</A></H3
><P
>XVoice is a dictation/continuous speech recognizer that can be used
with a variety of X Window applications. It allows user-defined macros.
This is a fine program with a definite future. Once set up, it
performs with adequate accuracy.</P
><P
>XVoice requires that you download and install IBM's (free) ViaVoice
for Linux (see the Commercial Software section), and ViaVoice must be
configured correctly before XVoice will work. LessTif/Motif (libXm) is
also required. Because this program interacts with X, you must leave
X resources open on your machine, so use caution if you run it on a
networked or multi-user machine.</P
><P
>This software is primarily for users. An RPM is available. </P
><P
>Homepage: http://www.compapp.dcu.ie/~tdoris/Xvoice/ and
http://www.zachary.com/creemer/xvoice.html</P
><P
>Project: http://xvoice.sourceforge.net</P
><P
>Community: http://www.onelist.com/community/xvoice</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="CVOICECONTROL">5.1.2. CVoiceControl/KVoiceControl</A></H3
><P
>CVoiceControl (which stands for Console Voice Control) started
life as KVoiceControl (KDE Voice Control). It is a basic speech
recognition system that lets a user execute Linux commands by
voice. CVoiceControl replaces KVoiceControl.</P
><P
>The software includes a microphone level configuration utility,
a vocabulary "model editor" for adding new commands and utterances,
and the speech recognition system.</P
><P
>CVoiceControl is an excellent starting point for experienced users
looking to get started in ASR. It is not the most user-friendly program,
but once it has been trained correctly, it can be very helpful. Be
sure to read the documentation while setting it up.</P
><P
>This software is primarily for users.</P
><P
>Homepage: http://www.kiecza.de/daniel/linux/index.html</P
><P
>Documents: http://www.kiecza.de/daniel/linux/cvoicecontrol/index.html</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="OPENMIND">5.1.3. Open Mind Speech</A></H3
><P
>Started in late 1999, Open Mind Speech has changed names several times
(it was VoiceControl, then SpeechInput, and then FreeSpeech), and is now
part of the "Open Mind Initiative". This is an open source project.
It is not yet completely operational.</P
><P
>This software is primarily for developers.</P
><P
>Homepage: http://freespeech.sourceforge.net</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="GVOICE">5.1.4. GVoice</A></H3
><P
>GVoice is an ASR library that uses IBM's (free) ViaVoice SDK
to control Gtk/GNOME applications. It includes libraries for
initialization, the recognition engine, vocabulary manipulation, and panel
control. Development has been idle for over a year.</P
><P
>This software is primarily for developers. </P
><P
>Homepage: http://www.cse.ogi.edu/~omega/gnome/gvoice/</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="ISIP">5.1.5. ISIP</A></H3
><P
>The Institute for Signal and Information Processing at Mississippi
State University has made its speech recognition engine available. The
toolkit includes a front-end, a decoder, and a training module. It's a
functional toolkit.</P
><P
>This software is primarily for developers.</P
><P
>The toolkit (and more information about ISIP) is available at:
http://www.isip.msstate.edu/project/speech/</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="SPHINX">5.1.6. CMU Sphinx</A></H3
><P
>Sphinx was originally developed at CMU and has recently been released as
open source. This is a fairly large program that includes a lot of
tools and information. It is still "in development", but includes
trainers, recognizers, acoustic models, language models, and some
limited documentation.</P
><P
>This software is primarily for developers.</P
><P
>Homepage: http://www.speech.cs.cmu.edu/sphinx/Sphinx.html</P
><P
>Source: http://download.sourceforge.net/cmusphinx/sphinx2-0.1a.tar.gz</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="EARS">5.1.7. Ears</A></H3
><P
>Although Ears isn't fully developed, it is a good starting
point for programmers wishing to start in ASR.</P
><P
>This software is primarily for developers.</P
><P
>FTP site: ftp://svr-ftp.eng.cam.ac.uk/comp.speech/recognition/</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="NICO">5.1.8. NICO ANN Toolkit</A></H3
><P
>The NICO Artificial Neural Network toolkit is a flexible
back-propagation neural network toolkit optimized for speech recognition
applications.</P
><P
>This software is primarily for developers.</P
><P
>Homepage: http://www.speech.kth.se/NICO/index.html</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="MYERS">5.1.9. Myers' Hidden Markov Model Software</A></H3
><P
>This software by Richard Myers implements HMM algorithms in C++.
It serves as an example and learning tool for the HMMs described in
L. Rabiner's book "Fundamentals of Speech Recognition".</P
><P
>This software is primarily for developers.</P
><P
>Information is available at:
http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/myers.hmm.html</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="JIALONG">5.1.10. Jialong He's Speech Recognition Research Tool</A></H3
><P
>Although not originally written for Linux, this research tool can be
compiled on Linux. It contains three different types of recognizers:
DTW (dynamic time warping), a Dynamic Hidden Markov Model, and a Continuous Density Hidden
Markov Model. It is intended for research and development, as it is
not a fully functional ASR system. The toolkit contains some very
useful tools.</P
><P
>This software is primarily for developers.</P
><P
>More information is available at:
http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/jialong.html</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="MOREFREE">5.1.11. More Free Software?</A></H3
><P
>If you know of free software that isn't included in the above list,
please send me a note at: <A
HREF="mailto:scook@gear21.com"
TARGET="_top"
>scook@gear21.com</A
>. If you're in the mood,
you can also tell me where to get a copy of the software, and share any
impressions you have of it. Thanks!</P
></DIV
></DIV
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="COMSOFTWARE">5.2. Commercial Software</A></H2
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="VIAVOICE">5.2.1. IBM ViaVoice</A></H3
><P
>IBM has made good on their promise to support Linux with their series
of ViaVoice products for Linux, though the future of their SDKs isn't
set in stone (their licensing agreement for developers isn't officially
released as of this date; more to come).</P
><P
>Their commercial (non-free) product, IBM ViaVoice Dictation for Linux
(available at http://www-4.ibm.com/software/speech/linux/dictation.html),
performs very well, but has sizeable system requirements compared
to the more basic ASR systems (64 MB of RAM and a 233 MHz Pentium). For the
US$59.95 price tag you also get an Andrea NC-8 microphone. It also
supports multiple users (but I haven't tried it with multiple users, so
if anyone has any experience please give me a shout). The package
includes documentation (PDF), the Trainer, the dictation system, and
installation scripts. Support for additional Linux distributions based
on 2.2 kernels is also available in the latest release.</P
><P
>The ASR SDK is available for free, and includes IBM's SMAPI, grammar
API, documentation, and a variety of sample programs. The ViaVoice
Run Time Kit provides an ASR engine and data files for dictation
functions, and user utilities. The ViaVoice Command &#38; Control Run Time
Kit includes the ASR engine and data files for command and control
functions, and user utilities. The SDK and Kits require 128 MB of RAM and
a Linux 2.2 or later kernel.</P
><P
>The SDKs and Kits are available for free at:
http://www-4.ibm.com/software/speech/dev/sdk_linux.html</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="VOCALIS">5.2.2. Vocalis Speechware</A></H3
><P
>More information on Vocalis and Vocalis Speechware is available at:
<A
HREF="http://www.vocalisspeechware.com"
TARGET="_top"
>http://www.vocalisspeechware.com</A
> and
<A
HREF="http://www.vocalis.com"
TARGET="_top"
>http://www.vocalis.com</A
>.</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="BABELTECH">5.2.3. Babel Technologies</A></H3
><P
>Babel Technologies has a Linux SDK available called Babear. It is a speaker-independent
system based on a hybrid of Hidden Markov Model and Artificial Neural Network technology. They also
have a variety of products for text-to-speech, speaker verification, and phoneme analysis.
More information is available at: http://www.babeltech.com.</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="SPEECHWORKS">5.2.4. SpeechWorks</A></H3
><P
>I didn't see anything on their website that specifically mentioned Linux, but their
"OpenSpeech Recognizer" uses VoiceXML, which is an open standard.
More information is available at: http://www.speechworks.com.</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="NUANCE">5.2.5. Nuance</A></H3
><P
>Nuance offers a speech recognition/natural language product (currently Nuance 8.0) for
a variety of *nix platforms. It can handle very large vocabularies and uses a unique
distributed architecture for scalability and fault tolerance.
More information is available at: http://www.nuance.com.</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="ABBOT">5.2.6. Abbot/AbbotDemo</A></H3
><P
>Abbot is a very large vocabulary, speaker-independent ASR system.
It was originally developed by the Connectionist Speech Group at
Cambridge University and was later transferred to SoftSound for
commercialization. More information is available at:
http://www.softsound.com.</P
><P
>AbbotDemo is a demonstration package of Abbot. This demo system
has a vocabulary of about 5000 words and uses the connectionist/HMM
continuous speech algorithm. This is a demonstration program with no
source code.</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="ENTROPIC">5.2.7. Entropic</A></H3
><P
>The fine people over at Entropic have been bought out by Micro$oft...
Their products and support services have all but disappeared. Their
support for HTK and ESPS/waves+ is gone, and their future is in the
hands of M$. Their old website at http://www.entropic.com has more
information.</P
><P
>K.K. Chin advised me that the original developers of HTK (the
Speech, Vision and Robotics Group at Cambridge) are still
providing support for it. There is also a "free" version
available at: <A
HREF="http://htk.eng.cam.ac.uk"
TARGET="_top"
>http://htk.eng.cam.ac.uk</A
>.
Also note that Microsoft still owns the copyright to the current
HTK code...
</P
></DIV
><DIV
CLASS="SECT3"
><H3
CLASS="SECT3"
><A
NAME="MORECOM">5.2.8. More Commercial Products</A></H3
><P
>There are rumors of more commercial ASR products becoming available
in the near future (including from L&#38;H). I talked with a couple of
L&#38;H representatives at Comdex 2000 (Vegas), and none of them could give
me any information on a Linux release, or even say whether they planned to release
any products for Linux. If you have any further information, please send
any details to me at <A
HREF="mailto:scook@gear21.com"
TARGET="_top"
>scook@gear21.com</A
>.</P
></DIV
></DIV
></DIV
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="hardware.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="inside.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>Hardware</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
>&nbsp;</TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>Inside Speech Recognition</TD
></TR
></TABLE
></DIV
></BODY
></HTML
>