<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.21">
<TITLE>GNU/Linux AI &amp; Alife HOWTO: Statistical &amp; Machine Learning</TITLE>
<LINK HREF="AI-Alife-HOWTO-8.html" REL=next>
<LINK HREF="AI-Alife-HOWTO-6.html" REL=previous>
<LINK HREF="AI-Alife-HOWTO.html#toc7" REL=contents>
</HEAD>
<BODY>
<A HREF="AI-Alife-HOWTO-8.html">Next</A>
<A HREF="AI-Alife-HOWTO-6.html">Previous</A>
<A HREF="AI-Alife-HOWTO.html#toc7">Contents</A>
<HR>
<H2><A NAME="Statistical &amp; Machine Learning"></A> <A NAME="s7">7.</A> <A HREF="AI-Alife-HOWTO.html#toc7">Statistical &amp; Machine Learning</A> </H2>
<P>All about getting machines to learn to do something rather than
explicitly programming them to do it. The field tends to deal with
pattern matching a lot and is heavily based on math and statistics.
Technically
<A HREF="AI-Alife-HOWTO-3.html#Connectionism">Connectionism</A>
falls under this category, but it is such a
large sub-field that I'm keeping it in a separate section.</P>
<H2><A NAME="ss7.1">7.1</A> <A HREF="AI-Alife-HOWTO.html#toc7.1">Libraries</A>
</H2>
<P>Libraries or frameworks used for writing machine learning systems.</P>
<P>
<DL>
<P>
<A NAME="CognitiveFoundry"></A> </P>
<DT><B>CognitiveFoundry</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://foundry.sandia.gov/">http://foundry.sandia.gov/</A></LI>
</UL>
</P>
<P>The Cognitive Foundry is a modular Java software library for the
research and development of cognitive systems. It contains many
reusable components for machine learning, statistics, and cognitive
modeling. It is primarily designed to be easy to plug into applications
to provide adaptive behaviors.</P>
<P>
<A NAME="CompLearn"></A> </P>
<DT><B>CompLearn</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://complearn.org/">http://complearn.org/</A></LI>
</UL>
</P>
<P>CompLearn is a software system built to support compression-based
learning in a wide variety of applications. It provides this support in
the form of a library written in highly portable ANSI C that runs in
most modern computer environments with minimal confusion. It also
supplies a small suite of simple, composable command-line utilities
built on this library. Together with other commonly
used machine-learning tools such as LibSVM and GraphViz, CompLearn
forms an attractive offering in machine-learning frameworks and
toolkits.</P>
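<P>To give a feel for what compression-based learning means, here is a
minimal Python sketch of the normalized compression distance (NCD), a
similarity measure commonly used in this area. It is an illustration
only, not CompLearn's code or API: the choice of zlib as the compressor,
the function names and the sample data are all assumptions made for the
example.</P>
<PRE>
```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed form of data."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Small values mean the compressor finds a lot of shared structure.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical sample data: two similar texts and one unrelated string.
english_a = b"the quick brown fox jumps over the lazy dog " * 20
english_b = b"the quick brown fox leaps over the lazy cat " * 20
digits = b"".join(str(i * i).encode() for i in range(400))

print("similar texts:  ", round(ncd(english_a, english_b), 3))
print("unrelated data: ", round(ncd(english_a, digits), 3))
```
</PRE>
<P>A matrix of such pairwise distances can then be handed to any
clustering or tree-building method.</P>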
<P>
<A NAME="Elefant"></A> </P>
<DT><B>Elefant</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://elefant.developer.nicta.com.au/">http://elefant.developer.nicta.com.au/</A></LI>
</UL>
</P>
<P>Elefant (Efficient Learning, Large-scale Inference, and Optimisation
Toolkit) is an open source library for machine learning licensed under
the Mozilla Public License (MPL). The toolkit provides</P>
<P>
<UL>
<LI>algorithms for machine learning utilising the power of
multi-core/multi-threaded processors/operating systems (Linux,
Windows, Mac OS X),</LI>
<LI>a graphical user interface for users who want to quickly
prototype machine learning experiments,</LI>
<LI>tutorials to support learning about Statistical Machine
Learning (Statistical Machine Learning at The Australian National
University), and</LI>
<LI>detailed and precise documentation for each of the above.</LI>
</UL>
</P>
<P>
<A NAME="Maximum Entropy Toolkit"></A> </P>
<DT><B>Maximum Entropy Toolkit</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://homepages.inf.ed.ac.uk/lzhang10/maxent_toolkit.html">http://homepages.inf.ed.ac.uk/lzhang10/maxent_toolkit.html</A></LI>
</UL>
</P>
<P>The Maximum Entropy Toolkit provides a set of tools and a library for
constructing maximum entropy (maxent) models in either Python or C++.</P>
<P>The maximum entropy model is a general purpose machine learning framework
that has proved to be highly expressive and powerful in statistical
natural language processing, statistical physics, computer vision and
many other fields.</P>
<P>
<A NAME="Milk"></A> </P>
<DT><B>Milk</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://packages.python.org/milk/">http://packages.python.org/milk/</A></LI>
<LI>Web site:
<A HREF="https://github.com/luispedro/milk">https://github.com/luispedro/milk</A></LI>
</UL>
</P>
<P>Milk is a machine learning toolkit in Python. Its focus is on
supervised classification with several classifiers available: SVMs
(based on libsvm), k-NN, random forests, decision trees. It also
performs feature selection. These classifiers can be combined in many
ways to form different classification systems. For unsupervised
learning, milk supports k-means clustering and affinity propagation.</P>
<P>
<A NAME="NLTK"></A> </P>
<DT><B>NLTK</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://nltk.org/">http://nltk.org/</A></LI>
</UL>
</P>
<P>NLTK, the Natural Language Toolkit, is a suite of Python libraries and
programs for symbolic and statistical natural language processing.
NLTK includes graphical demonstrations and sample data. It is
accompanied by extensive documentation, including tutorials that
explain the underlying concepts behind the language processing tasks
supported by the toolkit.</P>
<P>NLTK is ideally suited to students who are learning NLP (natural
language processing) or conducting research in NLP or closely related
areas, including empirical linguistics, cognitive science, artificial
intelligence, information retrieval, and machine learning. NLTK has
been used successfully as a teaching tool, as an individual study tool,
and as a platform for prototyping and building research systems.</P>
<P>
<A NAME="peach"></A> </P>
<DT><B>peach</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://code.google.com/p/peach/">http://code.google.com/p/peach/</A></LI>
</UL>
</P>
<P>Peach is a pure-Python module, based on SciPy and NumPy, that implements
algorithms for computational intelligence and machine learning. Methods
implemented include artificial neural
networks, fuzzy logic, genetic algorithms, swarm intelligence and
more.</P>
<P>The aim of this library is primarily educational. Nonetheless, care was
taken to make the implemented methods efficient as well.</P>
<P>
<A NAME="pebl"></A> </P>
<DT><B>pebl</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://code.google.com/p/pebl-project/">http://code.google.com/p/pebl-project/</A></LI>
</UL>
</P>
<P>Pebl is a Python library and command-line application for learning the
structure of a Bayesian network given prior knowledge and observations.
Pebl includes the following features:</P>
<P>
<UL>
<LI>Can learn with observational and interventional data</LI>
<LI>Handles missing values and hidden variables using exact and
heuristic methods</LI>
<LI>Provides several learning algorithms; makes creating new ones
simple</LI>
<LI>Has facilities for transparent parallel execution using several
cluster/grid resources</LI>
<LI>Calculates edge marginals and consensus networks</LI>
<LI>Presents results in a variety of formats </LI>
</UL>
</P>
<P>
<A NAME="PyBrain"></A> </P>
<DT><B>PyBrain</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://pybrain.org/">http://pybrain.org/</A></LI>
</UL>
</P>
<P>PyBrain is a modular Machine Learning Library for Python. Its goal is
to offer flexible, easy-to-use yet still powerful algorithms for
Machine Learning Tasks and a variety of predefined environments to test
and compare your algorithms.</P>
<P>PyBrain contains algorithms for neural networks, for reinforcement
learning (and the combination of the two), for unsupervised learning,
and evolution. Since most of the current problems deal with continuous
state and action spaces, function approximators (like neural networks)
must be used to cope with the large dimensionality. Our library is
built around neural networks in the kernel and all of the training
methods accept a neural network as the to-be-trained instance. This
makes PyBrain a powerful tool for real-life tasks.</P>
<P>
<A NAME="MBT"></A> </P>
<DT><B>MBT</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://ilk.uvt.nl/mbt/">http://ilk.uvt.nl/mbt/</A></LI>
</UL>
</P>
<P>MBT is a memory-based tagger-generator and tagger in one. The
tagger-generator part can generate a sequence tagger on the basis of a
training set of tagged sequences; the tagger part can tag new
sequences. MBT can, for instance, be used to generate part-of-speech
taggers or chunkers for natural language processing. It has also been
used for named-entity recognition, information extraction in
domain-specific texts, and disfluency chunking in transcribed speech.</P>
<P>
<A NAME="MLAP book samples"></A> </P>
<DT><B>MLAP book samples</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://seat.massey.ac.nz/personal/s.r.marsland/MLBook.html">http://seat.massey.ac.nz/personal/s.r.marsland/MLBook.html</A></LI>
</UL>
</P>
<P>Not a library per se, but a whole slew of example machine learning
algorithms from the book "Machine Learning: An Algorithmic Perspective"
by Stephen Marsland. All code is written in Python.</P>
<P>
<A NAME="scikits.learn"></A> </P>
<DT><B>scikits.learn</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://scikit-learn.org/stable/">http://scikit-learn.org/stable/</A></LI>
</UL>
</P>
<P>scikits.learn is a Python module integrating classic machine learning
algorithms in the tightly-knit world of scientific Python packages
(numpy, scipy, matplotlib). It aims to provide simple and efficient
solutions to learning problems that are accessible to everybody and
reusable in various contexts: machine-learning as a versatile tool for
science and engineering.</P>
<P>
<A NAME="Shogun"></A> </P>
<DT><B>Shogun</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://www.shogun-toolbox.org/">http://www.shogun-toolbox.org/</A></LI>
</UL>
</P>
<P>SHOGUN is a machine learning toolbox focused on large scale kernel methods
and especially on Support Vector Machines (SVM). It provides a generic
SVM object interfacing to several different SVM implementations, among
them the state of the art LibSVM and SVMLight. Each of the SVMs can be
combined with a variety of kernels. The toolbox not only provides
efficient implementations of the most common kernels, like the Linear,
Polynomial, Gaussian and Sigmoid Kernel, but also comes with a number of
recent string kernels such as the Locality Improved, Fisher, TOP,
Spectrum, and Weighted Degree Kernel (with shifts). For the latter, the
efficient LINADD optimizations are implemented. SHOGUN also offers the
freedom of working with custom pre-computed kernels. One of its key
features is the combined kernel, which can be constructed as a weighted
linear combination of a number of sub-kernels, each of which need not
work on the same domain. An optimal sub-kernel weighting
can be learned using Multiple Kernel Learning. Currently, SVM 2-class
classification and regression problems can be dealt with. However,
SHOGUN also implements a number of linear methods like Linear
Discriminant Analysis (LDA), Linear Programming Machine (LPM), and (Kernel)
Perceptrons, and features algorithms to train hidden Markov models. The
input feature-objects can be dense, sparse or strings, of type
int/short/double/char, and can be converted into different feature
types. Chains of preprocessors (e.g. subtracting the mean) can be
attached to each feature object, allowing for on-the-fly pre-processing.</P>
<P>SHOGUN is implemented in C++ and interfaces to Matlab(tm), R, Octave
and Python.</P>
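<P>The idea of a combined kernel, a weighted linear combination of
sub-kernels, can be sketched in a few lines of plain Python. This is
not SHOGUN's API: the function names, the Gaussian width convention and
the hand-picked weights are assumptions for illustration. Both
sub-kernels here work on the same vectors, and the weights are fixed
rather than learned by Multiple Kernel Learning.</P>
<PRE>
```python
import math


def linear_kernel(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel(x, y, width=1.0):
    """Gaussian kernel exp(-||x - y||^2 / width); width is an assumed parameter."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / width)


def combined_kernel(sub_kernels, weights):
    """Build k(x, y) = sum_i weights[i] * sub_kernels[i](x, y)."""
    def kernel(x, y):
        return sum(w * k(x, y) for w, k in zip(weights, sub_kernels))
    return kernel


def gram_matrix(kernel, points):
    """Kernel value for every pair of points, as a list of rows."""
    return [[kernel(p, q) for q in points] for p in points]


# Hypothetical data and hand-picked (not learned) sub-kernel weights.
points = [(0.0, 1.0), (1.0, 1.0), (2.0, 0.0)]
kernel = combined_kernel([linear_kernel, gaussian_kernel], [0.7, 0.3])
gram = gram_matrix(kernel, points)
for row in gram:
    print(["%.3f" % value for value in row])
```
</PRE>
<P>With non-negative weights the result is again a valid kernel, so the
Gram matrix could be passed to any SVM that accepts pre-computed
kernels.</P>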
<P>
<A NAME="timbl"></A> </P>
<DT><B>timbl</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://ilk.uvt.nl/timbl/">http://ilk.uvt.nl/timbl/</A></LI>
</UL>
</P>
<P>The Tilburg Memory Based Learner, TiMBL, is a tool for NLP research,
and for many other domains where classification tasks are learned from
examples. It is an efficient implementation of the k-nearest neighbor
classifier.</P>
<P>TiMBL's features are:
<UL>
<LI>Fast, decision-tree-based implementation of k-nearest neighbor
classification;</LI>
<LI>Implementations of IB1 and IB2, IGTree, TRIBL, and TRIBL2
algorithms;</LI>
<LI>Similarity metrics: Overlap, MVDM, Jeffrey Divergence, Dot
product, Cosine;</LI>
<LI>Feature weighting metrics: information gain, gain ratio,
chi squared, shared variance;</LI>
<LI>Distance weighting metrics: inverse, inverse linear,
exponential decay;</LI>
<LI>Extensive verbosity options to inspect nearest neighbor sets;</LI>
<LI>Server functionality and extensive API;</LI>
<LI>Fast leave-one-out testing and internal cross-validation;</LI>
<LI>and handling of user-defined example weighting.</LI>
</UL>
</P>
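<P>As a rough illustration of memory-based classification with the
Overlap metric and feature weighting mentioned in the list above, here
is a small Python sketch. It is a plain k-nearest neighbor classifier
and not TiMBL's implementation or API: the function names, the toy
tagging data and the hand-set weights (where TiMBL would compute, for
example, information gain) are assumptions.</P>
<PRE>
```python
from collections import Counter


def overlap_distance(a, b, weights=None):
    """Weighted count of feature positions where a and b differ."""
    if weights is None:
        weights = [1.0] * len(a)
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def knn_classify(memory, query, k=1, weights=None):
    """Majority label among the k stored examples closest to query."""
    def distance(example):
        return overlap_distance(example[0], query, weights)
    ranked = sorted(memory, key=distance)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical examples: (previous tag, word suffix, case) -> tag.
memory = [
    (("DT", "og", "lower"), "NN"),
    (("DT", "at", "lower"), "NN"),
    (("NN", "ks", "lower"), "VBZ"),
    (("NN", "ns", "lower"), "VBZ"),
    (("IN", "is", "upper"), "NNP"),
]
query = ("DT", "ns", "lower")

# Unweighted vote among the three nearest stored examples.
print(knn_classify(memory, query, k=3))
# Trusting the suffix feature more changes which example is nearest.
print(knn_classify(memory, query, k=1, weights=[0.2, 1.0, 0.2]))
```
</PRE>
<P>All training examples are simply kept in memory; the work happens at
classification time, which is why fast indexing matters in a real
implementation.</P>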
</DL>
</P>
<H2><A NAME="ss7.2">7.2</A> <A HREF="AI-Alife-HOWTO.html#toc7.2">Applications</A>
</H2>
<P>Full applications that implement various machine learning or statistical
systems oriented toward general learning (i.e., no spam filters and the
like).</P>
<P>
<DL>
<P>
<A NAME="dbacl"></A> </P>
<DT><B>dbacl</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://dbacl.sourceforge.net/">http://dbacl.sourceforge.net/</A></LI>
</UL>
</P>
<P>The dbacl project consists of a set of lightweight UNIX/POSIX utilities
which can be used, either directly or in shell scripts, to classify
text documents automatically, according to Bayesian statistical
principles.</P>
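<P>For readers new to Bayesian text classification, the general idea
can be sketched with a small multinomial naive Bayes classifier in
Python. This is a swapped-in textbook technique and not dbacl's own
model or command-line interface: the function names, the add-one
smoothing and the sample documents are assumptions for illustration.</P>
<PRE>
```python
import math
from collections import Counter


def train(labelled_docs):
    """Count words and documents per category from (category, text) pairs."""
    word_counts = {}
    doc_counts = Counter()
    for category, text in labelled_docs:
        doc_counts[category] += 1
        words = text.lower().split()
        word_counts.setdefault(category, Counter()).update(words)
    return word_counts, doc_counts


def classify(model, text):
    """Return the category with the highest posterior log-probability."""
    word_counts, doc_counts = model
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    total_docs = sum(doc_counts.values())
    scores = {}
    for category, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[category] / total_docs)  # prior
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing keeps unseen (category, word) pairs nonzero.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        scores[category] = score
    return max(scores, key=scores.get)


# Hypothetical training documents.
docs = [
    ("linux", "kernel module compile patch"),
    ("linux", "kernel driver patch release"),
    ("cooking", "bake bread flour oven"),
    ("cooking", "oven roast recipe flour"),
]
model = train(docs)
print(classify(model, "patch the kernel driver"))
print(classify(model, "fresh bread from the oven"))
```
</PRE>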
<P>
<A NAME="Torch5"></A> </P>
<DT><B>Torch5</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://torch5.sourceforge.net/">http://torch5.sourceforge.net/</A></LI>
</UL>
</P>
<P>Torch5 provides a Matlab-like environment for state-of-the-art machine
learning algorithms. It is easy to use and provides a very efficient
implementation, thanks to an easy and fast scripting language (Lua) and
an underlying C++ implementation. It is distributed under a BSD license.</P>
<P>This is the successor to the
<A HREF="AI-Alife-HOWTO-2.html#Torch">Torch3</A> project.</P>
<P>
<A NAME="Vowpal Wabbit"></A> </P>
<DT><B>Vowpal Wabbit</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://hunch.net/~vw/">http://hunch.net/~vw/</A></LI>
</UL>
</P>
<P>Vowpal Wabbit is a fast online learning algorithm. It features:</P>
<P>
<UL>
<LI>flexible input data specification</LI>
<LI>speedy learning</LI>
<LI>scalability (bounded memory footprint, suitable for
distributed computation)</LI>
<LI>feature pairing</LI>
</UL>
</P>
<P>The core algorithm is specialist gradient descent (GD) on a loss
function (several are available). The code should be easily usable.</P>
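<P>The flavour of online gradient descent on a loss function can be
sketched as follows. This is plain stochastic gradient descent on
squared loss over sparse feature dictionaries, not Vowpal Wabbit's
actual update rule, input format or code: the function names, the
learning rate and the toy data stream are assumptions.</P>
<PRE>
```python
def predict(weights, features):
    """Linear prediction over a sparse {name: value} feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def update(weights, features, label, learning_rate=0.1):
    """One gradient step on the squared loss 0.5 * (prediction - label)^2."""
    error = predict(weights, features) - label
    for name, value in features.items():
        # Only features present in this example are touched.
        weights[name] = weights.get(name, 0.0) - learning_rate * error * value
    return error


# Hypothetical stream of examples drawn from y = 2x + 1.
stream = [({"bias": 1.0, "x": x}, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]

weights = {}
for _ in range(500):
    for features, label in stream:
        update(weights, features, label)

print({name: round(value, 4) for name, value in weights.items()})
```
</PRE>
<P>Each example is seen once per pass and then discarded, so memory use
depends on the number of distinct features rather than on the number of
examples.</P>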
</DL>
</P>
<HR>
<A HREF="AI-Alife-HOWTO-8.html">Next</A>
<A HREF="AI-Alife-HOWTO-6.html">Previous</A>
<A HREF="AI-Alife-HOWTO.html#toc7">Contents</A>
</BODY>
</HTML>