<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.21">
<TITLE>GNU/Linux AI & Alife HOWTO: Statistical & Machine Learning</TITLE>
<LINK HREF="AI-Alife-HOWTO-8.html" REL=next>
<LINK HREF="AI-Alife-HOWTO-6.html" REL=previous>
<LINK HREF="AI-Alife-HOWTO.html#toc7" REL=contents>
</HEAD>
<BODY>
<A HREF="AI-Alife-HOWTO-8.html">Next</A>
<A HREF="AI-Alife-HOWTO-6.html">Previous</A>
<A HREF="AI-Alife-HOWTO.html#toc7">Contents</A>
<HR>
<H2><A NAME="Statistical & Machine Learning"></A> <A NAME="s7">7.</A> <A HREF="AI-Alife-HOWTO.html#toc7">Statistical & Machine Learning</A> </H2>
|
|
|
|
|
|
<P>All about getting machines to learn to do something rather than
|
|
explicitly programming to do it. Tends to deal with pattern matching
|
|
a lot and are heavily math and statistically based. Technically
|
|
<A HREF="AI-Alife-HOWTO-3.html#Connectionism">Connectionism</A>
|
|
falls under this category, but it is such a
|
|
large sub-field I'm keeping it in a separate section.</P>
|
|
|
|
|
|
<H2><A NAME="ss7.1">7.1</A> <A HREF="AI-Alife-HOWTO.html#toc7.1">Libraries</A>
|
|
</H2>
|
|
|
|
|
|
<P>Libraries or frameworks used for writing machine learning systems.</P>
|
|
<P>
|
|
<DL>
|
|
<P>
|
|
<A NAME="CognitiveFoundry"></A> </P>
|
|
<DT><B>CognitiveFoundry</B><DD><P>
|
|
<UL>
|
|
<LI>Web site:
|
|
<A HREF="http://foundry.sandia.gov/">http://foundry.sandia.gov/</A></LI>
|
|
</UL>
|
|
</P>
|
|
<P>The Cognitive Foundry is a modular Java software library for the
|
|
research and development of cognitive systems. It contains many
|
|
reusable components for machine learning, statistics, and cognitive
|
|
modeling. It is primarily designed to be easy to plug into applications
|
|
to provide adaptive behaviors.</P>
|
|
|
|
<P>
<A NAME="CompLearn"></A> </P>
<DT><B>CompLearn</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://complearn.org/">http://complearn.org/</A></LI>
</UL>
</P>

<P>CompLearn is a software system built to support compression-based
learning in a wide variety of applications. It provides this support in
the form of a library written in highly portable ANSI C that runs in
most modern computer environments with minimal confusion. It also
supplies a small suite of simple, composable command-line utilities as
simple applications that use this library. Together with other commonly
used machine-learning tools such as LibSVM and GraphViz, CompLearn
forms an attractive offering in machine-learning frameworks and
toolkits.</P>

<P>
<A NAME="Elefant"></A> </P>
<DT><B>Elefant</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://elefant.developer.nicta.com.au/">http://elefant.developer.nicta.com.au/</A></LI>
</UL>
</P>

<P>Elefant (Efficient Learning, Large-scale Inference, and Optimisation
Toolkit) is an open source library for machine learning licensed under
the Mozilla Public License (MPL). We develop an open source machine
learning toolkit which provides</P>
<P>
<UL>
<LI>algorithms for machine learning utilising the power of
multi-core/multi-threaded processors/operating systems (Linux,
Windows, Mac OS X),</LI>
<LI>a graphical user interface for users who want to quickly
prototype machine learning experiments,</LI>
<LI>tutorials to support learning about Statistical Machine
Learning (Statistical Machine Learning at The Australian National
University), and</LI>
<LI>detailed and precise documentation for each of the above.</LI>
</UL>
</P>

<P>
<A NAME="Maximum Entropy Toolkit"></A> </P>
<DT><B>Maximum Entropy Toolkit</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://homepages.inf.ed.ac.uk/lzhang10/maxent_toolkit.html">http://homepages.inf.ed.ac.uk/lzhang10/maxent_toolkit.html</A></LI>
</UL>
</P>

<P>The Maximum Entropy Toolkit provides a set of tools and a library for
constructing maximum entropy (maxent) models in either Python or C++.</P>
<P>The maximum entropy model is a general-purpose machine learning
framework that has proved to be highly expressive and powerful in
statistical natural language processing, statistical physics, computer
vision and many other fields.</P>
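<P>A minimal sketch of training a model with the toolkit's Python
binding might look like the following. The method names are taken from
the toolkit's documentation but should be treated as assumptions; check
the current API before relying on them:</P>
<P>
<BLOCKQUOTE><CODE>
<PRE>
from maxent import MaxentModel

m = MaxentModel()
# Each training event is a list of active (binary) features plus an outcome.
m.begin_add_event()
m.add_event(["prev_word=in", "cap=yes"], "LOCATION")
m.add_event(["prev_word=mr", "cap=yes"], "PERSON")
m.end_add_event()
m.train(30, "lbfgs")          # 30 iterations of L-BFGS
# Probability of an outcome given a new context.
print(m.eval(["prev_word=in", "cap=yes"], "LOCATION"))
</PRE>
</CODE></BLOCKQUOTE>
</P>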

<P>
<A NAME="Milk"></A> </P>
<DT><B>Milk</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://packages.python.org/milk/">http://packages.python.org/milk/</A></LI>
<LI>Web site:
<A HREF="https://github.com/luispedro/milk">https://github.com/luispedro/milk</A></LI>
</UL>
</P>

<P>Milk is a machine learning toolkit in Python. Its focus is on
supervised classification with several classifiers available: SVMs
(based on libsvm), k-NN, random forests, decision trees. It also
performs feature selection. These classifiers can be combined in many
ways to form different classification systems. For unsupervised
learning, milk supports k-means clustering and affinity propagation.</P>
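<P>A small sketch of supervised classification with milk, assuming the
default classifier pipeline described in its documentation (treat the
helper names as assumptions and consult the milk docs for the current
API):</P>
<P>
<BLOCKQUOTE><CODE>
<PRE>
import numpy as np
import milk

# Toy data: 100 examples with 10 features each, two classes.
features = np.random.rand(100, 10)
labels = np.zeros(100)
labels[50:] = 1

learner = milk.defaultclassifier()   # default pipeline (feature selection + SVM)
model = learner.train(features, labels)
print(model.apply(features[0]))      # predicted label for one example
</PRE>
</CODE></BLOCKQUOTE>
</P>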

<P>
<A NAME="NLTK"></A> </P>
<DT><B>NLTK</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://nltk.org/">http://nltk.org/</A></LI>
</UL>
</P>

<P>NLTK, the Natural Language Toolkit, is a suite of Python libraries and
programs for symbolic and statistical natural language processing.
NLTK includes graphical demonstrations and sample data. It is
accompanied by extensive documentation, including tutorials that
explain the underlying concepts behind the language processing tasks
supported by the toolkit.</P>
<P>NLTK is ideally suited to students who are learning NLP (natural
language processing) or conducting research in NLP or closely related
areas, including empirical linguistics, cognitive science, artificial
intelligence, information retrieval, and machine learning. NLTK has
been used successfully as a teaching tool, as an individual study tool,
and as a platform for prototyping and building research systems.</P>
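<P>As a flavour of the API, a minimal sketch of tokenizing and
part-of-speech tagging a sentence (the required data packages can be
fetched once with nltk.download()):</P>
<P>
<BLOCKQUOTE><CODE>
<PRE>
import nltk

sentence = "GNU/Linux is a fine platform for statistical NLP."
tokens = nltk.word_tokenize(sentence)   # split into words and punctuation
tagged = nltk.pos_tag(tokens)           # attach part-of-speech tags
print(tagged)
</PRE>
</CODE></BLOCKQUOTE>
</P>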

<P>
<A NAME="peach"></A> </P>
<DT><B>peach</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://code.google.com/p/peach/">http://code.google.com/p/peach/</A></LI>
</UL>
</P>

<P>Peach is a pure-Python module, based on SciPy and NumPy, that
implements algorithms for computational intelligence and machine
learning. Implemented methods include artificial neural networks,
fuzzy logic, genetic algorithms, swarm intelligence, and more.</P>
<P>The aim of this library is primarily educational. Nonetheless, care
was taken to make the implementations efficient as well.</P>

<P>
<A NAME="pebl"></A> </P>
<DT><B>pebl</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://code.google.com/p/pebl-project/">http://code.google.com/p/pebl-project/</A></LI>
</UL>
</P>

<P>Pebl is a Python library and command line application for learning the
structure of a Bayesian network given prior knowledge and observations.
A short usage sketch follows the feature list. Pebl includes the
following features:</P>
<P>
<UL>
<LI>Can learn with observational and interventional data</LI>
<LI>Handles missing values and hidden variables using exact and
heuristic methods</LI>
<LI>Provides several learning algorithms; makes creating new ones
simple</LI>
<LI>Has facilities for transparent parallel execution using several
cluster/grid resources</LI>
<LI>Calculates edge marginals and consensus networks</LI>
<LI>Presents results in a variety of formats</LI>
</UL>
</P>
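<P>A minimal sketch of structure learning with Pebl, following the
workflow in its tutorial. The file name is a stand-in and the class and
function names should be treated as assumptions; check the Pebl
documentation for the current API:</P>
<P>
<BLOCKQUOTE><CODE>
<PRE>
from pebl import data
from pebl.learner import greedy

# Load a delimited file of observations (variables as columns).
dataset = data.fromfile("observations.txt")
dataset.discretize()                     # bin continuous variables

learner = greedy.GreedyLearner(dataset)  # greedy hill-climbing search
result = learner.run()
result.tohtml("pebl-report")             # write an HTML report of the networks
</PRE>
</CODE></BLOCKQUOTE>
</P>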

<P>
<A NAME="PyBrain"></A> </P>
<DT><B>PyBrain</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://pybrain.org/">http://pybrain.org/</A></LI>
</UL>
</P>

<P>PyBrain is a modular Machine Learning Library for Python. Its goal is
to offer flexible, easy-to-use yet still powerful algorithms for
Machine Learning Tasks and a variety of predefined environments to test
and compare your algorithms.</P>
<P>PyBrain contains algorithms for neural networks, for reinforcement
learning (and the combination of the two), for unsupervised learning,
and evolution. Since most of the current problems deal with continuous
state and action spaces, function approximators (like neural networks)
must be used to cope with the large dimensionality. The library is
built around neural networks at its core, and all of the training
methods accept a neural network as the instance to be trained. This
makes PyBrain a powerful tool for real-life tasks.</P>
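<P>A minimal sketch of training a small feed-forward network on the XOR
problem with backpropagation, assuming PyBrain's documented shortcut
helpers:</P>
<P>
<BLOCKQUOTE><CODE>
<PRE>
from pybrain.tools.shortcuts import buildNetwork
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised.trainers import BackpropTrainer

net = buildNetwork(2, 3, 1)          # 2 inputs, 3 hidden units, 1 output
ds = SupervisedDataSet(2, 1)
for sample, target in [((0, 0), (0,)), ((0, 1), (1,)),
                       ((1, 0), (1,)), ((1, 1), (0,))]:
    ds.addSample(sample, target)

trainer = BackpropTrainer(net, ds)
for _ in range(100):
    trainer.train()                  # one epoch of backpropagation per call
print(net.activate((1, 0)))          # should approach 1.0 after training
</PRE>
</CODE></BLOCKQUOTE>
</P>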

<P>
<A NAME="MBT"></A> </P>
<DT><B>MBT</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://ilk.uvt.nl/mbt/">http://ilk.uvt.nl/mbt/</A></LI>
</UL>
</P>

<P>MBT is a memory-based tagger-generator and tagger in one. The
tagger-generator part can generate a sequence tagger on the basis of a
training set of tagged sequences; the tagger part can tag new
sequences. MBT can, for instance, be used to generate part-of-speech
taggers or chunkers for natural language processing. It has also been
used for named-entity recognition, information extraction in
domain-specific texts, and disfluency chunking in transcribed speech.</P>

<P>
<A NAME="MLAP book samples"></A> </P>
<DT><B>MLAP book samples</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://seat.massey.ac.nz/personal/s.r.marsland/MLBook.html">http://seat.massey.ac.nz/personal/s.r.marsland/MLBook.html</A></LI>
</UL>
</P>

<P>Not a library per se, but a whole slew of example machine learning
algorithms from the book "Machine Learning: An Algorithmic Perspective"
by Stephen Marsland. All code is written in Python.</P>

<P>
<A NAME="scikits.learn"></A> </P>
<DT><B>scikits.learn</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://scikit-learn.org/stable/">http://scikit-learn.org/stable/</A></LI>
</UL>
</P>

<P>scikits.learn is a Python module integrating classic machine learning
algorithms in the tightly-knit world of scientific Python packages
(numpy, scipy, matplotlib). It aims to provide simple and efficient
solutions to learning problems that are accessible to everybody and
reusable in various contexts: machine-learning as a versatile tool for
science and engineering.</P>
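<P>A minimal sketch of fitting a support vector classifier on the
bundled iris dataset. Recent releases install under the sklearn package
name; very old releases used the scikits.learn namespace instead, so
adjust the imports to match your version:</P>
<P>
<BLOCKQUOTE><CODE>
<PRE>
from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()              # 150 labelled iris flowers
clf = SVC(kernel="linear")
clf.fit(iris.data, iris.target)          # train on the whole dataset
print(clf.predict(iris.data[:5]))        # predict the first five samples
</PRE>
</CODE></BLOCKQUOTE>
</P>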

<P>
<A NAME="Shogun"></A> </P>
<DT><B>Shogun</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://www.shogun-toolbox.org/">http://www.shogun-toolbox.org/</A></LI>
</UL>
</P>

<P>The machine learning toolbox's focus is on large scale kernel methods
and especially on Support Vector Machines (SVM). It provides a generic
SVM object interfacing to several different SVM implementations, among
them the state-of-the-art LibSVM and SVMLight. Each of the SVMs can be
combined with a variety of kernels. The toolbox not only provides
efficient implementations of the most common kernels, like the Linear,
Polynomial, Gaussian and Sigmoid Kernels, but also comes with a number
of recent string kernels such as the Locality Improved, Fisher, TOP,
Spectrum, and Weighted Degree Kernel (with shifts). For the latter the
efficient LINADD optimizations are implemented. SHOGUN also offers the
freedom of working with custom pre-computed kernels. One of its key
features is the combined kernel, which can be constructed as a weighted
linear combination of a number of sub-kernels, each of which need not
work on the same domain. An optimal sub-kernel weighting can be learned
using Multiple Kernel Learning. Currently SVM two-class classification
and regression problems can be dealt with. However, SHOGUN also
implements a number of linear methods like Linear Discriminant Analysis
(LDA), the Linear Programming Machine (LPM), (Kernel) Perceptrons, and
features algorithms to train hidden Markov models. The input
feature-objects can be dense, sparse or strings and of type
int/short/double/char, and can be converted into different feature
types. Chains of preprocessors (e.g. subtracting the mean) can be
attached to each feature object, allowing for on-the-fly
pre-processing.</P>
<P>SHOGUN is implemented in C++ and interfaces to Matlab(tm), R, Octave
and Python.</P>
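<P>A rough sketch of training a Gaussian-kernel SVM through the Python
interface. The class names below follow the older modular Python
examples shipped with SHOGUN and are assumptions; the interface has
changed between releases, so check the documentation for your
version:</P>
<P>
<BLOCKQUOTE><CODE>
<PRE>
import numpy as np
from shogun.Features import RealFeatures, Labels
from shogun.Kernel import GaussianKernel
from shogun.Classifier import LibSVM

# Toy data: features are stored column-wise (dimensions x examples).
train = np.random.randn(2, 20)
labels = np.sign(np.random.randn(20))        # -1/+1 class labels

feats = RealFeatures(train)
kernel = GaussianKernel(feats, feats, 1.0)   # kernel width 1.0
svm = LibSVM(10.0, kernel, Labels(labels))   # C = 10.0
svm.train()
print(svm.classify(feats).get_labels())      # predictions on the training data
</PRE>
</CODE></BLOCKQUOTE>
</P>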

<P>
<A NAME="timbl"></A> </P>
<DT><B>timbl</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://ilk.uvt.nl/timbl/">http://ilk.uvt.nl/timbl/</A></LI>
</UL>
</P>

<P>The Tilburg Memory Based Learner, TiMBL, is a tool for NLP research,
and for many other domains where classification tasks are learned from
examples. It is an efficient implementation of the k-nearest neighbor
classifier.</P>
<P>TiMBL's features are (a short command-line sketch follows the list):
<UL>
<LI>Fast, decision-tree-based implementation of k-nearest neighbor
classification;</LI>
<LI>Implementations of IB1 and IB2, IGTree, TRIBL, and TRIBL2
algorithms;</LI>
<LI>Similarity metrics: Overlap, MVDM, Jeffrey Divergence, Dot
product, Cosine;</LI>
<LI>Feature weighting metrics: information gain, gain ratio,
chi squared, shared variance;</LI>
<LI>Distance weighting metrics: inverse, inverse linear,
exponential decay;</LI>
<LI>Extensive verbosity options to inspect nearest neighbor sets;</LI>
<LI>Server functionality and extensive API;</LI>
<LI>Fast leave-one-out testing and internal cross-validation;</LI>
<LI>Handling of user-defined example weighting.</LI>
</UL>
</P>
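<P>In its simplest command-line use, TiMBL trains on one file of
comma-separated feature values (class in the last column) and
classifies another. The file names here are stand-ins and only the most
basic options are shown; see the TiMBL manual for the full option
set:</P>
<P>
<BLOCKQUOTE><CODE>
<PRE>
# train.data / test.data: one instance per line, e.g.
#   round,red,small,apple
#   long,yellow,small,banana
timbl -f train.data -t test.data
# Predictions are written to an output file named after the test file.
</PRE>
</CODE></BLOCKQUOTE>
</P>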

</DL>
</P>

<H2><A NAME="ss7.2">7.2</A> <A HREF="AI-Alife-HOWTO.html#toc7.2">Applications</A>
</H2>

<P>Full applications that implement various machine learning or statistical
systems oriented toward general learning (i.e., not single-purpose tools
such as spam filters).</P>
<P>
<DL>
<P>
<A NAME="dbacl"></A> </P>
<DT><B>dbacl</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://dbacl.sourceforge.net/">http://dbacl.sourceforge.net/</A></LI>
</UL>
</P>

<P>The dbacl project consists of a set of lightweight UNIX/POSIX utilities
which can be used, either directly or in shell scripts, to classify
text documents automatically, according to Bayesian statistical
principles.</P>
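<P>A typical command-line session, adapted from the dbacl tutorial (the
corpus and document file names are stand-ins):</P>
<P>
<BLOCKQUOTE><CODE>
<PRE>
# Learn two categories from sample text files.
dbacl -l sport sport_corpus.txt
dbacl -l politics politics_corpus.txt
# Classify a new document; -v prints the name of the best category.
dbacl -c sport -c politics article.txt -v
</PRE>
</CODE></BLOCKQUOTE>
</P>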

<P>
<A NAME="Torch5"></A> </P>
<DT><B>Torch5</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://torch5.sourceforge.net/">http://torch5.sourceforge.net/</A></LI>
</UL>
</P>

<P>Torch5 provides a Matlab-like environment for state-of-the-art machine
learning algorithms. It is easy to use and provides a very efficient
implementation, thanks to an easy and fast scripting language (Lua) and
an underlying C++ implementation. It is distributed under a BSD license.</P>
<P>This is the successor to the
<A HREF="AI-Alife-HOWTO-2.html#Torch">Torch3</A> project.</P>

<P>
<A NAME="Vowpal Wabbit"></A> </P>
<DT><B>Vowpal Wabbit</B><DD><P>
<UL>
<LI>Web site:
<A HREF="http://hunch.net/~vw/">http://hunch.net/~vw/</A></LI>
</UL>
</P>

<P>Vowpal Wabbit is a fast online learning algorithm. It features:</P>
<P>
<UL>
<LI>flexible input data specification</LI>
<LI>speedy learning</LI>
<LI>scalability (bounded memory footprint, suitable for
distributed computation)</LI>
<LI>feature pairing</LI>
</UL>
</P>
<P>The core algorithm is specialist gradient descent (GD) on a loss
function (several are available). The code should be easily usable.</P>
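<P>A small sketch of Vowpal Wabbit's plain-text input format and basic
command-line use (the file names are stand-ins; run vw --help for the
current options):</P>
<P>
<BLOCKQUOTE><CODE>
<PRE>
# train.vw: one example per line, "label | feature[:value] ..."
#   1 | height:1.5 length:2.0 has_stripes
#  -1 | height:0.3 length:0.4

# Train a model and save it to model.vw.
vw -d train.vw -f model.vw
# Load the model and write predictions for new examples.
vw -d test.vw -i model.vw -t -p predictions.txt
</PRE>
</CODE></BLOCKQUOTE>
</P>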

</DL>
</P>

<HR>
<A HREF="AI-Alife-HOWTO-8.html">Next</A>
<A HREF="AI-Alife-HOWTO-6.html">Previous</A>
<A HREF="AI-Alife-HOWTO.html#toc7">Contents</A>
</BODY>
</HTML>