API for clj-ml.classifiers

by Antonio Garrote <antoniogarrote@gmail.com>

Usage:
(ns your-namespace
  (:require clj-ml.classifiers))

Overview

This namespace contains functions for building classifiers with several classification
algorithms: Bayes networks, multilayer perceptrons, decision trees and support vector
machines. Some of these classifiers have incremental versions, so they can be built
without holding all the dataset instances in memory.

Functions for evaluating the resulting classifiers, using either cross-validation or a
provided dataset, are also available.

A sample use of the API for classifiers is shown below:

 ; dataset-set-class and make-instance, used below, live in clj-ml.data
 (use 'clj-ml.classifiers 'clj-ml.data)

 ; Building a classifier using a C4.5 decision tree
 (def *classifier* (make-classifier :decission-tree :c45))

 ; We set the class attribute for the loaded dataset.
 ; *dataset* is assumed to hold a previously loaded set of instances.
 (dataset-set-class *dataset* 4)

 ; Training the classifier
 (classifier-train *classifier* *dataset*)

 ; We evaluate the classifier using a test dataset
 (def *evaluation* (classifier-evaluate *classifier* :dataset *dataset* *trainingset*))

 ; We retrieve some data from the evaluation result
 (:kappa *evaluation*)
 (:root-mean-squared-error *evaluation*)
 (:precision *evaluation*)

 ; A trained classifier can be used to classify new instances
 (def *to-classify* (make-instance *dataset* {:class :Iris-versicolor
                                              :petalwidth 0.2
                                              :petallength 1.4
                                              :sepalwidth 3.5
                                              :sepallength 5.1}))

 ; We retrieve the index of the class value assigned by the classifier
 (classifier-classify *classifier* *to-classify*)

 ; We retrieve a symbol with the value assigned by the classifier
 (classifier-label *classifier* *to-classify*)

A classifier can also be evaluated using cross-validation:

 (classifier-evaluate *classifier* :cross-validation *dataset* 10)
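
The final argument in the call above is the number of folds. As a rough illustration of
what a 10-fold split means, here is a minimal sketch in plain Clojure. It is not the
code that clj-ml or Weka runs: the function name k-fold-splits is hypothetical, and the
underlying implementation may also shuffle and stratify the data, which this sketch
does not do.

```clojure
;; Hypothetical helper, not part of clj-ml: illustrates k-fold partitioning only.
(defn k-fold-splits
  "Splits the instance indices 0..n-1 into k folds. Returns a sequence of k
  maps, one per round, with the held-out indices under :test and all the
  remaining indices under :train. Index i is assigned to fold (mod i k)."
  [n k]
  (for [fold (range k)]
    (let [in-fold? #(= fold (mod % k))]
      {:test  (vec (filter in-fold? (range n)))
       :train (vec (remove in-fold? (range n)))})))

;; 150 instances (the size of the Iris dataset) split into 10 folds:
;; each round trains on 135 instances and tests on the remaining 15.
(def splits (k-fold-splits 150 10))

(count splits)                   ; number of train/test rounds
(count (:test (first splits)))   ; held-out instances in the first round
(count (:train (first splits)))  ; training instances in the first round
```

Every instance is held out exactly once, so the reported metrics are computed over
predictions for the whole dataset.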

Finally, a classifier can be stored in a file for later use:

 (use 'clj-ml.utils)

 (serialize-to-file *classifier*
  "/Users/antonio.garrote/Desktop/classifier.bin")

Public Variables and Functions



classifier-classify

function
Usage: (classifier-classify classifier instance)
Classifies an instance or data vector using the provided classifier


classifier-evaluate

multimethod
No usage documentation available
Evaluates a trained classifier using the provided dataset or cross-validation


classifier-label

function
Usage: (classifier-label classifier instance)
Classifies a dataset instance and assigns a label to it


classifier-train

function
Usage: (classifier-train classifier dataset)
Trains a classifier with the given dataset as the training data


classifier-update

function
Usage: (classifier-update classifier instance-s)
If the classifier is updateable, updates it with the given instance or set of instances


make-classifier

multimethod
No usage documentation available
Creates a new classifier for the given kind, algorithm and options.

The first argument identifies the kind of classifier and the second
argument the algorithm to use, e.g. :decission-tree :c45.

The classifiers currently supported are:

  - :decission-tree :c45
  - :bayes :naive
  - :neural-network :multilayer-perceptron
  - :support-vector-machine :smo

Optionally, a map of classifier-specific options can be passed as an
additional argument.

This is the description of the supported classifiers and the accepted
option parameters for each of them:

 * :decission-tree :c45

   A classifier that builds a pruned or unpruned C4.5 decision tree using
   Weka's J48 implementation.

   Parameters:

     - :unpruned
         Use unpruned tree. Sample value: true
     - :reduce-error-pruning
         Sample value: true
     - :only-binary-splits
         Sample value: true
     - :no-raising
         Sample value: true
     - :no-cleanup
         Sample value: true
     - :laplace-smoothing
         For predicted probabilities. Sample value: true
     - :pruning-confidence
         Threshold for pruning. Default value: 0.25
     - :minimum-instances
         Minimum number of instances per leaf. Default value: 2
     - :pruning-number-folds
         Set number of folds for reduced error pruning. Default value: 3
     - :random-seed
         Seed for random data shuffling. Default value: 1

 * :bayes :naive

   Classifier based on Bayes' theorem with strong independence assumptions among the
   probabilistic variables.

   Parameters:

     - :kernel-estimator
         Use a kernel density estimator rather than a normal distribution. Sample value: true
     - :supervised-discretization
         Use supervised discretization to process numeric attributes (see the
         :supervised-discretize filter in the clj-ml.filters/make-filter function).
         Sample value: true

 * :neural-network :multilayer-perceptron

   Classifier built using a feedforward artificial neural network with three or more layers
   of neurons and nonlinear activation functions. It is able to distinguish data that is not
   linearly separable.

   Parameters:

     - :no-nominal-to-binary
         A :nominal-to-binary filter will not be applied by default. (see :supervised-nominal-to-binary
         filter in clj-ml.filters/make-filter function). Default value: false
     - :no-numeric-normalization
         A numeric class will not be normalized. Default value: false
     - :no-nomalization
         No attribute will be normalized. Default value: false
     - :no-reset
         Resetting the network will not be allowed. Default value: false
     - :learning-rate-decay
         Learning rate decay will occur. Default value: false
     - :learning-rate
         Learning rate for the backpropagation algorithm. Value should be in the range [0,1].
         Default value: 0.3
     - :momentum
         Momentum rate for the backpropagation algorithm. Value should be in the range [0,1].
         Default value: 0.2
     - :epochs
         Number of epochs to train through. Default value: 500
     - :percentage-validation-set
         Percentage size of the validation set used to terminate training. If it is not zero,
         it takes precedence over the number of epochs to finish training. Values should be
         in the range [0,100]. Default value: 0
     - :random-seed
         Value of the seed for the random generator. Values should be longs greater than
         or equal to 0. Default value: 0
     - :threshold-number-errors
         The number of consecutive errors allowed during validation testing before the network
         terminates. Values should be greater than 0. Default value: 20
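
The option maps described above are plain Clojure data, so they can be built and
checked before a classifier is created. The sketch below is only an illustration: the
names c45-options, mlp-options, mlp-ranges and out-of-range-options are hypothetical and
not part of clj-ml, and the option values are arbitrary. The range check covers only the
three multilayer perceptron options whose ranges are stated above. The calls to
make-classifier sit inside a comment form because they need clj-ml on the classpath.

```clojure
;; Options for the C4.5 decision tree, using keys documented above.
(def c45-options
  {:unpruned          true
   :minimum-instances 5})

;; Options for the multilayer perceptron, set to the documented defaults.
(def mlp-options
  {:learning-rate             0.3
   :momentum                  0.2
   :epochs                    500
   :percentage-validation-set 0})

;; Hypothetical table of the inclusive ranges stated in the documentation.
(def mlp-ranges
  {:learning-rate             [0 1]
   :momentum                  [0 1]
   :percentage-validation-set [0 100]})

;; Hypothetical helper, not part of clj-ml.
(defn out-of-range-options
  "Returns the keys of opts whose values fall outside the ranges in mlp-ranges.
  Keys that are absent from opts are ignored."
  [opts]
  (for [[k [lo hi]] mlp-ranges
        :let [v (get opts k)]
        :when (and v (not (<= lo v hi)))]
    k))

(out-of-range-options mlp-options)                        ; nothing out of range
(out-of-range-options (assoc mlp-options :momentum 1.5))  ; reports :momentum

(comment
  ;; Requires clj-ml on the classpath; the option map is the additional argument.
  (use 'clj-ml.classifiers)
  (make-classifier :decission-tree :c45 c45-options)
  (make-classifier :neural-network :multilayer-perceptron mlp-options))
```

Checking the map up front gives a readable message at the call site, rather than
relying on however the underlying library reacts to an out-of-range value.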