In earlier posts I showed that you can run incremental binary classification on your microcontroller with Stochastic Gradient Descent or the Passive-Aggressive classifier. Now it is time to upgrade your toolbelt with a new item: the One-vs-One multiclass classifier.

One vs One

Many classifiers are, by nature, binary: they can only distinguish the positive class from the negative one. Many real-world problems, however, are multiclass: you have 3 or more possible outcomes to distinguish.

There are a couple of ways to tackle a multiclass problem with binary classifiers:

  1. One vs All: if your classifier is able to output a confidence score for its prediction, for N classes you train N classifiers, each able to recognize a single class. During inference, you pick the "most confident" one.
  2. One vs One: for N classes, you train N * (N-1) / 2 classifiers, one for each pair of classes. During inference, each classifier makes a prediction and you pick the class with the highest number of votes (see the sketch right after this list).
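
To make the voting step concrete, here is a minimal sketch of One vs One prediction in plain C++. This is just the idea, not the library code: every pairwise classifier casts a vote for one of its two classes and the class with the most votes wins. predictPair is a hypothetical stand-in for whatever binary classifier you plug in.

// illustration of the One vs One voting scheme, not the actual library code
// predictPair(a, b, x) stands for a binary classifier trained on the pair (a, b),
// returning either a or b
int predictPair(int a, int b, float x[]);

int predictOneVsOne(float x[], int numClasses) {
    int votes[10] = {0}; // adjust the size to your number of classes

    // each of the N * (N-1) / 2 pairwise classifiers casts a vote
    for (int a = 0; a < numClasses; a++)
        for (int b = a + 1; b < numClasses; b++)
            votes[predictPair(a, b, x)] += 1;

    // pick the class with the highest number of votes
    int best = 0;
    for (int c = 1; c < numClasses; c++)
        if (votes[c] > votes[best])
            best = c;

    return best;
}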

Since SGD and Passive-Aggressive don't output a confidence score, I implemented the One vs One algorithm to tackle the multiclass classification problem on microcontrollers.

Actually, One vs One is not a new type of classifier: it is really a "coordinator" class that decides which samples go to which classifier. You can still choose the classifier type you want to use underneath.
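
During training, the coordinator forwards a sample of class y only to the N - 1 pairwise classifiers whose pair contains y, labeled as one side of the pair or the other. Here is a rough sketch of the idea (again, not the actual library internals; binaryFit is a hypothetical helper):

// illustration of how a One vs One coordinator routes a training sample
// binaryFit(a, b, x, label) stands for one training step of the classifier
// of the pair (a, b), where label 0 means "class a" and label 1 means "class b"
void binaryFit(int a, int b, float x[], int label);

void fitOneVsOne(float x[], int y, int numClasses) {
    // only the pairs that contain class y get updated
    for (int other = 0; other < numClasses; other++) {
        if (other == y)
            continue;

        int a = y < other ? y : other;
        int b = y < other ? other : y;

        // the sample counts as "a" if y == a, as "b" otherwise
        binaryFit(a, b, x, y == a ? 0 : 1);
    }
}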

Like SGD and Passive-Aggressive, OneVsOne implements the classifier interface, so you will use the well-known fitOne and predict methods.


Example code

// ESP32 has some problems with min/max
#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))
// you will actually need only one of SGD or PassiveAggressive
#include "EloquentSGD.h"
#include "EloquentPassiveAggressive.h"
#include "EloquentOneVsOne.h"
#include "EloquentAccuracyScorer.h"
// this file defines FEATURES_DIM, NUM_CLASSES, TRAIN_SAMPLES and TEST_SAMPLES
#include "dataset.h"

using namespace Eloquent::ML;

void setup() {
  Serial.begin(115200);
  delay(3000);
}

void loop() {
  AccuracyScorer scorer;
  // OneVsOne needs the actual classifier class, the number of features and the number of classes
  OneVsOne<SGD<FEATURES_DIM>, FEATURES_DIM, NUM_CLASSES> clf;

  // clf.set() propagates the configuration to the actual classifiers
  // if a parameter does not exist on the classifier, it does nothing
  // in this example, alpha and momentum refer to SGD, C to Passive-Aggressive
  clf.set("alpha", 1);
  clf.set("momentum", 0.7);
  clf.set("C", 0.1);

  // fit
  // I noticed that repeating the training a few times over the same dataset increases performance, but only to a certain extent: if you re-train too much, performance will decay
  for (unsigned int i = 0; i < TRAIN_SAMPLES * 5; i++) {
      clf.fitOne(X_train[i % TRAIN_SAMPLES], y_train[i % TRAIN_SAMPLES]);
  }

  // predict
  for (int i = 0; i < TEST_SAMPLES; i++) {
      int y_true = y_test[i];
      int y_pred = clf.predict(X_test[i]);

      Serial.print("Predicted ");
      Serial.print(y_pred);
      Serial.print(" vs ");
      Serial.println(y_true);
      scorer.scoreOne(y_true, y_pred);
  }

  Serial.print("Accuracy = ");
  Serial.print(scorer.accuracy() * 100);
  Serial.print(" out of ");
  Serial.print(scorer.support());
  Serial.println(" samples");
  delay(30000);
}

If you refer to the previous posts on SGD and Passive-Aggressive, you'll notice that you can replace one with the other and your code will change by a single line only. This lets you experiment to find the best configuration for your project without hassle.
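
For example, assuming the Passive-Aggressive class takes the same template argument as SGD (as the include names suggest), the switch boils down to the declaration of clf:

// with SGD
OneVsOne<SGD<FEATURES_DIM>, FEATURES_DIM, NUM_CLASSES> clf;

// with Passive-Aggressive: everything else in the sketch stays the same
OneVsOne<PassiveAggressive<FEATURES_DIM>, FEATURES_DIM, NUM_CLASSES> clf;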

Accuracy

Well, accuracy varies.

In my tests, I couldn't get consistent accuracy across all datasets. I couldn't even get acceptable accuracy on the Iris dataset (60% at most). But I did get 90% accuracy on the Digits dataset from scikit-learn with 6 classes.

You have to experiment. Try Passive-Aggressive with several values of C. If that doesn't work, try SGD with varying momentum and alpha. Try repeating the training over the dataset 5 or 10 times.
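
As a starting point, you could wrap the train-and-score code from the example above in a loop over a few candidate values and keep the best one. A minimal sketch for C (the names are the same as in the example; adapt the loop for alpha and momentum if you use SGD):

// try a few values of C for Passive-Aggressive and print the accuracy of each
float candidates[] = {0.01, 0.1, 0.5, 1};

for (unsigned int k = 0; k < 4; k++) {
    AccuracyScorer scorer;
    OneVsOne<PassiveAggressive<FEATURES_DIM>, FEATURES_DIM, NUM_CLASSES> clf;

    clf.set("C", candidates[k]);

    // a few passes over the training set
    for (unsigned int i = 0; i < TRAIN_SAMPLES * 5; i++)
        clf.fitOne(X_train[i % TRAIN_SAMPLES], y_train[i % TRAIN_SAMPLES]);

    // score on the test set
    for (int i = 0; i < TEST_SAMPLES; i++)
        scorer.scoreOne(y_test[i], clf.predict(X_test[i]));

    Serial.print("C = ");
    Serial.print(candidates[k]);
    Serial.print(" -> accuracy = ");
    Serial.println(scorer.accuracy() * 100);
}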

In a future post I'll report my benchmarks so you can see what works and what doesn't.
This is an emerging field for me, so I will need time to master it.


As always, you can find the example on GitHub along with the dataset to experiment with.
