Stochastic Gradient Descent on your microcontroller
https://eloquentarduino.github.io/2020/04/stochastic-gradient-descent-on-your-microcontroller/ (Fri, 10 Apr 2020)

Stochastic gradient descent is a well-known algorithm to train classifiers incrementally: that is, as training samples become available. This saves critical memory on tiny devices while still achieving top performance! Now you can use it on your microcontroller with ease.

A brief recap on Stochastic Gradient Descent

If you have ever worked with machine learning, you surely know about gradient descent: an iterative algorithm to optimize a loss function.

It is quite general-purpose, in the sense that it is not bound to a particular application, but it has been used heavily in neural networks in recent years.

Yet it can be used as a classifier on its own if you set its loss function to the classification error.

Update rule of gradient descent:

    w \leftarrow w - \eta \, \frac{1}{m} \sum_{i=1}^{m} \nabla_w L(w; x_i, y_i)

where \eta is the learning rate, m is the number of training samples and L is the loss function.

This is the core update rule of gradient descent: quite simple.

As you see, there's a summation in the formula: this means we need to cycle through the entire training set to compute the update to the weights.

With large datasets, this can be slow, or not possible at all.

And it requires a lot of memory.

And we don't have much memory on microcontrollers.

So we need Stochastic gradient descent.

Stochastic gradient descent has the exact same update rule, but it is applied to a single training sample at a time.

Imagine the summation going from 1 to 1 instead of 1 to m.

That's it.
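Written out, the per-sample update for a training pair (x_i, y_i) is simply:

    w \leftarrow w - \eta \, \nabla_w L(w; x_i, y_i)

One sample, one update: no summation, no full pass over the dataset.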

How to use

The pattern of use is similar to that of the Passive-Aggressive classifier: you have the fitOne and predict methods.

First of all, download the library from GitHub.

#include <EloquentSGD.h>
#include <EloquentAccuracyScorer.h>
#include "iris.h"

#define VERBOSE

using namespace Eloquent::ML;

void setup() {
    Serial.begin(115200);
    delay(3000);
}

void loop() {
    int trainSamples;
    int retrainingCycles;
    SGD<FEATURES_DIM> clf;
    AccuracyScorer scorer;

    // in the full example on GitHub, these values are read interactively
    // from the serial monitor (the prompts below are illustrative)
    trainSamples = readSerialNumber("How many samples will you use as training?", DATASET_SIZE - 2);
    retrainingCycles = readSerialNumber("How many times will you cycle through the training samples?", 100);

    if (trainSamples == 0)
        return;

    // train: present the training samples multiple times (epochs)
    for (uint16_t cycle = 0; cycle < retrainingCycles; cycle++)
        for (uint16_t i = 0; i < trainSamples; i++)
            clf.fitOne(X[i], y[i]);

    // predict
    for (uint16_t i = trainSamples; i < DATASET_SIZE; i++) {
        int predicted = clf.predict(X[i]);
        int actual = y[i];

        scorer.scoreOne(actual, predicted);
    }

    Serial.print("Accuracy: ");
    Serial.print(round(100 * scorer.accuracy()));
    Serial.print("% out of ");
    Serial.print(scorer.support());
    Serial.println(" predictions");
}

In this case we're working with known datasets, so we cycle through them for training; but if you're learning "on-line", from samples generated over time, it works exactly the same.
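To give an idea, here is a minimal sketch of that on-line scenario. readFeatureVector() and readLabel() are hypothetical placeholders for however your application acquires samples (sensors, serial, network...), and FEATURES_DIM is your feature count:

#include <EloquentSGD.h>

using namespace Eloquent::ML;

SGD<FEATURES_DIM> clf;

void loop() {
    float features[FEATURES_DIM];
    int label;

    // hypothetical placeholders: replace with your own acquisition code
    if (readFeatureVector(features) && readLabel(&label)) {
        // learn from the new sample, then forget it:
        // there's no need to keep it in memory
        clf.fitOne(features, label);
    }
}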

A bit of momentum

Stochastic gradient descent works quite well out of the box in most cases.

Sometimes, however, its updates can start "oscillating".

SGD with and without momentum

To solve this problem, the momentum technique has been proposed: it can both speed up learning and increase accuracy.

In my personal tests, I was able to achieve up to +5% in accuracy on the majority of datasets.

To use it, you only need to set a decay factor between 0 and 1.

SGD<FEATURES_DIM> clf;

// momentum decay factor, in the range (0, 1)
clf.momentum(0.5);
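For reference, the classical momentum formulation keeps a velocity term v that accumulates past gradients (here \gamma is the decay factor passed to momentum(); that the library follows exactly this formulation is an assumption):

    v \leftarrow \gamma \, v + \eta \, \nabla_w L(w; x_i, y_i)
    w \leftarrow w - v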

Run on your own

On GitHub you can find the full example with some benchmark datasets to try on your own.

The example is interactive and will ask you how many samples to use for training and how many times to cycle through them.

This is something you should consider: if you have a training set and can store it somewhere (in memory or on flash, for example), re-presenting the same samples to the SGD classifier can (and probably will) increase its performance, if done correctly.

This happens because the algorithm needs some time to converge and, if it doesn't receive enough samples, it won't learn properly.

Of course, if you re-use the same samples over and over again, you're likely to overfit.

Passive-aggressive classifier for embedded devices
https://eloquentarduino.github.io/2020/04/passive-aggressive-classifier-for-embedded-devices/ (Sun, 05 Apr 2020)

When working with memory-constrained devices you may not be able to keep all the training data in memory: passive-aggressive classifiers may help solve your memory problems.

Batch learning

A couple of weeks ago I started exploring the possibility to train a machine learning classifier directly on a microcontroller. Since I like SVM, I ported the simplified SVM SMO (Sequential Minimal Optimization) algorithm to plain C, ready to be deployed to embedded devices.

Now, that kind of algorithm works in the so-called "batch mode": it needs all the training data to be available in memory to learn.

This can be a limiting factor on resource-constrained devices, since it puts an upper bound on the number of samples you can train on: 400 samples with 30 float features each, for example, already take ~48 KB of RAM, more than many microcontrollers have in total. And when working with high-dimensional datasets, that number of samples may not be enough to achieve good accuracy.

Enter incremental learning

To overcome this limitation, you need a totally different kind of learning algorithm: incremental (a.k.a. online, a.k.a. out-of-core) learning.

Incremental learning works by inspecting one training sample at a time, instead of all at once.

The clear advantage is that you have a tiny memory footprint. And this is a huge advantage.

The clear disadvantage is that you don't have the "big picture" of your data, so:

  • the end result will probably be affected by the order of presentation of the samples (see the sketch after this list for a cheap mitigation)
  • you may not be able to achieve top accuracy
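When you do have the whole training set at hand (as in the benchmarks below), one cheap mitigation for the order-of-presentation issue is to shuffle the sample indices before feeding them to the classifier. A minimal sketch, assuming X, y and DATASET_SIZE are defined as in the example code below:

uint16_t indices[DATASET_SIZE];

// start from the identity permutation...
for (uint16_t i = 0; i < DATASET_SIZE; i++)
    indices[i] = i;

// ...then shuffle it (Fisher-Yates); random(max) is Arduino's
// builtin and returns a value in [0, max)
for (uint16_t i = DATASET_SIZE - 1; i > 0; i--) {
    uint16_t j = random(i + 1);
    uint16_t tmp = indices[i];
    indices[i] = indices[j];
    indices[j] = tmp;
}

// train on the shuffled order
for (uint16_t i = 0; i < trainSamples; i++)
    clf.fitOne(X[indices[i]], y[indices[i]]);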

Passive-aggressive classifier

Passive-aggressive classification is one of the available incremental learning algorithms and it is very simple to implement, since it has a closed-form update rule.

Please refer to this short explanation on Passive-aggressive classifiers for a nice description with images.

The core concept is that the classifier adjusts its weight vector for each misclassified training sample it receives, trying to classify it correctly.

Passive aggressive classifier
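For reference, the closed-form update from Crammer et al. for a sample (x, y) with label y in {-1, +1} is:

    \ell = \max(0, \; 1 - y \, (w \cdot x))
    \tau = \min\left(C, \; \frac{\ell}{\lVert x \rVert^2}\right)
    w \leftarrow w + \tau \, y \, x

If the sample is classified correctly with enough margin, \ell = 0 and the classifier stays passive; otherwise it aggressively moves the weights just enough to fix the mistake. The cap at C is the PA-I variant: that this is exactly what the library's setC() implements is an assumption based on the parameter name.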

Benchmarks

I ran a couple of benchmarks on my ESP32 to assess both accuracy and training time.

First of all: it is fast! When I say fast, I mean it takes ~1 ms to train on 400 samples with 30 features each.
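If you want to reproduce that timing on your own board, here is a minimal sketch using Arduino's micros(), assuming X, y and trainSamples as in the example code below:

uint32_t start = micros();

// single pass over the training samples
for (uint16_t i = 0; i < trainSamples; i++)
    clf.fitOne(X[i], y[i]);

uint32_t elapsed = micros() - start;

Serial.print("Training took ");
Serial.print(elapsed);
Serial.println(" us");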

Talking about accuracy instead... Uhm...

Accuracy varies. Greatly.

You can achieve 100% on some datasets.

And 40% on others. But on those same datasets you can achieve >85% if training on a different number of samples. Or in a different order.

I guess this is the tradeoff for such a simple and space-efficient algorithm.

I report my results in the following table. It is not meant to be an exhaustive benchmark of the classifier, since those numbers will vary based on the order of presentation, but you can still get an idea of what it is able to achieve.

BREAST CANCER (567 samples, 30 features)
Train samples:  20   40   60   100  150  200  250  300  350  400
Accuracy (%):   62   37   63   39   38   64   61   69   73   85

IRIS (100 samples, 4 features)
Train samples:  10   20   40   60   80
Accuracy (%):   50   51   100  100  100

DIGITS (358 samples, 64 features)
Train samples:  20   40   60   100  150  200  250  300
Accuracy (%):   98   98   99   100  100  99   98   95

CLEVELAND HEART DISEASE (212 samples, 13 features)
Train samples:  20   40   60   100  120  140  180
Accuracy (%):   76   24   77   19   82   78   88

Time to code

Here I'll report an extract of the example code for this classifier, which you can find in full on GitHub.

#include "EloquentPassiveAggressiveClassifier.h"
#include "EloquentAccuracyScorer.h"
#include "iris.h"

using namespace Eloquent::ML;

void loop() {
    int trainSamples;
    PassiveAggressiveClassifier<FEATURES_DIM> clf;
    AccuracyScorer scorer;

    trainSamples = readSerialNumber("How many samples will you use as training?", DATASET_SIZE - 2);

    if (trainSamples == 0)
        return;

    clf.setC(1);   // aggressiveness parameter of the update rule

    // train
    for (uint16_t i = 0; i < trainSamples; i++)
        clf.fitOne(X[i], y[i]);

    // predict
    for (uint16_t i = trainSamples; i < DATASET_SIZE; i++) {
        int predicted = clf.predict(X[i]);
        int actual = y[i] > 0 ? 1 : -1;   // PA is a binary classifier: labels are -1 / +1

        scorer.scoreOne(actual, predicted);
    }

    Serial.print("Accuracy: ");
    Serial.print(round(100 * scorer.accuracy()));
    Serial.print("% out of ");
    Serial.print(scorer.support());
    Serial.println(" predictions");
}

On the project page you will find the code to reproduce these numbers.

