So far we've used SVM (Support Vector Machine) as our main classifier to port a machine learning model to a microcontroller: but recently I found an interesting alternative which could be waaaay smaller, while maintaining a similar accuracy.


The current state

I chose SVM as my main focus of interest for the MicroML framework because I knew the support vector encoding could be very memory efficient once ported to plain C. And it really is.

I was able to port many real-world models (gesture identification, wake word detection) to tiny microcontrollers like the old Arduino Nano (32 KB flash, 2 KB RAM).

The tradeoff of my implementation was to sacrifice flash space (which is usually quite abundant) to save as much RAM as possible, since RAM is usually the most limiting factor.

Because of this tradeoff, if your model grows in size (high-dimensional data or data that is not well separable), the generated code will still fit in RAM, but may "overflow" the available flash.

In a couple of my previous posts I warned that model selection might be a required step before deploying a model to an MCU, since you should first check that it fits. If it doesn't, you must train another model and hope to get fewer support vectors, since each of them adds to the code size.

A new algorithm: Relevance Vector Machines

It was by chance that I came across a new algorithm I had never heard of, called Relevance Vector Machine. It was patented by Microsoft until last year (which may be why you don't see it in the wild), but now it is free to use as far as I can tell.

Here is the link to the paper if you want to read it, it gives some insights into the development process.

I'm not a mathematician, so I can't describe it rigorously, but in a few words: it uses the same functional form as SVM (a weighted sum of kernels), while applying a Bayesian treatment to learn the weights.
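
For reference, this is the shared functional form (standard textbook notation, not taken verbatim from the paper): the prediction is a weighted kernel expansion over a subset of the training points.

\[ y(x) = \sum_{n=1}^{N} w_n \, K(x, x_n) + b \]

The two algorithms differ in how they choose the weights w_n: RVM places a sparsity-inducing prior on them, which drives most weights to exactly zero.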

In the first place, this Bayesian treatment makes it possible to get the probabilities of the classification results, something totally missing in SVM.

In the second place, the algorithm learns a much sparser set of "relevance vectors" (the RVM counterpart of support vectors), as you can see in the following picture.

RVM vs SVM support vectors

When I first read the paper, my thought was just: "wow!" This is exactly what I need for my MicroML framework: an ultra-lightweight model which can still achieve high accuracy.

Training a classifier

Now that I knew about this algorithm, I searched for it in the sklearn documentation: it was not there.

It seems that, since it was patented, sklearn never shipped an implementation.

Fortunately, there is a third-party implementation that follows the sklearn paradigm. You have to install it:

pip install Cython
pip install https://github.com/AmazaspShumik/sklearn_bayes/archive/master.zip

Since the interface is the usual fit / predict, it is super easy to train a classifier.

from sklearn.datasets import load_iris
from skbayes.rvm_ard_models import RVC
import warnings

# training emits tons of boring warnings, so silence them
warnings.filterwarnings("ignore")

# train an RVM classifier on the iris dataset
iris = load_iris()
X = iris.data
y = iris.target
clf = RVC(kernel='rbf', gamma=0.001)
clf.fit(X, y)
y_predict = clf.predict(X)
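
As a bonus over plain SVM, you also get the class probabilities I mentioned earlier. A minimal sketch, assuming the clf trained above (skbayes follows the sklearn convention, so predict_proba should be available, but double check on your version):

# class probabilities for the first few samples: one column per class
probabilities = clf.predict_proba(X[:3])
print(probabilities)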

The parameters for the constructor are similar to those of the SVC classifier from sklearn:

  • kernel: one of linear, poly, rbf
  • degree: if kernel=poly
  • gamma: if kernel=poly or kernel=rbf

You can read the docs from sklearn to learn more.
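
Since the package follows the sklearn paradigm, you should be able to tune these parameters with the usual sklearn tools. Here is a sketch using GridSearchCV, under the assumption that RVC implements the standard estimator API (a reasonable expectation given the package's design, but not guaranteed):

from sklearn.model_selection import GridSearchCV

# search over kernels and gamma values with 3-fold cross-validation
search = GridSearchCV(RVC(), param_grid={
    'kernel': ['linear', 'poly', 'rbf'],
    'gamma': [0.1, 0.01, 0.001],
}, cv=3)
search.fit(X, y)
print(search.best_params_)
print(search.best_score_)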

Porting to C

Now that we have a trained classifier, we have to port it to plain C that compiles on our microcontroller of choice.

I patched my package micromlgen to do the job for you, so you should install the latest version to get it working.

 pip install --upgrade micromlgen

Now the export part is almost the same as with an SVM classifier.

from micromlgen import port_rvm

# get_rvm_classifier() stands for your own training code,
# like the iris example above
clf = get_rvm_classifier()
c_code = port_rvm(clf)
print(c_code)

And you're done: you have plain C code you can embed in any microcontroller.
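
In practice, instead of printing the code, you will probably want to save it to a file you can include in your project. A minimal sketch (the file name is just an example):

# save the generated classifier to a header file
with open('rvm_classifier.h', 'w') as file:
    file.write(c_code)

Then you #include that file in your embedded project and call predict_rvm() with your array of features.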

Performance comparison

To test the effectiveness of this new algorithm, I applied it to the datasets I built in my previous posts, comparing side by side the size and accuracy of both SVM and RVM.

The results are summarized in the next table.

Dataset                                   | SVM flash (bytes) | SVM acc. (%) | RVM flash (bytes) | RVM acc. (%) | Delta flash | Delta acc.
------------------------------------------|-------------------|--------------|-------------------|--------------|-------------|-----------
RGB colors                                | 4584              | 100          | 3580              | 100          | -22%        | -0%
Accelerometer gestures (linear kernel)    | 36888             | 92           | 7056              | 85           | -80%        | -7%
Accelerometer gestures (gaussian kernel)  | 45348             | 95           | 7766              | 95           | -82%        | -0%
Wifi positioning                          | 4641              | 100          | 3534              | 100          | -24%        | -0%
Wake word (linear kernel)                 | 18098             | 86           | 3602              | 53           | -80%        | -33%
Wake word (gaussian kernel)               | 21788             | 90           | 4826              | 62           | -78%        | -28%

** the accuracies reported are with default parameters, without any tuning, averaged over 30 runs

As you may see, the results are quite surprising:

  • you can achieve up to 82% space reduction on high-dimensional datasets without any loss in accuracy (accelerometer gestures with gaussian kernel)
  • sometimes you may not be able to achieve a decent accuracy (62% at most on the wake word dataset)

As in any situation, you should test which of the two algorithms works best for your use case, but there are a couple of guidelines you may follow:

  • if you need top accuracy, SVM can probably achieve slightly better performance if you have enough space
  • if you need tiny space or top speed, test whether RVM achieves a satisfactory accuracy
  • if both SVM and RVM achieve comparable performance, go with RVM: it's much lighter than SVM in most cases and will run faster

Size comparison

As a reference, here is the code generated for an SVM classifier and an RVM classifier on the iris dataset.

uint8_t predict_rvm(double *x) {
    double decision[3] = { 0 };
    // one decision value per class: a bias plus at most one kernel term
    decision[0] = -0.6190847299428206;
    decision[1] = (compute_kernel(x,  6.3, 3.3, 6.0, 2.5) - 72.33233 ) * 0.228214 + -2.3609625;
    decision[2] = (compute_kernel(x,  7.7, 2.8, 6.7, 2.0) - 81.0089166 ) * -0.29006 + -3.360963;
    // pick the class with the highest decision value
    uint8_t idx = 0;
    double val = decision[0];
    for (uint8_t i = 1; i < 3; i++) {
        if (decision[i] > val) {
            idx = i;
            val = decision[i];
        }
    }
    return idx;
}

int predict_svm(double *x) {
    double kernels[10] = { 0 };
    double decisions[3] = { 0 };
    int votes[3] = { 0 };
    // one kernel per support vector
    kernels[0] = compute_kernel(x, 6.7, 3.0, 5.0, 1.7);
    kernels[1] = compute_kernel(x, 6.0, 2.7, 5.1, 1.6);
    kernels[2] = compute_kernel(x, 5.1, 2.5, 3.0, 1.1);
    kernels[3] = compute_kernel(x, 6.0, 3.0, 4.8, 1.8);
    kernels[4] = compute_kernel(x, 7.2, 3.0, 5.8, 1.6);
    kernels[5] = compute_kernel(x, 4.9, 2.5, 4.5, 1.7);
    kernels[6] = compute_kernel(x, 6.2, 2.8, 4.8, 1.8);
    kernels[7] = compute_kernel(x, 6.0, 2.2, 5.0, 1.5);
    kernels[8] = compute_kernel(x, 4.8, 3.4, 1.9, 0.2);
    kernels[9] = compute_kernel(x, 5.1, 3.3, 1.7, 0.5);
    // one decision value per pair of classes (one-vs-one)
    decisions[0] = 20.276395502
                 + kernels[0] * 100.0
                 + kernels[1] * 100.0
                 + kernels[3] * -79.351629954
                 + kernels[4] * -49.298850195
                 + kernels[6] * -40.585178082
                 + kernels[7] * -30.764341769;
    decisions[1] = -0.903345464
                 + kernels[2] * 0.743494115
                 + kernels[9] * -0.743494115;
    decisions[2] = -1.507856504
                 + kernels[5] * 0.203695177
                 + kernels[8] * -0.160020702
                 + kernels[9] * -0.043674475;
    // each pairwise decision casts a vote for one of its two classes
    votes[decisions[0] > 0 ? 0 : 1] += 1;
    votes[decisions[1] > 0 ? 0 : 2] += 1;
    votes[decisions[2] > 0 ? 1 : 2] += 1;
    // return the class with the most votes
    int classVal = -1;
    int classIdx = -1;
    for (int i = 0; i < 3; i++) {
        if (votes[i] > classVal) {
            classVal = votes[i];
            classIdx = i;
        }
    }
    return classIdx;
}

As you can see, RVM only computes 2 kernels and does 2 multiplications. SVM, on the other hand, computes 10 kernels and does 11 multiplications.

This is a recurring pattern, so RVM is much, much faster at inference time.

Troubleshooting

It can happen that when running micromlgen.port(clf) you get a TemplateNotFound error. To solve the problem, first of all uninstall micromlgen:

pip uninstall micromlgen

Then head over to GitHub, download the package as a zip and extract the micromlgen folder into your project.

Disclaimer

micromlgen, and port_rvm in particular, is a work in progress: you may experience some glitches or it may not work in your specific case. Please report any issues on the GitHub repo.
