A brand-new binary classifier that's tiny and accurate, perfect for embedded scenarios: easily achieve 90+% accuracy with a minimal memory footprint!

Binary classification - from https://towardsdatascience.com

A few weeks ago I was wandering around arxiv.org looking for inspiration related to machine learning on microcontrollers when I found exactly what I was looking for.

SEFR: A Fast Linear-Time Classifier for Ultra-Low Power Devices is a paper by Hamidreza Keshavarz, Mohammad Saniee Abadeh, and Reza Rawassizadeh in which the authors develop a binary classifier that is:

  • fast to train
  • fast at prediction time
  • minimal in its memory requirements

It has been specifically designed for embedded machine learning, so no optimization is required to run it on microcontrollers: it is tiny by design. In short, it uses a combination of the averages of the features as weights, plus a bias, to distinguish between the positive and the negative class. If you read the paper you will surely understand it: it's very straightforward.
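To make the recipe concrete, here is a minimal NumPy sketch of the idea as I understand it from the paper: per-feature weights from the normalized difference of the class-wise feature averages, and a bias computed from the average scores of the two classes. Function names are my own and this is an illustration, not the reference implementation.

```python
import numpy as np

def sefr_fit(X, y, eps=1e-7):
    """Fit a SEFR-style classifier (features assumed non-negative)."""
    pos, neg = X[y == 1], X[y == 0]
    mu_pos, mu_neg = pos.mean(axis=0), neg.mean(axis=0)
    # One weight per feature: normalized difference of class-wise averages.
    w = (mu_pos - mu_neg) / (mu_pos + mu_neg + eps)
    scores = X @ w
    # Bias: average of the per-class mean scores, each weighted by the
    # size of the *other* class.
    bias = (len(neg) * scores[y == 1].mean()
            + len(pos) * scores[y == 0].mean()) / len(y)
    return w, bias

def sefr_predict(X, w, bias):
    # Positive class when the dot product exceeds the bias.
    return (X @ w > bias).astype(int)

# Toy, linearly separable data
X = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]])
y = np.array([0, 0, 1, 1])
w, b = sefr_fit(X, y)
print(sefr_predict(X, w, b))  # expect [0 0 1 1]
```

The whole model is just the weight vector and the bias, which is why it translates so naturally to a microcontroller.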

How to use

The authors provided both a C and a Python implementation on GitHub you can read. I ported the C version "manually" to my Eloquent ML library and created a Python package called sefr by copy-pasting from the original repo.

Here's a Python example.

from sefr import SEFR
from sklearn.datasets import load_iris
from sklearn.preprocessing import normalize
from sklearn.model_selection import train_test_split

if __name__ == '__main__':
    iris = load_iris()
    X = normalize(iris.data)
    y = iris.target
    X = X[y < 2]
    y = y[y < 2]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
    clf = SEFR()
    clf.fit(X_train, y_train)
    print('accuracy', (clf.predict(X_test) == y_test).sum() / len(y_test))

How good is it?

Dataset       | No. of features | Accuracy
--------------|-----------------|---------
Iris          | 4               | 100%
Breast cancer | 30              | 89%
Wine          | 13              | 84%
Digits        | 64              | 99%

Considering that the model only needs one weight per feature plus a single bias, I think these results are impressive!
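To put "one weight per feature" in numbers, here's a quick back-of-the-envelope size estimate for the datasets above, assuming 32-bit floats and one extra float for the bias:

```python
# Rough model size: one float weight per feature plus a single bias term.
datasets = [("Iris", 4), ("Breast cancer", 30), ("Wine", 13), ("Digits", 64)]
for name, n_features in datasets:
    size_bytes = (n_features + 1) * 4  # 4 bytes per 32-bit float
    print(f"{name}: {size_bytes} bytes")
```

Even the 64-feature Digits model fits in a few hundred bytes, far below the RAM of any microcontroller you're likely to target.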

Micromlgen integration

The Python port was done so I could integrate it easily into my micromlgen package.

How to use it?

from sefr import SEFR
from sklearn.datasets import load_iris
from micromlgen import port

if __name__ == '__main__':
    iris = load_iris()
    X = iris.data
    y = iris.target
    X = X[y < 2]
    y = y[y < 2]
    clf = SEFR()
    clf.fit(X, y)
    print(port(clf))

The produced code is so compact I will report it here.

#pragma once
#include <stdarg.h>
namespace Eloquent {
    namespace ML {
        namespace Port {
            class SEFR {
                public:
                    /**
                    * Predict class for features vector
                    */
                    int predict(float *x) {
                        return dot(x, 0.084993602632, -0.106163278477, 0.488989863684, 0.687022900763) <= 2.075 ? 0 : 1;
                    }

                protected:
                    /**
                    * Compute dot product between features vector and classifier weights
                    */
                    float dot(float *x, ...) {
                        va_list w;
                        va_start(w, x);
                        float kernel = 0.0;

                        for (uint16_t i = 0; i < 4; i++) {
                            kernel += x[i] * va_arg(w, double);
                        }

                        va_end(w);
                        return kernel;
                    }
            };
        }
    }
}

In your sketch:

#include "IrisSEFR.h"
#include "IrisTest.h"

void setup() {
    Serial.begin(115200);
}

void loop() {
    Eloquent::ML::Port::SEFR clf;
    Eloquent::ML::Test::IrisTestSet testSet;

    testSet.test(clf);
    Serial.println(testSet.dump());
    delay(5000);
}

You have to clone the GitHub example to compile the code.

Troubleshooting

When running micromlgen.port(clf) you may get a TemplateNotFound error. To solve the problem, first uninstall micromlgen:

pip uninstall micromlgen

Then head to GitHub, download the package as a zip and extract the micromlgen folder into your project.


That's all for today. I hope you will try this classifier and find a project it fits in: I'm very impressed by how easy it is to implement and by the accuracy it achieves on benchmark datasets.

In the next weeks I'm thinking of implementing a multi-class version and seeing how it performs, so stay tuned!

Help the blog grow