A brand new binary classifier that's tiny and accurate, perfect for embedded scenarios: easily achieves 90%+ accuracy with a minimal memory footprint!
A few weeks ago I was browsing arxiv.org looking for inspiration on machine learning for microcontrollers when I found exactly what I was looking for.
SEFR: A Fast Linear-Time Classifier for Ultra-Low Power Devices is a paper by Hamidreza Keshavarz, Mohammad Saniee Abadeh and Reza Rawassizadeh in which the authors develop a binary classifier that is:
- fast during training
- fast during prediction
- requires minimal memory
It has been specifically designed for embedded machine learning, so no optimization is required to run it on microcontrollers: it is tiny by design. In short, it uses a combination of the averages of the features as weights, plus a bias, to distinguish between the positive and the negative class. If you read the paper you will surely understand it: it's very straightforward.
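To make the idea concrete, here is a minimal NumPy sketch of the training procedure as I understand it from the paper (the class name `SEFRSketch` and the epsilon value are my own choices; like the paper, it assumes non-negative features — this is not the official `sefr` package):

```python
import numpy as np

class SEFRSketch:
    """Minimal sketch of the SEFR binary classifier (labels 0 and 1)."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        avg_pos = X[y == 1].mean(axis=0)
        avg_neg = X[y == 0].mean(axis=0)
        # one weight per feature: normalized difference of class averages
        self.weights = (avg_pos - avg_neg) / (avg_pos + avg_neg + 1e-7)
        # bias: weighted midpoint of the average scores of the two classes
        scores = X @ self.weights
        n_pos, n_neg = (y == 1).sum(), (y == 0).sum()
        self.bias = (n_neg * scores[y == 1].mean()
                     + n_pos * scores[y == 0].mean()) / (n_pos + n_neg)
        return self

    def predict(self, X):
        return (np.asarray(X, dtype=float) @ self.weights > self.bias).astype(int)


clf = SEFRSketch().fit([[1, 1], [1.2, 0.8], [3, 3], [2.8, 3.2]], [0, 0, 1, 1])
print(clf.predict([[4, 4], [0.5, 0.5]]))  # → [1 0]
```

So the whole model boils down to one weight per feature and a single bias: that is why it fits comfortably on a microcontroller.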
How to use
The authors provided both a C and a Python implementation on Github you can read. I ported the C version "manually" to my Eloquent ML library and created a Python package called sefr by copy-pasting from the original repo.
Here's a Python example.
from sefr import SEFR
from sklearn.datasets import load_iris
from sklearn.preprocessing import normalize
from sklearn.model_selection import train_test_split

if __name__ == '__main__':
    iris = load_iris()
    X = normalize(iris.data)
    y = iris.target
    # keep only the first two classes: SEFR is a binary classifier
    X = X[y < 2]
    y = y[y < 2]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
    clf = SEFR()
    clf.fit(X_train, y_train)
    print('accuracy', (clf.predict(X_test) == y_test).sum() / len(y_test))
How good is it?
Dataset | No. of features | Accuracy
---|---|---
Iris | 4 | 100%
Breast cancer | 30 | 89%
Wine | 13 | 84%
Digits | 64 | 99%
Considering that the model only needs 1 weight per feature, I think these results are impressive!
Micromlgen integration
The Python porting was done so I could integrate it easily in my micromlgen package.
How to use it?
from sefr import SEFR
from sklearn.datasets import load_iris
from micromlgen import port

if __name__ == '__main__':
    iris = load_iris()
    X = iris.data
    y = iris.target
    # keep only the first two classes: SEFR is a binary classifier
    X = X[y < 2]
    y = y[y < 2]
    clf = SEFR()
    clf.fit(X, y)
    print(port(clf))
The produced code is so compact that I will report it here in full.
#pragma once

namespace Eloquent {
    namespace ML {
        namespace Port {
            class SEFR {
                public:
                    /**
                     * Predict class for features vector
                     */
                    int predict(float *x) {
                        return dot(x, 0.084993602632, -0.106163278477, 0.488989863684, 0.687022900763) <= 2.075 ? 0 : 1;
                    }

                protected:
                    /**
                     * Compute dot product between features vector and classifier weights
                     */
                    float dot(float *x, ...) {
                        va_list w;
                        va_start(w, x);
                        float kernel = 0.0;

                        for (uint16_t i = 0; i < 4; i++) {
                            kernel += x[i] * va_arg(w, double);
                        }

                        va_end(w);
                        return kernel;
                    }
            };
        }
    }
}
In your sketch:
#include "IrisSEFR.h"
#include "IrisTest.h"

void setup() {
    Serial.begin(115200);
}

void loop() {
    // run the classifier against the bundled Iris test set
    Eloquent::ML::Port::SEFR clf;
    Eloquent::ML::Test::IrisTestSet testSet;

    testSet.test(clf);
    Serial.println(testSet.dump());
    delay(5000);
}
You have to clone the Github example to compile the code.
Troubleshooting
It can happen that when running micromlgen.port(clf) you get a TemplateNotFound error. To solve the problem, first of all uninstall micromlgen:

pip uninstall micromlgen

Then head to Github, download the package as zip and extract the micromlgen folder into your project.
That's all for today. I hope you will try this classifier and find a project it fits in: I'm very impressed by how simple it is to implement and by the accuracy it can achieve on benchmark datasets.
In the next weeks I'm thinking of implementing a multi-class version and seeing how it performs, so stay tuned!