By now we know it is possible to run machine learning inference on tiny microcontrollers, thanks to TensorFlow Lite for Microcontrollers and my very own library MicroML. What if you could train a classifier directly on the microcontroller, too?


When I first started this journey in the world of Machine learning on microcontrollers, one fact was set in stone for me: you train your classifier once and for all on a PC, then deploy it to your microcontroller.

As simple as this.

Training is a heavy process: it requires lots of computation and memory. You want a machine as powerful as possible to carry out this task as fast as possible.

Moreover, it is a one-time task: once your classifier has been trained, it doesn't need to be updated anymore.

And this held true until now.

Until my reader Joao Carvalho, in the comments of my post about an alternative to SVM that produces much smaller models (which I strongly invite you to read), challenged me with the idea of running SVM training directly on the microcontroller.

In the past I replied "No way" to people asking about this topic on forums, but Joao was kind enough to link me to a JavaScript implementation of the simplified SMO (Sequential Minimal Optimization) algorithm for training SVMs.

At first glance it looked quite easy to port from JavaScript to C, so I gave it a try.

And in fact it only took me 30 minutes to get it working on my PC.

Then I deployed it to my ESP32 and... it worked!
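
To give you an idea of what the ported code looks like, here is a minimal, self-contained sketch of the simplified SMO loop in plain C++. It is an illustration of the algorithm, not the exact code in the repo: the linear kernel and the svm_train / svm_predict names are my own choices for this example.

```cpp
// Minimal simplified-SMO trainer for a binary, linear-kernel SVM.
// Labels must be +1 / -1. Illustrative sketch only.
#include <math.h>
#include <stdlib.h>

#define MAX_SAMPLES  60
#define MAX_FEATURES 30

static float alpha[MAX_SAMPLES];
static float b = 0;

static float kernel(const float *a, const float *c, int dim) {
    float sum = 0;
    for (int k = 0; k < dim; k++) sum += a[k] * c[k];
    return sum;  // linear kernel
}

// decision function f(x) = sum_i alpha_i * y_i * K(x_i, x) + b
static float decision(const float X[][MAX_FEATURES], const int *y, int n, int dim, const float *x) {
    float f = b;
    for (int i = 0; i < n; i++)
        if (alpha[i] != 0)
            f += alpha[i] * y[i] * kernel(X[i], x, dim);
    return f;
}

int svm_predict(const float X[][MAX_FEATURES], const int *y, int n, int dim, const float *x) {
    return decision(X, y, n, dim, x) >= 0 ? 1 : -1;
}

void svm_train(const float X[][MAX_FEATURES], const int *y, int n, int dim,
               float C, float tol, int maxPasses) {
    for (int i = 0; i < n; i++) alpha[i] = 0;
    b = 0;
    int passes = 0;
    while (passes < maxPasses) {
        int changed = 0;
        for (int i = 0; i < n; i++) {
            float Ei = decision(X, y, n, dim, X[i]) - y[i];
            if ((y[i] * Ei < -tol && alpha[i] < C) || (y[i] * Ei > tol && alpha[i] > 0)) {
                int j = rand() % n;
                if (j == i) j = (j + 1) % n;          // pick j != i at random
                float Ej = decision(X, y, n, dim, X[j]) - y[j];
                float ai = alpha[i], aj = alpha[j];
                float L, H;
                if (y[i] != y[j]) { L = fmaxf(0, aj - ai);     H = fminf(C, C + aj - ai); }
                else              { L = fmaxf(0, ai + aj - C); H = fminf(C, ai + aj); }
                if (L == H) continue;
                float eta = 2 * kernel(X[i], X[j], dim)
                          - kernel(X[i], X[i], dim) - kernel(X[j], X[j], dim);
                if (eta >= 0) continue;
                alpha[j] = aj - y[j] * (Ei - Ej) / eta;  // update and clip alpha_j
                if (alpha[j] > H) alpha[j] = H;
                if (alpha[j] < L) alpha[j] = L;
                if (fabsf(alpha[j] - aj) < 1e-5f) continue;
                alpha[i] = ai + y[i] * y[j] * (aj - alpha[j]);
                // update the bias term
                float b1 = b - Ei - y[i] * (alpha[i] - ai) * kernel(X[i], X[i], dim)
                                  - y[j] * (alpha[j] - aj) * kernel(X[i], X[j], dim);
                float b2 = b - Ej - y[i] * (alpha[i] - ai) * kernel(X[i], X[j], dim)
                                  - y[j] * (alpha[j] - aj) * kernel(X[j], X[j], dim);
                if (alpha[i] > 0 && alpha[i] < C)      b = b1;
                else if (alpha[j] > 0 && alpha[j] < C) b = b2;
                else                                   b = (b1 + b2) / 2;
                changed++;
            }
        }
        passes = (changed == 0) ? passes + 1 : 0;
    }
}
```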

My first try was with 10 samples from the IRIS dataset: training took almost no time.

But I know SVM training and inference times grow rapidly with the number of training samples.

Execution time will be the most limiting factor for this kind of task, so I created a benchmarking setup to evaluate the performance of the algorithm on different dataset sizes and feature dimensions.
The results are summarized in the following table and plots.

| Features size | Training size | Training time (ms) | Unit inference time (ms) | Accuracy (%) |
| --- | --- | --- | --- | --- |
| 4 | 10 | 0 | 0.011 | 80 |
| 4 | 20 | 4 | 0.013 | 85 |
| 4 | 30 | 11 | 0.014 | 90 |
| 4 | 40 | 38 | 0.017 | 91 |
| 4 | 50 | 47 | 0.020 | 86 |
| 4 | 60 | 80 | 0.025 | 87 |
| 30 | 10 | 22 | 0.014 | 71 |
| 30 | 20 | 1000 | 0.017 | 61 |
| 30 | 30 | 3500 | 0.020 | 66 |
| 30 | 40 | 20000 | 0.025 | 85 |
| 30 | 50 | 32800 | 0.033 | 83 |
| 30 | 60 | 71400 | 0.050 | 80 |

* all benchmarks were obtained on an ESP32 board
** inference sometimes took less than 1 ms to run over all the test samples, so the total was rounded up to 1 ms and divided by the number of samples

Onboard IRIS dataset training time. Features dim = 4

Onboard Breast cancer dataset training time. Features dim = 30

We can see from the table that for the Iris dataset, which has 4-dimensional features, the training process is quite fast: 80ms to train on 60 samples.

Things become quite different when training on the Breast cancer dataset, with its 30 features per sample: now we're talking about seconds to train, and more than a minute when the number of samples grows to 60.

Fortunately the inference time stays almost flat, so you will have real-time predictions.

I ran the Iris benchmark on a Seeedstudio Xiao M0 board (32-bit, 48 MHz processor), too. It took 7 s to train on 60 samples vs the 80 ms it took on the ESP32. It is clear that some boards are better suited than others for this task.
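
If you want to reproduce these numbers on your own board, a minimal timing harness is all you need. The sketch below assumes the svm_train / svm_predict functions from the earlier illustrative example and a dataset already loaded in RAM; millis() is the standard Arduino timer.

```cpp
// Timing harness (Arduino-style): measures training time and
// per-sample inference time for a dataset already in memory.
void benchmark(const float X[][MAX_FEATURES], const int *y, int n, int dim) {
    unsigned long t0 = millis();
    svm_train(X, y, n, dim, 1.0f /*C*/, 1e-3f /*tol*/, 10 /*max passes*/);
    unsigned long trainMs = millis() - t0;

    t0 = millis();
    for (int i = 0; i < n; i++)
        svm_predict(X, y, n, dim, X[i]);
    float unitInferenceMs = (float)(millis() - t0) / n;

    Serial.print("Training time (ms): ");
    Serial.println(trainMs);
    Serial.print("Unit inference time (ms): ");
    Serial.println(unitInferenceMs, 3);
}
```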


Downsides

Of course there are downsides: it's not a perfect world.

Convergence

As the paper reported: "there is one thing to note, the algorithm (the simplified version) is not guaranteed to converge".

I actually don't know, in practice, what this means. But it sounds like a bad thing.

Binary classification only

As of now this algorithm can only do binary classification.

I hope to implement multi-class classification in the future with the one-vs-all approach, but I don't know yet whether it would be too inefficient to run on a microcontroller.
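
The idea behind one-vs-all is simple: train one binary classifier per class, then pick the class whose decision function scores highest. A rough sketch of how it could look, built on top of the illustrative svm_train / kernel functions from the earlier example (not actual library code):

```cpp
// One-vs-all sketch: train one binary SVM per class, copying aside the
// global alpha[] / b from the previous example after each training run.
// yMulti[] holds class ids 0..numClasses-1.
#define MAX_CLASSES 3

static float alphaOva[MAX_CLASSES][MAX_SAMPLES];
static float bOva[MAX_CLASSES];

void svm_train_ova(const float X[][MAX_FEATURES], const int *yMulti,
                   int n, int dim, int numClasses) {
    int yBinary[MAX_SAMPLES];
    for (int c = 0; c < numClasses; c++) {
        for (int i = 0; i < n; i++)
            yBinary[i] = (yMulti[i] == c) ? 1 : -1;   // current class vs the rest
        svm_train(X, yBinary, n, dim, 1.0f, 1e-3f, 10);
        for (int i = 0; i < n; i++) alphaOva[c][i] = alpha[i];
        bOva[c] = b;
    }
}

int svm_predict_ova(const float X[][MAX_FEATURES], const int *yMulti,
                    int n, int dim, int numClasses, const float *x) {
    int best = 0;
    float bestScore = -1e30f;
    for (int c = 0; c < numClasses; c++) {
        // rebuild the binary decision function for class c
        float f = bOva[c];
        for (int i = 0; i < n; i++) {
            if (alphaOva[c][i] == 0) continue;
            int yi = (yMulti[i] == c) ? 1 : -1;
            f += alphaOva[c][i] * yi * kernel(X[i], x, dim);
        }
        if (f > bestScore) { bestScore = f; best = c; }
    }
    return best;
}
```

The obvious cost is that both training time and coefficient storage grow linearly with the number of classes, which is exactly what worries me on a microcontroller.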

Declining accuracy

If you look at the benchmark table above, you'll also notice that accuracy does not always increase with the training set size: it seems to reach an optimum and then start decreasing.

If you're going to deploy your device in an autonomous scenario, you'll need to monitor its accuracy every time you re-train it, or your results are going to degrade.

You should keep track of the optimum you achieved and roll back to that training set when you register declining accuracy.
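
In practice this can be as simple as testing on a held-out set after every re-training and accepting the new model only if accuracy did not drop. A minimal sketch of that guard, again using the hypothetical svm_train / svm_predict functions from above:

```cpp
// Accuracy on a held-out validation set, using the predictor from the
// earlier sketch (training data is needed because the SVM is kernel-based).
float evaluate(const float Xtrain[][MAX_FEATURES], const int *ytrain, int nTrain, int dim,
               const float Xval[][MAX_FEATURES], const int *yval, int nVal) {
    int correct = 0;
    for (int i = 0; i < nVal; i++)
        if (svm_predict(Xtrain, ytrain, nTrain, dim, Xval[i]) == yval[i])
            correct++;
    return (float)correct / nVal;
}

// Retrain only if accuracy did not decline; otherwise the caller should
// roll back to the previous training set.
static float bestAccuracy = 0;

bool retrainIfBetter(const float Xtrain[][MAX_FEATURES], const int *ytrain, int nTrain, int dim,
                     const float Xval[][MAX_FEATURES], const int *yval, int nVal) {
    svm_train(Xtrain, ytrain, nTrain, dim, 1.0f, 1e-3f, 10);
    float acc = evaluate(Xtrain, ytrain, nTrain, dim, Xval, yval, nVal);
    if (acc >= bestAccuracy) { bestAccuracy = acc; return true; }
    return false;  // accuracy declined: restore the previous training set
}
```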

Memory

You will need to keep your whole training set in memory for the classifier to both learn and predict. This means RAM will be a limiting factor, and we know RAM is an expensive resource on microcontrollers.

You will have to find a good enough compromise between the number of features, the number of samples and the accuracy.
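
To get a feel for the numbers: storing the training set as 32-bit floats takes samples × features × 4 bytes, plus a coefficient and a label per sample. A quick back-of-the-envelope check for the largest configuration in the table above:

```cpp
// Rough RAM footprint of the training data kept onboard (32-bit floats).
// 60 samples x 30 features is the largest configuration benchmarked above.
const int samples  = 60;
const int features = 30;
const int bytes = samples * features * sizeof(float)   // training set: 7200 bytes
                + samples * sizeof(float)              // alpha coefficients: 240 bytes
                + samples * sizeof(int);               // labels: 240 bytes
// just under 8 KB: fine on an ESP32, already a noticeable chunk on smaller boards
```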

Time to get your hands dirty

I created a sample project for you to train a color classifier with a super simple setup: you will only need a TCS3200 (color sensor) to follow along.

Don't have a TCS3200? No problem: you can train a classifier on the IRIS dataset.
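
To give you an idea of the setup, this is roughly what the data collection could look like: select the red, green and blue filters of the TCS3200 in turn and use the three readings as the feature vector. Pin numbers and wiring here are assumptions of mine; refer to the project repo for the actual sketch.

```cpp
// Hypothetical data-collection routine for the color classifier.
// Assumes the S2/S3 filter-select pins and the OUT pin are wired as below
// and configured with pinMode() in setup().
#define S2_PIN  12
#define S3_PIN  13
#define OUT_PIN 14

// shorter pulse = higher light intensity for the selected filter
unsigned long readChannel(bool s2, bool s3) {
    digitalWrite(S2_PIN, s2);
    digitalWrite(S3_PIN, s3);
    delay(10);                               // let the output settle
    return pulseIn(OUT_PIN, LOW);
}

void readColor(float features[3]) {
    features[0] = readChannel(LOW, LOW);     // red filter
    features[1] = readChannel(HIGH, HIGH);   // green filter
    features[2] = readChannel(LOW, HIGH);    // blue filter
}
```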

If you've read this far, please consider letting me know in the comments about useful applications you can think of for this new tool. It's a brand new topic for me and I'd appreciate any suggestions.


Check the project repo on GitHub

Help the blog grow