{ "version": "https://jsonfeed.org/version/1.1", "user_comment": "This feed allows you to read the posts from this site in any feed reader that supports the JSON Feed format. To add this feed to your reader, copy the following URL -- https://eloquentarduino.github.io/tag/online-learning/feed/json/ -- and add it your reader.", "home_page_url": "https://eloquentarduino.github.io/tag/online-learning/", "feed_url": "https://eloquentarduino.github.io/tag/online-learning/feed/json/", "language": "en-US", "title": "online-learning – Eloquent Arduino Blog", "description": "Machine learning on Arduino, programming & electronics", "items": [ { "id": "https://eloquentarduino.github.io/?p=1062", "url": "https://eloquentarduino.github.io/2020/04/stochastic-gradient-descent-on-your-microcontroller/", "title": "Stochastic Gradient Descent on your microcontroller", "content_html": "

Stochastic gradient descent is a well-known algorithm for training classifiers in an incremental fashion: that is, as training samples become available. This saves critical memory on tiny devices while still achieving top performance! Now you can use it on your microcontroller with ease.

\n

\n

A brief recap on Stochastic Gradient Descent

\n

If you've ever worked with Machine learning, you surely know about Gradient descent: it is an iterative algorithm to optimize a loss function.

\n

It is very general-purpose, in the sense that it is not bound to a particular application, but it has been heavily used in Neural networks in recent years.

\n

Yet, it can be used as a classifier on its own if you set its loss function to the classification error.

\n

\"Update

\n

This is the core update rule of Gradient descent: quite simple.

\n

As you see, there's a summation in the formula: this means we need to cycle through the entire training set to compute the update to the weights.

\n

With large datasets, this can be slow, or not possible at all.

\n

And it requires a lot of memory.

\n

And we don't have memory on microcontrollers.

\n

So we need Stochastic gradient descent.

\n

Stochastic gradient descent has the exact same update rule, but it is applied to a single training sample at a time.

\n

Imagine the summation goes from 1 to 1, instead of 1 to m.

\n
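
Written out, the single-sample update is the same rule with the summation collapsed onto the current sample (α is the learning rate, L the loss):

\n
$$w := w - \alpha\, \nabla_w L(w;\, x_i, y_i)$$
\n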

That's it.

\n
\n

How to use

\n

The pattern of use is similar to that of the Passive-aggressive classifier: you have the fitOne and predict methods.

\n

First of all, download the library from GitHub.

\n
#include <EloquentSGD.h>\n#include <EloquentAccuracyScorer.h>\n#include "iris.h"\n\n#define VERBOSE\n\nusing namespace Eloquent::ML;\n\nvoid setup() {\n    Serial.begin(115200);\n    delay(3000);\n}\n\nvoid loop() {\n    int trainSamples;\n    int retrainingCycles;\n    SGD<FEATURES_DIM> clf;\n    AccuracyScorer scorer;\n\n    // ....\n\n    // train\n    for (uint16_t cycle = 0; cycle < retrainingCycles; cycle++)\n        for (uint16_t i = 0; i < trainSamples; i++)\n            clf.fitOne(X[i], y[i]);\n\n    // predict\n    for (uint16_t i = trainSamples; i < DATASET_SIZE; i++) {\n        int predicted = clf.predict(X[i]);\n        int actual = y[i];\n\n        scorer.scoreOne(actual, predicted);\n    }\n\n    Serial.print("Accuracy: ");\n    Serial.print(round(100 * scorer.accuracy()));\n    Serial.print("% out of ");\n    Serial.print(scorer.support());\n    Serial.println(" predictions");\n}
\n

In this case we're working with known datasets, so we cycle through them for training; but if you're learning "on-line", from samples generated over time, it will work exactly the same.

\n
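
For instance, a minimal on-line sketch could look like the following (readFeatures() and readLabel() are hypothetical helpers standing in for however your application acquires a labelled sample):

\n
SGD<FEATURES_DIM> clf;\nfloat features[FEATURES_DIM];\n\nwhile (true) {\n    readFeatures(features);      // fill the feature vector (assumed helper)\n    int label = readLabel();     // ground-truth label (assumed helper)\n\n    // incremental update: one sample at a time, no dataset kept in memory\n    clf.fitOne(features, label);\n}
\n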

A bit of momentum

\n

Stochastic gradient descent works quite well out of the box in most cases.

\n

Sometimes, however, its updates can start "oscillating".

\n

\"SGD

\n

To solve this problem, the momentum technique has been proposed: it can both speed up learning and increase accuracy.

\n

In my personal tests, I was able to achieve up to +5% in accuracy on the majority of datasets.

\n

To use it, you only need to set a decay factor between 0 and 1.

\n
SGD<FEATURES_DIM> clf;\n\nclf.momentum(0.5);
\n
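
Under the hood, momentum keeps a running "velocity" of the past updates. A common formulation (a sketch, not necessarily the exact one this library implements; γ is the decay factor you set above, α the learning rate) is:

\n
$$v := \gamma\, v - \alpha\, \nabla_w L(w;\, x_i, y_i), \qquad w := w + v$$
\n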

Run on your own

\n

On GitHub you can find the full example with some benchmark datasets to try on your own.

\n

The example is interactive and will ask you how many samples to use for training and how many times to cycle through them.

\n

This is something you should consider: if you have a training set and can store it somehow (in memory or on Flash, for example), re-presenting the same samples to the SGD classifier could (and probably will) increase its performance, if done correctly.

\n

This happens because the algorithm needs some time to converge: if it doesn't receive enough samples, it won't learn properly.

\n

Of course, if you re-use the same samples over and over again, you're likely to overfit.

\n

The post Stochastic Gradient Descent on your microcontroller appeared first on Eloquent Arduino Blog.

\n", "content_text": "Stochastic gradient descent is a well know algorithm to train classifiers in an incremental fashion: that is, as training samples become available. This saves you critical memory on tiny devices while still achieving top performance! Now you can use it on your microcontroller with ease.\n\nA brief recap on Stochastic Gradient Descent\nIf you ever worked with Machine learning, you surely know about Gradient descent: it is an iterative algorithm to optimize a loss function. \nIt is much general-purpose, in the sense that it is not bound to a particular application, but it has been heavily used in Neural networks in the recent years.\nYet, it can be used as a classifier on its own if you set its loss function as the classification error.\n\nThis is the core update rule of Gradient descent: quite simple.\nAs you see, there's a summation in the formula: this means we need to cycle through the entire training set to compute the update to the weights.\nIn case of large datasets, this can be slow or not possible at all.\nAnd requires a lot of memory.\nAnd we don't have memory on microcontrollers.\nSo we need Stochastic gradient descent.\nStochastic gradient descent has the same exact update rule, but it is applied on the single training sample.\nImagine the summation goes from 1 to 1, instead of m.\nThat's it.\n\nHow to use\nThe pattern of use is similar to that of the Passive Aggressive classifier: you have the fitOne and predict methods.\nFirst of all, download the library from Github.\n#include <EloquentSGD.h>\n#include <EloquentAccuracyScorer.h>\n#include "iris.h"\n\n#define VERBOSE\n\nusing namespace Eloquent::ML;\n\nvoid setup() {\n Serial.begin(115200);\n delay(3000);\n}\n\nvoid loop() {\n int trainSamples;\n int retrainingCycles;\n SGD<FEATURES_DIM> clf;\n AccuracyScorer scorer;\n\n // ....\n\n // train\n for (uint16_t cycle = 0; cycle < retrainingCycles; cycle++)\n for (uint16_t i = 0; i < trainSamples; i++)\n clf.fitOne(X[i], y[i]);\n\n // predict\n for (uint16_t i = trainSamples; i < DATASET_SIZE; i++) {\n int predicted = clf.predict(X[i]);\n int actual = y[i];\n\n scorer.scoreOne(actual, predicted);\n }\n\n Serial.print("Accuracy: ");\n Serial.print(round(100 * scorer.accuracy()));\n Serial.print("% out of ");\n Serial.print(scorer.support());\n Serial.println(" predictions");\n}\nIn this case we're working with known datasets, so we cycle through them for the training, but if you're learning "on-line", from samples generated over time, it will work exactly the same.\nA bit of momentum\nStochastic gradient descent works quite well out of the box in most cases.\nSometimes, however, its updates can start "oscillating".\n\nTo solve this problem, it has been proposed the momentum technique, which can both speed up learning and increase the accuracy.\nIn my personal tests, I was able to achieve up to +5% in accuracy on the majority of datasets.\nTo use it, you only need to set a decay factor between 0 and 1.\nSGD clf;\n\nclf.momentum(0.5);\nRun on your own\nOn Github you can find the full example with some benchmark datasets to try on your own.\n\r\n\r\n\r\n \r\n\tFinding this content useful?\r\n\r\n\t\r\n\r\n\t\r\n\t\t\r\n\t\t\r\n\t \r\n \r\n \r\n \r\n\r\n\r\n\r\n\nThe example is interactive an will ask you how many samples to use for the training and how many times to cycle through them.\nThis is something you should consider: if you have a training set and can store it somehow (in memory or on Flash for example), re-presenting the same samples to the SGD classifier 
could (and probably will) increase its performance if done correctly.\nThis happens because the algorithm needs some time to converge and if it doesn't receive enough samples it won't learn properly.\nOf course, if you re-use the same samples over and over again, you're likely to overfit.\nL'articolo Stochastic Gradient Descent on your microcontroller proviene da Eloquent Arduino Blog.", "date_published": "2020-04-10T19:43:45+02:00", "date_modified": "2020-04-12T19:31:52+02:00", "authors": [ { "name": "simone", "url": "https://eloquentarduino.github.io/author/simone/", "avatar": "http://1.gravatar.com/avatar/d670eb91ca3b1135f213ffad83cb8de4?s=512&d=mm&r=g" } ], "author": { "name": "simone", "url": "https://eloquentarduino.github.io/author/simone/", "avatar": "http://1.gravatar.com/avatar/d670eb91ca3b1135f213ffad83cb8de4?s=512&d=mm&r=g" }, "tags": [ "microml", "online-learning", "Arduino Machine learning" ] }, { "id": "https://eloquentarduino.github.io/?p=1050", "url": "https://eloquentarduino.github.io/2020/04/passive-aggressive-classifier-for-embedded-devices/", "title": "Passive-aggressive classifier for embedded devices", "content_html": "

When working with memory-constrained devices you may not be able to keep all the training data in memory: passive-aggressive classifiers may help solve your memory problems.

\n

\n

Batch learning

\n

A couple weeks ago I started exploring the possibility to train a machine learning classifier directly on a microcontroller. Since I like SVM, I ported the simplified SVM SMO (Sequential Minimal Optimization) algorithm to plain C, ready to be deployed to embedded devices.

\n

Now, that kind of algorithm works in the so-called "batch-mode": it needs all the training data to be available in memory to learn.

\n

This may be a limiting factor on resource-constrained devices, since it poses an upper bound on the number of samples you can train on. And when working with high-dimensional datasets, the number of samples that fit may not be enough to achieve good accuracy.

\n

Enter incremental learning

\n

To solve this limitation, you need a totally different kind of learning algorithm: you need incremental (a.k.a. online, a.k.a. out-of-core) learning.

\n

Incremental learning works by inspecting one training sample at a time, instead of all at once.

\n

The clear advantage is that you have a tiny memory footprint. And on a microcontroller, that is a huge advantage.

\n

The clear disadvantage is that you don't have the "big picture" of your data, so:

\n

- the end result will probably be affected by the order of presentation of the samples
- you may not be able to achieve top accuracy

\n

Passive-aggressive classifier

\n

Passive-aggressive classification is one of the available incremental learning algorithms and it is very simple to implement, since it has a closed-form update rule.

\n

Please refer to this short explanation on Passive-aggressive classifiers for a nice description with images.

\n

The core concept is that the classifier adjusts its weight vector for each misclassified training sample it receives, trying to get it right.

\n
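
For reference, a sketch of the standard PA-I update (the C parameter used in the code below suggests this variant; ℓ is the hinge loss on the current sample and τ the step size):

\n
$$\ell = \max(0,\, 1 - y\,(w \cdot x)), \qquad \tau = \min\!\left(C,\, \frac{\ell}{\lVert x \rVert^2}\right), \qquad w := w + \tau\, y\, x$$
\n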

\"Passive

\n

Benchmarks

\n

I ran a couple of benchmarks on my ESP32 to assess both accuracy and training time.

\n

First of all: it is fast! When I say it is fast, I mean it takes ~1ms to train on 400 samples of 30 features each.

\n

Talking about accuracy instead... Uhm...

\n

Accuracy varies. Greatly.

\n

You can achieve 100% on some datasets.

\n

And 40% on others. But on those same datasets you can achieve >85% if training on a different number of samples. Or in a different order.

\n
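
If you want to experiment with the order of presentation, you can shuffle the sample indices before training. A minimal sketch using a Fisher-Yates shuffle (random() is the Arduino builtin):

\n
// build an index array and shuffle it, so the order of presentation varies\nuint16_t indices[DATASET_SIZE];\n\nfor (uint16_t i = 0; i < DATASET_SIZE; i++)\n    indices[i] = i;\n\n// Fisher-Yates shuffle; random(max) returns a value in [0, max)\nfor (uint16_t i = DATASET_SIZE - 1; i > 0; i--) {\n    uint16_t j = random(i + 1);\n    uint16_t tmp = indices[i];\n    indices[i] = indices[j];\n    indices[j] = tmp;\n}\n\n// then train with clf.fitOne(X[indices[i]], y[indices[i]])
\n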

I guess this is the tradeoff for such a simple and space-efficient algorithm.

\n

I report my results in the following table. It is not meant to be an exhaustive benchmark of the classifier, since those numbers will vary based on the order of presentation, but you can still get an idea of what it is able to achieve.

\n

| Dataset | Train samples | Accuracy (%) |
| --- | --- | --- |
| BREAST CANCER (567 samples, 30 features) | 20 | 62 |
| | 40 | 37 |
| | 60 | 63 |
| | 100 | 39 |
| | 150 | 38 |
| | 200 | 64 |
| | 250 | 61 |
| | 300 | 69 |
| | 350 | 73 |
| | 400 | 85 |
| IRIS (100 samples, 4 features) | 10 | 50 |
| | 20 | 51 |
| | 40 | 100 |
| | 60 | 100 |
| | 80 | 100 |
| DIGITS (358 samples, 64 features) | 20 | 98 |
| | 40 | 98 |
| | 60 | 99 |
| | 100 | 100 |
| | 150 | 100 |
| | 200 | 99 |
| | 250 | 98 |
| | 300 | 95 |
| CLEVELAND HEART DISEASE (212 samples, 13 features) | 20 | 76 |
| | 40 | 24 |
| | 60 | 77 |
| | 100 | 19 |
| | 120 | 82 |
| | 140 | 78 |
| | 180 | 88 |

\n

Time to code

\n

Here I'll report an extract of the example code you can find on GitHub for this classifier.

\n
#include "EloquentPassiveAggressiveClassifier.h"\n#include "EloquentAccuracyScorer.h"\n#include "iris.h"\n\nusing namespace Eloquent::ML;\n\nvoid loop() {\n    int trainSamples;\n    PassiveAggressiveClassifier<FEATURES_DIM> clf;\n    AccuracyScorer scorer;\n\n    trainSamples = readSerialNumber("How many samples will you use as training?", DATASET_SIZE - 2);\n\n    if (trainSamples == 0)\n        return;\n\n    clf.setC(1);\n\n    // train\n    for (uint16_t i = 0; i < trainSamples; i++)\n        clf.fitOne(X[i], y[i]);\n\n    // predict\n    for (uint16_t i = trainSamples; i < DATASET_SIZE; i++) {\n        int predicted = clf.predict(X[i]);\n        int actual = y[i] > 0 ? 1 : -1;\n\n        scorer.scoreOne(actual, predicted);\n    }\n\n    Serial.print("Accuracy: ");\n    Serial.print(round(100 * scorer.accuracy()));\n    Serial.print("% out of ");\n    Serial.print(scorer.support());\n    Serial.println(" predictions");\n}
\n
\n

On the project page you will find the code to reproduce these numbers.

\n
\n

The post Passive-aggressive classifier for embedded devices appeared first on Eloquent Arduino Blog.

\n", "content_text": "When working with memory constrained devices you may not able to keep all the training data in memory: passive-aggressive classifiers may help solve your memory problems.\n\nBatch learning\nA couple weeks ago I started exploring the possibility to train a machine learning classifier directly on a microcontroller. Since I like SVM, I ported the simplified SVM SMO (Sequential Minimal Optimization) algorithm to plain C, ready to be deployed to embedded devices.\nNow, that kind of algorithm works in the so-called "batch-mode": it needs all the training data to be available in memory to learn.\nThis may be a limiting factor on resource-constrained devices, since it poses an upper bound to the number of samples you can train on. And when working with high-dimensional datasets, the number of samples could be not enough to achieve good accuracy.\nEnter incremental learning\nTo solve this limitation, you need a totally different kind of learning algorithms: you need incremental (a.k.a online a.k.a out of core) learning.\nIncremental learning works by inspecting one training sample at a time, instead of all at once.\nThe clear advantage is that you have a tiny memory footprint. And this is a huge advantage.\nThe clear disadvantage is that you don't have the "big picture" of your data, so:\n\nthe end result will probably be affected by the order of presentation of the samples\nyou may not be able to achieve top accuracy\n\nPassive-aggressive classifier\nPassive-aggressive classification is one of the available incremental learning algorithms and it is very simple to implement, since it has a closed-form update rule.\nPlease refer to this short explanation on Passive-aggressive classifiers for a nice description with images.\nThe core concept is that the classifier adjusts it weight vector for each mis-classified training sample it receives, trying to get it correct.\n\nBenchmarks\nI run a couple benchmark on my Esp32 to assess both accuracy and training time.\nFirst of all: it is fast!. When I say it is fast I mean it takes ~1ms to train on 400 samples x 30 features each.\nTalking about accuracy instead... Uhm...\nAccuracy vary. Greatly. \nYou can achieve 100% on some datasets. \nAnd 40% on others. But on those same datasets you can achieve >85% if training on a different number of samples. Or in a different order.\nI guess this is the tradeoff for such a simple and space-efficient algorithm.\nI report my results in the following table. 
It is not meant to be an exhaustive benchmark of the classifier, since those number will vary based on the order of presentation, but still you can get an idea of what it is able to achieve.\n\n\n\nDataset size\nTrain samples\nAccuracy\n\n\n\n\nBREAST CANCER\n\n\n\n\n567 samples\n20\n62\n\n\n30 features\n40\n37\n\n\n\n60\n63\n\n\n\n100\n39\n\n\n\n150\n38\n\n\n\n200\n64\n\n\n\n250\n61\n\n\n\n300\n69\n\n\n\n350\n73\n\n\n\n400\n85\n\n\nIRIS\n\n\n\n\n100 samples\n10\n50\n\n\n4 features\n20\n51\n\n\n\n40\n100\n\n\n\n60\n100\n\n\n\n80\n100\n\n\nDIGITS\n\n\n\n\n358 samples\n20\n98\n\n\n64 features\n40\n98\n\n\n\n60\n99\n\n\n\n100\n100\n\n\n\n150\n100\n\n\n\n200\n99\n\n\n\n250\n98\n\n\n\n300\n95\n\n\nCLEVELAND HEART DISEASE\n\n\n\n\n212 samples\n20\n76\n\n\n13 features\n40\n24\n\n\n\n60\n77\n\n\n\n100\n19\n\n\n\n120\n82\n\n\n\n140\n78\n\n\n\n180\n88\n\n\n\nTime to code\nHere I'll report an extract of the example code you can find on Github for this classifier.\n#include "EloquentPassiveAggressiveClassifier.h"\n#include "EloquentAccuracyScorer.h"\n#include "iris.h"\n\nusing namespace Eloquent::ML;\n\nvoid loop() {\n int trainSamples;\n PassiveAggressiveClassifier<FEATURES_DIM> clf;\n AccuracyScorer scorer;\n\n trainSamples = readSerialNumber("How many samples will you use as training?", DATASET_SIZE - 2);\n\n if (trainSamples == 0)\n return;\n\n clf.setC(1);\n\n // train\n for (uint16_t i = 0; i < trainSamples; i++)\n clf.fitOne(X[i], y[i]);\n\n // predict\n for (uint16_t i = trainSamples; i < DATASET_SIZE; i++) {\n int predicted = clf.predict(X[i]);\n int actual = y[i] > 0 ? 1 : -1;\n\n scorer.scoreOne(actual, predicted);\n }\n\n Serial.print("Accuracy: ");\n Serial.print(round(100 * scorer.accuracy()));\n Serial.print("% out of ");\n Serial.print(scorer.support());\n Serial.println(" predictions");\n}\n\nOn the project page you will find the code to reproduce these numbers.\n\nL'articolo Passive-aggressive classifier for embedded devices proviene da Eloquent Arduino Blog.", "date_published": "2020-04-05T19:04:10+02:00", "date_modified": "2020-05-01T10:34:15+02:00", "authors": [ { "name": "simone", "url": "https://eloquentarduino.github.io/author/simone/", "avatar": "http://1.gravatar.com/avatar/d670eb91ca3b1135f213ffad83cb8de4?s=512&d=mm&r=g" } ], "author": { "name": "simone", "url": "https://eloquentarduino.github.io/author/simone/", "avatar": "http://1.gravatar.com/avatar/d670eb91ca3b1135f213ffad83cb8de4?s=512&d=mm&r=g" }, "tags": [ "microml", "online-learning", "Arduino Machine learning" ] } ] }